MLA-C01 Scenario Patterns & Exam Traps
Table of Contents
- Exam Scenario Patterns & Traps
- How to Read Exam Questions
- Category 1: Which Endpoint?
- Category 2: Which Data Service?
- Category 3: Which Algorithm?
- Category 4: Training Troubleshooting
- Category 5: Monitoring & Drift
- Category 6: Security & Compliance
- Category 7: Cost Optimization
- Category 8: Generative AI / Bedrock
- Category 9: Feature Management
- Category 10: Transfer Learning vs Incremental
- Top 10 Exam Traps
- Augmented Manifest (Ground Truth Output)
Exam Scenario Patterns & Traps
How to think through MLA-C01 questions. Learn the patterns, not just the facts.
How to Read Exam Questions
Most questions follow this structure:
1. SCENARIO — A company/team needs to do X
2. CONSTRAINT — With requirement Y (cost, latency, compliance)
3. OPTIONS — 4 answers (2 clearly wrong + 2 plausible)
4. KEY PHRASE — "MOST cost-effective", "LEAST overhead", "BEST performance"
Critical reading:
"MOST cost-effective" = cheapest that meets requirements
"LEAST operational overhead" = most managed/serverless (NOT cheapest)
"LOWEST latency" = fastest response (may cost more)
"MINIMIZE changes" = use existing tools, avoid migration
Category 1: Which Endpoint?
| Scenario Keywords | Answer |
|---|---|
| “Consistent traffic, <100ms, real-time API” | Real-Time Endpoint |
| “5 requests/day, cost-sensitive, minimize cost” | Serverless Inference |
| “Large video (payload up to 1GB), 10 min processing, >6MB payload” | Async Inference |
| “Weekly batch scoring, 1M records, scheduled” | Batch Transform |
| “500 models, 10 requests each/day” | Multi-Model Endpoint |
| “Compare 2 models, 10% vs 90% traffic” | Production Variants |
| “Test new model silently, zero user risk” | Shadow Testing |
Real-time endpoints cannot scale to 0. Min = 1 instance. For scale-to-0, use Serverless or Async.
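The decision rules in the table above can be sketched as a small function. This is only a study aid: `Workload` and `choose_inference_option` are hypothetical names, and the 6 MB / 60 second thresholds are assumed limits for synchronous invocation, not values taken from this guide's tables.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of an inference workload (not an AWS type)."""
    scheduled_offline: bool      # scored on a schedule, nobody waits for a reply
    payload_mb: float            # size of one request payload
    processing_seconds: float    # time needed to produce one prediction
    steady_traffic: bool         # consistent traffic vs. a few requests per day
    model_count: int = 1         # number of models to host


def choose_inference_option(w: Workload) -> str:
    """Return the inference option the table above points to."""
    if w.scheduled_offline:
        return "Batch Transform"
    # Assumed thresholds: 6 MB payload and 60 s response for synchronous calls.
    if w.payload_mb > 6 or w.processing_seconds > 60:
        return "Async Inference"
    if w.model_count > 1:
        return "Multi-Model Endpoint"
    if not w.steady_traffic:
        return "Serverless Inference"  # can scale to zero
    return "Real-Time Endpoint"  # minimum of one instance
```

For example, `choose_inference_option(Workload(False, 0.1, 0.05, False))` returns `"Serverless Inference"`, matching the "5 requests/day" row.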
Category 2: Which Data Service?
| Scenario | Answer | NOT |
|---|---|---|
| “Simple ETL, serverless” | Glue ETL | EMR (overkill) |
| “Petabyte-scale Spark” | EMR | Glue (not enough scale) |
| “No-code data prep for analysts” | Glue DataBrew | Data Wrangler (ML-focused) |
| “ML-specific data prep in SageMaker” | Data Wrangler | DataBrew (not ML-specific) |
| “SQL on S3” | Athena | Redshift (no warehouse needed) |
| “Real-time streaming + custom processing” | Kinesis Data Streams | Firehose (no custom consumers) |
| “Deliver stream to S3” | Data Firehose | Data Streams (needs custom consumers; Firehose is simpler) |
| “Migrate existing Kafka” | MSK | Kinesis (different API) |
| “Central feature repo, training + serving” | Feature Store | S3 versioning |
| “Data warehouse with ML predictions” | Redshift ML | Athena |
Category 3: Which Algorithm?
| Scenario | Answer | Key Distinction |
|---|---|---|
| “Tabular classification/regression” | XGBoost | Default choice for structured data |
| “Many categorical features, tabular” | CatBoost | Native categorical support |
| “AutoML, no tuning, highest accuracy” | AutoGluon-Tabular | Ensemble stacking |
| “Time series, 5000 products, forecast” | DeepAR | Multiple related time series |
| “Anomaly detection, no labels, streaming” | Random Cut Forest | Unsupervised anomaly |
| “Suspicious login IPs” | IP Insights | Entity-IP associations |
| “Sentiment analysis, custom data” | BlazingText | Text classification with training |
| “Sentiment analysis, no training needed” | Comprehend | Managed AI service |
| “Pixel-level image segmentation” | Semantic Segmentation | Per-pixel labels |
| “Sparse click data, recommendations” | Factorization Machines | Pairwise interactions |
| “Managed recommendations, no ML needed” | Personalize | Managed service |
| “Document OCR, forms, tables” | Textract | No training needed |
| “Custom image classifier, 10 images” | Rekognition Custom Labels | Few images |
Category 4: Training Troubleshooting
| Error / Symptom | Cause | Fix |
|---|---|---|
| “Insufficient disk space” | File mode, data too large | Pipe Mode or FastFile Mode, or increase VolumeSizeInGB |
| “Cannot pull Docker image” | Missing ECR permissions/VPC endpoint | Add ECR VPC endpoints + IAM |
| “Cannot access S3 data” | Missing S3 permission/VPC endpoint | Add S3 VPC endpoint + IAM |
| “CUDA out of memory” | Model too large for GPU | Smaller batch size, Model Parallelism, or larger GPU |
| “MaxWaitTimeExceeded” | Spot instance not available | Increase max_wait or switch to On-Demand |
| “Loss not decreasing” | Learning rate issue | Check LR (too high/low), check data |
| “Training 10x slower than expected” | Wrong instance type | CPU for deep learning → switch to GPU |
| “Spot training restarts from scratch” | No checkpointing configured | Enable checkpoint_s3_uri |
| “GPU utilization only 20%” | I/O bottleneck | Use Pipe mode, increase batch size, more data workers |
| “Training costs too high” | On-demand instances | Spot Training + Checkpointing (up to 90% savings) |
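The Spot-related rows above (MaxWaitTimeExceeded, restarts from scratch, high cost) all come down to three settings. A minimal sketch of the relevant fragment of a `CreateTrainingJob` request, assuming the boto3 field names `EnableManagedSpotTraining`, `StoppingCondition`, and `CheckpointConfig`; the helper name and the bucket are hypothetical, and the other required request fields are omitted.

```python
def spot_training_settings(max_runtime_s: int, max_wait_s: int,
                           checkpoint_s3_uri: str) -> dict:
    """Build the Spot-specific part of a training job request (sketch only)."""
    # The wait time covers waiting for capacity plus the run itself,
    # so it must be at least as long as the maximum runtime.
    if max_wait_s < max_runtime_s:
        raise ValueError("max_wait_s must be >= max_runtime_s")
    if not checkpoint_s3_uri.startswith("s3://"):
        raise ValueError("checkpoint_s3_uri must be an S3 URI")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_s,
            "MaxWaitTimeInSeconds": max_wait_s,
        },
        # Without checkpoints, an interrupted job restarts from scratch.
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri},
    }


# Hypothetical bucket name, for illustration only.
settings = spot_training_settings(3600, 7200, "s3://example-bucket/checkpoints/")
```

In the SageMaker Python SDK the same ideas appear as the `max_run`, `max_wait`, and `checkpoint_s3_uri` estimator arguments.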
Category 5: Monitoring & Drift
| Scenario | Answer | Key Detail |
|---|---|---|
| “Feature mean shifted from $50K to $80K” | Data Quality Monitor | NO ground truth needed |
| “Accuracy dropped from 95% to 80%” | Model Quality Monitor | NEEDS ground truth labels |
| “Approving men at higher rate than women” | Bias Drift Monitor | Uses Clarify baseline |
| “Feature importance shifted” | Explainability Drift | SHAP value changes |
| “Explain why loan was denied” | Clarify SHAP | Local explanation |
| “Set up complete monitoring” | Data Capture → Baseline → Schedule → Alarms → Remediation | |
| “Automate retraining on drift” | Monitor → CloudWatch → EventBridge → Pipeline | |
Critical: Model Monitor requires Data Capture to be enabled first. Without it = no data to monitor.
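As an illustration of the "Data Capture first" rule, here is a sketch of the `DataCaptureConfig` block that goes into an endpoint configuration. The field names are the ones I believe the `CreateEndpointConfig` API uses; the helper name and bucket are hypothetical.

```python
def data_capture_config(destination_s3_uri: str,
                        sampling_percentage: int = 100) -> dict:
    """Build a DataCaptureConfig fragment for an endpoint config (sketch only)."""
    if not 0 <= sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 0 and 100")
    return {
        "EnableCapture": True,
        "InitialSamplingPercentage": sampling_percentage,
        "DestinationS3Uri": destination_s3_uri,
        # Capture both requests and responses so monitors can compare them.
        "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
    }


# Hypothetical bucket name, for illustration only.
capture = data_capture_config("s3://example-bucket/data-capture/", 50)
```

The captured records land in S3, which is what the baseline and monitoring schedule read from.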
Category 6: Security & Compliance
| Scenario | Answer |
|---|---|
| “HIPAA-compliant ML pipeline” | KMS + TLS + VPC + Inter-container encryption + CloudTrail |
| “Data must not leave AWS network” | VPC Endpoints + Private Subnets |
| “Data scientists can train but not deploy” | IAM policies (allow CreateTrainingJob, deny CreateEndpoint) |
| “Encrypt with company-managed keys” | KMS Customer Managed Key (CMK) |
| “Prevent training container from internet” | Network Isolation mode |
| “Find PII in S3” | Macie |
| “Handle PII in ETL” | Glue DataBrew |
| “Audit who accessed what” | CloudTrail |
| “Rotate database credentials” | Secrets Manager |
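The "train but not deploy" row can be sketched as an IAM policy document. This is an illustrative policy built as a Python dict, not a complete production policy: the function name is hypothetical, `Resource` is left as `"*"` for brevity, and a real role would also need permissions such as S3 access and `iam:PassRole`.

```python
import json


def train_only_policy() -> dict:
    """Allow starting training jobs while explicitly denying deployment."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowTraining",
                "Effect": "Allow",
                "Action": [
                    "sagemaker:CreateTrainingJob",
                    "sagemaker:DescribeTrainingJob",
                ],
                "Resource": "*",
            },
            {
                # An explicit Deny overrides any Allow granted elsewhere.
                "Sid": "DenyDeployment",
                "Effect": "Deny",
                "Action": [
                    "sagemaker:CreateEndpoint",
                    "sagemaker:CreateEndpointConfig",
                    "sagemaker:UpdateEndpoint",
                ],
                "Resource": "*",
            },
        ],
    }


policy_json = json.dumps(train_only_policy(), indent=2)
```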
Category 7: Cost Optimization
| Current Setup | Optimization |
|---|---|
| On-Demand training, 12 hours | Spot Training + Checkpointing (up to 90% savings) |
| GPU for XGBoost | Switch to CPU (ml.m5): XGBoost is usually CPU/memory-bound and rarely justifies GPU cost |
| Always-on endpoint, 5 req/day | Serverless Inference |
| GPU inference (ml.p3) | ml.inf1/inf2 (Inferentia) or ml.g4dn |
| Pipeline reprocesses all data | Step Caching |
| 500 individual endpoints | Multi-Model Endpoint |
| Training restarts frequently | Warm Pools |
| Athena scanning large CSVs | Convert to Parquet + Partition |
| Large FM inference costs | Bedrock Intelligent Prompt Routing (up to 30% savings) |
Category 8: Generative AI / Bedrock
| Scenario | Answer | Key Distinction |
|---|---|---|
| “Q&A over company documents” | Bedrock Knowledge Bases (RAG) | Retrieve + generate |
| “AI that understands legal terminology” | Bedrock fine-tuning | Change model knowledge |
| “Block offensive AI outputs + PII” | Bedrock Guardrails | Content safety |
| “Multi-step task automation with LLM” | Bedrock Agents | Action Groups + Lambda |
| “Compare FM output quality” | Bedrock Model Evaluation | Auto or human eval |
| “Reduce FM inference cost” | Intelligent Prompt Routing | Auto-select model size |
Fine-tuning = change model behavior/knowledge. RAG = provide context at inference.
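To make "RAG = provide context at inference" concrete, here is a toy sketch. It uses word overlap as a stand-in for retrieval, which is an assumption for illustration only (a managed knowledge base would typically use vector embeddings), and no model is called. The function names and sample documents are hypothetical.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many query words they share (toy retrieval)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Put retrieved text into the prompt; the model weights never change."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Use only the context below to answer.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


docs = [
    "Refunds are processed within 14 days",
    "The office is closed on Sunday",
    "Shipping takes 3 days",
]
prompt = build_prompt("how long are refunds processed", docs, k=1)
```

Fine-tuning, by contrast, would change the model itself, so nothing would need to be added to the prompt at inference time.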
Category 9: Feature Management
| Scenario | Answer |
|---|---|
| “Reuse features across multiple teams” | Feature Store with Feature Groups |
| “Real-time feature lookup for inference (<10ms)” | Feature Store Online Store |
| “Prevent future data from leaking into training” | Feature Store Offline with Point-in-Time |
| “Both training and real-time serving from same features” | Enable both Online + Offline stores |
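The point-in-time idea in the table above can be sketched in plain Python: for each training label, use only the latest feature value recorded at or before the label's timestamp. This is a conceptual sketch, not the Feature Store API; the function name and sample data are hypothetical.

```python
from bisect import bisect_right


def point_in_time_lookup(feature_history, as_of):
    """Return the latest feature value with event_time <= as_of, else None.

    feature_history is a list of (event_time, value) pairs sorted by event_time.
    """
    event_times = [event_time for event_time, _ in feature_history]
    # Index of the first record strictly after as_of.
    idx = bisect_right(event_times, as_of)
    return feature_history[idx - 1][1] if idx else None


# Hypothetical feature history: (event_time, value)
history = [(1, 10.0), (5, 12.0), (9, 20.0)]

# A label observed at time 6 may only see the value written at time 5.
value_at_6 = point_in_time_lookup(history, 6)
```

Using the value from time 9 for a label at time 6 would be exactly the future-data leak the table warns about.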
Category 10: Transfer Learning vs Incremental
| Signal Words | Answer |
|---|---|
| “limited data”, “pre-trained”, “new task”, “similar domain” | Transfer Learning |
| “new data arrives”, “without retraining”, “update existing model” | Incremental Learning |
| “distribution shifted significantly” | Full Retrain (old + new data) |
Top 10 Exam Traps
1. Over-Engineering
Weekly batch → answer offers Real-Time Endpoint. Pick Batch Transform.
2. “Operational Overhead” ≠ “Cost”
“Least overhead” = most managed/serverless. “Most cost-effective” = cheapest total. May be different answers.
3. Missing Configuration
Spot without checkpoint = loses progress. Monitor without Data Capture = no data. Model Quality without ground truth = can’t measure.
4. Service Confusion
- Processing ≠ Training
- Data Quality ≠ Model Quality
- Clarify ≠ Model Monitor
- Pipelines ≠ Step Functions
- Autopilot ≠ Automatic Model Tuning
5. XGBoost Doesn’t Need GPU
“Train XGBoost, choose instance” → ml.m5 (CPU). Reserve GPU instances for deep learning.
6. Lambda Limits
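Process a 2GB file? Lambda is a poor fit: 15-minute timeout, 6MB synchronous payload limit, 250MB unzipped deployment package limit, and 512MB of /tmp storage by default. Use ECS/Fargate/SageMaker Processing.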
7. Accuracy Is Misleading
99.5% accuracy on 99.5/0.5% imbalanced data = useless. Use F1, AUC-ROC, Precision-Recall.
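A quick worked example of this trap, using a hypothetical dataset of 1,000 samples with 5 positives and a "model" that always predicts the majority class. The helper name is illustrative.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


y_true = [1] * 5 + [0] * 995   # 0.5% positive class
y_pred = [0] * 1000            # always predict the majority class
metrics = binary_metrics(y_true, y_pred)
# accuracy is 0.995, yet recall and F1 are both 0.0
```

The model never finds a single positive, which accuracy hides and recall/F1 expose.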
8. AdaBoost on Noisy/Imbalanced Data
AdaBoost upweights noisy misclassified samples → overfits. No regularization. XGBoost = correct (scale_pos_weight + L1/L2 + missing value handling).
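For `scale_pos_weight`, a commonly suggested starting value is the ratio of negative to positive samples. A minimal sketch, assuming 0/1 labels; the function name shadows the hyperparameter only for readability, and the value would normally still be tuned.

```python
def scale_pos_weight(labels) -> float:
    """Return negatives / positives as a starting value for scale_pos_weight."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive samples in labels")
    return negatives / positives


# Hypothetical dataset: 995 negatives, 5 positives
weight = scale_pos_weight([0] * 995 + [1] * 5)
```

Here `weight` is 199.0, so each positive sample counts roughly as much as 199 negatives during training.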
9. Random K-Fold on Time Series
Trains on future, tests on past = leakage. Use walk-forward (time-series split).
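A minimal sketch of walk-forward splitting with an expanding training window, assuming the samples are already sorted by time. The function name and parameters are illustrative; libraries such as scikit-learn offer a comparable `TimeSeriesSplit`.

```python
def walk_forward_splits(n_samples: int, n_splits: int, test_size: int):
    """Yield (train_indices, test_indices); training data always precedes test."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for i in range(n_splits):
        test_start = first_test_start + i * test_size
        train_idx = list(range(test_start))  # everything before the test window
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


folds = list(walk_forward_splits(n_samples=10, n_splits=3, test_size=2))
# folds[0] == ([0, 1, 2, 3], [4, 5])
# folds[2] == ([0, 1, 2, 3, 4, 5, 6, 7], [8, 9])
```

Every test index is later than every training index in the same fold, which is the property random K-fold breaks.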
10. Bedrock vs SageMaker
GenAI apps (chat, RAG, content) → Bedrock. Custom model training/deployment → SageMaker.
Augmented Manifest (Ground Truth Output)
{"source-ref": "s3://bucket/image1.jpg", "label": 0, "label-metadata": {"confidence": 0.95, "human-annotated": "yes"}}
{"source-ref": "s3://bucket/image2.jpg", "label": 1, "label-metadata": {"confidence": 0.88, "human-annotated": "no"}}
- JSON Lines format — directly consumable by SageMaker training
- "human-annotated": "no" = auto-labeled by active learning
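Reading a manifest like the one above takes one `json.loads` call per line. The sketch below reuses the two sample records and splits them by annotation source; the function name is hypothetical.

```python
import json

MANIFEST = '''{"source-ref": "s3://bucket/image1.jpg", "label": 0, "label-metadata": {"confidence": 0.95, "human-annotated": "yes"}}
{"source-ref": "s3://bucket/image2.jpg", "label": 1, "label-metadata": {"confidence": 0.88, "human-annotated": "no"}}'''


def split_by_annotation_source(manifest_text: str):
    """Return (human_labeled, auto_labeled) records from JSON Lines text."""
    human, auto = [], []
    for line in manifest_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        metadata = record.get("label-metadata", {})
        if metadata.get("human-annotated") == "yes":
            human.append(record)
        else:
            auto.append(record)
    return human, auto


human_labeled, auto_labeled = split_by_annotation_source(MANIFEST)
```

A split like this is handy for auditing auto-labeled records, for example by reviewing those with lower `confidence` first.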