MLA-C01 Exam Cheat Sheet
Rapid-fire reference for the exam. Print this and review the night before.
The Golden Rules
- SageMaker is almost always the answer for ML workflow questions
- Prefer managed services over custom solutions
- Bedrock for GenAI, SageMaker for custom ML
- Least privilege for every IAM question
- VPC Endpoints when “data must not traverse the internet”
- KMS for every encryption-at-rest question (TLS for in transit)
- XGBoost is the default for tabular/structured data
- RecordIO + Pipe mode for fastest training ingestion
Service → Purpose (One-Line Each)
Storage & Ingestion
| Service | One-Liner |
|---|---|
| S3 | The data lake — all ML data lives here |
| EFS | Shared file system across instances |
| FSx for Lustre | High-throughput training I/O, backed by S3 |
| Kinesis Data Streams | Real-time streaming with custom consumers |
| Data Firehose | Managed delivery to S3/Redshift (near real-time) |
| MSK | Managed Kafka |
| Managed Service for Apache Flink | Real-time stream processing with Flink SQL or code |
Data Processing & Preparation
| Service | One-Liner |
|---|---|
| EMR + Spark | Large-scale distributed data processing |
| Glue ETL | Serverless ETL jobs (PySpark) |
| Glue Crawlers | Auto-discover schemas → Data Catalog |
| Glue DataBrew | Visual data prep + PII handling |
| Data Wrangler | Visual data prep inside SageMaker Studio |
| Ground Truth | Managed data labeling (humans + auto-label) |
| Feature Store | Centralized features (online + offline) |
| Athena | SQL queries on S3 (serverless) |
ML Development
| Service | One-Liner |
|---|---|
| SageMaker Studio | Unified IDE for all ML tasks |
| SageMaker Canvas | No-code ML for business users |
| SageMaker Autopilot | AutoML — automated algorithm + tuning selection |
| SageMaker Clarify | Bias detection + SHAP explainability |
| SageMaker Debugger | Real-time training monitoring + profiling |
| SageMaker Experiments | Track/compare training runs |
| Model Registry | Version control + approval workflows for models |
| JumpStart | Pre-trained model hub (400+ models) |
GenAI / Bedrock
| Service | One-Liner |
|---|---|
| Bedrock | Managed API access to foundation models |
| Bedrock Knowledge Bases | Managed RAG (S3 → chunk → embed → vector store) |
| Bedrock Agents | Multi-step LLM tasks with Lambda actions |
| Bedrock Guardrails | Content filters + PII + topic blocking |
Managed AI Services
| Service | One-Liner |
|---|---|
| Comprehend | NLP: sentiment, entities, PII, topics |
| Translate | Neural machine translation |
| Transcribe | Speech-to-text |
| Polly | Text-to-speech |
| Rekognition | Image/video analysis + custom labels |
| Textract | OCR for documents, forms, tables |
| Lex | Build chatbots (intents + slots + Lambda) |
| Personalize | Real-time recommendation engine |
| Kendra | Enterprise semantic search |
| A2I | Human review workflows for ML output |
| Q Business | Enterprise AI assistant (40+ data connectors) |
| Q Developer | AI coding assistant in IDE |
| Macie | Discover PII in S3 |
| Fraud Detector | Managed fraud detection |
Deployment & Ops
| Service | One-Liner |
|---|---|
| SageMaker Pipelines | ML-native CI/CD workflows |
| CodePipeline | General CI/CD orchestration |
| CodeBuild | Build + test step |
| CodeDeploy | Deploy to compute targets |
| Step Functions | State machine workflow orchestration |
| EventBridge | Event-driven automation (triggers) |
| MWAA | Managed Apache Airflow (DAGs) |
| CloudFormation | Infrastructure as YAML/JSON |
| CDK | Infrastructure as code (Python/TS) |
| ECR | Docker container registry |
| ECS + Fargate | Run containers serverlessly |
| EKS | Managed Kubernetes |
| Neo | Compile model for edge hardware |
| IoT Greengrass | Edge ML inference |
Monitoring & Security
| Service | One-Liner |
|---|---|
| Model Monitor | Production drift detection (4 types) |
| CloudWatch | Metrics, alarms, logs, dashboards |
| CloudTrail | Audit log of all API calls |
| IAM | Users, roles, policies, permissions |
| KMS | Encryption key management |
| Secrets Manager | Store/rotate credentials |
| WAF | Web application firewall |
| Shield | DDoS protection |
| Lake Formation | Data lake access governance |
| VPC Endpoints | Keep traffic within AWS network |
Algorithm Quick-Match
| I Have… | I Want To… | Use |
|---|---|---|
| Tabular data | Classify / Predict | XGBoost |
| Simple linear data | Classify / Predict | Linear Learner |
| Multiple time series | Forecast future values | DeepAR |
| Text | Classify sentiment/topic | BlazingText |
| Text corpus | Find topics | NTM |
| Images | Classify | Image Classification |
| Images | Find + locate objects | Object Detection |
| Images | Label every pixel | Semantic Segmentation |
| Numeric data | Find anomalies | Random Cut Forest |
| IP addresses | Detect suspicious usage | IP Insights |
| Sparse click/purchase data | Recommend items | Factorization Machines |
| High dimensions | Reduce features | PCA |
| Unlabeled data | Group into clusters | K-Means |
| Sequences | Translate / Summarize | Seq2Seq |
Endpoint Types: Which One?
| Signal in Question | Answer |
|---|---|
| “Low latency”, “real-time API”, “interactive” | Real-Time Endpoint |
| “Intermittent traffic”, “minimize cost”, “pay per use” | Serverless Inference |
| “Large payload”, “minutes to process”, “>6MB” | Async Inference |
| “Score entire dataset”, “nightly batch”, “no endpoint needed” | Batch Transform |
| “A/B test two models” | Production Variants |
| “Many models, sporadic traffic” | Multi-Model Endpoint |
| “Chain preprocessing + prediction” | Inference Pipeline |
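
The first four rows reduce to a few checks. A minimal Python sketch, assuming the payload limits listed under Numbers to Remember (6 MB real-time, 4 MB serverless); `pick_inference_option` and its parameters are hypothetical names for illustration, not a SageMaker API:

```python
def pick_inference_option(payload_mb: float,
                          offline_dataset: bool = False,
                          intermittent_traffic: bool = False) -> str:
    """Hypothetical helper that mirrors the table above."""
    if offline_dataset:
        # Scoring a whole dataset needs no persistent endpoint.
        return "Batch Transform"
    if payload_mb > 6:
        # Above the real-time payload limit: queue the request instead.
        return "Async Inference"
    if intermittent_traffic and payload_mb <= 4:
        # Pay per use, provided the payload fits the serverless limit.
        return "Serverless Inference"
    return "Real-Time Endpoint"


print(pick_inference_option(payload_mb=50))
print(pick_inference_option(payload_mb=1, intermittent_traffic=True))
```

Real exam questions add latency and cost signals, so treat this as a memory aid rather than a complete decision procedure.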
Deployment: Which Strategy?
| Signal in Question | Answer |
|---|---|
| “Instant rollback”, “zero downtime swap” | Blue/Green |
| “Test on small % first”, “gradual rollout” | Canary |
| “Shift traffic linearly over time” | Linear |
| “Compare models with zero user risk” | Shadow Testing |
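
On SageMaker endpoints, the first three strategies are traffic-routing modes of one blue/green update. The sketch below builds the `DeploymentConfig` dictionary that would be passed to `UpdateEndpoint`; the field names reflect my understanding of that API and should be checked against the current reference. `blue_green_config`, the alarm name, and the default values are assumptions for illustration. Shadow testing uses a separate mechanism and is not covered here.

```python
def blue_green_config(mode: str, alarm_name: str,
                      step_percent: int = 10,
                      wait_seconds: int = 300) -> dict:
    """Build a DeploymentConfig dict (sketch; no AWS call is made)."""
    routing = {"Type": mode, "WaitIntervalInSeconds": wait_seconds}
    step = {"Type": "CAPACITY_PERCENT", "Value": step_percent}
    if mode == "CANARY":
        # One small slice first, then the remainder in a single shift.
        routing["CanarySize"] = step
    elif mode == "LINEAR":
        # Equal-sized steps until all traffic has moved.
        routing["LinearStepSize"] = step
    elif mode != "ALL_AT_ONCE":
        raise ValueError(f"unknown routing mode: {mode}")
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": routing,
            # Keep the old fleet briefly so rollback stays fast.
            "TerminationWaitInSeconds": 120,
        },
        # Roll back automatically if this CloudWatch alarm fires.
        "AutoRollbackConfiguration": {"Alarms": [{"AlarmName": alarm_name}]},
    }


config = blue_green_config("CANARY", alarm_name="endpoint-5xx-errors")
print(config["BlueGreenUpdatePolicy"]["TrafficRoutingConfiguration"])
```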
Data Format: Which One?
| Signal in Question | Answer |
|---|---|
| “Fastest SageMaker training” | RecordIO + Pipe Mode |
| “Query with Athena”, “analytics” | Parquet (columnar) |
| “Schema may evolve”, “streaming” | Avro |
| “Hive / EMR workloads” | ORC |
| “Simple, human-readable” | CSV or JSON |
Metric Selection
| Scenario | Metric |
|---|---|
| “Don’t miss cancer” (minimize FN) | Recall |
| “Don’t flag good email as spam” (minimize FP) | Precision |
| “Balance precision and recall” | F1 Score |
| “Overall classifier quality” | AUC-ROC |
| “Regression — same units as target” | RMSE |
| “Regression — % variance explained” | R² |
| “Cluster quality” | Silhouette Score |
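
For reference, the classification metrics above follow directly from confusion-matrix counts, and RMSE from the residuals. A small plain-Python sketch; the function names and sample counts are made up for illustration:

```python
import math


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp: int, fp: int, fn: int):
    """Return (precision, recall, f1) from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)   # penalizes false positives
    recall = _ratio(tp, tp + fn)      # penalizes false negatives
    f1 = _ratio(2 * precision * recall, precision + recall)
    return precision, recall, f1


def rmse(y_true, y_pred) -> float:
    """Root mean squared error, in the same units as the target."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


precision, recall, f1 = classification_metrics(tp=80, fp=20, fn=40)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Lowering the decision threshold typically trades precision for recall, which is why the "don't miss cancer" scenario points to recall.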
Bias & Explainability
| When | Tool |
|---|---|
| Before training (data bias) | Clarify — pre-training bias metrics |
| After training (model bias) | Clarify — post-training bias metrics |
| In production (drift) | Model Monitor + Clarify |
| Explain individual predictions | SHAP values |
| Feature effect on overall model | Partial Dependence Plots |
| Balance imbalanced datasets | Data Wrangler — SMOTE / resampling |
Security Patterns
| Question Pattern | Answer |
|---|---|
| “Encrypt data at rest” | SSE-KMS (S3) / KMS (EBS, SageMaker) |
| “Encrypt in transit” | TLS/SSL (automatic for endpoints) |
| “Data must not leave AWS network” | VPC Endpoints (PrivateLink) |
| “Isolate training from internet” | VPC private subnet + network isolation |
| “Who accessed what resource” | CloudTrail |
| “Find PII in S3” | Macie |
| “Handle PII in ETL” | Glue DataBrew |
| “Rotate database passwords” | Secrets Manager |
| “Fine-grained data lake permissions” | Lake Formation |
| “Least-privilege ML roles” | SageMaker Role Manager |
| “Protect API from DDoS” | Shield + WAF |
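
As one concrete instance of the VPC Endpoints pattern, an S3 bucket policy can deny any request that does not arrive through a specific VPC endpoint, using the `aws:SourceVpce` condition key. A hedged sketch that only builds the policy document; the bucket name and endpoint ID are placeholders, and `vpce_only_bucket_policy` is a hypothetical helper:

```python
import json


def vpce_only_bucket_policy(bucket: str, vpce_id: str) -> dict:
    """Deny all S3 access that does not come through the given VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnlessFromVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket}",      # bucket-level actions
                f"arn:aws:s3:::{bucket}/*",    # object-level actions
            ],
            "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
        }],
    }


# Placeholder identifiers, for illustration only.
policy = vpce_only_bucket_policy("example-training-data", "vpce-0123456789abcdef0")
print(json.dumps(policy, indent=2))
```

A blanket deny like this also blocks console access from outside the VPC, so real policies usually carve out an exception for administrators.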
Cost Optimization Patterns
| Question Pattern | Answer |
|---|---|
| “Reduce training costs” | Spot Instances + Checkpointing |
| “Reduce inference costs (variable traffic)” | Serverless Inference or Auto-Scaling |
| “Reduce inference costs (steady traffic)” | Savings Plans or Inferentia instances |
| “Many models, low traffic each” | Multi-Model Endpoints |
| “Right-size inference instances” | Inference Recommender |
| “Reduce DL training costs” | AWS Trainium (ml.trn1) |
| “Reduce DL inference costs” | AWS Inferentia (ml.inf2) |
| “Optimize model for target hardware” | SageMaker Neo |
| “Reduce data scanned in queries” | Parquet + Partitioning (Athena) |
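
For the Spot row, SageMaker reports both total training time and billable time for a managed spot job, and the saving is the gap between them. A small arithmetic sketch, assuming the two values are read from the job description (`TrainingTimeInSeconds` and `BillableTimeInSeconds`); the sample numbers are invented:

```python
def spot_savings_percent(training_seconds: float, billable_seconds: float) -> float:
    """Percent saved: share of training time that was not billed."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100


# Invented example: a one-hour job billed for 18 minutes.
print(round(spot_savings_percent(training_seconds=3600, billable_seconds=1080), 1))
```

Checkpointing matters because an interrupted job without checkpoints restarts from scratch, which increases training time and erodes the saving.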
Automation Patterns
| Trigger | → | Action | Service Chain |
|---|---|---|---|
| New data in S3 | → | Start retraining | S3 Event → EventBridge → SageMaker Pipeline |
| Model drift detected | → | Alert + retrain | Model Monitor → CloudWatch → EventBridge → Pipeline |
| Model approved | → | Deploy to endpoint | Model Registry → Pipeline → Endpoint |
| Schedule (daily/weekly) | → | Run batch predictions | EventBridge Schedule → Batch Transform |
| Code pushed | → | Build + deploy | CodeCommit → CodePipeline → CodeBuild → Deploy |
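
The EventBridge links in these chains are driven by event patterns. The sketch below shows two patterns as Python dictionaries, one for new S3 objects and one for Model Registry approvals, plus a deliberately simplified matcher to show how a pattern selects events. The bucket name is a placeholder, and `matches` is a hypothetical stand-in that covers only exact-value matching, not the full EventBridge pattern language.

```python
# New object in a bucket (the bucket must have EventBridge notifications enabled).
NEW_DATA_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": ["example-training-data"]}},  # placeholder
}

# A model package was approved in the Model Registry.
MODEL_APPROVED_PATTERN = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {"ModelApprovalStatus": ["Approved"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must match the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            # Leaf: the event value must be one of the listed values.
            return False
    return True


sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {"bucket": {"name": "example-training-data"},
               "object": {"key": "raw/2024/data.csv"}},
}
print(matches(NEW_DATA_PATTERN, sample_event))
print(matches(MODEL_APPROVED_PATTERN, sample_event))
```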
Common Trap Questions
| Trap | Correct Answer |
|---|---|
| “Use EMR for simple ETL” | Glue (EMR is for large-scale distributed) |
| “Build custom recommendation engine” | Personalize (managed service first) |
| “Train a chatbot from scratch” | Lex (managed) or Bedrock (LLM-based) |
| “Store features in S3 manually” | Feature Store (built for this) |
| “Use EC2 for inference” | SageMaker Endpoint (managed, auto-scaling) |
| “Use Lambda for ML inference” | Only if the model fits Lambda limits (container image ≤10 GB, run time ≤15 min); otherwise SageMaker |
| “Monitor model with custom scripts” | Model Monitor (built-in, integrates with Clarify) |
| “CloudFormation OR CDK?” | CDK if code-based infra is needed, but both are valid; CDK synthesizes CloudFormation templates |
| “Kinesis OR Firehose?” | Kinesis Data Streams = real-time custom processing; Firehose = managed delivery (near real-time) |
| “ECS OR EKS?” | ECS = simpler, AWS-native; EKS = Kubernetes, portable |
Numbers to Remember
| Fact | Value |
|---|---|
| Exam questions | 65 (50 scored + 15 unscored) |
| Exam time | 130 minutes |
| Passing score | 720 / 1000 |
| S3 PUT throughput per prefix | 3,500 req/s |
| S3 GET throughput per prefix | 5,500 req/s |
| Kinesis shard write | 1 MB/s or 1,000 records/s |
| Kinesis shard read | 2 MB/s (shared) or 2 MB/s per consumer (enhanced) |
| SageMaker real-time payload limit | 6 MB |
| SageMaker async payload limit | 1 GB |
| SageMaker serverless payload limit | 4 MB |
| Rekognition Custom Labels min images | 10 per label |
| Firehose buffer interval | 0–900 seconds (60-second minimum before zero buffering was added) |
| Spot Instance savings | Up to 90% |
| SageMaker Savings Plans | Up to 64% |
| Inferentia inference savings | Up to 70% |
| Trainium training savings | Up to 50% |
| Ground Truth auto-labeling savings | Up to 70% |
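
The Kinesis shard limits above turn into a simple sizing calculation: take whichever limit is hit first and round up. A minimal sketch for provisioned streams with shared (non-enhanced) reads; `shards_needed` is a hypothetical helper and the workloads are invented:

```python
import math

WRITE_MB_PER_SHARD = 1.0        # 1 MB/s write per shard
WRITE_RECORDS_PER_SHARD = 1000  # 1,000 records/s write per shard
READ_MB_PER_SHARD = 2.0         # 2 MB/s read per shard, shared by consumers


def shards_needed(write_mb_per_s: float, records_per_s: float,
                  read_mb_per_s: float = 0.0) -> int:
    """Smallest shard count that satisfies every per-shard limit."""
    by_write_bytes = write_mb_per_s / WRITE_MB_PER_SHARD
    by_write_records = records_per_s / WRITE_RECORDS_PER_SHARD
    by_read_bytes = read_mb_per_s / READ_MB_PER_SHARD
    # The tightest constraint decides; a stream has at least one shard.
    return max(1, math.ceil(max(by_write_bytes, by_write_records, by_read_bytes)))


print(shards_needed(write_mb_per_s=5, records_per_s=3000))  # bytes are the limit
print(shards_needed(write_mb_per_s=2, records_per_s=8000))  # record count is the limit
```

Many small records hit the 1,000 records/s limit long before the 1 MB/s limit, which is a common source of unexpected throttling.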