Lab: Domain 2 — Model Development Hands-On
These labs cover the model development lifecycle: training, hyperparameter tuning, bias detection, and working with foundation models on Bedrock.
Lab 1: Train an XGBoost Model on SageMaker
What Is This Lab About?
XGBoost is one of the most heavily tested algorithms on the MLA-C01 exam and a common default choice for tabular/structured data (CSVs, database exports, feature tables). This lab walks through a complete training job, from data upload to model artifact, so you understand every component the exam asks about.
What You’ll Build
┌─────────────────────────────────────────────────────────────┐
│ SageMaker Training Job │
│ │
│ YOU PROVIDE: SAGEMAKER MANAGES: │
│ ┌──────────────┐ ┌────────────────────────┐ │
│ │ Training data │──(S3)──────▶│ 1. Provision ml.m5 │ │
│ │ (CSV) │ │ 2. Pull XGBoost │ │
│ └──────────────┘ │ container from ECR │ │
│ ┌──────────────┐ │ 3. Download data from │ │
│ │ Hyperparams │─────────────▶│ S3 to /opt/ml/ │ │
│ │ (max_depth, │ │ 4. Run training │ │
│ │ eta, etc.) │ │ 5. Upload model.tar.gz │ │
│ └──────────────┘ │ back to S3 │ │
│ ┌──────────────┐ │ 6. Terminate instance │ │
│ │ Instance type │ │ (you stop paying) │ │
│ │ (ml.m5.xl) │ └────────────────────────┘ │
│ └──────────────┘ │
│ │
│ CONTAINER DIRECTORY: │
│ /opt/ml/ │
│ ├── input/ │
│ │ ├── config/hyperparameters.json ← your settings │
│ │ └── data/ │
│ │ ├── train/train.csv ← training data │
│ │ └── validation/val.csv ← validation data │
│ ├── model/ ← model saved here │
│ └── output/failure ← errors go here │
└─────────────────────────────────────────────────────────────┘
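A minimal sketch of how a custom training script might read the layout above. The helper name `load_hyperparameters` and its `base_dir` argument are illustrative and not part of any SageMaker SDK; the demo uses a throwaway directory instead of a real container. The one SageMaker-specific detail it relies on is that values in hyperparameters.json arrive as strings, so the script has to cast them.

```python
import json
import tempfile
from pathlib import Path


def load_hyperparameters(base_dir="/opt/ml"):
    """Read hyperparameters.json from the container layout shown above.

    `base_dir` is a hypothetical parameter, added so the function can be
    exercised outside a real training container.
    """
    config_path = Path(base_dir) / "input" / "config" / "hyperparameters.json"
    raw = json.loads(config_path.read_text())
    # SageMaker serializes every hyperparameter value as a string,
    # so numeric settings must be cast before use.
    # Fallbacks below are the XGBoost library defaults.
    return {
        "max_depth": int(raw.get("max_depth", "6")),
        "eta": float(raw.get("eta", "0.3")),
        "objective": raw.get("objective", "reg:squarederror"),
    }


# Demo against a temporary directory that mimics /opt/ml/input/config.
with tempfile.TemporaryDirectory() as tmp:
    config_dir = Path(tmp) / "input" / "config"
    config_dir.mkdir(parents=True)
    (config_dir / "hyperparameters.json").write_text(
        '{"max_depth": "5", "eta": "0.2"}'
    )
    demo_hp = load_hyperparameters(tmp)

print(demo_hp)
```

Inside a real job the default `base_dir` would apply, and the trained model would be written under /opt/ml/model/ so SageMaker can package it.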
Why This Matters for the Exam
Questions about training jobs test:
- Instance type selection — CPU (ml.m5) for XGBoost, GPU (ml.p3) for deep learning
- Input modes — File vs Pipe vs FastFile
- Spot training — up to 90% savings with checkpointing
- Container paths — where data and models live inside the container
- Hyperparameters — what each one does, especially scale_pos_weight for imbalanced data

What’s ACTUALLY Happening When You Call .fit()?
A training job isn’t a function call — it’s an entire infrastructure lifecycle:
1. PROVISIONING (1-5 minutes)
AWS spins up EC2 instances of the type you requested (ml.m5.xlarge).
These are real servers in an AWS data center — with CPU, memory, disk.
Billing starts once the job is running on these instances, not when the training loop in step 4 begins.
2. CONTAINER PULL (30-60 seconds)
The XGBoost Docker container image is pulled from Amazon ECR
(Elastic Container Registry). This image contains the XGBoost
algorithm code, Python runtime, and all dependencies.
→ This is why BYOC (Bring Your Own Container) pushes to ECR.
3. DATA DOWNLOAD (seconds to hours, depends on dataset size)
File Mode: S3 → download ENTIRE dataset to instance EBS disk
Fast reads but slow start. Needs enough disk.
Pipe Mode: S3 → stream as Linux FIFO pipe. No disk needed.
Usually paired with RecordIO format (support depends on the algorithm). Sequential only.
FastFile Mode: S3 → POSIX mount. Loads pages on demand.
Any format, random access, no full download.
4. TRAINING (minutes to days)
The container runs your algorithm. Metrics stream to CloudWatch
in real-time. If Debugger is enabled, tensor values are captured.
Checkpoints (if configured) are saved to S3 periodically.
5. MODEL UPLOAD (seconds)
The trained model artifact (/opt/ml/model/) is tar-gzipped
and uploaded to your specified S3 output path.
6. CLEANUP
EC2 instances terminated. EBS volumes deleted. You stop paying.
Total cost = billed instance time for the whole job lifecycle, not just step 4!
Why this matters: “Training takes 30 min but costs more than expected” → startup and data download may add 10+ minutes. Use Warm Pools to skip provisioning on repeated jobs (you pay for the instances while they are kept alive). Use Pipe/FastFile mode to skip the up-front data download.
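The arithmetic behind that scenario can be sketched in a few lines. Everything numeric here is invented for illustration: the phase durations, the hourly rate, and the function name `estimate_training_cost` are assumptions, not AWS figures. Look up the real price on the SageMaker pricing page.

```python
def estimate_training_cost(phase_seconds, hourly_rate_usd, instance_count=1):
    """Estimate job cost from per-phase durations.

    phase_seconds: dict of phase name -> seconds spent in that phase.
    hourly_rate_usd: price per instance-hour (hypothetical value below).
    """
    billable_seconds = sum(phase_seconds.values())
    cost = billable_seconds / 3600 * hourly_rate_usd * instance_count
    return billable_seconds, round(cost, 4)


# Hypothetical job: 30 minutes of actual training plus overhead phases.
phases = {
    "container_pull": 60,
    "data_download": 600,
    "training": 1800,
    "model_upload": 30,
}
HYPOTHETICAL_RATE = 0.25  # USD per instance-hour, illustrative only

total_seconds, total_cost = estimate_training_cost(phases, HYPOTHETICAL_RATE)
training_only_cost = round(phases["training"] / 3600 * HYPOTHETICAL_RATE, 4)

print(f"billed seconds: {total_seconds}")
print(f"whole job: ${total_cost}  vs  training loop only: ${training_only_cost}")
```

With these made-up numbers the overhead phases add roughly 40% to the bill, which is the gap the exam scenario is pointing at.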
Step 1: Setup
import sagemaker
import boto3
from sagemaker import get_execution_role
from sagemaker.inputs import TrainingInput
role = get_execution_role()
session = sagemaker.Session()
region = session.boto_region_name
bucket = session.default_bucket()
prefix = "xgboost-lab"
Step 2: Upload Training Data
SageMaker expects your data in S3. For XGBoost with CSV input, the first column must be the label, the remaining columns are features, and the file has no header row.
train_path = session.upload_data("train.csv", bucket=bucket, key_prefix=f"{prefix}/train")
val_path = session.upload_data("validation.csv", bucket=bucket, key_prefix=f"{prefix}/validation")
Step 3: Configure the Estimator
This is where you make the key decisions the exam tests.
from sagemaker.estimator import Estimator
# Get the official XGBoost container image
image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost", region=region, version="1.5-1", py_version="py3",
)
xgb = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",  # CPU: usually the cost-effective choice for XGBoost
    output_path=f"s3://{bucket}/{prefix}/output",
    sagemaker_session=session,
    # COST SAVING: Use Spot Instances
    use_spot_instances=True,  # up to 90% cheaper
    max_run=3600,  # max training time: 1 hour
    max_wait=7200,  # max total time incl. waiting for spot: 2 hours (must be >= max_run)
    checkpoint_s3_uri=f"s3://{bucket}/{prefix}/checkpoints",  # lets an interrupted spot job resume
)
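Why ml.m5.xlarge and not a GPU? The SageMaker XGBoost algorithm is memory-bound rather than compute-bound, so a general-purpose CPU instance is usually the cost-effective choice. Recent container versions do support GPU training (with tree_method set to gpu_hist), and it can pay off on very large datasets, but for a typical tabular job ml.p3 mostly wastes money. This is a common exam trap.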
Spot training needs all three: use_spot_instances=True, max_wait >= max_run, and checkpoint_s3_uri. The API rejects a max_wait shorter than max_run. Checkpointing is not enforced, but without it an interrupted job restarts from scratch and loses its progress.
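The checklist above can be expressed as a small pre-flight check. This is a sketch of the lab's rules, not an SDK feature: `check_spot_config` and `spot_savings_percent` are hypothetical helper names, and the S3 URI is a placeholder. The check treats max_wait equal to max_run as acceptable. The savings formula is the one SageMaker reports for managed spot training: (1 - billable seconds / training seconds) * 100.

```python
def check_spot_config(use_spot_instances, max_run, max_wait=None,
                      checkpoint_s3_uri=None):
    """Return a list of problems with a managed spot training configuration."""
    problems = []
    if not use_spot_instances:
        return problems  # on-demand job: nothing spot-specific to check
    if max_wait is None:
        problems.append("max_wait must be set when use_spot_instances=True")
    elif max_wait < max_run:
        problems.append("max_wait must be at least max_run")
    if not checkpoint_s3_uri:
        problems.append(
            "no checkpoint_s3_uri: an interruption restarts training from scratch"
        )
    return problems


def spot_savings_percent(training_seconds, billable_seconds):
    """Savings as a percentage: (1 - billable / training) * 100."""
    return (1 - billable_seconds / training_seconds) * 100


ok = check_spot_config(
    True, max_run=3600, max_wait=7200,
    checkpoint_s3_uri="s3://example-bucket/checkpoints",  # placeholder URI
)
bad = check_spot_config(True, max_run=3600, max_wait=1800)
savings = spot_savings_percent(training_seconds=3600, billable_seconds=1080)

print(ok)
print(bad)
print(round(savings, 1))
```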
Step 4: Set Hyperparameters
Each parameter controls a different aspect of how XGBoost learns. Understanding these is heavily tested.
xgb.set_hyperparameters(
    objective="binary:logistic",  # binary classification (outputs probability)
    num_round=100,  # 100 boosting rounds (trees built sequentially)
    max_depth=5,  # each tree can be 5 levels deep
    #   higher → more complex, risk overfitting
    #   lower → simpler, may underfit
    eta=0.2,  # learning rate (shrinks each tree's contribution)
    #   higher → faster training, risk overfitting
    #   lower → needs more rounds, better generalization
    gamma=4,  # minimum loss reduction to make a split
    #   higher → fewer splits, simpler trees (regularization)
    min_child_weight=6,  # minimum sum of instance weights (hessian) required in a child node
    #   higher → more conservative (regularization)
    subsample=0.7,  # use 70% of data for each tree (regularization)
    colsample_bytree=0.7,  # use 70% of features for each tree (regularization)
    eval_metric="auc",  # evaluate using AUC-ROC
    scale_pos_weight=10,  # for IMBALANCED data: ratio of neg/pos samples
    #   if 1% fraud → scale_pos_weight ≈ 99
)
Exam scenario: “Fraud detection model, 1% positive class, high accuracy but misses most fraud” → Set scale_pos_weight to the neg/pos ratio AND use AUC/F1 instead of accuracy.
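To see why accuracy misleads in that scenario, here is a small sketch on synthetic labels. The data and the helper names `scale_pos_weight_from_labels` and `accuracy_and_recall` are made up for illustration; only the neg/pos ratio rule comes from the lab.

```python
def scale_pos_weight_from_labels(labels):
    """Return count(negative) / count(positive) for 0/1 labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive examples in labels")
    return negatives / positives


def accuracy_and_recall(y_true, y_pred):
    """Accuracy over all rows, and recall on the positive class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(y_true)
    return correct / len(y_true), true_pos / actual_pos


# Synthetic labels: 1% fraud (10 positives in 1,000 rows).
labels = [1] * 10 + [0] * 990
weight = scale_pos_weight_from_labels(labels)

# A useless model that always predicts "not fraud".
always_negative = [0] * len(labels)
accuracy, recall = accuracy_and_recall(labels, always_negative)

print(f"scale_pos_weight = {weight}")
print(f"accuracy = {accuracy:.2f}, recall on fraud = {recall:.2f}")
```

The always-negative model scores 99% accuracy while catching no fraud at all, which is why the exam answer pairs scale_pos_weight with a metric such as AUC or F1.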
Step 5: Train
xgb.fit(
    inputs={
        "train": TrainingInput(s3_data=train_path, content_type="text/csv"),
        "validation": TrainingInput(s3_data=val_path, content_type="text/csv"),
    },
    wait=True,
    logs="All",
)
print(f"Model artifact: {xgb.model_data}")
# → s3://bucket/xgboost-lab/output/xgboost-2026-04-27-.../output/model.tar.gz
Input Modes — How Data Gets to the Container
┌──────────────┐ Simplest. Downloads entire dataset to disk before
│ FILE MODE │ training starts. Needs enough EBS storage.
│ (default) │ Works with any format (CSV, Parquet, images).
└──────────────┘
┌──────────────┐ Fastest. Streams data from S3 as a pipe — no disk
│ PIPE MODE │ needed. BUT usually paired with RecordIO-Protobuf and
│ │ limited to sequential reading (no random access).
└──────────────┘
┌──────────────┐ Best of both. POSIX file mount that loads pages
│ FASTFILE MODE│ from S3 on demand. Any format, random access,
│ (modern) │ no full download. Opt in with input_mode="FastFile".
└──────────────┘
Exam decision:
"Training fails — out of disk space" → Switch to Pipe or FastFile mode
"Fastest possible data ingestion" → Pipe mode + RecordIO
"Works with any format, no disk issue" → FastFile mode
Lab 2: Hyperparameter Tuning (HPO)
What Is This Lab About?
You’ve trained an XGBoost model, but how do you know max_depth=5 and eta=0.2 are the best values? You don’t — you guessed. Hyperparameter Tuning systematically searches the parameter space to find the combination that maximizes your metric.
What You’ll Build
┌─────────────────────────────────────────────────────────┐
│ SageMaker Hyperparameter Tuning │
│ │
│ You define: │
│ • Ranges: max_depth=[3-10], eta=[0.01-0.3], etc. │
│ • Objective: maximize validation:auc │
│ • Budget: 20 trials, 4 parallel │
│ • Strategy: Bayesian optimization │
│ │
│ SageMaker runs 20 training jobs: │
│ │
│ Trial 1: depth=7, eta=0.15 → AUC=0.82 │
│ Trial 2: depth=4, eta=0.28 → AUC=0.79 │
│ Trial 3: depth=5, eta=0.08 → AUC=0.86 ← Bayesian │
│ Trial 4: depth=6, eta=0.10 → AUC=0.88 learns │
│ ... from each │
│ Trial 20: depth=5, eta=0.09 → AUC=0.91 trial │
│ │
│ Result: Best hyperparameters found! │
│ Deploy best model directly. │
└─────────────────────────────────────────────────────────┘
Tuning Strategies Compared
| Strategy | How It Works | Notes |
|---|---|---|
| Random search | Picks hyperparameter values at random | Good for initial exploration |
| Bayesian (default) | Each trial informs the next | Most efficient for <100 trials |
| Hyperband | Starts many cheap trials, kills bad ones early, keeps promising ones until the best survives | Allocates resources to winners |
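The difference between sampling at random and pruning early can be sketched with a toy objective. This is not SageMaker's implementation: `toy_validation_auc` is a made-up stand-in for a training job, and `successive_halving` shows only the core subroutine that Hyperband builds on (real Hyperband runs several such brackets).

```python
import random


def toy_validation_auc(config, budget):
    """Made-up stand-in for a training job: returns a fake AUC.

    The score peaks near max_depth=5, eta=0.1 and improves with budget
    (think: number of boosting rounds). It is not a real model.
    """
    depth_penalty = 0.01 * abs(config["max_depth"] - 5)
    eta_penalty = 0.5 * abs(config["eta"] - 0.1)
    budget_bonus = 0.1 * (1 - 1 / (1 + budget / 50))
    return 0.8 - depth_penalty - eta_penalty + budget_bonus


def random_configs(n, rng):
    """Random search: sample each hyperparameter independently."""
    return [
        {"max_depth": rng.randint(3, 10), "eta": rng.uniform(0.01, 0.3)}
        for _ in range(n)
    ]


def successive_halving(configs, min_budget=25):
    """Train everything briefly, keep the top half, double the budget,
    and repeat until one configuration remains."""
    budget = min_budget
    survivors = list(configs)
    history = []  # (budget used, survivors kept) per round
    while len(survivors) > 1:
        scored = sorted(
            survivors,
            key=lambda c: toy_validation_auc(c, budget),
            reverse=True,
        )
        survivors = scored[: max(1, len(scored) // 2)]
        history.append((budget, len(survivors)))
        budget *= 2
    return survivors[0], history


rng = random.Random(0)
candidates = random_configs(8, rng)
best, history = successive_halving(candidates)

print("rounds (budget, survivors):", history)
print("winner:", best)
```

Most of the budget goes to the few configurations that survive, which is the cost advantage the table attributes to Hyperband.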
Step 1: Define Ranges
from sagemaker.tuner import IntegerParameter, ContinuousParameter, HyperparameterTuner
hyperparameter_ranges = {
    "max_depth": IntegerParameter(3, 10),
    "eta": ContinuousParameter(0.01, 0.3),
    "min_child_weight": IntegerParameter(1, 10),
    "subsample": ContinuousParameter(0.5, 1.0),
    "colsample_bytree": ContinuousParameter(0.5, 1.0),
    "num_round": IntegerParameter(50, 300),
}
Step 2: Create and Run Tuner
tuner = HyperparameterTuner(
    estimator=xgb,  # reuse your estimator from Lab 1
    objective_metric_name="validation:auc",  # what to optimize
    objective_type="Maximize",  # higher AUC = better
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=20,  # total trials
    max_parallel_jobs=4,  # run 4 at a time
    strategy="Bayesian",  # most efficient strategy
    early_stopping_type="Auto",  # stop bad trials early (saves cost)
)
tuner.fit(
    inputs={
        "train": TrainingInput(s3_data=train_path, content_type="text/csv"),
        "validation": TrainingInput(s3_data=val_path, content_type="text/csv"),
    },
    wait=False,  # don't block; tuning takes a while
)
Step 3: Analyze Results
# Wait for completion
tuner.wait()
# View leaderboard
analytics = sagemaker.HyperparameterTuningJobAnalytics(tuner.latest_tuning_job.job_name)
results = analytics.dataframe().sort_values("FinalObjectiveValue", ascending=False)
print(results[["TrainingJobName", "FinalObjectiveValue",
"max_depth", "eta", "subsample"]].head(5))
Step 4: Deploy Best Model Directly
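from sagemaker.serializers import CSVSerializer
predictor = tuner.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    serializer=CSVSerializer(),  # send the payload as text/csv, which the XGBoost container expects
)
result = predictor.predict("25,50000,680,12,0.7,0.3,0.1")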
Warm Start — Resume Previous Tuning
If you ran 20 trials and want 20 more, don’t start from scratch. Warm start transfers the knowledge from previous trials.
from sagemaker.tuner import WarmStartConfig, WarmStartTypes
warm_config = WarmStartConfig(
    warm_start_type=WarmStartTypes.IDENTICAL_DATA_AND_ALGORITHM,  # same problem
    parents={tuner.latest_tuning_job.job_name},  # learn from this
)
tuner_v2 = HyperparameterTuner(
    estimator=xgb,
    objective_metric_name="validation:auc",
    objective_type="Maximize",
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=20,
    max_parallel_jobs=4,
    warm_start_config=warm_config,  # starts where v1 left off
)
| Warm Start Type | When to Use |
|---|---|
| IDENTICAL_DATA_AND_ALGORITHM | Same data, same algorithm: transfer ALL knowledge |
| TRANSFER_LEARNING | Similar but different problem: transfer PARTIAL knowledge |
Lab 3: SageMaker Clarify — Bias Detection & Explainability
What Is This Lab About?
Your loan approval model has 95% accuracy. But is it fair? Does it approve men at a higher rate than women? Does it rely too heavily on zip code (a proxy for race)?
SageMaker Clarify answers these questions:
- Pre-training bias: Is the dataset itself biased?
- Post-training bias: Are the model’s predictions biased?
- SHAP explainability: Why did the model make each specific prediction?
What You’ll Build
┌─────────────────────────────────────────────────────────────┐
│ SageMaker Clarify │
│ │
│ PRE-TRAINING (before training): │
│ ┌──────────────────────────────────────────────────┐ │
│ │ "Is the training DATA biased?" │ │
│ │ │ │
│ │ Dataset: 10,000 loan applications │ │
│ │ Check: gender column │ │
│ │ │ │
│ │ Finding: 70% male applicants, 30% female │ │
│ │ Finding: Approval rate male=80%, female=60% │ │
│ │ Metric: Class Imbalance = 0.4 (significant) │ │
│ │ Metric: DPL = 0.2 (20% approval gap) │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ POST-TRAINING (after training): │
│ ┌──────────────────────────────────────────────────┐ │
│ │ "Are the model's PREDICTIONS biased?" │ │
│ │ │ │
│ │ Model predicts: male approval = 85% │ │
│ │ Model predicts: female approval = 55% │ │
│ │ Metric: Disparate Impact = 0.65 (<0.8 = bias!) │ │
│ │ The model AMPLIFIED the data bias. │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ EXPLAINABILITY (per prediction): │
│ ┌──────────────────────────────────────────────────┐ │
│ │ "WHY was applicant #1234 denied?" │ │
│ │ │ │
│ │ Base prediction: 0.50 (50% approval chance) │ │
│ │ + Income: $120K → +0.20 │ │
│ │ + Credit score: 780 → +0.15 │ │
│ │ - Debt ratio: 0.8 → -0.35 ← main reason │ │
│ │ - Employment: 6 months → -0.10 │ │
│ │ = Final: 0.40 (denied) │ │
│ │ │ │
│ │ SHAP tells you: "Denied mainly due to high debt" │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Step 1: Run Pre-Training Bias Analysis
from sagemaker.clarify import (
    SageMakerClarifyProcessor, DataConfig, BiasConfig,
)
clarify = SageMakerClarifyProcessor(
    role=role, instance_count=1, instance_type="ml.c5.xlarge",
    sagemaker_session=session,
)
data_config = DataConfig(
    s3_data_input_path=f"s3://{bucket}/data/loan_applications.csv",
    s3_output_path=f"s3://{bucket}/clarify/pre-training-report",
    label="approved",  # target column
    headers=["age", "gender", "income", "credit_score", "debt_ratio", "approved"],
    dataset_type="text/csv",
)
bias_config = BiasConfig(
    label_values_or_threshold=[1],  # what counts as "positive outcome" (approved)
    facet_name="gender",  # sensitive attribute to check
    # facet_values_or_threshold names the SENSITIVE group, which Clarify compares
    # against everyone else. Assuming 0=male in this encoding, 1 is the female group.
    facet_values_or_threshold=[1],
)
clarify.run_pre_training_bias(
    data_config=data_config,
    data_bias_config=bias_config,
    wait=True,
)
Output report includes these metrics:
| Metric | What It Measures | Flag If |
|---|---|---|
| Class Imbalance (CI) | Difference in group sizes | >0.3 |
| DPL | Difference in approval rates between groups | >0.1 |
| KL Divergence | How far the outcome (label) distributions of the two groups diverge | >0.1 |
| Jensen-Shannon | Symmetric, bounded version of KL (0 to ln 2 ≈ 0.69 with natural log) | >0.1 |
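The numbers in the Clarify diagram can be reproduced by hand. The sketch below uses the standard definitions (CI = (n_a - n_d) / (n_a + n_d), DPL = q_a - q_d, DI = q'_d / q'_a), with male as the advantaged group a and female as the disadvantaged group d. The function names are illustrative, and the 7,000 / 3,000 split is just the diagram's 70% / 30% applied to 10,000 applications.

```python
import math


def class_imbalance(n_advantaged, n_disadvantaged):
    """CI = (n_a - n_d) / (n_a + n_d); ranges from -1 to 1."""
    return (n_advantaged - n_disadvantaged) / (n_advantaged + n_disadvantaged)


def dpl(rate_advantaged, rate_disadvantaged):
    """Difference in proportions of positive labels: q_a - q_d."""
    return rate_advantaged - rate_disadvantaged


def disparate_impact(pred_rate_advantaged, pred_rate_disadvantaged):
    """DI = q'_d / q'_a on predicted labels; below 0.8 fails the four-fifths rule."""
    return pred_rate_disadvantaged / pred_rate_advantaged


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions, natural log."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by ln(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Pre-training numbers from the diagram.
ci = class_imbalance(7000, 3000)
gap = dpl(0.80, 0.60)

# Post-training numbers from the diagram.
di = disparate_impact(0.85, 0.55)

# Label distributions [approved, denied] for each group.
male_labels, female_labels = [0.80, 0.20], [0.60, 0.40]
kl = kl_divergence(male_labels, female_labels)
js = js_divergence(male_labels, female_labels)

print(f"CI={ci:.2f} DPL={gap:.2f} DI={di:.2f} KL={kl:.3f} JS={js:.3f}")
```

CI, DPL, and DI come out at the values shown in the diagram (0.4, 0.2, and about 0.65).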
Step 2: Run Post-Training Bias + SHAP
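After training your model, check whether its predictions are fair AND get explanations. The code below runs the SHAP explainability job; post-training bias is a separate call, clarify.run_post_training_bias, which takes the same data_config and bias_config plus the model_config defined here.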
from sagemaker.clarify import ModelConfig, SHAPConfig
model_config = ModelConfig(
    model_name="loan-approval-model",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    content_type="text/csv",
)
# SHAP needs a "baseline", typically the dataset median
# It measures how each feature SHIFTS the prediction away from this baseline
shap_config = SHAPConfig(
    baseline=[[35, 0, 60000, 700, 0.4]],  # median values for each feature
    num_samples=500,  # more = more accurate, slower
    agg_method="mean_abs",  # global importance aggregation
    save_local_shap_values=True,  # save per-prediction explanations
)
clarify.run_explainability(
    data_config=data_config,
    model_config=model_config,
    explainability_config=shap_config,
    wait=True,
)
How to Read SHAP Output
Global feature importance (mean |SHAP|):
credit_score: 0.28 ████████████████████████████
income: 0.22 ██████████████████████
debt_ratio: 0.19 ███████████████████
age: 0.08 ████████
gender: 0.03 ███
→ Credit score matters most. Gender matters least, which is a good sign, but it does not rule out bias: proxy features such as zip code can still carry it.
Local explanation (applicant #1234):
Base: 0.50
credit_score: +0.15 (780 is high → helps approval)
income: +0.20 ($120K is high → helps)
debt_ratio: -0.35 (0.8 is very high → hurts approval most)
employment: -0.10 (6 months is short → hurts)
age: +0.02
gender: -0.02
Final: 0.40 → Denied
→ "Denied primarily because of high debt-to-income ratio"
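SHAP values are additive: the base value plus every feature's contribution equals the model output. The sketch below checks that with the contributions from the applicant #1234 diagram earlier in this lab. The second applicant and the helper `global_importance` are hypothetical, added only to show how the mean |SHAP| aggregation works.

```python
import math

# Contributions for applicant #1234, copied from the explainability diagram.
base_value = 0.50
contributions = {
    "income": 0.20,
    "credit_score": 0.15,
    "debt_ratio": -0.35,
    "employment": -0.10,
}

# Additivity: base value plus all SHAP values equals the model output.
prediction = base_value + sum(contributions.values())

# The feature with the most negative SHAP value is the main reason for denial.
main_reason = min(contributions, key=contributions.get)


def global_importance(rows):
    """Mean absolute SHAP value per feature across many explanations."""
    features = rows[0].keys()
    return {f: sum(abs(r[f]) for r in rows) / len(rows) for f in features}


# A second, invented applicant so there is something to average over.
other_applicant = {
    "income": -0.05,
    "credit_score": 0.25,
    "debt_ratio": 0.10,
    "employment": 0.02,
}
importance = global_importance([contributions, other_applicant])

print(f"prediction = {prediction:.2f}, main reason = {main_reason}")
print({name: round(value, 3) for name, value in importance.items()})
```

If a local explanation does not add up to the final score, a feature is missing from the listing or the numbers were transcribed incorrectly.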
Lab 4: Amazon Bedrock — Foundation Models & RAG

What Is This Lab About?
Not every ML problem needs custom training. If you need a chatbot, a document Q&A system, or content generation, Amazon Bedrock provides access to foundation models (Claude, Llama, Titan, etc.) through a simple API — no infrastructure, no training, no GPUs.
This lab covers the three most exam-relevant Bedrock patterns: text generation, RAG (retrieval-augmented generation), and guardrails.
What You’ll Build
PATTERN 1: Direct Inference
┌──────────┐ ┌─────────────┐ ┌───────────────┐
│ Your │────▶│ Bedrock │────▶│ Claude/Llama │────▶ Answer
│ Prompt │ │ Runtime │ │ Foundation │
│ │ │ API │ │ Model │
└──────────┘ └─────────────┘ └───────────────┘
PATTERN 2: RAG (Retrieval-Augmented Generation)
┌──────────┐ ┌─────────────────────────────────────────────────┐
│ "What is │ │ Bedrock Knowledge Base │
│ our │────▶│ │
│ refund │ │ 1. Embed question → vector │
│ policy?" │ │ 2. Search vector store for similar chunks │
│ │ │ 3. Retrieve top 5 relevant paragraphs │
│ │ │ 4. Inject into prompt: "Based on these docs..." │
│ │ │ 5. FM generates grounded answer │
│ │ │ 6. Return answer + source citations │
└──────────┘ └─────────────────────────────────────────────────┘
↕ ↕
┌──────────┐ ┌──────────────────┐
│ S3 │ │ OpenSearch │
│ (your │─────▶│ Serverless │
│ docs) │ │ (vector store) │
└──────────┘ └──────────────────┘
PATTERN 3: Guardrails
┌──────────┐ ┌─────────────┐ ┌──────────┐ ┌─────────────┐
│ User │────▶│ INPUT │────▶│ FM │────▶│ OUTPUT │────▶ Safe
│ message │ │ GUARDRAIL │ │ (Claude) │ │ GUARDRAIL │ response
│ │ │ Block hate │ │ │ │ Redact PII │
│ │ │ Block PII │ │ │ │ Block harm │
└──────────┘ └─────────────┘ └──────────┘ └─────────────┘
Step 1: Direct Text Generation
import boto3
import json
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    contentType="application/json",
    accept="application/json",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "temperature": 0.7,  # 0=deterministic, 1=creative
        "messages": [
            {"role": "user", "content": "Explain gradient boosting in 3 sentences."}
        ],
    }),
)
result = json.loads(response["body"].read())
print(result["content"][0]["text"])
Temperature controls randomness:
| Temperature | Behavior | Use Case |
|---|---|---|
| 0.0 | Most consistent answers (near-deterministic) | Factual Q&A, classification |
| 0.3-0.5 | Mostly consistent, slight variation | Summarization, analysis |
| 0.7-1.0 | Creative, diverse outputs | Content generation, brainstorming |
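Why temperature has this effect can be sketched with a softmax over token scores. This shows how temperature is commonly applied in text samplers; it is an illustration, not a description of what any particular Bedrock model does internally, and the three scores are invented.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw token scores into probabilities at a given temperature."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all mass on the top token.
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
greedy = softmax_with_temperature(logits, 0.0)
focused = softmax_with_temperature(logits, 0.3)
diverse = softmax_with_temperature(logits, 1.0)

print("T=0.0:", greedy)
print("T=0.3:", [round(p, 3) for p in focused])
print("T=1.0:", [round(p, 3) for p in diverse])
```

Lower temperatures pile probability onto the top-scoring token, so repeated calls agree more often; higher temperatures spread it out, so outputs vary more.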
Step 2: Generate Embeddings (For RAG)
Before you can search documents semantically, you need to convert text into vectors (embeddings). Similar meanings produce similar vectors.
response = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    contentType="application/json",
    accept="application/json",
    body=json.dumps({
        "inputText": "What is the company refund policy?",
        "dimensions": 256,
    }),
)
result = json.loads(response["body"].read())
embedding = result["embedding"]
print(f"Vector with {len(embedding)} dimensions")
# This vector can be stored in OpenSearch / pgvector / Pinecone
# and searched by similarity with other vectors
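"Searched by similarity" usually means cosine similarity between the query vector and each stored chunk vector. A minimal sketch follows, using tiny hand-made 3-dimensional vectors in place of real embeddings; the chunk texts, the vectors, and the helper `top_k` are all invented for illustration. A vector store does the same ranking at scale with approximate indexes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts most similar to the query vector."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hand-made "embeddings" (real ones have hundreds of dimensions).
chunks = [
    ("Refunds are issued within 30 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on holidays.", [0.0, 0.2, 0.9]),
    ("License refunds require a receipt.", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # stands in for the embedded refund question

results = top_k(query, chunks, k=2)
print(results)
```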
Step 3: Query a Knowledge Base (RAG)
After setting up a Knowledge Base (S3 docs → chunked → embedded → indexed in OpenSearch), you can query it. The Knowledge Base automatically retrieves relevant chunks and augments the prompt.
bedrock_agent = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
response = bedrock_agent.retrieve_and_generate(
    input={"text": "What is our refund policy for software licenses?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KB_ID",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)
# The answer is grounded in YOUR documents
print("Answer:", response["output"]["text"])
# See which documents were used (citations)
for citation in response.get("citations", []):
    for ref in citation.get("retrievedReferences", []):
        print(f" Source: {ref['location']['s3Location']['uri']}")
        print(f" Chunk: {ref['content']['text'][:100]}...")
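In an application you would usually collect the cited sources rather than print them, for example to show a deduplicated source list or to flag answers with no citations. The sketch below does that for a hand-written dictionary shaped like the fields the loop above reads; `extract_sources`, the sample response, and the example URIs are hypothetical.

```python
def extract_sources(response):
    """Collect unique S3 URIs cited in a retrieve_and_generate-style response."""
    seen = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in seen:
                seen.append(uri)
    return seen


# Hand-written sample, not real API output.
sample_response = {
    "output": {"text": "Licenses can be refunded within 30 days."},
    "citations": [
        {"retrievedReferences": [
            {"content": {"text": "Refund window is 30 days..."},
             "location": {"s3Location": {"uri": "s3://example-docs/refunds.pdf"}}},
            {"content": {"text": "Receipts are required..."},
             "location": {"s3Location": {"uri": "s3://example-docs/refunds.pdf"}}},
        ]},
        {"retrievedReferences": []},
    ],
}

sources = extract_sources(sample_response)
is_grounded = bool(sources)  # False would mean the answer cited no documents
print(sources, is_grounded)
```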
When RAG vs Fine-Tuning?
"Model needs to KNOW new facts"
→ Has my company's specific policies/data?
YES → RAG (retrieve docs at query time)
"Model needs to BEHAVE differently"
→ Should it respond in legal language? Medical format?
YES → Fine-tuning (change model weights)
"Model generates wrong/harmful content"
→ Block topics, filter PII, enforce safety?
YES → Guardrails (applied at input/output)
| Approach | Changes | Data Needed | Cost | Update Speed |
|---|---|---|---|---|
| RAG | Nothing in model | Documents in S3 | $ (inference only) | Instant (update docs) |
| Fine-tuning | Model weights | Labeled examples (JSONL) | $$$ (training compute) | Hours (retrain) |
| Prompt engineering | Nothing | Zero examples | $ | Instant |
Domain 2 Lab Summary
| Lab | Service | You Learned |
|---|---|---|
| 1 | XGBoost Training | Instance types, hyperparameters, spot training, input modes, container paths |
| 2 | HPO | Bayesian vs Random vs Hyperband, ranges, warm start, early stopping |
| 3 | Clarify | Pre/post-training bias metrics, SHAP explainability, reading SHAP output |
| 4 | Bedrock | Text generation, embeddings, RAG with Knowledge Bases, RAG vs fine-tuning |