Lab: Domain 3 — Deployment & Orchestration Hands-On
Table of Contents
- Lab: Domain 3 — Deployment & Orchestration Hands-On
- Lab 1: The Four Endpoint Types — Side by Side
- Lab 2: A/B Testing with Production Variants
- Lab 3: Auto-Scaling — Handle Traffic Spikes
- Lab 4: SageMaker Pipelines — Complete MLOps Workflow
- What Is This Lab About?
- What You’ll Build
- What’s ACTUALLY Running Behind a SageMaker Pipeline?
- Step 1: Define Pipeline Parameters
- Step 2: Processing Step (Data Prep)
- Step 3: Training Step
- Step 4: Evaluation Step
- Step 5: Condition + Register + Fail
- Step 6: Create and Run the Pipeline
- Step Caching — Avoid Reprocessing
- Trigger Pipeline Automatically
- Domain 3 Lab Summary
These labs cover the critical “last mile” of ML — getting models into production and keeping them there reliably. This is 22% of the exam and heavily focuses on endpoint types, deployment strategies, SageMaker Pipelines, and auto-scaling.
Lab 1: The Four Endpoint Types — Side by Side
What Is This Lab About?
You’ve trained a model. Now what? SageMaker offers four fundamentally different ways to serve predictions. Choosing the wrong one is the most common mistake on the exam — and in real life.
This lab deploys the same model four different ways so you can see and feel the differences.
Decision Framework
"How should I serve this model?"
- Do you need real-time predictions? YES:
  - Consistent high traffic → REAL-TIME ENDPOINT
  - Large payload (>6 MB) or long processing → ASYNC INFERENCE
  - Sporadic traffic, cost-sensitive → SERVERLESS INFERENCE
- Do you need real-time predictions? NO:
  - Scheduled bulk scoring → BATCH TRANSFORM
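The decision tree can be written as a small function, which makes the order of the checks explicit. This is a sketch: the thresholds are the limits quoted in this lab, while the argument names, the "steady"/"sporadic" traffic labels, and the `needs_gpu` flag are hypothetical inputs added for illustration.

```python
# Sketch of the decision framework above. Thresholds are the limits quoted in
# this lab; argument names and traffic labels are hypothetical.
REALTIME_MAX_PAYLOAD_MB = 6      # real-time request payload limit
REALTIME_MAX_SECONDS = 60        # real-time response time limit
SERVERLESS_MAX_PAYLOAD_MB = 4    # serverless request payload limit


def choose_endpoint_type(needs_realtime: bool, payload_mb: float,
                         processing_seconds: float, traffic: str,
                         needs_gpu: bool = False) -> str:
    """Suggest a serving option. traffic is "steady" or "sporadic"."""
    if not needs_realtime:
        return "Batch Transform"            # scheduled bulk scoring
    if payload_mb > REALTIME_MAX_PAYLOAD_MB or processing_seconds > REALTIME_MAX_SECONDS:
        return "Async Inference"            # large payloads or long processing
    if traffic == "sporadic":
        if payload_mb <= SERVERLESS_MAX_PAYLOAD_MB and not needs_gpu:
            return "Serverless Inference"   # pay only when called, CPU only
        return "Async Inference"            # also scales to zero, allows GPU and 1 GB payloads
    return "Real-Time Endpoint"             # always on, lowest latency
```

The sporadic-but-GPU (or 4-6 MB payload) fallback to Async Inference is an extension of the tree, based on Serverless Inference being CPU-only with a 4 MB payload cap.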
The Four Types at a Glance
| | Real-Time | Serverless | Async | Batch |
|---|---|---|---|---|
| Lifecycle | Always running, <100 ms | Scales to 0, cold start 1-5 s | Queue + scales to 0 (with auto-scaling) | Job runs, then terminates |
| Max payload | 6 MB | 4 MB | 1 GB | No limit |
| Other limits | 60 s max response | CPU only | 60 min processing, GPU OK | |
| Min instances | 1 | 0 | 0 | n/a (job) |
| Cost model | $$$$ (24/7) | $ (per call) | $$ (per use) | $ (per job) |
| Example | Bank fraud detection | Internal tool, 5 req/day | Video/doc processing | Weekly reports |
What’s ACTUALLY Running Behind Each Endpoint Type?
These aren’t abstract services — they’re real infrastructure with real trade-offs:
Real-Time Endpoint — EC2 instances running Docker containers 24/7:
- When you call deploy(), SageMaker provisions EC2 instances of the type you chose (e.g., ml.m5.large).
- It pulls your model's inference Docker image from ECR and downloads model.tar.gz from S3, extracting it into /opt/ml/model/.
- SageMaker's managed front end accepts HTTPS requests and forwards them to your container's /invocations endpoint on port 8080 (many containers run nginx internally as a reverse proxy).
- SageMaker periodically calls your container's /ping endpoint as a health check; if the container stops responding, the instance is replaced.
- Because at least one instance is always running (it can't scale to 0), you pay by the instance-hour, 24/7, whether or not requests arrive.
- When you run multiple instances, an internal load balancer distributes requests across them.
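The container side of this contract is small: answer GET /ping with HTTP 200 when healthy, and answer POST /invocations with a prediction, both on port 8080. The sketch below shows that shape with Python's standard-library `http.server`; the averaging "model" is a placeholder, and production containers typically use a more robust server stack.

```python
# Minimal sketch of the hosting contract: GET /ping and POST /invocations on port 8080.
# The averaging "model" is a placeholder, not a real fraud model.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(csv_row: str) -> float:
    """Placeholder model: the mean of the numeric features."""
    values = [float(v) for v in csv_row.strip().split(",") if v]
    return sum(values) / len(values) if values else 0.0


def handle(method: str, path: str, body: bytes) -> tuple:
    """Route one request the way the hosting contract expects; returns (status, body)."""
    if method == "GET" and path == "/ping":
        return 200, b""                      # healthy: keep routing traffic here
    if method == "POST" and path == "/invocations":
        try:
            score = predict(body.decode("utf-8"))
        except ValueError:                   # bad number or undecodable bytes
            return 400, b'{"error": "expected one CSV row of numbers"}'
        return 200, json.dumps({"score": score}).encode("utf-8")
    return 404, b""


class InferenceHandler(BaseHTTPRequestHandler):
    def _reply(self, status: int, payload: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self) -> None:
        self._reply(*handle("GET", self.path, b""))

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        self._reply(*handle("POST", self.path, self.rfile.read(length)))


def serve(port: int = 8080) -> None:
    """Blocking loop; call this as the container's entry point."""
    HTTPServer(("0.0.0.0", port), InferenceHandler).serve_forever()
```

Keeping the routing in a plain `handle()` function means the contract can be checked without starting a server.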
Serverless Inference — AWS manages containers that spin up/down:
- No EC2 instances are pre-provisioned; that's the whole point.
- When a request arrives and no warm container is available, SageMaker provisions a lightweight compute environment, pulls your container, and loads the model. This is the cold start (typically 1-5 s).
- Subsequent requests reuse the warm container (fast, like real-time).
- After a period of inactivity (~5-15 min), the container is recycled, so the next request pays the cold start again. You are billed only for the compute time spent processing requests, so idle periods cost nothing.
- The memory you configure (1024-6144 MB, in 1 GB steps) determines the compute power; there's no explicit CPU selection.
- There is no GPU option: Serverless Inference runs on CPU only.
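A toy model makes the warm/cold behavior concrete. It assumes a single container that handles requests one at a time and a hypothetical `idle_timeout_s` after which the container is recycled (the real recycling policy is managed by AWS). It also checks the memory setting against the six sizes Serverless Inference accepts.

```python
# Toy model of serverless warm/cold behavior. Assumptions: one container that
# handles requests one at a time, and a hypothetical idle timeout after which
# AWS recycles it (the real recycling policy is managed by AWS).
VALID_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def check_memory(memory_mb: int) -> int:
    """Reject memory sizes that ServerlessInferenceConfig does not accept."""
    if memory_mb not in VALID_MEMORY_MB:
        raise ValueError(f"memory_size_in_mb must be one of {VALID_MEMORY_MB}")
    return memory_mb


def count_cold_starts(arrival_times_s, idle_timeout_s: float) -> int:
    """Count requests that find no warm container (first call or after a long gap)."""
    cold_starts = 0
    previous = None
    for t in sorted(arrival_times_s):
        if previous is None or t - previous > idle_timeout_s:
            cold_starts += 1  # container was never started or has been recycled
        previous = t
    return cold_starts
```

For the "5 requests per day" internal tool, nearly every request would be a cold start under this model; that is the trade-off accepted in exchange for zero idle cost.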
Async Inference: managed internal queue + auto-scaling compute:
- Behind the scenes, SageMaker maintains an internal queue for your endpoint; you don't create or manage it yourself.
- You upload the payload to S3 first, then call invoke_endpoint_async() with its S3 location. The request is placed in the queue and the API returns immediately.
- Instances take requests from the queue, process them, and write results to S3.
- The CloudWatch metric ApproximateBacklogSizePerInstance tells auto-scaling how full the queue is.
- If your scaling policy sets MinCapacity to 0, instances scale to 0 when the queue is empty and you stop paying for compute.
- This is why async supports large payloads (up to 1 GB): the data goes to S3 first, not through the API call.
Batch Transform: temporary Transform Job, no persistent endpoint:
- Batch Transform creates a temporary cluster of instances, runs inference on your entire S3 dataset, writes results to S3, then terminates.
- There is no endpoint URL; nobody calls it in real time.
- It distributes the input S3 objects (files) across instances, so a single large file is processed by only one instance. With split_type set, each file is split into records that are sent to the model in mini-batches.
- join_source="Input" appends each prediction to its input record, so you can match predictions back to input rows without relying on output order.
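Two of these behaviors can be modeled in plain Python: grouping CSV lines into mini-batches under a payload cap (what `strategy="MultiRecord"` with `max_payload` controls) and appending each prediction to its input row (what `join_source="Input"` does for CSV). The byte accounting below is a simplification; the service's exact packing may differ.

```python
# Toy illustration of MultiRecord mini-batching and join_source="Input" for CSV.
# Simplified byte accounting; the real service's packing may differ.


def make_minibatches(lines: list, max_payload_bytes: int) -> list:
    """Greedily pack lines into batches whose size stays within the cap."""
    batches, current, size = [], [], 0
    for line in lines:
        n = len(line.encode("utf-8")) + 1   # +1 for the newline separator
        if n > max_payload_bytes:
            raise ValueError("a single record exceeds max_payload")
        if current and size + n > max_payload_bytes:
            batches.append(current)          # close the full batch
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        batches.append(current)
    return batches


def join_with_input(input_lines: list, predictions: list) -> list:
    """Append each prediction to the end of its input row."""
    if len(input_lines) != len(predictions):
        raise ValueError("one prediction per input row is required")
    return [f"{row},{pred}" for row, pred in zip(input_lines, predictions)]
```

Without the join, the output file holds only predictions, and matching them to customers depends on preserving line order.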
Deploy 1: Real-Time Endpoint
Scenario: A bank needs sub-100ms fraud scoring on every credit card transaction, 24/7, thousands of requests per minute.
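from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer
# xgb: a trained XGBoost Estimator (assumed to exist from an earlier lab)
predictor = xgb.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="fraud-realtime",
    serializer=CSVSerializer(),      # send the feature row as text/csv
    deserializer=CSVDeserializer(),  # parse the text/csv response into rows
)
# Synchronous call: blocks until the response arrives (~50 ms)
result = predictor.predict("25,50000,680,12,0.7,0.3,0.1")
print(f"Fraud probability: {float(result[0][0]):.3f}")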
Key characteristics:
- Always running, always paying (min 1 instance)
- Lowest latency (typically <100ms)
- Max payload: 6 MB
- Max response time: 60 seconds
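- Cannot scale to 0 with a standard instance-based deployment: this is a critical exam fact (newer endpoints that host models as inference components can be configured to scale to zero)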
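Outside the SageMaker Python SDK, the same synchronous call goes through the `sagemaker-runtime` client's `invoke_endpoint`. The sketch checks the 6 MB limit before sending; `runtime` is expected to be `boto3.client("sagemaker-runtime")`, and the `FakeRuntime` class is an offline stand-in added only so the function can be exercised without AWS.

```python
# Sketch: low-level real-time call with a client-side 6 MB payload check.
MAX_REALTIME_PAYLOAD_BYTES = 6 * 1024 * 1024  # 6 MB limit quoted above


def score_realtime(runtime, endpoint_name: str, csv_row: str) -> str:
    """Send one CSV row and return the raw text/csv response."""
    body = csv_row.encode("utf-8")
    if len(body) > MAX_REALTIME_PAYLOAD_BYTES:
        raise ValueError("payload exceeds the 6 MB real-time limit; consider Async Inference")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Accept="text/csv",
        Body=body,
    )
    # response["Body"] is a streaming body; read() returns bytes
    return response["Body"].read().decode("utf-8")


class _FakeBody:
    def __init__(self, data: bytes):
        self._data = data

    def read(self) -> bytes:
        return self._data


class FakeRuntime:
    """Offline stand-in for boto3.client("sagemaker-runtime")."""

    def __init__(self, reply: bytes = b"0.42"):
        self.reply, self.last_request = reply, None

    def invoke_endpoint(self, **kwargs):
        self.last_request = kwargs
        return {"Body": _FakeBody(self.reply)}
```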
Deploy 2: Serverless Inference
Scenario: A data science team runs a few ad-hoc predictions per day for internal analysis. They don’t want to pay for an always-on endpoint.
from sagemaker.serverless import ServerlessInferenceConfig
predictor = xgb.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=2048,  # 1024-6144 MB in 1 GB steps (must fit model + overhead)
        max_concurrency=10,      # max parallel requests
    ),
    endpoint_name="fraud-serverless",
    serializer=CSVSerializer(),
)
# First call: 1-5 second cold start (provisioning container)
# After that: fast while the container stays warm
result = predictor.predict("25,50000,680,12,0.7,0.3,0.1")
Key characteristics:
- Scales to ZERO when idle (no cost)
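- Cold start: 1-5 seconds on the first call, and again after an idle container has been recycled
- CPU only: no GPU support
- Max payload: 4 MB
- Pay only for the compute time used to process requests (billed per millisecond) plus data processed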
Deploy 3: Async Inference
Scenario: A company processes 500MB medical scan images through a classification model. Each takes 3-5 minutes. Traffic is irregular — sometimes 100 scans, sometimes none.
from sagemaker.async_inference import AsyncInferenceConfig
# xgb stands in for the scan-classification model; bucket and session come from earlier setup
predictor = xgb.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    async_inference_config=AsyncInferenceConfig(
        output_path=f"s3://{bucket}/async-results/",
        max_concurrent_invocations_per_instance=4,
        notification_config={
            # placeholder SNS topic ARNs
            "SuccessTopic": "arn:aws:sns:us-east-1:123456:async-success",
            "ErrorTopic": "arn:aws:sns:us-east-1:123456:async-error",
        },
    ),
    endpoint_name="scan-async",
)
# Non-blocking: returns immediately with the output location
response = session.sagemaker_runtime_client.invoke_endpoint_async(
    EndpointName="scan-async",
    InputLocation="s3://my-bucket/scans/patient_001.dcm",  # payload already in S3, up to 1 GB!
    ContentType="application/octet-stream",
)
print(f"Results will appear at: {response['OutputLocation']}")
# Poll S3 for the result, or wait for the SNS notification
Key characteristics:
- Can scale to ZERO when idle (like serverless), once you attach an auto-scaling policy with MinCapacity=0
- Supports GPU (unlike serverless)
- Max payload: 1 GB (uploaded to S3 first)
- Max processing time: 60 minutes
- Built-in queue: no SQS needed
- Optional SNS notifications on success or error
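Scale-to-zero is not automatic: the endpoint from Deploy 3 starts with one instance and needs an auto-scaling policy whose minimum is 0. A minimal sketch, following the pattern AWS describes for async endpoints (target tracking on ApproximateBacklogSizePerInstance); the capacities, backlog target, and cooldowns are assumptions. AWS's guidance also adds a step-scaling policy on the HasBacklogWithoutCapacity metric so the endpoint can scale out again from zero; that part is omitted here. `RecordingClient` is an offline stand-in for `boto3.client("application-autoscaling")`.

```python
# Sketch: let the async endpoint scale to zero. Values are assumptions.


def async_scaling_requests(endpoint_name, variant="AllTraffic",
                           max_instances=5, backlog_target=5.0):
    """Build the two Application Auto Scaling requests."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant}"
    target = dict(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=0,                 # allow scale-to-zero when the queue is empty
        MaxCapacity=max_instances,
    )
    policy = dict(
        PolicyName=f"{endpoint_name}-backlog-tracking",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": backlog_target,   # queued requests per instance
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
                "Statistic": "Average",
            },
            "ScaleInCooldown": 600,
            "ScaleOutCooldown": 300,
        },
    )
    return target, policy


def apply_async_scaling(client, endpoint_name, **kwargs):
    """client: boto3.client("application-autoscaling") or a stub."""
    target, policy = async_scaling_requests(endpoint_name, **kwargs)
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)


class RecordingClient:
    """Offline stand-in that records calls instead of hitting AWS."""

    def __init__(self):
        self.calls = []

    def register_scalable_target(self, **kw):
        self.calls.append(("register_scalable_target", kw))

    def put_scaling_policy(self, **kw):
        self.calls.append(("put_scaling_policy", kw))
```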
Deploy 4: Batch Transform
Scenario: Every Sunday night, score 10 million customer records for a weekly marketing report. No one needs real-time results.
transformer = xgb.transformer(
    instance_count=2,               # 2 instances; only helps if the input has several S3 objects
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket}/batch-results/",
    strategy="MultiRecord",         # send multiple rows per request
    max_concurrent_transforms=4,    # 4 parallel invocations per instance
    max_payload=6,                  # max MB per mini-batch
    assemble_with="Line",           # join output with newlines
)
transformer.transform(
    data=f"s3://{bucket}/scoring-data/",  # entire dataset on S3
    content_type="text/csv",
    split_type="Line",
    join_source="Input",            # include input in output (for matching)
    wait=True,
)
print(f"10M predictions saved to: {transformer.output_path}")
Key characteristics:
- No persistent endpoint — spins up, runs, terminates
- Pay only for job duration
- No dataset size limit (processes entire S3 datasets); only each mini-batch is capped by max_payload (at most 100 MB)
- Use join_source="Input" to match predictions back to input rows
Lab 2: A/B Testing with Production Variants
What Is This Lab About?
You’ve trained model v2 and it looks better in validation. But will it work in production? Production Variants let you split live traffic between two models and compare them side by side with real users — before committing to the switch.
What You’ll Build
Endpoint: "fraud-ab-test"
- 90% of traffic → Model v1 (current), ml.m5.large
- 10% of traffic → Model v2 (new), ml.m5.large
Step 1: Deploy both, 90/10 split
Step 2: Monitor Model v2 metrics in CloudWatch
Step 3: If v2 is better, shift to 50/50, then 0/100
Step 4: If v2 is worse, shift back to 100/0 (rollback)
Step 1: Create Endpoint with Two Variants
import boto3
sm = boto3.client("sagemaker")
# Assumes models "fraud-model-v1" and "fraud-model-v2" already exist (created with sm.create_model)
sm.create_endpoint_config(
    EndpointConfigName="ab-test-config",
    ProductionVariants=[
        {
            "VariantName": "model-v1",
            "ModelName": "fraud-model-v1",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.9,  # 90% of traffic
        },
        {
            "VariantName": "model-v2",
            "ModelName": "fraud-model-v2",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.1,  # 10% of traffic
        },
    ],
)
sm.create_endpoint(
    EndpointName="fraud-ab-test",
    EndpointConfigName="ab-test-config",
)
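Variant weights are relative: each variant receives weight / sum(weights) of the traffic, so 0.9/0.1 and 9/1 split traffic the same way. The sketch computes those shares and simulates routing with independent weighted random choices, which approximates (but is not) SageMaker's router. To test one model deliberately, `invoke_endpoint` also accepts a `TargetVariant` parameter that bypasses the weights.

```python
# Sketch: how variant weights translate into traffic shares.
import random
from collections import Counter


def traffic_shares(weights):
    """Each variant receives weight / sum(weights) of the traffic."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    return {name: w / total for name, w in weights.items()}


def simulate_routing(weights, n_requests, seed=0):
    """Count how many of n_requests each variant would receive."""
    rng = random.Random(seed)
    names = list(weights)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=n_requests)
    return Counter(picks)
```

With a 10% split, v2 sees roughly 1,000 of every 10,000 requests; that sample size is what limits how quickly the comparison in Step 2 becomes meaningful.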
Step 2: Monitor and Compare in CloudWatch
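Each variant publishes its own CloudWatch metrics (Invocations, ModelLatency, Invocation4XXErrors, Invocation5XXErrors) under the EndpointName and VariantName dimensions, so you can compare error rates, latency, and invocation counts per variant.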
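A sketch of pulling one metric per variant with CloudWatch's `get_metric_statistics`. The time window, period, and statistic are assumptions you would tune (use "Sum" for error counts, "Average" for ModelLatency, which is reported in microseconds). The unweighted mean of datapoints is a simplification, and `FakeCloudWatch` is an offline stand-in for `boto3.client("cloudwatch")`.

```python
# Sketch: compare one CloudWatch metric across production variants.
from datetime import datetime, timedelta, timezone


def variant_metric_request(endpoint, variant, metric, statistic="Average",
                           hours=1, period_s=300):
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SageMaker",
        "MetricName": metric,
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint},
            {"Name": "VariantName", "Value": variant},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_s,
        "Statistics": [statistic],
    }


def mean_of_datapoints(cloudwatch, endpoint, variant, metric, statistic="Average"):
    """Unweighted mean of the returned datapoints, or None if there are none."""
    request = variant_metric_request(endpoint, variant, metric, statistic)
    points = cloudwatch.get_metric_statistics(**request).get("Datapoints", [])
    values = [p[statistic] for p in points]
    return sum(values) / len(values) if values else None


class FakeCloudWatch:
    """Offline stand-in: data maps (variant, metric) to a list of values."""

    def __init__(self, data):
        self.data = data

    def get_metric_statistics(self, **kw):
        variant = next(d["Value"] for d in kw["Dimensions"] if d["Name"] == "VariantName")
        stat = kw["Statistics"][0]
        values = self.data.get((variant, kw["MetricName"]), [])
        return {"Datapoints": [{stat: v} for v in values]}
```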
Step 3: Shift Traffic Gradually
# After confirming v2 performs well, increase to 50/50
sm.update_endpoint_weights_and_capacities(
    EndpointName="fraud-ab-test",
    DesiredWeightsAndCapacities=[
        {"VariantName": "model-v1", "DesiredWeight": 0.5},
        {"VariantName": "model-v2", "DesiredWeight": 0.5},
    ],
)
# Full rollout to v2 (the endpoint must be InService again before this second update)
sm.update_endpoint_weights_and_capacities(
    EndpointName="fraud-ab-test",
    DesiredWeightsAndCapacities=[
        {"VariantName": "model-v1", "DesiredWeight": 0.0},
        {"VariantName": "model-v2", "DesiredWeight": 1.0},
    ],
)
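Steps 3 and 4 together form a loop: shift traffic in stages and, if a check fails, roll back to 100% v1. In this sketch, `is_healthy` and `wait` are hypothetical caller-supplied hooks; `wait` could, for example, call `sm.get_waiter("endpoint_in_service").wait(EndpointName=...)`. `RecordingSageMaker` is an offline stand-in that records the v2 weight of each update.

```python
# Sketch of a staged rollout with rollback for the fraud-ab-test endpoint.


def weights_payload(v2_share):
    if not 0.0 <= v2_share <= 1.0:
        raise ValueError("v2_share must be between 0 and 1")
    return [
        {"VariantName": "model-v1", "DesiredWeight": round(1.0 - v2_share, 4)},
        {"VariantName": "model-v2", "DesiredWeight": v2_share},
    ]


def gradual_rollout(sm, endpoint_name, shares, is_healthy, wait=lambda: None):
    """Apply each v2 share in order; on the first failed check, send 100% to v1.
    Returns the v2 share left in place."""
    for share in shares:
        sm.update_endpoint_weights_and_capacities(
            EndpointName=endpoint_name,
            DesiredWeightsAndCapacities=weights_payload(share),
        )
        wait()  # let the update finish and metrics accumulate
        if not is_healthy(share):
            sm.update_endpoint_weights_and_capacities(
                EndpointName=endpoint_name,
                DesiredWeightsAndCapacities=weights_payload(0.0),
            )
            wait()
            return 0.0
    return shares[-1] if shares else 0.0


class RecordingSageMaker:
    """Offline stand-in for boto3.client("sagemaker") that records v2 weights."""

    def __init__(self):
        self.v2_weights = []

    def update_endpoint_weights_and_capacities(self, EndpointName, DesiredWeightsAndCapacities):
        self.v2_weights.append(DesiredWeightsAndCapacities[1]["DesiredWeight"])
```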
Deployment Strategies Compared
| Strategy | How It Works | Risk | Rollback |
|---|---|---|---|
| A/B Testing | Split traffic by weight % | Low (small % on new) | Shift weight back |
| Canary | 5% → wait → 100% if healthy | Low | Auto-rollback on alarm |
| Linear | 10% → 20% → 30% → … → 100% | Medium | Auto-rollback on alarm |
| Blue/Green | Full duplicate, instant switch | Medium | Instant switch back |
| Shadow | Production variant serves users; a copy of each request goes to v2, whose responses are only logged | Zero (users never see v2 output) | Nothing to roll back |
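In SageMaker, canary, linear, and all-at-once shifts are modes of a blue/green endpoint update (deployment guardrails), passed as `DeploymentConfig` to `update_endpoint`; shadow tests are configured separately through `ShadowProductionVariants` in the endpoint config. The sketch below only builds the `DeploymentConfig` dictionary. The percentages, wait times, and alarm name are assumptions.

```python
# Sketch: build a DeploymentConfig for a blue/green update with traffic shifting.


def deployment_config(mode, step_percent=10, wait_s=600, alarm_names=()):
    routing = {"Type": mode, "WaitIntervalInSeconds": wait_s}
    if mode == "CANARY":
        # Shift step_percent of capacity first, wait, then shift the rest
        routing["CanarySize"] = {"Type": "CAPACITY_PERCENT", "Value": step_percent}
    elif mode == "LINEAR":
        # Shift step_percent at a time, waiting wait_s between steps
        routing["LinearStepSize"] = {"Type": "CAPACITY_PERCENT", "Value": step_percent}
    elif mode != "ALL_AT_ONCE":
        raise ValueError("mode must be CANARY, LINEAR, or ALL_AT_ONCE")
    config = {"BlueGreenUpdatePolicy": {"TrafficRoutingConfiguration": routing}}
    if alarm_names:
        # Any of these CloudWatch alarms firing during the update triggers rollback
        config["AutoRollbackConfiguration"] = {
            "Alarms": [{"AlarmName": name} for name in alarm_names]
        }
    return config
```

Usage would look like `sm.update_endpoint(EndpointName="fraud-realtime", EndpointConfigName="fraud-config-v2", DeploymentConfig=deployment_config("CANARY", 5, 300, ["fraud-5xx-alarm"]))`, where the new endpoint config name and the alarm name are hypothetical.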
Lab 3: Auto-Scaling — Handle Traffic Spikes
What Is This Lab About?
Your fraud detection endpoint handles 100 requests/minute normally. But during Black Friday, traffic spikes to 10,000 req/min. Without auto-scaling, requests queue up, latency explodes, and transactions time out.
Auto-scaling automatically adds instances when traffic rises and removes them when it falls.
What You’ll Build
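Normal traffic: 100 req/min → 100 / 70 ≈ 1.4, so target tracking keeps 2 instances
Black Friday: 10,000 req/min arrives
  InvocationsPerInstance > target (70)
  → 10,000 / 70 ≈ 143 instances would be needed to hit the target
  → Auto-scaling adds instances up to MaxCapacity → 20 instances (~500 req/min each; raise MaxCapacity if that is too few)
After Black Friday: traffic drops back to 100 req/min
  InvocationsPerInstance < target (70)
  → Auto-scaling removes instances; ScaleInCooldown (5 min) spaces out successive scale-ins
  → Back to 2 instances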
Configure Target-Tracking Policy
The most common approach: “keep invocations per instance around 70.”
import boto3
autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/fraud-realtime/variant/AllTraffic"
# Step 1: Register the endpoint variant as a scalable target
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,   # never go below 1 instance
    MaxCapacity=20,  # never exceed 20 instances
)
# Step 2: Create the target-tracking scaling policy
autoscaling.put_scaling_policy(
    PolicyName="target-tracking-invocations",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,  # aim for 70 invocations per instance per minute
        # Predefined metric: InvocationsPerInstance for this endpoint variant
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
        },
        "ScaleInCooldown": 300,  # wait 5 min after a scale-in before the next scale-in
        "ScaleOutCooldown": 60,  # wait 1 min after a scale-out before the next scale-out
    },
)
Why cooldown matters: without it, scaling can thrash (add an instance, immediately remove it, add it again). ScaleOutCooldown=60 means "after a scale-out completes, wait 60 s before another scale-out can start", which gives the new instances time to absorb load before the metric is re-evaluated. ScaleInCooldown=300 means "after a scale-in completes, wait 5 min before removing more"; scale-in is kept conservative so capacity isn't removed too early.
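A toy simulation shows both the capacity arithmetic and the damping effect of cooldowns. It is a simplified model, not AWS's exact algorithm: desired capacity is ceil(total requests per minute / target), clamped to [MinCapacity, MaxCapacity], and each direction of scaling waits for its own cooldown, measured from the last scaling activity in that direction. One tick represents one minute.

```python
# Toy simulation of target tracking with cooldowns (one tick per minute).
import math


def desired_capacity(total_rpm, target, min_cap, max_cap):
    """ceil(total / target), clamped to the registered capacity range."""
    wanted = math.ceil(total_rpm / target) if total_rpm > 0 else min_cap
    return max(min_cap, min(max_cap, wanted))


def simulate(traffic_rpm, target=70.0, min_cap=1, max_cap=20,
             scale_out_cooldown_min=1, scale_in_cooldown_min=5):
    """Return the instance count after each minute of traffic."""
    capacity = min_cap
    last_out = last_in = float("-inf")
    history = []
    for minute, rpm in enumerate(traffic_rpm):
        wanted = desired_capacity(rpm, target, min_cap, max_cap)
        if wanted > capacity and minute - last_out >= scale_out_cooldown_min:
            capacity, last_out = wanted, minute
        elif wanted < capacity and minute - last_in >= scale_in_cooldown_min:
            capacity, last_in = wanted, minute
        history.append(capacity)
    return history


def count_changes(history):
    """Number of minutes in which capacity changed."""
    return sum(1 for a, b in zip(history, history[1:]) if a != b)
```

Feeding it traffic that alternates between 10,000 and 100 req/min each minute shows the point of the cooldowns: with both set to 0 the fleet changes size every minute, while the 1 min / 5 min settings hold the larger fleet in place between spikes.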
Lab 4: SageMaker Pipelines — Complete MLOps Workflow

What Is This Lab About?
In production, you don’t train models manually. You build automated pipelines that:
- Preprocess new data
- Train a model
- Evaluate performance
- Only deploy if quality meets a threshold
- Register in Model Registry for governance
This is the core of MLOps — and it’s the most heavily tested topic in Domain 3.
What You’ll Build
SageMaker Pipeline: "MLOpsPipeline"
Trigger: new data in S3 (via EventBridge), or a manual start / schedule
- Step 1: Preprocess Data. SKLearnProcessor reads the raw CSV, handles missing values, encodes, scales, and splits into train/val/test.
- Step 2: Train Model. XGBoost Estimator trains on the processed train/validation data.
- Step 3: Evaluate Model. ScriptProcessor runs evaluation.py on the test set, computes MSE, and writes evaluation.json.
- Step 4: Quality Gate (Condition). Reads MSE from evaluation.json.
  - IF MSE <= 6.0 → Register Model: Model Registry stores the version with status "PendingManualApproval"; a data scientist reviews and approves it, then it is deployed.
  - ELSE → Fail the pipeline.
What’s ACTUALLY Running Behind a SageMaker Pipeline?
A Pipeline is NOT a script that runs sequentially on one machine. Each step launches separate, independent infrastructure:
ProcessingStep — spins up an EC2 instance, runs your preprocessing script in a Docker container (SKLearn, PySpark, or custom), reads from S3, writes to S3, terminates. This is a SageMaker Processing Job.
TrainingStep — spins up different EC2 instance(s), pulls the algorithm container from ECR, downloads training data from S3 (the output of the previous step), trains the model, uploads model.tar.gz to S3, terminates. This is a SageMaker Training Job.
Evaluation step: another ProcessingStep, i.e., another Processing Job. It loads the model artifact from S3, loads test data from S3, computes metrics, and writes evaluation.json to S3.
ConditionStep: this one is special. It runs inside the pipeline orchestration engine itself, with no EC2 instances provisioned. It reads the metric from evaluation.json (through a PropertyFile) and chooses a branch based on its value.
Register step (a ModelStep wrapping model.register() in this lab; RegisterModel in older SDK versions): also runs in the orchestration engine. It creates a Model Package in the SageMaker Model Registry, a metadata entry pointing to your model artifact in S3 along with metrics, lineage, and approval status.
The orchestration engine is a managed service (think Step Functions for ML). It:
- Stores the pipeline definition as a JSON DAG (directed acyclic graph)
- Passes S3 paths between steps via step.properties references, which are resolved at run time
- Infers step dependencies from those references, runs independent steps in parallel, and applies any retry policies you configure
- Tracks lineage: which data + code + params produced which model
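The dependency handling can be pictured with Python's `graphlib` (3.9+). This is a conceptual model, not SageMaker's engine: the dependency dict is written by hand here, whereas SageMaker derives the edges from `step.properties` references (and any explicit `depends_on`). Steps in the same "wave" have no dependency on each other and could run in parallel; the register/fail branches are omitted because the ConditionStep chooses between them at run time.

```python
# Conceptual model of pipeline DAG ordering (hand-written dependencies).
from graphlib import TopologicalSorter

# Each step maps to the steps whose outputs it reads.
PIPELINE_DEPENDENCIES = {
    "PreprocessData": set(),
    "TrainModel": {"PreprocessData"},
    "EvaluateModel": {"TrainModel", "PreprocessData"},
    "CheckModelQuality": {"EvaluateModel"},
}


def execution_waves(deps):
    """Group steps into waves that can start once all earlier waves finish."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

This lab's pipeline is a straight chain, so every wave holds one step; a pipeline with two independent preprocessing steps would show them in the same wave.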
Step caching, when you enable it on a step, compares that step's arguments (input S3 paths, parameters, code location, and so on) with previous successful runs of the same pipeline. If they match, the step is skipped and the cached output S3 path is reused. This means that if only your hyperparameters changed, the expensive preprocessing step (which could take hours on large data) is skipped entirely.
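A toy cache makes the idea concrete. It hashes a step's name and arguments as sorted JSON and reuses the stored output on a match; SageMaker documents the behavior as matching attribute values of previous executions rather than this exact hashing scheme, so treat this as an analogy. The `cache` dict stands in for SageMaker's record of earlier runs.

```python
# Toy analogy for step caching: skip a step when its arguments were seen before.
import hashlib
import json


def cache_key(step_name: str, arguments: dict) -> str:
    """Deterministic hash of a step's name and arguments (key order ignored)."""
    payload = json.dumps({"step": step_name, "args": arguments}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_with_cache(step_name, arguments, run_fn, cache):
    """Return (output, was_cached); run_fn is only called on a cache miss."""
    key = cache_key(step_name, arguments)
    if key in cache:
        return cache[key], True
    output = run_fn(arguments)
    cache[key] = output
    return output, False
```

Because TrainModel's input is PreprocessData's output path, a cache hit upstream leaves training's data argument unchanged, so only the hyperparameters decide whether training reruns.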
Model Registry is a metadata store (not a storage system). It stores:
- A pointer to model.tar.gz in S3
- Model metrics from evaluation
- Approval status: PendingManualApproval → Approved (deployable), or Rejected
- Lineage: which pipeline execution, which data, and which code produced this version
- It does NOT store the model itself; the model stays in S3
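The approval step and the "what should I deploy?" lookup map onto two boto3 calls: `update_model_package` (set `ModelApprovalStatus`) and `list_model_packages` (filter by group and status, newest first). A minimal sketch; the ARNs in the test are hypothetical, and `FakeRegistry` is an offline stand-in implementing only those two methods.

```python
# Sketch: approve a model version and find the newest approved one.


def latest_approved_model_package(sm, group_name):
    """ARN of the newest Approved package in the group, or None."""
    resp = sm.list_model_packages(
        ModelPackageGroupName=group_name,
        ModelApprovalStatus="Approved",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = resp.get("ModelPackageSummaryList", [])
    return summaries[0]["ModelPackageArn"] if summaries else None


def approve(sm, model_package_arn, note=""):
    """Move a package from PendingManualApproval to Approved."""
    sm.update_model_package(
        ModelPackageArn=model_package_arn,
        ModelApprovalStatus="Approved",
        ApprovalDescription=note,
    )


class FakeRegistry:
    """Offline stand-in for boto3.client("sagemaker") with a single group."""

    def __init__(self, packages):
        self.packages = packages  # dicts: ModelPackageArn, ModelApprovalStatus, CreationTime

    def list_model_packages(self, ModelPackageGroupName, ModelApprovalStatus,
                            SortBy, SortOrder, MaxResults):
        hits = [p for p in self.packages if p["ModelApprovalStatus"] == ModelApprovalStatus]
        hits.sort(key=lambda p: p[SortBy], reverse=(SortOrder == "Descending"))
        return {"ModelPackageSummaryList": hits[:MaxResults]}

    def update_model_package(self, ModelPackageArn, ModelApprovalStatus, ApprovalDescription=""):
        for p in self.packages:
            if p["ModelPackageArn"] == ModelPackageArn:
                p["ModelApprovalStatus"] = ModelApprovalStatus
```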
Step 1: Define Pipeline Parameters
Parameters make your pipeline reusable. Same pipeline, different inputs.
from sagemaker.workflow.parameters import ParameterInteger, ParameterString, ParameterFloat
from sagemaker.workflow.pipeline_context import PipelineSession
pipeline_session = PipelineSession()
processing_instance_count = ParameterInteger(name="ProcessingInstanceCount", default_value=1)
training_instance_type = ParameterString(name="TrainingInstanceType", default_value="ml.m5.xlarge")
model_approval_status = ParameterString(name="ModelApprovalStatus", default_value="PendingManualApproval")
mse_threshold = ParameterFloat(name="MseThreshold", default_value=6.0)
input_data = ParameterString(name="InputData", default_value=f"s3://{bucket}/data/input.csv")
Step 2: Processing Step (Data Prep)
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep
processor = SKLearnProcessor(
    framework_version="1.2-1",
    instance_type="ml.m5.xlarge",
    instance_count=processing_instance_count,
    role=role,
    sagemaker_session=pipeline_session,
)
processor_args = processor.run(
    inputs=[ProcessingInput(source=input_data, destination="/opt/ml/processing/input")],
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
    ],
    code="code/preprocessing.py",
)
step_process = ProcessingStep(name="PreprocessData", step_args=processor_args)
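The lab references code/preprocessing.py without showing it. The sketch below is a hypothetical stdlib-only version built around the container paths above. Its assumptions (not from the lab): the raw CSV has a header row, the label is the first column, and rows with blank fields are dropped; encoding and scaling are omitted, and a real script would likely use pandas or scikit-learn. It writes headerless CSVs with the label first, the layout the built-in XGBoost algorithm expects for CSV input.

```python
# Hypothetical code/preprocessing.py (stdlib only; assumptions in the lead-in).
import csv
import os
import random

BASE = "/opt/ml/processing"  # matches the ProcessingInput/Output paths above


def clean_rows(rows):
    """Drop empty rows and rows with any blank field."""
    return [row for row in rows if row and all(cell.strip() for cell in row)]


def split_rows(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle deterministically, then cut into train/validation/test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train_frac)
    n_val = round(len(rows) * val_frac)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def write_split(rows, out_dir, name):
    """Write a headerless CSV (label stays in the first column)."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{name}.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path


def main(base=BASE):
    input_dir = os.path.join(base, "input")
    rows = []
    for fname in sorted(os.listdir(input_dir)):
        if fname.endswith(".csv"):
            with open(os.path.join(input_dir, fname), newline="") as f:
                reader = csv.reader(f)
                next(reader, None)  # skip the header row
                rows.extend(clean_rows(reader))
    train, validation, test = split_rows(rows)
    for name, part in (("train", train), ("validation", validation), ("test", test)):
        write_split(part, os.path.join(base, name), name)


# Run only when the processing input mount exists (i.e., inside the job).
if __name__ == "__main__" and os.path.isdir(os.path.join(BASE, "input")):
    main()
```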
Step 3: Training Step
Notice how the training input comes from step_process.properties: these are placeholders that SageMaker resolves to the actual S3 paths at run time, and they also make TrainModel depend on PreprocessData.
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep
# image_uri: built-in XGBoost image, e.g. sagemaker.image_uris.retrieve("xgboost", region, "1.7-1")
xgb_train = Estimator(
    image_uri=image_uri,
    instance_type=training_instance_type,
    instance_count=1,
    output_path=f"s3://{bucket}/pipeline/output",
    role=role,
    sagemaker_session=pipeline_session,
)
# "reg:squarederror" replaces the deprecated "reg:linear" objective
xgb_train.set_hyperparameters(objective="reg:squarederror", num_round=50, max_depth=5, eta=0.2)
train_args = xgb_train.fit(inputs={
    "train": TrainingInput(
        s3_data=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
        content_type="text/csv",
    ),
    "validation": TrainingInput(
        s3_data=step_process.properties.ProcessingOutputConfig.Outputs["validation"].S3Output.S3Uri,
        content_type="text/csv",
    ),
})
step_train = TrainingStep(name="TrainModel", step_args=train_args)
Step 4: Evaluation Step
The evaluation script loads the trained model, predicts on the test set, and writes metrics to a JSON file that the Condition step reads.
from sagemaker.processing import ScriptProcessor
from sagemaker.workflow.properties import PropertyFile
eval_processor = ScriptProcessor(
    image_uri=image_uri,
    command=["python3"],
    instance_type="ml.m5.xlarge",
    instance_count=1,
    role=role,
    sagemaker_session=pipeline_session,
)
eval_args = eval_processor.run(
    inputs=[
        ProcessingInput(
            source=step_train.properties.ModelArtifacts.S3ModelArtifacts,
            destination="/opt/ml/processing/model",
        ),
        ProcessingInput(
            source=step_process.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
            destination="/opt/ml/processing/test",
        ),
    ],
    outputs=[ProcessingOutput(output_name="evaluation", source="/opt/ml/processing/evaluation")],
    code="code/evaluation.py",
)
# PropertyFile lets the Condition step READ values from the JSON
evaluation_report = PropertyFile(name="EvaluationReport", output_name="evaluation", path="evaluation.json")
step_eval = ProcessingStep(name="EvaluateModel", step_args=eval_args, property_files=[evaluation_report])
evaluation.py writes:
{"regression_metrics": {"mse": {"value": 4.2, "standard_deviation": 1.1}}}
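A sketch of the metric half of code/evaluation.py, producing exactly the JSON structure above. Assumptions: the predictions have already been computed (e.g., by extracting /opt/ml/processing/model/model.tar.gz and loading it with xgboost, omitted here), test labels come from the first column of the test CSV, and "standard_deviation" is taken to be the population standard deviation of the residuals.

```python
# Sketch of the report-building part of a hypothetical code/evaluation.py.
import json
import os
import statistics


def regression_report(y_true, y_pred):
    """Build the report structure that the Condition step reads."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need equally sized, non-empty label and prediction lists")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / len(residuals)
    return {
        "regression_metrics": {
            "mse": {"value": mse, "standard_deviation": statistics.pstdev(residuals)}
        }
    }


def write_report(report, output_dir="/opt/ml/processing/evaluation"):
    """Write evaluation.json where the "evaluation" ProcessingOutput collects it."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "evaluation.json")
    with open(path, "w") as f:
        json.dump(report, f)
    return path
```

The key names must match the json_path used by the Condition step ("regression_metrics.mse.value"); a renamed key would break the quality gate.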
Step 5: Condition + Register + Fail
The quality gate: only register the model if MSE is acceptable.
from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.functions import JsonGet, Join
from sagemaker.workflow.fail_step import FailStep
from sagemaker.workflow.model_step import ModelStep
from sagemaker.model import Model
from sagemaker.model_metrics import MetricsSource, ModelMetrics
# Read MSE from the evaluation JSON
cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step_name=step_eval.name,
        property_file=evaluation_report,
        json_path="regression_metrics.mse.value",
    ),
    right=mse_threshold,  # MseThreshold parameter, 6.0 by default
)
# Attach the evaluation report so the Model Registry stores the metrics with this version
model_metrics = ModelMetrics(
    model_statistics=MetricsSource(
        s3_uri=Join(on="/", values=[
            step_eval.properties.ProcessingOutputConfig.Outputs["evaluation"].S3Output.S3Uri,
            "evaluation.json",
        ]),
        content_type="application/json",
    )
)
# IF good: register model
model = Model(
    image_uri=image_uri,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=pipeline_session,
    role=role,
)
register_args = model.register(
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.m5.large"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name="FraudModelGroup",
    approval_status=model_approval_status,
    model_metrics=model_metrics,
)
step_register = ModelStep(name="RegisterModel", step_args=register_args)
# IF bad: fail the pipeline with a readable message
step_fail = FailStep(
    name="FailPipeline",
    error_message=Join(on=" ", values=["MSE", "exceeded threshold:", mse_threshold]),
)
# Wire it together
step_cond = ConditionStep(
    name="CheckModelQuality",
    conditions=[cond_lte],
    if_steps=[step_register],  # MSE <= MseThreshold → register
    else_steps=[step_fail],    # MSE > MseThreshold → fail
)
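The gate's logic is simple enough to check locally before running the pipeline. This sketch resolves a dotted path the way `json_path` is used here (plain keys only; real JsonGet uses JsonPath, which also supports array indexing) and returns the name of the branch the ConditionStep would take. It is a local analogy, not the SageMaker implementation.

```python
# Local analogy for JsonGet + ConditionLessThanOrEqualTo + branch selection.


def json_get(document, json_path):
    """Resolve a dotted path such as "regression_metrics.mse.value"."""
    value = document
    for key in json_path.split("."):
        value = value[key]
    return value


def quality_gate(report, threshold):
    """Return the step the ConditionStep would run next."""
    mse = json_get(report, "regression_metrics.mse.value")
    return "RegisterModel" if mse <= threshold else "FailPipeline"
```

Note the boundary: because the condition is "less than or equal to", an MSE exactly equal to the threshold still registers the model.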
Step 6: Create and Run the Pipeline
from sagemaker.workflow.pipeline import Pipeline
pipeline = Pipeline(
    name="MLOpsPipeline",
    parameters=[processing_instance_count, training_instance_type,
                model_approval_status, input_data, mse_threshold],
    steps=[step_process, step_train, step_eval, step_cond],
)
pipeline.upsert(role_arn=role)  # create or update the pipeline definition
execution = pipeline.start()
execution.wait()
for step in execution.list_steps():
    print(f"  {step['StepName']}: {step['StepStatus']}")
Step Caching — Avoid Reprocessing
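If you re-run the pipeline with only hyperparameters changed, the preprocessing step is identical to last time. With step caching, SageMaker skips it entirely and reuses the previous output. Caching is opt-in per step: pass cache_config=CacheConfig(enable_caching=True, expire_after="P30D") (CacheConfig is imported from sagemaker.workflow.steps) to the ProcessingStep and TrainingStep constructors.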
Run 1: Preprocess(data_v1) → Train(hp_v1) → Eval → Register (all steps execute)
Run 2: Preprocess(data_v1) CACHED, skipped → Train(hp_v2) → Eval (saves hours)
Trigger Pipeline Automatically
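When new training data lands in S3, kick off the pipeline automatically:
S3 "Object Created" event (EventBridge notifications enabled on the bucket) → EventBridge rule → Lambda that calls start_pipeline_execution (an EventBridge rule can also target the pipeline directly, without Lambda)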
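A sketch of the Lambda in that chain. It assumes the EventBridge "Object Created" event shape (bucket name under `detail.bucket.name`, key under `detail.object.key`) and passes the new object's URI as the pipeline's InputData parameter through `start_pipeline_execution`. The `PIPELINE_NAME` environment variable and the optional `client` argument (for offline testing) are assumptions of this sketch.

```python
# Sketch: Lambda handler that starts MLOpsPipeline for a newly uploaded object.
import os

PIPELINE_NAME = os.environ.get("PIPELINE_NAME", "MLOpsPipeline")


def build_start_request(event):
    """Turn an EventBridge S3 "Object Created" event into start_pipeline_execution kwargs."""
    detail = event["detail"]
    s3_uri = f"s3://{detail['bucket']['name']}/{detail['object']['key']}"
    return {
        "PipelineName": PIPELINE_NAME,
        "PipelineParameters": [{"Name": "InputData", "Value": s3_uri}],
        "PipelineExecutionDescription": f"Triggered by {s3_uri}",
    }


def lambda_handler(event, context, client=None):
    if client is None:
        import boto3  # available in the AWS Lambda Python runtime
        client = boto3.client("sagemaker")
    response = client.start_pipeline_execution(**build_start_request(event))
    return {"PipelineExecutionArn": response["PipelineExecutionArn"]}
```

Pipeline parameter values are passed as strings; SageMaker converts them to the declared parameter type.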
Domain 3 Lab Summary
| Lab | Service | You Learned |
|---|---|---|
| 1 | Endpoint Types | Real-time vs Serverless vs Async vs Batch — when to use each |
| 2 | Production Variants | A/B testing, traffic splitting, gradual rollout, rollback |
| 3 | Auto-Scaling | Target tracking, InvocationsPerInstance, cooldown periods |
| 4 | SageMaker Pipelines | Complete MLOps: preprocess → train → evaluate → condition → register |