Lab: Domain 3 — Deployment & Orchestration Hands-On

These labs cover the critical “last mile” of ML: getting models into production and keeping them there reliably. This domain is 22% of the exam and focuses heavily on endpoint types, deployment strategies, SageMaker Pipelines, and auto-scaling.

Lab 1: The Four Endpoint Types — Side by Side

What Is This Lab About?

You’ve trained a model. Now what? SageMaker offers four fundamentally different ways to serve predictions. Choosing the wrong one is the most common mistake on the exam — and in real life.

This lab deploys the same model four different ways so you can see and feel the differences.

Decision Framework

"How should I serve this model?"

                    ┌─── YES ── Consistent high traffic? ── YES ─── REAL-TIME
                    │                                               ENDPOINT
Do you need         │                                          
real-time           │           Large payload (>6MB)    ── YES ─── ASYNC
predictions? ───────┤           or long processing?                INFERENCE
                    │
                    │           Sporadic traffic,       ── YES ─── SERVERLESS
                    │           cost-sensitive?                     INFERENCE
                    │
                    └─── NO ──── Scheduled bulk scoring? ── YES ── BATCH
                                                                   TRANSFORM
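The decision tree above can be written as a small function. This is a sketch, not an AWS API: `Workload` and `choose_endpoint_type` are hypothetical names, and the thresholds are the limits quoted in this lab (6 MB payload and 60 s response time for real-time, 4 MB payload for serverless).

```python
from dataclasses import dataclass

# Per-request payload limits quoted in this lab, in MB
PAYLOAD_LIMIT_MB = {"real-time": 6, "serverless": 4}
REALTIME_TIMEOUT_S = 60

@dataclass
class Workload:
    needs_realtime: bool        # must a caller get an answer on request?
    payload_mb: float           # size of one request payload
    processing_seconds: float   # time to produce one prediction
    steady_high_traffic: bool   # consistent, high request volume?

def choose_endpoint_type(w: Workload) -> str:
    """Walk the decision tree and return the suggested endpoint type."""
    if not w.needs_realtime:
        return "batch"
    # Large payloads or long-running requests exceed the synchronous limits
    if w.payload_mb > PAYLOAD_LIMIT_MB["real-time"] or w.processing_seconds > REALTIME_TIMEOUT_S:
        return "async"
    if w.steady_high_traffic:
        return "real-time"
    # Sporadic traffic: serverless, provided the payload fits its smaller limit
    if w.payload_mb <= PAYLOAD_LIMIT_MB["serverless"]:
        return "serverless"
    # Too big for serverless but sporadic: async can still scale to zero
    return "async"
```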

The Four Types at a Glance

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  REAL-TIME            SERVERLESS          ASYNC           BATCH │
│  ┌──────────┐        ┌──────────┐      ┌──────────┐    ┌──────┐│
│  │ Always   │        │Scale to 0│      │ Queue +  │    │ Job  ││
│  │ running  │        │          │      │ scale to │    │ runs ││
│  │          │        │ Cold     │      │ 0        │    │ then ││
│  │ <100ms   │        │ start    │      │          │    │ dies ││
│  │          │        │ 1-5s     │      │ 1GB max  │    │      ││
│  │ 6MB max  │        │ 4MB max  │      │ 60min    │    │ No   ││
│  │          │        │ CPU only │      │ GPU OK   │    │ limit││
│  │ Min 1    │        │ Min 0    │      │ Min 0    │    │      ││
│  │ instance │        │          │      │          │    │      ││
│  └──────────┘        └──────────┘      └──────────┘    └──────┘│
│  $$$$ (24/7)         $ (per call)      $$ (per use)    $ (job) │
│                                                                 │
│  Bank fraud          Internal tool     Video/doc       Weekly   │
│  detection           5 req/day         processing      reports  │
└─────────────────────────────────────────────────────────────────┘

What’s ACTUALLY Running Behind Each Endpoint Type?

These aren’t abstract services — they’re real infrastructure with real trade-offs:

Real-Time Endpoint — EC2 instances running Docker containers 24/7:

When you call deploy(), SageMaker provisions instances of the type you chose (e.g., ml.m5.large)
It pulls your model’s inference Docker container from ECR and extracts model.tar.gz from S3 into /opt/ml/model/
SageMaker’s managed front end accepts HTTPS requests and forwards them to your container’s /invocations endpoint on port 8080 (if your image runs nginx, that proxy lives inside the container)
SageMaker health-checks each container via /ping; if a container stops responding, the instance is replaced
You pay by the instance-hour, 24/7, whether or not requests arrive, because a real-time endpoint always keeps at least 1 instance running (it can’t scale to 0)
An internal load balancer distributes requests across instances when the variant has more than one
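The container contract described above (GET /ping for health, POST /invocations for predictions, port 8080) can be sketched with Python’s standard library. The `predict` function is a hypothetical stand-in for a real model, and production images usually put a proper web server in front (for example nginx with gunicorn); this only shows the request shapes SageMaker expects.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(csv_row: str) -> float:
    # Hypothetical stand-in for a real model: average of the numeric features
    values = [float(x) for x in csv_row.strip().split(",")]
    return sum(values) / len(values)

class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Health check: respond 200 on /ping once the model is ready to serve
        if self.path == "/ping":
            self._reply(200, b"", "text/plain")
        else:
            self._reply(404, b"", "text/plain")

    def do_POST(self):
        # Prediction requests arrive on /invocations with the raw payload as the body
        if self.path != "/invocations":
            self._reply(404, b"", "text/plain")
            return
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        payload = json.dumps({"score": predict(body)}).encode("utf-8")
        self._reply(200, payload, "application/json")

    def _reply(self, status: int, payload: bytes, content_type: str):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet

def make_server(port: int = 8080) -> HTTPServer:
    # SageMaker sends traffic to port 8080 inside the container
    return HTTPServer(("0.0.0.0", port), InferenceHandler)
```

Calling `make_server().serve_forever()` would run it on port 8080, as it would inside a container.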

Serverless Inference — AWS manages containers that spin up/down:

No EC2 instances are pre-provisioned — that’s the whole point
When a request arrives and no warm container is available, SageMaker provisions a lightweight compute environment, pulls your container, and loads the model: this is the cold start (1-5s)
Subsequent requests reuse the warm container (fast, like real-time)
After an idle period (AWS doesn’t publish the exact duration), the container is recycled and cost drops back to zero
The memory you configure (1024-6144 MB, in 1 GB steps) determines the compute power; there’s no explicit CPU selection
No GPU option: serverless endpoints run on CPU only
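A toy simulation makes the cold/warm behavior concrete. It assumes a 10-minute idle window purely for illustration (`idle_timeout_s` is a hypothetical parameter; AWS does not publish the real value) and validates memory against the 1 GB steps described above.

```python
from dataclasses import dataclass, field
from typing import Optional

# Serverless memory sizes: 1024-6144 MB in 1024 MB steps
VALID_MEMORY_MB = range(1024, 6144 + 1, 1024)

@dataclass
class ServerlessSim:
    memory_size_in_mb: int
    idle_timeout_s: float = 600.0   # ASSUMPTION: illustrative only, not an AWS-published value
    last_request_at: Optional[float] = field(default=None, init=False)

    def __post_init__(self):
        if self.memory_size_in_mb not in VALID_MEMORY_MB:
            raise ValueError(f"memory_size_in_mb must be one of {list(VALID_MEMORY_MB)}")

    def invoke(self, now: float) -> str:
        """Return 'cold' if a container must be provisioned for this request, else 'warm'."""
        cold = self.last_request_at is None or now - self.last_request_at > self.idle_timeout_s
        self.last_request_at = now
        return "cold" if cold else "warm"
```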

Async Inference — SQS queue + auto-scaling compute:

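Behind the scenes, SageMaker maintains a managed internal queue for your endpoint (you never create or manage an SQS queue yourself)
When you call invoke_endpoint_async(), your request is placed in this queue and the API returns immediately
Instances take requests from the queue, process them, and write results to S3
The CloudWatch metric ApproximateBacklogSizePerInstance (queued requests divided by instance count) is what auto-scaling typically tracks
If you attach an auto-scaling policy with MinCapacity=0, instances scale to 0 when the queue is empty and you stop paying; without such a policy, the endpoint keeps its instance count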
This is why async supports large payloads (1GB) — the data goes to S3 first, not through the API
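The queue-based flow can be modeled in a few lines. `AsyncEndpointSim` is a hypothetical in-memory stand-in: a dict plays the role of S3, and two helper methods mirror what the CloudWatch metrics ApproximateBacklogSizePerInstance and HasBacklogWithoutCapacity report, which is what scaling policies react to.

```python
from collections import deque
import itertools

class AsyncEndpointSim:
    """Toy model of async inference: requests wait in a queue, results land under an output prefix."""

    def __init__(self, output_prefix: str, instances: int = 0):
        self.queue = deque()
        self.results = {}                  # stands in for objects written to S3
        self.output_prefix = output_prefix
        self.instances = instances
        self._ids = itertools.count(1)

    def invoke_async(self, input_location: str) -> dict:
        # Returns immediately: the caller only learns where the output will appear
        inference_id = f"req-{next(self._ids)}"
        self.queue.append((inference_id, input_location))
        return {"InferenceId": inference_id,
                "OutputLocation": f"{self.output_prefix}{inference_id}.out"}

    def backlog_per_instance(self) -> float:
        # Same idea as ApproximateBacklogSizePerInstance: queued items / instances
        # (with zero instances this sketch just reports the raw backlog)
        return len(self.queue) / self.instances if self.instances else float(len(self.queue))

    def has_backlog_without_capacity(self) -> bool:
        # Same idea as HasBacklogWithoutCapacity: work is waiting but nothing is running
        return self.instances == 0 and len(self.queue) > 0

    def process(self, max_items: int) -> int:
        """Let running instances drain up to max_items requests; returns how many finished."""
        if self.instances == 0:
            return 0
        done = 0
        while self.queue and done < max_items:
            inference_id, input_location = self.queue.popleft()
            self.results[f"{self.output_prefix}{inference_id}.out"] = f"prediction for {input_location}"
            done += 1
        return done
```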

Batch Transform — temporary Transform Job, no persistent endpoint:

Batch Transform creates a temporary cluster of instances, runs inference on your entire S3 dataset, writes results to S3, then terminates
There is no endpoint URL — nobody calls it in real-time
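It distributes input files across instances and, when split_type is set, splits each file into mini-batches of records
join_source="Input" appends each prediction to its input record, so you can match results back to input rows (e.g., by an ID column); without it, the output contains predictions only and you must rely on line order to match them up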
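The mini-batching and joining steps can be sketched locally. This is a simplified model, not SageMaker’s internal logic: `mini_batches` and `transform` are hypothetical names, and `max_payload_bytes` plays the role of max_payload (which SageMaker takes in MB).

```python
from typing import Callable, Iterator, List

def mini_batches(lines: List[str], max_payload_bytes: int) -> Iterator[List[str]]:
    """Group records (split_type="Line") into batches no larger than max_payload_bytes."""
    batch, size = [], 0
    for line in lines:
        n = len(line.encode("utf-8")) + 1          # +1 for the newline separator
        if n > max_payload_bytes:
            raise ValueError("a single record exceeds max_payload")
        if batch and size + n > max_payload_bytes:
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += n
    if batch:
        yield batch

def transform(lines: List[str], predict: Callable[[str], str],
              max_payload_bytes: int, join_input: bool) -> List[str]:
    out = []
    for batch in mini_batches(lines, max_payload_bytes):   # strategy="MultiRecord"
        for record in batch:
            pred = predict(record)
            # join_source="Input": emit the input record with the prediction appended
            out.append(f"{record},{pred}" if join_input else pred)
    return out                                              # assemble_with="Line"
```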

Deploy 1: Real-Time Endpoint

Scenario: A bank needs sub-100ms fraud scoring on every credit card transaction, 24/7, thousands of requests per minute.

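from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer

# Assumes `xgb` is the trained XGBoost Estimator from the earlier training lab
predictor = xgb.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="fraud-realtime",
    serializer=CSVSerializer(),      # send features as one CSV row
    deserializer=CSVDeserializer(),  # built-in XGBoost returns CSV text by default
)

# Synchronous call: blocks until the response arrives (typically tens of ms)
result = predictor.predict("25,50000,680,12,0.7,0.3,0.1")
print(f"Fraud probability: {result[0][0]}")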

Key characteristics:

Always running, always paying (min 1 instance)
Lowest latency (typically <100ms)
Max payload: 6 MB
Max response time: 60 seconds
Cannot scale to 0 — this is a critical exam fact

Deploy 2: Serverless Inference

Scenario: A data science team runs a few ad-hoc predictions per day for internal analysis. They don’t want to pay for an always-on endpoint.

from sagemaker.serverless import ServerlessInferenceConfig
from sagemaker.serializers import CSVSerializer

predictor = xgb.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=2048,    # 1024-6144 MB in 1 GB steps (must fit model + overhead)
        max_concurrency=10,        # max parallel requests before throttling
    ),
    endpoint_name="fraud-serverless",
    serializer=CSVSerializer(),
)

# First call after an idle period: 1-5 second cold start (provisioning container)
# After that: fast while the container stays warm
result = predictor.predict("25,50000,680,12,0.7,0.3,0.1")

Key characteristics:

Scales to ZERO when idle (no cost)
Cold start: 1-5 seconds on first call
CPU only — no GPU support
Max payload: 4 MB
Pay only for compute time (billed per millisecond) and data processed

Deploy 3: Async Inference

Scenario: A company processes 500MB medical scan images through a classification model. Each takes 3-5 minutes. Traffic is irregular — sometimes 100 scans, sometimes none.

import boto3
from sagemaker.async_inference import AsyncInferenceConfig

# `xgb` stands in for your image-classification estimator; `bucket` is defined earlier
predictor = xgb.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    async_inference_config=AsyncInferenceConfig(
        output_path=f"s3://{bucket}/async-results/",
        max_concurrent_invocations_per_instance=4,
        notification_config={
            "SuccessTopic": "arn:aws:sns:us-east-1:123456:async-success",
            "ErrorTopic": "arn:aws:sns:us-east-1:123456:async-error",
        },
    ),
    endpoint_name="scan-async",
)

# Non-blocking: returns immediately with the S3 location where the result will appear
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint_async(
    EndpointName="scan-async",
    InputLocation="s3://my-bucket/scans/patient_001.dcm",  # payload must already be in S3 (up to 1 GB)
    ContentType="application/octet-stream",
)
print(f"Results will appear at: {response['OutputLocation']}")
# Poll S3 for the result, or wait for the SNS notification
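Polling for the result is a simple backoff loop. In this sketch the S3 lookup is injected as `fetch` (hypothetical; with boto3 it could wrap an S3 GetObject call on the OutputLocation and return None while the object does not exist yet), so the loop itself can be tested without AWS. The default timeout matches the 60-minute processing limit.

```python
import time
from typing import Callable, Optional

def wait_for_result(fetch: Callable[[], Optional[bytes]],
                    timeout_s: float = 3600,
                    first_delay_s: float = 1.0,
                    max_delay_s: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Poll `fetch` with exponential backoff until it returns data or the timeout expires."""
    waited, delay = 0.0, first_delay_s
    while True:
        data = fetch()
        if data is not None:
            return data
        if waited + delay > timeout_s:
            raise TimeoutError(f"no result after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay_s)   # back off, but never wait longer than max_delay_s
```

In practice, the SNS success topic is usually the cheaper signal; polling is the fallback when you have no subscriber.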

Key characteristics:

Can scale to ZERO when idle (like serverless), if you attach an auto-scaling policy with MinCapacity=0
Supports GPU (unlike serverless)
Max payload: 1 GB (uploaded to S3 first)
Max processing time: 60 minutes
Built-in queue — no SQS needed
SNS notification on completion

Deploy 4: Batch Transform

Scenario: Every Sunday night, score 10 million customer records for a weekly marketing report. No one needs real-time results.

transformer = xgb.transformer(
    instance_count=2,                # use 2 instances for speed
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket}/batch-results/",
    strategy="MultiRecord",          # send multiple rows per request
    max_concurrent_transforms=4,     # 4 parallel invocations per instance
    max_payload=6,                   # max MB per mini-batch
    assemble_with="Line",            # join output with newlines
)

transformer.transform(
    data=f"s3://{bucket}/scoring-data/",   # entire dataset on S3
    content_type="text/csv",
    split_type="Line",
    join_source="Input",                    # include input in output (for matching)
    wait=True,
)
print(f"10M predictions saved to: {transformer.output_path}")

Key characteristics:

No persistent endpoint — spins up, runs, terminates
Pay only for job duration
No dataset-size limit (processes entire S3 datasets); each mini-batch request is capped by max_payload (at most 100 MB)
Use join_source="Input" to match predictions back to input rows

Lab 2: A/B Testing with Production Variants

What Is This Lab About?

You’ve trained model v2 and it looks better in validation. But will it work in production? Production Variants let you split live traffic between two models and compare them side by side with real users — before committing to the switch.

What You’ll Build

                   Endpoint: "fraud-ab-test"
                               │
                     ┌─────────┴─────────┐
                     │                   │
                90% traffic         10% traffic
                     │                   │
               ┌───────────┐       ┌───────────┐
               │ Model v1  │       │ Model v2  │
               │ (current) │       │ (new)     │
               │           │       │           │
               │ ml.m5.lg  │       │ ml.m5.lg  │
               └───────────┘       └───────────┘

  Step 1: Deploy both, 90/10 split
  Step 2: Monitor Model v2 metrics in CloudWatch
  Step 3: If v2 is better, shift to 50/50, then 0/100
  Step 4: If v2 is worse, shift back to 100/0 (rollback)
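Production variant weights are relative: each variant receives weight / sum(weights) of the traffic, so 0.9/0.1 and 9/1 behave the same. The sketch below simulates that routing locally with a seeded random generator; it illustrates the arithmetic and is not how SageMaker routes requests internally.

```python
import random
from collections import Counter

def route(variants: dict, rng: random.Random) -> str:
    """Pick a variant with probability weight / sum(weights)."""
    names = list(variants)
    return rng.choices(names, weights=[variants[n] for n in names], k=1)[0]

def simulate(variants: dict, requests: int, seed: int = 0) -> dict:
    """Return the observed traffic share of each variant over `requests` calls."""
    rng = random.Random(seed)
    counts = Counter(route(variants, rng) for _ in range(requests))
    return {name: counts[name] / requests for name in variants}
```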

Step 1: Create Endpoint with Two Variants

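import boto3

# Assumes SageMaker models "fraud-model-v1" and "fraud-model-v2" already exist (sm.create_model)
sm = boto3.client("sagemaker")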

sm.create_endpoint_config(
    EndpointConfigName="ab-test-config",
    ProductionVariants=[
        {
            "VariantName": "model-v1",
            "ModelName": "fraud-model-v1",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.9,   # 90% of traffic
        },
        {
            "VariantName": "model-v2",
            "ModelName": "fraud-model-v2",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.1,   # 10% of traffic
        },
    ],
)

sm.create_endpoint(
    EndpointName="fraud-ab-test",
    EndpointConfigName="ab-test-config",
)

Step 2: Monitor and Compare in CloudWatch

Each variant publishes its own CloudWatch metrics under the VariantName dimension, so you can compare invocation counts, error counts (Invocation4XXErrors, Invocation5XXErrors), and ModelLatency per variant.
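Once you have pulled per-variant numbers from CloudWatch (for example the Sum of Invocations and Invocation5XXErrors, plus a latency percentile of ModelLatency, which CloudWatch reports in microseconds), the promotion decision is plain arithmetic. The thresholds below (at least 1,000 requests, no higher error rate, at most 10% more latency) are hypothetical policy choices, not AWS defaults.

```python
from dataclasses import dataclass

@dataclass
class VariantStats:
    invocations: int          # Sum of Invocations over the comparison window
    errors_5xx: int           # Sum of Invocation5XXErrors over the same window
    p90_latency_ms: float     # e.g. p90 of ModelLatency, converted from microseconds

    @property
    def error_rate(self) -> float:
        return self.errors_5xx / self.invocations if self.invocations else 0.0

def promote_challenger(champion: VariantStats, challenger: VariantStats,
                       min_invocations: int = 1000,
                       max_latency_regression: float = 1.10) -> bool:
    """Hypothetical rule: enough traffic, no worse error rate, at most 10% slower."""
    if challenger.invocations < min_invocations:
        return False                      # not enough evidence yet
    if challenger.error_rate > champion.error_rate:
        return False
    return challenger.p90_latency_ms <= champion.p90_latency_ms * max_latency_regression
```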

Step 3: Shift Traffic Gradually

# After confirming v2 performs well, increase to 50/50
sm.update_endpoint_weights_and_capacities(
    EndpointName="fraud-ab-test",
    DesiredWeightsAndCapacities=[
        {"VariantName": "model-v1", "DesiredWeight": 0.5},
        {"VariantName": "model-v2", "DesiredWeight": 0.5},
    ],
)

# Full rollout to v2
sm.update_endpoint_weights_and_capacities(
    EndpointName="fraud-ab-test",
    DesiredWeightsAndCapacities=[
        {"VariantName": "model-v1", "DesiredWeight": 0.0},
        {"VariantName": "model-v2", "DesiredWeight": 1.0},
    ],
)

Deployment Strategies Compared

Strategy	How It Works	Risk	Rollback
A/B Testing	Split traffic by weight %	Low (small % on new)	Shift weight back
Canary	5% → wait → 100% if healthy	Low	Auto-rollback on alarm
Linear	10% → 20% → 30% → … → 100%	Medium	Auto-rollback on alarm
Blue/Green	Full duplicate, instant switch	Medium	Instant switch back
Shadow	Both serve, v2 logs only (users don’t see)	Zero	Nothing to rollback
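The canary and linear rows differ only in their step schedule, and both roll back when an alarm fires. The helpers below are a hypothetical local sketch of that logic (in SageMaker, deployment guardrails run the schedule and watch your CloudWatch alarms for you); `alarm_fired` stands in for the alarm check at the end of each bake period.

```python
from typing import Callable, List, Tuple

def linear_steps(step_pct: int) -> List[int]:
    """Equal increments of new-fleet traffic, ending at 100%."""
    return list(range(step_pct, 100, step_pct)) + [100]

def canary_steps(canary_pct: int) -> List[int]:
    """A small canary slice, then everything."""
    return [canary_pct, 100]

def shift_traffic(steps: List[int], alarm_fired: Callable[[int], bool]) -> Tuple[str, List[int]]:
    """Walk the schedule; after each step's bake period, roll back to 0% if the alarm fired."""
    history: List[int] = []
    for pct in steps:
        history.append(pct)
        if alarm_fired(pct):
            history.append(0)          # all traffic returns to the old fleet
            return "rolled_back", history
    return "completed", history
```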

Lab 3: Auto-Scaling — Handle Traffic Spikes

What Is This Lab About?

Your fraud detection endpoint handles 100 requests/minute normally. But during Black Friday, traffic spikes to 10,000 req/min. Without auto-scaling, requests queue up, latency explodes, and transactions time out.

Auto-scaling automatically adds instances when traffic rises and removes them when it falls.

What You’ll Build

Normal traffic:     100 req/min ÷ target 70 per instance      → 2 instances

Black Friday:       10,000 req/min arrives
                    InvocationsPerInstance > target (70)
                    → Auto-scaling adds instances
                    → 10,000 ÷ 70 ≈ 143 needed, capped by MaxCapacity → 20 instances
                    (each still sees ~500 req/min; raise MaxCapacity if that is too much)

After Black Friday: Traffic drops to 100 req/min
                    InvocationsPerInstance < target (70)
                    → Scale-in cooldown (5 min) spaces out removals
                    → Auto-scaling removes instances            → 2 instances
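As a rough rule, target tracking sizes the fleet so that per-instance load returns to the target: about ceil(traffic / target) instances, clamped to [MinCapacity, MaxCapacity]. The function below is that approximation only (the real service also applies cooldowns and scales in more conservatively). With this lab’s numbers, 10,000 req/min would call for about 143 instances, so MaxCapacity=20 becomes the binding limit.

```python
import math

def desired_instances(req_per_min: float, target_per_instance: float,
                      min_capacity: int, max_capacity: int) -> int:
    """Approximate target tracking: enough instances to bring per-instance load to the target."""
    needed = math.ceil(req_per_min / target_per_instance)
    return max(min_capacity, min(max_capacity, needed))
```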

Configure Target-Tracking Policy

The most common approach: “keep invocations per instance around 70 per minute.”

import boto3

autoscaling = boto3.client("application-autoscaling")

resource_id = "endpoint/fraud-realtime/variant/AllTraffic"

# Step 1: Register the endpoint as scalable
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,       # never go below 1 instance
    MaxCapacity=20,      # never exceed 20 instances
)

# Step 2: Create the scaling policy
autoscaling.put_scaling_policy(
    PolicyName="target-tracking-invocations",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,   # aim for ~70 invocations per instance per minute
        "CustomizedMetricSpecification": {
            "MetricName": "InvocationsPerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [
                {"Name": "EndpointName", "Value": "fraud-realtime"},
                {"Name": "VariantName", "Value": "AllTraffic"},
            ],
            "Statistic": "Sum",
        },
        "ScaleInCooldown": 300,    # after a scale-in, wait 5 min before the next scale-in
        "ScaleOutCooldown": 60,    # after a scale-out, wait 1 min before the next scale-out
    },
)

Why cooldown matters: without it, scaling can thrash, adding an instance, removing it moments later, then adding it again. ScaleOutCooldown=60 means “after a scale-out, give the new capacity 60s to take effect before scaling out again.” ScaleInCooldown=300 means “after a scale-in, wait 5 min before scaling in again” (deliberately conservative, so capacity isn’t removed too early). A scale-in cooldown never blocks a needed scale-out.
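The cooldown rules can be expressed as a small gate. This is a simplified, hypothetical model of the behavior described above (separate clocks for scale-out and scale-in, with scale-in cooldown never blocking scale-out), not Application Auto Scaling’s actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CooldownGate:
    """Simplified cooldown bookkeeping with separate clocks for scale-out and scale-in."""
    scale_out_cooldown_s: float = 60
    scale_in_cooldown_s: float = 300
    last_scale_out: Optional[float] = None
    last_scale_in: Optional[float] = None

    def allow(self, direction: str, now: float) -> bool:
        """Return True (and record the action) if scaling in `direction` may run at time `now`."""
        if direction == "out":
            # Only a recent scale-out delays another scale-out; scale-in cooldown is ignored
            if self.last_scale_out is not None and now - self.last_scale_out < self.scale_out_cooldown_s:
                return False
            self.last_scale_out = now
            return True
        if direction == "in":
            if self.last_scale_in is not None and now - self.last_scale_in < self.scale_in_cooldown_s:
                return False
            self.last_scale_in = now
            return True
        raise ValueError("direction must be 'out' or 'in'")
```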

Lab 4: SageMaker Pipelines — Complete MLOps Workflow


What Is This Lab About?

In production, you don’t train models manually. You build automated pipelines that:

Preprocess new data
Train a model
Evaluate performance
Continue only if quality meets a threshold
Register in Model Registry for governance

This is the core of MLOps — and it’s the most heavily tested topic in Domain 3.

What You’ll Build

┌──────────────────────────────────────────────────────────────┐
│             SageMaker Pipeline: "MLOpsPipeline"              │
│                                                              │
│  TRIGGER: New data in S3 (via EventBridge)                   │
│  OR: Manual start / Scheduled                                │
│                                                              │
│  ┌──────────────┐                                            │
│  │ Step 1:      │  SKLearnProcessor reads raw CSV,           │
│  │ Preprocess   │  handles missing values, encodes,          │
│  │ Data         │  scales, splits → train/val/test           │
│  └──────┬───────┘                                            │
│         │                                                    │
│  ┌──────▼───────┐                                            │
│  │ Step 2:      │  XGBoost Estimator trains on               │
│  │ Train Model  │  processed train/validation data           │
│  └──────┬───────┘                                            │
│         │                                                    │
│  ┌──────▼───────┐                                            │
│  │ Step 3:      │  ScriptProcessor runs evaluation.py        │
│  │ Evaluate     │  on test set, computes MSE,                │
│  │ Model        │  writes evaluation.json                    │
│  └──────┬───────┘                                            │
│         │                                                    │
│  ┌──────▼───────┐                                            │
│  │ Step 4:      │  Read MSE from evaluation.json             │
│  │ Quality Gate │  IF MSE <= 6.0:                            │
│  │ (Condition)  │    → Register model in Model Registry      │
│  │              │  ELSE:                                     │
│  └──┬────────┬──┘    → Fail pipeline                         │
│     │        │                                               │
│  ┌──▼──┐  ┌──▼──┐                                            │
│  │Regis│  │Fail │  Model Registry stores version with        │
│  │ter  │  │     │  status "PendingManualApproval"            │
│  │Model│  │     │  Data scientist reviews, approves → deploy │
│  └─────┘  └─────┘                                            │
└──────────────────────────────────────────────────────────────┘

What’s ACTUALLY Running Behind a SageMaker Pipeline?

A Pipeline is NOT a script that runs sequentially on one machine. Each step launches separate, independent infrastructure:

ProcessingStep — spins up an EC2 instance, runs your preprocessing script in a Docker container (SKLearn, PySpark, or custom), reads from S3, writes to S3, terminates. This is a SageMaker Processing Job.

TrainingStep — spins up different EC2 instance(s), pulls the algorithm container from ECR, downloads training data from S3 (the output of the previous step), trains the model, uploads model.tar.gz to S3, terminates. This is a SageMaker Training Job.

Evaluation step (a second ProcessingStep): another Processing Job. It loads the model artifact from S3, loads test data from S3, computes metrics, and writes evaluation.json to S3.

ConditionStep — this one is special: it runs inside the Pipeline orchestration engine itself (no EC2). It reads the evaluation.json from S3 and makes a decision based on the metric value. No compute provisioned.

Register step (a ModelStep wrapping model.register()): also runs in the orchestration engine. It creates a Model Package in the SageMaker Model Registry (a metadata entry pointing to your model artifact in S3, along with metrics, lineage, and approval status).

The orchestration engine is a managed service (think Step Functions for ML). It:

Stores the pipeline definition as a JSON DAG (directed acyclic graph)
Passes S3 paths between steps via step.properties references
Handles retries, parallelism, and step dependencies
Tracks lineage: which data + code + params produced which model
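The dependency ordering the engine derives from step.properties references can be illustrated with Python’s standard graphlib module. The step names match this lab; the dependency sets are written by hand here, whereas the SageMaker SDK infers them from the property references. Steps that land in the same wave have no dependency on each other and could run in parallel.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps whose outputs it references (via step.properties)
dependencies = {
    "PreprocessData": set(),
    "TrainModel": {"PreprocessData"},                    # train/validation S3 URIs
    "EvaluateModel": {"TrainModel", "PreprocessData"},   # model artifact + test split
    "CheckModelQuality": {"EvaluateModel"},              # reads evaluation.json via JsonGet
}

def execution_waves(deps: dict) -> list:
    """Group steps into waves; steps in the same wave have no mutual dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```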

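Step Caching is opt-in: you enable it per step by passing a CacheConfig (enable_caching=True, optionally with an expire_after duration such as "P30D"). SageMaker then compares the step’s arguments (S3 URIs, parameters, code location) with earlier successful runs; on a match, the step is skipped and the cached output S3 paths are reused. It compares arguments, not data: new files written to the same S3 URI still count as a cache hit. This means that if only your hyperparameters changed, the expensive preprocessing step (which can take hours on large data) is skipped entirely.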
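A toy cache shows why matching on arguments skips unchanged steps, and also why new data at an unchanged S3 URI still counts as a hit. `StepCache` is hypothetical and uses SHA-256 over the JSON-encoded arguments for illustration; SageMaker does not document its internal key format.

```python
import hashlib
import json
from typing import Any, Callable, Tuple

class StepCache:
    """Toy step cache keyed on a step's arguments, not on the data they point to."""

    def __init__(self):
        self._outputs = {}

    @staticmethod
    def key(step_name: str, arguments: dict) -> str:
        canonical = json.dumps({"step": step_name, "args": arguments}, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def run(self, step_name: str, arguments: dict, execute: Callable[[], Any]) -> Tuple[Any, bool]:
        """Return (output, cache_hit); `execute` runs the step only on a cache miss."""
        k = self.key(step_name, arguments)
        if k in self._outputs:
            return self._outputs[k], True
        output = execute()
        self._outputs[k] = output
        return output, False
```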

Model Registry is a metadata store (not a storage system). It stores:

Pointer to model.tar.gz in S3
Model metrics from evaluation
Approval status: PendingManualApproval, then Approved (deployable) or Rejected
Lineage: which pipeline execution, which data, which code produced this
It does NOT store the model itself — the model stays in S3

Step 1: Define Pipeline Parameters

Parameters make your pipeline reusable. Same pipeline, different inputs.

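from sagemaker import get_execution_role, image_uris
from sagemaker.workflow.parameters import ParameterInteger, ParameterString, ParameterFloat
from sagemaker.workflow.pipeline_context import PipelineSession

pipeline_session = PipelineSession()
role = get_execution_role()                  # inside Studio/notebooks; elsewhere pass an IAM role ARN
bucket = pipeline_session.default_bucket()
# Built-in XGBoost container, reused by the training and evaluation steps below
image_uri = image_uris.retrieve("xgboost", region=pipeline_session.boto_region_name, version="1.7-1")

processing_instance_count = ParameterInteger(name="ProcessingInstanceCount", default_value=1)
training_instance_type = ParameterString(name="TrainingInstanceType", default_value="ml.m5.xlarge")
model_approval_status = ParameterString(name="ModelApprovalStatus", default_value="PendingManualApproval")
mse_threshold = ParameterFloat(name="MseThreshold", default_value=6.0)
input_data = ParameterString(name="InputData", default_value=f"s3://{bucket}/data/input.csv")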

Step 2: Processing Step (Data Prep)

from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep

processor = SKLearnProcessor(
    framework_version="1.2-1", instance_type="ml.m5.xlarge",
    instance_count=processing_instance_count, role=role,
    sagemaker_session=pipeline_session,
)

processor_args = processor.run(
    inputs=[ProcessingInput(source=input_data, destination="/opt/ml/processing/input")],
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
    ],
    code="code/preprocessing.py",
)

step_process = ProcessingStep(name="PreprocessData", step_args=processor_args)

Step 3: Training Step

Notice how the training input comes from step_process.properties — SageMaker automatically passes S3 paths between steps.

from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep

xgb_train = Estimator(
    image_uri=image_uri, instance_type=training_instance_type,
    instance_count=1, output_path=f"s3://{bucket}/pipeline/output",
    role=role, sagemaker_session=pipeline_session,
)
xgb_train.set_hyperparameters(objective="reg:squarederror", num_round=50, max_depth=5, eta=0.2)  # reg:linear is deprecated

train_args = xgb_train.fit(inputs={
    "train": TrainingInput(
        s3_data=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
        content_type="text/csv",
    ),
    "validation": TrainingInput(
        s3_data=step_process.properties.ProcessingOutputConfig.Outputs["validation"].S3Output.S3Uri,
        content_type="text/csv",
    ),
})

step_train = TrainingStep(name="TrainModel", step_args=train_args)

Step 4: Evaluation Step

The evaluation script loads the trained model, predicts on the test set, and writes metrics to a JSON file that the Condition step reads.

from sagemaker.processing import ScriptProcessor
from sagemaker.workflow.properties import PropertyFile

eval_processor = ScriptProcessor(
    image_uri=image_uri, command=["python3"],
    instance_type="ml.m5.xlarge", instance_count=1,
    role=role, sagemaker_session=pipeline_session,
)

eval_args = eval_processor.run(
    inputs=[
        ProcessingInput(
            source=step_train.properties.ModelArtifacts.S3ModelArtifacts,
            destination="/opt/ml/processing/model",
        ),
        ProcessingInput(
            source=step_process.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
            destination="/opt/ml/processing/test",
        ),
    ],
    outputs=[ProcessingOutput(output_name="evaluation", source="/opt/ml/processing/evaluation")],
    code="code/evaluation.py",
)

# PropertyFile lets the Condition step READ values from the JSON
evaluation_report = PropertyFile(name="EvaluationReport", output_name="evaluation", path="evaluation.json")
step_eval = ProcessingStep(name="EvaluateModel", step_args=eval_args, property_files=[evaluation_report])

evaluation.py writes:

{"regression_metrics": {"mse": {"value": 4.2, "standard_deviation": 1.1}}}
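The metric-writing half of evaluation.py might look like the sketch below. It assumes predictions have already been produced (model loading is omitted), and it reads `standard_deviation` as the standard deviation of the residuals, which is an assumption about what that field means here. `mse_report` and `write_report` are hypothetical helper names.

```python
import json
import math
from pathlib import Path
from typing import Sequence

def mse_report(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Build the evaluation.json structure that the Condition step reads."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n
    mean_r = sum(residuals) / n
    std = math.sqrt(sum((r - mean_r) ** 2 for r in residuals) / n)   # std of residuals (assumption)
    return {"regression_metrics": {"mse": {"value": mse, "standard_deviation": std}}}

def write_report(report: dict, output_dir: str = "/opt/ml/processing/evaluation") -> Path:
    # The ProcessingOutput named "evaluation" uploads this directory to S3 when the job ends
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "evaluation.json"
    path.write_text(json.dumps(report))
    return path
```

In the full script you would call `write_report(mse_report(y_test, predictions))` after loading the model from /opt/ml/processing/model and predicting on the test split.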

Step 5: Condition + Register + Fail

The quality gate: only register the model if MSE is acceptable.

from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.functions import JsonGet, Join
from sagemaker.workflow.fail_step import FailStep
from sagemaker.workflow.model_step import ModelStep
from sagemaker.model import Model

# Read MSE from the evaluation JSON
cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step_name=step_eval.name,
        property_file=evaluation_report,
        json_path="regression_metrics.mse.value",
    ),
    right=mse_threshold,   # 6.0 by default
)

# IF good: register model
model = Model(image_uri=image_uri,
              model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
              sagemaker_session=pipeline_session, role=role)

register_args = model.register(
    content_types=["text/csv"], response_types=["text/csv"],
    inference_instances=["ml.m5.large"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name="FraudModelGroup",
    approval_status=model_approval_status,
)
step_register = ModelStep(name="RegisterModel", step_args=register_args)

# IF bad: fail the pipeline
step_fail = FailStep(
    name="FailPipeline",
    error_message=Join(on=" ", values=["MSE", "exceeded threshold:", mse_threshold]),
)

# Wire it together
step_cond = ConditionStep(
    name="CheckModelQuality",
    conditions=[cond_lte],
    if_steps=[step_register],      # MSE <= 6.0 → register
    else_steps=[step_fail],        # MSE > 6.0 → fail
)
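To see what the quality gate evaluates at run time, here is a local emulation. JsonGet accepts full JsonPath expressions; the hypothetical `json_get` helper below handles only dotted keys, which is enough for this lab’s evaluation.json. The returned names match the steps wired into the ConditionStep.

```python
import json
from functools import reduce

def json_get(document: dict, json_path: str):
    """Follow a dotted path such as "regression_metrics.mse.value" (dotted keys only)."""
    return reduce(lambda node, key: node[key], json_path.split("."), document)

def quality_gate(evaluation_json: str, threshold: float) -> str:
    """Mirror the ConditionStep: MSE <= threshold goes to RegisterModel, otherwise FailPipeline."""
    mse = json_get(json.loads(evaluation_json), "regression_metrics.mse.value")
    return "RegisterModel" if mse <= threshold else "FailPipeline"
```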

Step 6: Create and Run the Pipeline

from sagemaker.workflow.pipeline import Pipeline

pipeline = Pipeline(
    name="MLOpsPipeline",
    parameters=[processing_instance_count, training_instance_type,
                model_approval_status, input_data, mse_threshold],
    steps=[step_process, step_train, step_eval, step_cond],
)

pipeline.upsert(role_arn=role)
execution = pipeline.start()
execution.wait()

for step in execution.list_steps():
    print(f"  {step['StepName']}: {step['StepStatus']}")

Step Caching — Avoid Reprocessing

If you re-run the pipeline with only hyperparameters changed, the preprocessing step’s arguments are identical to last time. If that step has caching enabled (CacheConfig), SageMaker skips it entirely and reuses the previous output.

Run 1: Preprocess(data_v1) → Train(hp_v1) → Eval → Register     ALL EXECUTE
Run 2: Preprocess(data_v1) → CACHED! SKIP → Train(hp_v2) → Eval  SAVES HOURS

Trigger Pipeline Automatically

When new training data lands in S3, automatically kick off the pipeline (the bucket must have Amazon EventBridge notifications enabled):

S3 PutObject event  →  EventBridge rule  →  Lambda  →  pipeline.start()
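A Lambda function for the last hop could look like this sketch. It reads the bucket and key from the EventBridge “Object Created” event and calls the boto3 SageMaker client’s start_pipeline_execution, overriding the InputData parameter from Step 1. The optional `sm_client` argument is a hypothetical hook added so the handler can be tested with a fake client; the function’s role needs the sagemaker:StartPipelineExecution permission.

```python
def lambda_handler(event, context, sm_client=None):
    """Start MLOpsPipeline for the S3 object described by an EventBridge 'Object Created' event."""
    if sm_client is None:
        import boto3                      # available in the AWS Lambda Python runtime
        sm_client = boto3.client("sagemaker")
    bucket = event["detail"]["bucket"]["name"]
    key = event["detail"]["object"]["key"]
    response = sm_client.start_pipeline_execution(
        PipelineName="MLOpsPipeline",
        PipelineParameters=[{"Name": "InputData", "Value": f"s3://{bucket}/{key}"}],
    )
    return {"PipelineExecutionArn": response["PipelineExecutionArn"]}
```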

Domain 3 Lab Summary

Lab	Service	You Learned
1	Endpoint Types	Real-time vs Serverless vs Async vs Batch — when to use each
2	Production Variants	A/B testing, traffic splitting, gradual rollout, rollback
3	Auto-Scaling	Target tracking, InvocationsPerInstance, cooldown periods
4	SageMaker Pipelines	Complete MLOps: preprocess → train → evaluate → condition → register
