Claude with Amazon Bedrock - Certification Study Guide

46 min read 9614 words

Table of Contents

Claude with Amazon Bedrock - Certification Study Guide

Claude with Amazon Bedrock - Certification Study Guide

Course: Anthropic - Claude with Amazon Bedrock Modules: 11 (Introduction through Course Wrap-Up) Lessons: 40+ individual lessons Target: Certification Prep

MODULE 1: Course Introduction

Lesson Coverage

Introduction to the course structure and goals
Claude Models available on Amazon Bedrock

Key Notes

Course Scope:

Using Claude via AWS Bedrock — same models, different auth/SDK layer
All exercises use boto3, NOT the anthropic Python SDK
AWS handles billing, compliance (SOC2, HIPAA-eligible), and data residency

Claude Models Available on Bedrock:

Model ID	Version	Use Case
`anthropic.claude-3-5-sonnet-20241022-v2:0`	Sonnet 3.5 v2	Best balance of speed and intelligence
`anthropic.claude-3-5-haiku-20241022-v1:0`	Haiku 3.5	Fast and cheap, high volume tasks
`anthropic.claude-3-haiku-20240307-v1:0`	Haiku 3	Older Haiku, still available
`anthropic.claude-3-opus-20240229-v1:0`	Opus 3	Most capable, slower

Cross-Region Inference Profile IDs (prefix us. or eu.):

us.anthropic.claude-3-5-sonnet-20241022-v2:0
us.anthropic.claude-3-5-haiku-20241022-v1:0
eu.anthropic.claude-3-5-sonnet-20241022-v2:0

Bedrock vs Direct API — At a Glance:

Direct Anthropic API          Amazon Bedrock
─────────────────────         ──────────────────────
Auth: API Key                 Auth: IAM roles/credentials
SDK:  anthropic               SDK:  boto3
URL:  api.anthropic.com       URL:  bedrock-runtime.<region>.amazonaws.com
Bill: Anthropic invoice       Bill: AWS invoice

Prerequisites for the Course:

Python 3.9+
AWS account with Bedrock enabled
IAM user or role with Bedrock permissions
Model access requested in Bedrock console (not automatic — must be manually enabled)

Enabling Model Access:

Open AWS Console → Amazon Bedrock → Model access
Click “Manage model access”
Check Claude models → Submit request
Wait for “Access granted” status (usually instant for Claude)

Required IAM Permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream",
        "bedrock:CreateModelInvocationJob",
        "bedrock:GetModelInvocationJob",
        "bedrock:ListFoundationModels"
      ],
      "Resource": "*"
    }
  ]
}

Best Practices

Enable model access in the Bedrock console before running any code (access is not automatic)
Use IAM roles (not static credentials) in production — attach AmazonBedrockFullAccess or scoped policy
Choose region where Claude models are available: us-east-1, us-west-2, eu-west-3, ap-southeast-1
Pin model IDs including version suffix to prevent unexpected upgrades
Never hardcode AWS credentials — use environment variables or IAM instance profiles

MODULE 2: Working with the API

Lesson Coverage

API access setup (boto3, credentials, regions)
Creating your first request (request body structure)
Multi-turn conversations (stateless history management)
Chatbot exercise (building a REPL)
System prompts (configuring model behavior)
Temperature settings (controlling randomness)
Streaming responses (real-time output)
Output control (stop sequences, max_tokens, top_p, top_k)
Structured data extraction (JSON output)

Key Notes

boto3 Setup and Authentication:

import boto3
import json

# Option 1: Explicit credentials (dev only — never in production code)
client = boto3.client(
    'bedrock-runtime',
    region_name='us-east-1',
    aws_access_key_id='AKIA...',
    aws_secret_access_key='...'
)

# Option 2: Environment variables (recommended for dev)
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION
client = boto3.client('bedrock-runtime', region_name='us-east-1')

# Option 3: IAM role (recommended for production / EC2 / Lambda)
# boto3 automatically picks up role credentials
client = boto3.client('bedrock-runtime', region_name='us-east-1')

# Option 4: Named profile from ~/.aws/credentials
session = boto3.Session(profile_name='my-bedrock-profile')
client = session.client('bedrock-runtime', region_name='us-east-1')

Core Bedrock-Runtime Methods:

Method	Behavior	Use Case
`invoke_model()`	Synchronous — waits for full response	Standard requests
`invoke_model_with_response_stream()`	Streaming — returns chunks progressively	Long responses, UX
`create_model_invocation_job()`	Async batch — S3 input/output	Large-scale batch
`get_model_invocation_job()`	Check batch job status	Polling batch jobs

Request Body Format (CRITICAL):

The request body is NOT keyword arguments.
It's a JSON string encoded as bytes, passed as the 'body' parameter.
The modelId is a SEPARATE parameter from the body.

Required structure:

body = {
    "anthropic_version": "bedrock-2023-05-31",  # REQUIRED — do not omit
    "max_tokens": 1024,                           # REQUIRED
    "messages": [                                 # REQUIRED
        {"role": "user", "content": "Hello"}
    ],
    # Optional fields:
    "system": "You are a helpful assistant.",
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 250,
    "stop_sequences": ["###"]
}

response = client.invoke_model(
    modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
    contentType='application/json',
    accept='application/json',
    body=json.dumps(body)   # Must be bytes or string
)

Response Structure:

response = client.invoke_model(...)

# response is a dict with HTTP metadata + 'body' StreamingBody
raw = response['body'].read()          # Read all bytes
result = json.loads(raw)               # Parse JSON

# result structure:
# {
#   "id": "msg_01...",
#   "type": "message",
#   "role": "assistant",
#   "content": [{"type": "text", "text": "..."}],
#   "model": "claude-3-5-sonnet-20241022",
#   "stop_reason": "end_turn",
#   "stop_sequence": null,
#   "usage": {"input_tokens": 25, "output_tokens": 142}
# }

text = result['content'][0]['text']
input_tokens = result['usage']['input_tokens']
output_tokens = result['usage']['output_tokens']

Lesson 2: Basic Request Example

import boto3
import json

client = boto3.client('bedrock-runtime', region_name='us-east-1')
MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'

def invoke_claude(prompt, system=None, max_tokens=1024, temperature=1.0):
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}]
    }
    if system:
        body["system"] = system
    if temperature != 1.0:
        body["temperature"] = temperature

    response = client.invoke_model(
        modelId=MODEL_ID,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )

    result = json.loads(response['body'].read())
    return result['content'][0]['text'], result['usage']

text, usage = invoke_claude("What is the capital of France?")
print(text)
print(f"Tokens used: {usage['input_tokens']} in, {usage['output_tokens']} out")

Lesson 3 & 4: Multi-Turn Conversations and Chatbot

import boto3, json

client = boto3.client('bedrock-runtime', region_name='us-east-1')
MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'

class BedrockChatbot:
    def __init__(self, system_prompt=None):
        self.system = system_prompt
        self.history = []     # List of {"role": ..., "content": ...}
        self.total_tokens = {"input": 0, "output": 0}

    def chat(self, user_message, max_tokens=2048):
        self.history.append({"role": "user", "content": user_message})

        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": self.history
        }
        if self.system:
            body["system"] = self.system

        response = client.invoke_model(
            modelId=MODEL_ID,
            contentType='application/json',
            accept='application/json',
            body=json.dumps(body)
        )

        result = json.loads(response['body'].read())
        assistant_text = result['content'][0]['text']

        self.history.append({"role": "assistant", "content": assistant_text})
        self.total_tokens["input"] += result['usage']['input_tokens']
        self.total_tokens["output"] += result['usage']['output_tokens']

        return assistant_text

    def reset(self):
        self.history = []

# REPL chatbot
bot = BedrockChatbot(system_prompt="You are an expert Python tutor.")
print("Chatbot ready. Type 'quit' to exit, 'reset' to clear history.")
while True:
    user_input = input("You: ").strip()
    if user_input.lower() == 'quit':
        break
    if user_input.lower() == 'reset':
        bot.reset()
        print("History cleared.")
        continue
    response = bot.chat(user_input)
    print(f"Claude: {response}\n")
print(f"Total tokens: {bot.total_tokens}")

Lesson 5: System Prompts

# System prompts configure the model's persona, constraints, and behavior
# They are separate from the messages array — top-level "system" key

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "system": """You are a senior AWS solutions architect.
Always recommend AWS-native services where possible.
Format responses with clear headings and bullet points.
If asked about pricing, always say to check the AWS Pricing Calculator.""",
    "messages": [
        {"role": "user", "content": "How should I deploy a Python API?"}
    ]
}

Lesson 6: Temperature Settings

# temperature: 0.0 to 1.0 (some models accept up to 2.0 via direct API, Bedrock caps at 1.0)
# 0.0 = deterministic, always same output (good for structured tasks)
# 1.0 = maximum diversity (good for creative tasks)

# Deterministic extraction
body_extract = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "temperature": 0.0,    # Fully deterministic
    "messages": [{"role": "user", "content": "Extract the date from: Meeting on Jan 15 2024"}]
}

# Creative writing
body_creative = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "temperature": 1.0,    # Full creativity
    "messages": [{"role": "user", "content": "Write a haiku about cloud computing."}]
}

# top_p: nucleus sampling — consider top P% of probability mass (alternative to temperature)
# top_k: only consider top K tokens at each step
# Do not set both temperature and top_p — choose one

Lesson 7: Streaming Responses

import boto3, json

client = boto3.client('bedrock-runtime', region_name='us-east-1')
MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'

def stream_response(prompt, system=None, max_tokens=2048):
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}]
    }
    if system:
        body["system"] = system

    response = client.invoke_model_with_response_stream(
        modelId=MODEL_ID,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )

    full_text = ""
    input_tokens = 0
    output_tokens = 0

    for event in response['body']:
        chunk_bytes = event['chunk']['bytes']
        chunk = json.loads(chunk_bytes.decode('utf-8'))

        event_type = chunk.get('type', '')

        if event_type == 'content_block_delta':
            # Text delta — print as it arrives
            delta_text = chunk['delta'].get('text', '')
            print(delta_text, end='', flush=True)
            full_text += delta_text

        elif event_type == 'message_delta':
            # Final stats
            output_tokens = chunk.get('usage', {}).get('output_tokens', 0)

        elif event_type == 'message_start':
            # Initial metadata including input token count
            input_tokens = chunk.get('message', {}).get('usage', {}).get('input_tokens', 0)

        elif event_type == 'message_stop':
            # Stream complete
            pass

    print()  # newline after stream
    return full_text, {"input_tokens": input_tokens, "output_tokens": output_tokens}

# Usage
text, usage = stream_response("Explain quantum entanglement in simple terms.")
print(f"\nUsage: {usage}")

Streaming Event Types Reference:

message_start       — first event, contains model + initial usage
content_block_start — begins a content block (text or tool_use)
content_block_delta — text or JSON delta within a block
content_block_stop  — ends a content block
message_delta       — stop_reason, stop_sequence, final output usage
message_stop        — stream complete (last event)

Lesson 8: Output Control

# max_tokens: hard cap on output length (required field)
# stop_sequences: list of strings that end generation when encountered
# top_p: cumulative probability for nucleus sampling
# top_k: limit token selection to top K candidates

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 2048,
    "stop_sequences": ["</answer>", "Human:", "###"],
    "top_p": 0.95,
    "messages": [
        {"role": "user", "content": "Answer this: What is 2+2? Wrap in <answer> tags."}
    ]
}

# The model stops when it would generate a stop sequence
# result['stop_reason'] will be "stop_sequence" if that triggered the stop
# result['stop_sequence'] gives which sequence triggered it

Lesson 9: Structured Data Extraction

import json

def extract_structured_data(text, schema_description):
    """Extract structured data from unstructured text using Claude."""
    prompt = f"""Extract the following information from the text and return as JSON.

Schema: {schema_description}

Text:
{text}

Return ONLY valid JSON, no explanation."""

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "temperature": 0.0,   # Deterministic for extraction
        "messages": [{"role": "user", "content": prompt}]
    }

    response = client.invoke_model(
        modelId=MODEL_ID,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )

    result = json.loads(response['body'].read())
    raw_text = result['content'][0]['text'].strip()

    # Strip markdown code fences if present
    if raw_text.startswith('```'):
        raw_text = raw_text.split('\n', 1)[1].rsplit('```', 1)[0].strip()

    return json.loads(raw_text)

# Example usage
invoice_text = """
Invoice #1234 from Acme Corp, dated March 15 2024.
Total amount: $2,450.00. Payment due April 15 2024.
Contact: billing@acme.com
"""

schema = '{"invoice_number": "string", "vendor": "string", "amount": "number", "due_date": "string"}'
data = extract_structured_data(invoice_text, schema)
print(data)
# {'invoice_number': '1234', 'vendor': 'Acme Corp', 'amount': 2450.00, 'due_date': '2024-04-15'}

Cross-Region Inference:

# Standard model ID — fixed to one region
MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'

# Cross-region inference profile — routes across AZs within geography
# US profiles:
CROSS_REGION_US = 'us.anthropic.claude-3-5-sonnet-20241022-v2:0'

# EU profiles:
CROSS_REGION_EU = 'eu.anthropic.claude-3-5-sonnet-20241022-v2:0'

# When using cross-region, the client region still determines the entry point
client = boto3.client('bedrock-runtime', region_name='us-east-1')
# Then use CROSS_REGION_US as modelId — Bedrock routes intelligently

Error Handling:

import boto3
import json
from botocore.exceptions import ClientError
import time

def invoke_with_retry(client, model_id, body, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.invoke_model(
                modelId=model_id,
                contentType='application/json',
                accept='application/json',
                body=json.dumps(body)
            )
            return json.loads(response['body'].read())

        except ClientError as e:
            error_code = e.response['Error']['Code']

            if error_code == 'ThrottlingException':
                wait = (2 ** attempt) + 1  # Exponential backoff
                print(f"Throttled. Waiting {wait}s before retry {attempt+1}/{max_retries}")
                time.sleep(wait)
                continue

            elif error_code == 'ValidationException':
                raise ValueError(f"Invalid request: {e.response['Error']['Message']}")

            elif error_code == 'AccessDeniedException':
                raise PermissionError("Check IAM permissions and model access in Bedrock console")

            elif error_code == 'ModelNotReadyException':
                raise RuntimeError("Model not available — check Bedrock console model access status")

            else:
                raise

    raise RuntimeError(f"Max retries ({max_retries}) exceeded")

Best Practices

Always include "anthropic_version": "bedrock-2023-05-31" in request body (required by Bedrock)
Always call response['body'].read() before parsing — StreamingBody can only be read once
Handle ThrottlingException with exponential backoff — Bedrock has per-account TPS limits
Stream for responses >500 tokens to improve perceived latency
Use cross-region inference profiles in production for higher availability
Temperature 0.0 for extraction/classification, 0.7-1.0 for generation
Set max_tokens conservatively for cost control — it’s a cap, not a target

MODULE 3: Prompt Evaluations

Lesson Coverage

Why evaluations matter (quality assurance for prompts)
Dataset generation strategies
Running evaluations at scale on Bedrock
Grading approaches (exact match, model-based, semantic)
Exercises (build an eval harness)
Assessment review

Key Notes

Evaluation Workflow:

1. Define task clearly with success criteria
2. Create labeled test dataset (golden set)
3. Run model against all examples
4. Apply grading function to each output
5. Compute aggregate metrics
6. Identify failure patterns
7. Iterate on prompt
8. Re-evaluate (never on same examples used to develop prompt)

Why Evals Matter on Bedrock:

Model updates happen (new model versions, access to newer Sonnet/Haiku)
Prompt regressions are common when changing system prompt
Cost/quality tradeoffs — Haiku vs. Sonnet — need data to decide
CI/CD integration: run evals on every prompt change before deployment

Test Dataset Design:

Minimum 20 examples before drawing conclusions; 50+ for reliable metrics
Cover: happy path, edge cases, adversarial inputs, language variants
Balance classes for classification tasks (equal positive/negative)
Never overlap training examples with evaluation examples
Label expected outputs precisely — ambiguous labels produce noisy metrics

Dataset Generation with Claude:

def generate_eval_dataset(task_description, num_examples=20):
    """Use Claude to generate diverse test cases."""
    prompt = f"""Generate {num_examples} diverse test cases for this task: {task_description}

For each test case, provide:
1. An input
2. The expected output
3. The difficulty level (easy/medium/hard)

Format as JSON array: [{{"input": "...", "expected": "...", "difficulty": "..."}}]
Return ONLY the JSON array."""

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 4096,
        "temperature": 1.0,   # Diversity for generation
        "messages": [{"role": "user", "content": prompt}]
    }
    response = client.invoke_model(
        modelId=MODEL_ID,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    result = json.loads(response['body'].read())
    return json.loads(result['content'][0]['text'])

Grading Methods:

Method	When to Use	Code Pattern
Exact match	Controlled outputs (YES/NO, category names)	`output.strip() == expected`
Regex	Format validation (dates, emails, IDs)	`bool(re.match(pattern, output))`
JSON schema	Structured data extraction	`jsonschema.validate(parsed, schema)`
Substring	Keyword presence	`expected.lower() in output.lower()`
Semantic similarity	Paraphrase equivalence	cosine_sim(embed(output), embed(expected))
Model-based	Open-ended quality	Ask Claude to score 1-5

Model-Based Grader:

GRADER_MODEL = 'anthropic.claude-3-haiku-20240307-v1:0'  # Cheap for grading

def model_grade(output, expected, criterion, scale=5):
    """Use Claude to grade an output. Returns score 1-scale."""
    grading_prompt = f"""You are an impartial grader. Evaluate the output against the criterion.

Criterion: {criterion}
Expected: {expected}
Actual Output: {output}

Score the output from 1 to {scale}:
1 = Completely wrong
{scale} = Perfect

Return ONLY a single integer score."""

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 10,
        "temperature": 0.0,
        "messages": [{"role": "user", "content": grading_prompt}]
    }
    response = client.invoke_model(
        modelId=GRADER_MODEL,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    result = json.loads(response['body'].read())
    try:
        return int(result['content'][0]['text'].strip())
    except ValueError:
        return 0   # Failed to parse score

Running Evaluations at Scale:

import concurrent.futures
import json
import time
from datetime import datetime

def run_evaluation(dataset, prompt_template, model_id, grader_fn, max_workers=5):
    """Run full evaluation suite with parallel grading."""

    def evaluate_single(example):
        # Format prompt with template
        prompt = prompt_template.format(input=example['input'])
        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "temperature": 0.0,
            "messages": [{"role": "user", "content": prompt}]
        }
        try:
            response = client.invoke_model(
                modelId=model_id,
                contentType='application/json',
                accept='application/json',
                body=json.dumps(body)
            )
            result = json.loads(response['body'].read())
            output = result['content'][0]['text'].strip()
            score = grader_fn(output, example['expected'])
            return {
                "input": example['input'],
                "expected": example['expected'],
                "output": output,
                "score": score,
                "tokens": result['usage']
            }
        except Exception as e:
            return {"input": example['input'], "error": str(e), "score": 0}

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(evaluate_single, dataset))

    # Compute metrics
    scores = [r['score'] for r in results if 'score' in r]
    total_tokens = sum(
        r.get('tokens', {}).get('input_tokens', 0) + r.get('tokens', {}).get('output_tokens', 0)
        for r in results
    )

    summary = {
        "timestamp": datetime.now().isoformat(),
        "model_id": model_id,
        "n_examples": len(dataset),
        "accuracy": sum(1 for s in scores if s >= 1) / len(scores),
        "avg_score": sum(scores) / len(scores) if scores else 0,
        "total_tokens": total_tokens,
        "failures": [r for r in results if 'error' in r]
    }

    return results, summary

Key Metrics to Track:

Accuracy: Percentage of examples meeting passing threshold
Precision / Recall / F1: For classification tasks
Average score: When using 1-5 scale grading
p95 latency: 95th percentile response time
Cost per correct answer: total_tokens × price_per_token / n_correct
Regression rate: % of examples that passed before but fail after prompt change

Best Practices

Never evaluate on examples used during prompt development (train/test split)
Use cheap models (Haiku) for grading to reduce evaluation cost
Log all evaluation runs: timestamp, model ID, prompt version, all scores
Compare prompt versions side-by-side on identical test sets
Run evals in CI/CD: gate deployments on minimum accuracy threshold
Use parallel execution (ThreadPoolExecutor) to speed up large evals
Validate your grader itself: manually check 10% of grader judgments

MODULE 4: Prompt Engineering

Lesson Coverage

Direct instructions (clarity, specificity)
Specificity and constraints (format, length, style)
XML structure for complex prompts
Few-shot examples
Exercises
Quiz and assessment

Key Notes

Core Principles:

Vague:   "Summarize this article."
Better:  "Summarize this article in 3 bullet points. Each point max 20 words."
Best:    "Summarize this article in exactly 3 bullet points. Use active voice.
          Each bullet must start with a strong verb. Max 20 words each."

Clarity Checklist:

Specify output format (bullet list, JSON, prose paragraph, numbered steps)
Specify output length (characters, words, sentences, tokens)
Specify constraints (tone, reading level, language, persona)
Use affirmative instructions (“always do X”) over negative (“don’t do Y”)
Put the most important instruction first or last (primacy and recency effects)

XML Tags for Complex Prompts:

<instructions>
Classify the customer review as POSITIVE, NEGATIVE, or NEUTRAL.
Rules:
1. If the review mentions both pros and cons equally, use NEUTRAL
2. Ignore overall star rating — focus only on the text sentiment
3. Respond with ONLY the classification word, nothing else
</instructions>

<examples>
<example>
<input>Amazing product! Shipped fast and works perfectly.</input>
<output>POSITIVE</output>
</example>
<example>
<input>The quality is decent but the price is way too high for what you get.</input>
<output>NEUTRAL</output>
</example>
<example>
<input>Terrible. Broke after 2 days. Don't buy this.</input>
<output>NEGATIVE</output>
</example>
</examples>

<review>
{customer_review}
</review>

Benefits of XML structure:

Clear separation of instructions, examples, and variable content
Easier to update individual sections without breaking others
Claude is trained to pay attention to XML tag boundaries
Prevents variable content from contaminating instructions

Few-Shot Examples:

# Few-shot improves consistency for pattern-following tasks
# Include 2-5 examples that exactly match desired output format

few_shot_prompt = """Convert natural language to SQL.

Input: Get all users who signed up after January 2024
SQL: SELECT * FROM users WHERE created_at > '2024-01-01';

Input: Find the top 10 products by revenue
SQL: SELECT product_id, SUM(revenue) as total FROM orders GROUP BY product_id ORDER BY total DESC LIMIT 10;

Input: {user_query}
SQL:"""

Chain of Thought:

# Standard CoT: ask model to think before answering
cot_prompt = """Solve this step by step, then give the final answer.

Problem: {problem}

Show your reasoning in <thinking> tags, then the answer in <answer> tags."""

# Zero-shot CoT trigger phrases:
# - "Let's think step by step."
# - "Work through this carefully."
# - "Think before answering."

# Structured CoT with stop sequence
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 2048,
    "stop_sequences": ["</answer>"],   # Stop after getting the answer
    "messages": [{"role": "user", "content": cot_prompt.format(problem="...")}]
}

Role Prompting:

system_roles = {
    "code_reviewer": "You are a senior software engineer with 15 years of experience. Review code for bugs, security issues, and style. Be direct and specific.",
    "legal_analyst": "You are a paralegal assistant. Identify legal issues in contracts. Always note that this is not legal advice and recommend consulting a lawyer.",
    "data_scientist": "You are a senior data scientist. Analyze data, suggest statistical approaches, and write production-quality Python code using pandas, numpy, and scikit-learn.",
}

Output Format Techniques:

# Force JSON output
json_prompt = """Extract entity information from the text.
Return a JSON object with these keys: name, type, description.
Return ONLY the JSON object, no other text.

Text: {text}"""

# Force specific structure with prefill
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [
        {"role": "user", "content": json_prompt},
        {"role": "assistant", "content": "{"}  # Prefill forces JSON start
    ]
}

Temperature Guide:

Task Type	Temperature	Why
Data extraction	0.0	Must be deterministic
Classification	0.0	Consistent categories
Code generation	0.0–0.3	Correct syntax matters
Summarization	0.3–0.5	Some variation OK
Q&A	0.3–0.7	Factual but not robotic
Creative writing	0.7–1.0	Diversity is valuable
Brainstorming	0.9–1.0	Maximum variety

Best Practices

Start simple, add complexity only when needed (KISS principle)
Test prompts on edge cases: empty input, Unicode, very long text, adversarial inputs
Separate dynamic content from static instructions using XML tags
Few-shot examples must exactly match the output format you want (including whitespace)
Use temperature=0 and store the prompt in version control — treat prompts as code
When using prefill (pre-filling assistant response), verify it doesn’t hurt quality
Stop sequences are powerful — use </answer> to extract just the needed section

MODULE 5: Tool Use

Lesson Coverage

Introduction to tool use (function calling)
Defining functions as tools
JSON Schema for tool parameters
Handling tool_use responses
Multi-turn tool conversations (the loop)
Multiple tools and parallel calls
Batch tool operations
Structured extraction via tools

Key Notes

Tool Use Message Flow:

┌─────────────────────────────────────────────────────────┐
│  1. Client sends: messages + tools list                  │
│  2. Claude responds: stop_reason="tool_use"              │
│     content: [text_block?, tool_use_block, ...]          │
│  3. Client executes the tool                             │
│  4. Client sends: original messages + assistant response │
│                   + user message with tool_result        │
│  5. Claude responds: stop_reason="end_turn"              │
│     content: [text_block with final answer]              │
└─────────────────────────────────────────────────────────┘

Tool Schema Definition (JSON Schema):

tools = [
    {
        "name": "get_weather",                    # Snake_case, clear verb+noun
        "description": """Get current weather for a city.
Returns temperature in specified unit and weather condition.
Use this when the user asks about current weather in any location.""",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'New York' or 'London'"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit. Default to celsius."
                }
            },
            "required": ["city"]   # unit is optional
        }
    },
    {
        "name": "search_database",
        "description": "Search the product database by keyword or SKU",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search term"},
                "limit": {
                    "type": "integer",
                    "description": "Max results to return",
                    "minimum": 1,
                    "maximum": 50,
                    "default": 10
                },
                "category": {
                    "type": "string",
                    "enum": ["electronics", "clothing", "books", "all"],
                    "default": "all"
                }
            },
            "required": ["query"]
        }
    }
]

Detecting and Handling tool_use Responses:

def parse_tool_calls(response):
    """Extract tool calls from Claude's response content."""
    tool_calls = []
    text_blocks = []

    for block in response['content']:
        if block['type'] == 'tool_use':
            tool_calls.append({
                'id': block['id'],          # Must be preserved for tool_result
                'name': block['name'],
                'input': block['input']     # Dict matching tool's input_schema
            })
        elif block['type'] == 'text':
            text_blocks.append(block['text'])

    return tool_calls, text_blocks

Building Tool Results Message:

import json

def build_tool_results_message(tool_results_map):
    """Build the user message containing all tool results.
    tool_results_map: {tool_use_id: result_data}
    """
    content = []
    for tool_use_id, result in tool_results_map.items():
        content.append({
            "type": "tool_result",
            "tool_use_id": tool_use_id,         # Must match the tool_use block's id
            "content": json.dumps(result)        # String or list of content blocks
        })
    return {"role": "user", "content": content}

# Error in tool execution
def build_tool_error(tool_use_id, error_message):
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": f"Error: {error_message}",
        "is_error": True    # Signals to Claude that the tool failed
    }
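Tool exceptions should come back to Claude as `is_error` results rather than crash the loop. The sketch below wraps any dispatcher so that every `tool_use` block yields exactly one `tool_result`. `execute_tool_safely` and the `divide` tool are hypothetical names used only for illustration.

```python
import json

def execute_tool_safely(tool_use_block: dict, dispatch) -> dict:
    """Run one tool_use block and always return a tool_result block.

    Exceptions raised by the tool are reported back with is_error=True
    instead of propagating out of the agent loop.
    """
    try:
        result = dispatch(tool_use_block["name"], tool_use_block["input"])
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_block["id"],
            "content": json.dumps(result),
        }
    except Exception as exc:  # report any tool failure back to Claude
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_block["id"],
            "content": f"Error: {exc}",
            "is_error": True,
        }

def dispatch(name, tool_input):
    # Hypothetical dispatcher used only for this example
    if name == "divide":
        return {"result": tool_input["a"] / tool_input["b"]}
    raise ValueError(f"Unknown tool: {name}")

ok = execute_tool_safely({"id": "toolu_1", "name": "divide", "input": {"a": 6, "b": 3}}, dispatch)
bad = execute_tool_safely({"id": "toolu_2", "name": "divide", "input": {"a": 1, "b": 0}}, dispatch)
print(ok["content"])   # {"result": 2.0}
print(bad["content"])  # Error: division by zero
```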

Complete Tool Loop (Lesson 5):

import boto3, json

client = boto3.client('bedrock-runtime', region_name='us-east-1')
MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'

# Define available tools
tools = [
    {
        "name": "get_stock_price",
        "description": "Get the current stock price for a ticker symbol",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string", "description": "Stock ticker symbol, e.g. AAPL"}
            },
            "required": ["ticker"]
        }
    },
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression to evaluate"}
            },
            "required": ["expression"]
        }
    }
]

# Mock tool implementations
def get_stock_price(ticker):
    prices = {"AAPL": 182.50, "GOOGL": 141.80, "AMZN": 175.30}
    price = prices.get(ticker.upper())
    if price is None:
        return {"error": f"Unknown ticker: {ticker}"}
    return {"ticker": ticker, "price": price, "currency": "USD"}

def calculate(expression):
    try:
        # Use eval carefully — in production, use a safe math parser
        result = eval(expression, {"__builtins__": {}}, {"abs": abs, "round": round})
        return {"result": result}
    except Exception as e:
        return {"error": str(e)}

def dispatch_tool(name, tool_input):
    """Route tool call to implementation."""
    if name == "get_stock_price":
        return get_stock_price(**tool_input)
    elif name == "calculate":
        return calculate(**tool_input)
    else:
        return {"error": f"Unknown tool: {name}"}

def run_agent(user_message, max_iterations=10):
    """Run the tool use loop until end_turn."""
    messages = [{"role": "user", "content": user_message}]
    iteration = 0

    while iteration < max_iterations:
        iteration += 1

        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 4096,
            "tools": tools,
            "messages": messages
        }

        response = json.loads(client.invoke_model(
            modelId=MODEL_ID,
            contentType='application/json',
            accept='application/json',
            body=json.dumps(body)
        )['body'].read())

        stop_reason = response['stop_reason']

        # Always add assistant response to history
        messages.append({"role": "assistant", "content": response['content']})

        if stop_reason == 'end_turn':
            # Join all text blocks (the final answer may span more than one)
            return "".join(b['text'] for b in response['content'] if b['type'] == 'text')

        elif stop_reason == 'tool_use':
            # Execute all tool calls
            tool_results = []
            for block in response['content']:
                if block['type'] == 'tool_use':
                    print(f"  Calling tool: {block['name']}({block['input']})")
                    result = dispatch_tool(block['name'], block['input'])
                    print(f"  Result: {result}")
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block['id'],
                        "content": json.dumps(result)
                    })

            # Add tool results as user message
            messages.append({"role": "user", "content": tool_results})

        else:
            print(f"Unexpected stop_reason: {stop_reason}")
            break

    return "Max iterations reached"

# Test
result = run_agent("What is AAPL's current stock price? And what's 15% of that price?")
print(result)

tool_choice Parameter:

# Auto (default): Claude decides when to call tools
body["tool_choice"] = {"type": "auto"}

# Any: Claude must call at least one tool
body["tool_choice"] = {"type": "any"}

# Force specific tool
body["tool_choice"] = {"type": "tool", "name": "get_weather"}

# Disable tools entirely (even if defined)
body["tool_choice"] = {"type": "none"}

Parallel Tool Calls (Multiple Tools, Lesson 6):

# Claude may return MULTIPLE tool_use blocks in one response
# This means Claude wants to call multiple tools in parallel
# You must execute ALL of them and return ALL results before continuing

import concurrent.futures
import json

def handle_parallel_tools(response_content, dispatch_fn):
    """Handle multiple parallel tool calls; results keep the original tool_use order."""
    tool_results = []

    # Collect all tool_use blocks
    tool_calls = [b for b in response_content if b['type'] == 'tool_use']

    if not tool_calls:
        return None

    # Execute all in parallel threads
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = {
            executor.submit(dispatch_fn, tc['name'], tc['input']): tc['id']
            for tc in tool_calls
        }
        for future, tool_use_id in futures.items():
            result = future.result()
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": json.dumps(result)
            })

    return {"role": "user", "content": tool_results}
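Every `tool_use` block in an assistant turn must be answered by a `tool_result` with the same id in the next user message. A pre-flight check can catch mismatches before the request fails. The sketch below also enforces Anthropic's documented rule that `tool_result` blocks come before any other content in that user message. `check_tool_results` is a hypothetical helper.

```python
def check_tool_results(assistant_content: list, user_content: list) -> None:
    """Raise ValueError if the tool_result message does not answer every tool_use."""
    requested = [b["id"] for b in assistant_content if b["type"] == "tool_use"]
    answered = [b["tool_use_id"] for b in user_content if b["type"] == "tool_result"]
    missing = set(requested) - set(answered)
    extra = set(answered) - set(requested)
    if missing:
        raise ValueError(f"missing tool_result for: {sorted(missing)}")
    if extra:
        raise ValueError(f"tool_result for unknown ids: {sorted(extra)}")
    # tool_result blocks must precede any other block (e.g. text) in the user turn
    seen_other = False
    for block in user_content:
        if block["type"] != "tool_result":
            seen_other = True
        elif seen_other:
            raise ValueError("tool_result blocks must come before other content")

assistant = [
    {"type": "text", "text": "Checking both."},
    {"type": "tool_use", "id": "t1", "name": "get_stock_price", "input": {"ticker": "AAPL"}},
    {"type": "tool_use", "id": "t2", "name": "calculate", "input": {"expression": "2+2"}},
]
user_ok = [
    {"type": "tool_result", "tool_use_id": "t1", "content": "182.5"},
    {"type": "tool_result", "tool_use_id": "t2", "content": "4"},
]
check_tool_results(assistant, user_ok)  # passes silently
```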

Structured Extraction via Tools (Lesson 8):

# Forcing a tool call is usually more reliable than parsing JSON from free text:
# Claude returns the arguments as a structured dict shaped by input_schema
# (schema adherence is strong but not formally guaranteed, so validate before use)

extraction_tool = {
    "name": "extract_invoice",
    "description": "Extract invoice data from the provided text",
    "input_schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "vendor_name": {"type": "string"},
            "total_amount": {"type": "number"},
            "due_date": {"type": "string", "format": "date"},
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "quantity": {"type": "number"},
                        "unit_price": {"type": "number"}
                    }
                }
            }
        },
        "required": ["invoice_number", "vendor_name", "total_amount"]
    }
}

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "tools": [extraction_tool],
    "tool_choice": {"type": "tool", "name": "extract_invoice"},  # Force it
    "messages": [{"role": "user", "content": f"Extract data from: {invoice_text}"}]
}

response = json.loads(client.invoke_model(
    modelId=MODEL_ID, contentType='application/json',
    accept='application/json', body=json.dumps(body)
)['body'].read())

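# Find the tool_use block by type rather than assuming its position,
# and validate required fields before trusting the data
extracted = next(b['input'] for b in response['content'] if b['type'] == 'tool_use')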

Bedrock Batch Inference (Lesson 7):

# For high-volume, non-time-sensitive workloads
# Input: S3 JSONL file, one {"recordId": ..., "modelInput": {...}} request per line
# Output: JSONL file(s) under the output S3 prefix; match results to inputs by recordId

import boto3, json

bedrock = boto3.client('bedrock', region_name='us-east-1')

# Create batch job
job = bedrock.create_model_invocation_job(
    jobName='my-batch-job-001',
    modelId='anthropic.claude-3-haiku-20240307-v1:0',
    inputDataConfig={
        "s3InputDataConfig": {
            "s3Uri": "s3://my-bucket/batch-input/requests.jsonl",
            "s3InputFormat": "JSONL"
        }
    },
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": "s3://my-bucket/batch-output/"
        }
    },
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchRole"   # placeholder 12-digit account ID
)

job_arn = job['jobArn']

# Poll for completion (PartiallyCompleted and Expired are also terminal states)
import time
while True:
    status = bedrock.get_model_invocation_job(jobIdentifier=job_arn)
    state = status['status']
    if state in ('Completed', 'PartiallyCompleted'):
        break
    elif state in ('Failed', 'Stopped', 'Expired'):
        raise RuntimeError(f"Batch job ended with status {state}: {status.get('message', '')}")
    print(f"Job status: {state}")
    time.sleep(30)
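The batch job above reads a JSONL file that you prepare and upload yourself. The sketch below builds those lines and reads results back. It assumes the record shape `{"recordId": ..., "modelInput": {...}}` for input, and `modelOutput` (or an error object) in each output record. The REC0000000-style ID scheme and both helper names are hypothetical.

```python
import json

def build_batch_records(prompts, max_tokens=512):
    """Yield one JSONL line per prompt in the Bedrock batch input format."""
    for i, prompt in enumerate(prompts):
        record = {
            "recordId": f"REC{i:07d}",   # your own ID, echoed back in the output
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        yield json.dumps(record)

def parse_batch_output(lines):
    """Map recordId -> first text block, or None if the record failed."""
    results = {}
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        output = record.get("modelOutput")
        if output is None:  # failed records carry an error object instead
            results[record["recordId"]] = None
            continue
        texts = [b["text"] for b in output.get("content", []) if b["type"] == "text"]
        results[record["recordId"]] = texts[0] if texts else ""
    return results
```

Write the lines to `requests.jsonl` and upload the file to the `s3InputDataConfig` location. When the job finishes, pass the downloaded output lines to `parse_batch_output`.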

Best Practices

Write clear tool descriptions — Claude uses descriptions to decide when and how to call tools
Handle stop_reason="tool_use" in a loop until stop_reason="end_turn"
Validate tool inputs before executing (treat model output as untrusted)
Return structured errors with "is_error": true so Claude can adapt gracefully
Use parallel execution when Claude returns multiple tool_use blocks simultaneously
Use tool_choice={"type": "tool", "name": "..."} to force a specific tool for extraction use cases (still validate the output)
Set a max iteration limit to prevent infinite tool loops

MODULE 6: RAG (Retrieval-Augmented Generation)

Lesson Coverage

RAG introduction and architecture
Chunking strategies
Embeddings (Amazon Titan, Cohere on Bedrock)
Full RAG workflow (index + query)
BM25 keyword search
Multi-search pipelines (hybrid)
Reranking
Contextual retrieval (Anthropic technique)

Key Notes

RAG Architecture:

INDEXING PIPELINE (offline):
Documents → Chunker → [chunk1, chunk2, ...] → Embedder → Vector Store

QUERY PIPELINE (online):
Query → Embed Query → Vector Search ─┐
                                     ├→ Hybrid merge (RRF) → Rerank → Top-K → Claude → Answer
Query ────────────────→ BM25 Search ─┘

Lesson 2: Chunking Strategies:

Strategy	Description	When to Use
Fixed-size	Split every N chars/tokens	Simple, consistent corpus
Sentence	Split at sentence boundaries	Better context preservation
Paragraph	Split at blank lines	Long-form documents
Semantic	Split at section headings	Structured docs (manuals, wikis)
Recursive	Try paragraph, then sentence, then fixed	General purpose best practice
Overlapping	Slide window with N-token overlap	Prevent boundary cut-offs

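def chunk_text_recursive(text, max_chunk_size=500, overlap=50):
    """Recursive character splitting with overlap.

    Tries the coarsest separator first (paragraphs, then lines, sentences,
    words) and recurses with finer separators only for pieces still too long.
    Overlap is prepended afterwards, so chunks after the first can be up to
    max_chunk_size + overlap + 1 characters.
    """
    separators = ["\n\n", "\n", ". ", " "]

    def split(text, seps):
        if len(text) <= max_chunk_size:
            return [text]

        # Use the first (coarsest) separator that occurs in the text
        for i, sep in enumerate(seps):
            if sep in text:
                break
        else:
            # No separator left: fall back to fixed-size pieces
            return [text[j:j + max_chunk_size]
                    for j in range(0, len(text), max_chunk_size)]

        result, current = [], ""
        for part in text.split(sep):
            candidate = current + sep + part if current else part
            if len(candidate) <= max_chunk_size:
                current = candidate
                continue
            if current:
                result.append(current)
            if len(part) <= max_chunk_size:
                current = part
            else:
                # Piece is still too long: recurse with the finer separators
                result.extend(split(part, seps[i + 1:]))
                current = ""
        if current:
            result.append(current)
        return result

    raw_chunks = split(text, separators)

    # Add overlap: prepend the tail of the previous chunk
    chunks = []
    for i, chunk in enumerate(raw_chunks):
        if i > 0 and overlap > 0:
            chunk = raw_chunks[i - 1][-overlap:] + " " + chunk
        chunks.append(chunk.strip())

    return chunks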
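The Overlapping strategy in the table can also be implemented directly as a sliding window. The sketch below uses whitespace-separated words as a stand-in for tokens. This is an assumption, since real token counts differ. `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text: str, window: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `window` words, each sharing `overlap`
    words with the previous window. Words approximate tokens here."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be >= 0 and smaller than window")
    words = text.split()
    step = window - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # this window already reaches the end of the text
    return chunks

print(sliding_window_chunks("a b c d e f g", window=4, overlap=2))
# ['a b c d', 'c d e f', 'e f g']
```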

Lesson 3: Embeddings on Bedrock:

import boto3, json
import numpy as np

bedrock_client = boto3.client('bedrock-runtime', region_name='us-east-1')

def embed_text(text, model_id='amazon.titan-embed-text-v2:0'):
    """Generate embedding vector for text using Amazon Titan."""
    body = {
        "inputText": text,
        "dimensions": 1024,      # 256, 512, or 1024 (Titan V2)
        "normalize": True        # Normalize to unit vector for cosine similarity
    }
    response = bedrock_client.invoke_model(
        modelId=model_id,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    result = json.loads(response['body'].read())
    return np.array(result['embedding'])

def embed_batch(texts, model_id='amazon.titan-embed-text-v2:0'):
    """Embed multiple texts (sequential for Titan — no batch endpoint)."""
    return [embed_text(t, model_id) for t in texts]

# Cohere Embed on Bedrock (multilingual model, suited to non-English content)
def embed_cohere(text, input_type='search_document'):
    body = {
        "texts": [text],
        "input_type": input_type,   # 'search_document' for indexing, 'search_query' for queries
        "truncate": "END"
    }
    response = bedrock_client.invoke_model(
        modelId='cohere.embed-multilingual-v3',
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    return json.loads(response['body'].read())['embeddings'][0]
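`SimpleVectorStore` below scores with a plain dot product. That equals cosine similarity only for unit-length vectors, which is why the Titan call sets `"normalize": True`. If you are not sure a model returns normalized vectors, normalizing explicitly is cheap. A sketch with NumPy:

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two vectors (works whether or not they are normalized)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def normalize(v):
    """Scale a vector to unit length so a plain dot product equals cosine similarity."""
    v = np.asarray(v, dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

print(cosine_similarity([3, 4], [6, 8]))   # 1.0 (same direction)
print(cosine_similarity([1, 0], [0, 1]))   # 0.0 (orthogonal)
```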

Embedding Models Comparison:

Model ID	Dimensions	Best For
`amazon.titan-embed-text-v2:0`	256/512/1024	General English, adjustable size
`amazon.titan-embed-text-v1`	1536	Legacy, fixed size
`cohere.embed-multilingual-v3`	1024	Multilingual content
`cohere.embed-english-v3`	1024	English-only content

Lesson 4: Full RAG Workflow (Simple In-Memory):

import numpy as np
from typing import List, Dict

class SimpleVectorStore:
    def __init__(self):
        self.chunks = []
        self.embeddings = []
        self.metadata = []

    def add(self, text: str, embedding: np.ndarray, metadata: Dict = None):
        self.chunks.append(text)
        self.embeddings.append(embedding)
        self.metadata.append(metadata or {})

    def search(self, query_embedding: np.ndarray, top_k: int = 5) -> List[Dict]:
        if not self.embeddings:
            return []

        # Cosine similarity
        matrix = np.array(self.embeddings)
        scores = matrix @ query_embedding  # Assumes normalized vectors
        top_indices = np.argsort(scores)[::-1][:top_k]

        return [
            {
                "text": self.chunks[i],
                "score": float(scores[i]),
                "metadata": self.metadata[i]
            }
            for i in top_indices
        ]

def build_rag_index(documents, chunk_size=500):
    """Index documents into vector store."""
    store = SimpleVectorStore()
    for doc in documents:
        chunks = chunk_text_recursive(doc['text'], chunk_size)
        for i, chunk in enumerate(chunks):
            embedding = embed_text(chunk)
            store.add(chunk, embedding, {
                "source": doc.get('title', 'unknown'),
                "chunk_index": i
            })
    return store

def rag_query(question, store, top_k=5):
    """Retrieve relevant chunks and generate answer."""
    # 1. Embed the question
    q_embedding = embed_text(question)

    # 2. Retrieve top-K chunks
    results = store.search(q_embedding, top_k=top_k)

    # 3. Build context
    context = "\n\n---\n\n".join([
        f"Source: {r['metadata'].get('source', 'unknown')}\n{r['text']}"
        for r in results
    ])

    # 4. Generate answer
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "system": "Answer questions based only on the provided context. If the answer is not in the context, say so.",
        "messages": [{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}"
        }]
    }

    response = bedrock_client.invoke_model(
        modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    result = json.loads(response['body'].read())
    return result['content'][0]['text'], results

Lesson 5: BM25 Keyword Search:

# BM25 = Best Match 25 — probabilistic TF-IDF variant
# Great for exact keyword matches, product codes, names

# Install: pip install rank_bm25
from rank_bm25 import BM25Okapi
import re

def tokenize(text):
    return re.findall(r'\w+', text.lower())

class BM25Index:
    def __init__(self, documents):
        self.docs = documents
        tokenized = [tokenize(d['text']) for d in documents]
        self.bm25 = BM25Okapi(tokenized)

    def search(self, query, top_k=10):
        tokens = tokenize(query)
        scores = self.bm25.get_scores(tokens)
        top_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
        return [
            {"text": self.docs[i]['text'], "score": float(scores[i]), "metadata": self.docs[i].get('metadata', {})}
            for i in top_indices if scores[i] > 0
        ]
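To see what `BM25Okapi` computes, here is a from-scratch sketch of the BM25 score. It uses the same `k1=1.5` and `b=0.75` defaults as `rank_bm25`. It also uses the always-positive IDF variant `ln(1 + (N - n + 0.5) / (n + 0.5))`, so its scores will not match `rank_bm25` exactly.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against a tokenized query with BM25."""
    if not corpus_tokens:
        return []
    N = len(corpus_tokens)
    avgdl = sum(len(d) for d in corpus_tokens) / N
    # n(q): number of documents containing each term
    doc_freq = Counter(term for doc in corpus_tokens for term in set(doc))
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for q in query_tokens:
            if tf[q] == 0:
                continue
            idf = math.log(1 + (N - doc_freq[q] + 0.5) / (doc_freq[q] + 0.5))
            # k1 saturates repeated terms; b normalizes by document length
            norm = tf[q] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[q] * (k1 + 1) / norm
        scores.append(score)
    return scores

corpus = [["error", "code", "e42", "printer"], ["printer", "setup", "guide"], ["billing", "faq"]]
print(bm25_scores(["e42"], corpus))  # only the first document scores above 0
```

The `b` term is why, for the query `printer`, the shorter second document outranks the first even though both contain the term once.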

Lesson 6: Hybrid Search with Reciprocal Rank Fusion (RRF):

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge multiple ranked result lists using RRF.
    k=60 is the constant from the original RRF paper; a larger k flattens the score gap between top and lower ranks.
    """
    scores = {}  # doc_id -> RRF score

    for results in result_lists:
        for rank, result in enumerate(results):
            doc_id = result.get('id') or result['text'][:50]  # Use text prefix as id
            if doc_id not in scores:
                scores[doc_id] = {"score": 0.0, "data": result}
            scores[doc_id]["score"] += 1.0 / (k + rank + 1)

    return sorted(scores.values(), key=lambda x: x["score"], reverse=True)

def hybrid_search(query, vector_store, bm25_index, top_k=10, final_k=5):
    """Run vector + BM25 search and merge with RRF."""
    q_embedding = embed_text(query)

    vector_results = vector_store.search(q_embedding, top_k=top_k)
    bm25_results = bm25_index.search(query, top_k=top_k)

    # Key both lists by full chunk text so the same chunk merges across lists
    # (assumes the vector store and BM25 index hold the same chunks)
    for r in vector_results + bm25_results:
        r['id'] = r['text']

    merged = reciprocal_rank_fusion([vector_results, bm25_results])
    return [m['data'] for m in merged[:final_k]]
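Here is a worked example of the RRF formula used in `reciprocal_rank_fusion` (0-based rank, `k=60`). A chunk ranked well in both lists outscores one ranked first in only one list.

```python
def rrf_score(ranks, k=60):
    """RRF score for one document given its 0-based rank in each list it appears in."""
    return sum(1.0 / (k + r + 1) for r in ranks)

# Doc A: 1st in vector search, 3rd in BM25.  Doc B: 1st in BM25 only.
a = rrf_score([0, 2])   # 1/61 + 1/63
b = rrf_score([0])      # 1/61
print(round(a, 5), round(b, 5))  # 0.03227 0.01639
```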

Lesson 7: Reranking:

def rerank_with_bedrock(query, documents, top_k=5):
    """Rerank retrieved documents using a Bedrock reranker model."""
    bedrock_agent = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

    response = bedrock_agent.rerank(
        queries=[{"type": "TEXT", "textQuery": {"text": query}}],
        sources=[
            {
                "type": "INLINE",
                "inlineDocumentSource": {
                    "type": "TEXT",
                    "textDocument": {"text": doc['text']}
                }
            }
            for doc in documents
        ],
        rerankingConfiguration={
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {
                "modelConfiguration": {
                    # Reranker availability varies by region; check the model is offered in yours
                    "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.rerank-v1:0"
                },
                "numberOfResults": top_k
            }
        }
    )

    # Each result's 'index' points back into `documents`
    return [
        {
            "text": documents[item['index']]['text'],
            "rerank_score": item['relevanceScore'],
            "original_rank": item['index']
        }
        for item in sorted(response['results'], key=lambda x: x['relevanceScore'], reverse=True)
    ]

Lesson 8: Contextual Retrieval (Anthropic Technique):

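def add_chunk_context(chunk, full_document, model_id='anthropic.claude-3-haiku-20240307-v1:0'):
    """Prepend a short contextual summary to each chunk before embedding.
    Giving each chunk its document context improves retrieval: Anthropic reports
    about 35% fewer top-20 retrieval failures from contextual embeddings alone.
    """
    # Truncate to keep the prompt small. This is a simplification: Anthropic's
    # write-up sends the whole document and relies on prompt caching for cost.
    doc_excerpt = full_document[:3000]
    prompt = f"""Here is a document:
<document>
{doc_excerpt}
</document>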

Here is one chunk from this document:
<chunk>
{chunk}
</chunk>

Write a short 1-2 sentence context for this chunk that explains where it fits within the full document.
Return ONLY the context sentences, no preamble."""

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 100,
        "temperature": 0.0,
        "messages": [{"role": "user", "content": prompt}]
    }
    response = bedrock_client.invoke_model(
        modelId=model_id,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    context = json.loads(response['body'].read())['content'][0]['text']
    return f"{context}\n\n{chunk}"   # Prepend context to chunk
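Contextualizing every chunk resends the same document many times, which is the case prompt caching is built for. The sketch below builds a request body that marks the document as a cached prefix and places the chunk-specific instructions after it. `build_context_request` is hypothetical. Caching applies only on models and regions where Bedrock supports it, and only when the document exceeds the minimum cacheable length.

```python
def build_context_request(full_document: str, chunk: str, max_tokens: int = 100) -> dict:
    """Request body for contextualizing one chunk, with the full document cached.

    The document block is identical for every chunk of the same document, so
    after the first call it can be served from the prompt cache.
    """
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "temperature": 0.0,
        "messages": [{
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": f"<document>\n{full_document}\n</document>",
                    "cache_control": {"type": "ephemeral"},  # cached prefix
                },
                {
                    "type": "text",
                    "text": (
                        f"Here is one chunk from this document:\n<chunk>\n{chunk}\n</chunk>\n"
                        "Write a short 1-2 sentence context that situates this chunk "
                        "within the full document. Return ONLY the context sentences."
                    ),
                },
            ],
        }],
    }
```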

# Bedrock Knowledge Bases (Managed RAG)
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

def kb_retrieve_and_generate(query, knowledge_base_id, model_arn):
    """Use Bedrock Knowledge Bases for fully managed RAG."""
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={"text": query},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {"numberOfResults": 10}
                }
            }
        }
    )
    return {
        "answer": response['output']['text'],
        "citations": response.get('citations', [])
    }
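`retrieve_and_generate` returns citations that pair each part of the answer with the retrieved chunks behind it. The sketch below collects the source locations. It assumes an S3 data source, where each reference carries `location.s3Location.uri`. `kb_sources` is a hypothetical helper.

```python
def kb_sources(citations: list) -> list[str]:
    """Collect unique source locations from retrieve_and_generate citations, in order."""
    sources = []
    for citation in citations:
        for ref in citation.get("retrievedReferences", []):
            location = ref.get("location", {})
            # S3 data sources expose the object URI; other source types use other keys
            uri = location.get("s3Location", {}).get("uri") or location.get("type", "unknown")
            if uri not in sources:
                sources.append(uri)
    return sources
```

Usage: `kb_sources(kb_retrieve_and_generate(...)["citations"])` gives a deduplicated list you can show under the answer.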

RAG Pipeline Decision Tree:

Do you want managed infrastructure?
  YES → Use Bedrock Knowledge Bases (S3 → auto-chunked → vector store such as OpenSearch Serverless → managed query)
  NO  → Build custom pipeline:
           Do you have multilingual content? → Use Cohere Embed
           Do you have English-only content? → Use Amazon Titan Embed V2
           Do queries use exact keywords? → Add BM25 + hybrid search
           Do you have high retrieval quality requirements? → Add reranking
           Do you have large documents? → Use contextual retrieval

Best Practices

Retrieve more than needed (top-20 or top-50), then rerank to final top-5
Keep total context modest (around 10K tokens is a common starting budget); padding the prompt with marginally relevant chunks can dilute answer quality
Include source metadata (doc title, page, section) in retrieved chunks
Use hybrid search (vector + BM25) by default — pure vector misses exact terms
Apply contextual retrieval for large document corpora (10K+ chunks)
For production: Bedrock Knowledge Bases reduces ops overhead significantly
Test retrieval quality independently from generation quality
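One way to enforce a context budget such as the 10K-token figure above is to keep top-ranked chunks until an estimated token limit is reached. The sketch below estimates tokens as characters / 4. That is a rough heuristic, not Claude's tokenizer; the real count appears in the response's `usage.input_tokens`. `fit_to_budget` is hypothetical.

```python
def fit_to_budget(results: list[dict], max_tokens: int = 10_000,
                  chars_per_token: float = 4.0) -> list[dict]:
    """Keep the highest-ranked chunks whose estimated total stays within max_tokens.

    Token counts are estimated as len(text) / chars_per_token, a rough heuristic
    rather than Claude's real tokenizer.
    """
    kept, used = [], 0.0
    for r in results:  # assumes results are already sorted best-first
        cost = len(r["text"]) / chars_per_token
        if used + cost > max_tokens:
            break  # stop rather than skip, so rank order is preserved
        kept.append(r)
        used += cost
    return kept
```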

MODULE 7: Features of Claude

Lesson Coverage

Extended thinking (multi-step reasoning)
Image support (vision)
PDF support
Citations
Prompt caching rules and mechanics
Practical implementation

Key Notes

Lesson 1: Extended Thinking:

# Extended thinking allows Claude to reason before answering
# Adds "thinking" content blocks to the response
# Use for: math, logic puzzles, multi-step planning, code analysis
# Requires a thinking-capable model such as Claude 3.7 Sonnet (not Claude 3.5 Sonnet)

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 16000,   # Must be > budget_tokens
    "thinking": {
        "type": "enabled",
        "budget_tokens": 10000   # Reasoning budget: minimum 1024, must be < max_tokens
    },
    "messages": [{"role": "user", "content": "Solve: A train leaves at 60mph..."}]
}

response = json.loads(client.invoke_model(
    modelId='us.anthropic.claude-3-7-sonnet-20250219-v1:0',   # Claude 3.7 Sonnet (US inference profile)
    contentType='application/json',
    accept='application/json',
    body=json.dumps(body)
)['body'].read())

for block in response['content']:
    if block['type'] == 'thinking':
        print("REASONING:", block['thinking'])   # Claude's reasoning
    elif block['type'] == 'redacted_thinking':
        print("REASONING: [redacted]")            # Encrypted reasoning is in block['data']
    elif block['type'] == 'text':
        print("ANSWER:", block['text'])           # Final answer

Extended Thinking Rules:

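max_tokens must exceed budget_tokens (max_tokens = budget_tokens + expected_answer_tokens); budget_tokens must be at least 1024
Requires a model that supports thinking (Claude 3.7 Sonnet or later); Claude 3.5 models do not
Temperature and top_k cannot be changed when thinking is enabled; top_p may only be set between 0.95 and 1
Forced tool use (tool_choice "any" or "tool") is not supported with thinking; use "auto"
When using tools with thinking, pass the thinking blocks back unchanged in the assistant turn
Streaming works with thinking: you get thinking deltas in the stream
Model decides how much of the budget to actually use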
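The constraints on thinking requests can be checked before the model is called. The sketch below enforces `budget_tokens >= 1024` and sets `max_tokens` above the budget. It also rejects `temperature` and `top_k`, and allows `top_p` only between 0.95 and 1, the range Anthropic's extended-thinking docs permit. `build_thinking_body` is a hypothetical helper.

```python
def build_thinking_body(prompt: str, budget_tokens: int, answer_tokens: int = 4000,
                        **extra) -> dict:
    """Build an InvokeModel body with extended thinking, validating its constraints."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    # Conservatively reject sampling overrides that thinking does not allow
    for banned in ("temperature", "top_k"):
        if banned in extra:
            raise ValueError(f"{banned} cannot be set when thinking is enabled")
    if "top_p" in extra and not 0.95 <= extra["top_p"] <= 1:
        raise ValueError("top_p must be between 0.95 and 1 with thinking")
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": budget_tokens + answer_tokens,   # always exceeds budget_tokens
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
        **extra,
    }
```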

Lesson 2: Vision (Image Input):

import base64

def invoke_with_image(image_path, text_prompt, media_type='image/png'):
    with open(image_path, 'rb') as f:
        image_data = base64.b64encode(f.read()).decode('utf-8')

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": media_type,   # image/jpeg, image/png, image/gif, image/webp
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": text_prompt
                }
            ]
        }]
    }

    response = client.invoke_model(
        modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    return json.loads(response['body'].read())['content'][0]['text']

# Multiple images in one request
def compare_images(image_path1, image_path2):
    def load_image(path):
        with open(path, 'rb') as f:
            return base64.b64encode(f.read()).decode('utf-8')

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": load_image(image_path1)}},
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": load_image(image_path2)}},
                {"type": "text", "text": "Compare these two images. What are the main differences?"}
            ]
        }]
    }
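    response = client.invoke_model(
        modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    return json.loads(response['body'].read())['content'][0]['text']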

Vision Limits on Bedrock:

Max 5MB per image
Max 20 images per request
Supported: JPEG, PNG, GIF, WebP
Each image consumes input tokens (cost varies by resolution)
Images are not stored between requests: include them in every request that needs them
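Anthropic's vision docs give a rule of thumb of roughly `width * height / 750` tokens per image. Images are scaled down, with aspect ratio preserved, when the long edge exceeds 1568 px or the image would exceed about 1,600 tokens. A sketch of that estimate, treating both limits as approximate:

```python
import math

def estimate_image_tokens(width: int, height: int,
                          max_edge: int = 1568, max_tokens: int = 1600) -> int:
    """Approximate input tokens for one image: width * height / 750 after
    downscaling the long edge to max_edge, capped at about max_tokens."""
    scale = min(1.0, max_edge / max(width, height))
    w, h = width * scale, height * scale
    return min(math.ceil(w * h / 750), max_tokens)

print(estimate_image_tokens(1000, 1000))  # 1334
```

Resizing a 1000x1000 screenshot to 500x500 before sending cuts the estimate by about 4x, which is the reasoning behind the resize best practice.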

Lesson 3: PDF Support:

import base64

def invoke_with_pdf(pdf_path, question):
    with open(pdf_path, 'rb') as f:
        pdf_data = base64.b64encode(f.read()).decode('utf-8')

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [{
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": question
                }
            ]
        }]
    }

    response = client.invoke_model(
        modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    return json.loads(response['body'].read())['content'][0]['text']

answer = invoke_with_pdf('quarterly_report.pdf', 'What was the total revenue in Q3?')

Lesson 4: Citations:

# Citations make Claude reference the documents it used in its answer
# Requires documents to be passed as document blocks (not plain text in content)

def rag_with_citations(question, documents):
    content = []

    # Add each document as a document block
    for doc in documents:
        content.append({
            "type": "document",
            "source": {
                "type": "text",
                "media_type": "text/plain",
                "data": doc['text']
            },
            "title": doc.get('title', 'Document'),
            "context": doc.get('source', ''),   # Optional source context
            "citations": {"enabled": True}
        })

    content.append({"type": "text", "text": question})

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [{"role": "user", "content": content}]
    }

    response = client.invoke_model(
        modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )

    result = json.loads(response['body'].read())

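    # Parse citations: each cited text block carries a 'citations' list
    for block in result['content']:
        if block['type'] == 'text':
            print("Answer:", block['text'])
            for cite in block.get('citations') or []:
                # document_index refers to the order of the document blocks sent above
                print(f"  cited doc {cite['document_index']}: {cite['cited_text']!r}")
    return result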
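A common pattern is to render numbered markers with a source list instead of printing raw citations. The sketch below assumes each cited text block carries a `citations` list whose entries include `document_index` and `document_title`, as in Anthropic's citations response format. `render_with_citations` is hypothetical.

```python
def render_with_citations(content_blocks: list) -> str:
    """Join text blocks, append [n] markers, and list cited documents at the end."""
    parts, sources = [], []
    for block in content_blocks:
        if block.get("type") != "text":
            continue
        markers = ""
        for cite in block.get("citations") or []:
            key = (cite["document_index"], cite.get("document_title"))
            if key not in sources:
                sources.append(key)
            markers += f"[{sources.index(key) + 1}]"
        parts.append(block["text"] + markers)
    footer = "\n".join(f"[{i + 1}] {title or f'document {idx}'}"
                       for i, (idx, title) in enumerate(sources))
    return "".join(parts) + ("\n\n" + footer if footer else "")
```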

Lesson 5 & 6: Prompt Caching:

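# Prompt caching saves cost for repeated requests with identical prefix
# Cache TTL: 5 minutes (ephemeral), refreshed each time the cached prefix is reused
# Minimum cacheable prefix: 1024 tokens on Sonnet/Opus models, 2048 on Haiku models
# Cache write: +25% cost (1.25x base input price) for tokens written to the cache
# Cache read: -90% cost (only 10% of normal input price)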

# CACHE THE SYSTEM PROMPT (most common use case)
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<long static instructions, documents, examples...>",
            "cache_control": {"type": "ephemeral"}   # Cache this prefix
        }
    ],
    "messages": [{"role": "user", "content": "Question 1"}]
}

# CACHE MID-CONVERSATION (large document in first user message)
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "<very long document to analyze>",
                    "cache_control": {"type": "ephemeral"}   # Cache doc
                },
                {
                    "type": "text",
                    "text": "What is the main theme?"   # Dynamic question
                }
            ]
        }
    ]
}

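# Check cache usage in response (use a model that supports prompt caching on Bedrock)
response = client.invoke_model(
    modelId=MODEL_ID, contentType='application/json',
    accept='application/json', body=json.dumps(body)
)
result = json.loads(response['body'].read())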
usage = result['usage']
# usage keys: input_tokens, output_tokens,
#             cache_creation_input_tokens (tokens written to cache)
#             cache_read_input_tokens (tokens read from cache)
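Whether caching pays off depends on how many requests reuse the prefix. The sketch below computes the input-cost ratio from the multipliers above (1.25x to write, 0.10x to read). It assumes every request after the first arrives within the TTL and hits the cache.

```python
def cached_cost_ratio(prefix_tokens: int, suffix_tokens: int, n_requests: int,
                      write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Input cost with caching divided by input cost without caching."""
    uncached = n_requests * (prefix_tokens + suffix_tokens)
    cached = (prefix_tokens * write_mult                      # first request writes
              + (n_requests - 1) * prefix_tokens * read_mult  # later requests read
              + n_requests * suffix_tokens)                   # uncached suffix every time
    return cached / uncached

print(round(cached_cost_ratio(10_000, 200, 50), 3))  # 0.14
```

With a 10,000-token cached prefix, 200 fresh tokens per request, and 50 requests, input cost drops to about 14% of the uncached cost. A single request costs about 25% more, since it pays the write premium with no reads.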

Cache Placement Rules:

System prompt cache → put cache_control on last system text block
Message cache → put cache_control on the last static content block in messages
Multiple cache points → up to 4 simultaneous cache breakpoints
Cache key = everything up to and including the block marked with cache_control (must be identical to hit); prefix order is tools → system → messages

When to Cache:

Large static system prompts (>1K tokens) reused across requests
Long documents analyzed multiple times (PDF, transcript, codebase)
Few-shot example banks shared across thousands of requests
Tool definitions that rarely change

Best Practices

Use extended thinking only for complex reasoning tasks — it adds latency and cost
Cache static content that is reused across multiple requests in the same 5-minute window
Resize images before sending — each image costs tokens based on resolution (smaller = cheaper)
Enable citations when building RAG apps where source attribution matters for trust
Test prompt caching by checking cache_read_input_tokens in response usage
Every PDF page counts toward the context window (its extracted text plus an image of the page), so split very large PDFs across requests

MODULE 8: Model Context Protocol (MCP)

Lesson Coverage

MCP introduction and motivation
MCP client setup
Project configuration (claude_desktop_config.json)
Tool definition in MCP servers
Server inspection with MCP Inspector
Client implementation (connecting to MCP server)
MCP Resources (data exposure)
MCP Prompts (templated interactions)
Review and patterns

Key Notes

MCP Overview:

Problem: Every app needs to build its own tool integrations
Solution: MCP standardizes how AI apps connect to external tools/data

MCP Architecture:
┌──────────────┐    MCP Protocol    ┌──────────────┐    External APIs
│  MCP Client  │◄──────────────────►│  MCP Server  │◄──────────────►
│ (Claude app) │    stdio or HTTP   │  (your code) │    Databases
└──────────────┘                    └──────────────┘    File system

Three MCP Primitives:

Primitive	Purpose	Analogous To
Tools	Functions Claude can call	Tool use / function calling
Resources	Data/files Claude can read	Context injection
Prompts	Reusable prompt templates	Prompt library

Lesson 4: Building an MCP Server with FastMCP:

# pip install "mcp[cli]"   (FastMCP is bundled in the official mcp package)
from mcp.server.fastmcp import FastMCP
import json

mcp = FastMCP("bedrock-tools-server")

# --- TOOLS ---
@mcp.tool()
def get_weather(city: str, unit: str = "celsius") -> str:
    """Get current weather for a city.

    Args:
        city: City name (e.g. 'New York', 'London')
        unit: Temperature unit — 'celsius' or 'fahrenheit'
    """
    # In real code, call a weather API
    return json.dumps({
        "city": city,
        "temperature": 22,
        "unit": unit,
        "condition": "partly cloudy"
    })

@mcp.tool()
def search_docs(query: str, limit: int = 5) -> str:
    """Search internal documentation.

    Args:
        query: Search terms
        limit: Maximum results to return (1-20)
    """
    limit = max(1, min(limit, 20))  # enforce the documented 1-20 range
    # Connect to your actual search backend
    results = [{"title": f"Result {i}", "snippet": "..."} for i in range(limit)]
    return json.dumps(results)

@mcp.tool()
def run_sql(query: str) -> str:
    """Execute a read-only SQL query on the analytics database.

    Args:
        query: SQL SELECT statement (no mutations allowed)
    """
    # Prefix check only; also connect with a read-only database user
    if not query.strip().upper().startswith("SELECT"):
        return json.dumps({"error": "Only SELECT queries are allowed"})
    # Execute query...
    return json.dumps({"columns": ["col1"], "rows": [[1], [2], [3]]})

# --- RESOURCES ---
@mcp.resource("config://app-settings")
def get_app_settings() -> str:
    """Current application configuration settings."""
    return json.dumps({
        "version": "2.1.0",
        "environment": "production",
        "features": {"rag": True, "caching": True}
    })

@mcp.resource("docs://readme")
def get_readme() -> str:
    """Project README file contents."""
    with open("README.md") as f:
        return f.read()

# --- PROMPTS ---
@mcp.prompt()
def analyze_code(language: str, code: str) -> str:
    """Generate a code analysis prompt."""
    return f"""Analyze this {language} code for:
1. Bugs and potential errors
2. Security vulnerabilities
3. Performance issues
4. Code style and readability

Code:
```{language}
{code}
```

Provide specific, actionable feedback."""

if __name__ == "__main__":
    mcp.run()  # Starts a stdio server by default

Lesson 2 & 3: Client Setup and Configuration:

Claude Desktop reads its MCP server list from claude_desktop_config.json:
{
  "mcpServers": {
    "bedrock-tools": {
      "command": "python",
      "args": ["/path/to/my-mcp-server.py"],
      "env": {
        "AWS_DEFAULT_REGION": "us-east-1"
      }
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/allowed/path"]
    }
  }
}

Transport Types:

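Transport	How	Use Case
stdio	Server runs as a child process; JSON-RPC over stdin/stdout	Local servers (most common)
Streamable HTTP (formerly HTTP+SSE)	HTTP requests, with optional Server-Sent Events for streamed responses	Remote servers, shared infrastructure
WebSocket	Bidirectional WS (SDK-specific; not one of the standard transports in the MCP spec)	Real-time bidirectional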

Lesson 5: MCP Inspector:

# Install and run MCP Inspector
npx @modelcontextprotocol/inspector python my-server.py

# Inspector shows:
# - All tools with their schemas
# - All resources with their URIs
# - All prompts with their templates
# - Interactive tool call tester
# - Server logs in real time

Lesson 6: Custom MCP Client:

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def run_mcp_client():
    """Connect to an MCP server and call its tools."""
    server_params = StdioServerParameters(
        command="python",
        args=["my-mcp-server.py"]
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize connection
            await session.initialize()

            # List available tools
            tools_response = await session.list_tools()
            print("Available tools:")
            for tool in tools_response.tools:
                print(f"  - {tool.name}: {tool.description}")

            # List resources
            resources_response = await session.list_resources()
            for resource in resources_response.resources:
                print(f"  Resource: {resource.uri}")

            # Call a tool
            result = await session.call_tool("get_weather", {"city": "Seattle"})
            print("Weather result:", result.content[0].text)

            # Read a resource
            resource_content = await session.read_resource("config://app-settings")
            print("Config:", resource_content.contents[0].text)

asyncio.run(run_mcp_client())

Integrating MCP with Bedrock Claude:

# Pattern: get tool schemas from MCP, pass them to Bedrock, route tool calls back to MCP
import json
import boto3

client = boto3.client('bedrock-runtime', region_name='us-east-1')

async def bedrock_with_mcp(user_message, mcp_session, max_iterations=10):
    """Use MCP tools with Claude on Bedrock. mcp_session is an initialized ClientSession."""
    # Get tools from MCP server
    tools_list = await mcp_session.list_tools()

    # Convert MCP tool schemas to Bedrock tool format
    bedrock_tools = [
        {
            "name": tool.name,
            "description": tool.description or "",
            "input_schema": tool.inputSchema
        }
        for tool in tools_list.tools
    ]

    messages = [{"role": "user", "content": user_message}]

    for _ in range(max_iterations):  # bounded loop instead of while True
        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 4096,
            "tools": bedrock_tools,
            "messages": messages
        }

        # invoke_model is a blocking call; it holds the event loop while it waits
        response = json.loads(client.invoke_model(
            modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
            contentType='application/json',
            accept='application/json',
            body=json.dumps(body)
        )['body'].read())

        messages.append({"role": "assistant", "content": response['content']})

        if response['stop_reason'] != 'tool_use':
            # end_turn, max_tokens or stop_sequence: return the text produced
            return "".join(b['text'] for b in response['content'] if b['type'] == 'text')

        tool_results = []
        for block in response['content']:
            if block['type'] == 'tool_use':
                # Route call through MCP
                mcp_result = await mcp_session.call_tool(block['name'], block['input'])
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block['id'],
                    # Keep the text parts of the MCP result; flag MCP-reported errors
                    "content": "\n".join(c.text for c in mcp_result.content if c.type == "text"),
                    "is_error": mcp_result.isError
                })

        # Tool results go back to Claude in a USER message
        messages.append({"role": "user", "content": tool_results})

    return "Stopped: reached max_iterations without a final answer"

Best Practices

Name tools and resources clearly — Claude uses names and descriptions to choose them
Validate all inputs in tool functions — Claude can send unexpected values
Return structured error JSON (don’t raise unhandled exceptions) so Claude can adapt
Use MCP for shared tool servers reused across multiple Claude applications
Test with MCP Inspector before connecting to any Claude application
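Use stdio transport for local development; use Streamable HTTP (the successor to HTTP+SSE) for remote, shared servers
Write tool descriptions that state what the tool does, when to use it, and what each parameter means; Anthropic's tool-use guidance favors detailed descriptions over terse ones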
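The "return structured error JSON" practice can be packaged as a decorator so every tool reports failures the same way. This is a sketch, and `json_errors` and `divide` are hypothetical names. In a FastMCP server you would place `@mcp.tool()` above `@json_errors`. `functools.wraps` keeps the wrapped function's name, docstring and signature visible, which FastMCP should use when generating the tool schema. Confirm the result in MCP Inspector.

```python
import functools
import json


def json_errors(fn):
    """Hypothetical decorator: return exceptions from a tool as a JSON error payload."""
    @functools.wraps(fn)  # preserve __name__, __doc__ and the signature (via __wrapped__)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:  # the model sees the error and can retry or adapt
            return json.dumps({"error": type(exc).__name__, "message": str(exc)})
    return wrapper


@json_errors
def divide(a: float, b: float) -> str:
    """Divide a by b (hypothetical example tool)."""
    return json.dumps({"result": a / b})


print(divide(6, 3))   # a normal JSON result
print(divide(1, 0))   # {"error": "ZeroDivisionError", ...} instead of a crash
```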

MODULE 9: Agents

Lesson Coverage

Agent overview and design patterns
Claude Code setup and actions
MCP enhancements for agents
Parallelization patterns
Automated debugging agents
Computer Use
Agent quality principles

Key Notes

What Makes an Agent:

An agent = LLM + tools + memory + orchestration loop
                                ↓
              (runs tool → gets result → decides next action → repeats)

Lesson 1: Agent Patterns:

1. PARALLELIZATION
   Orchestrator ─→ [Task A]  ─┐
                 ─→ [Task B]  ─┤→ Aggregate → Final Answer
                 ─→ [Task C]  ─┘

2. SEQUENTIAL CHAINING
   Input → [Agent 1: Draft] → [Agent 2: Review] → [Agent 3: Refine] → Output

3. ROUTING
   Input → [Router: Classify] → BILLING → [Billing Specialist]
                              → TECH    → [Tech Support Specialist]
                              → RETURNS → [Returns Specialist]

4. ORCHESTRATOR-SUBAGENT
   [Orchestrator Agent]
        ├→ [Research Subagent]
        ├→ [Writing Subagent]
        └→ [Fact-Check Subagent]
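Pattern 2 (sequential chaining) can be sketched as a loop where each stage's output feeds later stages. It assumes a `call_model(system, user_message)` function such as the `run_claude` helper from the parallelization example. `run_chain` and `DRAFT_REVIEW_REFINE` are hypothetical names. Each template can reference the original input as `{input}` and earlier outputs as `{step0}`, `{step1}`, and so on.

```python
from typing import Callable, List, Tuple

CallModel = Callable[[str, str], str]  # (system prompt, user message) -> response text


def run_chain(user_input: str, steps: List[Tuple[str, str]], call_model: CallModel) -> List[str]:
    """Hypothetical helper: run stages in order and return every stage's output."""
    outputs: List[str] = []
    for system, template in steps:
        earlier = {f"step{i}": text for i, text in enumerate(outputs)}
        prompt = template.format(input=user_input, **earlier)
        outputs.append(call_model(system, prompt))
    return outputs


# Draft -> Review -> Refine, matching the diagram above (prompts are illustrative)
DRAFT_REVIEW_REFINE = [
    ("You are a technical writer.", "Write a short draft about: {input}"),
    ("You are a critical reviewer.",
     "Topic: {input}\n\nList specific problems in this draft:\n{step0}"),
    ("You are an editor.",
     "Draft:\n{step0}\n\nReview:\n{step1}\n\nRewrite the draft, fixing every issue."),
]

# Usage with the Bedrock helper defined in the parallelization example:
# final_text = run_chain("prompt caching on Bedrock", DRAFT_REVIEW_REFINE, run_claude)[-1]
```

Passing the model call in as a parameter keeps the chain testable without AWS access.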

Lesson 4: Parallelization Pattern:

import concurrent.futures
import boto3, json

client = boto3.client('bedrock-runtime', region_name='us-east-1')
MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'

def run_claude(system, user_message, max_tokens=2048):
    """Single Claude call."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "system": system,
        "messages": [{"role": "user", "content": user_message}]
    }
    response = client.invoke_model(
        modelId=MODEL_ID,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    return json.loads(response['body'].read())['content'][0]['text']

def parallel_research(topic, aspects):
    """Research multiple aspects of a topic in parallel."""
    def research_aspect(aspect):
        system = f"You are a research expert. Focus only on: {aspect}"
        return {
            "aspect": aspect,
            "findings": run_claude(system, f"Research: {topic} - specifically: {aspect}")
        }

    with concurrent.futures.ThreadPoolExecutor(max_workers=len(aspects)) as executor:
        futures = [executor.submit(research_aspect, a) for a in aspects]
        results = [f.result() for f in futures]

    # Synthesize in orchestrator
    synthesis_input = "\n\n".join([
        f"## {r['aspect']}\n{r['findings']}" for r in results
    ])

    return run_claude(
        "You synthesize research into coherent summaries.",
        f"Synthesize these research findings about {topic}:\n\n{synthesis_input}"
    )

# Usage
summary = parallel_research("quantum computing", [
    "current hardware state",
    "software ecosystems",
    "commercial applications",
    "timeline predictions"
])

Lesson 3: Routing Pattern:

ROUTE_SYSTEM_PROMPT = """Classify the customer request into exactly one category:
- BILLING: Payment, invoice, subscription, refund questions
- TECHNICAL: Bug reports, errors, how-to questions
- RETURNS: Return, exchange, or refund of physical products
- GENERAL: Everything else

Respond with ONLY the category name."""

SPECIALIST_SYSTEMS = {
    "BILLING": "You are a billing specialist. Help customers with payment and subscription issues.",
    "TECHNICAL": "You are a technical support engineer. Diagnose and resolve software issues.",
    "RETURNS": "You are a returns specialist. Help customers with returns and exchanges.",
    "GENERAL": "You are a helpful customer service agent."
}

def route_and_respond(customer_message):
    # Step 1: Route
    route = run_claude(ROUTE_SYSTEM_PROMPT, customer_message, max_tokens=20).strip().upper()  # normalize case

    # Step 2: Respond with specialist
    specialist_system = SPECIALIST_SYSTEMS.get(route, SPECIALIST_SYSTEMS["GENERAL"])
    response = run_claude(specialist_system, customer_message)

    return {"route": route, "response": response}
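The router's reply is free text. A reply like "Billing." or "Category: TECHNICAL" misses the exact dictionary lookup and silently falls back to GENERAL. A tolerant parser is one way to harden step 1. `parse_route` is a hypothetical helper, and scanning for the first known category word is a design choice, not part of the course code.

```python
import re

CATEGORIES = ("BILLING", "TECHNICAL", "RETURNS", "GENERAL")


def parse_route(raw: str, default: str = "GENERAL") -> str:
    """Hypothetical helper: first known category word in the router's reply, else default."""
    for word in re.findall(r"[A-Z]+", raw.upper()):
        if word in CATEGORIES:
            return word
    return default


# In route_and_respond:
# route = parse_route(run_claude(ROUTE_SYSTEM_PROMPT, customer_message, max_tokens=20))
```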

Lesson 2: Claude Code Setup:

# Install Claude Code globally
npm install -g @anthropic-ai/claude-code

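# Configure Claude Code to use Bedrock (instead of the direct Anthropic API).
# No ANTHROPIC_API_KEY is needed; authentication uses your normal AWS credentials.
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1   # Required; Claude Code does not read the region from ~/.aws/config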

# Or configure in ~/.claude/settings.json for persistent Bedrock use
# Launch Claude Code
claude

Bedrock Agents (Managed Service):

import boto3

bedrock_agent = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

def invoke_bedrock_agent(agent_id, agent_alias_id, session_id, user_message):
    """Invoke a fully managed Bedrock Agent."""
    response = bedrock_agent.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=session_id,          # Maintain session for multi-turn
        inputText=user_message,
        enableTrace=True               # Get orchestration trace for debugging
    )

    # response['completion'] is an event stream of chunk and trace events
    full_response = ""
    traces = []

    for event in response['completion']:
        if 'chunk' in event:
            chunk = event['chunk']
            if 'bytes' in chunk:
                full_response += chunk['bytes'].decode('utf-8')
        elif 'trace' in event:
            traces.append(event['trace'])

    return {"response": full_response, "traces": traces}

Lesson 5: Automated Debugging Agent:

def debugging_agent(error_message, codebase_context, max_iterations=5):
    """Agent that iteratively diagnoses and proposes fixes.

    Reuses client, MODEL_ID and json from the parallelization example above.
    """
    tools = [
        {
            "name": "search_codebase",
            "description": "Search codebase for relevant code",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "file_pattern": {"type": "string", "description": "Optional glob pattern"}
                },
                "required": ["query"]
            }
        },
        {
            "name": "read_file",
            "description": "Read contents of a specific file",
            "input_schema": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"]
            }
        },
        {
            "name": "run_test",
            "description": "Run specific test to verify fix",
            "input_schema": {
                "type": "object",
                "properties": {"test_name": {"type": "string"}},
                "required": ["test_name"]
            }
        }
    ]

    system = """You are a debugging expert. Analyze errors systematically:
1. Understand the error message
2. Search for relevant code
3. Form a hypothesis
4. Verify with tests
5. Propose a concrete fix"""

    messages = [{
        "role": "user",
        "content": f"Error: {error_message}\n\nContext: {codebase_context}"
    }]

    iteration = 0
    while iteration < max_iterations:
        iteration += 1
        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 4096,
            "system": system,
            "tools": tools,
            "messages": messages
        }

        response = json.loads(client.invoke_model(
            modelId=MODEL_ID, contentType='application/json',
            accept='application/json', body=json.dumps(body)
        )['body'].read())

        messages.append({"role": "assistant", "content": response['content']})

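        if response['stop_reason'] != 'tool_use':
            # end_turn (or max_tokens): return the text blocks as the final answer
            return "".join(b['text'] for b in response['content'] if b['type'] == 'text')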

        tool_results = []
        for block in response['content']:
            if block['type'] == 'tool_use':
                # dispatch_debug_tool: your own name-to-implementation router (not shown)
                result = dispatch_debug_tool(block['name'], block['input'])
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block['id'],
                    "content": json.dumps(result)
                })
        messages.append({"role": "user", "content": tool_results})

    return "Debugging agent reached max iterations"

Lesson 6: Computer Use:

# Computer Use tools for desktop automation
# These 20241022 tool versions pair with Claude 3.5 Sonnet v2 on Bedrock
# ALWAYS use in sandboxed/VM environment — never on production machines

computer_use_tools = [
    {
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1920,
        "display_height_px": 1080,
        "display_number": 1
    },
    {
        "type": "text_editor_20241022",
        "name": "str_replace_editor"
    },
    {
        "type": "bash_20241022",
        "name": "bash"
    }
]

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "anthropic_beta": ["computer-use-2024-10-22"],   # Bedrock: beta flag goes in the body
    "max_tokens": 4096,
    "tools": computer_use_tools,
    "messages": [{
        "role": "user",
        "content": "Take a screenshot and describe what's on the screen."
    }]
}

# Computer Use responses include tool_use blocks whose input names an "action", e.g.:
# {"action": "screenshot"}
# {"action": "mouse_move", "coordinate": [x, y]}
# {"action": "left_click"}            # clicks at the current cursor position
# {"action": "type", "text": "Hello World"}
# {"action": "key", "text": "ctrl+c"}

Lesson 7: Agent Quality Principles:

Good Agent Design:
✓ Bounded loops — always set max_iterations
✓ Audit logging — log every tool call and result
✓ Graceful error handling — tools return errors, agent adapts
✓ Human-in-the-loop — pause for approval before destructive actions
✓ Idempotency — tools should be safe to retry
✓ Minimal permissions — only grant access the agent actually needs
✓ Monitoring — alert on loops, failures, and unusual tool call patterns

Bad Agent Design:
✗ No iteration limit — can run forever on bad state
✗ No logging — can't debug what happened
✗ Silent failures — agent assumes success and continues incorrectly
✗ Unrestricted access — agent can delete/modify anything
✗ Trusting model output blindly — validate inputs before execution
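Several of the good-design rules can live in one dispatch wrapper: audit logging, human-in-the-loop approval, and graceful errors. Every agent loop would call this wrapper instead of invoking tools directly. This is a sketch under stated assumptions. `guarded_dispatch`, `DESTRUCTIVE_TOOLS` and the `approve` callback are hypothetical. In practice `approve` might prompt an operator or post to a review queue.

```python
import json
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("agent.audit")

DESTRUCTIVE_TOOLS = {"delete_file", "issue_refund"}  # hypothetical tool names needing approval


def guarded_dispatch(name: str, tool_input: Dict[str, Any],
                     tools: Dict[str, Callable[..., Any]],
                     approve: Callable[[str, Dict[str, Any]], bool]) -> str:
    """Hypothetical wrapper: log every call, gate destructive tools, and never raise."""
    logger.info("tool_call name=%s input=%s", name, json.dumps(tool_input, default=str))
    if name not in tools:
        result: Dict[str, Any] = {"error": f"Unknown tool: {name}"}
    elif name in DESTRUCTIVE_TOOLS and not approve(name, tool_input):
        result = {"error": "Action rejected by human reviewer"}
    else:
        try:
            result = {"result": tools[name](**tool_input)}
        except Exception as exc:  # report the failure so the agent can adapt
            result = {"error": f"{type(exc).__name__}: {exc}"}
    payload = json.dumps(result, default=str)
    logger.info("tool_result name=%s result=%s", name, payload)
    return payload


# In an agent loop:
# content = guarded_dispatch(block['name'], block['input'], TOOLS, approve=ask_operator)
```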

Best Practices

Keep agent loops bounded — always set max iterations to prevent runaway behavior
Log every tool call and result for debugging and audit trails
Use Bedrock Agents for production managed orchestration with minimal code
Build custom loops only when you need full control over orchestration logic
Restrict Computer Use to isolated, sandboxed VMs — never use on production systems
Design tools to be idempotent where possible — safe to retry
Implement human-in-the-loop checkpoints for high-stakes actions (deletions, payments)
Use parallelization for independent subtasks to reduce total wall-clock time
Route to specialist agents/prompts for domain-specific quality improvement

MODULE 10: Final Assessment

Key Areas to Review

Must Know for Exam:

Bedrock client: boto3.client('bedrock-runtime', region_name='us-east-1')
Required field: "anthropic_version": "bedrock-2023-05-31" in every request body
Invoke methods: invoke_model() (sync) and invoke_model_with_response_stream() (streaming)
Response parsing: response['body'].read() → json.loads() → ['content'][0]['text']
Tool use message flow: tool_use block → execute → tool_result in user message
Tool result goes in a USER message, NOT an assistant message
Prompt caching: "cache_control": {"type": "ephemeral"} on content blocks
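Cache TTL: 5 minutes, refreshed each time the cached prefix is read
Extended thinking: "thinking": {"type": "enabled", "budget_tokens": N}, on a model that supports it (Claude 3.7 Sonnet or later), with N ≥ 1024 and N < max_tokens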
Cross-region IDs: prefix us. or eu. before anthropic.claude-...
IAM permission: bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream
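The tool-use rules in this list fit in one helper: execute each tool_use block, reply with tool_result blocks in a USER message, and echo each tool_use_id. This is a minimal sketch, and `build_tool_result_message` and the `dispatch` callback are hypothetical names. It also covers parallel tool calls, because every tool_use block in the assistant turn gets its own result.

```python
import json
from typing import Any, Callable, Dict, List


def build_tool_result_message(assistant_content: List[Dict[str, Any]],
                              dispatch: Callable[[str, Dict[str, Any]], Any]) -> Dict[str, Any]:
    """Hypothetical helper: run every tool_use block and wrap results in one USER message."""
    results = []
    for block in assistant_content:
        if block.get("type") != "tool_use":
            continue  # text blocks in the same turn need no reply
        try:
            output = dispatch(block["name"], block["input"])
            is_error = False
        except Exception as exc:  # tell Claude the call failed instead of crashing the loop
            output = {"error": str(exc)}
            is_error = True
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],          # must echo the id of the tool_use block
            "content": json.dumps(output, default=str),
            "is_error": is_error,
        })
    return {"role": "user", "content": results}


# Next step in the loop:
# messages.append({"role": "assistant", "content": response['content']})
# messages.append(build_tool_result_message(response['content'], dispatch_tool))
```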

Common Exam Traps

Trap	Correct Answer
Forgetting `anthropic_version`	Always include `"bedrock-2023-05-31"`
Using raw model name in modelId	Must include full ID with version suffix: `anthropic.claude-3-5-sonnet-20241022-v2:0`
Assuming cross-region is automatic	Must explicitly use prefixed profile ID: `us.anthropic.claude-...`
Reading response body twice	`StreamingBody` can only be read once — call `.read()` once, store result
Putting tool_result in assistant message	Tool results go in `role: "user"` messages
Thinking `cache_control` is permanent	Cache TTL = 5 minutes (ephemeral only)
Setting temperature with extended thinking	Temperature (and top_k) cannot be changed when thinking is enabled; temperature stays at 1
Assuming max_tokens = actual output length	It’s a ceiling, not a target
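Using `anthropic` SDK with Bedrock	This course uses `boto3`; the `anthropic` library reaches Bedrock only through its separate `AnthropicBedrock` client, never `anthropic.Anthropic()` with an API key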

Quick Reference: Request Template

import boto3, json

client = boto3.client('bedrock-runtime', region_name='us-east-1')

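body = {
    "anthropic_version": "bedrock-2023-05-31",   # Required
    "max_tokens": 1024,                            # Required
    "system": "...",                               # Optional
    "messages": [
        {"role": "user", "content": "..."}
    ],
    # Optional parameters (uncomment as needed):
    # "temperature": 0.7,          # adjust temperature OR top_p, not both
    # "top_p": 0.9,
    # "stop_sequences": ["###"],
    # "tools": [...],
    # "tool_choice": {"type": "auto"},
    # "thinking": {"type": "enabled", "budget_tokens": 5000},
    #     ^ needs a thinking-capable model (Claude 3.7 Sonnet or later),
    #       max_tokens larger than budget_tokens, and no temperature/top_k changes
}
# No "stream" key: streaming is chosen by calling
# invoke_model_with_response_stream() instead of invoke_model()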

response = client.invoke_model(
    modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
    contentType='application/json',
    accept='application/json',
    body=json.dumps(body)
)

result = json.loads(response['body'].read())
text = result['content'][0]['text']
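The streaming counterpart swaps `invoke_model()` for `invoke_model_with_response_stream()`. It reads the response body as a sequence of events, each carrying a JSON chunk in `event['chunk']['bytes']`. The sketch below keeps only text deltas. `collect_stream_text` is a hypothetical name, and it is written against plain dicts so it can be tried without AWS access.

```python
import json
from typing import Iterable


def collect_stream_text(event_stream: Iterable[dict]) -> str:
    """Hypothetical helper: join the text deltas from a Bedrock response stream body."""
    parts = []
    for event in event_stream:
        if "chunk" not in event:
            continue  # this sketch only looks at chunk events
        chunk = json.loads(event["chunk"]["bytes"])
        # Text arrives in content_block_delta events whose delta type is text_delta
        if chunk.get("type") == "content_block_delta" and chunk["delta"].get("type") == "text_delta":
            parts.append(chunk["delta"]["text"])
    return "".join(parts)


# Usage against Bedrock (streaming is selected by the API call itself):
# stream = client.invoke_model_with_response_stream(
#     modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
#     contentType='application/json', accept='application/json',
#     body=json.dumps(body))
# text = collect_stream_text(stream['body'])
```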

Certification Readiness Checklist

MODULE 1 — Course Introduction:

Know all Claude model IDs available on Bedrock and their use cases
Understand difference between standard and cross-region inference profile IDs
Know how to enable model access in Bedrock console
Know required IAM permissions for Bedrock

MODULE 2 — Working with the API:

Can create boto3 bedrock-runtime client with proper credentials
Know request body structure (anthropic_version, max_tokens, messages)
Can parse invoke_model response (body.read → json.loads → content[0].text)
Can implement multi-turn conversation history management
Can implement streaming with invoke_model_with_response_stream
Know the streaming event types (message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop)
Know temperature, top_p, top_k, stop_sequences parameters
Can implement retry logic with exponential backoff

MODULE 3 — Prompt Evaluations:

Can design evaluation dataset (size, coverage, no leakage)
Know multiple grading approaches (exact match, regex, JSON schema, model-based)
Can implement model-based grader using cheap model (Haiku)
Can run parallel evaluations with ThreadPoolExecutor
Know key metrics: accuracy, F1, latency, cost per correct answer

MODULE 4 — Prompt Engineering:

Know XML tag structure for complex prompts
Can implement few-shot prompting with proper format matching
Know chain of thought patterns (thinking tags, step-by-step)
Know temperature guidance by task type
Can use stop sequences for output extraction

MODULE 5 — Tool Use:

Can define tools with JSON Schema (properties, types, required, enum)
Know tool use message flow (request → tool_use → tool_result → response)
Can implement complete tool loop (handle stop_reason=tool_use)
Know all tool_choice options (auto, any, tool, none)
Can handle parallel tool calls (multiple tool_use blocks)
Know Bedrock batch inference (create_model_invocation_job, S3 I/O)

MODULE 6 — RAG:

Know chunking strategies (fixed, recursive, overlapping) and tradeoffs
Can invoke Amazon Titan Embed V2 for embeddings
Can implement cosine similarity search
Can implement BM25 keyword search with rank_bm25
Can implement hybrid search with RRF
Know reranking with amazon.rerank-v1:0
Know Bedrock Knowledge Bases API (retrieve_and_generate)
Know contextual retrieval technique

MODULE 7 — Features of Claude:

Can enable extended thinking with budget_tokens
Know thinking block constraints (no temperature control, max_tokens requirement)
Can send images as base64 content blocks
Can send PDFs as base64 document blocks
Can enable citations in document-based requests
Can implement prompt caching with cache_control
Know cache TTL (5 min), write cost (+25%), read savings (-90%)

MODULE 8 — MCP:

Know three MCP primitives (Tools, Resources, Prompts)
Can build FastMCP server with @mcp.tool(), @mcp.resource(), @mcp.prompt()
Know stdio vs HTTP transport options
Can configure claude_desktop_config.json for MCP servers
Know how to use MCP Inspector for debugging
Can implement MCP client with ClientSession

MODULE 9 — Agents:

Know four agent patterns (parallelization, chaining, routing, orchestrator-subagent)
Can implement parallel subagents with ThreadPoolExecutor
Can implement routing pattern (classify → specialist)
Can invoke Bedrock Agents with bedrock-agent-runtime
Know Computer Use tools and safety requirements
Know agent quality principles (bounded loops, logging, human-in-the-loop)

MODULE 11: Course Wrap-Up

Summary

Core Difference: Bedrock vs Direct API

Aspect	Direct Anthropic API	Amazon Bedrock
Auth	`ANTHROPIC_API_KEY`	IAM roles/credentials
SDK	`anthropic` Python library	`boto3`
Client	`anthropic.Anthropic()`	`boto3.client('bedrock-runtime')`
Model IDs	`claude-3-5-sonnet-20241022`	`anthropic.claude-3-5-sonnet-20241022-v2:0`
Request method	`client.messages.create()`	`client.invoke_model()`
Body format	Named keyword args	JSON bytes in `body` param
Required extra field	None	`"anthropic_version": "bedrock-2023-05-31"`
Streaming	`.stream()` context manager	`invoke_model_with_response_stream()`
Billing	Anthropic invoice	AWS invoice
Managed RAG	None built-in	Bedrock Knowledge Bases
Managed Agents	None built-in	Bedrock Agents + Action Groups
Batch inference	Message Batches API	Model Invocation Jobs (S3 I/O)
Region control	Global endpoint	Per-region (must enable model access)
Cross-region	N/A	Inference profiles (`us.`, `eu.` prefix)
Guardrails	None built-in	Bedrock Guardrails
Compliance	Anthropic ToS	AWS compliance (HIPAA-eligible, SOC2)

Best Practices for Production on AWS:

Use IAM roles, not static credentials — least privilege, no rotation needed in code
Enable CloudWatch logging for Bedrock invocations (audit, cost tracking, debugging)
Use Bedrock Guardrails to filter harmful content at managed layer
Use cross-region inference profiles for higher availability
Cache system prompts and static documents to reduce cost on repetitive workloads
Use Bedrock Knowledge Bases for managed RAG (less ops); custom pipelines for full control
Monitor token usage with AWS Cost Explorer and set billing alerts
Pin model IDs (full ID including version) to prevent unexpected upgrades
Implement exponential backoff for ThrottlingException
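Exponential backoff for ThrottlingException can be sketched as a small retry wrapper. It rests on a few assumptions. The set of retryable error codes is illustrative, and `with_backoff` is a hypothetical name. The error code is read from the `.response` dict that botocore's `ClientError` carries. botocore can also retry for you: pass `botocore.config.Config(retries={'max_attempts': 10, 'mode': 'adaptive'})` to `boto3.client(...)`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Illustrative set of Bedrock error codes worth retrying (assumption; adjust for your workload)
RETRYABLE_CODES = {"ThrottlingException", "ServiceUnavailableException"}


def with_backoff(call: Callable[[], T], max_attempts: int = 5, base_delay: float = 1.0,
                 max_delay: float = 20.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Hypothetical helper: retry retryable AWS errors with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            response = getattr(exc, "response", None)  # botocore ClientError carries a dict here
            code = response.get("Error", {}).get("Code") if isinstance(response, dict) else None
            if code not in RETRYABLE_CODES or attempt == max_attempts - 1:
                raise
            # Double the ceiling each attempt (1s, 2s, 4s, ...) and sleep a random fraction of it
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise ValueError("max_attempts must be at least 1")


# Usage with the run_claude helper from Module 9:
# text = with_backoff(lambda: run_claude(system, prompt))
```

Full jitter spreads retries from parallel workers apart, which matters when a ThreadPoolExecutor fans out many calls at once.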

Quick Comparison: Direct API vs Bedrock vs Vertex AI

Feature	Anthropic Direct	Amazon Bedrock	Google Vertex AI
Auth	API Key	AWS IAM	Google ADC / Service Account
SDK	`anthropic`	`boto3`	`anthropic` (AnthropicVertex)
Client init	`Anthropic()`	`boto3.client('bedrock-runtime')`	`AnthropicVertex(region=..., project_id=...)`
Model ID format	`claude-3-5-sonnet-20241022`	`anthropic.claude-3-5-sonnet-20241022-v2:0`	`claude-3-5-sonnet-v2@20241022`
Extra required	None	`anthropic_version` in body	None
Streaming	`.stream()` CM	`invoke_model_with_response_stream()`	`.stream()` CM
Billing	Anthropic	AWS	Google Cloud
Managed RAG	None	Bedrock Knowledge Bases	Vertex AI Search
Managed Agents	None	Bedrock Agents	Vertex AI Agent Builder
Embeddings native	None	Amazon Titan Embeddings	Vertex AI Embeddings
Batch inference	Message Batches API	Model Invocation Jobs (S3)	Batch prediction jobs
Guardrails	None built-in	Bedrock Guardrails	Vertex AI safety filters
Prompt caching	Yes	Yes	Yes
Extended thinking	Yes	Yes	Yes
Computer Use	Yes	Yes	Yes

Study tip: The most Bedrock-specific concepts that differ from other platforms are:

1. boto3 client setup with bedrock-runtime
2. JSON body encoding (not keyword args) with the required anthropic_version field
3. Tool result placement in USER messages
4. Prompt cache TTL (5 minutes, ephemeral)
5. Cross-region inference profile ID prefixes (us., eu.)
6. Model ID format, including the anthropic. prefix and the version suffix
