← Claude Code & Certification

Claude with Amazon Bedrock - Certification Study Guide

Claude with Amazon Bedrock - Certification Study Guide

Course: Anthropic - Claude with Amazon Bedrock Modules: 11 (Introduction through Course Wrap-Up) Lessons: 40+ individual lessons Target: Certification Prep


MODULE 1: Course Introduction

Lesson Coverage

  • Introduction to the course structure and goals
  • Claude Models available on Amazon Bedrock

Key Notes

Course Scope:

  • Using Claude via AWS Bedrock — same models, different auth/SDK layer
  • All exercises use boto3, NOT the anthropic Python SDK
  • AWS handles billing, compliance (SOC2, HIPAA-eligible), and data residency

Claude Models Available on Bedrock:

Model IDVersionUse Case
anthropic.claude-3-5-sonnet-20241022-v2:0Sonnet 3.5 v2Best balance of speed and intelligence
anthropic.claude-3-5-haiku-20241022-v1:0Haiku 3.5Fast and cheap, high volume tasks
anthropic.claude-3-haiku-20240307-v1:0Haiku 3Older Haiku, still available
anthropic.claude-3-opus-20240229-v1:0Opus 3Most capable, slower

Cross-Region Inference Profile IDs (prefix us. or eu.):

  • us.anthropic.claude-3-5-sonnet-20241022-v2:0
  • us.anthropic.claude-3-5-haiku-20241022-v1:0
  • eu.anthropic.claude-3-5-sonnet-20241022-v2:0

Bedrock vs Direct API — At a Glance:

Direct Anthropic API          Amazon Bedrock
─────────────────────         ──────────────────────
Auth: API Key                 Auth: IAM roles/credentials
SDK:  anthropic               SDK:  boto3
URL:  api.anthropic.com       URL:  bedrock-runtime.<region>.amazonaws.com
Bill: Anthropic invoice       Bill: AWS invoice

Prerequisites for the Course:

  • Python 3.9+
  • AWS account with Bedrock enabled
  • IAM user or role with Bedrock permissions
  • Model access requested in Bedrock console (not automatic — must be manually enabled)

Enabling Model Access:

  1. Open AWS Console → Amazon Bedrock → Model access
  2. Click “Manage model access”
  3. Check Claude models → Submit request
  4. Wait for “Access granted” status (usually instant for Claude)

Required IAM Permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream",
        "bedrock:CreateModelInvocationJob",
        "bedrock:GetModelInvocationJob",
        "bedrock:ListFoundationModels"
      ],
      "Resource": "*"
    }
  ]
}

Best Practices

  • Enable model access in the Bedrock console before running any code (access is not automatic)
  • Use IAM roles (not static credentials) in production — attach AmazonBedrockFullAccess or scoped policy
  • Choose region where Claude models are available: us-east-1, us-west-2, eu-west-3, ap-southeast-1
  • Pin model IDs including version suffix to prevent unexpected upgrades
  • Never hardcode AWS credentials — use environment variables or IAM instance profiles

MODULE 2: Working with the API

Lesson Coverage

  1. API access setup (boto3, credentials, regions)
  2. Creating your first request (request body structure)
  3. Multi-turn conversations (stateless history management)
  4. Chatbot exercise (building a REPL)
  5. System prompts (configuring model behavior)
  6. Temperature settings (controlling randomness)
  7. Streaming responses (real-time output)
  8. Output control (stop sequences, max_tokens, top_p, top_k)
  9. Structured data extraction (JSON output)

Key Notes

boto3 Setup and Authentication:

import boto3
import json

# Option 1: Explicit credentials (dev only — never in production code)
client = boto3.client(
    'bedrock-runtime',
    region_name='us-east-1',
    aws_access_key_id='AKIA...',
    aws_secret_access_key='...'
)

# Option 2: Environment variables (recommended for dev)
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION
client = boto3.client('bedrock-runtime', region_name='us-east-1')

# Option 3: IAM role (recommended for production / EC2 / Lambda)
# boto3 automatically picks up role credentials
client = boto3.client('bedrock-runtime', region_name='us-east-1')

# Option 4: Named profile from ~/.aws/credentials
session = boto3.Session(profile_name='my-bedrock-profile')
client = session.client('bedrock-runtime', region_name='us-east-1')

Core Bedrock-Runtime Methods:

MethodBehaviorUse Case
invoke_model()Synchronous — waits for full responseStandard requests
invoke_model_with_response_stream()Streaming — returns chunks progressivelyLong responses, UX
create_model_invocation_job()Async batch — S3 input/outputLarge-scale batch
get_model_invocation_job()Check batch job statusPolling batch jobs

Request Body Format (CRITICAL):

The request body is NOT keyword arguments.
It's a JSON string encoded as bytes, passed as the 'body' parameter.
The modelId is a SEPARATE parameter from the body.

Required structure:

body = {
    "anthropic_version": "bedrock-2023-05-31",  # REQUIRED — do not omit
    "max_tokens": 1024,                           # REQUIRED
    "messages": [                                 # REQUIRED
        {"role": "user", "content": "Hello"}
    ],
    # Optional fields:
    "system": "You are a helpful assistant.",
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 250,
    "stop_sequences": ["###"]
}

response = client.invoke_model(
    modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
    contentType='application/json',
    accept='application/json',
    body=json.dumps(body)   # Must be bytes or string
)

Response Structure:

response = client.invoke_model(...)

# response is a dict with HTTP metadata + 'body' StreamingBody
raw = response['body'].read()          # Read all bytes
result = json.loads(raw)               # Parse JSON

# result structure:
# {
#   "id": "msg_01...",
#   "type": "message",
#   "role": "assistant",
#   "content": [{"type": "text", "text": "..."}],
#   "model": "claude-3-5-sonnet-20241022",
#   "stop_reason": "end_turn",
#   "stop_sequence": null,
#   "usage": {"input_tokens": 25, "output_tokens": 142}
# }

text = result['content'][0]['text']
input_tokens = result['usage']['input_tokens']
output_tokens = result['usage']['output_tokens']

Lesson 2: Basic Request Example

import boto3
import json

client = boto3.client('bedrock-runtime', region_name='us-east-1')
MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'

def invoke_claude(prompt, system=None, max_tokens=1024, temperature=1.0):
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}]
    }
    if system:
        body["system"] = system
    if temperature != 1.0:
        body["temperature"] = temperature

    response = client.invoke_model(
        modelId=MODEL_ID,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )

    result = json.loads(response['body'].read())
    return result['content'][0]['text'], result['usage']

text, usage = invoke_claude("What is the capital of France?")
print(text)
print(f"Tokens used: {usage['input_tokens']} in, {usage['output_tokens']} out")

Lesson 3 & 4: Multi-Turn Conversations and Chatbot

import boto3, json

client = boto3.client('bedrock-runtime', region_name='us-east-1')
MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'

class BedrockChatbot:
    def __init__(self, system_prompt=None):
        self.system = system_prompt
        self.history = []     # List of {"role": ..., "content": ...}
        self.total_tokens = {"input": 0, "output": 0}

    def chat(self, user_message, max_tokens=2048):
        self.history.append({"role": "user", "content": user_message})

        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": self.history
        }
        if self.system:
            body["system"] = self.system

        response = client.invoke_model(
            modelId=MODEL_ID,
            contentType='application/json',
            accept='application/json',
            body=json.dumps(body)
        )

        result = json.loads(response['body'].read())
        assistant_text = result['content'][0]['text']

        self.history.append({"role": "assistant", "content": assistant_text})
        self.total_tokens["input"] += result['usage']['input_tokens']
        self.total_tokens["output"] += result['usage']['output_tokens']

        return assistant_text

    def reset(self):
        self.history = []

# REPL chatbot
bot = BedrockChatbot(system_prompt="You are an expert Python tutor.")
print("Chatbot ready. Type 'quit' to exit, 'reset' to clear history.")
while True:
    user_input = input("You: ").strip()
    if user_input.lower() == 'quit':
        break
    if user_input.lower() == 'reset':
        bot.reset()
        print("History cleared.")
        continue
    response = bot.chat(user_input)
    print(f"Claude: {response}\n")
print(f"Total tokens: {bot.total_tokens}")

Lesson 5: System Prompts

# System prompts configure the model's persona, constraints, and behavior
# They are separate from the messages array — top-level "system" key

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "system": """You are a senior AWS solutions architect.
Always recommend AWS-native services where possible.
Format responses with clear headings and bullet points.
If asked about pricing, always say to check the AWS Pricing Calculator.""",
    "messages": [
        {"role": "user", "content": "How should I deploy a Python API?"}
    ]
}

Lesson 6: Temperature Settings

# temperature: 0.0 to 1.0 (some models accept up to 2.0 via direct API, Bedrock caps at 1.0)
# 0.0 = deterministic, always same output (good for structured tasks)
# 1.0 = maximum diversity (good for creative tasks)

# Deterministic extraction
body_extract = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "temperature": 0.0,    # Fully deterministic
    "messages": [{"role": "user", "content": "Extract the date from: Meeting on Jan 15 2024"}]
}

# Creative writing
body_creative = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "temperature": 1.0,    # Full creativity
    "messages": [{"role": "user", "content": "Write a haiku about cloud computing."}]
}

# top_p: nucleus sampling — consider top P% of probability mass (alternative to temperature)
# top_k: only consider top K tokens at each step
# Do not set both temperature and top_p — choose one

Lesson 7: Streaming Responses

import boto3, json

client = boto3.client('bedrock-runtime', region_name='us-east-1')
MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'

def stream_response(prompt, system=None, max_tokens=2048):
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}]
    }
    if system:
        body["system"] = system

    response = client.invoke_model_with_response_stream(
        modelId=MODEL_ID,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )

    full_text = ""
    input_tokens = 0
    output_tokens = 0

    for event in response['body']:
        chunk_bytes = event['chunk']['bytes']
        chunk = json.loads(chunk_bytes.decode('utf-8'))

        event_type = chunk.get('type', '')

        if event_type == 'content_block_delta':
            # Text delta — print as it arrives
            delta_text = chunk['delta'].get('text', '')
            print(delta_text, end='', flush=True)
            full_text += delta_text

        elif event_type == 'message_delta':
            # Final stats
            output_tokens = chunk.get('usage', {}).get('output_tokens', 0)

        elif event_type == 'message_start':
            # Initial metadata including input token count
            input_tokens = chunk.get('message', {}).get('usage', {}).get('input_tokens', 0)

        elif event_type == 'message_stop':
            # Stream complete
            pass

    print()  # newline after stream
    return full_text, {"input_tokens": input_tokens, "output_tokens": output_tokens}

# Usage
text, usage = stream_response("Explain quantum entanglement in simple terms.")
print(f"\nUsage: {usage}")

Streaming Event Types Reference:

message_start       — first event, contains model + initial usage
content_block_start — begins a content block (text or tool_use)
content_block_delta — text or JSON delta within a block
content_block_stop  — ends a content block
message_delta       — stop_reason, stop_sequence, final output usage
message_stop        — stream complete (last event)

Lesson 8: Output Control

# max_tokens: hard cap on output length (required field)
# stop_sequences: list of strings that end generation when encountered
# top_p: cumulative probability for nucleus sampling
# top_k: limit token selection to top K candidates

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 2048,
    "stop_sequences": ["</answer>", "Human:", "###"],
    "top_p": 0.95,
    "messages": [
        {"role": "user", "content": "Answer this: What is 2+2? Wrap in <answer> tags."}
    ]
}

# The model stops when it would generate a stop sequence
# result['stop_reason'] will be "stop_sequence" if that triggered the stop
# result['stop_sequence'] gives which sequence triggered it

Lesson 9: Structured Data Extraction

import json

def extract_structured_data(text, schema_description):
    """Extract structured data from unstructured text using Claude."""
    prompt = f"""Extract the following information from the text and return as JSON.

Schema: {schema_description}

Text:
{text}

Return ONLY valid JSON, no explanation."""

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "temperature": 0.0,   # Deterministic for extraction
        "messages": [{"role": "user", "content": prompt}]
    }

    response = client.invoke_model(
        modelId=MODEL_ID,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )

    result = json.loads(response['body'].read())
    raw_text = result['content'][0]['text'].strip()

    # Strip markdown code fences if present
    if raw_text.startswith('```'):
        raw_text = raw_text.split('\n', 1)[1].rsplit('```', 1)[0].strip()

    return json.loads(raw_text)

# Example usage
invoice_text = """
Invoice #1234 from Acme Corp, dated March 15 2024.
Total amount: $2,450.00. Payment due April 15 2024.
Contact: billing@acme.com
"""

schema = '{"invoice_number": "string", "vendor": "string", "amount": "number", "due_date": "string"}'
data = extract_structured_data(invoice_text, schema)
print(data)
# {'invoice_number': '1234', 'vendor': 'Acme Corp', 'amount': 2450.00, 'due_date': '2024-04-15'}

Cross-Region Inference:

# Standard model ID — fixed to one region
MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'

# Cross-region inference profile — routes across AZs within geography
# US profiles:
CROSS_REGION_US = 'us.anthropic.claude-3-5-sonnet-20241022-v2:0'

# EU profiles:
CROSS_REGION_EU = 'eu.anthropic.claude-3-5-sonnet-20241022-v2:0'

# When using cross-region, the client region still determines the entry point
client = boto3.client('bedrock-runtime', region_name='us-east-1')
# Then use CROSS_REGION_US as modelId — Bedrock routes intelligently

Error Handling:

import boto3
import json
from botocore.exceptions import ClientError
import time

def invoke_with_retry(client, model_id, body, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.invoke_model(
                modelId=model_id,
                contentType='application/json',
                accept='application/json',
                body=json.dumps(body)
            )
            return json.loads(response['body'].read())

        except ClientError as e:
            error_code = e.response['Error']['Code']

            if error_code == 'ThrottlingException':
                wait = (2 ** attempt) + 1  # Exponential backoff
                print(f"Throttled. Waiting {wait}s before retry {attempt+1}/{max_retries}")
                time.sleep(wait)
                continue

            elif error_code == 'ValidationException':
                raise ValueError(f"Invalid request: {e.response['Error']['Message']}")

            elif error_code == 'AccessDeniedException':
                raise PermissionError("Check IAM permissions and model access in Bedrock console")

            elif error_code == 'ModelNotReadyException':
                raise RuntimeError("Model not available — check Bedrock console model access status")

            else:
                raise

    raise RuntimeError(f"Max retries ({max_retries}) exceeded")

Best Practices

  • Always include "anthropic_version": "bedrock-2023-05-31" in request body (required by Bedrock)
  • Always call response['body'].read() before parsing — StreamingBody can only be read once
  • Handle ThrottlingException with exponential backoff — Bedrock has per-account TPS limits
  • Stream for responses >500 tokens to improve perceived latency
  • Use cross-region inference profiles in production for higher availability
  • Temperature 0.0 for extraction/classification, 0.7-1.0 for generation
  • Set max_tokens conservatively for cost control — it’s a cap, not a target

MODULE 3: Prompt Evaluations

Lesson Coverage

  1. Why evaluations matter (quality assurance for prompts)
  2. Dataset generation strategies
  3. Running evaluations at scale on Bedrock
  4. Grading approaches (exact match, model-based, semantic)
  5. Exercises (build an eval harness)
  6. Assessment review

Key Notes

Evaluation Workflow:

1. Define task clearly with success criteria
2. Create labeled test dataset (golden set)
3. Run model against all examples
4. Apply grading function to each output
5. Compute aggregate metrics
6. Identify failure patterns
7. Iterate on prompt
8. Re-evaluate (never on same examples used to develop prompt)

Why Evals Matter on Bedrock:

  • Model updates happen (new model versions, access to newer Sonnet/Haiku)
  • Prompt regressions are common when changing system prompt
  • Cost/quality tradeoffs — Haiku vs. Sonnet — need data to decide
  • CI/CD integration: run evals on every prompt change before deployment

Test Dataset Design:

  • Minimum 20 examples before drawing conclusions; 50+ for reliable metrics
  • Cover: happy path, edge cases, adversarial inputs, language variants
  • Balance classes for classification tasks (equal positive/negative)
  • Never overlap training examples with evaluation examples
  • Label expected outputs precisely — ambiguous labels produce noisy metrics

Dataset Generation with Claude:

def generate_eval_dataset(task_description, num_examples=20):
    """Use Claude to generate diverse test cases."""
    prompt = f"""Generate {num_examples} diverse test cases for this task: {task_description}

For each test case, provide:
1. An input
2. The expected output
3. The difficulty level (easy/medium/hard)

Format as JSON array: [{{"input": "...", "expected": "...", "difficulty": "..."}}]
Return ONLY the JSON array."""

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 4096,
        "temperature": 1.0,   # Diversity for generation
        "messages": [{"role": "user", "content": prompt}]
    }
    response = client.invoke_model(
        modelId=MODEL_ID,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    result = json.loads(response['body'].read())
    return json.loads(result['content'][0]['text'])

Grading Methods:

MethodWhen to UseCode Pattern
Exact matchControlled outputs (YES/NO, category names)output.strip() == expected
RegexFormat validation (dates, emails, IDs)bool(re.match(pattern, output))
JSON schemaStructured data extractionjsonschema.validate(parsed, schema)
SubstringKeyword presenceexpected.lower() in output.lower()
Semantic similarityParaphrase equivalencecosine_sim(embed(output), embed(expected))
Model-basedOpen-ended qualityAsk Claude to score 1-5

Model-Based Grader:

GRADER_MODEL = 'anthropic.claude-3-haiku-20240307-v1:0'  # Cheap for grading

def model_grade(output, expected, criterion, scale=5):
    """Use Claude to grade an output. Returns score 1-scale."""
    grading_prompt = f"""You are an impartial grader. Evaluate the output against the criterion.

Criterion: {criterion}
Expected: {expected}
Actual Output: {output}

Score the output from 1 to {scale}:
1 = Completely wrong
{scale} = Perfect

Return ONLY a single integer score."""

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 10,
        "temperature": 0.0,
        "messages": [{"role": "user", "content": grading_prompt}]
    }
    response = client.invoke_model(
        modelId=GRADER_MODEL,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    result = json.loads(response['body'].read())
    try:
        return int(result['content'][0]['text'].strip())
    except ValueError:
        return 0   # Failed to parse score

Running Evaluations at Scale:

import concurrent.futures
import json
import time
from datetime import datetime

def run_evaluation(dataset, prompt_template, model_id, grader_fn, max_workers=5):
    """Run full evaluation suite with parallel grading."""

    def evaluate_single(example):
        # Format prompt with template
        prompt = prompt_template.format(input=example['input'])
        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "temperature": 0.0,
            "messages": [{"role": "user", "content": prompt}]
        }
        try:
            response = client.invoke_model(
                modelId=model_id,
                contentType='application/json',
                accept='application/json',
                body=json.dumps(body)
            )
            result = json.loads(response['body'].read())
            output = result['content'][0]['text'].strip()
            score = grader_fn(output, example['expected'])
            return {
                "input": example['input'],
                "expected": example['expected'],
                "output": output,
                "score": score,
                "tokens": result['usage']
            }
        except Exception as e:
            return {"input": example['input'], "error": str(e), "score": 0}

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(evaluate_single, dataset))

    # Compute metrics
    scores = [r['score'] for r in results if 'score' in r]
    total_tokens = sum(
        r.get('tokens', {}).get('input_tokens', 0) + r.get('tokens', {}).get('output_tokens', 0)
        for r in results
    )

    summary = {
        "timestamp": datetime.now().isoformat(),
        "model_id": model_id,
        "n_examples": len(dataset),
        "accuracy": sum(1 for s in scores if s >= 1) / len(scores),
        "avg_score": sum(scores) / len(scores) if scores else 0,
        "total_tokens": total_tokens,
        "failures": [r for r in results if 'error' in r]
    }

    return results, summary

Key Metrics to Track:

  • Accuracy: Percentage of examples meeting passing threshold
  • Precision / Recall / F1: For classification tasks
  • Average score: When using 1-5 scale grading
  • p95 latency: 95th percentile response time
  • Cost per correct answer: total_tokens × price_per_token / n_correct
  • Regression rate: % of examples that passed before but fail after prompt change

Best Practices

  • Never evaluate on examples used during prompt development (train/test split)
  • Use cheap models (Haiku) for grading to reduce evaluation cost
  • Log all evaluation runs: timestamp, model ID, prompt version, all scores
  • Compare prompt versions side-by-side on identical test sets
  • Run evals in CI/CD: gate deployments on minimum accuracy threshold
  • Use parallel execution (ThreadPoolExecutor) to speed up large evals
  • Validate your grader itself: manually check 10% of grader judgments

MODULE 4: Prompt Engineering

Lesson Coverage

  1. Direct instructions (clarity, specificity)
  2. Specificity and constraints (format, length, style)
  3. XML structure for complex prompts
  4. Few-shot examples
  5. Exercises
  6. Quiz and assessment

Key Notes

Core Principles:

Vague:   "Summarize this article."
Better:  "Summarize this article in 3 bullet points. Each point max 20 words."
Best:    "Summarize this article in exactly 3 bullet points. Use active voice.
          Each bullet must start with a strong verb. Max 20 words each."

Clarity Checklist:

  • Specify output format (bullet list, JSON, prose paragraph, numbered steps)
  • Specify output length (characters, words, sentences, tokens)
  • Specify constraints (tone, reading level, language, persona)
  • Use affirmative instructions (“always do X”) over negative (“don’t do Y”)
  • Put the most important instruction first or last (primacy and recency effects)

XML Tags for Complex Prompts:

<instructions>
Classify the customer review as POSITIVE, NEGATIVE, or NEUTRAL.
Rules:
1. If the review mentions both pros and cons equally, use NEUTRAL
2. Ignore overall star rating — focus only on the text sentiment
3. Respond with ONLY the classification word, nothing else
</instructions>

<examples>
<example>
<input>Amazing product! Shipped fast and works perfectly.</input>
<output>POSITIVE</output>
</example>
<example>
<input>The quality is decent but the price is way too high for what you get.</input>
<output>NEUTRAL</output>
</example>
<example>
<input>Terrible. Broke after 2 days. Don't buy this.</input>
<output>NEGATIVE</output>
</example>
</examples>

<review>
{customer_review}
</review>

Benefits of XML structure:

  • Clear separation of instructions, examples, and variable content
  • Easier to update individual sections without breaking others
  • Claude is trained to pay attention to XML tag boundaries
  • Prevents variable content from contaminating instructions

Few-Shot Examples:

# Few-shot improves consistency for pattern-following tasks
# Include 2-5 examples that exactly match desired output format

few_shot_prompt = """Convert natural language to SQL.

Input: Get all users who signed up after January 2024
SQL: SELECT * FROM users WHERE created_at > '2024-01-01';

Input: Find the top 10 products by revenue
SQL: SELECT product_id, SUM(revenue) as total FROM orders GROUP BY product_id ORDER BY total DESC LIMIT 10;

Input: {user_query}
SQL:"""

Chain of Thought:

# Standard CoT: ask model to think before answering
cot_prompt = """Solve this step by step, then give the final answer.

Problem: {problem}

Show your reasoning in <thinking> tags, then the answer in <answer> tags."""

# Zero-shot CoT trigger phrases:
# - "Let's think step by step."
# - "Work through this carefully."
# - "Think before answering."

# Structured CoT with stop sequence
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 2048,
    "stop_sequences": ["</answer>"],   # Stop after getting the answer
    "messages": [{"role": "user", "content": cot_prompt.format(problem="...")}]
}

Role Prompting:

system_roles = {
    "code_reviewer": "You are a senior software engineer with 15 years of experience. Review code for bugs, security issues, and style. Be direct and specific.",
    "legal_analyst": "You are a paralegal assistant. Identify legal issues in contracts. Always note that this is not legal advice and recommend consulting a lawyer.",
    "data_scientist": "You are a senior data scientist. Analyze data, suggest statistical approaches, and write production-quality Python code using pandas, numpy, and scikit-learn.",
}

Output Format Techniques:

# Force JSON output
json_prompt = """Extract entity information from the text.
Return a JSON object with these keys: name, type, description.
Return ONLY the JSON object, no other text.

Text: {text}"""

# Force specific structure with prefill
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [
        {"role": "user", "content": json_prompt},
        {"role": "assistant", "content": "{"}  # Prefill forces JSON start
    ]
}

Temperature Guide:

Task TypeTemperatureWhy
Data extraction0.0Must be deterministic
Classification0.0Consistent categories
Code generation0.0–0.3Correct syntax matters
Summarization0.3–0.5Some variation OK
Q&A0.3–0.7Factual but not robotic
Creative writing0.7–1.0Diversity is valuable
Brainstorming0.9–1.0Maximum variety

Best Practices

  • Start simple, add complexity only when needed (KISS principle)
  • Test prompts on edge cases: empty input, Unicode, very long text, adversarial inputs
  • Separate dynamic content from static instructions using XML tags
  • Few-shot examples must exactly match the output format you want (including whitespace)
  • Use temperature=0 and store the prompt in version control — treat prompts as code
  • When using prefill (pre-filling assistant response), verify it doesn’t hurt quality
  • Stop sequences are powerful — use </answer> to extract just the needed section

MODULE 5: Tool Use

Lesson Coverage

  1. Introduction to tool use (function calling)
  2. Defining functions as tools
  3. JSON Schema for tool parameters
  4. Handling tool_use responses
  5. Multi-turn tool conversations (the loop)
  6. Multiple tools and parallel calls
  7. Batch tool operations
  8. Structured extraction via tools

Key Notes

Tool Use Message Flow:

┌─────────────────────────────────────────────────────────┐
│  1. Client sends: messages + tools list                  │
│  2. Claude responds: stop_reason="tool_use"              │
│     content: [text_block?, tool_use_block, ...]          │
│  3. Client executes the tool                             │
│  4. Client sends: original messages + assistant response │
│                   + user message with tool_result        │
│  5. Claude responds: stop_reason="end_turn"              │
│     content: [text_block with final answer]              │
└─────────────────────────────────────────────────────────┘

Tool Schema Definition (JSON Schema):

tools = [
    {
        "name": "get_weather",                    # Snake_case, clear verb+noun
        "description": """Get current weather for a city.
Returns temperature in specified unit and weather condition.
Use this when the user asks about current weather in any location.""",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'New York' or 'London'"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit. Default to celsius."
                }
            },
            "required": ["city"]   # unit is optional
        }
    },
    {
        "name": "search_database",
        "description": "Search the product database by keyword or SKU",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search term"},
                "limit": {
                    "type": "integer",
                    "description": "Max results to return",
                    "minimum": 1,
                    "maximum": 50,
                    "default": 10
                },
                "category": {
                    "type": "string",
                    "enum": ["electronics", "clothing", "books", "all"],
                    "default": "all"
                }
            },
            "required": ["query"]
        }
    }
]

Detecting and Handling tool_use Responses:

def parse_tool_calls(response):
    """Extract tool calls from Claude's response content."""
    tool_calls = []
    text_blocks = []

    for block in response['content']:
        if block['type'] == 'tool_use':
            tool_calls.append({
                'id': block['id'],          # Must be preserved for tool_result
                'name': block['name'],
                'input': block['input']     # Dict matching tool's input_schema
            })
        elif block['type'] == 'text':
            text_blocks.append(block['text'])

    return tool_calls, text_blocks

Building Tool Results Message:

def build_tool_results_message(tool_results_map):
    """Build the user message containing all tool results.
    tool_results_map: {tool_use_id: result_data}
    """
    content = []
    for tool_use_id, result in tool_results_map.items():
        content.append({
            "type": "tool_result",
            "tool_use_id": tool_use_id,         # Must match the tool_use block's id
            "content": json.dumps(result)        # String or list of content blocks
        })
    return {"role": "user", "content": content}

# Error in tool execution
def build_tool_error(tool_use_id, error_message):
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": f"Error: {error_message}",
        "is_error": True    # Signals to Claude that the tool failed
    }

Complete Tool Loop (Lesson 5):

import boto3, json

client = boto3.client('bedrock-runtime', region_name='us-east-1')
MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'

# Define available tools
tools = [
    {
        "name": "get_stock_price",
        "description": "Get the current stock price for a ticker symbol",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string", "description": "Stock ticker symbol, e.g. AAPL"}
            },
            "required": ["ticker"]
        }
    },
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression to evaluate"}
            },
            "required": ["expression"]
        }
    }
]

# Mock tool implementations
def get_stock_price(ticker):
    prices = {"AAPL": 182.50, "GOOGL": 141.80, "AMZN": 175.30}
    price = prices.get(ticker.upper())
    if price is None:
        return {"error": f"Unknown ticker: {ticker}"}
    return {"ticker": ticker, "price": price, "currency": "USD"}

def calculate(expression):
    try:
        # Use eval carefully — in production, use a safe math parser
        result = eval(expression, {"__builtins__": {}}, {"abs": abs, "round": round})
        return {"result": result}
    except Exception as e:
        return {"error": str(e)}

def dispatch_tool(name, tool_input):
    """Route tool call to implementation."""
    if name == "get_stock_price":
        return get_stock_price(**tool_input)
    elif name == "calculate":
        return calculate(**tool_input)
    else:
        return {"error": f"Unknown tool: {name}"}

def run_agent(user_message, max_iterations=10):
    """Run the tool use loop until end_turn."""
    messages = [{"role": "user", "content": user_message}]
    iteration = 0

    while iteration < max_iterations:
        iteration += 1

        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 4096,
            "tools": tools,
            "messages": messages
        }

        response = json.loads(client.invoke_model(
            modelId=MODEL_ID,
            contentType='application/json',
            accept='application/json',
            body=json.dumps(body)
        )['body'].read())

        stop_reason = response['stop_reason']

        # Always add assistant response to history
        messages.append({"role": "assistant", "content": response['content']})

        if stop_reason == 'end_turn':
            # Extract final text
            for block in response['content']:
                if block['type'] == 'text':
                    return block['text']
            return ""

        elif stop_reason == 'tool_use':
            # Execute all tool calls
            tool_results = []
            for block in response['content']:
                if block['type'] == 'tool_use':
                    print(f"  Calling tool: {block['name']}({block['input']})")
                    result = dispatch_tool(block['name'], block['input'])
                    print(f"  Result: {result}")
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block['id'],
                        "content": json.dumps(result)
                    })

            # Add tool results as user message
            messages.append({"role": "user", "content": tool_results})

        else:
            print(f"Unexpected stop_reason: {stop_reason}")
            break

    return "Max iterations reached"

# Test
result = run_agent("What is AAPL's current stock price? And what's 15% of that price?")
print(result)

tool_choice Parameter:

# Auto (default): Claude decides when to call tools
body["tool_choice"] = {"type": "auto"}

# Any: Claude must call at least one tool
body["tool_choice"] = {"type": "any"}

# Force specific tool
body["tool_choice"] = {"type": "tool", "name": "get_weather"}

# Disable tools entirely (even if defined)
body["tool_choice"] = {"type": "none"}

Parallel Tool Calls (Multiple Tools, Lesson 6):

# Claude may return MULTIPLE tool_use blocks in one response
# This means Claude wants to call multiple tools in parallel
# You must execute ALL of them and return ALL results before continuing

def handle_parallel_tools(response_content, dispatch_fn):
    """Handle multiple parallel tool calls."""
    tool_results = []

    # Collect all tool_use blocks
    tool_calls = [b for b in response_content if b['type'] == 'tool_use']

    if not tool_calls:
        return None

    # Execute all in parallel
    import concurrent.futures
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = {
            executor.submit(dispatch_fn, tc['name'], tc['input']): tc['id']
            for tc in tool_calls
        }
        for future, tool_use_id in futures.items():
            result = future.result()
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": json.dumps(result)
            })

    return {"role": "user", "content": tool_results}

Structured Extraction via Tools (Lesson 8):

# Using tools for structured extraction is more reliable than parsing free-form JSON
# Forcing the tool yields a tool_use block shaped by the schema; still validate the values

extraction_tool = {
    "name": "extract_invoice",
    "description": "Extract invoice data from the provided text",
    "input_schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "vendor_name": {"type": "string"},
            "total_amount": {"type": "number"},
            "due_date": {"type": "string", "format": "date"},
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "quantity": {"type": "number"},
                        "unit_price": {"type": "number"}
                    }
                }
            }
        },
        "required": ["invoice_number", "vendor_name", "total_amount"]
    }
}

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "tools": [extraction_tool],
    "tool_choice": {"type": "tool", "name": "extract_invoice"},  # Force it
    "messages": [{"role": "user", "content": f"Extract data from: {invoice_text}"}]
}

response = json.loads(client.invoke_model(
    modelId=MODEL_ID, contentType='application/json',
    accept='application/json', body=json.dumps(body)
)['body'].read())

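# With tool_choice forced, the response contains a tool_use block. Locate it by type
# rather than by position, and check required fields before trusting the data.
extracted = next(b['input'] for b in response['content'] if b['type'] == 'tool_use')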

Bedrock Batch Inference (Lesson 7):

# For high-volume, non-time-sensitive workloads
# Input: S3 JSONL file (one request per line, each with a recordId and a modelInput)
# Output: S3 JSONL file (one result per line; match results to requests by recordId)

import boto3, json

bedrock = boto3.client('bedrock', region_name='us-east-1')

# Create batch job
job = bedrock.create_model_invocation_job(
    jobName='my-batch-job-001',
    modelId='anthropic.claude-3-haiku-20240307-v1:0',
    inputDataConfig={
        "s3InputDataConfig": {
            "s3Uri": "s3://my-bucket/batch-input/requests.jsonl",
            "s3InputFormat": "JSONL"
        }
    },
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": "s3://my-bucket/batch-output/"
        }
    },
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchRole"   # Placeholder; account IDs have 12 digits
)

job_arn = job['jobArn']

# Poll for completion
import time
while True:
    status = bedrock.get_model_invocation_job(jobIdentifier=job_arn)
    state = status['status']
    if state == 'Completed':
        break
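    elif state in ('Failed', 'Stopped', 'Expired', 'PartiallyCompleted'):
        # Treat every other terminal state as a failure so the loop cannot spin forever
        raise RuntimeError(f"Batch job did not complete successfully: {state}")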
    print(f"Job status: {state}")
    time.sleep(30)
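
The input file for a batch job can be built along the following lines. This is a minimal sketch that assumes the `recordId` / `modelInput` record layout from the Bedrock batch inference documentation; the `build_batch_records` helper and the `REC00000000` ID scheme are illustrative, not part of boto3.

```python
import json


def build_batch_records(prompts, max_tokens=512):
    """Return JSONL text with one batch record per prompt."""
    lines = []
    for i, prompt in enumerate(prompts):
        record = {
            "recordId": f"REC{i:08d}",   # hypothetical ID scheme, used to match outputs
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


jsonl_text = build_batch_records(["Summarize document A", "Summarize document B"])
```

The resulting text would be uploaded to the S3 location given in `inputDataConfig` before the job is created.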

Best Practices

  • Write clear tool descriptions — Claude uses descriptions to decide when and how to call tools
  • Handle stop_reason="tool_use" in a loop until stop_reason="end_turn"
  • Validate tool inputs before executing (treat model output as untrusted)
  • Return structured errors with "is_error": true so Claude can adapt gracefully
  • Use parallel execution when Claude returns multiple tool_use blocks simultaneously
  • Use tool_choice={"type": "tool", "name": "..."} for guaranteed extraction use cases
  • Set a max iteration limit to prevent infinite tool loops
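
The "validate tool inputs" practice above could be sketched as follows. `validate_tool_input` is a hypothetical helper that covers only a small subset of JSON Schema (required, type, enum, minimum, maximum on top-level properties); a production system would more likely rely on a full JSON Schema validator.

```python
def validate_tool_input(tool_input, schema):
    """Return a list of problems found; an empty list means the input passed."""
    type_map = {"string": str, "integer": int, "number": (int, float), "boolean": bool}
    errors = []
    props = schema.get("properties", {})

    for name in schema.get("required", []):
        if name not in tool_input:
            errors.append(f"missing required field: {name}")

    for name, value in tool_input.items():
        spec = props.get(name)
        if spec is None:
            errors.append(f"unexpected field: {name}")
            continue
        expected = type_map.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it for non-boolean types
        if expected and (not isinstance(value, expected)
                         or (isinstance(value, bool) and spec["type"] != "boolean")):
            errors.append(f"{name}: expected {spec['type']}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: must be one of {spec['enum']}")
        if "minimum" in spec and value < spec["minimum"]:
            errors.append(f"{name}: below minimum {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            errors.append(f"{name}: above maximum {spec['maximum']}")
    return errors


# Example schema, modeled loosely on a search tool's input_schema
schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer", "minimum": 1, "maximum": 50},
    },
    "required": ["query"],
}
```

When the returned list is non-empty, the errors can be sent back in a `tool_result` with `"is_error": True` so Claude can correct the call.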

MODULE 6: RAG (Retrieval-Augmented Generation)

Lesson Coverage

  1. RAG introduction and architecture
  2. Chunking strategies
  3. Embeddings (Amazon Titan, Cohere on Bedrock)
  4. Full RAG workflow (index + query)
  5. BM25 keyword search
  6. Multi-search pipelines (hybrid)
  7. Reranking
  8. Contextual retrieval (Anthropic technique)

Key Notes

RAG Architecture:

INDEXING PIPELINE (offline):
Documents → Chunker → [chunk1, chunk2, ...] → Embedder → Vector Store

QUERY PIPELINE (online):
Query → Embed Query → Vector Search ─┐
Query ─────────────── BM25 Search  ──┤ → Merge/Rerank → Top-K → Claude → Answer
                                      └─ Hybrid RRF

Lesson 2: Chunking Strategies:

Strategy      Description                               When to Use
Fixed-size    Split every N chars/tokens                Simple, consistent corpus
Sentence      Split at sentence boundaries              Better context preservation
Paragraph     Split at blank lines                      Long-form documents
Semantic      Split at section headings                 Structured docs (manuals, wikis)
Recursive     Try paragraph, then sentence, then fixed  General purpose best practice
Overlapping   Slide window with N-token overlap         Prevent boundary cut-offs

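def chunk_text_recursive(text, max_chunk_size=500, overlap=50):
    """Recursive character splitting with overlap."""
    separators = ["\n\n", "\n", ". ", " "]

    def split(text, separators):
        if len(text) <= max_chunk_size:
            return [text] if text else []

        # No separators left: fall back to fixed-size slices (no text is dropped)
        if not separators:
            return [text[i:i + max_chunk_size] for i in range(0, len(text), max_chunk_size)]

        sep, rest = separators[0], separators[1:]
        if sep not in text:
            return split(text, rest)

        result = []
        current = ""
        for part in text.split(sep):
            candidate = current + sep + part if current else part
            if len(candidate) <= max_chunk_size:
                current = candidate
                continue
            if current:
                result.append(current)
                current = ""
            if len(part) <= max_chunk_size:
                current = part
            else:
                # Part is still too large: recurse with the finer separators
                result.extend(split(part, rest))
        if current:
            result.append(current)
        return result

    raw_chunks = split(text, separators)

    # Prepend the tail of the previous chunk; final chunks may exceed
    # max_chunk_size by up to overlap + 1 characters
    chunks = []
    for i, chunk in enumerate(raw_chunks):
        if i > 0 and overlap > 0:
            chunk = raw_chunks[i - 1][-overlap:] + " " + chunk
        chunks.append(chunk.strip())

    return chunks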
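
For comparison with the recursive splitter, the "Fixed-size" and "Overlapping" rows of the table can be combined into a plain sliding window. This is a minimal sketch that works on characters rather than tokens; `chunk_fixed_with_overlap` is an illustrative name.

```python
def chunk_fixed_with_overlap(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij" * 5          # 50 characters
windows = chunk_fixed_with_overlap(sample, chunk_size=20, overlap=5)
```

Each window starts `chunk_size - overlap` characters after the previous one, so the last `overlap` characters of one chunk reappear at the start of the next.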

Lesson 3: Embeddings on Bedrock:

import boto3, json
import numpy as np

bedrock_client = boto3.client('bedrock-runtime', region_name='us-east-1')

def embed_text(text, model_id='amazon.titan-embed-text-v2:0'):
    """Generate embedding vector for text using Amazon Titan."""
    body = {
        "inputText": text,
        "dimensions": 1024,      # 256, 512, or 1024 (Titan V2)
        "normalize": True        # Normalize to unit vector for cosine similarity
    }
    response = bedrock_client.invoke_model(
        modelId=model_id,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    result = json.loads(response['body'].read())
    return np.array(result['embedding'])

def embed_batch(texts, model_id='amazon.titan-embed-text-v2:0'):
    """Embed multiple texts (sequential for Titan — no batch endpoint)."""
    return [embed_text(t, model_id) for t in texts]

# Cohere Embed on Bedrock (better multilingual)
def embed_cohere(text, input_type='search_document'):
    body = {
        "texts": [text],
        "input_type": input_type,   # 'search_document' for indexing, 'search_query' for queries
        "truncate": "END"
    }
    response = bedrock_client.invoke_model(
        modelId='cohere.embed-multilingual-v3',
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    return json.loads(response['body'].read())['embeddings'][0]

Embedding Models Comparison:

Model ID                        Dims           Best For
amazon.titan-embed-text-v2:0    256/512/1024   General English, adjustable size
amazon.titan-embed-text-v1      1536           Legacy, fixed size
cohere.embed-multilingual-v3    1024           Multilingual content
cohere.embed-english-v3         1024           English, higher quality

Lesson 4: Full RAG Workflow (Simple In-Memory):

import numpy as np
from typing import List, Dict

class SimpleVectorStore:
    def __init__(self):
        self.chunks = []
        self.embeddings = []
        self.metadata = []

    def add(self, text: str, embedding: np.ndarray, metadata: Dict = None):
        self.chunks.append(text)
        self.embeddings.append(embedding)
        self.metadata.append(metadata or {})

    def search(self, query_embedding: np.ndarray, top_k: int = 5) -> List[Dict]:
        if not self.embeddings:
            return []

        # Cosine similarity
        matrix = np.array(self.embeddings)
        scores = matrix @ query_embedding  # Assumes normalized vectors
        top_indices = np.argsort(scores)[::-1][:top_k]

        return [
            {
                "text": self.chunks[i],
                "score": float(scores[i]),
                "metadata": self.metadata[i]
            }
            for i in top_indices
        ]

def build_rag_index(documents, chunk_size=500):
    """Index documents into vector store."""
    store = SimpleVectorStore()
    for doc in documents:
        chunks = chunk_text_recursive(doc['text'], chunk_size)
        for i, chunk in enumerate(chunks):
            embedding = embed_text(chunk)
            store.add(chunk, embedding, {
                "source": doc.get('title', 'unknown'),
                "chunk_index": i
            })
    return store

def rag_query(question, store, top_k=5):
    """Retrieve relevant chunks and generate answer."""
    # 1. Embed the question
    q_embedding = embed_text(question)

    # 2. Retrieve top-K chunks
    results = store.search(q_embedding, top_k=top_k)

    # 3. Build context
    context = "\n\n---\n\n".join([
        f"Source: {r['metadata'].get('source', 'unknown')}\n{r['text']}"
        for r in results
    ])

    # 4. Generate answer
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "system": "Answer questions based only on the provided context. If the answer is not in the context, say so.",
        "messages": [{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}"
        }]
    }

    response = bedrock_client.invoke_model(
        modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    result = json.loads(response['body'].read())
    return result['content'][0]['text'], results
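
The `search` method above scores chunks with a plain dot product, which equals cosine similarity only when the vectors have unit length (hence `"normalize": True` in the embedding request). The toy example below, using hand-picked 2-D vectors instead of real embeddings, is a sketch of why that assumption matters.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity computed from raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
raw_dot = dot(a, b)                            # depends on vector length
cos = cosine_similarity(a, b)                  # length-independent
unit_dot = dot(normalize(a), normalize(b))     # matches cos up to rounding
```

If embeddings from a model that does not normalize its output are stored, dividing by the norms (or normalizing once at index time) keeps the ranking meaningful.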

Lesson 5: BM25 Keyword Search:

# BM25 = Best Match 25 — probabilistic TF-IDF variant
# Great for exact keyword matches, product codes, names

# Install: pip install rank_bm25
from rank_bm25 import BM25Okapi
import re

def tokenize(text):
    return re.findall(r'\w+', text.lower())

class BM25Index:
    def __init__(self, documents):
        self.docs = documents
        tokenized = [tokenize(d['text']) for d in documents]
        self.bm25 = BM25Okapi(tokenized)

    def search(self, query, top_k=10):
        tokens = tokenize(query)
        scores = self.bm25.get_scores(tokens)
        top_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
        return [
            {"text": self.docs[i]['text'], "score": float(scores[i]), "metadata": self.docs[i].get('metadata', {})}
            for i in top_indices if scores[i] > 0
        ]
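
To make the scoring less of a black box, here is a from-scratch sketch of a common textbook BM25 formula. It is not the exact implementation inside `rank_bm25` (whose IDF handling differs in details), and the `k1` and `b` values are simply widely used defaults.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Score each document against the query with a textbook BM25 formula."""
    n_docs = len(tokenized_docs)
    avg_len = sum(len(d) for d in tokenized_docs) / n_docs
    # Document frequency: number of documents containing each term
    doc_freq = Counter(term for d in tokenized_docs for term in set(d))

    scores = []
    for doc in tokenized_docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            n_t = doc_freq[term]
            # Rare terms get a higher inverse document frequency
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1)
            # Longer-than-average documents are penalized via b
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "usb c cable two meters".split(),
    "wireless mouse with usb receiver".split(),
    "cotton t shirt blue".split(),
]
scores = bm25_scores("usb cable".split(), docs)
```

The first document matches both query terms and scores highest, the second matches only "usb", and the third scores zero.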

Lesson 6: Hybrid Search with Reciprocal Rank Fusion (RRF):

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge multiple ranked result lists using RRF.
    k=60 is the standard constant — higher k = less weight to top ranks.
    """
    scores = {}  # doc_id -> RRF score

    for results in result_lists:
        for rank, result in enumerate(results):
            doc_id = result.get('id') or result['text'][:50]  # Use text prefix as id
            if doc_id not in scores:
                scores[doc_id] = {"score": 0.0, "data": result}
            scores[doc_id]["score"] += 1.0 / (k + rank + 1)

    return sorted(scores.values(), key=lambda x: x["score"], reverse=True)

def hybrid_search(query, vector_store, bm25_index, top_k=10, final_k=5):
    """Run vector + BM25 search and merge with RRF."""
    q_embedding = embed_text(query)

    vector_results = vector_store.search(q_embedding, top_k=top_k)
    bm25_results = bm25_index.search(query, top_k=top_k)

    # Add ID for merging
    for i, r in enumerate(vector_results):
        r['id'] = r['text'][:80]
    for i, r in enumerate(bm25_results):
        r['id'] = r['text'][:80]

    merged = reciprocal_rank_fusion([vector_results, bm25_results])
    return [m['data'] for m in merged[:final_k]]
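
A small worked example may help with the RRF arithmetic. The sketch below re-implements the same formula over bare document IDs so it runs without a vector store; the rankings are invented for illustration.

```python
def rrf_merge(ranked_id_lists, k=60):
    """Return (doc_id, score) pairs sorted by descending RRF score."""
    scores = {}
    for ranked_ids in ranked_id_lists:
        for rank, doc_id in enumerate(ranked_ids):      # rank is 0-based
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_ranking = ["doc_a", "doc_b", "doc_c"]   # hypothetical vector search order
bm25_ranking = ["doc_b", "doc_d", "doc_a"]     # hypothetical BM25 order

merged = rrf_merge([vector_ranking, bm25_ranking])
# doc_b: 1/62 + 1/61, doc_a: 1/61 + 1/63, doc_d: 1/62, doc_c: 1/63
```

Here `doc_b` wins because it ranks near the top of both lists, while documents found by only one retriever fall to the bottom regardless of their raw scores.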

Lesson 7: Reranking:

def rerank_with_bedrock(query, documents, top_k=5):
    """Rerank retrieved documents using the Amazon Bedrock Rerank API."""
    # Confirm the rerank model is available in the region you target
    bedrock_agent = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

    response = bedrock_agent.rerank(
        # The query is passed via `queries`, one entry per query text
        queries=[{"type": "TEXT", "textQuery": {"text": query}}],
        sources=[
            {
                "type": "INLINE",
                "inlineDocumentSource": {
                    "type": "TEXT",
                    "textDocument": {"text": doc['text']}
                }
            }
            for doc in documents
        ],
        rerankingConfiguration={
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {
                "modelConfiguration": {
                    "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.rerank-v1:0"
                },
                "numberOfResults": top_k
            }
        }
    )

    # Each result carries the index of the document in the original `sources` list
    return [
        {
            "text": documents[item['index']]['text'],
            "rerank_score": item['relevanceScore'],
            "original_rank": item['index']
        }
        for item in sorted(response['results'], key=lambda x: x['relevanceScore'], reverse=True)
    ]

Lesson 8: Contextual Retrieval (Anthropic Technique):

def add_chunk_context(chunk, full_document, model_id='anthropic.claude-3-haiku-20240307-v1:0'):
    """Prepend contextual summary to each chunk before embedding.
    Significantly improves retrieval accuracy by giving chunks document context.
    """
    doc_excerpt = full_document[:3000]  # Truncate to avoid context overflow
    prompt = f"""Here is a document:
<document>
{doc_excerpt}
</document>

Here is one chunk from this document:
<chunk>
{chunk}
</chunk>

Write a short 1-2 sentence context for this chunk that explains where it fits within the full document.
Return ONLY the context sentences, no preamble."""

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 100,
        "temperature": 0.0,
        "messages": [{"role": "user", "content": prompt}]
    }
    response = bedrock_client.invoke_model(
        modelId=model_id,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    context = json.loads(response['body'].read())['content'][0]['text']
    return f"{context}\n\n{chunk}"   # Prepend context to chunk

# Bedrock Knowledge Bases (Managed RAG)
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

def kb_retrieve_and_generate(query, knowledge_base_id, model_arn):
    """Use Bedrock Knowledge Bases for fully managed RAG."""
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={"text": query},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {"numberOfResults": 10}
                }
            }
        }
    )
    return {
        "answer": response['output']['text'],
        "citations": response.get('citations', [])
    }

RAG Pipeline Decision Tree:

Do you want managed infrastructure?
  YES → Use Bedrock Knowledge Bases (S3 → auto-chunked → OpenSearch → managed query)
  NO  → Build custom pipeline:
           Do you have multilingual content? → Use Cohere Embed
           Do you have English-only content? → Use Amazon Titan Embed V2
           Do queries use exact keywords? → Add BM25 + hybrid search
           Do you have high retrieval quality requirements? → Add reranking
           Do you have large documents? → Use contextual retrieval

Best Practices

  • Retrieve more than needed (top-20 or top-50), then rerank to final top-5
  • Keep total context under 10K tokens to avoid attention degradation
  • Include source metadata (doc title, page, section) in retrieved chunks
  • Use hybrid search (vector + BM25) by default — pure vector misses exact terms
  • Apply contextual retrieval for large document corpora (10K+ chunks)
  • For production: Bedrock Knowledge Bases reduces ops overhead significantly
  • Test retrieval quality independently from generation quality

MODULE 7: Features of Claude

Lesson Coverage

  1. Extended thinking (multi-step reasoning)
  2. Image support (vision)
  3. PDF support
  4. Citations
  5. Prompt caching rules and mechanics
  6. Practical implementation

Key Notes

Lesson 1: Extended Thinking:

# Extended thinking allows Claude to reason before answering
# Adds "thinking" content blocks to the response
# Use for: math, logic puzzles, multi-step planning, code analysis

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 16000,   # Must be > budget_tokens
    "thinking": {
        "type": "enabled",
        "budget_tokens": 10000   # Max tokens for reasoning (minimum 1024, must be < max_tokens)
    },
    "messages": [{"role": "user", "content": "Solve: A train leaves at 60mph..."}]
}

response = json.loads(client.invoke_model(
    # Extended thinking needs Claude 3.7 Sonnet or later; Claude 3.5 models do not support it
    modelId='us.anthropic.claude-3-7-sonnet-20250219-v1:0',
    contentType='application/json',
    accept='application/json',
    body=json.dumps(body)
)['body'].read())

for block in response['content']:
    if block['type'] == 'thinking':
        print("REASONING:", block['thinking'])   # Claude's internal reasoning
    elif block['type'] == 'text':
        print("ANSWER:", block['text'])           # Final answer

Extended Thinking Rules:

  • max_tokens must exceed budget_tokens (max_tokens = budget_tokens + expected_answer_tokens)
  • Temperature is fixed at 1 when thinking is enabled (cannot customize)
  • Cannot set top_k with thinking; top_p is limited to values between 0.95 and 1
  • Streaming works with thinking — you get thinking deltas in stream
  • Model decides how much of the budget to actually use

Lesson 2: Vision (Image Input):

import base64

def invoke_with_image(image_path, text_prompt, media_type='image/png'):
    with open(image_path, 'rb') as f:
        image_data = base64.b64encode(f.read()).decode('utf-8')

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": media_type,   # image/jpeg, image/png, image/gif, image/webp
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": text_prompt
                }
            ]
        }]
    }

    response = client.invoke_model(
        modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    return json.loads(response['body'].read())['content'][0]['text']

# Multiple images in one request
def compare_images(image_path1, image_path2):
    def load_image(path):
        with open(path, 'rb') as f:
            return base64.b64encode(f.read()).decode('utf-8')

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": load_image(image_path1)}},
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": load_image(image_path2)}},
                {"type": "text", "text": "Compare these two images. What are the main differences?"}
            ]
        }]
    }
    # ... invoke_model ...

Vision Limits on Bedrock:

  • Max 3.75 MB per image on Bedrock, up to 8,000 px per side (the direct Anthropic API allows 5 MB)
  • Max 20 images per request
  • Supported: JPEG, PNG, GIF, WebP
  • Each image consumes input tokens (cost varies by resolution)
  • Images are not stored between requests; include them again in every request that needs them
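
Because a wrong `media_type` can cause a request to be rejected, it may help to detect the format from the file's leading bytes instead of hard-coding `image/png`. The helper below is a sketch covering only the four supported formats; `sniff_image_media_type` is an illustrative name.

```python
def sniff_image_media_type(data: bytes):
    """Return the media type implied by the file's leading bytes, or None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP files are RIFF containers with "WEBP" at byte offset 8
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None
```

A caller could read the file once, pass the bytes to this function, and refuse to send the request when it returns `None`.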

Lesson 3: PDF Support:

import base64

def invoke_with_pdf(pdf_path, question):
    with open(pdf_path, 'rb') as f:
        pdf_data = base64.b64encode(f.read()).decode('utf-8')

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [{
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": question
                }
            ]
        }]
    }

    response = client.invoke_model(
        modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    return json.loads(response['body'].read())['content'][0]['text']

answer = invoke_with_pdf('quarterly_report.pdf', 'What was the total revenue in Q3?')

Lesson 4: Citations:

# Citations make Claude reference the documents it used in its answer
# Requires documents to be passed as document blocks (not plain text in content)

def rag_with_citations(question, documents):
    content = []

    # Add each document as a document block
    for doc in documents:
        content.append({
            "type": "document",
            "source": {
                "type": "text",
                "media_type": "text/plain",
                "data": doc['text']
            },
            "title": doc.get('title', 'Document'),
            "context": doc.get('source', ''),   # Optional source context
            "citations": {"enabled": True}
        })

    content.append({"type": "text", "text": question})

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [{"role": "user", "content": content}]
    }

    response = client.invoke_model(
        modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )

    result = json.loads(response['body'].read())

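    # Cited text blocks carry a 'citations' list that points back to the source documents
    for block in result['content']:
        if block['type'] == 'text':
            print("Answer:", block['text'])
            for cite in block.get('citations') or []:
                print(f"  Source [{cite.get('document_index')}]: {cite.get('cited_text')}")

    return result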

Lesson 5 & 6: Prompt Caching:

# Prompt caching saves cost for repeated requests with identical prefix
# Cache TTL: 5 minutes (ephemeral) — must resend within 5 min to get cache hit
# Cache write: +25% cost on first write
# Cache read: -90% cost (only 10% of normal input price)

# CACHE THE SYSTEM PROMPT (most common use case)
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<long static instructions, documents, examples...>",
            "cache_control": {"type": "ephemeral"}   # Cache this prefix
        }
    ],
    "messages": [{"role": "user", "content": "Question 1"}]
}

# CACHE MID-CONVERSATION (large document in first user message)
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "<very long document to analyze>",
                    "cache_control": {"type": "ephemeral"}   # Cache doc
                },
                {
                    "type": "text",
                    "text": "What is the main theme?"   # Dynamic question
                }
            ]
        }
    ]
}

# Check cache usage in response
result = json.loads(response['body'].read())
usage = result['usage']
# usage keys: input_tokens, output_tokens,
#             cache_creation_input_tokens (tokens written to cache)
#             cache_read_input_tokens (tokens read from cache)

Cache Placement Rules:

System prompt cache → put cache_control on last system text block
Message cache → put cache_control on the last static content block in messages
Multiple cache points → up to 4 simultaneous cache breakpoints
Cache key = everything BEFORE the cache_control marker (must be identical to hit)

When to Cache:

  • Large static system prompts (>1K tokens) reused across requests
  • Long documents analyzed multiple times (PDF, transcript, codebase)
  • Few-shot example banks shared across thousands of requests
  • Tool definitions that rarely change
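
The pricing multipliers quoted earlier (+25% on cache writes, 10% of the normal price on cache reads) make it easy to estimate when caching pays off. The sketch below measures cost in units of "one normal input token" and assumes every request after the first arrives within the TTL and hits the cache; `relative_input_cost` is an illustrative helper.

```python
def relative_input_cost(prefix_tokens, dynamic_tokens, n_requests, cached=True):
    """Total input cost in units of one normal-priced input token."""
    if not cached:
        return n_requests * (prefix_tokens + dynamic_tokens)
    # First request writes the prefix to the cache at 1.25x
    first = 1.25 * prefix_tokens + dynamic_tokens
    # Later requests read the prefix from the cache at 0.10x
    rest = (n_requests - 1) * (0.10 * prefix_tokens + dynamic_tokens)
    return first + rest


uncached = relative_input_cost(10_000, 100, n_requests=20, cached=False)
with_cache = relative_input_cost(10_000, 100, n_requests=20, cached=True)
savings = 1 - with_cache / uncached
single_cached = relative_input_cost(10_000, 100, n_requests=1, cached=True)
```

Under these assumptions, 20 requests sharing a 10,000-token prefix cost roughly one sixth of the uncached total, while a single request costs more with caching than without because of the write surcharge.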

Best Practices

  • Use extended thinking only for complex reasoning tasks — it adds latency and cost
  • Cache static content that is reused across multiple requests in the same 5-minute window
  • Resize images before sending — each image costs tokens based on resolution (smaller = cheaper)
  • Enable citations when building RAG apps where source attribution matters for trust
  • Test prompt caching by checking cache_read_input_tokens in response usage
  • Every PDF page counts toward the context window, so split very large PDFs into smaller files before sending

MODULE 8: Model Context Protocol (MCP)

Lesson Coverage

  1. MCP introduction and motivation
  2. MCP client setup
  3. Project configuration (claude_desktop_config.json)
  4. Tool definition in MCP servers
  5. Server inspection with MCP Inspector
  6. Client implementation (connecting to MCP server)
  7. MCP Resources (data exposure)
  8. MCP Prompts (templated interactions)
  9. Review and patterns

Key Notes

MCP Overview:

Problem: Every app needs to build its own tool integrations
Solution: MCP standardizes how AI apps connect to external tools/data

MCP Architecture:
┌──────────────┐    MCP Protocol    ┌──────────────┐    External APIs
│  MCP Client  │◄──────────────────►│  MCP Server  │◄──────────────►
│ (Claude app) │    stdio or HTTP   │  (your code) │    Databases
└──────────────┘                    └──────────────┘    File system

Three MCP Primitives:

| Primitive | Purpose | Analogous To |
|---|---|---|
| Tools | Functions Claude can call | Tool use / function calling |
| Resources | Data/files Claude can read | Context injection |
| Prompts | Reusable prompt templates | Prompt library |

Lesson 4: Building an MCP Server with FastMCP:

# pip install mcp   (FastMCP ships inside the official mcp package)
from mcp.server.fastmcp import FastMCP
import json

mcp = FastMCP("bedrock-tools-server")

# --- TOOLS ---
@mcp.tool()
def get_weather(city: str, unit: str = "celsius") -> str:
    """Get current weather for a city.

    Args:
        city: City name (e.g. 'New York', 'London')
        unit: Temperature unit — 'celsius' or 'fahrenheit'
    """
    # In real code, call a weather API
    return json.dumps({
        "city": city,
        "temperature": 22,
        "unit": unit,
        "condition": "partly cloudy"
    })

@mcp.tool()
def search_docs(query: str, limit: int = 5) -> str:
    """Search internal documentation.

    Args:
        query: Search terms
        limit: Maximum results to return (1-20)
    """
    # Connect to your actual search backend
    results = [{"title": f"Result {i}", "snippet": "..."} for i in range(limit)]
    return json.dumps(results)

@mcp.tool()
def run_sql(query: str) -> str:
    """Execute a read-only SQL query on the analytics database.

    Args:
        query: SQL SELECT statement (no mutations allowed)
    """
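    # Naive guard for illustration only; enforce read-only access with
    # database permissions as well, not just a prefix check
    if not query.strip().upper().startswith("SELECT"):
        return json.dumps({"error": "Only SELECT queries are allowed"})
    # Execute query...
    return json.dumps({"columns": ["col1"], "rows": [[1], [2], [3]]})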

# --- RESOURCES ---
@mcp.resource("config://app-settings")
def get_app_settings() -> str:
    """Current application configuration settings."""
    return json.dumps({
        "version": "2.1.0",
        "environment": "production",
        "features": {"rag": True, "caching": True}
    })

@mcp.resource("docs://readme")
def get_readme() -> str:
    """Project README file contents."""
    with open("README.md") as f:
        return f.read()

# --- PROMPTS ---
@mcp.prompt()
def analyze_code(language: str, code: str) -> str:
    """Generate a code analysis prompt."""
    return f"""Analyze this {language} code for:
1. Bugs and potential errors
2. Security vulnerabilities
3. Performance issues
4. Code style and readability

Code:
```{language}
{code}
```

Provide specific, actionable feedback."""

if __name__ == "__main__":
    mcp.run()  # Starts stdio server by default

Lessons 2 & 3: Client Setup and Configuration:

// claude_desktop_config.json — for Claude Desktop
{
  "mcpServers": {
    "bedrock-tools": {
      "command": "python",
      "args": ["/path/to/my-mcp-server.py"],
      "env": {
        "AWS_DEFAULT_REGION": "us-east-1"
      }
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/allowed/path"]
    }
  }
}

Transport Types:

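| Transport | How | Use Case |
|---|---|---|
| stdio | Child process stdin/stdout | Local servers (most common) |
| HTTP/SSE | Server-Sent Events over HTTP | Remote servers, shared infrastructure |
| WebSocket | Bidirectional WS | Real-time bidirectional (custom transport, not one of the standard MCP transports) |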

Lesson 5: MCP Inspector:

# Install and run MCP Inspector
npx @modelcontextprotocol/inspector python my-server.py

# Inspector shows:
# - All tools with their schemas
# - All resources with their URIs
# - All prompts with their templates
# - Interactive tool call tester
# - Server logs in real time

Lesson 6: Custom MCP Client:

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def run_mcp_client():
    """Connect to an MCP server and call its tools."""
    server_params = StdioServerParameters(
        command="python",
        args=["my-mcp-server.py"]
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize connection
            await session.initialize()

            # List available tools
            tools_response = await session.list_tools()
            print("Available tools:")
            for tool in tools_response.tools:
                print(f"  - {tool.name}: {tool.description}")

            # List resources
            resources_response = await session.list_resources()
            for resource in resources_response.resources:
                print(f"  Resource: {resource.uri}")

            # Call a tool
            result = await session.call_tool("get_weather", {"city": "Seattle"})
            print("Weather result:", result.content[0].text)

            # Read a resource
            resource_content = await session.read_resource("config://app-settings")
            print("Config:", resource_content.contents[0].text)

asyncio.run(run_mcp_client())

Integrating MCP with Bedrock Claude:

# Pattern: Get tool schemas from MCP, pass to Bedrock, route calls back to MCP
import json
import boto3

client = boto3.client('bedrock-runtime', region_name='us-east-1')

async def bedrock_with_mcp(user_message, mcp_session):
    """Use MCP tools with Claude on Bedrock."""
    # Get tools from MCP server
    tools_list = await mcp_session.list_tools()

    # Convert MCP tool schemas to Bedrock tool format
    bedrock_tools = [
        {
            "name": tool.name,
            "description": tool.description,
            "input_schema": tool.inputSchema
        }
        for tool in tools_list.tools
    ]

    messages = [{"role": "user", "content": user_message}]

    for _ in range(10):  # Bounded loop: cap the number of model/tool rounds
        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 4096,
            "tools": bedrock_tools,
            "messages": messages
        }

        # Note: invoke_model is a blocking call inside this coroutine
        response = json.loads(client.invoke_model(
            modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
            contentType='application/json',
            accept='application/json',
            body=json.dumps(body)
        )['body'].read())

        messages.append({"role": "assistant", "content": response['content']})

        # Any stop reason other than tool_use (end_turn, max_tokens, ...) ends the loop
        if response['stop_reason'] != 'tool_use':
            return "".join(
                block['text'] for block in response['content']
                if block['type'] == 'text'
            )

        tool_results = []
        for block in response['content']:
            if block['type'] == 'tool_use':
                # Route call through MCP
                mcp_result = await mcp_session.call_tool(block['name'], block['input'])
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block['id'],
                    "content": mcp_result.content[0].text
                })

        messages.append({"role": "user", "content": tool_results})

    return "Stopped: reached the tool-round limit"
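The integration code reads `mcp_result.content[0].text`, which assumes the tool returned at least one text block and did not fail. A slightly more defensive conversion might look like the sketch below. The helper name is hypothetical; it assumes MCP results expose a `content` list of typed blocks and an `isError` flag, and that the Claude tool_result block accepts an `is_error` field. `SimpleNamespace` objects stand in for real MCP results so the example runs without a server.

```python
from types import SimpleNamespace


def mcp_result_to_tool_result(tool_use_id: str, mcp_result) -> dict:
    """Build a Claude tool_result block from an MCP call_tool result."""
    texts = [
        block.text
        for block in getattr(mcp_result, "content", [])
        if getattr(block, "type", None) == "text"
    ]
    tool_result = {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        # Join every text block instead of reading only the first one
        "content": "\n".join(texts) if texts else "(no text content returned)",
    }
    if getattr(mcp_result, "isError", False):
        tool_result["is_error"] = True  # lets Claude see the call failed
    return tool_result


# Stand-ins for MCP results (assumed shape, for illustration only)
ok_result = SimpleNamespace(
    content=[SimpleNamespace(type="text", text='{"temperature": 22}')],
    isError=False,
)
failed_result = SimpleNamespace(
    content=[SimpleNamespace(type="text", text="city not found")],
    isError=True,
)

ok_block = mcp_result_to_tool_result("toolu_1", ok_result)
failed_block = mcp_result_to_tool_result("toolu_2", failed_result)
```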

Best Practices

  • Name tools and resources clearly — Claude uses names and descriptions to choose them
  • Validate all inputs in tool functions — Claude can send unexpected values
  • Return structured error JSON (don’t raise unhandled exceptions) so Claude can adapt
  • Use MCP for shared tool servers reused across multiple Claude applications
  • Test with MCP Inspector before connecting to any Claude application
  • Use stdio transport for local development, HTTP/SSE for production shared servers
  • Write tool descriptions that state what the tool does, when to use it, and what each parameter means; cut filler, but keep the detail Claude needs to choose correctly
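The input-validation and structured-error practices above can be sketched as follows. This is an illustration under stated assumptions: `get_weather_checked` and the in-memory `_FAKE_TEMPERATURES_C` table are hypothetical stand-ins for a real backend, and in a server the function would carry the `@mcp.tool()` decorator from the Lesson 4 code.

```python
import json

ALLOWED_UNITS = {"celsius", "fahrenheit"}

# Hypothetical stand-in for a real weather backend
_FAKE_TEMPERATURES_C = {"london": 14, "seattle": 11}


def get_weather_checked(city: str, unit: str = "celsius") -> str:
    """Return weather JSON, or a JSON error object Claude can react to."""
    # Validate inputs first: the model may send unexpected values
    if not isinstance(city, str) or not city.strip():
        return json.dumps({"error": "city must be a non-empty string"})
    if unit not in ALLOWED_UNITS:
        return json.dumps({"error": f"unit must be one of {sorted(ALLOWED_UNITS)}"})

    try:
        temp_c = _FAKE_TEMPERATURES_C[city.strip().lower()]
    except KeyError:
        # Structured error instead of an unhandled exception
        return json.dumps({"error": f"no weather data for {city.strip()!r}"})

    temperature = temp_c if unit == "celsius" else round(temp_c * 9 / 5 + 32, 1)
    return json.dumps({"city": city.strip(), "temperature": temperature, "unit": unit})


print(get_weather_checked("London"))
print(get_weather_checked("Atlantis"))
print(get_weather_checked("London", unit="kelvin"))
```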

MODULE 9: Agents

Lesson Coverage

  1. Agent overview and design patterns
  2. Claude Code setup and actions
  3. MCP enhancements for agents
  4. Parallelization patterns
  5. Automated debugging agents
  6. Computer Use
  7. Agent quality principles

Key Notes

What Makes an Agent:

An agent = LLM + tools + memory + orchestration loop
                                ↓
              (runs tool → gets result → decides next action → repeats)

Lesson 1: Agent Patterns:

1. PARALLELIZATION
   Orchestrator ─→ [Task A]  ─┐
                 ─→ [Task B]  ─┤→ Aggregate → Final Answer
                 ─→ [Task C]  ─┘

2. SEQUENTIAL CHAINING
   Input → [Agent 1: Draft] → [Agent 2: Review] → [Agent 3: Refine] → Output

3. ROUTING
   Input → [Router: Classify] → BILLING → [Billing Specialist]
                              → TECH    → [Tech Support Specialist]
                              → RETURNS → [Returns Specialist]

4. ORCHESTRATOR-SUBAGENT
   [Orchestrator Agent]
        ├→ [Research Subagent]
        ├→ [Writing Subagent]
        └→ [Fact-Check Subagent]
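Pattern 2 (sequential chaining) has no code elsewhere in this module, so here is a minimal sketch. `run_chain` and the stage definitions are hypothetical; the model call is passed in as a function so the example runs without AWS access. In practice a function with the shape of `run_claude(system, user_message)` from the Lesson 4 code could be supplied instead of the stub.

```python
from typing import Callable, List, Tuple

# Each stage is (system prompt, instruction template containing {text})
Stage = Tuple[str, str]


def run_chain(text: str, stages: List[Stage], call_model: Callable[[str, str], str]) -> str:
    """Feed each stage's output into the next stage's prompt."""
    for system, template in stages:
        text = call_model(system, template.format(text=text))
    return text


STAGES: List[Stage] = [
    ("You write first drafts.", "Draft a short summary of: {text}"),
    ("You are a critical reviewer.", "Review and list issues in: {text}"),
    ("You refine text.", "Rewrite, fixing the issues noted: {text}"),
]


def fake_model(system: str, user_message: str) -> str:
    """Stub model for illustration: echoes which stage handled the text."""
    return f"[{system}] {user_message}"


final_output = run_chain("quantum computing", STAGES, fake_model)
print(final_output)
```

Because each stage only sees the previous stage's output, a failure early in the chain propagates; logging intermediate outputs makes that easier to diagnose.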

Lesson 4: Parallelization Pattern:

import concurrent.futures
import boto3, json

client = boto3.client('bedrock-runtime', region_name='us-east-1')
MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'

def run_claude(system, user_message, max_tokens=2048):
    """Single Claude call."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "system": system,
        "messages": [{"role": "user", "content": user_message}]
    }
    response = client.invoke_model(
        modelId=MODEL_ID,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    return json.loads(response['body'].read())['content'][0]['text']

def parallel_research(topic, aspects):
    """Research multiple aspects of a topic in parallel."""
    def research_aspect(aspect):
        system = f"You are a research expert. Focus only on: {aspect}"
        return {
            "aspect": aspect,
            "findings": run_claude(system, f"Research: {topic} - specifically: {aspect}")
        }

    with concurrent.futures.ThreadPoolExecutor(max_workers=len(aspects)) as executor:
        futures = [executor.submit(research_aspect, a) for a in aspects]
        results = [f.result() for f in futures]

    # Synthesize in orchestrator
    synthesis_input = "\n\n".join([
        f"## {r['aspect']}\n{r['findings']}" for r in results
    ])

    return run_claude(
        "You synthesize research into coherent summaries.",
        f"Synthesize these research findings about {topic}:\n\n{synthesis_input}"
    )

# Usage
summary = parallel_research("quantum computing", [
    "current hardware state",
    "software ecosystems",
    "commercial applications",
    "timeline predictions"
])

Lesson 3: Routing Pattern:

ROUTE_SYSTEM_PROMPT = """Classify the customer request into exactly one category:
- BILLING: Payment, invoice, subscription, refund questions
- TECHNICAL: Bug reports, errors, how-to questions
- RETURNS: Return, exchange, or refund of physical products
- GENERAL: Everything else

Respond with ONLY the category name."""

SPECIALIST_SYSTEMS = {
    "BILLING": "You are a billing specialist. Help customers with payment and subscription issues.",
    "TECHNICAL": "You are a technical support engineer. Diagnose and resolve software issues.",
    "RETURNS": "You are a returns specialist. Help customers with returns and exchanges.",
    "GENERAL": "You are a helpful customer service agent."
}

def route_and_respond(customer_message):
    # Step 1: Route
    route = run_claude(ROUTE_SYSTEM_PROMPT, customer_message, max_tokens=20).strip().upper()

    # Step 2: Respond with specialist
    specialist_system = SPECIALIST_SYSTEMS.get(route, SPECIALIST_SYSTEMS["GENERAL"])
    response = run_claude(specialist_system, customer_message)

    return {"route": route, "response": response}
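The router relies on the model replying with exactly one category name. Replies such as "Billing." or "Category: TECHNICAL" would otherwise fall through to GENERAL, so a small normalizer can help. This is a sketch; `normalize_route` is a hypothetical helper and the category list simply mirrors the routing prompt above.

```python
import re

VALID_ROUTES = ("BILLING", "TECHNICAL", "RETURNS", "GENERAL")


def normalize_route(raw: str, default: str = "GENERAL") -> str:
    """Map a free-form classifier reply onto one of the known routes."""
    # Scan uppercase words in order and return the first recognized category
    for word in re.findall(r"[A-Z]+", raw.upper()):
        if word in VALID_ROUTES:
            return word
    return default


print(normalize_route("billing."))
print(normalize_route("Category: TECHNICAL"))
print(normalize_route("I am not sure"))
```

The result can be used as the lookup key for `SPECIALIST_SYSTEMS` in place of the raw model reply.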

Lesson 2: Claude Code Setup:

# Install Claude Code globally
npm install -g @anthropic-ai/claude-code

# Configure to use Bedrock (instead of direct Anthropic API)
unset ANTHROPIC_API_KEY          # Not needed for Bedrock; AWS credentials are used instead
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1

# Or configure in ~/.claude/settings.json for persistent Bedrock use
# Launch Claude Code
claude

Bedrock Agents (Managed Service):

import boto3

bedrock_agent = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

def invoke_bedrock_agent(agent_id, agent_alias_id, session_id, user_message):
    """Invoke a fully managed Bedrock Agent."""
    response = bedrock_agent.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=session_id,          # Maintain session for multi-turn
        inputText=user_message,
        enableTrace=True               # Get orchestration trace for debugging
    )

    # Response is a streaming body
    full_response = ""
    traces = []

    for event in response['completion']:
        if 'chunk' in event:
            chunk = event['chunk']
            if 'bytes' in chunk:
                full_response += chunk['bytes'].decode('utf-8')
        elif 'trace' in event:
            traces.append(event['trace'])

    return {"response": full_response, "traces": traces}

Lesson 5: Automated Debugging Agent:

def debugging_agent(error_message, codebase_context, max_iterations=5):
    """Agent that iteratively diagnoses and proposes fixes."""
    tools = [
        {
            "name": "search_codebase",
            "description": "Search codebase for relevant code",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "file_pattern": {"type": "string", "description": "Optional glob pattern"}
                },
                "required": ["query"]
            }
        },
        {
            "name": "read_file",
            "description": "Read contents of a specific file",
            "input_schema": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"]
            }
        },
        {
            "name": "run_test",
            "description": "Run specific test to verify fix",
            "input_schema": {
                "type": "object",
                "properties": {"test_name": {"type": "string"}},
                "required": ["test_name"]
            }
        }
    ]

    system = """You are a debugging expert. Analyze errors systematically:
1. Understand the error message
2. Search for relevant code
3. Form a hypothesis
4. Verify with tests
5. Propose a concrete fix"""

    messages = [{
        "role": "user",
        "content": f"Error: {error_message}\n\nContext: {codebase_context}"
    }]

    iteration = 0
    while iteration < max_iterations:
        iteration += 1
        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 4096,
            "system": system,
            "tools": tools,
            "messages": messages
        }

        response = json.loads(client.invoke_model(
            modelId=MODEL_ID, contentType='application/json',
            accept='application/json', body=json.dumps(body)
        )['body'].read())

        messages.append({"role": "assistant", "content": response['content']})

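        # Any stop reason other than tool_use (end_turn, max_tokens, ...) ends the loop
        if response['stop_reason'] != 'tool_use':
            return "".join(
                b['text'] for b in response['content'] if b['type'] == 'text'
            )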

        tool_results = []
        for block in response['content']:
            if block['type'] == 'tool_use':
                # dispatch_debug_tool is an application-specific helper you supply
                result = dispatch_debug_tool(block['name'], block['input'])
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block['id'],
                    "content": json.dumps(result)
                })
        messages.append({"role": "user", "content": tool_results})

    return "Debugging agent reached max iterations"
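The agent above calls `dispatch_debug_tool`, which the guide does not define. One possible shape is sketched below. Everything here is an assumption for illustration: the `PROJECT_ROOT` default, the plain substring search, and running tests through `pytest -k` are choices made for the sketch, not the course's implementation.

```python
import subprocess
import sys
from pathlib import Path

PROJECT_ROOT = Path(".").resolve()  # assumption: agent works inside the current directory


def search_codebase(query, file_pattern="**/*.py", root=PROJECT_ROOT, max_hits=20):
    """Plain substring search over files matching a glob pattern."""
    hits = []
    for path in sorted(Path(root).glob(file_pattern)):
        if not path.is_file():
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
            if query in line:
                hits.append({"file": str(path.relative_to(root)), "line": lineno, "text": line.strip()})
                if len(hits) >= max_hits:
                    return hits
    return hits


def read_file(path, root=PROJECT_ROOT):
    """Read a file, refusing paths that escape the project root."""
    base = Path(root).resolve()
    target = (base / path).resolve()
    if not target.is_relative_to(base):  # Python 3.9+
        return {"error": "path escapes project root"}
    if not target.is_file():
        return {"error": f"file not found: {path}"}
    return {"path": path, "content": target.read_text(errors="ignore")}


def run_test(test_name, root=PROJECT_ROOT):
    """Run matching tests with pytest (assumes pytest is installed)."""
    proc = subprocess.run(
        [sys.executable, "-m", "pytest", "-k", test_name, "-q"],
        cwd=root, capture_output=True, text=True, timeout=120,
    )
    return {"passed": proc.returncode == 0, "output": proc.stdout[-2000:]}


def dispatch_debug_tool(name, tool_input, root=PROJECT_ROOT):
    """Route a tool_use block to its handler; always return JSON-serializable data."""
    handlers = {"search_codebase": search_codebase, "read_file": read_file, "run_test": run_test}
    if name not in handlers:
        return {"error": f"unknown tool: {name}"}
    try:
        return handlers[name](**tool_input, root=root)
    except Exception as exc:  # report failures to the model instead of crashing the loop
        return {"error": f"{name} failed: {exc}"}
```

Returning error dictionaries rather than raising keeps the agent loop alive and lets Claude adjust its next tool call.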

Lesson 6: Computer Use:

# Computer Use tools for desktop automation
# Available on claude-3-5-sonnet via Bedrock
# ALWAYS use in sandboxed/VM environment — never on production machines

computer_use_tools = [
    {
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1920,
        "display_height_px": 1080,
        "display_number": 1
    },
    {
        "type": "text_editor_20241022",
        "name": "str_replace_editor"
    },
    {
        "type": "bash_20241022",
        "name": "bash"
    }
]

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "anthropic_beta": ["computer-use-2024-10-22"],   # Beta flag required for these tool types
    "max_tokens": 4096,
    "tools": computer_use_tools,
    "messages": [{
        "role": "user",
        "content": "Take a screenshot and describe what's on the screen."
    }]
}

# Computer Use responses include tool_use blocks whose "input" holds actions like:
# {"action": "screenshot"}
# {"action": "mouse_move", "coordinate": [x, y]}
# {"action": "left_click", "coordinate": [x, y]}
# {"action": "type", "text": "Hello World"}
# {"action": "key", "text": "ctrl+c"}
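Because Computer Use actions are executed on a real (sandboxed) desktop, it may be worth validating each requested action before running it. The sketch below is an assumption-laden illustration: it assumes the tool input names the action under an `action` key, covers only the five actions listed above, and `validate_computer_action` is a hypothetical helper rather than part of any SDK.

```python
ALLOWED_ACTIONS = {"screenshot", "mouse_move", "left_click", "type", "key"}


def validate_computer_action(action: dict, width: int = 1920, height: int = 1080):
    """Return (ok, reason) for a requested Computer Use action."""
    name = action.get("action")
    if name not in ALLOWED_ACTIONS:
        return False, f"unsupported action: {name!r}"

    coord = action.get("coordinate")
    if name == "mouse_move" and coord is None:
        return False, "mouse_move requires a coordinate"
    if coord is not None:
        if (not isinstance(coord, (list, tuple)) or len(coord) != 2
                or not all(isinstance(v, int) for v in coord)):
            return False, "coordinate must be [x, y] integers"
        x, y = coord
        # Reject clicks or moves outside the declared display size
        if not (0 <= x < width and 0 <= y < height):
            return False, f"coordinate {list(coord)} is outside the {width}x{height} display"

    if name in {"type", "key"} and not isinstance(action.get("text"), str):
        return False, "text must be a string"
    return True, "ok"


print(validate_computer_action({"action": "screenshot"}))
print(validate_computer_action({"action": "mouse_move", "coordinate": [5000, 10]}))
```

A rejected action can be sent back to Claude as a tool_result describing the reason, so the model can choose a different action.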

Lesson 7: Agent Quality Principles:

Good Agent Design:
✓ Bounded loops — always set max_iterations
✓ Audit logging — log every tool call and result
✓ Graceful error handling — tools return errors, agent adapts
✓ Human-in-the-loop — pause for approval before destructive actions
✓ Idempotency — tools should be safe to retry
✓ Minimal permissions — only grant access the agent actually needs
✓ Monitoring — alert on loops, failures, and unusual tool call patterns

Bad Agent Design:
✗ No iteration limit — can run forever on bad state
✗ No logging — can't debug what happened
✗ Silent failures — agent assumes success and continues incorrectly
✗ Unrestricted access — agent can delete/modify anything
✗ Trusting model output blindly — validate inputs before execution
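Several of these principles (audit logging, graceful errors, human-in-the-loop) can be combined in one wrapper around tool execution. The following is a sketch under assumptions: the tool names in `DESTRUCTIVE_TOOLS` are hypothetical, and `execute` and `approve` are callables the application would supply.

```python
import json
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("agent.audit")

DESTRUCTIVE_TOOLS = {"delete_file", "send_payment"}  # hypothetical tool names


def guarded_tool_call(
    name: str,
    tool_input: Dict[str, Any],
    execute: Callable[[str, Dict[str, Any]], Any],
    approve: Callable[[str, Dict[str, Any]], bool],
) -> Dict[str, Any]:
    """Log, gate, and run one tool call; never raise into the agent loop."""
    logger.info("tool_call name=%s input=%s", name, json.dumps(tool_input, default=str))

    if name in DESTRUCTIVE_TOOLS and not approve(name, tool_input):
        # Human-in-the-loop: destructive actions need explicit approval
        result = {"error": "action rejected by human reviewer"}
    else:
        try:
            result = {"result": execute(name, tool_input)}
        except Exception as exc:  # surface the failure so the agent can adapt
            result = {"error": f"{type(exc).__name__}: {exc}"}

    logger.info("tool_result name=%s result=%s", name, json.dumps(result, default=str))
    return result


def demo_execute(name, tool_input):
    """Stub executor for illustration."""
    if name == "read_file":
        return f"contents of {tool_input['path']}"
    raise RuntimeError("tool backend unavailable")


safe = guarded_tool_call("read_file", {"path": "a.txt"}, demo_execute, lambda n, i: False)
blocked = guarded_tool_call("delete_file", {"path": "a.txt"}, demo_execute, lambda n, i: False)
```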

Best Practices

  • Keep agent loops bounded — always set max iterations to prevent runaway behavior
  • Log every tool call and result for debugging and audit trails
  • Use Bedrock Agents for production managed orchestration with minimal code
  • Build custom loops only when you need full control over orchestration logic
  • Restrict Computer Use to isolated, sandboxed VMs — never use on production systems
  • Design tools to be idempotent where possible — safe to retry
  • Implement human-in-the-loop checkpoints for high-stakes actions (deletions, payments)
  • Use parallelization for independent subtasks to reduce total wall-clock time
  • Route to specialist agents/prompts for domain-specific quality improvement

MODULE 10: Final Assessment

Key Areas to Review

Must Know for Exam:

  • Bedrock client: boto3.client('bedrock-runtime', region_name='us-east-1')
  • Required field: "anthropic_version": "bedrock-2023-05-31" in every request body
  • Invoke methods: invoke_model() (sync) and invoke_model_with_response_stream() (streaming)
  • Response parsing: response['body'].read() → json.loads() → ['content'][0]['text']
  • Tool use message flow: tool_use block → execute → tool_result in user message
  • Tool result goes in a USER message, NOT an assistant message
  • Prompt caching: "cache_control": {"type": "ephemeral"} on content blocks
  • Cache TTL: 5 minutes
  • Extended thinking: "thinking": {"type": "enabled", "budget_tokens": N}
  • Cross-region IDs: prefix us. or eu. before anthropic.claude-...
  • IAM permission: bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream
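The two IAM permissions in the list above could be granted with a policy along these lines. This is a sketch, not a vetted production policy: the `Sid`, the region default, and the `anthropic.*` wildcard are assumptions, and cross-region inference profiles may need additional resources that are not shown here.

```python
import json


def bedrock_invoke_policy(region: str = "us-east-1", model_pattern: str = "anthropic.*") -> dict:
    """Build a least-privilege style IAM policy document for invoking Claude models."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeClaudeModels",
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                # Foundation-model ARNs have an empty account field
                "Resource": f"arn:aws:bedrock:{region}::foundation-model/{model_pattern}",
            }
        ],
    }


policy = bedrock_invoke_policy()
print(json.dumps(policy, indent=2))
```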

Common Exam Traps

| Trap | Correct Answer |
|---|---|
| Forgetting anthropic_version | Always include "bedrock-2023-05-31" |
| Using raw model name in modelId | Must include full ID with version suffix: anthropic.claude-3-5-sonnet-20241022-v2:0 |
| Assuming cross-region is automatic | Must explicitly use prefixed profile ID: us.anthropic.claude-... |
| Reading response body twice | StreamingBody can only be read once; call .read() once and store the result |
| Putting tool_result in assistant message | Tool results go in role: "user" messages |
| Thinking cache_control is permanent | Cache TTL = 5 minutes (ephemeral only) |
| Setting temperature with extended thinking | Temperature is fixed at 1 when thinking is enabled |
| Assuming max_tokens = actual output length | It's a ceiling, not a target |
| Using anthropic SDK with Bedrock | Use boto3, not the Anthropic Python library |
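Two of the traps above concern model ID format. A small check like the one below can catch a raw model name before it reaches `invoke_model`. It is a sketch: the regular expression is an assumption inferred from the IDs shown in this guide, and both helper names are hypothetical.

```python
import re

# Assumed shape, based on IDs in this guide: anthropic.claude-<name>-<YYYYMMDD>-v<N>:<M>
MODEL_ID_RE = re.compile(r"^anthropic\.claude-[a-z0-9-]+-\d{8}-v\d+:\d+$")


def is_full_model_id(model_id: str) -> bool:
    """True when the ID has the anthropic. prefix and a version suffix."""
    return bool(MODEL_ID_RE.match(model_id))


def to_inference_profile_id(model_id: str, geo: str = "us") -> str:
    """Prefix a full model ID to form a cross-region inference profile ID."""
    if geo not in {"us", "eu"}:
        raise ValueError(f"unsupported geography prefix: {geo!r}")
    if not is_full_model_id(model_id):
        raise ValueError(f"not a full Bedrock model ID: {model_id!r}")
    return f"{geo}.{model_id}"


print(is_full_model_id("claude-3-5-sonnet-20241022"))
print(to_inference_profile_id("anthropic.claude-3-5-sonnet-20241022-v2:0"))
```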

Quick Reference: Request Template

import boto3, json

client = boto3.client('bedrock-runtime', region_name='us-east-1')

body = {
    "anthropic_version": "bedrock-2023-05-31",   # Required
    "max_tokens": 1024,                            # Required
    "system": "...",                               # Optional
    "messages": [
        {"role": "user", "content": "..."}
    ],
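    # Optional parameters:
    "temperature": 0.7,
    "top_p": 0.9,
    "stop_sequences": ["###"],
    # "tools": [...],                     # Your tool definitions go here
    # "tool_choice": {"type": "auto"},    # Only meaningful together with "tools"
    # "thinking": {"type": "enabled", "budget_tokens": 5000},
    #     Thinking needs max_tokens > budget_tokens and no custom temperature
    # Streaming is chosen by calling invoke_model_with_response_stream(),
    # not by a field in the body
}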

response = client.invoke_model(
    modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
    contentType='application/json',
    accept='application/json',
    body=json.dumps(body)
)

result = json.loads(response['body'].read())
text = result['content'][0]['text']

Certification Readiness Checklist

MODULE 1 — Course Introduction:

  • Know all Claude model IDs available on Bedrock and their use cases
  • Understand difference between standard and cross-region inference profile IDs
  • Know how to enable model access in Bedrock console
  • Know required IAM permissions for Bedrock

MODULE 2 — Working with the API:

  • Can create boto3 bedrock-runtime client with proper credentials
  • Know request body structure (anthropic_version, max_tokens, messages)
  • Can parse invoke_model response (body.read → json.loads → content[0].text)
  • Can implement multi-turn conversation history management
  • Can implement streaming with invoke_model_with_response_stream
  • Know all streaming event types (message_start, content_block_delta, message_stop)
  • Know temperature, top_p, top_k, stop_sequences parameters
  • Can implement retry logic with exponential backoff

MODULE 3 — Prompt Evaluations:

  • Can design evaluation dataset (size, coverage, no leakage)
  • Know multiple grading approaches (exact match, regex, JSON schema, model-based)
  • Can implement model-based grader using cheap model (Haiku)
  • Can run parallel evaluations with ThreadPoolExecutor
  • Know key metrics: accuracy, F1, latency, cost per correct answer

MODULE 4 — Prompt Engineering:

  • Know XML tag structure for complex prompts
  • Can implement few-shot prompting with proper format matching
  • Know chain of thought patterns (thinking tags, step-by-step)
  • Know temperature guidance by task type
  • Can use stop sequences for output extraction

MODULE 5 — Tool Use:

  • Can define tools with JSON Schema (properties, types, required, enum)
  • Know tool use message flow (request → tool_use → tool_result → response)
  • Can implement complete tool loop (handle stop_reason=tool_use)
  • Know all tool_choice options (auto, any, tool, none)
  • Can handle parallel tool calls (multiple tool_use blocks)
  • Know Bedrock batch inference (create_model_invocation_job, S3 I/O)

MODULE 6 — RAG:

  • Know chunking strategies (fixed, recursive, overlapping) and tradeoffs
  • Can invoke Amazon Titan Embed V2 for embeddings
  • Can implement cosine similarity search
  • Can implement BM25 keyword search with rank_bm25
  • Can implement hybrid search with RRF
  • Know reranking with amazon.rerank-v1:0
  • Know Bedrock Knowledge Bases API (retrieve_and_generate)
  • Know contextual retrieval technique

MODULE 7 — Features of Claude:

  • Can enable extended thinking with budget_tokens
  • Know thinking block constraints (no temperature control, max_tokens requirement)
  • Can send images as base64 content blocks
  • Can send PDFs as base64 document blocks
  • Can enable citations in document-based requests
  • Can implement prompt caching with cache_control
  • Know cache TTL (5 min), write cost (+25%), read savings (-90%)

MODULE 8 — MCP:

  • Know three MCP primitives (Tools, Resources, Prompts)
  • Can build FastMCP server with @mcp.tool(), @mcp.resource(), @mcp.prompt()
  • Know stdio vs HTTP transport options
  • Can configure claude_desktop_config.json for MCP servers
  • Know how to use MCP Inspector for debugging
  • Can implement MCP client with ClientSession

MODULE 9 — Agents:

  • Know four agent patterns (parallelization, chaining, routing, orchestrator-subagent)
  • Can implement parallel subagents with ThreadPoolExecutor
  • Can implement routing pattern (classify → specialist)
  • Can invoke Bedrock Agents with bedrock-agent-runtime
  • Know Computer Use tools and safety requirements
  • Know agent quality principles (bounded loops, logging, human-in-the-loop)

MODULE 11: Course Wrap-Up

Summary

Core Difference: Bedrock vs Direct API

| Aspect | Direct Anthropic API | Amazon Bedrock |
|---|---|---|
| Auth | ANTHROPIC_API_KEY | IAM roles/credentials |
| SDK | anthropic Python library | boto3 |
| Client | anthropic.Anthropic() | boto3.client('bedrock-runtime') |
| Model IDs | claude-3-5-sonnet-20241022 | anthropic.claude-3-5-sonnet-20241022-v2:0 |
| Request method | client.messages.create() | client.invoke_model() |
| Body format | Named keyword args | JSON bytes in body param |
| Required extra field | None | "anthropic_version": "bedrock-2023-05-31" |
| Streaming | .stream() context manager | invoke_model_with_response_stream() |
| Billing | Anthropic invoice | AWS invoice |
| Managed RAG | None built-in | Bedrock Knowledge Bases |
| Managed Agents | None built-in | Bedrock Agents + Action Groups |
| Batch inference | Batch API (files API) | Model Invocation Jobs (S3 I/O) |
| Region control | Global endpoint | Per-region (must enable model access) |
| Cross-region | N/A | Inference profiles (us., eu. prefix) |
| Guardrails | None built-in | Bedrock Guardrails |
| Compliance | Anthropic ToS | AWS compliance (HIPAA-eligible, SOC2) |

Best Practices for Production on AWS:

  • Use IAM roles, not static credentials — least privilege, no rotation needed in code
  • Enable CloudWatch logging for Bedrock invocations (audit, cost tracking, debugging)
  • Use Bedrock Guardrails to filter harmful content at managed layer
  • Use cross-region inference profiles in production for higher availability and resilience
  • Cache system prompts and static documents to reduce cost on repetitive workloads
  • Use Bedrock Knowledge Bases for managed RAG (less ops); custom pipelines for full control
  • Monitor token usage with AWS Cost Explorer and set billing alerts
  • Pin model IDs (full ID including version) to prevent unexpected upgrades
  • Implement exponential backoff for ThrottlingException
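The backoff recommendation can be sketched as a small retry wrapper. This is illustrative only: the retry limits are arbitrary defaults, `call_with_backoff` is a hypothetical helper, and the error-code lookup assumes the exception carries a botocore-style `response["Error"]["Code"]` value. A fake exception stands in for a real throttling error so the example runs without AWS.

```python
import random
import time

RETRYABLE_CODES = {"ThrottlingException"}


def error_code(exc: Exception) -> str:
    """Extract a botocore-style error code, or '' if there is none."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code", "")


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call fn(), retrying throttling errors with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if error_code(exc) not in RETRYABLE_CODES or attempt == max_attempts:
                raise  # not retryable, or out of attempts
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries


class FakeThrottle(Exception):
    """Stand-in for a throttling ClientError (illustration only)."""
    response = {"Error": {"Code": "ThrottlingException"}}


attempts = {"count": 0}
recorded_delays = []


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeThrottle()
    return "ok"


outcome = call_with_backoff(flaky_call, sleep=recorded_delays.append)
print(outcome, attempts["count"])
```

In real use, `fn` would be a zero-argument callable such as `lambda: client.invoke_model(...)`.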

Quick Comparison: Direct API vs Bedrock vs Vertex AI

| Feature | Anthropic Direct | Amazon Bedrock | Google Vertex AI |
|---|---|---|---|
| Auth | API Key | AWS IAM | Google ADC / Service Account |
| SDK | anthropic | boto3 | anthropic (AnthropicVertex) |
| Client init | Anthropic() | boto3.client('bedrock-runtime') | AnthropicVertex(region, project) |
| Model ID format | claude-3-5-sonnet-20241022 | anthropic.claude-3-5-sonnet-20241022-v2:0 | claude-3-5-sonnet@20241022 |
| Extra required | None | anthropic_version in body | None |
| Streaming | .stream() CM | invoke_model_with_response_stream() | .stream() CM |
| Billing | Anthropic | AWS | Google Cloud |
| Managed RAG | None | Bedrock Knowledge Bases | Vertex AI Search |
| Managed Agents | None | Bedrock Agents | Vertex AI Agent Builder |
| Embeddings native | None | Amazon Titan Embeddings | Vertex AI Embeddings |
| Batch inference | Batch API (files) | Model Invocation Jobs (S3) | Batch prediction jobs |
| Guardrails | None built-in | Bedrock Guardrails | Vertex AI safety filters |
| Prompt caching | Yes | Yes | Yes |
| Extended thinking | Yes | Yes | Yes |
| Computer Use | Yes | Yes | Yes |

Study tip: The most Bedrock-specific concepts that differ from other platforms are:

  1. boto3 client setup with bedrock-runtime
  2. JSON body encoding (not keyword args) with required anthropic_version field
  3. Tool result placement in USER messages
  4. Prompt cache TTL (5 minutes ephemeral)
  5. Cross-region inference profile ID prefixes (us., eu.)
  6. Model ID format including version suffix and anthropic. prefix