Claude with Amazon Bedrock - Certification Study Guide
Table of Contents
- Claude with Amazon Bedrock - Certification Study Guide
- MODULE 1: Course Introduction
- MODULE 2: Working with the API
- MODULE 3: Prompt Evaluations
- MODULE 4: Prompt Engineering
- MODULE 5: Tool Use
- MODULE 6: RAG (Retrieval-Augmented Generation)
- MODULE 7: Features of Claude
- MODULE 8: Model Context Protocol (MCP)
- MODULE 9: Agents
- MODULE 10: Final Assessment
- MODULE 11: Course Wrap-Up
- Quick Comparison: Direct API vs Bedrock vs Vertex AI
Course: Anthropic - Claude with Amazon Bedrock
Modules: 11 (Introduction through Course Wrap-Up)
Lessons: 40+ individual lessons
Target: Certification Prep
MODULE 1: Course Introduction
Lesson Coverage
- Introduction to the course structure and goals
- Claude Models available on Amazon Bedrock
Key Notes
Course Scope:
- Using Claude via AWS Bedrock: same models, different auth/SDK layer
- All exercises use boto3, NOT the anthropic Python SDK
- AWS handles billing, compliance (SOC2, HIPAA-eligible), and data residency
Claude Models Available on Bedrock:
| Model ID | Version | Use Case |
|---|---|---|
| anthropic.claude-3-5-sonnet-20241022-v2:0 | Sonnet 3.5 v2 | Best balance of speed and intelligence |
| anthropic.claude-3-5-haiku-20241022-v1:0 | Haiku 3.5 | Fast and cheap, high volume tasks |
| anthropic.claude-3-haiku-20240307-v1:0 | Haiku 3 | Older Haiku, still available |
| anthropic.claude-3-opus-20240229-v1:0 | Opus 3 | Most capable, slower |
Cross-Region Inference Profile IDs (prefix us. or eu.):
- us.anthropic.claude-3-5-sonnet-20241022-v2:0
- us.anthropic.claude-3-5-haiku-20241022-v1:0
- eu.anthropic.claude-3-5-sonnet-20241022-v2:0
Bedrock vs Direct API — At a Glance:
| | Direct Anthropic API | Amazon Bedrock |
|---|---|---|
| Auth | API key | IAM roles/credentials |
| SDK | anthropic | boto3 |
| URL | api.anthropic.com | bedrock-runtime.<region>.amazonaws.com |
| Billing | Anthropic invoice | AWS invoice |
Prerequisites for the Course:
- Python 3.9+
- AWS account with Bedrock enabled
- IAM user or role with Bedrock permissions
- Model access requested in Bedrock console (not automatic — must be manually enabled)
Enabling Model Access:
- Open AWS Console → Amazon Bedrock → Model access
- Click “Manage model access”
- Check Claude models → Submit request
- Wait for “Access granted” status (usually instant for Claude)
Required IAM Permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream",
"bedrock:CreateModelInvocationJob",
"bedrock:GetModelInvocationJob",
"bedrock:ListFoundationModels"
],
"Resource": "*"
}
]
}
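The policy above allows the listed actions on every Bedrock resource. As a minimal sketch of a tighter variant, the helper below (`build_scoped_policy`, a hypothetical name not taken from the course) limits invocation to specific foundation models. It assumes the foundation-model ARN format `arn:aws:bedrock:<region>::foundation-model/<model-id>`; cross-region inference profiles use their own ARNs and would need extra entries.

```python
import json

def build_scoped_policy(region, model_ids):
    """Return an IAM policy dict limited to the given models (hypothetical helper)."""
    resources = [
        f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
        for model_id in model_ids
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": resources,
            }
        ],
    }

policy = build_scoped_policy(
    "us-east-1", ["anthropic.claude-3-5-haiku-20241022-v1:0"]
)
print(json.dumps(policy, indent=2))
```

Scoping by model keeps an application from silently switching to a more expensive model without a policy change.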
Best Practices
- Enable model access in the Bedrock console before running any code (access is not automatic)
- Use IAM roles (not static credentials) in production: attach AmazonBedrockFullAccess or a scoped policy
- Choose a region where Claude models are available: us-east-1, us-west-2, eu-west-3, ap-southeast-1
- Pin model IDs including version suffix to prevent unexpected upgrades
- Never hardcode AWS credentials — use environment variables or IAM instance profiles
MODULE 2: Working with the API
Lesson Coverage
- API access setup (boto3, credentials, regions)
- Creating your first request (request body structure)
- Multi-turn conversations (stateless history management)
- Chatbot exercise (building a REPL)
- System prompts (configuring model behavior)
- Temperature settings (controlling randomness)
- Streaming responses (real-time output)
- Output control (stop sequences, max_tokens, top_p, top_k)
- Structured data extraction (JSON output)
Key Notes
boto3 Setup and Authentication:
import boto3
import json
# Option 1: Explicit credentials (dev only — never in production code)
client = boto3.client(
'bedrock-runtime',
region_name='us-east-1',
aws_access_key_id='AKIA...',
aws_secret_access_key='...'
)
# Option 2: Environment variables (recommended for dev)
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION
client = boto3.client('bedrock-runtime', region_name='us-east-1')
# Option 3: IAM role (recommended for production / EC2 / Lambda)
# boto3 automatically picks up role credentials
client = boto3.client('bedrock-runtime', region_name='us-east-1')
# Option 4: Named profile from ~/.aws/credentials
session = boto3.Session(profile_name='my-bedrock-profile')
client = session.client('bedrock-runtime', region_name='us-east-1')
Core Bedrock Methods:
| Method | boto3 Client | Behavior | Use Case |
|---|---|---|---|
| invoke_model() | bedrock-runtime | Synchronous: waits for full response | Standard requests |
| invoke_model_with_response_stream() | bedrock-runtime | Streaming: returns chunks progressively | Long responses, UX |
| create_model_invocation_job() | bedrock | Async batch: S3 input/output | Large-scale batch |
| get_model_invocation_job() | bedrock | Check batch job status | Polling batch jobs |
The two batch methods belong to the control-plane client, boto3.client('bedrock'), not to 'bedrock-runtime'.
Request Body Format (CRITICAL):
The request body is NOT keyword arguments.
It's a JSON string encoded as bytes, passed as the 'body' parameter.
The modelId is a SEPARATE parameter from the body.
Required structure:
body = {
"anthropic_version": "bedrock-2023-05-31", # REQUIRED — do not omit
"max_tokens": 1024, # REQUIRED
"messages": [ # REQUIRED
{"role": "user", "content": "Hello"}
],
# Optional fields (listed together for reference; in practice adjust temperature or top_p, not both):
"system": "You are a helpful assistant.",
"temperature": 0.7,
"top_p": 0.9,
"top_k": 250,
"stop_sequences": ["###"]
}
response = client.invoke_model(
modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
contentType='application/json',
accept='application/json',
body=json.dumps(body) # Must be bytes or string
)
Response Structure:
response = client.invoke_model(...)
# response is a dict with HTTP metadata + 'body' StreamingBody
raw = response['body'].read() # Read all bytes
result = json.loads(raw) # Parse JSON
# result structure:
# {
# "id": "msg_01...",
# "type": "message",
# "role": "assistant",
# "content": [{"type": "text", "text": "..."}],
# "model": "claude-3-5-sonnet-20241022",
# "stop_reason": "end_turn",
# "stop_sequence": null,
# "usage": {"input_tokens": 25, "output_tokens": 142}
# }
text = result['content'][0]['text']
input_tokens = result['usage']['input_tokens']
output_tokens = result['usage']['output_tokens']
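Indexing `content[0]` works when the response holds a single text block. As a slightly more defensive sketch, the hypothetical helper `extract_text` below joins every text block and ignores other block types. The sample dict is hand-written for illustration, not captured API output.

```python
def extract_text(result):
    """Join every text block in a parsed Messages response (hypothetical helper)."""
    return "".join(
        block.get("text", "")
        for block in result.get("content", [])
        if block.get("type") == "text"
    )

# Hand-written sample shaped like the response structure shown above.
sample = {
    "content": [
        {"type": "text", "text": "Paris is "},
        {"type": "text", "text": "the capital."},
    ],
    "usage": {"input_tokens": 25, "output_tokens": 6},
}
print(extract_text(sample))
```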
Lesson 2: Basic Request Example
import boto3
import json
client = boto3.client('bedrock-runtime', region_name='us-east-1')
MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'
def invoke_claude(prompt, system=None, max_tokens=1024, temperature=1.0):
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}]
    }
    if system:
        body["system"] = system
    if temperature != 1.0:
        body["temperature"] = temperature
    response = client.invoke_model(
        modelId=MODEL_ID,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    result = json.loads(response['body'].read())
    return result['content'][0]['text'], result['usage']
text, usage = invoke_claude("What is the capital of France?")
print(text)
print(f"Tokens used: {usage['input_tokens']} in, {usage['output_tokens']} out")
Lesson 3 & 4: Multi-Turn Conversations and Chatbot
import boto3, json
client = boto3.client('bedrock-runtime', region_name='us-east-1')
MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'
class BedrockChatbot:
    def __init__(self, system_prompt=None):
        self.system = system_prompt
        self.history = []  # List of {"role": ..., "content": ...}
        self.total_tokens = {"input": 0, "output": 0}
    def chat(self, user_message, max_tokens=2048):
        self.history.append({"role": "user", "content": user_message})
        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": self.history
        }
        if self.system:
            body["system"] = self.system
        try:
            response = client.invoke_model(
                modelId=MODEL_ID,
                contentType='application/json',
                accept='application/json',
                body=json.dumps(body)
            )
        except Exception:
            # Drop the unanswered user turn so roles keep alternating on the next call
            self.history.pop()
            raise
        result = json.loads(response['body'].read())
        assistant_text = result['content'][0]['text']
        self.history.append({"role": "assistant", "content": assistant_text})
        self.total_tokens["input"] += result['usage']['input_tokens']
        self.total_tokens["output"] += result['usage']['output_tokens']
        return assistant_text
    def reset(self):
        self.history = []
# REPL chatbot
bot = BedrockChatbot(system_prompt="You are an expert Python tutor.")
print("Chatbot ready. Type 'quit' to exit, 'reset' to clear history.")
while True:
    user_input = input("You: ").strip()
    if not user_input:
        continue  # Empty messages are rejected by the API
    if user_input.lower() == 'quit':
        break
    if user_input.lower() == 'reset':
        bot.reset()
        print("History cleared.")
        continue
    response = bot.chat(user_input)
    print(f"Claude: {response}\n")
print(f"Total tokens: {bot.total_tokens}")
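Because the API is stateless, the chatbot above re-sends the whole history on every turn, so input tokens grow with conversation length. One possible mitigation, sketched here as an assumption rather than part of the course code, is to cap the history before each call. The helper name `trim_history` and the default of 20 messages are illustrative, and the sketch only suits text-only conversations (it does not keep tool_use/tool_result pairs together).

```python
def trim_history(history, max_messages=20):
    """Keep the most recent messages, starting on a user turn (hypothetical helper)."""
    if max_messages <= 0:
        return []
    trimmed = history[-max_messages:]
    # The conversation sent to the API should open with a user message.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed

history = [
    {"role": "user", "content": "u1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "u2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "u3"},
]
print([m["content"] for m in trim_history(history, max_messages=4)])
```

Dropping old turns trades recall of early context for a bounded cost per request; summarizing older turns is an alternative when early context matters.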
Lesson 5: System Prompts
# System prompts configure the model's persona, constraints, and behavior
# They are separate from the messages array — top-level "system" key
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1024,
"system": """You are a senior AWS solutions architect.
Always recommend AWS-native services where possible.
Format responses with clear headings and bullet points.
If asked about pricing, always say to check the AWS Pricing Calculator.""",
"messages": [
{"role": "user", "content": "How should I deploy a Python API?"}
]
}
Lesson 6: Temperature Settings
# temperature: 0.0 to 1.0 for Claude models (default 1.0)
# 0.0 = most focused, near-deterministic output (good for structured tasks); small run-to-run variation is still possible
# 1.0 = maximum diversity (good for creative tasks)
# Deterministic extraction
body_extract = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "temperature": 0.0,  # Most deterministic setting
    "messages": [{"role": "user", "content": "Extract the date from: Meeting on Jan 15 2024"}]
}
# Creative writing
body_creative = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1024,
"temperature": 1.0, # Full creativity
"messages": [{"role": "user", "content": "Write a haiku about cloud computing."}]
}
# top_p: nucleus sampling — consider top P% of probability mass (alternative to temperature)
# top_k: only consider top K tokens at each step
# Do not set both temperature and top_p — choose one
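The "choose one" guidance above can be enforced in code. The sketch below is a local convention, not an API rule: `sampling_params` is a hypothetical helper that builds only the sampling fields you pass and refuses a request that sets both temperature and top_p.

```python
def sampling_params(temperature=None, top_p=None, top_k=None):
    """Build optional sampling fields; reject temperature + top_p together (hypothetical helper)."""
    if temperature is not None and top_p is not None:
        raise ValueError("Set temperature or top_p, not both")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    params = {}
    if temperature is not None:
        params["temperature"] = temperature
    if top_p is not None:
        params["top_p"] = top_p
    if top_k is not None:
        params["top_k"] = top_k
    return params

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [{"role": "user", "content": "Classify: 'great product'"}],
    **sampling_params(temperature=0.0),
}
print(body["temperature"])
```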
Lesson 7: Streaming Responses
import boto3, json
client = boto3.client('bedrock-runtime', region_name='us-east-1')
MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'
def stream_response(prompt, system=None, max_tokens=2048):
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}]
    }
    if system:
        body["system"] = system
    response = client.invoke_model_with_response_stream(
        modelId=MODEL_ID,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    full_text = ""
    input_tokens = 0
    output_tokens = 0
    for event in response['body']:
        if 'chunk' not in event:
            continue  # Defensive: skip any event without a chunk payload
        chunk = json.loads(event['chunk']['bytes'].decode('utf-8'))
        event_type = chunk.get('type', '')
        if event_type == 'content_block_delta':
            # Text delta: print as it arrives
            delta_text = chunk['delta'].get('text', '')
            print(delta_text, end='', flush=True)
            full_text += delta_text
        elif event_type == 'message_delta':
            # Final stats
            output_tokens = chunk.get('usage', {}).get('output_tokens', 0)
        elif event_type == 'message_start':
            # Initial metadata including input token count
            input_tokens = chunk.get('message', {}).get('usage', {}).get('input_tokens', 0)
        elif event_type == 'message_stop':
            # Stream complete
            pass
    print()  # newline after stream
    return full_text, {"input_tokens": input_tokens, "output_tokens": output_tokens}
# Usage
text, usage = stream_response("Explain quantum entanglement in simple terms.")
print(f"\nUsage: {usage}")
Streaming Event Types Reference:
message_start — first event, contains model + initial usage
content_block_start — begins a content block (text or tool_use)
content_block_delta — text or JSON delta within a block
content_block_stop — ends a content block
message_delta — stop_reason, stop_sequence, final output usage
message_stop — stream complete (last event)
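To see how the event types above combine, the sketch below folds a list of already-decoded events into final text, stop reason, and usage, with no AWS call involved. The helper name `fold_stream_events` is hypothetical, and the sample events are hand-written to mirror the shapes listed above rather than captured from a real stream.

```python
def fold_stream_events(events):
    """Accumulate text, stop_reason and usage from decoded events (hypothetical helper)."""
    state = {"text": "", "stop_reason": None, "input_tokens": 0, "output_tokens": 0}
    for event in events:
        kind = event.get("type")
        if kind == "message_start":
            usage = event.get("message", {}).get("usage", {})
            state["input_tokens"] = usage.get("input_tokens", 0)
        elif kind == "content_block_delta":
            state["text"] += event.get("delta", {}).get("text", "")
        elif kind == "message_delta":
            state["stop_reason"] = event.get("delta", {}).get("stop_reason")
            state["output_tokens"] = event.get("usage", {}).get("output_tokens", 0)
    return state

# Hand-written sample events for illustration only.
sample_events = [
    {"type": "message_start", "message": {"usage": {"input_tokens": 12}}},
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Hello"}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": ", world"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"},
     "usage": {"output_tokens": 4}},
    {"type": "message_stop"},
]
print(fold_stream_events(sample_events))
```

Separating event folding from network I/O like this makes the streaming logic unit-testable.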
Lesson 8: Output Control
# max_tokens: hard cap on output length (required field)
# stop_sequences: list of strings that end generation when encountered
# top_p: cumulative probability for nucleus sampling
# top_k: limit token selection to top K candidates
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 2048,
"stop_sequences": ["</answer>", "Human:", "###"],
"top_p": 0.95,
"messages": [
{"role": "user", "content": "Answer this: What is 2+2? Wrap in <answer> tags."}
]
}
# The model stops when it would generate a stop sequence
# result['stop_reason'] will be "stop_sequence" if that triggered the stop
# result['stop_sequence'] gives which sequence triggered it
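The matched stop sequence is not included in the returned text, so with the request above the output ends just before `</answer>`. A minimal sketch of reading the answer back, assuming the response shape shown earlier; `extract_answer` is a hypothetical helper and the sample dict is hand-written.

```python
def extract_answer(result, open_tag="<answer>", close_tag="</answer>"):
    """Return the text after open_tag if generation stopped on close_tag (hypothetical helper)."""
    stopped_on_tag = (
        result.get("stop_reason") == "stop_sequence"
        and result.get("stop_sequence") == close_tag
    )
    if not stopped_on_tag:
        # Ended for another reason (e.g. max_tokens), so the answer may be incomplete.
        return None
    text = "".join(b["text"] for b in result["content"] if b["type"] == "text")
    _, found, tail = text.partition(open_tag)
    return tail.strip() if found else None

# Hand-written sample: the closing tag itself is absent from the text.
sample = {
    "content": [{"type": "text", "text": "Simple arithmetic. <answer>4"}],
    "stop_reason": "stop_sequence",
    "stop_sequence": "</answer>",
}
print(extract_answer(sample))
```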
Lesson 9: Structured Data Extraction
import json
# Assumes client and MODEL_ID are defined as in the earlier lessons
def extract_structured_data(text, schema_description):
    """Extract structured data from unstructured text using Claude."""
    prompt = f"""Extract the following information from the text and return as JSON.
Schema: {schema_description}
Text:
{text}
Return ONLY valid JSON, no explanation."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "temperature": 0.0,  # Most consistent setting for extraction
        "messages": [{"role": "user", "content": prompt}]
    }
    response = client.invoke_model(
        modelId=MODEL_ID,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    result = json.loads(response['body'].read())
    raw_text = result['content'][0]['text'].strip()
    # Strip markdown code fences if present
    if raw_text.startswith('```'):
        raw_text = raw_text.split('\n', 1)[1].rsplit('```', 1)[0].strip()
    return json.loads(raw_text)
# Example usage
invoice_text = """
Invoice #1234 from Acme Corp, dated March 15 2024.
Total amount: $2,450.00. Payment due April 15 2024.
Contact: billing@acme.com
"""
schema = '{"invoice_number": "string", "vendor": "string", "amount": "number", "due_date": "string"}'
data = extract_structured_data(invoice_text, schema)
print(data)
# Example of the expected shape (actual model output may vary):
# {'invoice_number': '1234', 'vendor': 'Acme Corp', 'amount': 2450.00, 'due_date': '2024-04-15'}
Cross-Region Inference:
# Standard model ID — fixed to one region
MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'
# Cross-region inference profile: routes requests across AWS Regions within the same geography
# US profiles:
CROSS_REGION_US = 'us.anthropic.claude-3-5-sonnet-20241022-v2:0'
# EU profiles:
CROSS_REGION_EU = 'eu.anthropic.claude-3-5-sonnet-20241022-v2:0'
# When using cross-region, the client region still determines the entry point
client = boto3.client('bedrock-runtime', region_name='us-east-1')
# Then use CROSS_REGION_US as modelId — Bedrock routes intelligently
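The profile IDs above differ from the base model IDs only by a geography prefix. The sketch below derives one from the other; `to_inference_profile_id` is a hypothetical helper, and it only knows the `us` and `eu` prefixes named in this guide. Not every model has a profile in every geography, so check the Bedrock console before relying on a derived ID.

```python
GEO_PREFIXES = ("us", "eu")  # prefixes named in this guide; others may exist

def to_inference_profile_id(model_id, geo="us"):
    """Prefix a base model ID with a geography (hypothetical helper)."""
    if geo not in GEO_PREFIXES:
        raise ValueError(f"Unsupported geography prefix: {geo}")
    if model_id.split(".", 1)[0] in GEO_PREFIXES:
        return model_id  # already a profile ID
    return f"{geo}.{model_id}"

print(to_inference_profile_id("anthropic.claude-3-5-sonnet-20241022-v2:0"))
```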
Error Handling:
import boto3
import json
from botocore.exceptions import ClientError
import time
def invoke_with_retry(client, model_id, body, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.invoke_model(
                modelId=model_id,
                contentType='application/json',
                accept='application/json',
                body=json.dumps(body)
            )
            return json.loads(response['body'].read())
        except ClientError as e:
            error_code = e.response['Error']['Code']
            if error_code == 'ThrottlingException':
                wait = (2 ** attempt) + 1  # Exponential backoff: 2s, 3s, 5s, ...
                print(f"Throttled. Waiting {wait}s before retry {attempt+1}/{max_retries}")
                time.sleep(wait)
                continue
            elif error_code == 'ValidationException':
                raise ValueError(f"Invalid request: {e.response['Error']['Message']}")
            elif error_code == 'AccessDeniedException':
                raise PermissionError("Check IAM permissions and model access in Bedrock console")
            elif error_code == 'ModelNotReadyException':
                raise RuntimeError("Model is not ready to serve inference requests; retry later")
            else:
                raise
    raise RuntimeError(f"Max retries ({max_retries}) exceeded")
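The retry loop above waits a fixed, growing interval. When many clients are throttled at once, adding randomness to the wait helps avoid synchronized retries. The sketch below swaps in the "full jitter" variant of exponential backoff, which is a different schedule from the one in the function above; `backoff_delays` and its default values are illustrative assumptions.

```python
import random

def backoff_delays(max_retries=3, base=1.0, cap=20.0, rng=None):
    """Return one 'full jitter' delay in seconds per retry attempt (hypothetical helper)."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        # Ceiling doubles each attempt but never exceeds the cap.
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick uniformly between 0 and the ceiling.
        delays.append(rng.uniform(0, ceiling))
    return delays

# A seeded generator makes the schedule reproducible in tests.
print(backoff_delays(max_retries=4, rng=random.Random(0)))
```

Each value could replace `wait` in the `ThrottlingException` branch of a retry loop.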
Best Practices
- Always include "anthropic_version": "bedrock-2023-05-31" in the request body (required by Bedrock)
- Always call response['body'].read() before parsing; a StreamingBody can only be read once
- Handle ThrottlingException with exponential backoff; Bedrock enforces per-account rate limits (requests and tokens per minute)
- Stream for responses >500 tokens to improve perceived latency
- Use cross-region inference profiles in production for higher availability
- Temperature 0.0 for extraction/classification, 0.7-1.0 for generation
- Set max_tokens conservatively for cost control; it's a cap, not a target
MODULE 3: Prompt Evaluations
Lesson Coverage
- Why evaluations matter (quality assurance for prompts)
- Dataset generation strategies
- Running evaluations at scale on Bedrock
- Grading approaches (exact match, model-based, semantic)
- Exercises (build an eval harness)
- Assessment review
Key Notes
Evaluation Workflow:
1. Define task clearly with success criteria
2. Create labeled test dataset (golden set)
3. Run model against all examples
4. Apply grading function to each output
5. Compute aggregate metrics
6. Identify failure patterns
7. Iterate on prompt
8. Re-evaluate (never on same examples used to develop prompt)
Why Evals Matter on Bedrock:
- Model updates happen (new model versions, access to newer Sonnet/Haiku)
- Prompt regressions are common when changing system prompt
- Cost/quality tradeoffs — Haiku vs. Sonnet — need data to decide
- CI/CD integration: run evals on every prompt change before deployment
Test Dataset Design:
- Minimum 20 examples before drawing conclusions; 50+ for reliable metrics
- Cover: happy path, edge cases, adversarial inputs, language variants
- Balance classes for classification tasks (equal positive/negative)
- Never overlap training examples with evaluation examples
- Label expected outputs precisely — ambiguous labels produce noisy metrics
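The no-overlap rule above is easiest to keep when the split is made once, in code, before any prompt work begins. A minimal sketch, assuming each example is a dict with an `input` key as in the rest of this module; `split_dataset` is a hypothetical helper.

```python
import random

def split_dataset(examples, eval_fraction=0.5, seed=0):
    """Shuffle and split examples into disjoint dev and eval sets (hypothetical helper)."""
    # Drop duplicate inputs so the same case cannot land in both sets.
    unique = list({ex["input"]: ex for ex in examples}.values())
    random.Random(seed).shuffle(unique)
    cut = int(len(unique) * (1 - eval_fraction))
    return unique[:cut], unique[cut:]

examples = [{"input": f"q{i}", "expected": "x"} for i in range(10)]
examples.append({"input": "q0", "expected": "x"})  # deliberate duplicate
dev_set, eval_set = split_dataset(examples, eval_fraction=0.5, seed=1)
print(len(dev_set), len(eval_set))
```

Iterate on the prompt against `dev_set` only and reserve `eval_set` for the final comparison.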
Dataset Generation with Claude:
def generate_eval_dataset(task_description, num_examples=20):
    """Use Claude to generate diverse test cases."""
    prompt = f"""Generate {num_examples} diverse test cases for this task: {task_description}
For each test case, provide:
1. An input
2. The expected output
3. The difficulty level (easy/medium/hard)
Format as JSON array: [{{"input": "...", "expected": "...", "difficulty": "..."}}]
Return ONLY the JSON array."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 4096,
        "temperature": 1.0,  # Diversity for generation
        "messages": [{"role": "user", "content": prompt}]
    }
    response = client.invoke_model(
        modelId=MODEL_ID,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    result = json.loads(response['body'].read())
    return json.loads(result['content'][0]['text'])
Grading Methods:
| Method | When to Use | Code Pattern |
|---|---|---|
| Exact match | Controlled outputs (YES/NO, category names) | output.strip() == expected |
| Regex | Format validation (dates, emails, IDs) | bool(re.match(pattern, output)) |
| JSON schema | Structured data extraction | jsonschema.validate(parsed, schema) |
| Substring | Keyword presence | expected.lower() in output.lower() |
| Semantic similarity | Paraphrase equivalence | cosine_sim(embed(output), embed(expected)) |
| Model-based | Open-ended quality | Ask Claude to score 1-5 |
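The first rows of the table translate directly into small grading functions. The sketch below covers the exact-match, regex, and substring graders using only the standard library. The function names are illustrative, each returns 1 for pass and 0 for fail, and the regex grader uses `re.fullmatch` rather than the table's `re.match` so that trailing text fails the check.

```python
import re

def grade_exact(output, expected):
    """1 if the output equals the expected label, ignoring surrounding whitespace."""
    return int(output.strip() == expected.strip())

def grade_regex(output, pattern):
    """1 if the whole output matches the pattern (stricter than re.match)."""
    return int(re.fullmatch(pattern, output.strip()) is not None)

def grade_substring(output, expected):
    """1 if the expected keyword appears anywhere, case-insensitively."""
    return int(expected.lower() in output.lower())

print(grade_exact(" POSITIVE\n", "POSITIVE"))
print(grade_regex("2024-04-15", r"\d{4}-\d{2}-\d{2}"))
print(grade_substring("The capital is Paris.", "paris"))
```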
Model-Based Grader:
GRADER_MODEL = 'anthropic.claude-3-haiku-20240307-v1:0' # Cheap for grading
def model_grade(output, expected, criterion, scale=5):
    """Use Claude to grade an output. Returns score 1-scale."""
    grading_prompt = f"""You are an impartial grader. Evaluate the output against the criterion.
Criterion: {criterion}
Expected: {expected}
Actual Output: {output}
Score the output from 1 to {scale}:
1 = Completely wrong
{scale} = Perfect
Return ONLY a single integer score."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 10,
        "temperature": 0.0,
        "messages": [{"role": "user", "content": grading_prompt}]
    }
    response = client.invoke_model(
        modelId=GRADER_MODEL,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    result = json.loads(response['body'].read())
    try:
        return int(result['content'][0]['text'].strip())
    except ValueError:
        return 0  # Failed to parse score
Running Evaluations at Scale:
import concurrent.futures
import json
import time
from datetime import datetime
def run_evaluation(dataset, prompt_template, model_id, grader_fn, max_workers=5, pass_threshold=1):
    """Run full evaluation suite with parallel grading.
    pass_threshold: minimum score counted as a pass (raise it when grading on a 1-5 scale).
    """
    def evaluate_single(example):
        # Format prompt with template
        prompt = prompt_template.format(input=example['input'])
        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "temperature": 0.0,
            "messages": [{"role": "user", "content": prompt}]
        }
        try:
            response = client.invoke_model(
                modelId=model_id,
                contentType='application/json',
                accept='application/json',
                body=json.dumps(body)
            )
            result = json.loads(response['body'].read())
            output = result['content'][0]['text'].strip()
            score = grader_fn(output, example['expected'])
            return {
                "input": example['input'],
                "expected": example['expected'],
                "output": output,
                "score": score,
                "tokens": result['usage']
            }
        except Exception as e:
            return {"input": example['input'], "error": str(e), "score": 0}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(evaluate_single, dataset))
    # Compute metrics (failed calls count as score 0)
    scores = [r['score'] for r in results if 'score' in r]
    total_tokens = sum(
        r.get('tokens', {}).get('input_tokens', 0) + r.get('tokens', {}).get('output_tokens', 0)
        for r in results
    )
    summary = {
        "timestamp": datetime.now().isoformat(),
        "model_id": model_id,
        "n_examples": len(dataset),
        # Guard against an empty dataset to avoid ZeroDivisionError
        "accuracy": sum(1 for s in scores if s >= pass_threshold) / len(scores) if scores else 0,
        "avg_score": sum(scores) / len(scores) if scores else 0,
        "total_tokens": total_tokens,
        "failures": [r for r in results if 'error' in r]
    }
    return results, summary
Key Metrics to Track:
- Accuracy: Percentage of examples meeting passing threshold
- Precision / Recall / F1: For classification tasks
- Average score: When using 1-5 scale grading
- p95 latency: 95th percentile response time
- Cost per correct answer: total_tokens × price_per_token / n_correct
- Regression rate: % of examples that passed before but fail after prompt change
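Several of the metrics above reduce to a few lines of arithmetic. The sketch below computes precision, recall, and F1 for a binary task, the cost-per-correct formula as written above, and the regression rate taken over all examples. Function names and the `"YES"` positive label are illustrative assumptions.

```python
def classification_metrics(predictions, labels, positive="YES"):
    """Precision, recall and F1 for one positive class."""
    pairs = list(zip(predictions, labels))
    tp = sum(p == positive and l == positive for p, l in pairs)
    fp = sum(p == positive and l != positive for p, l in pairs)
    fn = sum(p != positive and l == positive for p, l in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

def cost_per_correct(total_tokens, price_per_token, n_correct):
    """total_tokens x price_per_token / n_correct; None when nothing was correct."""
    return total_tokens * price_per_token / n_correct if n_correct else None

def regression_rate(passed_before, passed_after):
    """Share of all examples that passed before the change but fail after it."""
    if not passed_before:
        return 0.0
    regressed = sum(b and not a for b, a in zip(passed_before, passed_after))
    return regressed / len(passed_before)

print(classification_metrics(["YES", "YES", "NO", "NO"], ["YES", "NO", "YES", "NO"]))
print(regression_rate([True, True, False, True], [True, False, False, False]))
```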
Best Practices
- Never evaluate on examples used during prompt development (train/test split)
- Use cheap models (Haiku) for grading to reduce evaluation cost
- Log all evaluation runs: timestamp, model ID, prompt version, all scores
- Compare prompt versions side-by-side on identical test sets
- Run evals in CI/CD: gate deployments on minimum accuracy threshold
- Use parallel execution (ThreadPoolExecutor) to speed up large evals
- Validate your grader itself: manually check 10% of grader judgments
MODULE 4: Prompt Engineering
Lesson Coverage
- Direct instructions (clarity, specificity)
- Specificity and constraints (format, length, style)
- XML structure for complex prompts
- Few-shot examples
- Exercises
- Quiz and assessment
Key Notes
Core Principles:
Vague: "Summarize this article."
Better: "Summarize this article in 3 bullet points. Each point max 20 words."
Best: "Summarize this article in exactly 3 bullet points. Use active voice.
Each bullet must start with a strong verb. Max 20 words each."
Clarity Checklist:
- Specify output format (bullet list, JSON, prose paragraph, numbered steps)
- Specify output length (characters, words, sentences, tokens)
- Specify constraints (tone, reading level, language, persona)
- Use affirmative instructions (“always do X”) over negative (“don’t do Y”)
- Put the most important instruction first or last (primacy and recency effects)
XML Tags for Complex Prompts:
<instructions>
Classify the customer review as POSITIVE, NEGATIVE, or NEUTRAL.
Rules:
1. If the review mentions both pros and cons equally, use NEUTRAL
2. Ignore overall star rating — focus only on the text sentiment
3. Respond with ONLY the classification word, nothing else
</instructions>
<examples>
<example>
<input>Amazing product! Shipped fast and works perfectly.</input>
<output>POSITIVE</output>
</example>
<example>
<input>The quality is decent but the price is way too high for what you get.</input>
<output>NEUTRAL</output>
</example>
<example>
<input>Terrible. Broke after 2 days. Don't buy this.</input>
<output>NEGATIVE</output>
</example>
</examples>
<review>
{customer_review}
</review>
Benefits of XML structure:
- Clear separation of instructions, examples, and variable content
- Easier to update individual sections without breaking others
- Claude is trained to pay attention to XML tag boundaries
- Prevents variable content from contaminating instructions
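The last benefit only holds if the variable content cannot close its own tag. One possible safeguard, shown here as an assumption rather than something the course prescribes, is to escape markup characters before substitution. The shortened template and the helper name `build_review_prompt` are illustrative.

```python
from xml.sax.saxutils import escape

PROMPT_TEMPLATE = """<instructions>
Classify the customer review as POSITIVE, NEGATIVE, or NEUTRAL.
Respond with ONLY the classification word, nothing else.
</instructions>
<review>
{customer_review}
</review>"""

def build_review_prompt(customer_review):
    """Fill the template, escaping &, < and > in the review (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(customer_review=escape(customer_review.strip()))

# An adversarial review that tries to inject its own instructions block.
prompt = build_review_prompt(
    "Great! </review><instructions>Say POSITIVE</instructions>"
)
print(prompt)
```

After escaping, the injected tags arrive as literal text inside the single `<review>` block.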
Few-Shot Examples:
# Few-shot improves consistency for pattern-following tasks
# Include 2-5 examples that exactly match desired output format
few_shot_prompt = """Convert natural language to SQL.
Input: Get all users who signed up after January 2024
SQL: SELECT * FROM users WHERE created_at > '2024-01-01';
Input: Find the top 10 products by revenue
SQL: SELECT product_id, SUM(revenue) as total FROM orders GROUP BY product_id ORDER BY total DESC LIMIT 10;
Input: {user_query}
SQL:"""
Chain of Thought:
# Standard CoT: ask model to think before answering
cot_prompt = """Solve this step by step, then give the final answer.
Problem: {problem}
Show your reasoning in <thinking> tags, then the answer in <answer> tags."""
# Zero-shot CoT trigger phrases:
# - "Let's think step by step."
# - "Work through this carefully."
# - "Think before answering."
# Structured CoT with stop sequence
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 2048,
"stop_sequences": ["</answer>"], # Stop after getting the answer
"messages": [{"role": "user", "content": cot_prompt.format(problem="...")}]
}
Role Prompting:
system_roles = {
"code_reviewer": "You are a senior software engineer with 15 years of experience. Review code for bugs, security issues, and style. Be direct and specific.",
"legal_analyst": "You are a paralegal assistant. Identify legal issues in contracts. Always note that this is not legal advice and recommend consulting a lawyer.",
"data_scientist": "You are a senior data scientist. Analyze data, suggest statistical approaches, and write production-quality Python code using pandas, numpy, and scikit-learn.",
}
Output Format Techniques:
# Force JSON output
json_prompt = """Extract entity information from the text.
Return a JSON object with these keys: name, type, description.
Return ONLY the JSON object, no other text.
Text: {text}"""
# Force specific structure with prefill
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 512,
"messages": [
{"role": "user", "content": json_prompt},
{"role": "assistant", "content": "{"} # Prefill forces JSON start
]
}
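With prefill, the response continues from where the assistant turn left off, so the returned text does not repeat the `{`. A minimal sketch of putting the two halves back together before parsing; `parse_prefilled_json` is a hypothetical helper and the sample response is hand-written.

```python
import json

PREFILL = "{"

def parse_prefilled_json(result, prefill=PREFILL):
    """Re-attach the prefill to the model's continuation and parse it (hypothetical helper)."""
    continuation = "".join(
        block["text"] for block in result["content"] if block["type"] == "text"
    )
    return json.loads(prefill + continuation)

# Hand-written sample: note the text starts after the prefilled brace.
sample = {
    "content": [
        {"type": "text",
         "text": '"name": "Acme", "type": "company", "description": "Vendor"}'}
    ]
}
print(parse_prefilled_json(sample))
```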
Temperature Guide:
| Task Type | Temperature | Why |
|---|---|---|
| Data extraction | 0.0 | Must be deterministic |
| Classification | 0.0 | Consistent categories |
| Code generation | 0.0–0.3 | Correct syntax matters |
| Summarization | 0.3–0.5 | Some variation OK |
| Q&A | 0.3–0.7 | Factual but not robotic |
| Creative writing | 0.7–1.0 | Diversity is valuable |
| Brainstorming | 0.9–1.0 | Maximum variety |
Best Practices
- Start simple, add complexity only when needed (KISS principle)
- Test prompts on edge cases: empty input, Unicode, very long text, adversarial inputs
- Separate dynamic content from static instructions using XML tags
- Few-shot examples must exactly match the output format you want (including whitespace)
- Use temperature=0 and store the prompt in version control: treat prompts as code
- When using prefill (pre-filling assistant response), verify it doesn't hurt quality
- Stop sequences are powerful: use </answer> to extract just the needed section
MODULE 5: Tool Use
Lesson Coverage
- Introduction to tool use (function calling)
- Defining functions as tools
- JSON Schema for tool parameters
- Handling tool_use responses
- Multi-turn tool conversations (the loop)
- Multiple tools and parallel calls
- Batch tool operations
- Structured extraction via tools
Key Notes
Tool Use Message Flow:
┌─────────────────────────────────────────────────────────┐
│ 1. Client sends: messages + tools list │
│ 2. Claude responds: stop_reason="tool_use" │
│ content: [text_block?, tool_use_block, ...] │
│ 3. Client executes the tool │
│ 4. Client sends: original messages + assistant response │
│ + user message with tool_result │
│ 5. Claude responds: stop_reason="end_turn" │
│ content: [text_block with final answer] │
└─────────────────────────────────────────────────────────┘
Tool Schema Definition (JSON Schema):
tools = [
{
"name": "get_weather", # Snake_case, clear verb+noun
"description": """Get current weather for a city.
Returns temperature in specified unit and weather condition.
Use this when the user asks about current weather in any location.""",
"input_schema": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "City name, e.g. 'New York' or 'London'"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit. Default to celsius."
}
},
"required": ["city"] # unit is optional
}
},
{
"name": "search_database",
"description": "Search the product database by keyword or SKU",
"input_schema": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search term"},
"limit": {
"type": "integer",
"description": "Max results to return",
"minimum": 1,
"maximum": 50,
"default": 10
},
"category": {
"type": "string",
"enum": ["electronics", "clothing", "books", "all"],
"default": "all"
}
},
"required": ["query"]
}
}
]
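The schema is guidance for the model, so it is still worth checking tool inputs before running real code. The sketch below hand-rolls checks for required fields, unknown fields, and enum values only; it is not a full JSON Schema validator, and `validate_tool_input` is a hypothetical helper. The trimmed `weather_tool` mirrors the `get_weather` definition above.

```python
def validate_tool_input(tool, tool_input):
    """Return a list of problems found in tool_input; empty means OK (hypothetical helper)."""
    schema = tool["input_schema"]
    properties = schema.get("properties", {})
    problems = [
        f"missing required field: {name}"
        for name in schema.get("required", [])
        if name not in tool_input
    ]
    for name, value in tool_input.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected field: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems

weather_tool = {
    "name": "get_weather",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}
print(validate_tool_input(weather_tool, {"unit": "kelvin"}))
```

Any problems found can be returned to Claude as an error tool result so it can correct the call.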
Detecting and Handling tool_use Responses:
def parse_tool_calls(response):
    """Extract tool calls from Claude's response content."""
    tool_calls = []
    text_blocks = []
    for block in response['content']:
        if block['type'] == 'tool_use':
            tool_calls.append({
                'id': block['id'],  # Must be preserved for tool_result
                'name': block['name'],
                'input': block['input']  # Dict matching tool's input_schema
            })
        elif block['type'] == 'text':
            text_blocks.append(block['text'])
    return tool_calls, text_blocks
Building Tool Results Message:
def build_tool_results_message(tool_results_map):
    """Build the user message containing all tool results.
    tool_results_map: {tool_use_id: result_data}
    """
    content = []
    for tool_use_id, result in tool_results_map.items():
        content.append({
            "type": "tool_result",
            "tool_use_id": tool_use_id,  # Must match the tool_use block's id
            "content": json.dumps(result)  # String or list of content blocks
        })
    return {"role": "user", "content": content}
# Error in tool execution
def build_tool_error(tool_use_id, error_message):
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": f"Error: {error_message}",
        "is_error": True  # Signals to Claude that the tool failed
    }
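The success and error shapes above can be combined into one wrapper so that an exception inside a tool never breaks the conversation. A minimal sketch, assuming a dispatch function that takes a tool name and an input dict; `run_tool_safely` and the two demo dispatchers are hypothetical.

```python
import json

def run_tool_safely(tool_use_block, dispatch_fn):
    """Execute one tool_use block and always return a tool_result block (hypothetical helper)."""
    try:
        result = dispatch_fn(tool_use_block["name"], tool_use_block["input"])
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_block["id"],
            "content": json.dumps(result),
        }
    except Exception as exc:
        # Report the failure to Claude instead of crashing the loop.
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_block["id"],
            "content": f"Error: {exc}",
            "is_error": True,
        }

def ok_dispatch(name, tool_input):
    return {"ok": True}

def failing_dispatch(name, tool_input):
    raise RuntimeError("service unavailable")

block = {"type": "tool_use", "id": "toolu_demo", "name": "get_weather",
         "input": {"city": "London"}}
print(run_tool_safely(block, ok_dispatch))
print(run_tool_safely(block, failing_dispatch))
```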
Complete Tool Loop (Lesson 5):
import boto3, json
client = boto3.client('bedrock-runtime', region_name='us-east-1')
MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'
# Define available tools
tools = [
{
"name": "get_stock_price",
"description": "Get the current stock price for a ticker symbol",
"input_schema": {
"type": "object",
"properties": {
"ticker": {"type": "string", "description": "Stock ticker symbol, e.g. AAPL"}
},
"required": ["ticker"]
}
},
{
"name": "calculate",
"description": "Evaluate a mathematical expression",
"input_schema": {
"type": "object",
"properties": {
"expression": {"type": "string", "description": "Math expression to evaluate"}
},
"required": ["expression"]
}
}
]
# Mock tool implementations
def get_stock_price(ticker):
prices = {"AAPL": 182.50, "GOOGL": 141.80, "AMZN": 175.30}
price = prices.get(ticker.upper())
if price is None:
return {"error": f"Unknown ticker: {ticker}"}
return {"ticker": ticker, "price": price, "currency": "USD"}
def calculate(expression):
try:
# Use eval carefully — in production, use a safe math parser
result = eval(expression, {"__builtins__": {}}, {"abs": abs, "round": round})
return {"result": result}
except Exception as e:
return {"error": str(e)}
def dispatch_tool(name, tool_input):
"""Route tool call to implementation."""
if name == "get_stock_price":
return get_stock_price(**tool_input)
elif name == "calculate":
return calculate(**tool_input)
else:
return {"error": f"Unknown tool: {name}"}
def run_agent(user_message, max_iterations=10):
"""Run the tool use loop until end_turn."""
messages = [{"role": "user", "content": user_message}]
iteration = 0
while iteration < max_iterations:
iteration += 1
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 4096,
"tools": tools,
"messages": messages
}
response = json.loads(client.invoke_model(
modelId=MODEL_ID,
contentType='application/json',
accept='application/json',
body=json.dumps(body)
)['body'].read())
stop_reason = response['stop_reason']
# Always add assistant response to history
messages.append({"role": "assistant", "content": response['content']})
if stop_reason == 'end_turn':
# Extract final text
for block in response['content']:
if block['type'] == 'text':
return block['text']
return ""
elif stop_reason == 'tool_use':
# Execute all tool calls
tool_results = []
for block in response['content']:
if block['type'] == 'tool_use':
print(f" Calling tool: {block['name']}({block['input']})")
result = dispatch_tool(block['name'], block['input'])
print(f" Result: {result}")
tool_results.append({
"type": "tool_result",
"tool_use_id": block['id'],
"content": json.dumps(result)
})
# Add tool results as user message
messages.append({"role": "user", "content": tool_results})
else:
# e.g. 'max_tokens' or 'stop_sequence': report it instead of looping again
return f"Stopped early (stop_reason={stop_reason})"
return "Max iterations reached"
# Test
result = run_agent("What is AAPL's current stock price? And what's 15% of that price?")
print(result)
tool_choice Parameter:
# Auto (default): Claude decides when to call tools
body["tool_choice"] = {"type": "auto"}
# Any: Claude must call at least one tool
body["tool_choice"] = {"type": "any"}
# Force specific tool
body["tool_choice"] = {"type": "tool", "name": "get_weather"}
# Disable tools entirely (even if defined)
body["tool_choice"] = {"type": "none"}
Parallel Tool Calls (Multiple Tools, Lesson 6):
# Claude may return MULTIPLE tool_use blocks in one response
# This means Claude wants to call multiple tools in parallel
# You must execute ALL of them and return ALL results before continuing
def handle_parallel_tools(response_content, dispatch_fn):
"""Handle multiple parallel tool calls."""
tool_results = []
# Collect all tool_use blocks
tool_calls = [b for b in response_content if b['type'] == 'tool_use']
if not tool_calls:
return None
# Execute all in parallel
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor() as executor:
futures = {
executor.submit(dispatch_fn, tc['name'], tc['input']): tc['id']
for tc in tool_calls
}
for future, tool_use_id in futures.items():
result = future.result()
tool_results.append({
"type": "tool_result",
"tool_use_id": tool_use_id,
"content": json.dumps(result)
})
return {"role": "user", "content": tool_results}
Structured Extraction via Tools (Lesson 8):
# Using tools for structured extraction is usually more reliable than parsing free-form JSON
# The tool_use input follows the schema closely, but validate it before relying on it
extraction_tool = {
"name": "extract_invoice",
"description": "Extract invoice data from the provided text",
"input_schema": {
"type": "object",
"properties": {
"invoice_number": {"type": "string"},
"vendor_name": {"type": "string"},
"total_amount": {"type": "number"},
"due_date": {"type": "string", "format": "date"},
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {"type": "string"},
"quantity": {"type": "number"},
"unit_price": {"type": "number"}
}
}
}
},
"required": ["invoice_number", "vendor_name", "total_amount"]
}
}
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1024,
"tools": [extraction_tool],
"tool_choice": {"type": "tool", "name": "extract_invoice"}, # Force it
"messages": [{"role": "user", "content": f"Extract data from: {invoice_text}"}]
}
response = json.loads(client.invoke_model(
modelId=MODEL_ID, contentType='application/json',
accept='application/json', body=json.dumps(body)
)['body'].read())
# Find the tool_use block; its input should match the schema, but validate before trusting it
extracted = next(b['input'] for b in response['content'] if b['type'] == 'tool_use')
Bedrock Batch Inference (Lesson 7):
# For high-volume, non-time-sensitive workloads
# Input: S3 JSONL file (one request per line, each with a recordId and a modelInput)
# Output: S3 JSONL file (one result per line; match results to requests by recordId)
import boto3, json
bedrock = boto3.client('bedrock', region_name='us-east-1')
# Create batch job
job = bedrock.create_model_invocation_job(
jobName='my-batch-job-001',
modelId='anthropic.claude-3-haiku-20240307-v1:0',
inputDataConfig={
"s3InputDataConfig": {
"s3Uri": "s3://my-bucket/batch-input/requests.jsonl",
"s3InputFormat": "JSONL"
}
},
outputDataConfig={
"s3OutputDataConfig": {
"s3Uri": "s3://my-bucket/batch-output/"
}
},
roleArn="arn:aws:iam::123456789:role/BedrockBatchRole"
)
job_arn = job['jobArn']
# Poll for completion
import time
while True:
status = bedrock.get_model_invocation_job(jobIdentifier=job_arn)
state = status['status']
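if state == 'Completed':
break
elif state in ('Failed', 'Stopped', 'Expired', 'PartiallyCompleted'):
# Terminal states other than full success; without this the loop could poll forever
raise RuntimeError(f"Batch job ended with status: {state}")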
print(f"Job status: {state}")
time.sleep(30)
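The batch job above reads its requests from a JSONL file in S3. The following is a minimal sketch of building that file, assuming each line is a JSON object with a `recordId` and a `modelInput` that holds the same body you would pass to `invoke_model`. The helper name `build_batch_records`, the ID scheme, and the prompts are illustrative, not part of the course material.

```python
import json

def build_batch_records(prompts, max_tokens=512):
    """Return JSONL text: one batch request per line.

    Assumes each record has the shape {"recordId": ..., "modelInput": {...}},
    where modelInput is the same body used with invoke_model.
    """
    lines = []
    for i, prompt in enumerate(prompts):
        record = {
            "recordId": f"REC{i:08d}",  # hypothetical ID scheme, used to match outputs to inputs
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"

jsonl_text = build_batch_records(["Summarize document A", "Summarize document B"])
```

The resulting text would then be uploaded to the `s3Uri` given in `inputDataConfig` before the job is created.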
Best Practices
- Write clear tool descriptions — Claude uses descriptions to decide when and how to call tools
- Handle stop_reason="tool_use" in a loop until stop_reason="end_turn"
- Validate tool inputs before executing (treat model output as untrusted)
- Return structured errors with "is_error": true so Claude can adapt gracefully
- Use parallel execution when Claude returns multiple tool_use blocks simultaneously
- Use tool_choice={"type": "tool", "name": "..."} for guaranteed extraction use cases
- Set a max iteration limit to prevent infinite tool loops
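The advice to validate tool inputs can be sketched as a small check that runs before `dispatch_tool`. This is a minimal sketch, assuming only required keys and top-level primitive types need checking; `validate_tool_input` is a hypothetical helper and covers a deliberately small subset of JSON Schema.

```python
def validate_tool_input(tool_input, input_schema):
    """Return a list of problems; an empty list means the input looks valid."""
    type_map = {"string": str, "number": (int, float), "integer": int,
                "boolean": bool, "array": list, "object": dict}
    if not isinstance(tool_input, dict):
        return ["input must be an object"]
    problems = []
    for key in input_schema.get("required", []):
        if key not in tool_input:
            problems.append(f"missing required field: {key}")
    for key, value in tool_input.items():
        spec = input_schema.get("properties", {}).get(key)
        if spec is None:
            problems.append(f"unexpected field: {key}")
            continue
        expected = type_map.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it for numeric fields
        is_bool_as_number = isinstance(value, bool) and spec.get("type") in ("number", "integer")
        if expected and (not isinstance(value, expected) or is_bool_as_number):
            problems.append(f"{key}: expected {spec['type']}")
    return problems

schema = {
    "type": "object",
    "properties": {"ticker": {"type": "string"}},
    "required": ["ticker"],
}
valid_problems = validate_tool_input({"ticker": "AAPL"}, schema)
invalid_problems = validate_tool_input({"symbol": 5}, schema)
```

When the list is non-empty, one option is to return it to Claude as a tool_result with "is_error": true rather than executing the tool.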
MODULE 6: RAG (Retrieval-Augmented Generation)
Lesson Coverage
- RAG introduction and architecture
- Chunking strategies
- Embeddings (Amazon Titan, Cohere on Bedrock)
- Full RAG workflow (index + query)
- BM25 keyword search
- Multi-search pipelines (hybrid)
- Reranking
- Contextual retrieval (Anthropic technique)
Key Notes
RAG Architecture:
INDEXING PIPELINE (offline):
Documents → Chunker → [chunk1, chunk2, ...] → Embedder → Vector Store
QUERY PIPELINE (online):
Query → Embed Query → Vector Search ─┐
Query → BM25 Search ─────────────────┤
                                     └→ Merge (hybrid RRF) / Rerank → Top-K → Claude → Answer
Lesson 2: Chunking Strategies:
| Strategy | Description | When to Use |
|---|---|---|
| Fixed-size | Split every N chars/tokens | Simple, consistent corpus |
| Sentence | Split at sentence boundaries | Better context preservation |
| Paragraph | Split at blank lines | Long-form documents |
| Semantic | Split at section headings | Structured docs (manuals, wikis) |
| Recursive | Try paragraph, then sentence, then fixed | General purpose best practice |
| Overlapping | Slide window with N-token overlap | Prevent boundary cut-offs |
def chunk_text_recursive(text, max_chunk_size=500, overlap=50):
    """Recursive character splitting with overlap."""
    separators = ["\n\n", "\n", ". ", " "]

    def split(text, separators):
        if len(text) <= max_chunk_size:
            return [text]
        if not separators:
            # No separator left: fall back to hard fixed-size cuts
            return [text[i:i + max_chunk_size] for i in range(0, len(text), max_chunk_size)]
        sep, rest = separators[0], separators[1:]
        if sep not in text:
            return split(text, rest)
        result, current = [], ""
        for part in text.split(sep):
            candidate = current + sep + part if current else part
            if len(candidate) <= max_chunk_size:
                current = candidate
                continue
            if current:
                result.append(current)
                current = ""
            if len(part) <= max_chunk_size:
                current = part
            else:
                # Part is still too large: recurse with the finer separators
                result.extend(split(part, rest))
        if current:
            result.append(current)
        return result

    raw_chunks = split(text, separators)
    # Add overlap: prefix each chunk with the tail of the previous one
    chunks = []
    for i, chunk in enumerate(raw_chunks):
        if i > 0 and overlap > 0:
            chunk = raw_chunks[i - 1][-overlap:] + " " + chunk
        chunks.append(chunk.strip())
    return [c for c in chunks if c]
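For comparison, the "Overlapping" row of the strategy table can be sketched as a plain sliding window. This is a minimal character-based sketch; `chunk_fixed_overlap` is an illustrative name, and a token-based version would need a tokenizer that is not shown here.

```python
def chunk_fixed_overlap(text, chunk_size=500, overlap=50):
    """Fixed-size chunks where each chunk repeats the last `overlap` chars of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
windows = chunk_fixed_overlap(sample, chunk_size=10, overlap=3)
# windows == ["abcdefghij", "hijklmnopq", "opqrstuvwx", "vwxyz"]
```

Each window starts `chunk_size - overlap` characters after the previous one, so the shared "hij" and "opq" regions keep sentences that straddle a boundary visible in both chunks.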
Lesson 3: Embeddings on Bedrock:
import boto3, json
import numpy as np
bedrock_client = boto3.client('bedrock-runtime', region_name='us-east-1')
def embed_text(text, model_id='amazon.titan-embed-text-v2:0'):
"""Generate embedding vector for text using Amazon Titan."""
body = {
"inputText": text,
"dimensions": 1024, # 256, 512, or 1024 (Titan V2)
"normalize": True # Normalize to unit vector for cosine similarity
}
response = bedrock_client.invoke_model(
modelId=model_id,
contentType='application/json',
accept='application/json',
body=json.dumps(body)
)
result = json.loads(response['body'].read())
return np.array(result['embedding'])
def embed_batch(texts, model_id='amazon.titan-embed-text-v2:0'):
"""Embed multiple texts (sequential for Titan — no batch endpoint)."""
return [embed_text(t, model_id) for t in texts]
# Cohere Embed on Bedrock (better multilingual)
def embed_cohere(text, input_type='search_document'):
body = {
"texts": [text],
"input_type": input_type, # 'search_document' for indexing, 'search_query' for queries
"truncate": "END"
}
response = bedrock_client.invoke_model(
modelId='cohere.embed-multilingual-v3',
contentType='application/json',
accept='application/json',
body=json.dumps(body)
)
return json.loads(response['body'].read())['embeddings'][0]
Embedding Models Comparison:
| Model ID | Dims | Best For |
|---|---|---|
| amazon.titan-embed-text-v2:0 | 256/512/1024 | General English, adjustable size |
| amazon.titan-embed-text-v1:0 | 1536 | Legacy, fixed size |
| cohere.embed-multilingual-v3 | 1024 | Multilingual content |
| cohere.embed-english-v3 | 1024 | English, higher quality |
Lesson 4: Full RAG Workflow (Simple In-Memory):
import numpy as np
from typing import List, Dict
class SimpleVectorStore:
def __init__(self):
self.chunks = []
self.embeddings = []
self.metadata = []
def add(self, text: str, embedding: np.ndarray, metadata: Dict = None):
self.chunks.append(text)
self.embeddings.append(embedding)
self.metadata.append(metadata or {})
def search(self, query_embedding: np.ndarray, top_k: int = 5) -> List[Dict]:
if not self.embeddings:
return []
# Cosine similarity
matrix = np.array(self.embeddings)
scores = matrix @ query_embedding # Assumes normalized vectors
top_indices = np.argsort(scores)[::-1][:top_k]
return [
{
"text": self.chunks[i],
"score": float(scores[i]),
"metadata": self.metadata[i]
}
for i in top_indices
]
def build_rag_index(documents, chunk_size=500):
"""Index documents into vector store."""
store = SimpleVectorStore()
for doc in documents:
chunks = chunk_text_recursive(doc['text'], chunk_size)
for i, chunk in enumerate(chunks):
embedding = embed_text(chunk)
store.add(chunk, embedding, {
"source": doc.get('title', 'unknown'),
"chunk_index": i
})
return store
def rag_query(question, store, top_k=5):
"""Retrieve relevant chunks and generate answer."""
# 1. Embed the question
q_embedding = embed_text(question)
# 2. Retrieve top-K chunks
results = store.search(q_embedding, top_k=top_k)
# 3. Build context
context = "\n\n---\n\n".join([
f"Source: {r['metadata'].get('source', 'unknown')}\n{r['text']}"
for r in results
])
# 4. Generate answer
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1024,
"system": "Answer questions based only on the provided context. If the answer is not in the context, say so.",
"messages": [{
"role": "user",
"content": f"Context:\n{context}\n\nQuestion: {question}"
}]
}
response = bedrock_client.invoke_model(
modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
contentType='application/json',
accept='application/json',
body=json.dumps(body)
)
result = json.loads(response['body'].read())
return result['content'][0]['text'], results
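`SimpleVectorStore.search` scores with a plain dot product, which equals cosine similarity only when every vector has unit length. For embeddings that are not normalized, the full formula is needed. The sketch below uses pure Python so it can be checked without NumPy; the function name is illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity for two equal-length vectors of any magnitude."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_a * norm_b)

same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])  # parallel vectors, close to 1.0
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])      # perpendicular vectors, 0.0
```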
Lesson 5: BM25 Keyword Search:
# BM25 = Best Match 25 — probabilistic TF-IDF variant
# Great for exact keyword matches, product codes, names
# Install: pip install rank_bm25
from rank_bm25 import BM25Okapi
import re
def tokenize(text):
return re.findall(r'\w+', text.lower())
class BM25Index:
def __init__(self, documents):
self.docs = documents
tokenized = [tokenize(d['text']) for d in documents]
self.bm25 = BM25Okapi(tokenized)
def search(self, query, top_k=10):
tokens = tokenize(query)
scores = self.bm25.get_scores(tokens)
top_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
return [
{"text": self.docs[i]['text'], "score": float(scores[i]), "metadata": self.docs[i].get('metadata', {})}
for i in top_indices if scores[i] > 0
]
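To see what the library computes, the BM25 score can be written out in a few lines. This is a sketch under stated assumptions: it uses the non-negative IDF variant `ln((N - n + 0.5) / (n + 0.5) + 1)` and the common defaults `k1=1.5`, `b=0.75`, so its numbers will not match `rank_bm25` exactly.

```python
import math
import re
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with the BM25 formula."""
    tokenized = [re.findall(r"\w+", d.lower()) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = re.findall(r"\w+", query.lower())
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Longer-than-average documents are penalized, shorter ones boosted
            length_norm = 1 - b + b * len(tokens) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores

docs = [
    "error code E4021 in the billing service",
    "how to reset your password",
    "billing service overview",
]
code_scores = bm25_scores("E4021", docs)
phrase_scores = bm25_scores("billing service", docs)
```

Only the first document scores above zero for the exact code `E4021`, which is the kind of match a pure vector search can miss. For "billing service", the shorter third document outranks the first because of length normalization.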
Lesson 6: Hybrid Search with Reciprocal Rank Fusion (RRF):
def reciprocal_rank_fusion(result_lists, k=60):
"""Merge multiple ranked result lists using RRF.
k=60 is the standard constant — higher k = less weight to top ranks.
"""
scores = {} # doc_id -> RRF score
for results in result_lists:
for rank, result in enumerate(results):
doc_id = result.get('id') or result['text'][:50] # Use text prefix as id
if doc_id not in scores:
scores[doc_id] = {"score": 0.0, "data": result}
scores[doc_id]["score"] += 1.0 / (k + rank + 1)
return sorted(scores.values(), key=lambda x: x["score"], reverse=True)
def hybrid_search(query, vector_store, bm25_index, top_k=10, final_k=5):
"""Run vector + BM25 search and merge with RRF."""
q_embedding = embed_text(query)
vector_results = vector_store.search(q_embedding, top_k=top_k)
bm25_results = bm25_index.search(query, top_k=top_k)
# Add ID for merging
for i, r in enumerate(vector_results):
r['id'] = r['text'][:80]
for i, r in enumerate(bm25_results):
r['id'] = r['text'][:80]
merged = reciprocal_rank_fusion([vector_results, bm25_results])
return [m['data'] for m in merged[:final_k]]
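A small worked example may make the RRF arithmetic concrete. The sketch below applies the same formula as `reciprocal_rank_fusion`, written with 1-based ranks as `1 / (k + rank)`, to two made-up rankings; the document IDs are hypothetical.

```python
def rrf_scores(rankings, k=60):
    """rankings: list of ranked ID lists (best first). Returns {id: fused score}."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return fused

vector_ranking = ["doc_a", "doc_b", "doc_c"]
bm25_ranking = ["doc_b", "doc_d", "doc_a"]
fused = rrf_scores([vector_ranking, bm25_ranking])
order = sorted(fused, key=fused.get, reverse=True)
# doc_b: 1/62 + 1/61, doc_a: 1/61 + 1/63, doc_d: 1/62, doc_c: 1/63
# order == ["doc_b", "doc_a", "doc_d", "doc_c"]
```

Documents that appear in both lists rise to the top even if neither search ranked them first, and only ranks are used, so the incompatible score scales of BM25 and cosine similarity never need to be reconciled.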
Lesson 7: Reranking:
def rerank_with_bedrock(query, documents, top_k=5):
"""Rerank retrieved documents using Amazon Bedrock Reranker."""
bedrock_agent = boto3.client('bedrock-agent-runtime', region_name='us-east-1')
response = bedrock_agent.rerank(
rerankingConfiguration={
"type": "BEDROCK_RERANKING_MODEL",
"bedrockRerankingConfiguration": {
"modelConfiguration": {
"modelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.rerank-v1:0"
},
"numberOfResults": top_k
}
},
sources=[
{
"type": "INLINE",
"inlineDocumentSource": {
"type": "TEXT",
"textDocument": {"text": doc['text']}
}
}
for doc in documents
],
queries=[{"type": "TEXT", "textQuery": {"text": query}}] # The query is passed in `queries`
)
return [
{
"text": documents[item['index']]['text'],
"rerank_score": item['relevanceScore'],
"original_rank": item['index']
}
# The response lists the reranked items under 'results'
for item in sorted(response['results'], key=lambda x: x['relevanceScore'], reverse=True)
]
Lesson 8: Contextual Retrieval (Anthropic Technique):
def add_chunk_context(chunk, full_document, model_id='anthropic.claude-3-haiku-20240307-v1:0'):
"""Prepend contextual summary to each chunk before embedding.
Significantly improves retrieval accuracy by giving chunks document context.
"""
# Only the first 3000 characters of the document are sent, to keep the prompt small
prompt = f"""Here is a document:
<document>
{full_document[:3000]}
</document>
Here is one chunk from this document:
<chunk>
{chunk}
</chunk>
Write a short 1-2 sentence context for this chunk that explains where it fits within the full document.
Return ONLY the context sentences, no preamble."""
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 100,
"temperature": 0.0,
"messages": [{"role": "user", "content": prompt}]
}
response = bedrock_client.invoke_model(
modelId=model_id,
contentType='application/json',
accept='application/json',
body=json.dumps(body)
)
context = json.loads(response['body'].read())['content'][0]['text']
return f"{context}\n\n{chunk}" # Prepend context to chunk
# Bedrock Knowledge Bases (Managed RAG)
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')
def kb_retrieve_and_generate(query, knowledge_base_id, model_arn):
"""Use Bedrock Knowledge Bases for fully managed RAG."""
response = bedrock_agent_runtime.retrieve_and_generate(
input={"text": query},
retrieveAndGenerateConfiguration={
"type": "KNOWLEDGE_BASE",
"knowledgeBaseConfiguration": {
"knowledgeBaseId": knowledge_base_id,
"modelArn": model_arn,
"retrievalConfiguration": {
"vectorSearchConfiguration": {"numberOfResults": 10}
}
}
}
)
return {
"answer": response['output']['text'],
"citations": response.get('citations', [])
}
RAG Pipeline Decision Tree:
Do you want managed infrastructure?
YES → Use Bedrock Knowledge Bases (S3 → auto-chunked → OpenSearch → managed query)
NO → Build custom pipeline:
Do you have multilingual content? → Use Cohere Embed
Do you have English-only content? → Use Amazon Titan Embed V2
Do queries use exact keywords? → Add BM25 + hybrid search
Do you have high retrieval quality requirements? → Add reranking
Do you have large documents? → Use contextual retrieval
Best Practices
- Retrieve more than needed (top-20 or top-50), then rerank to final top-5
- Keep total context under 10K tokens to avoid attention degradation
- Include source metadata (doc title, page, section) in retrieved chunks
- Use hybrid search (vector + BM25) by default — pure vector misses exact terms
- Apply contextual retrieval for large document corpora (10K+ chunks)
- For production: Bedrock Knowledge Bases reduces ops overhead significantly
- Test retrieval quality independently from generation quality
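Testing retrieval on its own usually means comparing the retrieved chunk IDs against a hand-labeled set of relevant IDs. The following is a minimal sketch of two common metrics, recall@k and reciprocal rank; the chunk IDs are made up for illustration.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top-k retrieved list."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & relevant)
    return hits / len(relevant)

def reciprocal_rank(retrieved_ids, relevant_ids):
    """1 / rank of the first relevant item, or 0.0 if none was retrieved."""
    relevant = set(relevant_ids)
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

retrieved = ["c7", "c2", "c9", "c4"]
relevant = ["c2", "c4"]
r_at_3 = recall_at_k(retrieved, relevant, k=3)  # one of two relevant chunks in the top 3
rr = reciprocal_rank(retrieved, relevant)       # first relevant chunk is at rank 2
```

Averaging these over a set of labeled questions gives a retrieval score that does not depend on how well the model writes the final answer.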
MODULE 7: Features of Claude
Lesson Coverage
- Extended thinking (multi-step reasoning)
- Image support (vision)
- PDF support
- Citations
- Prompt caching rules and mechanics
- Practical implementation
Key Notes
Lesson 1: Extended Thinking:
# Extended thinking allows Claude to reason before answering
# Adds "thinking" content blocks to the response
# Use for: math, logic puzzles, multi-step planning, code analysis
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 16000, # Must be > budget_tokens
"thinking": {
"type": "enabled",
"budget_tokens": 10000 # Max tokens for reasoning (minimum 1024, must be below max_tokens)
},
"messages": [{"role": "user", "content": "Solve: A train leaves at 60mph..."}]
}
response = json.loads(client.invoke_model(
# Extended thinking requires Claude 3.7 Sonnet or later; Claude 3.5 Sonnet does not support it
modelId='us.anthropic.claude-3-7-sonnet-20250219-v1:0',
contentType='application/json',
accept='application/json',
body=json.dumps(body)
)['body'].read())
for block in response['content']:
if block['type'] == 'thinking':
print("REASONING:", block['thinking']) # Claude's internal reasoning
elif block['type'] == 'text':
print("ANSWER:", block['text']) # Final answer
Extended Thinking Rules:
- max_tokens must exceed budget_tokens (max_tokens = budget_tokens + expected_answer_tokens)
- Temperature is fixed at 1 when thinking is enabled (cannot customize)
- Cannot use top_p or top_k with thinking
- Streaming works with thinking: you get thinking deltas in the stream
- Model decides how much of the budget to actually use
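These rules can be checked locally before a request is sent. The sketch below encodes the rules as this guide states them; `check_thinking_request` is a hypothetical helper, not part of boto3, and the service remains the authority on what it accepts.

```python
def check_thinking_request(body):
    """Return a list of rule violations for an extended-thinking request body."""
    problems = []
    thinking = body.get("thinking")
    if not thinking or thinking.get("type") != "enabled":
        return problems  # the rules below apply only when thinking is enabled
    budget = thinking.get("budget_tokens", 0)
    if budget < 1024:
        problems.append("budget_tokens must be at least 1024")
    if body.get("max_tokens", 0) <= budget:
        problems.append("max_tokens must exceed budget_tokens")
    for param in ("temperature", "top_p", "top_k"):
        if param in body:
            problems.append(f"{param} should not be set when thinking is enabled")
    return problems

ok_body = {"max_tokens": 16000, "thinking": {"type": "enabled", "budget_tokens": 10000}}
bad_body = {"max_tokens": 8000, "temperature": 0.2,
            "thinking": {"type": "enabled", "budget_tokens": 10000}}
ok_problems = check_thinking_request(ok_body)
bad_problems = check_thinking_request(bad_body)
```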
Lesson 2: Vision (Image Input):
import base64
def invoke_with_image(image_path, text_prompt, media_type='image/png'):
with open(image_path, 'rb') as f:
image_data = base64.b64encode(f.read()).decode('utf-8')
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1024,
"messages": [{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "base64",
"media_type": media_type, # image/jpeg, image/png, image/gif, image/webp
"data": image_data
}
},
{
"type": "text",
"text": text_prompt
}
]
}]
}
response = client.invoke_model(
modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
contentType='application/json',
accept='application/json',
body=json.dumps(body)
)
return json.loads(response['body'].read())['content'][0]['text']
# Multiple images in one request
def compare_images(image_path1, image_path2):
def load_image(path):
with open(path, 'rb') as f:
return base64.b64encode(f.read()).decode('utf-8')
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1024,
"messages": [{
"role": "user",
"content": [
{"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": load_image(image_path1)}},
{"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": load_image(image_path2)}},
{"type": "text", "text": "Compare these two images. What are the main differences?"}
]
}]
}
# ... invoke_model ...
Vision Limits on Bedrock:
- Max 5MB per image
- Max 20 images per request
- Supported: JPEG, PNG, GIF, WebP
- Each image consumes input tokens (cost varies by resolution)
- Images are not stored — send fresh each request (no caching of images)
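A pre-flight check can catch violations of these limits before any tokens are spent. This is a minimal sketch that takes the numbers from the list above at face value and assumes 5MB means 5 * 1024 * 1024 bytes; `check_images` is an illustrative helper.

```python
MAX_IMAGE_BYTES = 5 * 1024 * 1024  # assumption: "5MB" interpreted as 5 MiB
MAX_IMAGES_PER_REQUEST = 20
SUPPORTED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}

def check_images(images):
    """images: list of (media_type, raw_bytes) tuples. Returns a list of problems."""
    problems = []
    if len(images) > MAX_IMAGES_PER_REQUEST:
        problems.append(f"too many images: {len(images)} > {MAX_IMAGES_PER_REQUEST}")
    for i, (media_type, data) in enumerate(images):
        if media_type not in SUPPORTED_MEDIA_TYPES:
            problems.append(f"image {i}: unsupported media type {media_type}")
        if len(data) > MAX_IMAGE_BYTES:
            problems.append(f"image {i}: {len(data)} bytes exceeds the size limit")
    return problems

small_png_problems = check_images([("image/png", b"\x89PNG" + b"\x00" * 100)])
bmp_problems = check_images([("image/bmp", b"BM")])
```

The size check runs on the raw bytes here; base64 encoding makes the payload roughly a third larger, which matters if the limit is applied to the encoded form.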
Lesson 3: PDF Support:
import base64
def invoke_with_pdf(pdf_path, question):
with open(pdf_path, 'rb') as f:
pdf_data = base64.b64encode(f.read()).decode('utf-8')
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 2048,
"messages": [{
"role": "user",
"content": [
{
"type": "document",
"source": {
"type": "base64",
"media_type": "application/pdf",
"data": pdf_data
}
},
{
"type": "text",
"text": question
}
]
}]
}
response = client.invoke_model(
modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
contentType='application/json',
accept='application/json',
body=json.dumps(body)
)
return json.loads(response['body'].read())['content'][0]['text']
answer = invoke_with_pdf('quarterly_report.pdf', 'What was the total revenue in Q3?')
Lesson 4: Citations:
# Citations make Claude reference the documents it used in its answer
# Requires documents to be passed as document blocks (not plain text in content)
def rag_with_citations(question, documents):
content = []
# Add each document as a document block
for doc in documents:
content.append({
"type": "document",
"source": {
"type": "text",
"media_type": "text/plain",
"data": doc['text']
},
"title": doc.get('title', 'Document'),
"context": doc.get('source', ''), # Optional source context
"citations": {"enabled": True}
})
content.append({"type": "text", "text": question})
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 2048,
"messages": [{"role": "user", "content": content}]
}
response = client.invoke_model(
modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
contentType='application/json',
accept='application/json',
body=json.dumps(body)
)
result = json.loads(response['body'].read())
# Parse citations from response
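for block in result['content']:
if block['type'] == 'text':
print("Answer:", block['text'])
# Cited text blocks carry a 'citations' list that points back to the source documents
for citation in block.get('citations') or []:
print(" Cited:", citation.get('document_title'), '|', citation.get('cited_text'))
return result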
Lesson 5 & 6: Prompt Caching:
# Prompt caching saves cost for repeated requests with identical prefix
# Cache TTL: 5 minutes (ephemeral) — must resend within 5 min to get cache hit
# Cache write: +25% cost on first write
# Cache read: -90% cost (only 10% of normal input price)
# CACHE THE SYSTEM PROMPT (most common use case)
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1024,
"system": [
{
"type": "text",
"text": "<long static instructions, documents, examples...>",
"cache_control": {"type": "ephemeral"} # Cache this prefix
}
],
"messages": [{"role": "user", "content": "Question 1"}]
}
# CACHE MID-CONVERSATION (large document in first user message)
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1024,
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "<very long document to analyze>",
"cache_control": {"type": "ephemeral"} # Cache doc
},
{
"type": "text",
"text": "What is the main theme?" # Dynamic question
}
]
}
]
}
# Check cache usage in response
result = json.loads(response['body'].read())
usage = result['usage']
# usage keys: input_tokens, output_tokens,
# cache_creation_input_tokens (tokens written to cache)
# cache_read_input_tokens (tokens read from cache)
Cache Placement Rules:
System prompt cache → put cache_control on last system text block
Message cache → put cache_control on the last static content block in messages
Multiple cache points → up to 4 simultaneous cache breakpoints
Cache key = everything BEFORE the cache_control marker (must be identical to hit)
When to Cache:
- Large static system prompts (>1K tokens) reused across requests
- Long documents analyzed multiple times (PDF, transcript, codebase)
- Few-shot example banks shared across thousands of requests
- Tool definitions that rarely change
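The pricing figures quoted earlier (a 25% surcharge on the first write, reads at 10% of the normal input price) imply that caching pays off from the second request onward. The following is a small arithmetic sketch, assuming every request after the first arrives within the TTL and hits the cache; prices are relative units, not real rates.

```python
def cached_prefix_cost(prefix_tokens, n_requests, price_per_token=1.0,
                       write_multiplier=1.25, read_multiplier=0.10):
    """Input cost of the shared prefix across n requests: (uncached, cached)."""
    if n_requests <= 0:
        return 0.0, 0.0
    base = prefix_tokens * price_per_token
    uncached = base * n_requests
    # One cache write, then (n - 1) cache reads
    cached = base * write_multiplier + base * read_multiplier * (n_requests - 1)
    return uncached, cached

uncached, cached = cached_prefix_cost(prefix_tokens=10_000, n_requests=10)
savings = 1 - cached / uncached  # about 0.785 for ten requests
single_uncached, single_cached = cached_prefix_cost(10_000, 1)
```

With a single request the cached variant costs more (12,500 versus 10,000 units), which is why caching a prefix that is never reused is a net loss.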
Best Practices
- Use extended thinking only for complex reasoning tasks — it adds latency and cost
- Cache static content that is reused across multiple requests in the same 5-minute window
- Resize images before sending — each image costs tokens based on resolution (smaller = cheaper)
- Enable citations when building RAG apps where source attribution matters for trust
- Test prompt caching by checking cache_read_input_tokens in response usage
- PDFs count all pages toward context window; split very large PDFs for long documents
MODULE 8: Model Context Protocol (MCP)
Lesson Coverage
- MCP introduction and motivation
- MCP client setup
- Project configuration (claude_desktop_config.json)
- Tool definition in MCP servers
- Server inspection with MCP Inspector
- Client implementation (connecting to MCP server)
- MCP Resources (data exposure)
- MCP Prompts (templated interactions)
- Review and patterns
Key Notes
MCP Overview:
Problem: Every app needs to build its own tool integrations
Solution: MCP standardizes how AI apps connect to external tools/data
MCP Architecture:
┌──────────────┐ MCP Protocol ┌──────────────┐ External APIs
│ MCP Client │◄──────────────────►│ MCP Server │◄──────────────►
│ (Claude app) │ stdio or HTTP │ (your code) │ Databases
└──────────────┘ └──────────────┘ File system
Three MCP Primitives:
| Primitive | Purpose | Analogous To |
|---|---|---|
| Tools | Functions Claude can call | Tool use / function calling |
| Resources | Data/files Claude can read | Context injection |
| Prompts | Reusable prompt templates | Prompt library |
Lesson 4: Building an MCP Server with FastMCP:
# pip install mcp (the FastMCP class imported below ships inside the mcp package)
from mcp.server.fastmcp import FastMCP
import json
import requests
mcp = FastMCP("bedrock-tools-server")
# --- TOOLS ---
@mcp.tool()
def get_weather(city: str, unit: str = "celsius") -> str:
"""Get current weather for a city.
Args:
city: City name (e.g. 'New York', 'London')
unit: Temperature unit — 'celsius' or 'fahrenheit'
"""
# In real code, call a weather API
return json.dumps({
"city": city,
"temperature": 22,
"unit": unit,
"condition": "partly cloudy"
})
@mcp.tool()
def search_docs(query: str, limit: int = 5) -> str:
"""Search internal documentation.
Args:
query: Search terms
limit: Maximum results to return (1-20)
"""
# Connect to your actual search backend
results = [{"title": f"Result {i}", "snippet": "..."} for i in range(limit)]
return json.dumps(results)
@mcp.tool()
def run_sql(query: str) -> str:
"""Execute a read-only SQL query on the analytics database.
Args:
query: SQL SELECT statement (no mutations allowed)
"""
if not query.strip().upper().startswith("SELECT"):
return json.dumps({"error": "Only SELECT queries are allowed"})
# Execute query...
return json.dumps({"columns": ["col1"], "rows": [[1, 2, 3]]})
# --- RESOURCES ---
@mcp.resource("config://app-settings")
def get_app_settings() -> str:
"""Current application configuration settings."""
return json.dumps({
"version": "2.1.0",
"environment": "production",
"features": {"rag": True, "caching": True}
})
@mcp.resource("docs://readme")
def get_readme() -> str:
"""Project README file contents."""
with open("README.md") as f:
return f.read()
# --- PROMPTS ---
@mcp.prompt()
def analyze_code(language: str, code: str) -> str:
"""Generate a code analysis prompt."""
return f"""Analyze this {language} code for:
1. Bugs and potential errors
2. Security vulnerabilities
3. Performance issues
4. Code style and readability
Code:
```{language}
{code}
```
Provide specific, actionable feedback."""
if __name__ == "__main__":
    mcp.run() # Starts stdio server by default
Lesson 2 & 3: Client Setup and Configuration:
// claude_desktop_config.json — for Claude Desktop
{
"mcpServers": {
"bedrock-tools": {
"command": "python",
"args": ["/path/to/my-mcp-server.py"],
"env": {
"AWS_DEFAULT_REGION": "us-east-1"
}
},
"filesystem": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/allowed/path"]
}
}
}
Transport Types:
| Transport | How | Use Case |
|---|---|---|
| stdio | Child process stdin/stdout | Local servers (most common) |
| HTTP/SSE | Server-Sent Events over HTTP | Remote servers, shared infrastructure |
| WebSocket | Bidirectional WS | Custom transport; not one of the standard transports in the MCP spec |
Lesson 5: MCP Inspector:
# Install and run MCP Inspector
npx @modelcontextprotocol/inspector python my-server.py
# Inspector shows:
# - All tools with their schemas
# - All resources with their URIs
# - All prompts with their templates
# - Interactive tool call tester
# - Server logs in real time
Lesson 6: Custom MCP Client:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
async def run_mcp_client():
"""Connect to an MCP server and call its tools."""
server_params = StdioServerParameters(
command="python",
args=["my-mcp-server.py"]
)
async with stdio_client(server_params) as (read, write):
async with ClientSession(read, write) as session:
# Initialize connection
await session.initialize()
# List available tools
tools_response = await session.list_tools()
print("Available tools:")
for tool in tools_response.tools:
print(f" - {tool.name}: {tool.description}")
# List resources
resources_response = await session.list_resources()
for resource in resources_response.resources:
print(f" Resource: {resource.uri}")
# Call a tool
result = await session.call_tool("get_weather", {"city": "Seattle"})
print("Weather result:", result.content[0].text)
# Read a resource
resource_content = await session.read_resource("config://app-settings")
print("Config:", resource_content.contents[0].text)
asyncio.run(run_mcp_client())
Integrating MCP with Bedrock Claude:
# Pattern: Get tool schemas from MCP, pass to Bedrock, route calls back to MCP
async def bedrock_with_mcp(user_message, mcp_session):
"""Use MCP tools with Claude on Bedrock."""
# Get tools from MCP server
tools_list = await mcp_session.list_tools()
# Convert MCP tool schemas to Bedrock tool format
bedrock_tools = [
{
"name": tool.name,
"description": tool.description,
"input_schema": tool.inputSchema
}
for tool in tools_list.tools
]
messages = [{"role": "user", "content": user_message}]
while True:
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 4096,
"tools": bedrock_tools,
"messages": messages
}
response = json.loads(client.invoke_model(
modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
contentType='application/json',
accept='application/json',
body=json.dumps(body)
)['body'].read())
messages.append({"role": "assistant", "content": response['content']})
if response['stop_reason'] != 'tool_use':
# end_turn, max_tokens, etc.: return whatever text was produced instead of looping again
return "".join(b['text'] for b in response['content'] if b['type'] == 'text')
tool_results = []
for block in response['content']:
if block['type'] == 'tool_use':
# Route call through MCP
mcp_result = await mcp_session.call_tool(block['name'], block['input'])
tool_results.append({
"type": "tool_result",
"tool_use_id": block['id'],
"content": mcp_result.content[0].text
})
messages.append({"role": "user", "content": tool_results})
Best Practices
- Name tools and resources clearly — Claude uses names and descriptions to choose them
- Validate all inputs in tool functions — Claude can send unexpected values
- Return structured error JSON (don’t raise unhandled exceptions) so Claude can adapt
- Use MCP for shared tool servers reused across multiple Claude applications
- Test with MCP Inspector before connecting to any Claude application
- Use stdio transport for local development, HTTP/SSE for production shared servers
- Keep tool descriptions under 200 characters — concise beats verbose
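The advice to return structured errors instead of raising can be applied once with a decorator. This is a minimal sketch; `safe_tool` is a hypothetical helper, and in a server like the one above it would presumably sit directly beneath `@mcp.tool()` so that the wrapped function is what gets registered.

```python
import functools
import json

def safe_tool(fn):
    """Wrap a tool function so failures come back as JSON instead of exceptions."""
    @functools.wraps(fn)  # keeps the name and docstring that tool listings rely on
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:  # broad on purpose: the caller is a model, not a developer
            return json.dumps({"error": type(exc).__name__, "message": str(exc)})
    return wrapper

@safe_tool
def divide(a: float, b: float) -> str:
    """Divide a by b."""
    return json.dumps({"result": a / b})

ok_result = divide(6, 3)
error_result = divide(1, 0)
```

The error payload names the exception type and message, which gives the model enough information to retry with different arguments.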
MODULE 9: Agents
Lesson Coverage
- Agent overview and design patterns
- Claude Code setup and actions
- MCP enhancements for agents
- Parallelization patterns
- Automated debugging agents
- Computer Use
- Agent quality principles
Key Notes
What Makes an Agent:
An agent = LLM + tools + memory + orchestration loop
↓
(runs tool → gets result → decides next action → repeats)
Lesson 1: Agent Patterns:
1. PARALLELIZATION
Orchestrator ─→ [Task A] ─┐
─→ [Task B] ─┤→ Aggregate → Final Answer
─→ [Task C] ─┘
2. SEQUENTIAL CHAINING
Input → [Agent 1: Draft] → [Agent 2: Review] → [Agent 3: Refine] → Output
3. ROUTING
Input → [Router: Classify] → BILLING → [Billing Specialist]
→ TECH → [Tech Support Specialist]
→ RETURNS → [Returns Specialist]
4. ORCHESTRATOR-SUBAGENT
[Orchestrator Agent]
├→ [Research Subagent]
├→ [Writing Subagent]
└→ [Fact-Check Subagent]
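Pattern 2 (sequential chaining) is the simplest to express in code: each step's output becomes the next step's input. The sketch below injects the model call as a function so the chain can be exercised offline; `run_chain` and `fake_model` are illustrative names, and a real run would pass a function that calls Bedrock with a system prompt and a user message.

```python
def run_chain(steps, initial_input, call_model):
    """Pass text through each (name, system_prompt) step in order.

    call_model(system, user_message) -> str is injected so the chain can be
    tested without network access.
    """
    text = initial_input
    trace = []
    for name, system in steps:
        text = call_model(system, text)
        trace.append((name, text))  # keep intermediate outputs for debugging
    return text, trace

def fake_model(system, user_message):
    # Stand-in for a real model call: tags the text with the step's role
    return f"[{system}] {user_message}"

steps = [("draft", "Drafter"), ("review", "Reviewer"), ("refine", "Refiner")]
final_text, trace = run_chain(steps, "topic: caching", fake_model)
```

Keeping the trace of intermediate outputs makes it easier to see which step introduced a problem when the final result is poor.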
Lesson 4: Parallelization Pattern:
import concurrent.futures
import boto3, json

client = boto3.client('bedrock-runtime', region_name='us-east-1')
MODEL_ID = 'anthropic.claude-3-5-sonnet-20241022-v2:0'

def run_claude(system, user_message, max_tokens=2048):
    """Single Claude call."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "system": system,
        "messages": [{"role": "user", "content": user_message}]
    }
    response = client.invoke_model(
        modelId=MODEL_ID,
        contentType='application/json',
        accept='application/json',
        body=json.dumps(body)
    )
    return json.loads(response['body'].read())['content'][0]['text']

def parallel_research(topic, aspects):
    """Research multiple aspects of a topic in parallel."""
    def research_aspect(aspect):
        system = f"You are a research expert. Focus only on: {aspect}"
        return {
            "aspect": aspect,
            "findings": run_claude(system, f"Research: {topic} - specifically: {aspect}")
        }
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(aspects)) as executor:
        futures = [executor.submit(research_aspect, a) for a in aspects]
        results = [f.result() for f in futures]
    # Synthesize in orchestrator
    synthesis_input = "\n\n".join([
        f"## {r['aspect']}\n{r['findings']}" for r in results
    ])
    return run_claude(
        "You synthesize research into coherent summaries.",
        f"Synthesize these research findings about {topic}:\n\n{synthesis_input}"
    )

# Usage
summary = parallel_research("quantum computing", [
    "current hardware state",
    "software ecosystems",
    "commercial applications",
    "timeline predictions"
])
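In the example above, one failed subtask makes `f.result()` raise and discards the other results. A possible variation, sketched here with a hypothetical `run_parallel` helper and a stub worker in place of a Claude call, records each failure as data so the orchestrator can still synthesize whatever succeeded.

```python
import concurrent.futures

def run_parallel(tasks, worker, max_workers=4):
    """Run worker(task) for each task; return results in input order.

    A failed task yields {"task": ..., "error": "..."} instead of raising.
    """
    results = [None] * len(tasks)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_index = {pool.submit(worker, t): i for i, t in enumerate(tasks)}
        for future in concurrent.futures.as_completed(future_to_index):
            i = future_to_index[future]
            try:
                results[i] = {"task": tasks[i], "result": future.result()}
            except Exception as exc:  # keep the other results
                results[i] = {"task": tasks[i], "error": str(exc)}
    return results

# Stub worker standing in for a Claude call.
def flaky_worker(task):
    if task == "bad":
        raise ValueError("simulated failure")
    return task.upper()

outcome = run_parallel(["a", "bad", "c"], flaky_worker)
```

The fixed `max_workers` cap is a design choice in this sketch: it keeps a long task list from opening one thread per task and running into throttling.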
Lesson 3: Routing Pattern:
ROUTE_SYSTEM_PROMPT = """Classify the customer request into exactly one category:
- BILLING: Payment, invoice, subscription, refund questions
- TECHNICAL: Bug reports, errors, how-to questions
- RETURNS: Return, exchange, or refund of physical products
- GENERAL: Everything else
Respond with ONLY the category name."""

SPECIALIST_SYSTEMS = {
    "BILLING": "You are a billing specialist. Help customers with payment and subscription issues.",
    "TECHNICAL": "You are a technical support engineer. Diagnose and resolve software issues.",
    "RETURNS": "You are a returns specialist. Help customers with returns and exchanges.",
    "GENERAL": "You are a helpful customer service agent."
}

def route_and_respond(customer_message):
    # Step 1: Route
    route = run_claude(ROUTE_SYSTEM_PROMPT, customer_message, max_tokens=20).strip()
    # Step 2: Respond with specialist
    specialist_system = SPECIALIST_SYSTEMS.get(route, SPECIALIST_SYSTEMS["GENERAL"])
    response = run_claude(specialist_system, customer_message)
    return {"route": route, "response": response}
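The router's reply is free text, so a reply such as `billing.` would miss the dictionary lookup and fall through to GENERAL. One way to tighten this, sketched with a hypothetical `normalize_route` helper, is to normalize the reply before looking it up.

```python
VALID_ROUTES = {"BILLING", "TECHNICAL", "RETURNS", "GENERAL"}

def normalize_route(raw, default="GENERAL"):
    """Map raw classifier text to a known route, else the default."""
    if not raw:
        return default
    words = raw.strip().split()
    if not words:
        return default
    # Keep the first word, drop surrounding punctuation, ignore case.
    token = words[0].strip(".,:;!\"'`*").upper()
    return token if token in VALID_ROUTES else default
```

The value returned here is always a key of the specialist table, so it can also be logged as the route that was actually taken.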
Lesson 2: Claude Code Setup:
# Install Claude Code globally
npm install -g @anthropic-ai/claude-code
# Configure to use Bedrock (instead of direct Anthropic API)
export ANTHROPIC_API_KEY="" # Not needed for Bedrock
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1
# Or configure in ~/.claude/settings.json for persistent Bedrock use
# Launch Claude Code
claude
Bedrock Agents (Managed Service):
import boto3

bedrock_agent = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

def invoke_bedrock_agent(agent_id, agent_alias_id, session_id, user_message):
    """Invoke a fully managed Bedrock Agent."""
    response = bedrock_agent.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=session_id,  # Maintain session for multi-turn
        inputText=user_message,
        enableTrace=True  # Get orchestration trace for debugging
    )
    # Response is a streaming body
    full_response = ""
    traces = []
    for event in response['completion']:
        if 'chunk' in event:
            chunk = event['chunk']
            if 'bytes' in chunk:
                full_response += chunk['bytes'].decode('utf-8')
        elif 'trace' in event:
            traces.append(event['trace'])
    return {"response": full_response, "traces": traces}
Lesson 5: Automated Debugging Agent:
# Reuses json, client, and MODEL_ID from the parallelization example.
# dispatch_debug_tool is not defined in this guide; it must map a tool name
# and input to a JSON-serializable result.
def debugging_agent(error_message, codebase_context, max_iterations=5):
    """Agent that iteratively diagnoses and proposes fixes."""
    tools = [
        {
            "name": "search_codebase",
            "description": "Search codebase for relevant code",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "file_pattern": {"type": "string", "description": "Optional glob pattern"}
                },
                "required": ["query"]
            }
        },
        {
            "name": "read_file",
            "description": "Read contents of a specific file",
            "input_schema": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"]
            }
        },
        {
            "name": "run_test",
            "description": "Run specific test to verify fix",
            "input_schema": {
                "type": "object",
                "properties": {"test_name": {"type": "string"}},
                "required": ["test_name"]
            }
        }
    ]
    system = """You are a debugging expert. Analyze errors systematically:
1. Understand the error message
2. Search for relevant code
3. Form a hypothesis
4. Verify with tests
5. Propose a concrete fix"""
    messages = [{
        "role": "user",
        "content": f"Error: {error_message}\n\nContext: {codebase_context}"
    }]
    iteration = 0
    while iteration < max_iterations:
        iteration += 1
        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 4096,
            "system": system,
            "tools": tools,
            "messages": messages
        }
        response = json.loads(client.invoke_model(
            modelId=MODEL_ID, contentType='application/json',
            accept='application/json', body=json.dumps(body)
        )['body'].read())
        messages.append({"role": "assistant", "content": response['content']})
        # Stop whenever Claude is not asking for a tool (end_turn, max_tokens, ...);
        # otherwise an empty tool_results list would be sent as the next user message.
        if response['stop_reason'] != 'tool_use':
            return "".join(b['text'] for b in response['content'] if b['type'] == 'text')
        tool_results = []
        for block in response['content']:
            if block['type'] == 'tool_use':
                result = dispatch_debug_tool(block['name'], block['input'])
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block['id'],
                    "content": json.dumps(result)
                })
        messages.append({"role": "user", "content": tool_results})
    return "Debugging agent reached max iterations"
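The loop above calls `dispatch_debug_tool`, which the guide does not define. The following is one possible sketch of it, not the course's implementation. `PROJECT_ROOT`, the three handler bodies, the 50-match cap, and the use of pytest as the test runner are all assumptions.

```python
import pathlib
import subprocess
import sys

PROJECT_ROOT = pathlib.Path(".").resolve()  # assumption: agent is confined here

def _safe_path(path):
    """Resolve path and refuse anything outside PROJECT_ROOT."""
    resolved = (PROJECT_ROOT / path).resolve()
    if resolved != PROJECT_ROOT and PROJECT_ROOT not in resolved.parents:
        raise ValueError(f"path escapes project root: {path}")
    return resolved

def search_codebase(query, file_pattern="*.py"):
    """Plain substring search; returns at most 50 matches."""
    hits = []
    for file in PROJECT_ROOT.rglob(file_pattern):
        if not file.is_file():
            continue
        try:
            lines = file.read_text(errors="ignore").splitlines()
        except OSError:
            continue
        for lineno, line in enumerate(lines, 1):
            if query in line:
                hits.append({"file": str(file.relative_to(PROJECT_ROOT)),
                             "line": lineno, "text": line.strip()})
    return {"matches": hits[:50]}

def read_file(path):
    return {"content": _safe_path(path).read_text()}

def run_test(test_name):
    # Assumes pytest is installed in the current interpreter's environment.
    proc = subprocess.run([sys.executable, "-m", "pytest", "-q", test_name],
                          capture_output=True, text=True, timeout=120)
    return {"passed": proc.returncode == 0, "output": proc.stdout[-2000:]}

HANDLERS = {"search_codebase": search_codebase,
            "read_file": read_file,
            "run_test": run_test}

def dispatch_debug_tool(name, tool_input):
    """Run a tool by name; report failures as data so the agent can adapt."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return handler(**tool_input)
    except Exception as exc:
        return {"error": f"{type(exc).__name__}: {exc}"}
```

Returning an `error` key instead of raising keeps the agent loop alive and lets Claude pick a different file or query on the next iteration.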
Lesson 6: Computer Use:
# Computer Use tools for desktop automation
# Available on claude-3-5-sonnet via Bedrock (beta feature)
# ALWAYS use in sandboxed/VM environment, never on production machines
computer_use_tools = [
    {
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1920,
        "display_height_px": 1080,
        "display_number": 1
    },
    {
        "type": "text_editor_20241022",
        "name": "str_replace_editor"
    },
    {
        "type": "bash_20241022",
        "name": "bash"
    }
]
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "anthropic_beta": ["computer-use-2024-10-22"],  # Beta flag required for these tool types
    "max_tokens": 4096,
    "tools": computer_use_tools,
    "messages": [{
        "role": "user",
        "content": "Take a screenshot and describe what's on the screen."
    }]
}
# Computer Use responses include tool_use blocks whose input names an action, like:
# {"action": "screenshot"}
# {"action": "mouse_move", "coordinate": [x, y]}
# {"action": "left_click", "coordinate": [x, y]}
# {"action": "type", "text": "Hello World"}
# {"action": "key", "text": "ctrl+c"}
Lesson 7: Agent Quality Principles:
Good Agent Design:
✓ Bounded loops — always set max_iterations
✓ Audit logging — log every tool call and result
✓ Graceful error handling — tools return errors, agent adapts
✓ Human-in-the-loop — pause for approval before destructive actions
✓ Idempotency — tools should be safe to retry
✓ Minimal permissions — only grant access the agent actually needs
✓ Monitoring — alert on loops, failures, and unusual tool call patterns
Bad Agent Design:
✗ No iteration limit — can run forever on bad state
✗ No logging — can't debug what happened
✗ Silent failures — agent assumes success and continues incorrectly
✗ Unrestricted access — agent can delete/modify anything
✗ Trusting model output blindly — validate inputs before execution
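Three of the principles above (audit logging, graceful error handling, and human-in-the-loop approval) can be combined in a single wrapper around tool execution. This is a minimal sketch. The `guarded_call` helper, the tool names in `DESTRUCTIVE_TOOLS`, and the shape of the `approve` callback are hypothetical.

```python
import json
import logging

logger = logging.getLogger("agent.audit")

DESTRUCTIVE_TOOLS = {"delete_file", "issue_refund"}  # hypothetical names

def guarded_call(name, tool_input, handlers, approve):
    """Run one tool call with audit logging and an approval gate.

    handlers: dict of tool name -> callable(**tool_input)
    approve:  callable(name, tool_input) -> bool, e.g. a human prompt
    """
    logger.info("tool_call name=%s input=%s", name, json.dumps(tool_input, default=str))
    if name not in handlers:
        result = {"error": f"unknown tool: {name}"}
    elif name in DESTRUCTIVE_TOOLS and not approve(name, tool_input):
        result = {"error": "rejected by human reviewer"}
    else:
        try:
            result = {"ok": handlers[name](**tool_input)}
        except Exception as exc:  # surface the failure to the model
            result = {"error": f"{type(exc).__name__}: {exc}"}
    logger.info("tool_result name=%s result=%s", name, json.dumps(result, default=str))
    return result
```

In an agent loop, the returned dictionary would be serialized into the `tool_result` content, so a rejection or failure reaches Claude as information rather than ending the run.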
Best Practices
- Keep agent loops bounded — always set max iterations to prevent runaway behavior
- Log every tool call and result for debugging and audit trails
- Use Bedrock Agents for production managed orchestration with minimal code
- Build custom loops only when you need full control over orchestration logic
- Restrict Computer Use to isolated, sandboxed VMs — never use on production systems
- Design tools to be idempotent where possible — safe to retry
- Implement human-in-the-loop checkpoints for high-stakes actions (deletions, payments)
- Use parallelization for independent subtasks to reduce total wall-clock time
- Route to specialist agents/prompts for domain-specific quality improvement
MODULE 10: Final Assessment
Key Areas to Review
Must Know for Exam:
- Bedrock client: boto3.client('bedrock-runtime', region_name='us-east-1')
- Required field: "anthropic_version": "bedrock-2023-05-31" in every request body
- Invoke methods: invoke_model() (sync) and invoke_model_with_response_stream() (streaming)
- Response parsing: response['body'].read() → json.loads() → ['content'][0]['text']
- Tool use message flow: tool_use block → execute → tool_result in user message
- Tool result goes in a USER message, NOT an assistant message
- Prompt caching: "cache_control": {"type": "ephemeral"} on content blocks
- Cache TTL: 5 minutes
- Extended thinking: "thinking": {"type": "enabled", "budget_tokens": N}
- Cross-region IDs: prefix us. or eu. before anthropic.claude-...
- IAM permissions: bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream
Common Exam Traps
| Trap | Correct Answer |
|---|---|
| Forgetting anthropic_version | Always include "bedrock-2023-05-31" |
| Using raw model name in modelId | Must include full ID with version suffix: anthropic.claude-3-5-sonnet-20241022-v2:0 |
| Assuming cross-region is automatic | Must explicitly use prefixed profile ID: us.anthropic.claude-... |
| Reading response body twice | StreamingBody can only be read once — call .read() once, store result |
| Putting tool_result in assistant message | Tool results go in role: "user" messages |
| Thinking cache_control is permanent | Cache TTL = 5 minutes (ephemeral only) |
| Setting temperature with extended thinking | Temperature is fixed at 1 when thinking is enabled |
| Assuming max_tokens = actual output length | It’s a ceiling, not a target |
| Using anthropic SDK with Bedrock | Course exercises use boto3, not the Anthropic Python library |
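The "reading response body twice" trap in the table above is easy to reproduce offline. In this sketch an `io.BytesIO` object stands in for the `StreamingBody` returned in `response['body']`. That substitution is an assumption made for illustration, since `BytesIO` can be rewound and the real stream cannot.

```python
import io
import json

# io.BytesIO stands in for the StreamingBody in response['body'].
fake_body = io.BytesIO(
    json.dumps({"content": [{"type": "text", "text": "hi"}]}).encode()
)

raw = fake_body.read()     # first read consumes the stream
second = fake_body.read()  # nothing left to read

result = json.loads(raw)   # parse the stored bytes as often as needed
text = result["content"][0]["text"]
```

Storing the bytes once and parsing the stored copy avoids a JSON decode error on an empty second read.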
Quick Reference: Request Template
import boto3, json

client = boto3.client('bedrock-runtime', region_name='us-east-1')
body = {
    "anthropic_version": "bedrock-2023-05-31",  # Required
    "max_tokens": 1024,                         # Required
    "system": "...",                            # Optional
    "messages": [
        {"role": "user", "content": "..."}
    ],
    # Optional parameters:
    "temperature": 0.7,
    "top_p": 0.9,
    "stop_sequences": ["###"],
    # "tools": [...],                         # List of tool definitions
    # "tool_choice": {"type": "auto"},        # Only together with "tools"
    # "thinking": {"type": "enabled", "budget_tokens": 5000},
    #   With thinking: remove "temperature", set max_tokens above budget_tokens,
    #   and use a model that supports extended thinking.
}
# For streaming, call invoke_model_with_response_stream() with the same body.
response = client.invoke_model(
    modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
    contentType='application/json',
    accept='application/json',
    body=json.dumps(body)
)
result = json.loads(response['body'].read())
text = result['content'][0]['text']  # First block is text unless thinking or tool use is active
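The streaming counterpart of the template returns an event stream instead of a single body. The sketch below shows one way to pull text out of it. `iter_text` and `_fake_event` are hypothetical helpers, and the fake events only imitate the shape assumed here for `invoke_model_with_response_stream()` output: each event carries `chunk.bytes` holding one JSON-encoded streaming event.

```python
import json

def iter_text(events):
    """Yield text fragments from Bedrock streaming events."""
    for event in events:
        chunk = event.get("chunk")
        if not chunk:
            continue
        payload = json.loads(chunk["bytes"])
        if payload.get("type") == "content_block_delta":
            delta = payload.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta["text"]

def _fake_event(payload):
    """Build an event shaped like one item of the response stream."""
    return {"chunk": {"bytes": json.dumps(payload).encode()}}

fake_stream = [
    _fake_event({"type": "message_start", "message": {"role": "assistant"}}),
    _fake_event({"type": "content_block_delta", "index": 0,
                 "delta": {"type": "text_delta", "text": "Hello"}}),
    _fake_event({"type": "content_block_delta", "index": 0,
                 "delta": {"type": "text_delta", "text": " world"}}),
    _fake_event({"type": "message_stop"}),
]

streamed_text = "".join(iter_text(fake_stream))
```

With a real call, the same generator would be fed `response['body']` from `client.invoke_model_with_response_stream(...)`.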
Certification Readiness Checklist
MODULE 1 — Course Introduction:
- Know all Claude model IDs available on Bedrock and their use cases
- Understand difference between standard and cross-region inference profile IDs
- Know how to enable model access in Bedrock console
- Know required IAM permissions for Bedrock
MODULE 2 — Working with the API:
- Can create boto3 bedrock-runtime client with proper credentials
- Know request body structure (anthropic_version, max_tokens, messages)
- Can parse invoke_model response (body.read → json.loads → content[0].text)
- Can implement multi-turn conversation history management
- Can implement streaming with invoke_model_with_response_stream
- Know all streaming event types (message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop)
- Know temperature, top_p, top_k, stop_sequences parameters
- Can implement retry logic with exponential backoff
MODULE 3 — Prompt Evaluations:
- Can design evaluation dataset (size, coverage, no leakage)
- Know multiple grading approaches (exact match, regex, JSON schema, model-based)
- Can implement model-based grader using cheap model (Haiku)
- Can run parallel evaluations with ThreadPoolExecutor
- Know key metrics: accuracy, F1, latency, cost per correct answer
MODULE 4 — Prompt Engineering:
- Know XML tag structure for complex prompts
- Can implement few-shot prompting with proper format matching
- Know chain of thought patterns (thinking tags, step-by-step)
- Know temperature guidance by task type
- Can use stop sequences for output extraction
MODULE 5 — Tool Use:
- Can define tools with JSON Schema (properties, types, required, enum)
- Know tool use message flow (request → tool_use → tool_result → response)
- Can implement complete tool loop (handle stop_reason=tool_use)
- Know all tool_choice options (auto, any, tool, none)
- Can handle parallel tool calls (multiple tool_use blocks)
- Know Bedrock batch inference (create_model_invocation_job, S3 I/O)
MODULE 6 — RAG:
- Know chunking strategies (fixed, recursive, overlapping) and tradeoffs
- Can invoke Amazon Titan Embed V2 for embeddings
- Can implement cosine similarity search
- Can implement BM25 keyword search with rank_bm25
- Can implement hybrid search with RRF
- Know reranking with amazon.rerank-v1:0
- Know Bedrock Knowledge Bases API (retrieve_and_generate)
- Know contextual retrieval technique
MODULE 7 — Features of Claude:
- Can enable extended thinking with budget_tokens
- Know thinking block constraints (no temperature control, max_tokens requirement)
- Can send images as base64 content blocks
- Can send PDFs as base64 document blocks
- Can enable citations in document-based requests
- Can implement prompt caching with cache_control
- Know cache TTL (5 min), write cost (+25%), read savings (-90%)
MODULE 8 — MCP:
- Know three MCP primitives (Tools, Resources, Prompts)
- Can build FastMCP server with @mcp.tool(), @mcp.resource(), @mcp.prompt()
- Know stdio vs HTTP transport options
- Can configure claude_desktop_config.json for MCP servers
- Know how to use MCP Inspector for debugging
- Can implement MCP client with ClientSession
MODULE 9 — Agents:
- Know four agent patterns (parallelization, chaining, routing, orchestrator-subagent)
- Can implement parallel subagents with ThreadPoolExecutor
- Can implement routing pattern (classify → specialist)
- Can invoke Bedrock Agents with bedrock-agent-runtime
- Know Computer Use tools and safety requirements
- Know agent quality principles (bounded loops, logging, human-in-the-loop)
MODULE 11: Course Wrap-Up
Summary
Core Difference: Bedrock vs Direct API
| Aspect | Direct Anthropic API | Amazon Bedrock |
|---|---|---|
| Auth | ANTHROPIC_API_KEY | IAM roles/credentials |
| SDK | anthropic Python library | boto3 |
| Client | anthropic.Anthropic() | boto3.client('bedrock-runtime') |
| Model IDs | claude-3-5-sonnet-20241022 | anthropic.claude-3-5-sonnet-20241022-v2:0 |
| Request method | client.messages.create() | client.invoke_model() |
| Body format | Named keyword args | JSON bytes in body param |
| Required extra field | None | "anthropic_version": "bedrock-2023-05-31" |
| Streaming | .stream() context manager | invoke_model_with_response_stream() |
| Billing | Anthropic invoice | AWS invoice |
| Managed RAG | None built-in | Bedrock Knowledge Bases |
| Managed Agents | None built-in | Bedrock Agents + Action Groups |
| Batch inference | Message Batches API | Model Invocation Jobs (S3 I/O) |
| Region control | Global endpoint | Per-region (must enable model access) |
| Cross-region | N/A | Inference profiles (us., eu. prefix) |
| Guardrails | None built-in | Bedrock Guardrails |
| Compliance | Anthropic ToS | AWS compliance (HIPAA-eligible, SOC2) |
Best Practices for Production on AWS:
- Use IAM roles, not static credentials — least privilege, no rotation needed in code
- Enable CloudWatch logging for Bedrock invocations (audit, cost tracking, debugging)
- Use Bedrock Guardrails to filter harmful content at managed layer
- Use cross-region inference profiles in production for higher availability and resilience
- Cache system prompts and static documents to reduce cost on repetitive workloads
- Use Bedrock Knowledge Bases for managed RAG (less ops); custom pipelines for full control
- Monitor token usage with AWS Cost Explorer and set billing alerts
- Pin model IDs (full ID including version) to prevent unexpected upgrades
- Implement exponential backoff for ThrottlingException
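The backoff advice in the last bullet could be implemented along these lines. This is a sketch under stated assumptions: `call_with_backoff` is a hypothetical helper, the delay constants are arbitrary, and the error code is read from `exc.response["Error"]["Code"]`, which is where botocore's `ClientError` stores it.

```python
import random
import time

def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call fn(), retrying throttled calls with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            response = getattr(exc, "response", None)
            code = (response.get("Error", {}).get("Code")
                    if isinstance(response, dict) else None)
            # Re-raise anything that is not throttling, and the final failure.
            if code != "ThrottlingException" or attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads retries
```

A call site would wrap the request in a lambda, for example `call_with_backoff(lambda: client.invoke_model(...))`. The `sleep` parameter exists so tests can run without waiting.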
Quick Comparison: Direct API vs Bedrock vs Vertex AI
| Feature | Anthropic Direct | Amazon Bedrock | Google Vertex AI |
|---|---|---|---|
| Auth | API Key | AWS IAM | Google ADC / Service Account |
| SDK | anthropic | boto3 | anthropic (AnthropicVertex) |
| Client init | Anthropic() | boto3.client('bedrock-runtime') | AnthropicVertex(region=..., project_id=...) |
| Model ID format | claude-3-5-sonnet-20241022 | anthropic.claude-3-5-sonnet-20241022-v2:0 | claude-3-5-sonnet-v2@20241022 |
| Extra required | None | anthropic_version in body | None |
| Streaming | .stream() CM | invoke_model_with_response_stream() | .stream() CM |
| Billing | Anthropic | AWS | Google Cloud |
| Managed RAG | None | Bedrock Knowledge Bases | Vertex AI Search |
| Managed Agents | None | Bedrock Agents | Vertex AI Agent Builder |
| Embeddings native | None | Amazon Titan Embeddings | Vertex AI Embeddings |
| Batch inference | Message Batches API | Model Invocation Jobs (S3) | Batch prediction jobs |
| Guardrails | None built-in | Bedrock Guardrails | Vertex AI safety filters |
| Prompt caching | Yes | Yes | Yes |
| Extended thinking | Yes | Yes | Yes |
| Computer Use | Yes | Yes | Yes |
Study tip: The concepts to have memorized for Bedrock (items 3 and 4 also apply on other platforms) are:
1. boto3 client setup with bedrock-runtime
2. JSON body encoding (not keyword args) with required anthropic_version field
3. Tool result placement in USER messages
4. Prompt cache TTL (5 minutes ephemeral)
5. Cross-region inference profile ID prefixes (us., eu.)
6. Model ID format including version suffix and anthropic. prefix
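Items 5 and 6 are string conventions, which a few lines can make concrete. `inference_profile_id` is a hypothetical helper that covers only the us. and eu. prefixes named in this guide. Whether a profile exists for a given model and geography still has to be checked in Bedrock itself.

```python
def inference_profile_id(model_id, geography="us"):
    """Prefix a Bedrock model ID with a geography to form a profile ID."""
    if geography not in {"us", "eu"}:  # the only prefixes this guide covers
        raise ValueError(f"unsupported geography: {geography}")
    if not model_id.startswith("anthropic."):
        raise ValueError("expected a model ID starting with 'anthropic.'")
    return f"{geography}.{model_id}"

profile = inference_profile_id("anthropic.claude-3-5-sonnet-20241022-v2:0", "us")
```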