Building with the Claude API - Certification Study Guide
Table of Contents
- Building with the Claude API - Certification Study Guide
- MODULE 1: Introduction - Claude Models Overview
- MODULE 2: Accessing Claude with the API
- MODULE 3: Prompt Evaluation
- MODULE 4: Prompt Engineering Techniques
- MODULE 5: Tool Use with Claude
- MODULE 6: RAG and Agentic Search
- MODULE 7: Features of Claude
- MODULE 8: Model Context Protocol (MCP)
- MODULE 9: Anthropic Apps (Claude Code)
- MODULE 10: Agents and Workflows
- MODULE 11: Conclusion & Course Summary
- Claude Model Comparison Table
- Implementation Checklist for Certification
- Deep Dives & Advanced Topics
- Unresolved Questions / Topics for Further Study
Building with the Claude API - Certification Study Guide
Course: Anthropic - Building with the Claude API
Modules: 11 (Introduction through Agents & Workflows)
Target: Certification Prep
MODULE 1: Introduction - Claude Models Overview
Key Notes
- Claude model family: Opus (most capable), Sonnet (balanced), Haiku (fastest/cheapest)
- Model context windows: Haiku 200K, Sonnet 200K, Opus 200K tokens
- Latest stable model: claude-3-5-sonnet-20241022 (or current latest version)
- API version: Messages API is the standard (avoid legacy text completions)
- Rate limits: Vary by plan (free, pro, enterprise); track via response headers
anthropic-ratelimit-* - Pricing model: Input tokens cheaper than output tokens; cached tokens are 90% discount
- Token counting: Use
count_tokensendpoint before production calls - Supported formats: JSON, text, images (PNG, GIF, JPEG, WebP), PDFs, documents via Files API
- Authentication: API key via
ANTHROPIC_API_KEYenv var or header - Base URL:
https://api.anthropic.com/v1 - Timeout defaults: 10s for SDK (configurable)
Best Practices
- Choose model based on task complexity: Haiku for simple tasks, Sonnet for balanced, Opus for reasoning
- Always set explicit
max_tokensto avoid surprise token usage - Use
systemprompt for role/behavior, not in message history - Include version-specific features only when supported by chosen model
- Cache large context (docs, few-shots) to reduce cost
- Stream responses for better UX and token efficiency feedback
- Test with cheaper models (Haiku) before scaling to Opus
Example (Python)
import anthropic
import os
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
# Basic request
message = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
messages=[
{"role": "user", "content": "Hello, Claude!"}
]
)
print(f"Response: {message.content[0].text}")
print(f"Stop reason: {message.stop_reason}")
print(f"Usage: input={message.usage.input_tokens}, output={message.usage.output_tokens}")
MODULE 2: Accessing Claude with the API
Key Notes
API Authentication:
- API key location: https://console.anthropic.com/
- Header format:
x-api-key: <key> - Or:
ANTHROPIC_API_KEY=<key>environment variable - Keys are secret; rotate if exposed
- Per-project keys available in org settings
Request Structure:
- Method:
POST /v1/messages - Content-Type:
application/json - Required fields:
model,max_tokens,messages - Optional:
system,temperature,top_p,tools,tool_choice, etc.
Message Format:
messagesarray withrole(user/assistant) andcontent(string or array of blocks)rolealternates: user → assistant → user- Each turn is stateless; send full conversation history for multi-turn
Multi-Turn Conversation:
- Track message history client-side
- System prompt applies to entire conversation
- Assistant’s previous responses become assistant messages in next request
- Tool results are inserted as user messages with tool_use_id reference
System Prompt:
- Placed in separate
systemparameter, not messages array - Applies to all turns in conversation
- Can be updated per request (creates new conversation context)
- Best for role definition, rules, output format instructions
- Costs same as message tokens but cached efficiently
Temperature & Top-P:
temperature(0.0-1.0): 0=deterministic, 1=random (default 1.0)top_p(0.0-1.0): nucleus sampling, used with temperature- For structured output: temperature=0
- For creative: temperature=0.8-1.0
- Rarely combine with
top_k(deprecated in favor of top_p)
Streaming:
- Set
stream=Truein request - Returns server-sent events (SSE) with delta updates
- Stream event types:
content_block_start,content_block_delta,message_delta,message_stop - Rebuild message by accumulating deltas
- Always consume stream fully before closing connection
Structured Output:
- Use
response_formatparameter withtype: "json_schema"(if model supports) - Define JSON schema in
json_schema.schemaproperty - Model will output valid JSON matching schema
- Useful for function calls, data extraction, parsing
Best Practices
- Always include explicit
max_tokens(don’t rely on defaults) - Send full conversation history for multi-turn (Claude has no built-in memory)
- Use system prompt for all conversations with consistent instructions
- Set
temperature=0for deterministic tasks (classification, extraction) - Stream for interactive applications; collect full message for logging
- Validate JSON schema compliance in client if structured output requested
- Handle rate limiting with exponential backoff
- Set appropriate timeouts for long-running requests
Example (Python)
Basic Request:
message = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
system="You are a helpful assistant.",
messages=[
{"role": "user", "content": "What is 2+2?"}
]
)
print(message.content[0].text)
Multi-Turn Conversation:
system_prompt = "You are a helpful coding assistant."
conversation_history = []
def chat(user_input):
conversation_history.append({"role": "user", "content": user_input})
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=2048,
system=system_prompt,
messages=conversation_history
)
assistant_message = response.content[0].text
conversation_history.append({"role": "assistant", "content": assistant_message})
return assistant_message
# Multiple turns
print(chat("What is Python?"))
print(chat("How do I write a class?"))
print(chat("Can you show me an example?"))
Streaming:
print("Streaming response:")
with client.messages.stream(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
system="You are concise.",
messages=[{"role": "user", "content": "Write a haiku about AI."}]
) as stream:
for text in stream.text_stream:
print(text, end="", flush=True)
print()
Structured Output (JSON):
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
messages=[{"role": "user", "content": "Extract the name and age from: John is 30 years old."}],
response_format={
"type": "json_schema",
"json_schema": {
"name": "person_extractor",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"}
},
"required": ["name", "age"]
},
"strict": True
}
}
)
import json
data = json.loads(response.content[0].text)
print(f"Name: {data['name']}, Age: {data['age']}")
MODULE 3: Prompt Evaluation
Key Notes
Evaluation Workflow:
- Define evaluation task (classification, generation, reasoning)
- Create test dataset (labeled examples)
- Run model on test cases
- Grade outputs (automated or manual)
- Compute metrics (accuracy, F1, similarity, custom)
- Analyze failures and iterate
Test Dataset Design:
- Minimum 20-50 examples for reliable signal
- Include edge cases, ambiguous inputs, common errors
- Label expected outputs for supervised evaluation
- Stratify by category if multi-class
- Version control datasets alongside prompts
Grading Methods:
Code-based grading:
- Exact match (string equality)
- Regex matching (pattern validation)
- JSON schema validation
- Custom Python function (flexible)
- Numeric thresholds
- Function-based scoring (0-1 range)
Model-based grading:
- Use Claude to grade Claude’s outputs (consistent rubric)
- Judges model: Opus 3.5 or Sonnet 3.5
- Rubric: clear criteria, examples, scoring scale
- Less code, captures semantic quality
- Slower/more expensive than code-based but more reliable
Metrics to track:
- Accuracy (% correct)
- F1 score (precision × recall)
- Token efficiency (tokens/task)
- Latency (response time)
- Cost (input + output tokens × pricing)
- User satisfaction (if collecting feedback)
Best Practices
- Start with code-based grading (fast iteration)
- Use model-based grading for subjective tasks (quality, tone, correctness)
- Separate test set from training/validation set
- Run evals on multiple model versions before deployment
- Log all eval runs with timestamps, model, prompt version
- Aim for 95%+ accuracy before production
- Document failure cases and plan improvements
- A/B test prompt changes on held-out test set
Example (Python)
import anthropic
import json
from typing import Literal
client = anthropic.Anthropic()
# Test dataset
test_cases = [
{"input": "Extract the color: The car is red.", "expected_output": "red"},
{"input": "Extract the color: She wore a blue dress.", "expected_output": "blue"},
{"input": "Extract the color: The sky is clear.", "expected_output": "no color mentioned"},
]
# Code-based grading (exact match)
def grade_exact_match(output: str, expected: str) -> bool:
return output.strip().lower() == expected.strip().lower()
# Model-based grading
def grade_with_claude(input_text: str, output: str, expected: str) -> float:
"""Use Claude as a judge: returns score 0-1."""
rubric = f"""
Task: Evaluate if the output correctly answers the input query.
Expected answer: {expected}
Actual output: {output}
Score 1.0 if correct, 0.5 if partially correct, 0.0 if incorrect.
Respond with ONLY the number (e.g., 1.0).
"""
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=10,
system="You are an expert evaluator. Score the response accurately.",
messages=[{"role": "user", "content": rubric}]
)
try:
return float(response.content[0].text.strip())
except:
return 0.0
# Run evaluation
def evaluate_prompt():
results = []
for test in test_cases:
# Get model output
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=100,
system="Extract the color mentioned. If no color, respond 'no color mentioned'.",
messages=[{"role": "user", "content": test["input"]}]
)
output = response.content[0].text
# Grade with both methods
exact_match = grade_exact_match(output, test["expected_output"])
model_score = grade_with_claude(test["input"], output, test["expected_output"])
results.append({
"input": test["input"],
"expected": test["expected_output"],
"actual": output,
"exact_match": exact_match,
"model_score": model_score
})
# Compute metrics
accuracy = sum(1 for r in results if r["exact_match"]) / len(results)
avg_model_score = sum(r["model_score"] for r in results) / len(results)
print(f"Accuracy: {accuracy:.2%}")
print(f"Average Model Score: {avg_model_score:.2f}")
print("\nDetailed Results:")
for r in results:
print(f" Input: {r['input']}")
print(f" Expected: {r['expected']}, Actual: {r['actual']}")
print(f" Match: {r['exact_match']}, Model Score: {r['model_score']}\n")
evaluate_prompt()
MODULE 4: Prompt Engineering Techniques
Key Notes
Core Principles:
Clarity: Be specific about task, not vague
- Bad: “Summarize this”
- Good: “Summarize in 3 bullet points, focusing on methodology”
Specificity: Include constraints, format, examples
- Output format (JSON, bullet points, code blocks)
- Length (words, paragraphs, tokens)
- Tone (formal, casual, technical)
- Edge cases (“If N/A, respond ’not provided’”)
XML Tags: Structure complex prompts
<task>,<context>,<rules>,<output_format>- Makes parsing easier, prevents confusion
- Claude particularly responsive to well-structured XML
Examples (Few-Shot): Dramatically improve performance
- 2-5 examples usually sufficient
- Show input-output pairs for task
- Include edge cases in examples
- More effective than long descriptions
Chain of Thought: Encourage step-by-step reasoning
- “Think step-by-step before answering”
- Improves accuracy on reasoning tasks
- Increases token usage but better results
Iterative Refinement: Test, measure, improve
- Evaluate on test set
- Identify failure patterns
- Adjust prompt, re-evaluate
- Version prompts alongside evals
Best Practices
- Separate instructions (system) from data (user messages)
- Use XML tags for multi-part instructions
- Include 2-5 diverse examples for complex tasks
- Ask for step-by-step reasoning on logic/math tasks
- Specify output format explicitly
- For creative tasks, use higher temperature; for accuracy, use temperature=0
- Test variations on same test set to measure impact
- Document what changed and why in prompt versions
Example (Python)
Clarity & Specificity:
# Bad prompt
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=200,
messages=[{"role": "user", "content": "Analyze this text."}]
)
# Good prompt with clarity and specificity
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=200,
messages=[{
"role": "user",
"content": """Analyze the following customer review for sentiment.
Output format: JSON with fields: sentiment (positive/negative/neutral), confidence (0-1), key_phrases (list of strings).
Text: "The product arrived late but works great once I set it up. Would recommend despite shipping issues."
"""
}]
)
XML Structure:
prompt = """
<task>
Extract structured information from a product review.
</task>
<rules>
- Output MUST be valid JSON
- If information is missing, use null
- Sentiment must be one of: positive, negative, neutral
- Confidence is a number 0-1
</rules>
<output_format>
{
"product_name": string,
"sentiment": string,
"confidence": number,
"pros": [string],
"cons": [string],
"rating": number or null
}
</output_format>
<examples>
Input: "Excellent phone! Fast processor, great camera. Battery life is average though."
Output: {
"product_name": null,
"sentiment": "positive",
"confidence": 0.9,
"pros": ["fast processor", "great camera"],
"cons": ["average battery life"],
"rating": null
}
</examples>
<input>
"These shoes are uncomfortable and overpriced. Not worth the hype."
</input>
"""
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=500,
messages=[{"role": "user", "content": prompt}]
)
Few-Shot Examples:
examples = [
{
"input": "The meeting is at 3 PM tomorrow.",
"output": "DATETIME: 3:00 PM tomorrow"
},
{
"input": "I have 5 apples.",
"output": "QUANTITY: 5"
},
{
"input": "The sky is blue.",
"output": "ATTRIBUTE: sky, blue"
}
]
prompt = "Extract the main entity from this sentence:\n\nThe project deadline is next Friday.\n\nUse examples:\n"
for ex in examples:
prompt += f"Input: {ex['input']}\nOutput: {ex['output']}\n\n"
prompt += "Now extract: 'The project deadline is next Friday.'"
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=100,
messages=[{"role": "user", "content": prompt}]
)
Chain of Thought:
prompt = """Solve this math problem step-by-step.
Problem: If a train travels at 60 mph for 2.5 hours, how far does it go?
Before answering, think through:
1. What formula do I need?
2. What values do I have?
3. What's the calculation?
4. What's the final answer?
"""
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=300,
temperature=0, # Deterministic for math
messages=[{"role": "user", "content": prompt}]
)
MODULE 5: Tool Use with Claude
Key Notes
Tool Use Overview:
- Claude can request function calls; you execute and return results
- Supports up to 10,000 tools per request
- Tool definitions are JSON schemas describing function signature
- Tool results fed back as user messages with special role
Tool Schema Structure:
{
"name": "function_name",
"description": "What this function does",
"input_schema": {
"type": "object",
"properties": {
"param1": {"type": "string", "description": "..."},
"param2": {"type": "number", "description": "..."}
},
"required": ["param1"]
}
}
Message Flow:
- Send tool definitions in
toolsarray - Model responds with
stop_reason: "tool_use" - Extract
tool_useblocks from response - Execute tool, collect result
- Send result back as
{"role": "user", "content": [{"type": "tool_result", "tool_use_id": "...", "content": "..."}]} - Model continues with updated information
Tool Use Block Structure:
{
"type": "tool_use",
"id": "unique_id",
"name": "tool_name",
"input": {...}
}
Tool Result Block:
{
"type": "tool_result",
"tool_use_id": "id_from_tool_use_block",
"content": "result string or error"
}
Multiple Tools:
- Can use multiple tools in single turn
- Model decides which tools to call
- All tools called in parallel, results returned together
- Specify tool order in
toolsarray to hint preference
Tool Choice Parameter:
"auto"(default): Model decides when to use tools"required": Model must use a tool in response{"type": "tool", "name": "specific_tool"}: Force specific tool"none": Model won’t use any tools
Fine-Grained Tool Calling:
- Set
tool_choice={"type": "tool", "name": "exact_tool_name"}to force specific tool - Use
"required"when tool use is essential for task - Useful for forcing function calling in agentic workflows
Text Edit Tool:
- Built-in tool for editing text/code
- Useful in agentic scenarios where Claude modifies documents
- Not directly exposed; mention if needed for advanced workflows
Web Search Tool:
- Built-in capability; can search web within tool use
- Returns snippet results with citations
- Used within tool_use blocks similar to custom tools
Best Practices
- Keep tool descriptions concise but clear
- Use descriptive parameter names and descriptions
- Set required fields only for essential parameters
- Include default values for optional parameters
- Error handling: return error message as tool result, let Claude retry
- For multi-step workflows, use tools to gather info, then summarize
- Cache tool definitions (especially for long lists) using prompt caching
- Test tool schemas with
toolsparameter before deployment - Use
tool_choice="required"to enforce function calling for APIs
Example (Python)
Define Tool Schema & Basic Flow:
import anthropic
import json
client = anthropic.Anthropic()
# Define tools
tools = [
{
"name": "get_weather",
"description": "Get current weather for a location",
"input_schema": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "City or coordinates"},
"units": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "Temperature units"}
},
"required": ["location"]
}
},
{
"name": "get_time",
"description": "Get current time for a timezone",
"input_schema": {
"type": "object",
"properties": {
"timezone": {"type": "string", "description": "Timezone (e.g., 'America/New_York')"}
},
"required": ["timezone"]
}
}
]
# Simulate tool execution
def execute_tool(name: str, input_data: dict) -> str:
if name == "get_weather":
return json.dumps({
"location": input_data["location"],
"temperature": 22,
"condition": "Sunny"
})
elif name == "get_time":
return json.dumps({
"timezone": input_data["timezone"],
"time": "14:30:00"
})
return "Tool not found"
# Multi-turn tool use loop
def chat_with_tools(user_message: str):
messages = [{"role": "user", "content": user_message}]
print(f"User: {user_message}\n")
while True:
# Send request with tools
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
tools=tools,
messages=messages
)
# Check if tool use is needed
if response.stop_reason == "tool_use":
# Extract all tool use blocks
tool_uses = [block for block in response.content if block.type == "tool_use"]
# Add assistant response to messages
messages.append({"role": "assistant", "content": response.content})
# Execute tools and collect results
tool_results = []
for tool_use in tool_uses:
print(f"Tool: {tool_use.name}")
print(f"Input: {json.dumps(tool_use.input, indent=2)}")
result = execute_tool(tool_use.name, tool_use.input)
print(f"Result: {result}\n")
tool_results.append({
"type": "tool_result",
"tool_use_id": tool_use.id,
"content": result
})
# Send tool results back
messages.append({"role": "user", "content": tool_results})
elif response.stop_reason == "end_turn":
# Model finished, no more tools
final_response = next(
(block.text for block in response.content if hasattr(block, "text")),
"No response"
)
print(f"Assistant: {final_response}")
break
else:
print(f"Unexpected stop reason: {response.stop_reason}")
break
# Test
chat_with_tools("What's the weather in Paris and the time in London?")
Force Tool Use & Handle Multiple Tools:
# Force tool use
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
tools=tools,
tool_choice="required", # Model must use a tool
messages=[{"role": "user", "content": "Tell me about Paris."}]
)
# Or force specific tool
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
tools=tools,
tool_choice={"type": "tool", "name": "get_weather"}, # Force this tool
messages=[{"role": "user", "content": "What's the weather?"}]
)
# Parallel tool execution (both tools called in same turn)
def handle_parallel_tools(user_message: str):
messages = [{"role": "user", "content": user_message}]
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
tools=tools,
messages=messages
)
if response.stop_reason == "tool_use":
tool_uses = [block for block in response.content if block.type == "tool_use"]
# Execute all tools in parallel
results = []
for tool_use in tool_uses:
result = execute_tool(tool_use.name, tool_use.input)
results.append({
"type": "tool_result",
"tool_use_id": tool_use.id,
"content": result
})
# Send all results back at once
messages.append({"role": "assistant", "content": response.content})
messages.append({"role": "user", "content": results})
# Continue conversation
final_response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
tools=tools,
messages=messages
)
return final_response
handle_parallel_tools("Get weather in Paris AND time in London")
MODULE 6: RAG and Agentic Search
Key Notes
RAG (Retrieval-Augmented Generation):
- Retrieves relevant documents, passes to LLM for synthesis
- Better than fine-tuning for up-to-date, dynamic content
- Solves hallucination by grounding responses in real data
RAG Pipeline Steps:
Chunking: Split documents into small pieces
- Chunk size: 300-1000 tokens typical
- Overlap: 100-200 tokens to preserve context
- Split on semantic boundaries (paragraphs, sections)
Embeddings: Convert chunks to vectors
- Use embedding model (e.g., OpenAI, Anthropic, Cohere)
- Dimension: 768-3072 typical
- Store vectors in vector DB (Pinecone, Weaviate, Milvus)
Indexing: Build search index
- Vector index for semantic search
- BM25 index for keyword search
- Hybrid search combines both
Retrieval: Find relevant chunks
- Query embedding vs document embeddings (cosine similarity)
- Top-K results (usually 3-10)
- Filtering by metadata (date, category)
Generation: Pass retrieved context to LLM
- Include original query + retrieved chunks
- Use system prompt to define task
- Claude synthesizes answer with citations
Chunking Strategy:
- Fixed size: Simple, consistent (e.g., 512 tokens)
- Semantic: Split on headers, paragraphs (preserves meaning)
- Overlapping: Maintain context across chunks
- Hierarchical: Chunks with parent/child relationships
BM25 Search:
- Keyword-based ranking algorithm
- Good for exact matches, specific terms
- Complement vector search for hybrid retrieval
- Fast, no embeddings needed
Multi-Index Search:
- Vector index: semantic similarity
- BM25 index: keyword matching
- Metadata index: filtering (date, source, category)
- Combine results with reciprocal rank fusion or learned weights
Vector DB Selection:
- Pinecone: Managed, serverless, easy to use
- Weaviate: Open-source, flexible, local/cloud
- Milvus: Open-source, high performance
- Qdrant: Rust-based, performant, similar to Milvus
Best Practices
- Chunk at semantic boundaries (paragraphs, sections), not randomly
- Use 2-3 sources of retrieval (vector + BM25 + metadata)
- Retrieve 5-10 top results; let model use most relevant
- Include source/citation metadata with chunks
- Test retrieval quality independently (check if relevant docs retrieved)
- Combine with reranking (use LLM to rerank retrieved results)
- Cache retrieved context if same query appears multiple times
- Monitor retrieval performance: measure precision@k, recall, MRR
Example (Python)
Basic RAG Flow:
import anthropic
from typing import List
client = anthropic.Anthropic()
# Simulated document store (in production: vector DB)
documents = [
{
"id": "doc1",
"text": "Python is a high-level programming language. It emphasizes readability.",
"source": "Python Basics"
},
{
"id": "doc2",
"text": "JavaScript runs in browsers and enables interactive web pages.",
"source": "Web Development"
},
{
"id": "doc3",
"text": "Python has a rich ecosystem of libraries like NumPy, Pandas, TensorFlow.",
"source": "Python Libraries"
}
]
def retrieve_documents(query: str, top_k: int = 3) -> List[str]:
"""Simple keyword-based retrieval (BM25-like)."""
query_terms = query.lower().split()
scored_docs = []
for doc in documents:
score = sum(1 for term in query_terms if term in doc["text"].lower())
if score > 0:
scored_docs.append((doc, score))
# Sort by score and return top_k
ranked = sorted(scored_docs, key=lambda x: x[1], reverse=True)
return [doc["text"] for doc, _ in ranked[:top_k]]
def rag_query(user_query: str) -> str:
"""RAG pipeline: retrieve → pass to Claude → synthesize."""
# Step 1: Retrieve relevant documents
retrieved_docs = retrieve_documents(user_query, top_k=3)
context = "\n\n".join([f"[Chunk {i+1}]\n{doc}" for i, doc in enumerate(retrieved_docs)])
# Step 2: Build prompt with context
system_prompt = """You are a helpful assistant. Answer based on the provided context.
If context doesn't contain relevant information, say so clearly."""
user_message = f"""Context:
{context}
Question: {user_query}
Answer based on the context above."""
# Step 3: Get response from Claude
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
system=system_prompt,
messages=[{"role": "user", "content": user_message}]
)
return response.content[0].text
# Test RAG
print(rag_query("What can I do with Python?"))
print("\n---\n")
print(rag_query("How does JavaScript work?"))
Multi-Index Hybrid Search:
from collections import Counter
import math
def bm25_score(doc_text: str, query_terms: List[str]) -> float:
"""Simple BM25-like scoring."""
doc_terms = doc_text.lower().split()
score = 0
for term in query_terms:
count = doc_terms.count(term)
score += math.log(1 + count)
return score
def retrieve_hybrid(query: str, top_k: int = 5):
"""Hybrid retrieval: BM25 + semantic (simulated vector)."""
query_terms = query.lower().split()
results = []
for doc in documents:
# BM25 score
bm25 = bm25_score(doc["text"], query_terms)
# Simulated vector similarity (0-1)
# In production: actual embedding cosine similarity
vector_sim = 0.8 if any(term in doc["text"].lower() for term in query_terms) else 0.2
# Combine scores (weighted average)
combined = 0.4 * bm25 / 10 + 0.6 * vector_sim # Normalize BM25
results.append((doc, combined))
# Rank and return
ranked = sorted(results, key=lambda x: x[1], reverse=True)
return [doc["text"] for doc, _ in ranked[:top_k]]
docs = retrieve_hybrid("Python programming language", top_k=2)
print("Retrieved documents (hybrid):")
for i, doc in enumerate(docs, 1):
print(f"{i}. {doc[:80]}...")
MODULE 7: Features of Claude
Key Notes
Extended Thinking:
- Enables Claude to reason in “thinking” tokens (not shown to user)
- Improves accuracy on complex reasoning, math, coding
- Costs: thinking tokens = output tokens (not discounted)
- Parameter:
thinkingwithtype="enabled"ortype="disabled" budget_tokens: max thinking tokens (default 10,000)- Response contains
thinkingblock (shown to client) +textblock
Image/PDF Support:
- Support formats: PNG, GIF, JPEG, WebP, PDF
- Images sent in
contentarray as{"type": "image", "source": {...}} - Image source:
base64,url, ormedia_type - PDFs: use Files API or base64 encode (max 20MB per file, 5 files per message)
- Vision capability included in all models
Citations:
- Claude can cite document snippets with precise locations
- Requires extracting citation data from response
- Citation format: document indices + character ranges
- Use
bblock_citationsin response headers if enabled - Useful for Q&A, document analysis, compliance
Prompt Caching:
- Cache frequently-used context (system prompt, documents, examples)
- Cached tokens charged 90% less than new tokens (10% of input token cost)
- Cache hits: reuse cached tokens without reprocessing
- Cache key: hash of request up to cache control point
- Minimum cache size: 1024 tokens to create cache
Cache Control Placement:
System prompt (usually cached)
↓
Optional long context (docs, examples, few-shot)
↓ [CACHE_CONTROL HERE]
↓
User query (not cached)
Code Execution:
- Claude can write and reason about code
- Not directly executable in API; client must run code
- Use tool use to return code for execution
- Results fed back for Claude to analyze
Files API:
- Upload documents for analysis
- Supports: PDF, DOCX, XLSX, PPTX, TXT, CSV, JSON
- File size: max 20MB
- Solves: avoid base64 encoding, handle large docs efficiently
- Reference in message:
{"type": "document", "source": {"type": "file", "file_id": "..."}}
Best Practices
- Use extended thinking for complex reasoning; measure cost vs accuracy improvement
- Cache system prompts + few-shot examples for consistent savings
- For images: include relevant metadata (describe what to look for)
- Enable citations only if compliance/audit needed (adds overhead)
- Use Files API for documents > 1MB or > 10k tokens
- Test cache hit rate; measure savings before production
- Combine caching + streaming for optimal token efficiency
- PDF handling: extract text if possible, use Files API as fallback
Example (Python)
Extended Thinking:
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=8000,
thinking={
"type": "enabled",
"budget_tokens": 5000 # Max thinking tokens
},
messages=[{
"role": "user",
"content": "Solve: If a train leaves NYC at 9 AM going 60 mph, and another leaves Boston at 10 AM going 50 mph, when do they meet?"
}]
)
# Extract thinking and response
for block in response.content:
if block.type == "thinking":
print(f"Thinking: {block.thinking}")
elif block.type == "text":
print(f"Answer: {block.text}")
Image Analysis:
import base64
# Load image as base64
with open("chart.png", "rb") as f:
image_data = base64.standard_b64encode(f.read()).decode("utf-8")
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
messages=[{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "base64",
"media_type": "image/png",
"data": image_data
}
},
{
"type": "text",
"text": "What trends do you see in this chart?"
}
]
}]
)
print(response.content[0].text)
Prompt Caching:
system_prompt = """You are an expert analyst. Answer questions about the provided documents accurately."""
# Long context to cache (e.g., a large document); prefixes shorter than
# 1,024 tokens (2,048 on Haiku models) are not cached
cached_document = """
[Large document with thousands of tokens...]
Company History: Founded in 1990, specialized in cloud infrastructure.
Product Features: Load balancing, auto-scaling, monitoring, security.
Pricing: $99/month basic, $299/month pro, custom enterprise.
[... continues for many tokens ...]
"""
# Request with caching
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": system_prompt
        },
        {
            "type": "text",
            "text": cached_document,
            "cache_control": {"type": "ephemeral"}  # Cache the prefix up to and including this block
        }
    ],
    messages=[{
        "role": "user",
        "content": "What is the pricing for the basic plan?"
    }]
)
# Check cache usage
print(f"Input tokens (uncached): {response.usage.input_tokens}")
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
# Second request with the identical prefix reuses the cache (default TTL: 5 minutes)
response2 = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": system_prompt
        },
        {
            "type": "text",
            "text": cached_document,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{
        "role": "user",
        "content": "What are the main product features?"
    }]
)
print("\nSecond request:")
print(f"Cache read tokens (reused): {response2.usage.cache_read_input_tokens}")
# Expected: cache_read_input_tokens > 0, billed at 10% of the base input price
Files API:
# Upload a document (the Files API is in beta)
with open("report.pdf", "rb") as f:
    file_response = client.beta.files.upload(
        file=("report.pdf", f, "application/pdf"),
        betas=["files-api-2025-04-14"]
    )
file_id = file_response.id
print(f"Uploaded file: {file_id}")
# Use file in message: `betas` is only accepted by the beta messages endpoint
response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "file",
                    "file_id": file_id
                },
                "title": "Q1 Report"
            },
            {
                "type": "text",
                "text": "Summarize the key financial metrics."
            }
        ]
    }],
    betas=["files-api-2025-04-14"]
)
print(response.content[0].text)
# Cleanup: delete the uploaded file when it is no longer needed
client.beta.files.delete(file_id, betas=["files-api-2025-04-14"])
MODULE 8: Model Context Protocol (MCP)
Key Notes
MCP Overview:
- Open protocol that lets LLM applications connect to external tools, data sources, and APIs
- A host application (e.g., Claude Code) runs an MCP client, which requests tools/resources from an MCP server on Claude's behalf
- Server provides tools, resources (files, databases), and prompts
- Bidirectional JSON-RPC communication over stdio (local) or HTTP (remote)
MCP Architecture:
Host application (e.g., Claude Code) + Claude (LLM)
↓
MCP Client (one per server connection, inside the host)
↓ [MCP Protocol: JSON-RPC 2.0]
↓
MCP Server (e.g., database, API)
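Key Components:
Tools: Functions the MCP server exposes (model-controlled)
- Schema: name, description, input parameters (inputSchema, a JSON Schema)
- Claude chooses a tool, the client sends tools/call, and the server executes it and returns the result
Resources: Data the MCP server exposes for reading (application-controlled)
- Read by the client via resources/read; changes are made through tools, not by writing to resources
- Examples: database records, files, API responses
- Identified by URIs (e.g., file://, db://)
Prompts: Reusable prompt templates from the server (user-controlled)
- Often surfaced in the host as slash commands
- Can include guidelines, examples, constraints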
MCP Inspector:
- Official interactive tool for exploring MCP servers (run with npx @modelcontextprotocol/inspector)
- List all tools, resources, prompts available
- Test tool execution
- Useful for debugging and documentation
MCP Protocol Details:
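- JSON-RPC 2.0 messages: requests, responses, and one-way notifications
- Transport: stdio (local) or Streamable HTTP (remote); HTTP+SSE is the older, deprecated transport
- Initialization: the client sends initialize with its protocolVersion and capabilities, the server replies with its protocolVersion and capabilities, and the client then sends notifications/initialized
- Error handling: JSON-RPC error objects with numeric codes; tool failures are reported inside the result with isError: true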
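The message flow above can be sketched as a toy, in-memory dispatcher. This is an illustration of the JSON-RPC shapes only: the server name, the fake user data, and the handle function are hypothetical, "2025-03-26" is one published spec revision used as an example, and a real server would use an official SDK and a real transport rather than passing Python dicts around.

```python
import json

# Hypothetical server state: one tool (same shape as the conceptual example) and fake data
TOOLS = [
    {
        "name": "get_user_by_id",
        "description": "Retrieve user information by ID",
        "inputSchema": {
            "type": "object",
            "properties": {"user_id": {"type": "integer"}},
            "required": ["user_id"],
        },
    }
]
USERS = {123: {"id": 123, "name": "Ada"}}


def handle(request: dict) -> dict:
    """Dispatch one JSON-RPC 2.0 request from the client and build the response."""
    method, req_id = request["method"], request["id"]
    params = request.get("params", {})
    if method == "initialize":
        # Echo the client's requested version (a real server would check it is supported)
        result = {
            "protocolVersion": params["protocolVersion"],
            "capabilities": {"tools": {}},
            "serverInfo": {"name": "demo-server", "version": "0.1.0"},
        }
    elif method == "tools/list":
        result = {"tools": TOOLS}
    elif method == "tools/call":
        if params.get("name") != "get_user_by_id":
            # Protocol-level error: unknown tool
            return {"jsonrpc": "2.0", "id": req_id,
                    "error": {"code": -32602, "message": f"Unknown tool: {params.get('name')}"}}
        user = USERS.get(params["arguments"]["user_id"])
        if user is None:
            # Tool-level failure: reported inside the result so the model can see it
            result = {"content": [{"type": "text", "text": "User not found"}], "isError": True}
        else:
            result = {"content": [{"type": "text", "text": json.dumps(user)}], "isError": False}
    else:
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32601, "message": f"Method not found: {method}"}}
    return {"jsonrpc": "2.0", "id": req_id, "result": result}


# Handshake: initialize request/response, then the client's initialized notification
init_response = handle({
    "jsonrpc": "2.0", "id": 1, "method": "initialize",
    "params": {"protocolVersion": "2025-03-26", "capabilities": {},
               "clientInfo": {"name": "demo-client", "version": "0.1.0"}},
})
initialized_notification = {"jsonrpc": "2.0", "method": "notifications/initialized"}  # no id: no reply
print(init_response["result"]["serverInfo"])
print(handle({"jsonrpc": "2.0", "id": 2, "method": "tools/call",
              "params": {"name": "get_user_by_id", "arguments": {"user_id": 123}}}))
```

Notice the two kinds of failure. An unknown method or tool is a protocol error with a JSON-RPC code. A tool that runs but fails returns a normal result with isError set, so the model can read the message and adapt.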
Best Practices
- Version MCP servers; indicate breaking changes
- Keep tool/resource schemas concise and clear
- Use descriptive names for tools and resources
- Implement proper error handling and logging
- Test server with the MCP Inspector before deployment
- Document resource URIs and tool parameters thoroughly
- Cache MCP server responses if repeated calls expected
Example (Python)
Simple MCP Server Definition (conceptual):
# MCP servers are usually built with an official SDK (e.g., Python or TypeScript).
# Conceptually, this is what a server advertises via tools/list and resources/list.
# Note: MCP uses camelCase field names (inputSchema, mimeType).
mcp_server_tools = [
    {
        "name": "get_user_by_id",
        "description": "Retrieve user information by ID",
        "inputSchema": {
            "type": "object",
            "properties": {
                "user_id": {"type": "integer", "description": "User ID"}
            },
            "required": ["user_id"]
        }
    },
    {
        "name": "create_task",
        "description": "Create a new task for a user",
        "inputSchema": {
            "type": "object",
            "properties": {
                "user_id": {"type": "integer"},
                "title": {"type": "string"},
                "description": {"type": "string"},
                "due_date": {"type": "string", "format": "date"}
            },
            "required": ["user_id", "title"]
        }
    }
]
mcp_resources = [
    {
        "uri": "database://users",
        "name": "Users Table",
        "description": "All user records",
        "mimeType": "application/json"
    },
    {
        "uri": "database://tasks",
        "name": "Tasks Table",
        "description": "All tasks",
        "mimeType": "application/json"
    }
]
# An MCP client lists these, then calls tools (tools/call)
# and reads resources (resources/read) on Claude's behalf
Using MCP in Claude API Calls (with MCP client):
# When calling the Messages API directly, your application acts as the MCP client:
# it fetches tools from the server (tools/list) and passes them in `tools`,
# renaming MCP's inputSchema to the API's input_schema.
# (Hosts such as Claude Code do this automatically for configured servers.)
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[
        # Tool definition obtained from the MCP server:
        {
            "name": "get_user_by_id",
            "description": "Retrieve user information by ID",
            "input_schema": {
                "type": "object",
                "properties": {
                    "user_id": {"type": "integer"}
                },
                "required": ["user_id"]
            }
        }
    ],
    messages=[{
        "role": "user",
        "content": "Tell me about user 123."
    }]
)
# If stop_reason == "tool_use", forward the call to the MCP server (tools/call)
# and send its output back to Claude as a tool_result block
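The glue between the two protocols can be sketched in two small functions. This assumes the MCP tool list came from a tools/list response and that content blocks are plain dicts (convert SDK objects first). call_mcp_tool is a hypothetical callable that sends tools/call to your server and returns its result.

```python
from typing import Callable


def mcp_tools_to_api_tools(mcp_tools: list[dict]) -> list[dict]:
    """Convert MCP tool definitions (camelCase inputSchema) to Messages API tool dicts."""
    return [
        {
            "name": t["name"],
            "description": t.get("description", ""),
            "input_schema": t["inputSchema"],
        }
        for t in mcp_tools
    ]


def tool_results_for(content: list[dict], call_mcp_tool: Callable[[str, dict], dict]) -> list[dict]:
    """Run each tool_use block through the MCP server and build matching tool_result blocks."""
    results = []
    for block in content:
        if block["type"] != "tool_use":
            continue  # skip text blocks
        mcp_result = call_mcp_tool(block["name"], block["input"])  # hypothetical tools/call wrapper
        # MCP results carry a list of content items; keep the text ones for Claude
        text = "\n".join(c["text"] for c in mcp_result["content"] if c["type"] == "text")
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": text,
            "is_error": mcp_result.get("isError", False),
        })
    return results
```

The returned list goes back to Claude as the content of the next user message. Propagating isError as is_error lets Claude tell a failed lookup apart from an empty answer.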
MODULE 9: Anthropic Apps (Claude Code)
Key Notes
Claude Code Overview:
- Agentic coding tool that runs in the terminal, with IDE integrations (e.g., VS Code, JetBrains)
- Uses Claude models to read, edit, and run code in your project
- Features: code generation, refactoring, debugging, testing
- Can connect to MCP servers for external tool access
Setup:
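- Install with npm install -g @anthropic-ai/claude-code, then run claude in the project directory
- Authenticate with a Claude subscription or an Anthropic Console (API) account
- Configure in .claude/settings.json (project, shared via version control), .claude/settings.local.json (project, personal, not committed), and ~/.claude/settings.json (user-wide)
- Hooks are configured in settings files under "hooks"; the scripts they run can live anywhere (e.g., .claude/hooks/)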
MCP Server Integration:
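- Configure project-scoped MCP servers in .mcp.json at the project root (or add them with claude mcp add)
- Each server: name plus command/args/env (local stdio servers) or url (remote servers)
- Claude Code starts and connects to configured servers (project-scoped servers require approval on first use)
- Tools/resources from MCP servers become available in the session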
Hooks (Automation):
- SessionStart: Runs when a session starts or resumes
- PreToolUse: Runs after Claude prepares a tool call, before it executes (can block the call)
- PostToolUse: Runs after a tool call completes
- UserPromptSubmit: Runs when the user submits a prompt, before Claude processes it
- Example uses: loop detection, build monitoring, token-usage tracking
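A PreToolUse hook can be sketched as a small script. It assumes the documented contract: Claude Code passes the event as JSON on stdin, including tool_name and tool_input, and treats exit code 2 as "block this call", showing stderr to Claude. The deny-list patterns and the file name .claude/hooks/block_dangerous.py are illustrative assumptions.

```python
import json
import re
import sys

# Illustrative deny-list for shell commands; adjust for your project
BLOCKED_PATTERNS = [r"\brm\s+-rf\s+/", r"\bgit\s+push\s+--force\b"]


def check(event: dict) -> tuple[int, str]:
    """Return (exit_code, message) for a PreToolUse event payload."""
    if event.get("tool_name") != "Bash":
        return 0, ""  # only shell commands are inspected
    command = event.get("tool_input", {}).get("command", "")
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, command):
            # Exit code 2 blocks the tool call; the stderr message is shown to Claude
            return 2, f"Blocked by hook: command matches {pattern!r}"
    return 0, ""


def main() -> int:
    """Hook entry point: read the event from stdin, report on stderr, return the exit code."""
    code, message = check(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code

# When saved as the hook script, finish the file with: sys.exit(main())
```

To wire it up, add a PreToolUse entry with matcher "Bash" whose hooks list contains {"type": "command", "command": "python3 .claude/hooks/block_dangerous.py"}. The check stays cheap (a few regexes), which matches the advice to keep hooks lightweight.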
Rules & Docs:
- .claude/rules/: auto-loaded guidelines for every session
- .claude/docs/: reference material (architecture, standards, patterns)
- .claude/skills/: domain-specific capabilities (research, planning, code review)
Best Practices
- Keep hooks lightweight (no heavy computation)
- Use rules for enforcing standards, not for task-specific instructions
- Organize MCP servers by domain (database, API, file system)
- Version-control the shared .claude/ configuration across the team (keep settings.local.json out of git)
- Use hooks for CI/CD integration, token monitoring, privacy enforcement
- Document custom hooks and MCP servers in .claude/docs/
Example (Settings)
.mcp.json (project MCP servers, sample):
{
  "mcpServers": {
    "filesystem": {
      "command": "node",
      "args": ["./mcp-servers/filesystem.js"],
      "env": {
        "ALLOWED_PATHS": "/project/src,/project/docs"
      }
    },
    "database": {
      "command": "python",
      "args": ["./mcp-servers/database.py"],
      "env": {
        "DB_URL": "postgresql://localhost:5432/mydb"
      }
    }
  }
}
.claude/settings.json (hooks, sample):
{
  "hooks": {
    "SessionStart": [
      {"hooks": [{"type": "command", "command": "node .claude/hooks/token-monitor.cjs"}]}
    ],
    "PreToolUse": [
      {"matcher": "*", "hooks": [{"type": "command", "command": "node .claude/hooks/tool-validator.cjs"}]}
    ],
    "PostToolUse": [
      {"matcher": "*", "hooks": [{"type": "command", "command": "node .claude/hooks/result-logger.cjs"}]}
    ]
  }
}
MODULE 10: Agents and Workflows
Key Notes
Agents vs Workflows:
Agents:
- Autonomous, goal-driven systems
- Use tools to achieve objectives
- Self-directed task planning and execution
- Error recovery and retry logic
- Examples: research agent, coding assistant, data analyst
Workflows:
- Orchestrated sequences of steps
- Predefined code paths: the sequence of steps is fixed in code (a routing step may still use an LLM classifier)
- Human-in-the-loop decision points
- Examples: approval pipelines, data pipelines, CI/CD chains
Parallelization:
- Execute independent tasks simultaneously
- No dependencies between tasks
- Speed up overall execution
- Example: retrieve 3 data sources in parallel, then synthesize
Chaining:
- Sequential task execution with dependencies
- Output of task N feeds into task N+1
- Used for multi-step workflows
- Example: research → design → implement → test
Routing:
- Branch logic based on conditions
- Route to different tasks based on input/output
- Used for decision-making agents
- Example: IF complex_issue THEN escalate ELSE resolve
Agentic Patterns:
Loop Agent:
- Perceive → Plan → Act → Repeat
- Check goal achievement, loop until done
- Tool use at each iteration
Router Agent:
- Classify input
- Route to specialized agent/tool
- Collect and synthesize results
Delegator:
- Break task into subtasks
- Delegate to sub-agents
- Aggregate results
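The Delegator pattern (Anthropic's agent write-ups describe a close relative as "orchestrator-workers") can be sketched without network access. The sketch assumes a hypothetical call_model(prompt) -> str callable, which in real use would wrap client.messages.create. The one-subtask-per-line plan format is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def delegate(task: str, call_model: Callable[[str], str], max_workers: int = 4) -> str:
    """Break a task into subtasks, run sub-agents in parallel, then aggregate their outputs."""
    # 1. Plan: the orchestrator lists one subtask per line
    plan = call_model(f"Break this task into independent subtasks, one per line:\n{task}")
    subtasks = [line.strip("-• ").strip() for line in plan.splitlines() if line.strip()]

    # 2. Delegate: each sub-agent call handles exactly one subtask
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(lambda st: call_model(f"Complete this subtask:\n{st}"), subtasks))

    # 3. Aggregate: one final call merges the labeled sub-results
    combined = "\n\n".join(f"## {st}\n{out}" for st, out in zip(subtasks, outputs))
    return call_model(f"Task: {task}\nCombine these subtask results into one answer:\n{combined}")
```

Passing call_model in as a parameter keeps the orchestration logic testable with a fake model, and lets you use a cheaper model for workers than for planning.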
Implementation Approaches:
Scheduled Agents (with schedule tool):
- Run on cron schedule or at specific time
- Good for monitoring, cleanup, reports
- Managed by harness
Task Agents (ad-hoc):
- Spawn when triggered by user/event
- Run to completion then exit
- Used for one-off work
Loop Agents (persistent):
- Long-running, check conditions periodically
- Monitor/polling patterns
- Use Monitor tool for streaming events
Best Practices
- Parallelization: Identify independent tasks; spawn simultaneously; collect results
- Chaining: Use task outputs as inputs to next; handle failures gracefully
- Routing: Define clear decision criteria; ensure all routes have handlers
- Agent Communication: Use file-based or API-based messaging between agents
- Error Handling: Implement retry logic, fallback options, error logging
- Monitoring: Log agent execution, measure latency, success rates
- Testing: Test agent in isolation, then in composition
- Scaling: Use queue systems (Bull, RabbitMQ) for high-volume agent execution
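For the Error Handling practice, here is a generic retry wrapper with exponential backoff and jitter. The delay constants and default retry_on tuple are assumptions. The official Python SDK already retries some transient failures itself (configurable via max_retries on anthropic.Anthropic), so a wrapper like this is mainly useful around whole agent steps.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # 1s, 2s, 4s, ... plus up to 10% random jitter so parallel agents don't retry in lockstep
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise RuntimeError("unreachable")
```

Example: with_retries(lambda: client.messages.create(...), retry_on=(anthropic.APIConnectionError, anthropic.RateLimitError)). Narrowing retry_on matters because retrying a bad request (e.g., an invalid schema) only wastes time.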
Example (Python & Pseudocode)
Parallel Execution:
import anthropic
import concurrent.futures
client = anthropic.Anthropic()
def research_topic(topic: str) -> str:
    """Research subtask: get info on topic."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"Research and summarize: {topic}"
        }]
    )
    return response.content[0].text
def parallel_research(main_topic: str, subtopics: list) -> dict:
    """Execute research in parallel, synthesize results."""
    # Parallelize: one thread per subtopic, up to 3 at a time
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        futures = {
            executor.submit(research_topic, subtopic): subtopic
            for subtopic in subtopics
        }
        results = {}
        for future in concurrent.futures.as_completed(futures):
            subtopic = futures[future]
            results[subtopic] = future.result()
    # Synthesize
    summaries = "\n".join(f"{topic}: {result}" for topic, result in results.items())
    synthesis_prompt = f"""
Given these research summaries on {main_topic}:
{summaries}
Create a comprehensive summary combining all insights.
"""
    synthesis = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": synthesis_prompt}]
    )
    return {
        "subtopic_results": results,
        "synthesis": synthesis.content[0].text
    }
# Execute
result = parallel_research("AI Safety", ["Alignment", "Robustness", "Interpretability"])
print(result["synthesis"])
Chaining with Dependencies:
def chain_workflow(initial_data: str) -> str:
    """Execute tasks sequentially with data flow."""
    # Step 1: Analyze
    response1 = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"Analyze this data and identify patterns:\n{initial_data}"
        }]
    )
    analysis = response1.content[0].text
    # Step 2: Plan (uses analysis output)
    response2 = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"Based on this analysis:\n{analysis}\n\nCreate an action plan."
        }]
    )
    plan = response2.content[0].text
    # Step 3: Implement (uses plan output)
    response3 = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        messages=[{
            "role": "user",
            "content": f"Implement this plan with code:\n{plan}"
        }]
    )
    implementation = response3.content[0].text
    return implementation
result = chain_workflow("Sales data: 10% growth, high churn in Q2")
print(result)
Router Agent:
def router_agent(issue: str) -> str:
    """Route issue to appropriate handler."""
    # Step 1: Classify
    classifier = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=100,
        system="Classify issue as: TECHNICAL, BUSINESS, URGENT, OTHER. Respond with only the category.",
        messages=[{"role": "user", "content": issue}]
    )
    # Normalize so "technical" or " URGENT\n" still match a route
    category = classifier.content[0].text.strip().upper()
    # Step 2: Route and handle
    if category == "TECHNICAL":
        handler_prompt = "You are a technical expert. Solve this technical issue:\n"
    elif category == "BUSINESS":
        handler_prompt = "You are a business analyst. Address this business concern:\n"
    elif category == "URGENT":
        handler_prompt = "This is urgent. Provide immediate action items:\n"
    else:
        # Fallback route: any unexpected category still gets a handler
        handler_prompt = "Address this general inquiry:\n"
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": handler_prompt + issue}]
    )
    return response.content[0].text
result = router_agent("Our database is down and customers can't access their accounts!")
print(result)
MODULE 11: Conclusion & Course Summary
Key Takeaways
- API Fundamentals: Authentication, requests, multi-turn conversations, system prompts, streaming
- Prompt Engineering: Clarity, specificity, XML structure, examples, chain-of-thought
- Tool Integration: Schemas, message flow, multi-tool, tool results, fine-grained calling
- Retrieval & Search: Chunking, embeddings, BM25, hybrid search, RAG pipelines
- Advanced Features: Extended thinking, images/PDFs, citations, caching, code execution, Files API
- Architecture: MCP protocol, Claude Code setup, MCP servers, hooks
- Systems Design: Agents vs workflows, parallelization, chaining, routing patterns
Certification Prep Checklist
- Build basic request flow (model selection, tokens, streaming)
- Implement multi-turn conversation with state management
- Design and test a prompt with examples and XML structure
- Create tool schema and implement tool-use loop
- Build simple RAG pipeline with retrieval and synthesis
- Implement prompt caching and measure token savings
- Use extended thinking for reasoning task; measure accuracy improvement
- Analyze PDF or image using Files API or base64 encoding
- Design an agent with tool use and loop control
- Implement parallel task execution with results aggregation
- Chain multiple model calls with data flow
- Set up MCP server integration in Claude Code
- Configure hooks for automation (SessionStart, PreToolUse, PostToolUse)
- Measure evaluation metrics (accuracy, F1, BLEU) on test dataset
- Optimize for cost: cache, use Haiku where possible, batch calls
Resources
- Docs: https://docs.anthropic.com/
- SDK: pip install anthropic (Python), npm install @anthropic-ai/sdk (Node.js)
- API Key: https://console.anthropic.com/
- Models: claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022, claude-3-opus-20240229
- MCP: https://modelcontextprotocol.io/
Claude Model Comparison Table
| Feature | Claude 3.5 Haiku | Claude 3.5 Sonnet | Claude 3 Opus |
|---|---|---|---|
| Context Window | 200K tokens | 200K tokens | 200K tokens |
| Input Pricing | $0.80/1M | $3.00/1M | $15.00/1M |
| Output Pricing | $4.00/1M | $15.00/1M | $75.00/1M |
| Speed | Fastest | Balanced | Slow |
| Reasoning | Good | Very Good | Excellent |
| Coding | Good | Excellent | Excellent |
| Best For | Simple tasks, high volume, cost-sensitive | Balanced use, production APIs | Complex reasoning, multi-step tasks |
| Extended Thinking (needs Claude 3.7 Sonnet or newer) | No | No | No |
| Vision (Images) | Yes | Yes | Yes |
| Tool Use | Yes | Yes | Yes |
| Max Output Tokens | 8,192 | 8,192 | 4,096 |
| Streaming | Yes | Yes | Yes |
| Cache Reads | $0.08/1M (90% discount) | $0.30/1M (90% discount) | $1.50/1M (90% discount) |
When to Use Each Model
- Haiku: Classification, simple Q&A, high-throughput systems, RAG retrieval ranking
- Sonnet: Production APIs, chatbots, code generation, RAG synthesis, balanced latency/quality
- Opus: Research tasks, complex reasoning, math/physics, novel problem-solving, cost-insensitive
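To compare models for a concrete workload, you can turn the per-token prices into a small estimator. This sketch assumes the published list prices for these three models (USD per million tokens) and ignores cache writes, batch discounts, and later price changes, so check current pricing before relying on it. The model keys are hypothetical labels, not API model IDs.

```python
# Hypothetical price map (USD per 1M tokens); verify against current pricing
PRICES = {
    "haiku-3.5": {"input": 0.80, "output": 4.00},
    "sonnet-3.5": {"input": 3.00, "output": 15.00},
    "opus-3": {"input": 15.00, "output": 75.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Example workload: 10,000 requests of 2,000 input + 500 output tokens each
for name in PRICES:
    total = 10_000 * request_cost(name, 2_000, 500)
    print(f"{name}: ${total:,.2f}")
# haiku-3.5: $36.00, sonnet-3.5: $135.00, opus-3: $675.00 (one per line)
```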
Implementation Checklist for Certification
Basic API Usage
- Create client with API key
- Make basic request with model, max_tokens, messages
- Handle response and extract text/stop_reason/usage
- Implement error handling (rate limits, timeouts, auth)
Multi-Turn & Advanced Parameters
- Build conversation history and multi-turn loop
- Set system prompt and understand scope
- Configure temperature and top_p
- Implement streaming with event loop
- Use structured output (JSON schema) and validate
Prompt Engineering
- Write clear, specific prompts with examples
- Use XML tags for complex instructions
- Implement few-shot learning with examples
- Add chain-of-thought prompts for reasoning
- Test variations on evaluation set
Tool Use
- Define tool schemas with input parameters
- Implement tool-use loop: send tools → handle response → execute → return result
- Handle multiple tools in single turn
- Use tool_choice parameter (auto, any, tool with a specific name, none)
- Add error handling for tool failures
RAG
- Chunk documents (semantic or fixed-size)
- Build retrieval function (BM25 or vector-based)
- Integrate retrieval with Claude request
- Measure retrieval quality and relevance
- Implement hybrid search if needed
Advanced Features
- Use extended thinking for reasoning task; measure improvement
- Upload and analyze image (base64 or URL)
- Upload document via Files API
- Implement prompt caching; measure cache hit rate
- Add citations to responses where relevant
Agents & Workflows
- Implement parallel task execution
- Chain multiple model calls with data flow
- Create router agent with classification + routing
- Use tools in agentic loop with goal checking
- Implement error recovery and retries
Evaluation & Optimization
- Create test dataset (20+ examples)
- Implement code-based grading (exact match, regex, schema)
- Use model-based grading for subjective tasks
- Measure accuracy, F1, or custom metrics
- Track token usage and costs per task
- Identify failure patterns and iterate
Deep Dives & Advanced Topics
Prompt Caching Strategy
When to cache:
- System prompt used in all requests, at least 1,024 tokens (2,048 on Haiku models); shorter prefixes are not cached
- Few-shot examples (>1K tokens) stable across queries
- Large reference documents included in system
- Long conversation history (>5 turns)
Cost calculation (Claude 3.5 Sonnet, $3.00/1M base input; cache writes cost 1.25× and cache reads 0.1× the base price):
Without cache:
5 requests × 5000 tokens = 25,000 input tokens
Cost: 25,000 × $3.00 / 1M = $0.075
With cache (1x creation, 4x reads):
Creation: 5000 tokens × $3.75 / 1M = $0.01875
Reads: 4 × 5000 × $0.30 / 1M = $0.006
Total: $0.02475 (67% savings)
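The caching arithmetic can be reproduced with a short function, including the 25% premium charged on cache writes. The sketch assumes Claude 3.5 Sonnet list pricing ($3.00/1M base input), writes at 1.25× and reads at 0.1× base, and that every request resends the same cached prefix within the cache TTL. The small per-query tokens are ignored.

```python
def caching_cost(requests: int, cached_tokens: int, base_price_per_m: float = 3.00,
                 write_multiplier: float = 1.25, read_multiplier: float = 0.10) -> tuple[float, float]:
    """Return (cost_without_cache, cost_with_cache) in USD for a repeated identical prefix."""
    per_token = base_price_per_m / 1_000_000
    without = requests * cached_tokens * per_token
    # The first request writes the cache; every later request reads it (all assumed cache hits)
    with_cache = cached_tokens * per_token * write_multiplier
    with_cache += (requests - 1) * cached_tokens * per_token * read_multiplier
    return without, with_cache


without, with_cache = caching_cost(requests=5, cached_tokens=5_000)
print(f"without: ${without:.5f}  with: ${with_cache:.5f}  savings: {1 - with_cache / without:.0%}")
# prints: without: $0.07500  with: $0.02475  savings: 67%
```

Savings grow with the number of reuses. With only two requests the write premium eats most of the benefit, which is why caching pays off for prefixes reused many times.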
Extended Thinking Budget
Recommended budgets by task:
- Simple reasoning: 2,000-5,000 tokens
- Medium complexity (coding, math): 5,000-10,000 tokens
- Complex multi-step: 10,000-20,000 tokens (larger budgets are allowed; the minimum is 1,024, and budget_tokens must be below max_tokens)
Token cost:
- Thinking tokens = output tokens (NOT discounted)
- Example: 5,000 thinking tokens + 1,000 response tokens = 6,000 output tokens billed
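These rules can be captured in a small helper. It assumes the documented constraints that budget_tokens is at least 1,024 and below max_tokens, and that thinking tokens are billed at the output rate. The $15.00/1M default is an assumed Sonnet-class output price; thinking_params and output_cost are hypothetical helper names.

```python
def thinking_params(budget_tokens: int, answer_tokens: int) -> dict:
    """Build max_tokens/thinking arguments that satisfy the budget constraints."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1,024")
    if answer_tokens <= 0:
        raise ValueError("answer_tokens must be positive so max_tokens exceeds the budget")
    # max_tokens covers thinking + visible answer, so it must be larger than the budget
    return {
        "max_tokens": budget_tokens + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


def output_cost(thinking_tokens: int, response_tokens: int, output_price_per_m: float = 15.00) -> float:
    """Thinking tokens are billed as output tokens, at the same rate as the answer."""
    return (thinking_tokens + response_tokens) * output_price_per_m / 1_000_000


print(thinking_params(5_000, 3_000))
# {'max_tokens': 8000, 'thinking': {'type': 'enabled', 'budget_tokens': 5000}}
print(f"${output_cost(5_000, 1_000):.3f}")  # 6,000 output tokens at $15/1M -> $0.090
```

Usage: client.messages.create(model=..., messages=..., **thinking_params(5000, 3000)) reproduces the max_tokens=8000 / budget 5000 pairing used in the extended thinking example.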
Tool Use Best Practices
Schema design:
{
"name": "calculate_revenue",
"description": "Calculate total revenue for a given period and product category",
"input_schema": {
"type": "object",
"properties": {
"start_date": {"type": "string", "format": "date", "description": "ISO 8601 date"},
"end_date": {"type": "string", "format": "date"},
"category": {"type": "string", "enum": ["electronics", "clothing", "food"], "description": "Product category"}
},
"required": ["start_date", "end_date", "category"]
}
}
Error handling in tool results:
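import json
from typing import Optional
def build_tool_result(use_id: str, data: Optional[dict] = None, error: Optional[str] = None) -> dict:
    """Wrap a tool outcome as a tool_result block; is_error tells Claude the call failed."""
    if error is not None:
        return {
            "type": "tool_result",
            "tool_use_id": use_id,
            "content": f"Error: {error}",
            "is_error": True
        }
    return {
        "type": "tool_result",
        "tool_use_id": use_id,
        "content": json.dumps(data)
    }
# e.g. build_tool_result(use_id, error="Database connection failed")
#      build_tool_result(use_id, data={"revenue": 12345, "units": 500})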
RAG Optimization
Chunking strategies:
Semantic: Split on headers/paragraphs
- Pros: Preserves context, reduces redundancy
- Cons: Variable chunk sizes, harder to implement
Fixed sliding window: 512 tokens with 256-token overlap
- Pros: Consistent, predictable
- Cons: May split important concepts
Hierarchical: Section → subsection → paragraph
- Pros: Enables different retrieval granularities
- Cons: More complex indexing
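The fixed sliding-window strategy can be sketched using whitespace-separated words as a stand-in for tokens. That is an assumption: a real pipeline would count tokens with a tokenizer or the token-counting endpoint. With size 512 and overlap 256, each window starts 256 units after the previous one.

```python
def sliding_window_chunks(text: str, size: int = 512, overlap: int = 256) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


doc = " ".join(f"w{i}" for i in range(10))
print(sliding_window_chunks(doc, size=4, overlap=2))
# ['w0 w1 w2 w3', 'w2 w3 w4 w5', 'w4 w5 w6 w7', 'w6 w7 w8 w9']
```

The early break avoids emitting a trailing window that is entirely contained in the previous one, which would only add redundant chunks to the index.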
Retrieval quality metrics:
- Precision@k: % of top-k results relevant to query
- Recall@k: % of all relevant docs that appear in the top k
- MRR (Mean Reciprocal Rank): Mean over queries of 1/rank of the first relevant result
- NDCG (Normalized Discounted Cumulative Gain): Relevance ranking quality
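These metrics are short enough to compute directly. The sketch assumes retrieved is a ranked list of document IDs and relevant is a set of IDs for one query. It uses binary relevance, so NDCG gives relevant documents a gain of 1. MRR is the mean of reciprocal_rank over all queries.

```python
import math


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(doc in relevant for doc in retrieved[:k]) / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    return sum(doc in relevant for doc in retrieved[:k]) / len(relevant) if relevant else 0.0


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result (0 if none); average over queries for MRR."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """DCG of the ranking divided by the DCG of an ideal ranking (binary gains)."""
    dcg = sum(1.0 / math.log2(i + 2) for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0


ranked = ["d3", "d1", "d7", "d2"]
gold = {"d1", "d2"}
print(precision_at_k(ranked, gold, 2), recall_at_k(ranked, gold, 4), reciprocal_rank(ranked, gold))
# 0.5 1.0 0.5
```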
Hybrid search weights:
- BM25 weight: 0.3-0.5 (keyword precision)
- Vector weight: 0.5-0.7 (semantic similarity)
- Metadata weight: 0.0-0.2 (date/category filters)
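One way to apply these weights is a weighted sum of normalized scores. The sketch assumes per-query min-max normalization and weights of 0.4 (BM25) and 0.6 (vector), picked from the ranges above. Metadata is left out here on the assumption that it is applied as a hard filter before fusion. Reciprocal rank fusion is a common alternative that avoids score normalization altogether.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and cosine scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(bm25: dict[str, float], vector: dict[str, float],
                w_bm25: float = 0.4, w_vector: float = 0.6) -> list[tuple[str, float]]:
    """Combine normalized keyword and semantic scores; a missing score counts as 0."""
    b, v = min_max(bm25), min_max(vector)
    docs = set(b) | set(v)
    fused = {d: w_bm25 * b.get(d, 0.0) + w_vector * v.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


bm25_scores = {"d1": 12.0, "d2": 3.0, "d3": 7.5}
vector_scores = {"d1": 0.62, "d2": 0.91, "d4": 0.55}
for doc, score in hybrid_rank(bm25_scores, vector_scores):
    print(doc, round(score, 3))
# Expected order: d2, d1, d3, d4
```

Here d2 wins on semantic similarity despite the lowest keyword score. Raising w_bm25 toward 0.5 shifts the ranking toward exact keyword matches such as d1.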
Agent Design Patterns
Loop Agent with Tool Use:
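# Assumes `client` from the examples above; `tools` (API tool definitions) and
# `execute_tool(name, input)` are supplied by the caller
def loop_agent(goal: str, tools: list, execute_tool, max_iterations: int = 10) -> str:
    # The running message history is the agent's state: Claude sees its own
    # earlier tool calls and their results on every iteration
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_iterations):
        # Perceive & Plan
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            tools=tools,
            system="Work toward the user's goal step by step, using tools when needed.",
            messages=messages
        )
        messages.append({"role": "assistant", "content": response.content})
        # Check if done: no further tool calls requested
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        # Act: execute every requested tool and return all results in one user turn
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result)
                })
        messages.append({"role": "user", "content": tool_results})
    return "Max iterations reached"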
Unresolved Questions / Topics for Further Study
- How to handle very large documents (100K+ tokens) in RAG? (Hierarchical chunking strategies)
- Fine-tuning vs RAG: When to use fine-tuning for domain-specific tasks?
- Cost optimization for high-volume production: Batching, caching, model selection trade-offs?
- Guardrails and content filtering: Implementing safety layers on top of Claude API?
- Multi-language support: How well does Claude handle non-English prompts in tool use?
- Real-time streaming UI patterns: Best practices for streaming multiple concurrent requests?
- Agent memory persistence: Effective strategies for long-running agents with context limits?
- MCP server scaling: Production deployment patterns for MCP servers with multiple clients?