Building with the Claude API - Certification Study Guide

Course: Anthropic - Building with the Claude API
Modules: 11 (Introduction through Agents & Workflows)
Target: Certification Prep

MODULE 1: Introduction - Claude Models Overview

Key Notes

Claude model family: Opus (most capable), Sonnet (balanced), Haiku (fastest/cheapest)
Model context windows: Haiku 200K, Sonnet 200K, Opus 200K tokens
Example model ID used throughout this guide: claude-3-5-sonnet-20241022 (newer models have since been released; check the models list for the current latest)
API version: Messages API is the standard (avoid legacy text completions)
Rate limits: Vary by usage tier and model; track via the anthropic-ratelimit-* response headers
Pricing model: Input tokens are cheaper than output tokens; cache reads are billed at roughly 10% of the base input price, while cache writes carry a surcharge
Token counting: Use count_tokens endpoint before production calls
Supported formats: JSON, text, images (PNG, GIF, JPEG, WebP), PDFs, documents via Files API
Authentication: API key via ANTHROPIC_API_KEY env var or header
Base URL: https://api.anthropic.com/v1
Timeout defaults: 10 minutes in the Python SDK (configurable via the timeout option)
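
The pricing notes above can be turned into a rough cost estimate. The sketch below is illustrative only: the per-million-token prices are placeholder values rather than published rates, `estimate_cost` is a hypothetical helper, and it assumes cache reads are billed at 10% of the base input price, as the notes state.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    cached_read_tokens: int = 0,
    input_price_per_mtok: float = 3.00,    # placeholder price, USD per million tokens
    output_price_per_mtok: float = 15.00,  # placeholder price, USD per million tokens
    cache_read_multiplier: float = 0.10,   # assumption: cache reads cost 10% of input price
) -> float:
    """Return an approximate request cost in USD."""
    uncached = input_tokens * input_price_per_mtok
    cached = cached_read_tokens * input_price_per_mtok * cache_read_multiplier
    output = output_tokens * output_price_per_mtok
    return (uncached + cached + output) / 1_000_000


# 2,000 fresh input tokens, 50,000 tokens read from cache, 500 output tokens
with_cache = estimate_cost(2_000, 500, cached_read_tokens=50_000)
# The same request with all 52,000 input tokens sent uncached
without_cache = estimate_cost(52_000, 500)
print(f"with cache: ${with_cache:.4f}, without cache: ${without_cache:.4f}")
```

Swapping in the current prices for the chosen model gives a quick way to compare models or judge whether caching is worth setting up.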

Best Practices

Choose model based on task complexity: Haiku for simple tasks, Sonnet for balanced, Opus for reasoning
Always set explicit max_tokens to avoid surprise token usage
Use system prompt for role/behavior, not in message history
Include version-specific features only when supported by chosen model
Cache large context (docs, few-shots) to reduce cost
Stream responses for lower perceived latency in interactive applications
Test with cheaper models (Haiku) before scaling to Opus

Example (Python)

import anthropic
import os

client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))

# Basic request
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude!"}
    ]
)

print(f"Response: {message.content[0].text}")
print(f"Stop reason: {message.stop_reason}")
print(f"Usage: input={message.usage.input_tokens}, output={message.usage.output_tokens}")

MODULE 2: Accessing Claude with the API

Key Notes

API Authentication:

API key location: https://console.anthropic.com/
Header format: x-api-key: <key>
Or: ANTHROPIC_API_KEY=<key> environment variable
Keys are secret; rotate if exposed
Per-project keys available in org settings

Request Structure:

Method: POST /v1/messages
Content-Type: application/json
Required fields: model, max_tokens, messages
Optional: system, temperature, top_p, tools, tool_choice, etc.

Message Format:

messages array with role (user/assistant) and content (string or array of blocks)
role alternates: user → assistant → user
Each turn is stateless; send full conversation history for multi-turn

Multi-Turn Conversation:

Track message history client-side
System prompt applies to entire conversation
Assistant’s previous responses become assistant messages in next request
Tool results are sent in a user message as tool_result content blocks that reference the tool_use_id

System Prompt:

Placed in separate system parameter, not messages array
Applies to all turns in conversation
Can be changed per request (each request is evaluated independently)
Best for role definition, rules, output format instructions
Billed as input tokens like message content; a good candidate for prompt caching

Temperature & Top-P:

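temperature (0.0-1.0): lower values are more focused and repeatable, higher values more varied (default 1.0); even 0 is not guaranteed to be fully deterministic
top_p (0.0-1.0): nucleus sampling; adjust temperature or top_p, generally not both
For structured output: temperature=0
For creative: temperature=0.8-1.0
top_k: advanced option that samples only from the K most likely tokens; rarely needed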

Streaming:

Set stream=True in request
Returns server-sent events (SSE) with delta updates
Stream event types: message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop (plus periodic ping events)
Rebuild message by accumulating deltas
Always consume stream fully before closing connection
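
To make "rebuild message by accumulating deltas" concrete, here is a minimal offline sketch. The events are hand-written dictionaries shaped like the stream format, simplified for illustration; `accumulate_text` is a hypothetical helper, not an SDK function.

```python
from typing import Dict, Iterable, List


def accumulate_text(events: Iterable[dict]) -> List[str]:
    """Rebuild the text of each content block from a sequence of stream events."""
    blocks: Dict[int, List[str]] = {}
    for event in events:
        if event["type"] == "content_block_start":
            blocks[event["index"]] = []
        elif event["type"] == "content_block_delta":
            delta = event["delta"]
            if delta["type"] == "text_delta":
                blocks.setdefault(event["index"], []).append(delta["text"])
        # Other event types carry no text and are skipped here
    return ["".join(parts) for _, parts in sorted(blocks.items())]


# Hand-written sample events (not real API output)
sample_events = [
    {"type": "message_start"},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": ", world"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}},
    {"type": "message_stop"},
]
print(accumulate_text(sample_events))
```

In practice the SDK's streaming helpers do this accumulation for you; the sketch only shows what happens underneath.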

Structured Output:

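The Messages API does not accept a response_format parameter (that name comes from other vendors' APIs)
Common approach: define a tool whose input_schema is the desired JSON schema and force it with tool_choice; the tool_use block's input is the structured result
Alternative: describe the JSON format in the prompt and validate the output client-side
Some newer models also offer a dedicated structured-output feature; check the current documentation for availability
Useful for data extraction, parsing, and passing results to downstream code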

Best Practices

Always include explicit max_tokens (don’t rely on defaults)
Send full conversation history for multi-turn (Claude has no built-in memory)
Use system prompt for all conversations with consistent instructions
Set temperature=0 for deterministic tasks (classification, extraction)
Stream for interactive applications; collect full message for logging
Validate JSON schema compliance in client if structured output requested
Handle rate limiting with exponential backoff
Set appropriate timeouts for long-running requests
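
The backoff advice above can be sketched as follows. `RateLimited` is a hypothetical stand-in exception so the example runs offline; in real code the exception to catch would be the SDK's rate-limit error (`anthropic.RateLimitError` in the Python SDK), and the SDK also has its own built-in retry setting (`max_retries`). The delays and retry count are arbitrary illustrative values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for an SDK rate-limit error."""


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the delay ceiling on each attempt."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries:
                raise
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter spreads out retries from concurrent clients
            sleep(random.uniform(0, ceiling))
    raise RuntimeError("unreachable")


# Demo: a fake call that fails twice, then succeeds; sleeping is stubbed out
attempts = {"count": 0}
recorded_delays = []


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"


result = with_backoff(flaky_call, sleep=recorded_delays.append)
print(result, attempts["count"], len(recorded_delays))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.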

Example (Python)

Basic Request:

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "What is 2+2?"}
    ]
)
print(message.content[0].text)

Multi-Turn Conversation:

system_prompt = "You are a helpful coding assistant."
conversation_history = []

def chat(user_input):
    conversation_history.append({"role": "user", "content": user_input})
    
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2048,
        system=system_prompt,
        messages=conversation_history
    )
    
    assistant_message = response.content[0].text
    conversation_history.append({"role": "assistant", "content": assistant_message})
    return assistant_message

# Multiple turns
print(chat("What is Python?"))
print(chat("How do I write a class?"))
print(chat("Can you show me an example?"))

Streaming:

print("Streaming response:")
with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are concise.",
    messages=[{"role": "user", "content": "Write a haiku about AI."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
print()

Structured Output (JSON):

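# The Messages API has no response_format parameter; forcing a tool call
# is a common way to get schema-shaped JSON back.
person_tool = {
    "name": "record_person",
    "description": "Record the name and age extracted from the text.",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"}
        },
        "required": ["name", "age"]
    }
}

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[person_tool],
    tool_choice={"type": "tool", "name": "record_person"},  # Force this tool
    messages=[{"role": "user", "content": "Extract the name and age from: John is 30 years old."}]
)

# The forced call arrives as a tool_use block; its input is already a dict
data = next(block.input for block in response.content if block.type == "tool_use")
print(f"Name: {data['name']}, Age: {data['age']}")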

MODULE 3: Prompt Evaluation

Key Notes

Evaluation Workflow:

Define evaluation task (classification, generation, reasoning)
Create test dataset (labeled examples)
Run model on test cases
Grade outputs (automated or manual)
Compute metrics (accuracy, F1, similarity, custom)
Analyze failures and iterate

Test Dataset Design:

Minimum 20-50 examples for reliable signal
Include edge cases, ambiguous inputs, common errors
Label expected outputs for supervised evaluation
Stratify by category if multi-class
Version control datasets alongside prompts

Grading Methods:

Code-based grading:

Exact match (string equality)
Regex matching (pattern validation)
JSON schema validation
Custom Python function (flexible)
Numeric thresholds
Function-based scoring (0-1 range)
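
A few of the code-based graders listed above, sketched with only the standard library. The function names and the type-check approach to JSON validation are illustrative choices; a full JSON Schema validator would be stricter.

```python
import json
import re


def grade_regex(output: str, pattern: str) -> bool:
    """Pass if the whole (stripped) output matches the pattern."""
    return re.fullmatch(pattern, output.strip()) is not None


def grade_json_fields(output: str, required: dict) -> bool:
    """Pass if output is a JSON object whose required keys have the expected types."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return all(
        key in data and isinstance(data[key], expected_type)
        for key, expected_type in required.items()
    )


def grade_numeric(output: str, expected: float, tolerance: float = 0.01) -> float:
    """Score 1.0 inside the tolerance, 0.0 otherwise (or if output is not a number)."""
    try:
        return 1.0 if abs(float(output) - expected) <= tolerance else 0.0
    except ValueError:
        return 0.0


print(grade_regex("2024-05-17", r"\d{4}-\d{2}-\d{2}"))
print(grade_json_fields('{"name": "John", "age": 30}', {"name": str, "age": int}))
print(grade_numeric("150.0", 150))
```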

Model-based grading:

Use Claude to grade Claude’s outputs (consistent rubric)
Judge model: a capable model such as Claude 3.5 Sonnet or Claude 3 Opus
Rubric: clear criteria, examples, scoring scale
Less code, captures semantic quality
Slower, more expensive, and less repeatable than code-based grading, but better suited to subjective criteria

Metrics to track:

Accuracy (% correct)
F1 score (harmonic mean of precision and recall: 2 × precision × recall / (precision + recall))
Token efficiency (tokens/task)
Latency (response time)
Cost (input + output tokens × pricing)
User satisfaction (if collecting feedback)
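
As a worked example of the accuracy and F1 metrics, here is a small sketch over made-up labels. F1 is computed as the harmonic mean of precision and recall; the data and the helper name are purely illustrative.

```python
from typing import List, Tuple


def precision_recall_f1(
    predicted: List[str], expected: List[str], positive: str
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one class label."""
    pairs = list(zip(predicted, expected))
    tp = sum(1 for p, e in pairs if p == positive and e == positive)
    fp = sum(1 for p, e in pairs if p == positive and e != positive)
    fn = sum(1 for p, e in pairs if p != positive and e == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up model outputs and labels for a spam classifier
predicted = ["spam", "spam", "ham", "spam", "ham"]
expected = ["spam", "ham", "ham", "spam", "spam"]
accuracy = sum(p == e for p, e in zip(predicted, expected)) / len(expected)
precision, recall, f1 = precision_recall_f1(predicted, expected, positive="spam")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```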

Best Practices

Start with code-based grading (fast iteration)
Use model-based grading for subjective tasks (quality, tone, correctness)
Separate test set from training/validation set
Run evals on multiple model versions before deployment
Log all eval runs with timestamps, model, prompt version
Set an accuracy target that matches the task's risk (e.g., 95%+) and meet it before production
Document failure cases and plan improvements
A/B test prompt changes on held-out test set

Example (Python)

import anthropic

client = anthropic.Anthropic()

# Test dataset
test_cases = [
    {"input": "Extract the color: The car is red.", "expected_output": "red"},
    {"input": "Extract the color: She wore a blue dress.", "expected_output": "blue"},
    {"input": "Extract the color: The sky is clear.", "expected_output": "no color mentioned"},
]

# Code-based grading (exact match)
def grade_exact_match(output: str, expected: str) -> bool:
    return output.strip().lower() == expected.strip().lower()

# Model-based grading
def grade_with_claude(input_text: str, output: str, expected: str) -> float:
    """Use Claude as a judge: returns score 0-1."""
    rubric = f"""
    Task: Evaluate if the output correctly answers the input query.
    Input: {input_text}
    Expected answer: {expected}
    Actual output: {output}
    
    Score 1.0 if correct, 0.5 if partially correct, 0.0 if incorrect.
    Respond with ONLY the number (e.g., 1.0).
    """
    
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=10,
        system="You are an expert evaluator. Score the response accurately.",
        messages=[{"role": "user", "content": rubric}]
    )
    
    try:
        return float(response.content[0].text.strip())
    except (ValueError, IndexError):
        # The judge returned something other than a bare number
        return 0.0

# Run evaluation
def evaluate_prompt():
    results = []
    
    for test in test_cases:
        # Get model output
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=100,
            system="Extract the color mentioned. If no color, respond 'no color mentioned'.",
            messages=[{"role": "user", "content": test["input"]}]
        )
        
        output = response.content[0].text
        
        # Grade with both methods
        exact_match = grade_exact_match(output, test["expected_output"])
        model_score = grade_with_claude(test["input"], output, test["expected_output"])
        
        results.append({
            "input": test["input"],
            "expected": test["expected_output"],
            "actual": output,
            "exact_match": exact_match,
            "model_score": model_score
        })
    
    # Compute metrics
    accuracy = sum(1 for r in results if r["exact_match"]) / len(results)
    avg_model_score = sum(r["model_score"] for r in results) / len(results)
    
    print(f"Accuracy: {accuracy:.2%}")
    print(f"Average Model Score: {avg_model_score:.2f}")
    print("\nDetailed Results:")
    for r in results:
        print(f"  Input: {r['input']}")
        print(f"  Expected: {r['expected']}, Actual: {r['actual']}")
        print(f"  Match: {r['exact_match']}, Model Score: {r['model_score']}\n")

evaluate_prompt()

MODULE 4: Prompt Engineering Techniques

Key Notes

Core Principles:

Clarity: Be specific about task, not vague
- Bad: “Summarize this”
- Good: “Summarize in 3 bullet points, focusing on methodology”
Specificity: Include constraints, format, examples
- Output format (JSON, bullet points, code blocks)
- Length (words, paragraphs, tokens)
- Tone (formal, casual, technical)
- Edge cases (“If N/A, respond ’not provided’”)
XML Tags: Structure complex prompts
- <task>, <context>, <rules>, <output_format>
- Makes parsing easier, prevents confusion
- Claude particularly responsive to well-structured XML
Examples (Few-Shot): Dramatically improve performance
- 2-5 examples usually sufficient
- Show input-output pairs for task
- Include edge cases in examples
- More effective than long descriptions
Chain of Thought: Encourage step-by-step reasoning
- “Think step-by-step before answering”
- Improves accuracy on reasoning tasks
- Increases token usage but better results
Iterative Refinement: Test, measure, improve
- Evaluate on test set
- Identify failure patterns
- Adjust prompt, re-evaluate
- Version prompts alongside evals
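
The XML-tag and few-shot principles above can be combined in a small prompt builder. This is one possible layout, not a required format; `build_prompt` and the tag names are illustrative choices.

```python
from typing import Dict, List, Tuple


def build_prompt(sections: Dict[str, str], examples: List[Tuple[str, str]], user_input: str) -> str:
    """Wrap each prompt part in its own XML tag so instructions and data stay separate."""
    parts = [f"<{tag}>\n{text.strip()}\n</{tag}>" for tag, text in sections.items()]
    if examples:
        shots = "\n".join(
            f"<example>\n<input>{inp}</input>\n<output>{out}</output>\n</example>"
            for inp, out in examples
        )
        parts.append(f"<examples>\n{shots}\n</examples>")
    parts.append(f"<input>\n{user_input.strip()}\n</input>")
    return "\n\n".join(parts)


prompt = build_prompt(
    sections={
        "task": "Classify the sentiment of a product review.",
        "rules": "- Answer with one word: positive, negative, or neutral\n"
                 "- If the review is empty, answer 'not provided'",
        "output_format": "A single lowercase word.",
    },
    examples=[
        ("Great battery life, love it.", "positive"),
        ("Stopped working after a week.", "negative"),
    ],
    user_input="It does the job, nothing special.",
)
print(prompt)
```

Keeping the builder in code makes it easy to version the prompt alongside the eval set and to change one section at a time.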

Best Practices

Separate instructions (system) from data (user messages)
Use XML tags for multi-part instructions
Include 2-5 diverse examples for complex tasks
Ask for step-by-step reasoning on logic/math tasks
Specify output format explicitly
For creative tasks, use higher temperature; for accuracy, use temperature=0
Test variations on same test set to measure impact
Document what changed and why in prompt versions

Example (Python)

Clarity & Specificity:

# Bad prompt
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=200,
    messages=[{"role": "user", "content": "Analyze this text."}]
)

# Good prompt with clarity and specificity
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=200,
    messages=[{
        "role": "user",
        "content": """Analyze the following customer review for sentiment.
        
Output format: JSON with fields: sentiment (positive/negative/neutral), confidence (0-1), key_phrases (list of strings).

Text: "The product arrived late but works great once I set it up. Would recommend despite shipping issues."
"""
    }]
)

XML Structure:

prompt = """
<task>
Extract structured information from a product review.
</task>

<rules>
- Output MUST be valid JSON
- If information is missing, use null
- Sentiment must be one of: positive, negative, neutral
- Confidence is a number 0-1
</rules>

<output_format>
{
  "product_name": string,
  "sentiment": string,
  "confidence": number,
  "pros": [string],
  "cons": [string],
  "rating": number or null
}
</output_format>

<examples>
Input: "Excellent phone! Fast processor, great camera. Battery life is average though."
Output: {
  "product_name": null,
  "sentiment": "positive",
  "confidence": 0.9,
  "pros": ["fast processor", "great camera"],
  "cons": ["average battery life"],
  "rating": null
}
</examples>

<input>
"These shoes are uncomfortable and overpriced. Not worth the hype."
</input>
"""

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    messages=[{"role": "user", "content": prompt}]
)

Few-Shot Examples:

examples = [
    {
        "input": "The meeting is at 3 PM tomorrow.",
        "output": "DATETIME: 3:00 PM tomorrow"
    },
    {
        "input": "I have 5 apples.",
        "output": "QUANTITY: 5"
    },
    {
        "input": "The sky is blue.",
        "output": "ATTRIBUTE: sky, blue"
    }
]

# Show the examples first, then end with the new input in the same format
prompt = "Extract the main entity from a sentence, following the format of these examples:\n\n"
for ex in examples:
    prompt += f"Input: {ex['input']}\nOutput: {ex['output']}\n\n"
prompt += "Input: The project deadline is next Friday.\nOutput:"

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=100,
    messages=[{"role": "user", "content": prompt}]
)

Chain of Thought:

prompt = """Solve this math problem step-by-step.

Problem: If a train travels at 60 mph for 2.5 hours, how far does it go?

Before answering, think through:
1. What formula do I need?
2. What values do I have?
3. What's the calculation?
4. What's the final answer?
"""

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=300,
    temperature=0,  # Deterministic for math
    messages=[{"role": "user", "content": prompt}]
)

MODULE 5: Tool Use with Claude

Key Notes

Tool Use Overview:

Claude can request function calls; you execute and return results
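Many tools can be defined in one request; every definition counts toward input tokens (check the current documentation for any limit on tool count)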
Tool definitions are JSON schemas describing function signature
Tool results are fed back in a user message as tool_result content blocks (there is no separate tool role)

Tool Schema Structure:

{
  "name": "function_name",
  "description": "What this function does",
  "input_schema": {
    "type": "object",
    "properties": {
      "param1": {"type": "string", "description": "..."},
      "param2": {"type": "number", "description": "..."}
    },
    "required": ["param1"]
  }
}

Message Flow:

Send tool definitions in tools array
Model responds with stop_reason: "tool_use"
Extract tool_use blocks from response
Execute tool, collect result
Send result back as {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "...", "content": "..."}]}
Model continues with updated information

Tool Use Block Structure:

{
  "type": "tool_use",
  "id": "unique_id",
  "name": "tool_name",
  "input": {...}
}

Tool Result Block:

{
  "type": "tool_result",
  "tool_use_id": "id_from_tool_use_block",
  "content": "result string or error"
}
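
A minimal sketch of turning tool_use blocks into tool_result blocks, including the error case. The blocks are plain dictionaries here so the example runs offline; `get_weather`, `TOOL_REGISTRY`, and `run_tool_calls` are hypothetical names. The `is_error` flag on a tool_result tells Claude the call failed so it can retry or explain.

```python
import json
from typing import Callable, Dict, List


def get_weather(location: str) -> dict:
    """Hypothetical tool implementation used only for this demo."""
    if not location:
        raise ValueError("location must not be empty")
    return {"location": location, "temperature": 22}


TOOL_REGISTRY: Dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def run_tool_calls(tool_use_blocks: List[dict]) -> List[dict]:
    """Execute each tool_use block and return the matching tool_result blocks."""
    results = []
    for block in tool_use_blocks:
        result = {"type": "tool_result", "tool_use_id": block["id"]}
        try:
            handler = TOOL_REGISTRY[block["name"]]
            result["content"] = json.dumps(handler(**block["input"]))
        except Exception as exc:  # report failures to the model instead of crashing
            result["content"] = f"Error: {exc}"
            result["is_error"] = True
        results.append(result)
    return results


# Hand-written tool_use blocks (ids are made up)
calls = [
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"location": "Paris"}},
    {"type": "tool_use", "id": "toolu_02", "name": "get_weather", "input": {"location": ""}},
]
tool_results = run_tool_calls(calls)
print(json.dumps(tool_results, indent=2))
```

All results go back together as the content of a single user message.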

Multiple Tools:

Can use multiple tools in single turn
Model decides which tools to call
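Claude may request several tool calls in one response; your code executes them (sequentially or in parallel) and returns all tool_result blocks together in one user message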
Specify tool order in tools array to hint preference

Tool Choice Parameter:

{"type": "auto"} (default when tools are provided): Model decides when to use tools
{"type": "any"}: Model must use one of the provided tools
{"type": "tool", "name": "specific_tool"}: Force specific tool
{"type": "none"}: Model won’t use any tools

Fine-Grained Tool Calling:

Set tool_choice={"type": "tool", "name": "exact_tool_name"} to force specific tool
Use tool_choice={"type": "any"} when tool use is essential for task
Useful for forcing function calling in agentic workflows

Text Edit Tool:

Built-in tool for editing text/code
Useful in agentic scenarios where Claude modifies documents
The schema is defined by Anthropic, but your application must implement the file operations Claude requests (view, create, replace text, insert)

Web Search Tool:

Server-side tool: Anthropic runs the search once you enable it in the tools array
Returns results with citations
Appears in the response as server tool blocks rather than client-executed tool_use blocks, so you do not send a tool_result back

Best Practices

Keep tool descriptions concise but clear
Use descriptive parameter names and descriptions
Set required fields only for essential parameters
Include default values for optional parameters
Error handling: return error message as tool result, let Claude retry
For multi-step workflows, use tools to gather info, then summarize
Cache tool definitions (especially for long lists) using prompt caching
Test tool schemas with tools parameter before deployment
Use tool_choice={"type": "any"} to enforce function calling for APIs

Example (Python)

Define Tool Schema & Basic Flow:

import anthropic
import json

client = anthropic.Anthropic()

# Define tools
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City or coordinates"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "Temperature units"}
            },
            "required": ["location"]
        }
    },
    {
        "name": "get_time",
        "description": "Get current time for a timezone",
        "input_schema": {
            "type": "object",
            "properties": {
                "timezone": {"type": "string", "description": "Timezone (e.g., 'America/New_York')"}
            },
            "required": ["timezone"]
        }
    }
]

# Simulate tool execution
def execute_tool(name: str, input_data: dict) -> str:
    if name == "get_weather":
        return json.dumps({
            "location": input_data["location"],
            "temperature": 22,
            "condition": "Sunny"
        })
    elif name == "get_time":
        return json.dumps({
            "timezone": input_data["timezone"],
            "time": "14:30:00"
        })
    return "Tool not found"

# Multi-turn tool use loop
def chat_with_tools(user_message: str):
    messages = [{"role": "user", "content": user_message}]
    
    print(f"User: {user_message}\n")
    
    while True:
        # Send request with tools
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )
        
        # Check if tool use is needed
        if response.stop_reason == "tool_use":
            # Extract all tool use blocks
            tool_uses = [block for block in response.content if block.type == "tool_use"]
            
            # Add assistant response to messages
            messages.append({"role": "assistant", "content": response.content})
            
            # Execute tools and collect results
            tool_results = []
            for tool_use in tool_uses:
                print(f"Tool: {tool_use.name}")
                print(f"Input: {json.dumps(tool_use.input, indent=2)}")
                
                result = execute_tool(tool_use.name, tool_use.input)
                print(f"Result: {result}\n")
                
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": result
                })
            
            # Send tool results back
            messages.append({"role": "user", "content": tool_results})
        
        elif response.stop_reason == "end_turn":
            # Model finished, no more tools
            final_response = next(
                (block.text for block in response.content if hasattr(block, "text")),
                "No response"
            )
            print(f"Assistant: {final_response}")
            break
        
        else:
            print(f"Unexpected stop reason: {response.stop_reason}")
            break

# Test
chat_with_tools("What's the weather in Paris and the time in London?")

Force Tool Use & Handle Multiple Tools:

# Force tool use
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    tool_choice={"type": "any"},  # Model must use a tool
    messages=[{"role": "user", "content": "Tell me about Paris."}]
)

# Or force specific tool
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    tool_choice={"type": "tool", "name": "get_weather"},  # Force this tool
    messages=[{"role": "user", "content": "What's the weather?"}]
)

# Parallel tool execution (both tools called in same turn)
def handle_parallel_tools(user_message: str):
    messages = [{"role": "user", "content": user_message}]
    
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        tools=tools,
        messages=messages
    )
    
    if response.stop_reason == "tool_use":
        tool_uses = [block for block in response.content if block.type == "tool_use"]
        
        # Execute every requested tool (sequentially here; could run concurrently)
        results = []
        for tool_use in tool_uses:
            result = execute_tool(tool_use.name, tool_use.input)
            results.append({
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": result
            })
        
        # Send all results back at once
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": results})
        
        # Continue conversation
        final_response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )
        return final_response

handle_parallel_tools("Get weather in Paris AND time in London")

MODULE 6: RAG and Agentic Search

Key Notes

RAG (Retrieval-Augmented Generation):

Retrieves relevant documents, passes to LLM for synthesis
Better than fine-tuning for up-to-date, dynamic content
Reduces hallucination by grounding responses in retrieved data

RAG Pipeline Steps:

Chunking: Split documents into small pieces
- Chunk size: 300-1000 tokens typical
- Overlap: 100-200 tokens to preserve context
- Split on semantic boundaries (paragraphs, sections)
Embeddings: Convert chunks to vectors
- Use an embedding model (e.g., Voyage AI, OpenAI, Cohere); Anthropic does not offer its own embedding model
- Dimension: 768-3072 typical
- Store vectors in vector DB (Pinecone, Weaviate, Milvus)
Indexing: Build search index
- Vector index for semantic search
- BM25 index for keyword search
- Hybrid search combines both
Retrieval: Find relevant chunks
- Query embedding vs document embeddings (cosine similarity)
- Top-K results (usually 3-10)
- Filtering by metadata (date, category)
Generation: Pass retrieved context to LLM
- Include original query + retrieved chunks
- Use system prompt to define task
- Claude synthesizes answer with citations
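
The retrieval step (query embedding compared against document embeddings by cosine similarity, then top-K) can be sketched with toy vectors. Real embeddings come from an embedding model and have hundreds or thousands of dimensions; the three-dimensional vectors and chunk ids below are invented for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: List[float], index: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy index: chunk id -> made-up embedding
toy_index = {
    "chunk_python": [0.9, 0.1, 0.0],
    "chunk_javascript": [0.1, 0.9, 0.0],
    "chunk_cooking": [0.0, 0.1, 0.9],
}
hits = top_k([1.0, 0.2, 0.0], toy_index, k=2)
print(hits)
```

A vector database performs the same comparison with approximate nearest-neighbor indexes so it scales beyond a linear scan.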

Chunking Strategy:

Fixed size: Simple, consistent (e.g., 512 tokens)
Semantic: Split on headers, paragraphs (preserves meaning)
Overlapping: Maintain context across chunks
Hierarchical: Chunks with parent/child relationships
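
Fixed-size chunking with overlap, sketched below. It counts whitespace-separated words as a rough proxy for tokens, which is an assumption made to keep the example dependency-free; a real pipeline would count tokens with the model's tokenizer.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 512, overlap: int = 64) -> List[str]:
    """Split text into fixed-size word chunks, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
print(chunks)
```

The last word of each chunk reappears as the first word of the next, which is what preserves context across chunk boundaries.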

BM25 Search:

Keyword-based ranking algorithm
Good for exact matches, specific terms
Complement vector search for hybrid retrieval
Fast, no embeddings needed

Multi-Index Search:

Vector index: semantic similarity
BM25 index: keyword matching
Metadata index: filtering (date, source, category)
Combine results with reciprocal rank fusion or learned weights
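
Reciprocal rank fusion, mentioned above, scores each document by summing 1 / (k + rank) over every ranked list it appears in. The sketch below uses k = 60, a commonly used constant, and made-up rankings.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) across lists."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_ranking = ["doc1", "doc3", "doc2"]  # hypothetical semantic-search order
bm25_ranking = ["doc3", "doc2", "doc4"]    # hypothetical keyword-search order
fused = reciprocal_rank_fusion([vector_ranking, bm25_ranking])
print(fused)
```

Because it only uses ranks, this method needs no score normalization between the vector and keyword indexes.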

Vector DB Selection:

Pinecone: Managed, serverless, easy to use
Weaviate: Open-source, flexible, local/cloud
Milvus: Open-source, high performance
Qdrant: Rust-based, performant, similar to Milvus

Best Practices

Chunk at semantic boundaries (paragraphs, sections), not randomly
Use 2-3 sources of retrieval (vector + BM25 + metadata)
Retrieve 5-10 top results; let model use most relevant
Include source/citation metadata with chunks
Test retrieval quality independently (check if relevant docs retrieved)
Combine with reranking (use LLM to rerank retrieved results)
Cache retrieved context if same query appears multiple times
Monitor retrieval performance: measure precision@k, recall, MRR
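
The retrieval metrics named above (precision@k, recall, MRR) can be computed in a few lines. The document ids and relevance judgments below are invented for illustration.

```python
from typing import List, Set


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / len(relevant)


def mean_reciprocal_rank(all_retrieved: List[List[str]], all_relevant: List[Set[str]]) -> float:
    """Average of 1/rank of the first relevant document per query (0 if none is found)."""
    total = 0.0
    for retrieved, relevant in zip(all_retrieved, all_relevant):
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(all_retrieved) if all_retrieved else 0.0


# Two made-up queries with their retrieved lists and relevant sets
retrieved_per_query = [["doc1", "doc2", "doc3"], ["doc2", "doc3", "doc1"]]
relevant_per_query = [{"doc1", "doc3"}, {"doc3"}]
print(precision_at_k(retrieved_per_query[0], relevant_per_query[0], k=2))
print(recall_at_k(retrieved_per_query[0], relevant_per_query[0], k=2))
print(mean_reciprocal_rank(retrieved_per_query, relevant_per_query))
```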

Example (Python)

Basic RAG Flow:

import anthropic
from typing import List

client = anthropic.Anthropic()

# Simulated document store (in production: vector DB)
documents = [
    {
        "id": "doc1",
        "text": "Python is a high-level programming language. It emphasizes readability.",
        "source": "Python Basics"
    },
    {
        "id": "doc2",
        "text": "JavaScript runs in browsers and enables interactive web pages.",
        "source": "Web Development"
    },
    {
        "id": "doc3",
        "text": "Python has a rich ecosystem of libraries like NumPy, Pandas, TensorFlow.",
        "source": "Python Libraries"
    }
]

def retrieve_documents(query: str, top_k: int = 3) -> List[str]:
    """Simple keyword-overlap retrieval (a crude stand-in for BM25)."""
    query_terms = query.lower().split()
    scored_docs = []
    
    for doc in documents:
        score = sum(1 for term in query_terms if term in doc["text"].lower())
        if score > 0:
            scored_docs.append((doc, score))
    
    # Sort by score and return top_k
    ranked = sorted(scored_docs, key=lambda x: x[1], reverse=True)
    return [doc["text"] for doc, _ in ranked[:top_k]]

def rag_query(user_query: str) -> str:
    """RAG pipeline: retrieve → pass to Claude → synthesize."""
    
    # Step 1: Retrieve relevant documents
    retrieved_docs = retrieve_documents(user_query, top_k=3)
    context = "\n\n".join([f"[Chunk {i+1}]\n{doc}" for i, doc in enumerate(retrieved_docs)])
    
    # Step 2: Build prompt with context
    system_prompt = """You are a helpful assistant. Answer based on the provided context.
    If context doesn't contain relevant information, say so clearly."""
    
    user_message = f"""Context:
{context}

Question: {user_query}

Answer based on the context above."""
    
    # Step 3: Get response from Claude
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system=system_prompt,
        messages=[{"role": "user", "content": user_message}]
    )
    
    return response.content[0].text

# Test RAG
print(rag_query("What can I do with Python?"))
print("\n---\n")
print(rag_query("How does JavaScript work?"))

Multi-Index Hybrid Search:

import math
import re

def bm25_score(doc_text: str, query_terms: List[str]) -> float:
    """Simplified term-frequency score (not full BM25: no IDF or length normalization)."""
    # Tokenize on word characters so trailing punctuation does not block matches
    doc_terms = re.findall(r"\w+", doc_text.lower())
    score = 0.0
    for term in query_terms:
        count = doc_terms.count(term)
        score += math.log(1 + count)
    return score

def retrieve_hybrid(query: str, top_k: int = 5):
    """Hybrid retrieval: BM25 + semantic (simulated vector)."""
    query_terms = query.lower().split()
    
    results = []
    for doc in documents:
        # BM25 score
        bm25 = bm25_score(doc["text"], query_terms)
        
        # Simulated vector similarity (0-1)
        # In production: actual embedding cosine similarity
        vector_sim = 0.8 if any(term in doc["text"].lower() for term in query_terms) else 0.2
        
        # Combine scores (weighted average)
        combined = 0.4 * bm25 / 10 + 0.6 * vector_sim  # Normalize BM25
        results.append((doc, combined))
    
    # Rank and return
    ranked = sorted(results, key=lambda x: x[1], reverse=True)
    return [doc["text"] for doc, _ in ranked[:top_k]]

docs = retrieve_hybrid("Python programming language", top_k=2)
print("Retrieved documents (hybrid):")
for i, doc in enumerate(docs, 1):
    print(f"{i}. {doc[:80]}...")

MODULE 7: Features of Claude

Key Notes

Extended Thinking:

Enables Claude to reason in “thinking” tokens (not shown to user)
Improves accuracy on complex reasoning, math, coding
Costs: thinking tokens = output tokens (not discounted)
Parameter: thinking with type="enabled" or type="disabled"
budget_tokens: maximum thinking tokens; required when thinking is enabled, minimum 1,024, and must be less than max_tokens
Response contains thinking block (shown to client) + text block
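
A small sketch of building the thinking parameter with basic sanity checks. It assumes a minimum budget of 1,024 tokens and that the budget must stay below max_tokens; `thinking_config` is a hypothetical helper, not part of the SDK.

```python
def thinking_config(max_tokens: int, budget_tokens: int, min_budget: int = 1024) -> dict:
    """Build the thinking parameter after checking the budget against max_tokens."""
    if budget_tokens < min_budget:
        raise ValueError(f"budget_tokens must be at least {min_budget}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


# Keyword arguments that could be passed on to a messages.create call
request_kwargs = {
    "max_tokens": 8000,
    "thinking": thinking_config(max_tokens=8000, budget_tokens=5000),
}
print(request_kwargs)

# A budget larger than max_tokens is rejected before any request is sent
try:
    thinking_config(max_tokens=2000, budget_tokens=4000)
except ValueError as exc:
    print(f"rejected: {exc}")
```

Since thinking tokens are billed as output tokens, the budget also acts as a rough cost ceiling for the reasoning portion of a response.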

Image/PDF Support:

Support formats: PNG, GIF, JPEG, WebP, PDF
Images sent in content array as {"type": "image", "source": {...}}
Image source type: base64 (with media_type and data), url, or file (Files API)
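PDFs: send as a document block via base64, URL, or the Files API; request-size and page-count limits apply (verify current values in the documentation)
Vision is available on current Claude models; confirm support for the specific model you choose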

Citations:

Claude can cite document snippets with precise locations
Requires extracting citation data from response
Citation format: document indices + character ranges
Enable per document with "citations": {"enabled": true}; citations are returned on the text blocks of the response
Useful for Q&A, document analysis, compliance
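
A minimal offline sketch of the citation flow for a plain-text document: building a document block with citations enabled, then reading character-range citations from response text blocks. The response content below is hand-written to mirror the general shape of a cited answer, not real API output, and the helper names are hypothetical.

```python
from typing import List


def make_document_block(text: str, title: str) -> dict:
    """Build a plain-text document block with citations switched on."""
    return {
        "type": "document",
        "source": {"type": "text", "media_type": "text/plain", "data": text},
        "title": title,
        "citations": {"enabled": True},
    }


def collect_citations(content_blocks: List[dict]) -> List[dict]:
    """Gather every citation attached to the text blocks of a response."""
    found = []
    for block in content_blocks:
        if block.get("type") == "text":
            found.extend(block.get("citations") or [])
    return found


doc_text = "The grass is green. The sky is blue."
document_block = make_document_block(doc_text, title="Facts")

# Hand-written response content (illustrative only)
sample_content = [
    {"type": "text", "text": "According to the document, "},
    {
        "type": "text",
        "text": "the sky is blue.",
        "citations": [{
            "type": "char_location",
            "cited_text": "The sky is blue.",
            "document_index": 0,
            "start_char_index": 20,
            "end_char_index": 36,
        }],
    },
]
for cite in collect_citations(sample_content):
    span = doc_text[cite["start_char_index"]:cite["end_char_index"]]
    print(cite["document_index"], repr(span))
```

Slicing the source text with the returned indices is a simple way to verify that each cited passage really appears in the document.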

Prompt Caching:

Cache frequently-used context (system prompt, documents, examples)
Cache reads are billed at about 10% of the base input token price; cache writes cost somewhat more than base input
Cache hits: reuse cached tokens without reprocessing
Cache key: hash of the prompt prefix up to the cache control point
Minimum cacheable prompt length: 1024 tokens on most models (higher on some, such as Haiku)

Cache Control Placement:

System prompt (usually cached)
    ↓
Optional long context (docs, examples, few-shot)
    ↓ [CACHE_CONTROL HERE]
    ↓
User query (not cached)

Code Execution:

Claude can write and reason about code
With a custom tool, Claude returns code and your client runs it
Anthropic also offers a server-side code execution tool (beta) that runs code in a sandbox; check current availability
Results are fed back for Claude to analyze

Files API:

Upload documents for analysis
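PDFs and plain text can be referenced in document blocks and images in image blocks; other formats (e.g., DOCX, XLSX, CSV) typically need conversion or the code execution tool
File size and storage limits apply; verify current values in the documentation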
Solves: avoid base64 encoding, handle large docs efficiently
Reference in message: {"type": "document", "source": {"type": "file", "file_id": "..."}}

Best Practices

Use extended thinking for complex reasoning; measure cost vs accuracy improvement
Cache system prompts + few-shot examples for consistent savings
For images: include relevant metadata (describe what to look for)
Enable citations only if compliance/audit needed (adds overhead)
Use Files API for documents > 1MB or > 10k tokens
Test cache hit rate; measure savings before production
Combine caching (lower input cost) with streaming (lower perceived latency)
PDF handling: extract text if possible, use Files API as fallback

Example (Python)

Extended Thinking:

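response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # extended thinking needs Claude 3.7 Sonnet or later
    max_tokens=8000,  # must be larger than budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 5000  # Max thinking tokens (minimum 1024)
    },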
    messages=[{
        "role": "user",
        "content": "Solve: If a train leaves NYC at 9 AM going 60 mph, and another leaves Boston at 10 AM going 50 mph, when do they meet?"
    }]
)

# Extract thinking and response
for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Answer: {block.text}")

Image Analysis:

import base64

# Load image as base64
with open("chart.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": image_data
                }
            },
            {
                "type": "text",
                "text": "What trends do you see in this chart?"
            }
        ]
    }]
)

print(response.content[0].text)

Prompt Caching:

system_prompt = """You are an expert analyst. Answer questions about the provided documents accurately."""

# Long context to cache (e.g., a large document); it must exceed the minimum
# cacheable length, otherwise the request is processed without caching
cached_document = """
[Large document with thousands of tokens...]
Company History: Founded in 1990, specialized in cloud infrastructure.
Product Features: Load balancing, auto-scaling, monitoring, security.
Pricing: $99/month basic, $299/month pro, custom enterprise.
[... continues for many tokens ...]
"""

# Request with caching
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": system_prompt
        },
        {
            "type": "text",
            "text": cached_document,
            "cache_control": {"type": "ephemeral"}  # Cache this content
        }
    ],
    messages=[{
        "role": "user",
        "content": "What is the pricing for the basic plan?"
    }]
)

# Check cache usage
print(f"Input tokens (new): {response.usage.input_tokens}")
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")

# Second request reuses cache
response2 = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": system_prompt
        },
        {
            "type": "text",
            "text": cached_document,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{
        "role": "user",
        "content": "What are the main product features?"
    }]
)

print(f"\nSecond request:")
print(f"Cache read tokens (reused): {response2.usage.cache_read_input_tokens}")
# Expected: much lower cost due to cache hit

Files API:

# Upload a document
import os

with open("report.pdf", "rb") as f:
    file_response = client.beta.files.upload(
        file=(os.path.basename("report.pdf"), f, "application/pdf"),
        betas=["files-api-2025-04-14"]
    )

file_id = file_response.id
print(f"Uploaded file: {file_id}")

# Use file in message
response = client.beta.messages.create(  # the betas parameter is only accepted on client.beta
    model="claude-3-5-sonnet-20241022",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "file",
                    "file_id": file_id
                },
                "title": "Q1 Report"
            },
            {
                "type": "text",
                "text": "Summarize the key financial metrics."
            }
        ]
    }],
    betas=["files-api-2025-04-14"]
)

print(response.content[0].text)

# Cleanup
client.beta.files.delete(file_id, betas=["files-api-2025-04-14"])

MODULE 8: Model Context Protocol (MCP)

Key Notes

MCP Overview:

Protocol for LLMs to interact with external tools, data, APIs
MCP client (inside the host application) requests resources/tools from the MCP server on Claude's behalf
Server provides tools, resources (files, databases), and prompts
Bidirectional communication over stdio or HTTP (with SSE for streaming)

MCP Architecture:

Claude Client (LLM)
    ↓
MCP Client (proxy)
    ↓ [MCP Protocol]
    ↓
MCP Server (e.g., database, API)

Key Components:

Tools: Functions MCP server exposes
- Schema: name, description, input parameters
- Claude calls tools, server executes, returns result
Resources: Data/files MCP server provides
- Read by the client (and optionally subscribed to for updates); changes to data go through tools
- Examples: database records, files, API responses
- Use URI scheme (e.g., file://, db://)
Prompts: Contextual instructions from server
- Customize Claude’s behavior per server
- Include guidelines, examples, constraints

Server Inspector:

Tool to discover MCP server capabilities
List all tools, resources, prompts available
Test tool execution
Useful for debugging and documentation

MCP Protocol Details:

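Request/response pattern built on JSON-RPC 2.0 (plus one-way notifications)
Transport: stdio (local) or HTTP (Streamable HTTP; older servers use HTTP with SSE)
Initialization: client sends an initialize request with its protocol version and capabilities; server responds with the version it will use and its own capabilities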
Error handling: structured error responses with codes
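
As a rough picture of the messages behind these bullet points, the sketch below builds the JSON-RPC 2.0 envelopes for an initialize request, a tools/call request, and a structured error. The client name, the protocol revision string, and the tool arguments are illustrative assumptions; in practice an MCP SDK produces these messages for you.

```python
import json


def jsonrpc_request(request_id, method, params):
    """Build a JSON-RPC 2.0 request envelope."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def jsonrpc_error(request_id, code, message):
    """Build a JSON-RPC 2.0 error response for the given request id."""
    return {"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}}


initialize = jsonrpc_request(1, "initialize", {
    "protocolVersion": "2024-11-05",  # example revision; use the one your SDK targets
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.1.0"},  # hypothetical client
})

call_tool = jsonrpc_request(2, "tools/call", {
    "name": "get_user_by_id",
    "arguments": {"user_id": 123},
})

# -32602 is the standard JSON-RPC code for invalid parameters
bad_params = jsonrpc_error(2, -32602, "Invalid params: user_id must be an integer")

for message in (initialize, call_tool, bad_params):
    print(json.dumps(message))
```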

Best Practices

Version MCP servers; indicate breaking changes
Keep tool/resource schemas concise and clear
Use descriptive names for tools and resources
Implement proper error handling and logging
Test server with server inspector before deployment
Document resource URIs and tool parameters thoroughly
Cache MCP server responses if repeated calls expected

Example (Python)

Simple MCP Server Definition (conceptual):

# MCP servers are usually built with an MCP SDK (for example in Python or TypeScript)
# Conceptual sketch of what a server exposes; MCP itself names the schema field inputSchema

mcp_server_tools = [
    {
        "name": "get_user_by_id",
        "description": "Retrieve user information by ID",
        "input_schema": {
            "type": "object",
            "properties": {
                "user_id": {"type": "integer", "description": "User ID"}
            },
            "required": ["user_id"]
        }
    },
    {
        "name": "create_task",
        "description": "Create a new task for a user",
        "input_schema": {
            "type": "object",
            "properties": {
                "user_id": {"type": "integer"},
                "title": {"type": "string"},
                "description": {"type": "string"},
                "due_date": {"type": "string", "format": "date"}
            },
            "required": ["user_id", "title"]
        }
    }
]

mcp_resources = [
    {
        "uri": "database://users",
        "name": "Users Table",
        "description": "All user records",
        "mimeType": "application/json"
    },
    {
        "uri": "database://tasks",
        "name": "Tasks Table",
        "description": "All tasks",
        "mimeType": "application/json"
    }
]

# Claude would interact with this MCP server
# to call tools and access resources

Using MCP in Claude API Calls (with MCP client):

# In production, configure MCP servers in Claude Code settings
# The MCP client proxy handles the protocol

# Within Claude conversation, tools from MCP server become available
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[
        # MCP tools are injected here by Claude Code harness
        # Example MCP tool from server:
        {
            "name": "get_user_by_id",
            "description": "Retrieve user information by ID",
            "input_schema": {
                "type": "object",
                "properties": {
                    "user_id": {"type": "integer"}
                },
                "required": ["user_id"]
            }
        }
    ],
    messages=[{
        "role": "user",
        "content": "Tell me about user 123."
    }]
)

# Claude calls tool, server executes, result returned
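
When no harness injects the tools for you, the definitions listed by an MCP server have to be reshaped before they are passed to the Messages API. A minimal sketch, assuming the server lists tools with a camelCase inputSchema field; mcp_tool_to_anthropic is a hypothetical helper, not part of any SDK.

```python
def mcp_tool_to_anthropic(mcp_tool):
    """Convert one MCP tool listing into a Messages API tool definition."""
    return {
        "name": mcp_tool["name"],
        "description": mcp_tool.get("description", ""),
        # MCP uses camelCase inputSchema; the Messages API expects input_schema
        "input_schema": mcp_tool.get("inputSchema", {"type": "object", "properties": {}}),
    }


# Hypothetical result of listing tools on an MCP server
listed_tools = [
    {
        "name": "get_user_by_id",
        "description": "Retrieve user information by ID",
        "inputSchema": {
            "type": "object",
            "properties": {"user_id": {"type": "integer"}},
            "required": ["user_id"],
        },
    }
]

tools = [mcp_tool_to_anthropic(t) for t in listed_tools]
print(tools[0]["input_schema"]["required"])
```

The resulting list can be passed as the tools argument of a messages request; tool calls returned by Claude are then forwarded to the server by name.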

MODULE 9: Anthropic Apps (Claude Code)

Key Notes

Claude Code Overview:

Agentic coding tool that runs in the terminal, with IDE integrations such as VS Code
Connects Claude to your codebase, shell, and editor
Features: code generation, completion, refactoring, debugging, testing
Connected to MCP servers for external tool access

Setup:

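Install the Claude Code CLI (npm install -g @anthropic-ai/claude-code); a VS Code extension is also available
Authenticate with an Anthropic API key or a Claude account login
Configure in .claude/settings.json (project, shared with the team) and .claude/settings.local.json (personal, not committed)
Hooks are registered under the hooks key in settings; the scripts themselves are commonly kept in .claude/hooks/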

MCP Server Integration:

Configure project MCP servers in .mcp.json at the project root (or with the claude mcp add command)
Each server: name, path/URL, environment variables
Claude Code automatically instantiates and connects to servers
Tools/resources from MCP servers available in chat

Hooks (Automation):

SessionStart: Runs when a session starts
PreToolUse: Runs before Claude uses a tool (can block the call)
PostToolUse: Runs after tool execution
UserPromptSubmit: Runs before the user's prompt is processed
Typical uses: loop detection, build monitoring, token efficiency tracking
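
A PreToolUse hook is just a small program. The sketch below assumes the hook receives a JSON payload on stdin containing tool_name and tool_input, and that exiting with code 2 blocks the tool call; both are assumptions to verify against the current Claude Code hooks documentation. The blocked substrings are a made-up policy.

```python
import json

BLOCKED_SUBSTRINGS = ("rm -rf", ".env")  # hypothetical policy for this example


def evaluate(event):
    """Return (exit_code, message) for a PreToolUse-style event dict."""
    if event.get("tool_name") != "Bash":
        return 0, ""  # only shell commands are inspected here
    command = event.get("tool_input", {}).get("command", "")
    for needle in BLOCKED_SUBSTRINGS:
        if needle in command:
            return 2, f"Blocked: command contains {needle!r}"
    return 0, ""


def run(stdin_text):
    """Parse the hook payload and decide; a real script would write the message
    to stderr and call sys.exit(code)."""
    return evaluate(json.loads(stdin_text))


print(run('{"tool_name": "Bash", "tool_input": {"command": "rm -rf build"}}'))
print(run('{"tool_name": "Read", "tool_input": {"file_path": "README.md"}}'))
```

Keeping the decision logic in a pure function, separate from stdin and exit-code handling, makes the hook easy to unit test and keeps it lightweight.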

Rules & Docs:

.claude/rules/ — auto-loaded guidelines for every session
.claude/docs/ — reference material (architecture, standards, patterns)
.claude/skills/ — domain-specific capabilities (research, planning, code review)

Best Practices

Keep hooks lightweight (no heavy computation)
Use rules for enforcing standards, not for task-specific instructions
Organize MCP servers by domain (database, API, file system)
Version control .claude/ configuration across team
Use hooks for CI/CD integration, token monitoring, privacy enforcement
Document custom hooks and MCP servers in .claude/docs/

Example (Settings)

.mcp.json (sample, project root):

{
  "mcpServers": {
    "filesystem": {
      "command": "node",
      "args": ["./mcp-servers/filesystem.js"],
      "env": {
        "ALLOWED_PATHS": "/project/src,/project/docs"
      }
    },
    "database": {
      "command": "python",
      "args": ["./mcp-servers/database.py"],
      "env": {
        "DB_URL": "postgresql://localhost:5432/mydb"
      }
    }
  }
}

.claude/settings.json (sample):

{
  "hooks": {
    "SessionStart": [
      {"hooks": [{"type": "command", "command": "node .claude/hooks/token-monitor.cjs"}]}
    ],
    "PreToolUse": [
      {"matcher": "*", "hooks": [{"type": "command", "command": "node .claude/hooks/tool-validator.cjs"}]}
    ],
    "PostToolUse": [
      {"matcher": "*", "hooks": [{"type": "command", "command": "node .claude/hooks/result-logger.cjs"}]}
    ]
  }
}

MODULE 10: Agents and Workflows

Key Notes

Agents vs Workflows:

Agents:

Autonomous, goal-driven systems
Use tools to achieve objectives
Self-directed task planning and execution
Error recovery and retry logic
Examples: research agent, coding assistant, data analyst

Workflows:

Orchestrated sequences of steps
Fixed flow, deterministic routing
Human-in-the-loop decision points
Examples: approval pipelines, data pipelines, CI/CD chains

Parallelization:

Execute independent tasks simultaneously
No dependencies between tasks
Speed up overall execution
Example: retrieve 3 data sources in parallel, then synthesize

Chaining:

Sequential task execution with dependencies
Output of task N feeds into task N+1
Used for multi-step workflows
Example: research → design → implement → test

Routing:

Branch logic based on conditions
Route to different tasks based on input/output
Used for decision-making agents
Example: IF complex_issue THEN escalate ELSE resolve

Agentic Patterns:

Loop Agent:
- Perceive → Plan → Act → Repeat
- Check goal achievement, loop until done
- Tool use at each iteration
Router Agent:
- Classify input
- Route to specialized agent/tool
- Collect and synthesize results
Delegator:
- Break task into subtasks
- Delegate to sub-agents
- Aggregate results

Implementation Approaches:

Scheduled Agents (with schedule tool):
- Run on cron schedule or at specific time
- Good for monitoring, cleanup, reports
- Managed by harness
Task Agents (ad-hoc):
- Spawn when triggered by user/event
- Run to completion then exit
- Used for one-off work
Loop Agents (persistent):
- Long-running, check conditions periodically
- Monitor/polling patterns
- Use Monitor tool for streaming events

Best Practices

Parallelization: Identify independent tasks; spawn simultaneously; collect results
Chaining: Use task outputs as inputs to next; handle failures gracefully
Routing: Define clear decision criteria; ensure all routes have handlers
Agent Communication: Use file-based or API-based messaging between agents
Error Handling: Implement retry logic, fallback options, error logging
Monitoring: Log agent execution, measure latency, success rates
Testing: Test agent in isolation, then in composition
Scaling: Use queue systems (Bull, RabbitMQ) for high-volume agent execution
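
The retry advice above can be sketched as a small wrapper with exponential backoff. This is a generic illustration, not the SDK's own mechanism: call_with_retry and flaky are hypothetical names, and the demonstration uses a simulated failure so that it runs offline.

```python
import random
import time


def call_with_retry(fn, max_attempts=4, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            print(f"attempt {attempt + 1} failed ({exc}); retrying in {delay:.2f}s")
            sleep(delay)


# Offline demonstration: a function that fails twice, then succeeds
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


result = call_with_retry(flaky, base_delay=0.01)
print(result)
```

With the real client, fn would wrap a client.messages.create call and retry_on would typically be narrowed to transient errors such as anthropic.RateLimitError and anthropic.APITimeoutError. The Python SDK also retries some failures on its own (see its max_retries option), so a wrapper like this matters mostly for tool calls and other steps outside the SDK.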

Example (Python & Pseudocode)

Parallel Execution:

import anthropic
import concurrent.futures

client = anthropic.Anthropic()

def research_topic(topic: str) -> str:
    """Research subtask: get info on topic."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"Research and summarize: {topic}"
        }]
    )
    return response.content[0].text

def parallel_research(main_topic: str, subtopics: list) -> dict:
    """Execute research in parallel, synthesize results."""
    
    # Parallelize
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        futures = {
            executor.submit(research_topic, subtopic): subtopic
            for subtopic in subtopics
        }
        
        results = {}
        for future in concurrent.futures.as_completed(futures):
            subtopic = futures[future]
            results[subtopic] = future.result()
    
    # Synthesize
    synthesis_prompt = f"""
Given these research summaries on {main_topic}:
{chr(10).join(f'{topic}: {result}' for topic, result in results.items())}

Create a comprehensive summary combining all insights.
"""
    
    synthesis = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": synthesis_prompt}]
    )
    
    return {
        "subtopic_results": results,
        "synthesis": synthesis.content[0].text
    }

# Execute
result = parallel_research("AI Safety", ["Alignment", "Robustness", "Interpretability"])
print(result["synthesis"])

Chaining with Dependencies:

def chain_workflow(initial_data: str) -> str:
    """Execute tasks sequentially with data flow."""
    
    # Step 1: Analyze
    response1 = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"Analyze this data and identify patterns:\n{initial_data}"
        }]
    )
    analysis = response1.content[0].text
    
    # Step 2: Plan (uses analysis output)
    response2 = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"Based on this analysis:\n{analysis}\n\nCreate an action plan."
        }]
    )
    plan = response2.content[0].text
    
    # Step 3: Implement (uses plan output)
    response3 = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        messages=[{
            "role": "user",
            "content": f"Implement this plan with code:\n{plan}"
        }]
    )
    implementation = response3.content[0].text
    
    return implementation

result = chain_workflow("Sales data: 10% growth, high churn in Q2")
print(result)

Router Agent:

def router_agent(issue: str) -> str:
    """Route issue to appropriate handler."""
    
    # Step 1: Classify
    classifier = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=100,
        system="Classify issue as: TECHNICAL, BUSINESS, URGENT, OTHER. Respond with only the category.",
        messages=[{"role": "user", "content": issue}]
    )
    category = classifier.content[0].text.strip()
    
    # Step 2: Route and handle
    if category == "TECHNICAL":
        handler_prompt = "You are a technical expert. Solve this technical issue:\n"
    elif category == "BUSINESS":
        handler_prompt = "You are a business analyst. Address this business concern:\n"
    elif category == "URGENT":
        handler_prompt = "This is urgent. Provide immediate action items:\n"
    else:
        handler_prompt = "Address this general inquiry:\n"
    
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": handler_prompt + issue}]
    )
    
    return response.content[0].text

result = router_agent("Our database is down and customers can't access their accounts!")
print(result)

MODULE 11: Conclusion & Course Summary

Key Takeaways

API Fundamentals: Authentication, requests, multi-turn conversations, system prompts, streaming
Prompt Engineering: Clarity, specificity, XML structure, examples, chain-of-thought
Tool Integration: Schemas, message flow, multi-tool, tool results, fine-grained calling
Retrieval & Search: Chunking, embeddings, BM25, hybrid search, RAG pipelines
Advanced Features: Extended thinking, images/PDFs, citations, caching, code execution, Files API
Architecture: MCP protocol, Claude Code setup, MCP servers, hooks
Systems Design: Agents vs workflows, parallelization, chaining, routing patterns

Certification Prep Checklist

Build basic request flow (model selection, tokens, streaming)
Implement multi-turn conversation with state management
Design and test a prompt with examples and XML structure
Create tool schema and implement tool-use loop
Build simple RAG pipeline with retrieval and synthesis
Implement prompt caching and measure token savings
Use extended thinking for reasoning task; measure accuracy improvement
Analyze PDF or image using Files API or base64 encoding
Design an agent with tool use and loop control
Implement parallel task execution with results aggregation
Chain multiple model calls with data flow
Set up MCP server integration in Claude Code
Configure hooks for automation (SessionStart, PreToolUse, PostToolUse)
Measure evaluation metrics (accuracy, F1, BLEU) on test dataset
Optimize for cost: cache, use Haiku where possible, batch calls

Resources

Docs: https://docs.anthropic.com/
SDK: pip install anthropic (Python), npm install @anthropic-ai/sdk (Node.js)
API Key: https://console.anthropic.com/
Models: claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022, claude-3-opus-20240229
MCP: https://modelcontextprotocol.io/

Claude Model Comparison Table

Feature	Haiku 3.5	Sonnet 3.5	Opus 3
Context Window	200K tokens	200K tokens	200K tokens
Input Pricing	$0.80/1M	$3.00/1M	$15.00/1M
Output Pricing	$4.00/1M	$15.00/1M	$75.00/1M
Speed	Fastest	Balanced	Slowest
Reasoning	Good	Very Good	Excellent
Coding	Good	Excellent	Excellent
Best For	Simple tasks, high volume, cost-sensitive	Balanced use, production APIs	Complex reasoning, multi-step tasks
Extended Thinking	No (needs Claude 3.7 Sonnet or later)	No (needs Claude 3.7 Sonnet or later)	No (needs Claude 3.7 Sonnet or later)
Vision (Images)	Yes	Yes	Yes
Tool Use	Yes	Yes	Yes
Max Tokens	Recommend <2048	Recommend <4096	Recommend <8192
Streaming	Yes	Yes	Yes
Cached Tokens	$0.08/1M (90% discount)	$0.30/1M (90% discount)	$1.50/1M (90% discount)

When to Use Each Model

Haiku: Classification, simple Q&A, high-throughput systems, RAG retrieval ranking
Sonnet: Production APIs, chatbots, code generation, RAG synthesis, balanced latency/quality
Opus: Research tasks, complex reasoning, math/physics, novel problem-solving, cost-insensitive

Implementation Checklist for Certification

Basic API Usage

Create client with API key
Make basic request with model, max_tokens, messages
Handle response and extract text/stop_reason/usage
Implement error handling (rate limits, timeouts, auth)

Multi-Turn & Advanced Parameters

Build conversation history and multi-turn loop
Set system prompt and understand scope
Configure temperature and top_p
Implement streaming with event loop
Use structured output (JSON schema) and validate

Prompt Engineering

Write clear, specific prompts with examples
Use XML tags for complex instructions
Implement few-shot learning with examples
Add chain-of-thought prompts for reasoning
Test variations on evaluation set

Tool Use

Define tool schemas with input parameters
Implement tool-use loop: send tools → handle response → execute → return result
Handle multiple tools in single turn
Use tool_choice parameter (auto, any, tool for a specific tool, none)
Add error handling for tool failures

RAG

Chunk documents (semantic or fixed-size)
Build retrieval function (BM25 or vector-based)
Integrate retrieval with Claude request
Measure retrieval quality and relevance
Implement hybrid search if needed

Advanced Features

Use extended thinking for reasoning task; measure improvement
Upload and analyze image (base64 or URL)
Upload document via Files API
Implement prompt caching; measure cache hit rate
Add citations to responses where relevant

Agents & Workflows

Implement parallel task execution
Chain multiple model calls with data flow
Create router agent with classification + routing
Use tools in agentic loop with goal checking
Implement error recovery and retries

Evaluation & Optimization

Create test dataset (20+ examples)
Implement code-based grading (exact match, regex, schema)
Use model-based grading for subjective tasks
Measure accuracy, F1, or custom metrics
Track token usage and costs per task
Identify failure patterns and iterate

Deep Dives & Advanced Topics

Prompt Caching Strategy

When to cache:

System prompt (>1K tokens) used in all requests
Few-shot examples (>1K tokens) stable across queries
Large reference documents included in system
Long conversation history (>5 turns)

Cost calculation:

Without cache:
  5 requests × 5000 tokens = 25,000 input tokens
  Cost: 25,000 × $3.00 / 1M = $0.075

With cache (1x creation, 4x reads):
  Creation: 5000 tokens × $3.75 / 1M = $0.01875 (cache writes are billed at 1.25× the input price)
  Reads: 4 × 5000 × $0.30 / 1M = $0.006
  Total: $0.02475 (67% savings)
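
The same arithmetic can be wrapped in a small helper for other traffic patterns. This is a sketch under stated assumptions: caching_cost is a hypothetical function, write_multiplier=1.25 reflects the premium charged for cache writes, and read_multiplier=0.10 is the 90% read discount. Setting write_multiplier=1.0 gives the simpler estimate that ignores the write premium ($0.021 for this scenario).

```python
def caching_cost(tokens, requests, input_price, write_multiplier=1.25, read_multiplier=0.10):
    """Return (uncached_cost, cached_cost) in dollars; input_price is per million tokens."""
    per_token = input_price / 1_000_000
    uncached = requests * tokens * per_token
    write = tokens * per_token * write_multiplier  # first request creates the cache
    reads = (requests - 1) * tokens * per_token * read_multiplier  # later requests hit it
    return uncached, write + reads


uncached, cached = caching_cost(tokens=5000, requests=5, input_price=3.00)
savings = 1 - cached / uncached
print(f"uncached=${uncached:.5f} cached=${cached:.5f} savings={savings:.0%}")
```

The model assumes every later request arrives before the cache entry expires; a request that arrives after expiry pays the write price again.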

Extended Thinking Budget

Recommended budgets by task:

Simple reasoning: 2,000-5,000 tokens
Medium complexity (coding, math): 5,000-10,000 tokens
Complex multi-step: 10,000-20,000 tokens (max)

Token cost:

Thinking tokens are billed as output tokens (NOT discounted)
Example: 5,000 thinking tokens + 1,000 response tokens = 6,000 output tokens charged

Tool Use Best Practices

Schema design:

{
  "name": "calculate_revenue",
  "description": "Calculate total revenue for a given period and product category",
  "input_schema": {
    "type": "object",
    "properties": {
      "start_date": {"type": "string", "format": "date", "description": "ISO 8601 date"},
      "end_date": {"type": "string", "format": "date"},
      "category": {"type": "string", "enum": ["electronics", "clothing", "food"], "description": "Product category"}
    },
    "required": ["start_date", "end_date", "category"]
  }
}

Error handling in tool results:

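import json

# error_occurred and use_id come from your tool-execution code
# (use_id is the id of the tool_use block being answered)
if error_occurred:
    tool_result = {
        "type": "tool_result",
        "tool_use_id": use_id,
        "content": "Error: Database connection failed.",
        "is_error": True  # tells Claude the call failed so it can retry or explain
    }
else:
    tool_result = {
        "type": "tool_result",
        "tool_use_id": use_id,
        "content": json.dumps({"revenue": 12345, "units": 500})
    }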

RAG Optimization

Chunking strategies:

Semantic: Split on headers/paragraphs
- Pros: Preserves context, reduces redundancy
- Cons: Variable chunk sizes, harder to implement
Fixed sliding window: 512 tokens with 256-token overlap
- Pros: Consistent, predictable
- Cons: May split important concepts
Hierarchical: Section → subsection → paragraph
- Pros: Enables different retrieval granularities
- Cons: More complex indexing
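
The fixed sliding-window strategy is the simplest of the three to implement. A minimal sketch, assuming the text has already been tokenized; the demonstration uses whitespace-separated words as a stand-in for real tokens and small window sizes so the overlap is easy to see.

```python
def sliding_window_chunks(tokens, size=512, overlap=256):
    """Split a token list into windows of `size` that overlap by `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


words = "the quick brown fox jumps over the lazy dog again".split()
chunks = sliding_window_chunks(words, size=4, overlap=2)
for chunk in chunks:
    print(" ".join(chunk))
```

Each window repeats the second half of the previous one, which is what keeps a concept that straddles a boundary intact in at least one chunk.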

Retrieval quality metrics:

Precision@k: % of top-k results relevant to query
Recall: % of all relevant docs retrieved
MRR (Mean Reciprocal Rank): Average over queries of 1/rank of the first relevant result (1.0 means it is always ranked first)
NDCG (Normalized Discounted Cumulative Gain): Relevance ranking quality
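
The first three metrics are short enough to compute by hand. The sketch below uses made-up document ids and relevance judgments; the function names are illustrative rather than taken from any library.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k


def recall(retrieved, relevant):
    """Fraction of all relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)


def mean_reciprocal_rank(runs):
    """runs: list of (retrieved_list, relevant_set) pairs, one per query."""
    if not runs:
        return 0.0
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(runs)


runs = [
    (["d3", "d1", "d7"], {"d1", "d2"}),  # first relevant result at rank 2
    (["d5", "d4", "d9"], {"d5"}),        # first relevant result at rank 1
]

print(precision_at_k(runs[0][0], runs[0][1], k=3))
print(recall(runs[0][0], runs[0][1]))
print(mean_reciprocal_rank(runs))
```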

Hybrid search weights:

BM25 weight: 0.3-0.5 (keyword precision)
Vector weight: 0.5-0.7 (semantic similarity)
Metadata weight: 0.0-0.2 (date/category filters)
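
One common way to apply such weights is to normalize each score set and take a weighted sum. The sketch below is one possible fusion scheme, not the only one (rank-based fusion is a frequent alternative). It leaves out the metadata weight, and the scores and document ids are invented.

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict to the 0-1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(bm25_scores, vector_scores, bm25_weight=0.4, vector_weight=0.6):
    """Combine normalized keyword and vector scores into one ranked list."""
    bm25_n, vec_n = min_max(bm25_scores), min_max(vector_scores)
    docs = set(bm25_n) | set(vec_n)
    combined = {
        # A document missing from one retriever contributes 0 for that retriever
        doc: bm25_weight * bm25_n.get(doc, 0.0) + vector_weight * vec_n.get(doc, 0.0)
        for doc in docs
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


bm25 = {"a": 12.0, "b": 7.0, "c": 2.0}       # raw BM25 scores
vectors = {"b": 0.91, "c": 0.88, "d": 0.35}  # cosine similarities

ranking = hybrid_rank(bm25, vectors)
for doc, score in ranking:
    print(doc, round(score, 3))
```

Normalizing first matters because raw BM25 scores and cosine similarities live on different scales; without it the weights would not mean what they appear to mean.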

Agent Design Patterns

Loop Agent with Tool Use:

def loop_agent(goal: str, max_iterations: int = 10) -> str:
    # Assumes `tools` (tool schemas) and `execute_tool` (dispatcher) are defined elsewhere
    messages = [{"role": "user", "content": goal}]

    for _ in range(max_iterations):
        # Perceive & Plan: Claude sees the full history, including earlier tool results
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )

        # Check if done: no tool requested, so return the final text
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")

        # Act: run every requested tool and feed the results back
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),
                })
        messages.append({"role": "user", "content": tool_results})

    return "Max iterations reached"

Unresolved Questions / Topics for Further Study

How to handle very large documents (100K+ tokens) in RAG? (Hierarchical chunking strategies)
Fine-tuning vs RAG: When to use fine-tuning for domain-specific tasks?
Cost optimization for high-volume production: Batching, caching, model selection trade-offs?
Guardrails and content filtering: Implementing safety layers on top of Claude API?
Multi-language support: How well does Claude handle non-English prompts in tool use?
Real-time streaming UI patterns: Best practices for streaming multiple concurrent requests?
Agent memory persistence: Effective strategies for long-running agents with context limits?
MCP server scaling: Production deployment patterns for MCP servers with multiple clients?
