← Claude Code & Certification

[PREP] Claude Certified Architect - Foundations Exam Notes

Claude Certified Architect - Foundations Exam Notes

Exam Overview

  • Format: Multiple choice (1 correct, 3 distractors), no penalty for guessing
  • Passing score: 720/1000 (scaled)
  • Scenarios: 4 of 6 picked randomly per exam
  • Domains: 5 weighted areas

Domain Weights

DomainWeight
1. Agentic Architecture & Orchestration27%
2. Tool Design & MCP Integration18%
3. Claude Code Configuration & Workflows20%
4. Prompt Engineering & Structured Output20%
5. Context Management & Reliability15%

Exam Scenarios

  1. Customer Support Resolution Agent - SDK agent with MCP tools, 80% first-contact resolution target
  2. Code Generation with Claude Code - Team workflows, CLAUDE.md, plan mode
  3. Multi-Agent Research System - Coordinator + subagents (search, analyze, synthesize, report)
  4. Developer Productivity - Codebase exploration, built-in tools, MCP integration
  5. Claude Code for CI/CD - Automated reviews, test generation, PR feedback
  6. Structured Data Extraction - JSON schemas, validation-retry, batch processing

Domain 1: Agentic Architecture & Orchestration (27%)

MUST KNOW

Agentic Loop Lifecycle:

  • Send request to Claude -> inspect stop_reason -> if “tool_use”: execute tool, append result, loop -> if “end_turn”: done
  • Tool results MUST be appended to conversation history for next iteration
  • Model-driven decisions (Claude reasons about next tool) NOT pre-configured tool sequences

Anti-patterns to avoid:

  • Parsing natural language to determine loop termination
  • Setting arbitrary iteration caps as PRIMARY stopping mechanism
  • Checking for assistant text content as completion indicator

Hub-and-spoke (Coordinator-Subagent):

  • Coordinator manages ALL inter-subagent communication
  • Subagents have ISOLATED context - do NOT inherit coordinator’s conversation history
  • Coordinator does: task decomposition, delegation, result aggregation, deciding which subagents to invoke
  • Risk: overly narrow task decomposition -> incomplete topic coverage

Subagent Spawning:

  • Task tool = mechanism for spawning subagents
  • allowedTools must include “Task” for coordinator to invoke subagents
  • Context must be EXPLICITLY provided in prompt - no automatic inheritance
  • Parallel subagents: emit MULTIPLE Task tool calls in a SINGLE coordinator response
  • Pass complete findings from prior agents directly in subagent’s prompt

Programmatic vs Prompt-based Enforcement:

  • When deterministic compliance required (identity verification before financial ops) -> programmatic hooks
  • Prompt instructions have NON-ZERO failure rate for critical sequences
  • Hooks provide guaranteed compliance; prompts provide probabilistic compliance

Agent SDK Hooks:

  • PostToolUse: intercept tool results for transformation/normalization before model processes them
  • Tool call interception: block policy-violating actions, redirect to alternative workflows
  • Choose hooks over prompts when business rules require GUARANTEED compliance

Task Decomposition:

  • Fixed sequential (prompt chaining): predictable multi-aspect reviews
  • Dynamic adaptive: open-ended investigation where subtasks depend on discoveries
  • Large code reviews: per-file local analysis + separate cross-file integration pass

Session Management:

  • --resume <session-name>: continue specific prior conversation
  • fork_session: create independent branches from shared analysis baseline
  • New session with structured summary > resuming with stale tool results
  • Inform resumed sessions about specific file changes for targeted re-analysis

KEY DISTINCTIONS

  • Programmatic enforcement = deterministic = hooks = for critical business logic
  • Prompt-based guidance = probabilistic = for soft preferences
  • Self-review in same session = weak (retains reasoning bias)
  • Independent review instance = strong (no prior context bias)

Domain 2: Tool Design & MCP Integration (18%)

MUST KNOW

Tool Descriptions:

  • Primary mechanism LLMs use for tool selection
  • Minimal descriptions -> unreliable selection among similar tools
  • Must include: input formats, example queries, edge cases, boundary explanations
  • Ambiguous/overlapping descriptions cause misrouting

Fix tool confusion by:

  • Expanding descriptions (first step, highest leverage)
  • Renaming tools to eliminate overlap
  • Splitting generic tools into purpose-specific ones

Structured Error Responses (MCP isError flag):

  • Return: errorCategory (transient/validation/permission), isRetryable boolean, human-readable description
  • Uniform “Operation failed” prevents agent from making recovery decisions
  • Transient = retry; Validation = fix input; Permission = escalate; Business = explain to user
  • Distinguish access failures (need retry) from valid empty results (successful query, no matches)

Tool Distribution:

  • Too many tools (18 instead of 4-5) DEGRADES selection reliability
  • Agents with tools outside specialization tend to MISUSE them
  • Give agents ONLY tools needed for their role
  • Scoped cross-role tools for high-frequency needs (e.g., verify_fact for synthesis agent)

tool_choice options:

  • "auto": model may return text OR call a tool
  • "any": model MUST call a tool (but can choose which)
  • {"type": "tool", "name": "..."}: FORCE specific tool call

MCP Server Scoping:

  • Project-level: .mcp.json (shared via version control)
  • User-level: ~/.claude.json (personal/experimental)
  • Environment variable expansion: ${GITHUB_TOKEN} in .mcp.json
  • Tools from ALL configured servers available simultaneously
  • MCP resources: expose content catalogs to reduce exploratory tool calls

Built-in Tools:

  • Grep = content search (function names, error messages, patterns IN files)
  • Glob = file path pattern matching (find files BY name/extension)
  • Read/Write = full file operations; Edit = targeted modifications using unique text matching
  • If Edit fails (non-unique match) -> Read + Write as fallback
  • Build understanding incrementally: Grep entry points -> Read to trace flows

Domain 3: Claude Code Configuration & Workflows (20%)

MUST KNOW

CLAUDE.md Hierarchy:

  • User-level: ~/.claude/CLAUDE.md (personal, NOT shared via version control)
  • Project-level: .claude/CLAUDE.md or root CLAUDE.md (shared with team)
  • Directory-level: subdirectory CLAUDE.md files (scoped to that area)
  • @import syntax: reference external files for modularity

.claude/rules/ Directory:

  • Alternative to monolithic CLAUDE.md
  • Topic-specific rule files
  • YAML frontmatter paths field with glob patterns for conditional activation
  • Rules load ONLY when editing matching files -> reduces irrelevant context
  • Better than directory-level CLAUDE.md for conventions spanning multiple dirs

Custom Slash Commands:

  • Project-scoped: .claude/commands/ (shared via VCS)
  • User-scoped: ~/.claude/commands/ (personal)

Skills (.claude/skills/):

  • SKILL.md with frontmatter: context: fork, allowed-tools, argument-hint
  • context: fork: runs in isolated sub-agent context, prevents polluting main conversation
  • allowed-tools: restricts tool access during skill execution
  • Skills = on-demand invocation; CLAUDE.md = always-loaded

Plan Mode vs Direct Execution:

  • Plan mode: complex tasks, large-scale changes, multiple valid approaches, architectural decisions, multi-file modifications
  • Direct execution: simple, well-scoped changes (single validation check, one function)
  • Explore subagent: isolates verbose discovery output, returns summaries to preserve main context
  • Combine: plan mode for investigation, direct execution for implementation

CI/CD Integration:

  • -p (or --print) flag: non-interactive mode for automated pipelines
  • --output-format json + --json-schema: structured output for CI
  • Same session that generated code is LESS effective at reviewing its own changes
  • Include prior review findings to avoid duplicate comments on re-review

Iterative Refinement:

  • Concrete input/output examples > prose descriptions for transformation specs
  • Test-driven iteration: write tests first, iterate by sharing failures
  • Interview pattern: Claude asks questions to surface unconsidered aspects
  • Multiple interacting issues -> single message; independent issues -> sequential

Domain 4: Prompt Engineering & Structured Output (20%)

MUST KNOW

Explicit Criteria:

  • “Flag comments only when claimed behavior contradicts actual code” > “check comments are accurate”
  • General instructions (“be conservative”) DON’T improve precision
  • High false positive rates undermine confidence in ALL categories
  • Temporarily disable high-FP categories to restore trust while improving

Few-shot Prompting:

  • Most effective for consistently formatted, actionable output
  • Show reasoning for WHY one action chosen over alternatives
  • Enable generalization to novel patterns (not just matching pre-specified cases)
  • Reduce hallucination in extraction tasks
  • 2-4 targeted examples for ambiguous scenarios

Structured Output via tool_use:

  • tool_use + JSON schemas = most reliable for guaranteed schema compliance
  • Eliminates JSON SYNTAX errors but NOT SEMANTIC errors (items don’t sum, wrong fields)
  • tool_choice: "any" = guarantee structured output when doc type unknown
  • Schema design: optional/nullable fields prevent model fabricating values for absent info
  • Enum with “other” + detail string = extensible categories
  • “unclear” enum value for ambiguous cases

Validation-Retry Loops:

  • Append specific validation errors to prompt on retry
  • Retries INEFFECTIVE when info simply absent from source (vs format/structural errors)
  • Track detected_pattern field for systematic analysis of false positive dismissals
  • Self-correction: extract “calculated_total” alongside “stated_total” to flag discrepancies

Message Batches API:

  • 50% cost savings, up to 24-hour processing window
  • NO guaranteed latency SLA
  • Appropriate: non-blocking, latency-tolerant (overnight reports, weekly audits)
  • Inappropriate: blocking workflows (pre-merge checks)
  • Does NOT support multi-turn tool calling within a single request
  • custom_id for correlating request/response pairs
  • Handle failures: resubmit only failed docs by custom_id

Multi-instance Review:

  • Self-review = weak (retains reasoning context from generation)
  • Independent instance (no prior reasoning) = more effective at catching subtle issues
  • Split large reviews: per-file local analysis + cross-file integration passes
  • Confidence alongside findings enables calibrated routing

Domain 5: Context Management & Reliability (15%)

MUST KNOW

Context Preservation:

  • Progressive summarization LOSES: numerical values, percentages, dates, customer expectations
  • “Lost in the middle” effect: models reliably process beginning/end, may omit middle sections
  • Tool results accumulate disproportionately (40+ fields when only 5 relevant)
  • MUST pass complete conversation history in subsequent API requests

Techniques:

  • Extract transactional facts into persistent “case facts” block OUTSIDE summarized history
  • Trim verbose tool outputs to only relevant fields BEFORE accumulation
  • Place key findings at BEGINNING; detailed results with explicit section headers
  • Subagents return structured data (key facts, citations, relevance) not verbose reasoning chains
  • /compact to reduce context during extended sessions

Escalation Patterns:

  • Triggers: customer requests human, policy exceptions/gaps, inability to progress
  • Honor explicit human requests IMMEDIATELY (don’t try to resolve first)
  • Sentiment-based escalation = UNRELIABLE proxy for case complexity
  • Self-reported confidence = POORLY CALIBRATED
  • Multiple customer matches -> ask for additional identifiers, NOT heuristic selection
  • Policy is ambiguous -> escalate (e.g., competitor price match when policy silent)

Error Propagation (Multi-Agent):

  • Return: failure type, attempted query, partial results, alternative approaches
  • Generic “search unavailable” HIDES valuable context
  • Silently suppressing errors (empty as success) = anti-pattern
  • Terminating entire workflow on single failure = anti-pattern
  • Subagents: local recovery for transient failures, escalate only unresolvable errors + partial results

Large Codebase Exploration:

  • Context degrades in extended sessions -> inconsistent answers, “typical patterns” instead of specifics
  • Scratchpad files: persist key findings across context boundaries
  • Subagent delegation: isolate verbose exploration, main agent coordinates high-level
  • Crash recovery: structured agent state exports (manifests), coordinator loads on resume

Human Review & Confidence:

  • Aggregate accuracy (97%) may MASK poor performance on specific doc types/fields
  • Stratified random sampling for measuring error rates in high-confidence extractions
  • Field-level confidence calibrated using labeled validation sets
  • Validate accuracy BY document type AND field BEFORE automating

Information Provenance:

  • Source attribution lost during summarization without claim-source mappings
  • Conflicting stats: annotate with source attribution, don’t arbitrarily select one
  • Require publication/collection dates to prevent temporal misinterpretation
  • Render content type-appropriately (financial=tables, news=prose, technical=structured lists)

Out of Scope (DO NOT study)

  • Fine-tuning, training custom models
  • API authentication, billing, account management
  • Deploying/hosting MCP servers (infrastructure)
  • Claude’s internal architecture, training process
  • Constitutional AI, RLHF, safety training
  • Embeddings, vector databases
  • Computer use (browser automation)
  • Vision/image analysis
  • Streaming API, SSE
  • Rate limiting, quotas, pricing
  • OAuth, API key rotation
  • Cloud provider configs (AWS, GCP, Azure)
  • Performance benchmarking, model comparison
  • Prompt caching implementation details
  • Token counting algorithms

Key Exam Patterns

  1. When reliability is critical -> programmatic enforcement (hooks), not prompts
  2. Tool selection confused -> expand descriptions first (lowest effort, highest leverage)
  3. Self-review -> always use independent instance (no shared reasoning context)
  4. Batch API -> only for latency-tolerant workloads, never blocking workflows
  5. Multiple matches -> ask for clarification, never heuristic selection
  6. Errors -> structured context (type + attempted + partial + alternatives), never generic
  7. Context growing -> trim tool outputs, extract facts, use subagents for isolation
  8. Large code review -> split into per-file + cross-file passes
  9. Few-shot examples -> for ambiguous scenarios where instructions produce inconsistent results
  10. Schema fields -> make optional/nullable when info might be absent (prevents hallucination)