Patterns

Patterns are the foundational building blocks — not specific to memory, workflow, or permissions, but underneath all of them. They’re the technical primitives that make everything else work.

If the other factor files describe what to build, this file describes how to make it work at the code level.


Async generator as control flow

All of Claude Code’s agentic behavior lives in a single async function* that runs a while (true) loop. Each iteration has four phases:

  1. Context Assembly — build the system prompt, apply compaction, assemble conversation history
  2. Stream API Call — send to the model, stream tokens back
  3. Tool Execution — for each tool_use block, run the tool (possibly in parallel), inject results as user messages
  4. Stop or Continue — derived flag (see below)

Why an async generator, not callbacks or event emitters:

  • Pull-based semantics. The consumer (UI) controls the speed. If the UI is slow, the generator stays suspended at its last yield until the next .next() call. That suspension is the backpressure, so there is no buffer to manage.
  • Cancelable. Calling .return() on the generator closes it at its current yield point and runs any finally blocks. Tool calls in flight get canceled via an abort controller. No half-finished state.
  • Pausable. Stop calling .next() and the generator pauses (e.g., waiting for a permission dialog). Resume when the user clicks approve.
  • Composable. yield* delegates to sub-generators. Subagents can be implemented as generators that the main generator yields into.
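
A minimal sketch of this loop shape, not Claude Code’s actual code. The event types, agentLoop, runTools, and collect are hypothetical names, and a scripted list of turns stands in for the streaming API call. It shows the consumer pulling events at its own pace, yield* delegating to a sub-generator, and an early break closing the generator.

```typescript
// Hypothetical event and turn shapes; none of these names come from Claude Code.
type AgentEvent =
  | { kind: "text"; text: string }
  | { kind: "tool_result"; name: string; output: string };

type ModelTurn = { text: string; toolUses: string[] };

// Sub-generator: runs each requested tool and yields its result.
async function* runTools(names: string[]): AsyncGenerator<AgentEvent, void, void> {
  for (const name of names) {
    yield { kind: "tool_result", name, output: `ran ${name}` };
  }
}

// Main loop. The scripted `turns` array stands in for the model API.
async function* agentLoop(turns: ModelTurn[]): AsyncGenerator<AgentEvent, void, void> {
  let i = 0;
  while (true) {
    const turn = turns[i++];                 // phases 1-2: assemble context, call model
    if (turn === undefined) return;
    yield { kind: "text", text: turn.text };
    yield* runTools(turn.toolUses);          // phase 3: delegate to a sub-generator
    if (turn.toolUses.length === 0) return;  // phase 4: stop or continue
  }
}

// Consumer: pulls one event at a time and may stop early.
async function collect(limit: number, turns: ModelTurn[]): Promise<AgentEvent[]> {
  const seen: AgentEvent[] = [];
  for await (const event of agentLoop(turns)) {
    seen.push(event);
    if (seen.length >= limit) break;         // break calls .return() on the generator
  }
  return seen;
}
```

Nothing runs inside agentLoop between two pulls: the loop body only advances when the consumer asks for the next event.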

The Rust port (conversation.rs) uses a synchronous loop {} with an ApiClient::stream() trait method. It is the same pattern (loop until no tool_use blocks remain), just without the generator syntactic sugar.

Lesson: for any streaming AI agent with a UI, async generators give you backpressure, cancellation, and pause for free. Don’t build an event emitter; build a generator.


Derived flag instead of API signal

Claude Code does NOT trust stop_reason from the API to decide whether to continue the loop. The leaked code comment:

stop_reason === 'tool_use' is unreliable — it’s not always set correctly.

Instead, it observes actual content: if any tool_use blocks exist in the streamed response, needsFollowUp = true. The decision is derived from the content of the response, not from the API’s metadata about the response.
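
A sketch of the derived flag, assuming a response shaped like a list of typed content blocks. The ContentBlock and ModelResponse types are illustrative, not the real SDK types; only the name needsFollowUp comes from the text above.

```typescript
// Hypothetical block shapes, modeled loosely on a messages-style response.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface ModelResponse {
  stop_reason: string | null; // metadata: deliberately ignored below
  content: ContentBlock[];
}

// The loop continues if and only if the payload contains a tool_use block.
function needsFollowUp(response: ModelResponse): boolean {
  return response.content.some((block) => block.type === "tool_use");
}
```

The function never reads stop_reason, so a response whose metadata and payload disagree is still handled according to what actually arrived.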

Rust port (conversation.rs:197):

if pending_tool_uses.is_empty() { break; }

Same derived-flag approach.

Lesson: derive control flow from content, not metadata. Metadata can disagree with the payload; the payload is what you actually have to act on. This generalizes to any system integrating external APIs: if you can compute the signal from the payload, don’t trust a separate metadata field to tell you the same thing.


Concurrency-safe partitioning

When the LLM returns N tool_use blocks, the system doesn’t run them all sequentially or all in parallel. It partitions at runtime:

  • Each tool implements isConcurrencySafe(input) — defaulting to false (conservative)
  • The decision is per-invocation, not per-tool-type:
    • Bash("cat file.txt") → safe (read-only)
    • Bash("npm install") → exclusive (mutates node_modules/)
  • Greedy batching: consecutive safe tools are grouped into a concurrent batch; an exclusive tool breaks the batch

Example:

Input:  [Read, Read, Grep, Write, Read, Read]
Output: [Read+Read+Grep] [Write] [Read+Read]
        ↑ 3 batches instead of 6 sequential calls

Max concurrency: 10 (configurable via CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY).
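
The greedy batching can be sketched as below. The ToolCall and Batch shapes are hypothetical, and the Bash check is a toy prefix heuristic for illustration only; a real safety check would need to parse the command.

```typescript
// Hypothetical minimal tool-call shape.
interface ToolCall { name: string; input: string }

// Per-invocation safety check. The Bash rule is a toy heuristic (assumption).
function isConcurrencySafe(call: ToolCall): boolean {
  if (call.name === "Read" || call.name === "Grep") return true;
  if (call.name === "Bash") return /^(cat|ls|grep)\b/.test(call.input);
  return false; // conservative default: anything unknown is exclusive
}

interface Batch { concurrent: boolean; calls: ToolCall[] }

// Consecutive safe calls share a batch; an exclusive call gets its own batch.
function partition(calls: ToolCall[]): Batch[] {
  const batches: Batch[] = [];
  for (const call of calls) {
    const safe = isConcurrencySafe(call);
    const last = batches[batches.length - 1];
    if (safe && last !== undefined && last.concurrent) {
      last.calls.push(call); // extend the current concurrent batch
    } else {
      batches.push({ concurrent: safe, calls: [call] });
    }
  }
  return batches;
}
```

Order is preserved: a batch never reaches past an exclusive call, so the Write in the example above still sees the effects of everything before it.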

Lesson: let tools self-declare safety per invocation. Default to exclusive (safe-by-default). The system partitions automatically. This applies to any system with mixed read/write operations.

See tools.md for the tool-design implications.


Streaming tool execution

Don’t wait for the LLM response to complete. Execute tools immediately as tool_use blocks arrive from the stream. A StreamingToolExecutor (530 LOC in the leaked code) manages a state machine per tool:

queued → executing → completed → yielded

Sibling abort rules:

  • Only Bash errors cancel sibling tools
  • Read/Grep/WebFetch errors don’t — they’re independent

Comment from the leak:

Read/WebFetch/etc are independent — one failure shouldn’t nuke the rest.

Lesson: start processing partial results immediately. Classify which failures cascade and which are independent. Never default to cascading cancellation.
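
A sketch of the per-tool state machine and the sibling abort rule, under the assumption that "abort" means flagging every sibling that has not finished yet. TrackedTool, advance, errorCascades, and onToolError are hypothetical names, not taken from StreamingToolExecutor.

```typescript
type ToolState = "queued" | "executing" | "completed" | "yielded";

// Legal transitions: each state has exactly one successor, yielded has none.
const NEXT: Record<ToolState, ToolState | null> = {
  queued: "executing",
  executing: "completed",
  completed: "yielded",
  yielded: null,
};

interface TrackedTool { id: string; name: string; state: ToolState; aborted: boolean }

function advance(tool: TrackedTool): void {
  const next = NEXT[tool.state];
  if (next === null) throw new Error(`${tool.id} is already yielded`);
  tool.state = next;
}

// Only Bash failures cascade; Read/Grep/WebFetch failures stay local.
function errorCascades(toolName: string): boolean {
  return toolName === "Bash";
}

// Flags unfinished siblings of a failed tool, if that failure cascades.
function onToolError(failed: TrackedTool, all: TrackedTool[]): void {
  if (!errorCascades(failed.name)) return;
  for (const tool of all) {
    const unfinished = tool.state === "queued" || tool.state === "executing";
    if (tool !== failed && unfinished) tool.aborted = true;
  }
}
```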


Context modifier chain

Tools can modify shared context for subsequent tools via contextModifier callbacks. Example: a cd tool modifies the working directory for all subsequent Bash calls.

Constraint: only exclusive (non-concurrent) tools can return modifiers. This prevents non-deterministic state from parallel application: if two concurrent tools each try to cd, the final working directory depends on which one finishes last.

Pattern: immutable context snapshots with explicit transitions. Functional programming applied at the architecture level — Redux reducers for agent state.

Lesson: never mutate shared state from concurrent operations. Use modifier functions applied sequentially.
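
A sketch of sequential modifier application over an immutable context. The ToolContext shape, the applyResults function, and the cd helper are assumptions for illustration; only the contextModifier name comes from the text above.

```typescript
// Immutable snapshot; modifiers return a new object instead of mutating.
interface ToolContext { readonly cwd: string }
type ContextModifier = (ctx: ToolContext) => ToolContext;

interface ToolResult { output: string; contextModifier?: ContextModifier }

// Applies modifiers one after another, in result order.
// Results from a concurrent batch are rejected if they carry a modifier.
function applyResults(
  ctx: ToolContext,
  results: ToolResult[],
  exclusive: boolean,
): ToolContext {
  let current = ctx;
  for (const result of results) {
    if (result.contextModifier === undefined) continue;
    if (!exclusive) {
      throw new Error("concurrent tools may not return context modifiers");
    }
    current = result.contextModifier(current);
  }
  return current;
}

// Hypothetical cd tool result: replaces cwd in a fresh snapshot.
const cd = (dir: string): ContextModifier => (ctx) => ({ ...ctx, cwd: dir });
```

Because every transition produces a new snapshot, a tool that captured the old context keeps seeing the old working directory.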


Dynamic boundary marker

Claude Code’s system prompt is structured with a special marker:

Intro
System rules
Doing tasks
Actions care
__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__     ← marker
Environment
Project context
Instruction files
Runtime config

Everything before the boundary is static across turns → cacheable by the API’s prompt cache. Everything after the boundary changes per turn → not cacheable.

Why it matters: API prompt caching bills cached prefix tokens at a steep discount instead of the full input rate. Every static section moved above the boundary saves money on every turn of every session. The boundary is a performance lever.

Lesson: split static from dynamic sections explicitly. The split should be physical (a marker), not conceptual. Cache hits are a first-class concern.
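
A sketch of splitting on the marker. The marker string is the one shown above; splitPrompt and its return shape are hypothetical, and treating a prompt with no marker as fully dynamic is an assumption.

```typescript
const BOUNDARY = "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__";

interface SplitPrompt { staticPrefix: string; dynamicSuffix: string }

// Everything before the marker is the cacheable prefix; the rest changes per turn.
function splitPrompt(prompt: string): SplitPrompt {
  const at = prompt.indexOf(BOUNDARY);
  if (at === -1) {
    // Assumption: with no marker, treat the whole prompt as dynamic.
    return { staticPrefix: "", dynamicSuffix: prompt };
  }
  return {
    staticPrefix: prompt.slice(0, at),
    dynamicSuffix: prompt.slice(at + BOUNDARY.length),
  };
}
```

Two turns that differ only below the marker produce byte-identical prefixes, which is what a prefix cache needs in order to hit.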

See context.md for more detail.


Instruction file discovery (walk + dedupe + budget)

Claude Code loads instruction files (CLAUDE.md, CLAUDE.local.md, etc.) by walking ancestor directories. At each directory, it checks for the instruction files and loads them. Then:

  1. Deduplication by content hash: if two directories have identical content, load once
  2. Per-file budget: MAX_INSTRUCTION_FILE_CHARS = 4,000
  3. Total budget: MAX_TOTAL_INSTRUCTION_CHARS = 12,000
  4. Truncation marker: [truncated] when a file exceeds the per-file budget

Why: in a monorepo, a naive walk can load 40 files × 10 KB each = 400 KB. An unbudgeted walk blows out the context window. Dedup plus budgets make the walk safe.

Lesson: any hierarchical discovery needs budgets and dedup. Don’t trust the filesystem to be small.
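
A sketch of the dedupe and budget steps, applied to files that have already been read. The two constants and the [truncated] marker are from the list above; everything else is assumed: the InstructionFile shape, using the full content string as a stand-in for a content hash, counting the marker against the per-file budget, and dropping every file once the total budget would be exceeded.

```typescript
const MAX_INSTRUCTION_FILE_CHARS = 4000;
const MAX_TOTAL_INSTRUCTION_CHARS = 12000;
const TRUNCATED = "[truncated]";

interface InstructionFile { path: string; content: string }

function loadWithBudget(files: InstructionFile[]): InstructionFile[] {
  const seen = new Set<string>(); // full content stands in for a content hash
  const kept: InstructionFile[] = [];
  let total = 0;
  for (const file of files) {
    if (seen.has(file.content)) continue; // identical content: load once
    seen.add(file.content);

    let content = file.content;
    if (content.length > MAX_INSTRUCTION_FILE_CHARS) {
      // Assumption: the marker counts against the per-file budget.
      const room = MAX_INSTRUCTION_FILE_CHARS - TRUNCATED.length;
      content = content.slice(0, room) + TRUNCATED;
    }

    // Assumption: once the total budget would be exceeded, drop the rest.
    if (total + content.length > MAX_TOTAL_INSTRUCTION_CHARS) break;
    total += content.length;
    kept.push({ path: file.path, content });
  }
  return kept;
}
```

With these numbers the worst case is bounded at 12,000 characters no matter how many directories the walk visits.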


Speculative execution

BashTool starts the permission check in parallel with input parsing. By the time canUseTool() is called by the consumer, the classification result may already be ready. Reduces perceived latency.

Generalization: if a check is deterministic given the input and has no side effects, start it as soon as the input is available. Don’t wait for the consumer to ask. This is the agent-system analogue of speculative execution in a CPU: do the work early, and discard it if it turns out not to be needed.
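
A sketch of starting a check early and handing the consumer the already-running promise. SpeculativeCheck, Decision, and classify are hypothetical; only canUseTool is named in the text above, and the classification rule here is a toy. Cancellation of unused work is left out.

```typescript
type Decision = "allow" | "ask";

// Toy classifier (assumption): read-only commands are allowed outright.
function classify(command: string): Promise<Decision> {
  const decision: Decision = /^(cat|ls)\b/.test(command) ? "allow" : "ask";
  return Promise.resolve(decision);
}

class SpeculativeCheck {
  private readonly pending: Promise<Decision>;

  // The check starts in the constructor, as soon as the input exists.
  constructor(command: string, check: (command: string) => Promise<Decision>) {
    this.pending = check(command);
  }

  // The consumer asks later and receives the in-flight (or finished) result.
  canUseTool(): Promise<Decision> {
    return this.pending;
  }
}
```

Calling canUseTool() repeatedly never reruns the check, because every call returns the same promise.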


Circuit breakers everywhere

Every recovery layer has a circuit breaker. Examples:

  • hasAttemptedReactiveCompact — reactive compact runs at most once per iteration
  • stop_hook_active — Stop hook blocks once, then lets the next Stop pass through
  • Per-strategy budget counters for recovery
  • isInForkChild() flag prevents recursive fork spawning

Pattern: a flag set when a strategy starts, checked before that strategy runs again. Reset at natural boundaries (iteration, session, fork).

Lesson: every retry/recovery/escalation layer needs a circuit breaker. Otherwise the system can loop inside recovery, never reaching the surface.
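
The flag pattern can be sketched as a small class. CircuitBreaker and runIteration are hypothetical names; the scenario (several overflow signals in one iteration, with compaction allowed once) is modeled on the hasAttemptedReactiveCompact example above.

```typescript
class CircuitBreaker {
  private tripped = false;

  // Returns true the first time, then false until reset() is called.
  tryEnter(): boolean {
    if (this.tripped) return false;
    this.tripped = true;
    return true;
  }

  // Call at a natural boundary (iteration, session, fork).
  reset(): void {
    this.tripped = false;
  }
}

// Counts how many compactions actually run in one iteration,
// given `overflowCount` separate requests for one.
function runIteration(breaker: CircuitBreaker, overflowCount: number): number {
  let compactions = 0;
  for (let i = 0; i < overflowCount; i++) {
    if (breaker.tryEnter()) compactions++; // later requests are refused
  }
  breaker.reset(); // natural boundary: end of the iteration
  return compactions;
}
```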

See recovery.md for more detail.


Fail-open hooks

Hooks must never break the session. Every hook:

  • Wraps its body in a try/catch
  • Logs crashes to hooks/.logs/hook-log.jsonl
  • Exits with code 0 even on failure

A buggy hook degrades the session; it doesn’t break it. The user can fix the hook offline.

Trade-off: fail-open hides bugs. Mitigation: log every crash; surface the log location at session start.
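
A sketch of the wrapper. runFailOpen and CrashRecord are hypothetical, and the log sink is injected as a callback so the example stays self-contained; in a real hook it would append to the JSONL file and the returned number would become the process exit code.

```typescript
interface CrashRecord { hook: string; error: string }

// Runs the hook body; on a crash, logs one JSON line and still reports success.
function runFailOpen(
  hook: string,
  body: () => void,
  log: (line: string) => void,
): number {
  try {
    body();
  } catch (err) {
    const record: CrashRecord = {
      hook,
      error: err instanceof Error ? err.message : String(err),
    };
    log(JSON.stringify(record)); // one JSON object per line (JSONL)
  }
  return 0; // exit code is 0 whether or not the body threw
}
```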

See hooks.md.


Pre-compute at SessionStart, read via env vars

Expensive detection (project type, language version, git state, OS) runs once per session in session-init.cjs and writes to environment variables. Downstream hooks (dev-rules-reminder.cjs, subagent-init.cjs, etc.) read the env vars instead of recomputing.

Why: UserPromptSubmit fires on every user message. Re-detecting project type on every prompt is wasteful. SessionStart fires once; its output is cached in env vars for the life of the session.

Lesson: run expensive discovery at the least frequent lifecycle point. The more frequent lifecycle points read the cached result.
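
A sketch of the split between the one-time writer and the per-prompt readers. A plain record stands in for the process environment, and sessionInit, onUserPrompt, and the HARNESS_PROJECT_TYPE variable name are all hypothetical.

```typescript
// Stand-in for the process environment.
type Env = Record<string, string | undefined>;

// Runs once per session: pays for detection and caches the answer.
function sessionInit(env: Env, detect: () => string): void {
  env["HARNESS_PROJECT_TYPE"] = detect(); // hypothetical variable name
}

// Runs on every prompt: reads the cached value, never re-detects.
function onUserPrompt(env: Env): string {
  return env["HARNESS_PROJECT_TYPE"] ?? "unknown";
}
```

The fallback to "unknown" keeps the per-prompt reader safe even if session initialization crashed or never ran.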


Native replacement when cross-platform beats performance

Claude Code ports native binaries to pure TypeScript when cross-platform support outweighs the performance cost. Example: Yoga layout engine — originally a C++ library, ported to 2,578 LOC of TypeScript.

Why: C++ bindings break across Node versions, OS versions, and architectures. Pure TS runs everywhere Node runs. The performance cost is real but bounded; the maintenance cost of cross-platform C++ bindings is unbounded.

Lesson: if a native dependency is causing portability pain, consider porting it. The LOC cost is usually recoverable; the cross-platform debt is not.


Terminal-as-browser rendering

Claude Code’s UI is built with React: a reconciler lays components out with Yoga flexbox and outputs ANSI escape sequences. 390 React components render in the terminal.

This is the agent analogue of “use web technologies for everything”. You get the component model, the reconciler, the declarative state management — but rendering to ANSI instead of DOM.

Lesson: borrow UI infrastructure from the web ecosystem even when the target isn’t a browser. The component model is the win; the render target is secondary.


Takeaways for harness engineering

  1. Main loop = async generator (or equivalent). Free backpressure, cancellation, pause, composition.
  2. Derive control flow from content, not metadata. Content is what you act on; metadata can lie.
  3. Concurrency safety is per-invocation, not per-type. Tools self-declare per call.
  4. Partition greedily. Consecutive safe tools → concurrent batch. Exclusive breaks the batch.
  5. Stream tool execution from the LLM stream. Don’t wait for the full response.
  6. Classify failure cascades. Stateful failures cancel siblings; stateless don’t.
  7. Context modifiers are exclusive-only. No mutation from concurrent ops.
  8. Static vs dynamic boundary is a physical marker. Move static sections above it for cache wins.
  9. Deduplicate and budget hierarchical discovery. Dedup by content hash; hard cap total size.
  10. Speculate deterministic checks. Start early; cancel if wrong.
  11. Circuit breakers in every recovery layer. Flag at start, check before re-entry, reset at boundaries.
  12. Hooks fail open. Exit 0 on crash. Log to JSONL.
  13. Expensive detection once, env vars for the rest. SessionStart is the only place that pays.
  14. Port native deps when cross-platform > performance. Bounded TS cost beats unbounded binding cost.
  15. Borrow web UI infrastructure. Component model is the win; rendering target is secondary.

What this repo does

  • hooks/session-init.cjs implements the “expensive detection, cached via env vars” pattern. Project detection runs once; downstream hooks read the results from env vars.
  • hooks/lib/ contains shared utilities reused across hooks: project-detector.cjs, context-builder.cjs, env-resolver.cjs, scout-checker.cjs, privacy-checker.cjs, transcript-parser.cjs, git-info-cache.cjs. Shared-library pattern, not copy-paste.
  • hooks/stop-verify.cjs uses stop_hook_active as a circuit breaker — first Stop pass injects the verify prompt, second pass lets the agent finish. Textbook circuit-breaker pattern.
  • Crash wrapper on every .cjs hook — try/catch + JSONL log + exit 0. The fail-open pattern.
  • hooks/loop-detection.cjs — soft circuit breaker on per-file edit count. Not an abort; a nudge.
  • hooks/usage-context-awareness.cjs — throttled cache write pattern. 60s TTL for API data, read by statusline and context builder.

Patterns this repo does NOT use (yet)

  • Async generator main loop — we don’t run our own loop; Claude Code does. If we built a standalone harness runtime, this would be the foundation.
  • Concurrency partitioning — inside Claude Code, not harness-controlled.
  • Context modifier chain — same.
  • Dynamic boundary marker in system prompt — we don’t construct the system prompt; we inject into it via hooks. A dedicated marker would require harness-level prompt assembly.
  • Speculative execution — nothing is speculatively started; all checks are synchronous.
  • Native replacement — not applicable; we don’t have native dependencies beyond what Claude Code ships.

These would all come into play if the user decided to build an independent agent loop on top of the Claude Agent SDK instead of relying on Claude Code’s runtime.