Hooks

Hooks are deterministic shell commands or actions that run at specific agent lifecycle points — outside prompt instructions entirely. They are how the harness enforces invariants that can’t be trusted to a prompt.

This is the single most powerful pattern for moving behavior out of the model and into the system.


The core problem

Relying on prompts to remember procedural steps is unreliable. Models behave differently under context pressure. They forget rules from 50 turns ago. They skip steps when in a hurry. They invent new shortcuts. The harness needs guarantees that don’t depend on the model’s discretion.

Hooks are the answer. Procedural steps that MUST happen become hook-driven, not prompt-driven. The model can’t skip a hook because the hook isn’t asking for permission — the system runs it.


Patterns observed in Claude Code

26+ lifecycle hook events

The leaked code contained 26+ distinct hook events. Examples:

  • SessionStart: when a session starts or restarts (startup, resume, clear, compact)
  • SessionEnd — once per session, on shutdown
  • SubagentStart — when a Task subagent spawns
  • SubagentStop — when a subagent finishes
  • UserPromptSubmit — every user message, before the LLM sees it
  • PreToolUse — before any tool call
  • PostToolUse — after any tool call
  • Stop — when the agent is about to finish a turn
  • PreCompact — before context compaction runs
  • PostCompact — after context compaction completes
  • FileChanged — when an external process modifies a watched file
  • CwdChanged — when the working directory changes
  • …and more

Each event has a specific contract: input fields on stdin, expected output format, blocking semantics.

Why so many events: the more lifecycle points the harness can observe, the more behavior it can move out of the prompt. Every hook event is an opportunity to enforce something deterministically.

Hook semantics

Each hook receives JSON on stdin and returns JSON on stdout. The output decides what happens:

  • {} — proceed normally
  • {"decision": "block", "reason": "..."} — block the operation, inject the reason as a system message
  • {"hookSpecificOutput": {...}} — inject additional context, allow the operation
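  • Non-zero exit code: log the error and fail open (continue as if the hook didn’t run). Exception: Claude Code treats exit code 2 as a blocking error and feeds stderr back to the model.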

Blocking is rare. Most hooks inject context or perform side effects. Blocking is reserved for invariant enforcement (e.g., “you can’t Edit outside the freeze boundary”).
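A minimal sketch of how a harness might turn one hook's exit status and stdout into one of the outcomes listed above. The function name `interpretHookResult` and the returned `action` values are hypothetical; the sketch treats every non-zero exit, and any non-JSON stdout, as fail-open.

```javascript
// Hypothetical sketch: map one hook's exit status and stdout to an outcome.
function interpretHookResult(exitCode, stdout) {
  // Non-zero exit: fail open, as if the hook had not run.
  if (exitCode !== 0) return { action: 'proceed', failedOpen: true };

  let out;
  try {
    out = stdout.trim() ? JSON.parse(stdout) : {};
  } catch {
    // Non-JSON stdout breaks the protocol; fail open here too.
    return { action: 'proceed', failedOpen: true };
  }
  if (out === null || typeof out !== 'object') {
    return { action: 'proceed', failedOpen: true };
  }

  if (out.decision === 'block') {
    // The reason is what gets injected back into the conversation.
    return { action: 'block', reason: out.reason ?? '' };
  }
  if (out.hookSpecificOutput) {
    // Allow the operation, carrying any extra context along.
    return {
      action: 'proceed',
      context: out.hookSpecificOutput.additionalContext ?? null,
    };
  }
  return { action: 'proceed' }; // {} or anything else: proceed normally
}
```

A harness would call this once per hook, with the exit status and the captured stdout.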

Matchers

Hooks register against specific tools or events using regex matchers:

{
  "PreToolUse": [
    {
      "matcher": "Edit|Write|MultiEdit",
      "hooks": [{"type": "command", "command": "..."}]
    }
  ]
}

The matcher decides which tool calls trigger the hook. * matches all; | separates alternatives; Edit|Write|MultiEdit is the typical “all writes” pattern.

Lesson: matchers let one hook serve multiple tools without separate registration. Keep matchers narrow — broad matchers run more often, costing latency on every matched call.
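A sketch of how a harness might apply these matchers, assuming each matcher is a regular expression tested against the whole tool name and that `*` or an empty matcher means "every tool". The anchoring and the helper names `matchesTool` and `hooksFor` are assumptions for illustration, not Claude Code's own code.

```javascript
// Hypothetical sketch: does a registered matcher apply to this tool call?
// Assumes full-string regex matching; "*" or "" matches every tool.
function matchesTool(matcher, toolName) {
  if (matcher === undefined || matcher === '' || matcher === '*') return true;
  // Anchor the pattern so "Edit" does not also match "MultiEdit".
  return new RegExp(`^(?:${matcher})$`).test(toolName);
}

// Collect the hooks that should run for one tool call on one event.
function hooksFor(config, event, toolName) {
  return (config[event] ?? [])
    .filter((group) => matchesTool(group.matcher, toolName))
    .flatMap((group) => group.hooks);
}
```

Anchoring is the design choice that matters here: without it, a matcher of `Edit` would also fire on every `MultiEdit` call.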

Hook chains

Multiple hooks can register against the same event. They run in declaration order:

{
  "Stop": [
    {
      "hooks": [
        {"type": "command", "command": "node $CLAUDE_PROJECT_DIR/.claude/hooks/stop-verify.cjs", "timeout": 45},
        {"type": "command", "command": "node $CLAUDE_PROJECT_DIR/.claude/hooks/another-stop-hook.cjs", "timeout": 30}
      ]
    }
  ]
}

Both hooks run when the agent tries to stop. Either can block. Both have separate timeouts.

Blocking does not short-circuit the chain: if the first hook blocks, the second still runs (Claude Code semantics). If you need strict ordering or an early exit, encode it in the hook scripts themselves.

Skip hooks during recovery

Critical insight from recovery.md: hooks are skipped when the agent is in error recovery mode. Hooks add tokens, and extra tokens make a prompt_too_long error worse, so the recovery path bypasses the normal hook chain.

Implication: never rely on a hook to enforce invariants during error recovery. The hook won’t run.

Fail-open semantics

Hooks must never break the session. Every hook is wrapped in:

  • A try/catch
  • Crash logging to hooks/.logs/hook-log.jsonl
  • Exit 0 (fail-open) on any unexpected error

Why: a buggy hook should be a degraded-mode session, not a broken one. The user can fix the hook offline; the session continues with reduced enforcement.

Trade-off: fail-open hides hook bugs. Mitigation: log every crash; surface the log location at session start; have a harness-tune skill that audits hook logs.
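The audit half of that mitigation can start very small. A sketch, assuming the JSONL record shape written by the crash wrapper in the hook anatomy below (`ts`, `hook`, `status`, `error`); `summarizeCrashes` and `auditHookLog` are hypothetical helpers, not part of any existing harness-tune skill.

```javascript
const fs = require('fs');

// Hypothetical sketch: count crash records per hook in hook-log.jsonl text.
// Assumes one JSON object per line with at least { hook, status }.
function summarizeCrashes(logText) {
  const counts = {};
  for (const line of logText.split('\n')) {
    if (!line.trim()) continue;
    let rec;
    try {
      rec = JSON.parse(line);
    } catch {
      continue; // skip a truncated or corrupt line rather than abort the audit
    }
    if (rec && rec.status === 'crash') {
      counts[rec.hook] = (counts[rec.hook] ?? 0) + 1;
    }
  }
  // Most-crashing hooks first, as [hook, count] pairs.
  return Object.entries(counts).sort((a, b) => b[1] - a[1]);
}

// A missing log file means no recorded crashes.
function auditHookLog(logPath) {
  if (!fs.existsSync(logPath)) return [];
  return summarizeCrashes(fs.readFileSync(logPath, 'utf-8'));
}
```

Running this from a SessionStart hook and printing any non-empty result is one way to make crashes visible early.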

Hooks vs prompts

When to use which:

| Use a hook when… | Use a prompt when… |
|---|---|
| The behavior MUST happen | The behavior is preferred |
| It’s a yes/no enforcement | It’s a guideline |
| Latency budget allows it | The decision needs LLM judgment |
| The check is deterministic | The check is fuzzy |
| It’s a side effect (logging, telemetry) | It’s a UX nudge |

Rule of thumb: if you’re tempted to write “ALWAYS do X” in a system prompt, you should probably write a hook instead. Prompts get ignored under pressure; hooks don’t.


Hook anatomy

A typical hook script:

#!/usr/bin/env node
// Crash wrapper: nothing below may break the session.
try {
  const fs = require('fs');
  const { isHookEnabled } = require('./lib/ck-config-utils.cjs');

  // Hypothetical placeholder: replace with this hook's real check.
  function shouldBlock(input) {
    return false;
  }

  // Early exit if disabled in config
  if (!isHookEnabled('my-hook')) {
    process.exit(0);
  }

  try {
    // Read the event payload (JSON) from stdin
    const input = JSON.parse(fs.readFileSync(0, 'utf-8'));

    // Either block, with a terse reason...
    if (shouldBlock(input)) {
      console.log(JSON.stringify({
        decision: "block",
        reason: "Specific reason here"
      }));
      process.exit(0);
    }

    // ...or inject context and let the call through.
    // Note: in Claude Code, permissionDecision "allow" also skips the normal permission prompt.
    console.log(JSON.stringify({
      hookSpecificOutput: {
        hookEventName: "PreToolUse",
        permissionDecision: "allow",
        additionalContext: "Reminder text"
      }
    }));

    process.exit(0);
  } catch (error) {
    // Expected failures (bad JSON, missing fields): warn on stderr, never stdout
    console.error('WARN: Hook error, allowing operation -', error.message);
    process.exit(0);
  }
} catch (e) {
  // Outer crash wrapper: Node builtins only, logs to file, exits 0
  try {
    const fs = require('fs');
    const p = require('path');
    const logDir = p.join(__dirname, '.logs');
    fs.mkdirSync(logDir, { recursive: true }); // no-op if it already exists
    fs.appendFileSync(p.join(logDir, 'hook-log.jsonl'),
      JSON.stringify({ ts: new Date().toISOString(), hook: __filename, status: 'crash', error: e.message }) + '\n');
  } catch (_) {
    // Even logging failed (read-only dir, full disk): still fail open.
  }
  process.exit(0);
}

Key elements:

  1. Crash wrapper — outer try/catch. Always exit 0.
  2. Disabled check: isHookEnabled('hook-name') reads .ckconfig.json. Users can disable individual hooks.
  3. Inner try/catch — handles expected errors gracefully.
  4. JSON I/O — stdin → process → stdout.
  5. Crash log — minimal deps (just Node builtins). Writes to a JSONL file for auditing.

Anti-patterns

  1. Hooks that block on success. Hook returns {"decision": "block"} even when nothing’s wrong. Annoying.
  2. Hooks that fail closed. Bug in the hook → session breaks. Should fail open.
  3. Hooks that write to stdout for non-JSON output. Pollutes the hook protocol. Use stderr for logs.
  4. Hooks with no timeout. A hung hook hangs the session. Always set a timeout.
  5. Hooks that depend on external state. Network, filesystem locks, other processes. Brittle.
  6. Hooks in critical paths with high latency. A 5-second hook on every PreToolUse kills UX.
  7. Hooks doing model calls. Adds latency, cost, non-determinism. Use a skill for that.
  8. No hook log. Crashes are invisible. Bug reports are mysteries.
  9. Hooks with broad matchers. matcher: "*" runs on every tool call for that event. Use narrow matchers.
  10. Prompt-style hooks. Hook’s output is a 200-word “Please consider X…” essay. Hooks should be terse.
  11. Hooks that ignore the disabled flag. No way to turn off a single hook for debugging.

Takeaways for harness engineering

  1. Hooks > prompts for invariants. If it MUST happen, hook it. If it should happen, prompt it.
  2. Fail open. Always. Crash → log → exit 0. The session continues.
  3. Crash log to JSONL. Auditable, line-delimited, easy to grep.
  4. Per-hook disable flag. isHookEnabled('hook-name') reads .ckconfig.json. Users can disable individual hooks for debugging.
  5. Narrow matchers. Don’t run on every tool call when you only need Edit/Write.
  6. Set timeouts. Default 30s; tune per hook. Hung hooks hang the session.
  7. Skip hooks during error recovery. Hooks add tokens; recovery needs to shed tokens, not add them.
  8. Hook output is structured JSON. Stdout = decision; stderr = logs.
  9. Hooks chain in declaration order. Plan for ordering; don’t assume parallelism.
  10. Surface the crash log location at session start. Print the log path so users know where to look.
  11. Hooks share state via files, not memory. Each hook is a fresh process. Use /tmp or a state dir.
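Because each hook is a fresh process, anything that must survive between invocations has to go to disk. A minimal sketch of a per-file edit counter in the spirit of loop-detection.cjs; the state-file layout, keying by session id, and the `bumpEditCount` name are assumptions, not the repo's actual implementation.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical sketch: persist per-file edit counts between hook invocations.
// One JSON state file per session, stored under stateDir (the temp dir by default).
function bumpEditCount(sessionId, filePath, stateDir = os.tmpdir()) {
  const stateFile = path.join(stateDir, `edit-counts-${sessionId}.json`);
  let counts = {};
  try {
    counts = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
  } catch {
    counts = {}; // first edit this session, or unreadable state: start fresh
  }
  counts[filePath] = (counts[filePath] ?? 0) + 1;
  // Write a temp file, then rename over the old one, so readers never see half-written JSON.
  const tmp = `${stateFile}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(counts));
  fs.renameSync(tmp, stateFile);
  return counts[filePath];
}
```

A PreToolUse hook would call this with the session id and target path from its stdin payload and emit a short warning once the count passes its threshold. Two hooks writing the same file at the same moment can still lose an update; the rename only prevents torn reads.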

What this repo does

This repo has 17 hooks across 8 lifecycle events. Full inventory in the main README.md. Categorized:

Context injection (additive — never block)

  • session-init.cjs (SessionStart) — detect project type, persist to env vars
  • inject-rules.cjs (SessionStart) — load rules/*.md into context
  • subagent-init.cjs (SubagentStart) — minimal context for subagents (~200 tokens)
  • token-efficiency-reminder.cjs (UserPromptSubmit) — inject token discipline
  • dev-rules-reminder.cjs (UserPromptSubmit) — re-inject rules + Plan Context
  • usage-context-awareness.cjs (UserPromptSubmit + PostToolUse) — track API usage
  • descriptive-name.cjs (PreToolUse:Write) — file-naming guidance
  • enforce-doc-rules.cjs (PreToolUse:Edit/Write/MultiEdit on docs/*.md) — doc convention reminder

Defensive blocking

  • scout-block.cjs (PreToolUse: file ops) — block reads from .ckignore paths
  • privacy-block.cjs (PreToolUse: file ops) — block sensitive files unless APPROVED: prefix
  • guard-task.cjs (PreToolUse:Task) — force approval on subagent spawning
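A scout-block style check can be sketched as: read the target path from the PreToolUse payload, compare it against a deny list, and block with a terse reason. This assumes the payload carries `tool_input.file_path` (as Claude Code's file tools do) and that ignored entries are plain directory prefixes; the real .ckignore format and scout-checker logic may differ, and `scoutDecision` is hypothetical.

```javascript
const path = require('path');

// Hypothetical sketch of a scout-block style check.
// ignoredDirs: directory prefixes that reads must not touch (assumed format).
function scoutDecision(input, ignoredDirs, cwd = process.cwd()) {
  const target = input?.tool_input?.file_path;
  if (typeof target !== 'string') return {}; // nothing to check: proceed

  const abs = path.resolve(cwd, target);
  for (const dir of ignoredDirs) {
    const root = path.resolve(cwd, dir);
    // Match the directory itself or anything under it, not siblings like "node_modules2".
    if (abs === root || abs.startsWith(root + path.sep)) {
      return {
        decision: 'block',
        reason: `${target} is under ignored path ${dir}; skip it or narrow the search.`,
      };
    }
  }
  return {};
}
```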

Loop detection / verification

  • loop-detection.cjs (PreToolUse:Edit/Write/MultiEdit) — track per-file edit counts in /tmp, warn after threshold
  • stop-verify.cjs (Stop) — pre-completion lint + intent-vs-diff check; uses stop_hook_active circuit breaker
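The stop_hook_active circuit breaker keeps a Stop hook from blocking forever: once the agent is already continuing because of an earlier Stop-hook block, the hook lets it stop. A sketch, assuming a boolean `stop_hook_active` field in the payload and a caller-supplied `runChecks` function that returns a list of problems; `stopDecision` and the checks are hypothetical, not stop-verify.cjs itself.

```javascript
// Hypothetical sketch of a Stop hook decision with a circuit breaker.
// runChecks() returns an array of problem strings (empty when clean).
function stopDecision(input, runChecks) {
  // Already continuing because a Stop hook blocked once: let the agent stop,
  // otherwise a persistent failure would loop forever.
  if (input?.stop_hook_active === true) return {};

  const problems = runChecks();
  if (problems.length === 0) return {};

  return {
    decision: 'block',
    // Keep it terse: the reason is fed back to the model as its next instruction.
    reason: `Before finishing: ${problems.slice(0, 3).join('; ')}`,
  };
}
```

A real hook would read `input` from stdin and print the returned object with `JSON.stringify`, as in the hook anatomy above.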

Build feedback

  • build-sensor.cjs (PostToolUse:Edit/Write/MultiEdit) — auto-run project build, surface only failures
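"Surface only failures" keeps the hook silent, and free in tokens, on the common path. A sketch, assuming the build command is known up front, that the last few lines of output are enough for the model, and that PostToolUse accepts `hookSpecificOutput.additionalContext`; `buildFeedback`, the 60-second timeout, and the 20-line cap are hypothetical choices.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical sketch of a build-sensor style PostToolUse hook body.
// Returns {} when there is nothing to say; on failure, a short tail of the output.
function buildFeedback(buildCommand, { timeoutMs = 60000, maxLines = 20 } = {}) {
  const res = spawnSync(buildCommand, {
    shell: true,
    encoding: 'utf-8',
    timeout: timeoutMs,
  });
  if (res.error) return {}; // timed out or could not start: fail open, say nothing
  if (res.status === 0) return {}; // build passed: stay silent

  const output = `${res.stdout ?? ''}${res.stderr ?? ''}`.trim();
  const tail = output.split('\n').slice(-maxLines).join('\n');
  return {
    hookSpecificOutput: {
      hookEventName: 'PostToolUse',
      additionalContext: `Build failed (${buildCommand}):\n${tail}`,
    },
  };
}
```

Treating a timeout as silence rather than failure follows the fail-open rule: a slow build should not be reported to the model as a broken one.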

Cleanup

  • session-cleanup.cjs (SessionEnd) — clean up /tmp state from loop-detection.cjs

Patterns this repo demonstrates

  1. Crash wrapper everywhere. Every .cjs hook has the outer try/catch + JSONL log.
  2. isHookEnabled check. Users can disable individual hooks via .ckconfig.json.
  3. Lib reuse. Common logic in hooks/lib/ (project-detector, context-builder, scout-checker, privacy-checker, etc.). Hooks are thin wrappers around library functions.
  4. Env var passing. session-init.cjs detects expensive properties once and writes them to env vars; downstream hooks read the env vars instead of recomputing.
  5. Consistent language. .cjs (CommonJS Node) for all hooks — consistent runtime, shared utilities via require().
  6. Narrow matchers. Each hook’s matcher targets exactly the tools it needs.
  7. Timeouts on slow hooks. stop-verify.cjs is 45s, usage-context-awareness.cjs is 30s. Defaults are overridden only where measured run times require it.

Open problems

  • No hook telemetry beyond crash logs. We don’t know which hooks fire most, which add the most latency, which inject the most tokens. A weekly audit would be valuable — maybe a harness-tune invocation.
  • No automatic hook crash surfacing. hooks/.logs/hook-log.jsonl exists but isn’t checked at session start. Bugs lurk silently.
  • No hook ordering enforcement. Multiple hooks on the same event run in declaration order, but the order isn’t documented per event.
  • No hook test framework. Each hook is hand-tested. A scripted test runner that pipes mock JSON to each hook and asserts on output would catch regressions.
  • Hook scripts are CommonJS only. No ESM. Mostly fine, but limits ergonomics for newer Node features.
  • No PreCompact hook. The distribution doesn’t hook into context compaction events yet. A PreCompact hook could enforce “extract before compact” at the harness level.
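The scripted test runner mentioned in the open problems could start as a single helper. A sketch, assuming each hook is a Node script that reads one JSON payload from stdin and prints at most one JSON object to stdout; `runHook` is a hypothetical name, not a file in this repo.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical sketch: pipe a mock payload into a hook script and parse its reply.
function runHook(scriptPath, payload, timeoutMs = 10000) {
  const res = spawnSync(process.execPath, [scriptPath], {
    input: JSON.stringify(payload),
    encoding: 'utf-8',
    timeout: timeoutMs,
  });
  if (res.error) throw res.error; // a hung or unstartable hook is a test failure
  const stdout = res.stdout.trim();
  return {
    status: res.status,
    output: stdout ? JSON.parse(stdout) : {}, // non-JSON stdout throws: a protocol bug
    stderr: res.stderr,
  };
}
```

Each regression test is then one call per mock payload plus one assertion on `output` and `status`; a hook that exits non-zero or prints non-JSON fails loudly in the test even though it would fail open in a session.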