Workflow

Workflow is the shape of agent execution: the phases, the permissions per phase, the transitions between them, and the control structure of the loop itself.

This is the scaffolding the agent runs inside. Tools and memory are the verbs and nouns; workflow is the grammar.

The core problem

Agents that jump straight to editing make changes based on incomplete understanding. They edit the wrong files, miss dependencies, and “fix” symptoms rather than causes. The symmetric failure is agents that plan forever and never act — paralyzed by exploration.

The workflow must force a reading phase before a writing phase, a planning phase before an acting phase, and a verification phase before completion. Each phase has different capabilities. Phase transitions are permission boundaries.

Patterns observed in Claude Code

Explore-plan-act loop

Three-phase workflow with increasing write permissions:

| Phase | Tools | Goal |
| --- | --- | --- |
| Explore | Read, Grep, Glob (read-only) | Map the problem. Load context. Understand what exists. |
| Plan | Read + discussion (no writes) | Propose an approach. Get user alignment. Identify risks. |
| Act | Full tool access | Execute the plan. Write code. Run tests. |

The system prompt steers the agent away from editing before it understands the code. Plan mode is both a workflow phase and a permission mode: while the agent is in plan mode, writes are blocked by the permission system, so it cannot Write even if it tries.

Why it matters: agents that jump straight to editing miss dependencies. The exploration phase is not optional background research; it is a required precondition for any change larger than a one-liner.

Trade-off: additional turns slow progress for trivial tasks. Claude Code mitigates this with “safe tool allowlist” fast paths — trivial reads bypass the plan gate.
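A minimal sketch of how per-phase tool subsets and a safe-tool fast path could be expressed as data. The names `Phase`, `PHASE_TOOLS`, `SAFE_TOOLS`, `canUse`, and `needsPlanGate` are hypothetical and are not taken from Claude Code; the tool list for the act phase is an assumption standing in for "full tool access".

```typescript
// Hypothetical names throughout; only the phase names and the
// Read/Grep/Glob subset come from the table above.
type Phase = "explore" | "plan" | "act";

// Tools available in each phase. The act list is an assumed stand-in
// for "full tool access".
const PHASE_TOOLS: Record<Phase, ReadonlySet<string>> = {
  explore: new Set(["Read", "Grep", "Glob"]),
  plan: new Set(["Read"]),
  act: new Set(["Read", "Grep", "Glob", "Edit", "Write", "Bash"]),
};

// Assumed allowlist of read-only tools that never need the plan gate.
const SAFE_TOOLS: ReadonlySet<string> = new Set(["Read", "Grep", "Glob"]);

// A tool is usable only if the current phase lists it.
function canUse(phase: Phase, tool: string): boolean {
  return PHASE_TOOLS[phase].has(tool);
}

// Fast path: a request that touches only safe tools skips planning.
function needsPlanGate(requestedTools: readonly string[]): boolean {
  return requestedTools.some((tool) => !SAFE_TOOLS.has(tool));
}
```

Because the gate is a lookup rather than an instruction, a one-line question that only needs `Read` never pays the planning cost, while any request that includes `Edit` does.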

Coordinator restriction

The coordinator agent is denied all execution tools. It has only four tools:

TeamCreate
TeamDelete
SendMessage
SyntheticOutput

No Bash, no Read, no Edit. The coordinator can only coordinate — it cannot do the work directly.

Why: if the coordinator has Edit, it will edit directly instead of delegating. LLMs take the shortest path. Removing the capability forces delegation, which creates natural parallelism and specialization.

The restriction is enforced by code (the tool list), not by instruction (the prompt) alone. This is defense in depth: the prompt tells the coordinator to delegate, and the tool list prevents it from doing otherwise.

This is the single most important insight in multi-agent workflow design: remove capabilities rather than asking agents not to use them. Asking doesn’t scale. Removing does.
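As a rough illustration of removing a capability rather than asking, the sketch below builds the tool set an agent receives from an allowlist. Only the four coordinator tool names come from the text; the registry `ALL_TOOLS`, the handlers, and the functions `toolsFor` and `callTool` are hypothetical.

```typescript
// Hypothetical registry and dispatcher. Handlers are trivial stand-ins.
type ToolHandler = (input: string) => string;

const ALL_TOOLS: Record<string, ToolHandler> = {
  TeamCreate: (input) => `team created: ${input}`,
  TeamDelete: (input) => `team deleted: ${input}`,
  SendMessage: (input) => `sent: ${input}`,
  SyntheticOutput: (input) => `output: ${input}`,
  Edit: (input) => `edited: ${input}`,
  Bash: (input) => `ran: ${input}`,
};

const COORDINATOR_TOOLS: readonly string[] = [
  "TeamCreate",
  "TeamDelete",
  "SendMessage",
  "SyntheticOutput",
];

// Build the tool set an agent actually receives. Anything not listed is
// absent, so the agent cannot call it regardless of what the prompt says.
function toolsFor(allowed: readonly string[]): Map<string, ToolHandler> {
  const granted = new Map<string, ToolHandler>();
  for (const name of allowed) {
    const handler = ALL_TOOLS[name];
    if (handler) granted.set(name, handler);
  }
  return granted;
}

// Dispatch fails closed when the tool was never granted.
function callTool(
  granted: Map<string, ToolHandler>,
  name: string,
  input: string,
): string {
  const handler = granted.get(name);
  if (!handler) throw new Error(`tool not available: ${name}`);
  return handler(input);
}
```

With this shape, a coordinator built from `toolsFor(COORDINATOR_TOOLS)` can send messages but any attempt to call `Edit` throws at dispatch time.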

See multi-agent.md for the coordinator pattern in depth.

Workflow phases as permission modes

Claude Code has 7 permission modes:

default — normal operation, approve risky actions
plan — read-only, no writes
auto — low-risk actions auto-approved, high-risk still gated
bypass — skip approval for approved tools (not dangerous patterns)
yolo — full autonomy (dangerous)
readonly — strict read-only
workspace-write — write only inside workspace

Each mode is a workflow phase wearing a permission hat. The phase decides which tools the agent has; the permission system enforces it.

Modes can be composed with phases:

Explore = plan mode (phase) + readonly (permission)
Plan = plan mode + minimal tools
Act = default mode (phase) + whatever the user approved for this session
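One way to read "the phase decides, the permission system enforces" is as a single decision function over the mode. The sketch below is a simplified interpretation of the one-line mode descriptions above, not Claude Code's actual logic; the `ToolCall` fields (`writes`, `risky`, `insideWorkspace`, `preApproved`) and the `decide` function are assumptions made for illustration.

```typescript
type PermissionMode =
  | "default"
  | "plan"
  | "auto"
  | "bypass"
  | "yolo"
  | "readonly"
  | "workspace-write";

type Decision = "allow" | "ask" | "deny";

// Hypothetical description of a tool call as the harness might classify it.
interface ToolCall {
  writes: boolean; // does the call modify anything?
  risky: boolean; // assumed risk flag computed elsewhere
  insideWorkspace: boolean; // is the write target inside the workspace?
  preApproved: boolean; // has the user already approved this tool?
}

// Simplified reading of each mode; real semantics are likely richer.
function decide(mode: PermissionMode, call: ToolCall): Decision {
  switch (mode) {
    case "plan":
    case "readonly":
      return call.writes ? "deny" : "allow";
    case "workspace-write":
      return call.writes && !call.insideWorkspace ? "deny" : "allow";
    case "default":
      return call.writes || call.risky ? "ask" : "allow";
    case "auto":
      return call.risky ? "ask" : "allow";
    case "bypass":
      return call.preApproved && !call.risky ? "allow" : "ask";
    case "yolo":
      return "allow";
  }
}
```

The switch is exhaustive over the seven modes, so adding an eighth mode to the union becomes a compile-time error until it is handled.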

Main query loop

The agent’s entire control flow lives in an async function* running while(true). Each iteration has 4 phases:

Context Assembly — build the system prompt (static + dynamic boundary + dynamic sections), assemble conversation history, apply compaction if needed
Stream API Call — send to the model, stream tokens back
Tool Execution — for each tool_use block, run the tool (possibly in parallel), inject results as user messages
Stop or Continue — derived flag: if any tool_use blocks exist → continue; else → stop

See patterns.md for the async generator mechanics. The key workflow insight is: the loop is the workflow. Everything else hangs off it.
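The four phases above can be sketched as an async generator. This is an illustration under stated assumptions, not the Claude Code source: the block and message types, `queryLoop`, and the injected `callModel` and `runTool` functions are all hypothetical, compaction is omitted, and the model reply is awaited whole instead of streamed.

```typescript
interface TextBlock { type: "text"; text: string }
interface ToolUseBlock { type: "tool_use"; id: string; name: string; input: string }
interface ToolResultBlock { type: "tool_result"; toolUseId: string; output: string }
type Block = TextBlock | ToolUseBlock | ToolResultBlock;

interface Message { role: "system" | "user" | "assistant"; blocks: Block[] }

// Hypothetical stand-ins for the model API and the tool runner.
type CallModel = (context: Message[]) => Promise<Block[]>;
type RunTool = (name: string, input: string) => Promise<string>;

async function* queryLoop(
  systemPrompt: string,
  history: Message[],
  callModel: CallModel,
  runTool: RunTool,
): AsyncGenerator<Message, void, undefined> {
  while (true) {
    // 1. Context assembly: system prompt plus conversation history.
    const context: Message[] = [
      { role: "system", blocks: [{ type: "text", text: systemPrompt }] },
      ...history,
    ];

    // 2. Model call (a real harness would stream tokens here).
    const blocks = await callModel(context);
    const assistant: Message = { role: "assistant", blocks };
    history.push(assistant);
    yield assistant;

    // 3. Tool execution: run every tool_use block in parallel.
    const toolUses = blocks.filter(
      (b): b is ToolUseBlock => b.type === "tool_use",
    );
    const results: ToolResultBlock[] = await Promise.all(
      toolUses.map(async (use) => ({
        type: "tool_result" as const,
        toolUseId: use.id,
        output: await runTool(use.name, use.input),
      })),
    );

    // 4. Stop or continue, derived from content: no tool calls means done.
    if (toolUses.length === 0) return;

    // Inject tool results as a user message and loop again.
    const toolMessage: Message = { role: "user", blocks: results };
    history.push(toolMessage);
    yield toolMessage;
  }
}
```

A caller would consume this with `for await (const message of queryLoop(...))`, which lets the UI render each message as it is produced while the loop itself stays the single place where control flow is decided.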

Derived continuation flag

The agent does NOT trust stop_reason === 'tool_use' from the API. Quote from the leaked code:

stop_reason === 'tool_use' is unreliable — it’s not always set correctly.

Instead, it observes actual content: if any tool_use blocks exist in the assistant’s response, needsFollowUp = true.

Lesson for workflow design: derive control flow from content, not metadata. Metadata can lie; content cannot. This generalizes: any control signal should be derivable from observable state, not from upstream flags.
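A minimal sketch of the derived flag, assuming a response shape with a `stop_reason` field and a `content` array of typed blocks. The interface names and the function `needsFollowUp` are illustrative; only the idea of ignoring `stop_reason` and inspecting the blocks comes from the text.

```typescript
// Assumed response shape for illustration.
interface ContentBlock {
  type: string;
}

interface AssistantResponse {
  stop_reason: string | null;
  content: ContentBlock[];
}

// Continue if and only if the response actually contains a tool call.
// stop_reason is deliberately not consulted.
function needsFollowUp(response: AssistantResponse): boolean {
  return response.content.some((block) => block.type === "tool_use");
}
```

The function returns the same answer whether `stop_reason` is correct, wrong, or missing, which is the point: the loop depends only on state it can observe directly.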

Workflow composition

Claude Code’s workflows are built by composing primitive phases:

| Phase | Enforcement | Example |
| --- | --- | --- |
| Explore | Tool subset (Read/Grep/Glob only) | `/plan` mode |
| Plan | Tool subset + prompt (“discuss before acting”) | `/plan` output |
| Act | Default tools | Default mode |
| Verify | Dedicated subagent (“Verification agent”) | After `act` phase |
| Review | Tool subset + prompt (“find problems”) | `/review` skill |
| Ship | Linear pipeline (test → bump → commit → push) | `/ship` skill |

Skills in this repo compose phases:

decompose = explore + plan (for feature specs)
plan-*-review = plan + verify
qa = act + verify loop
ship = verify → act → observe → act (pipeline)
review = explore + verify (no action)
investigate = explore (iron law: no act without root cause)

The composition lives in the skill’s SKILL.md; the enforcement lives in the tool list + permission mode + prompt.
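To make the composition idea concrete, the skill list above can be written as data: each skill is an ordered sequence of primitive phases. This is only a sketch; the `SKILLS` table and `mayWrite` helper are hypothetical, and the repo itself keeps this information in each skill's SKILL.md, not in code.

```typescript
type PhaseName = "explore" | "plan" | "act" | "verify" | "observe";

// Each skill as an ordered list of primitive phases, transcribed from
// the list above. Looping (as in qa) is not modeled here.
const SKILLS: Record<string, readonly PhaseName[]> = {
  decompose: ["explore", "plan"],
  "plan-*-review": ["plan", "verify"],
  qa: ["act", "verify"],
  ship: ["verify", "act", "observe", "act"],
  review: ["explore", "verify"],
  investigate: ["explore"],
};

// A skill needs write-capable tools only if it contains an act phase.
function mayWrite(skill: string): boolean {
  const phases = SKILLS[skill];
  if (!phases) throw new Error(`unknown skill: ${skill}`);
  return phases.includes("act");
}
```

Deriving the tool list from the phase sequence keeps the two from drifting apart: `review` and `investigate` contain no act phase, so they would never be granted write tools.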

Anti-patterns

Single-phase workflows. “Just do the thing.” No planning, no verification, no phase transitions. Works for one-line tasks, fails for everything else.
Phase restriction by prompt only. The prompt says “don’t edit yet” but the agent has Edit in its tool list. It edits anyway. LLMs take the shortest path.
Permanent exploration phase. Plan mode with no exit criterion. Agent explores forever, never commits.
No verify phase. Agent declares “done” without running tests, checking lint, or reading the diff. “Verification avoidance” — see multi-agent.md on the Verification agent.
Coordinator with execution tools. Coordinator edits code directly, bypassing its workers. Parallelism collapses to sequential.
Metadata-driven control flow. Using stop_reason or turn counts to decide continuation. Fragile — the API changes, the metadata drifts.
No main loop. Agent runs one turn and exits. Can’t handle follow-ups, can’t recover from errors, can’t iterate.

Takeaways for harness engineering

Separate phases by permission, not by prompt. Remove capabilities, don’t ask the agent nicely.
Explore-plan-act is the default scaffold. Deviations need justification, not the other way around.
Derive control flow from content. Never trust upstream metadata to drive the loop.
One main loop, many phases. The loop is the workflow. Phases are state transitions inside it.
Coordinator ≠ worker. In a multi-agent setup, the coordinator has no execution tools. This is the single highest-leverage rule in multi-agent design.
Log phase transitions. You need to know which phase was active when something went wrong. Observability lives at the phase boundary.
Fast-path trivial tasks. Safe-tool allowlist bypasses the plan gate. Otherwise plan-mode becomes user-hostile for one-line questions.
Every phase has an exit criterion. “Plan mode ends when the user approves the plan or types /act.” Without exit criteria, phases become black holes.
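Several of these takeaways (one loop with phases as state transitions, logged transitions, explicit exit criteria) fit naturally into a small state machine. The sketch below is one possible shape, with hypothetical names (`PhaseMachine`, `NEXT`, `Transition`) and an assumed transition graph; it is not modeled on any existing implementation.

```typescript
type Phase = "explore" | "plan" | "act" | "verify" | "done";

interface Transition {
  from: Phase;
  to: Phase;
  reason: string; // which exit criterion fired
  at: number; // timestamp in milliseconds
}

// Assumed transition graph. A phase with no way out would be a black hole,
// so every phase except "done" lists at least one exit.
const NEXT: Record<Phase, readonly Phase[]> = {
  explore: ["plan"],
  plan: ["act", "explore"],
  act: ["verify"],
  verify: ["done", "act"],
  done: [],
};

class PhaseMachine {
  private current: Phase = "explore";
  readonly log: Transition[] = [];

  get phase(): Phase {
    return this.current;
  }

  // Move to the next phase, recording why. Illegal moves throw.
  advance(to: Phase, reason: string): void {
    if (!NEXT[this.current].includes(to)) {
      throw new Error(`illegal transition: ${this.current} -> ${to}`);
    }
    this.log.push({ from: this.current, to, reason, at: Date.now() });
    this.current = to;
  }
}
```

Requiring a `reason` on every call forces each transition to name its exit criterion (for example "user approved plan"), and the `log` array gives a record of which phase was active at any point.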

What this repo does

rules/primary-workflow.md — auto-loaded rule describing the plan → implement → test → review pipeline. Every session starts with this in context.
skills/plan-ceo-review/, skills/plan-eng-review/, skills/plan-design-review/ — plan-phase skills. Each is a structured review at a different layer (scope, architecture, design).
skills/investigate/ — enforces the iron law: no fixes without root cause. An explore-phase skill that proactively claims bug reports.
skills/verify/ — a verify phase runs before any “done” claim. Pre-completion checklist.
skills/ship/ — a linear pipeline phase (test → version → changelog → PR). Zero discretion once started.
hooks/stop-verify.cjs — enforces the verify phase at the harness level, not just the skill level. Stop hook checks lint and intent-vs-diff before letting the agent finish.
hooks/guard-task.cjs — forces approval before spawning subagents. Prevents accidental multi-agent fan-out from within a single-phase workflow.
hooks/loop-detection.cjs — detects when the agent is stuck iterating on the same file (doom loop) and injects a nudge to step back. A workflow-level intervention when the agent loses its phase discipline.
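For a sense of what a doom-loop check might look like, here is a hypothetical heuristic: flag a file once it accounts for most of the recent edits. This is not the contents of hooks/loop-detection.cjs; the function `detectDoomLoop`, its window size, and its threshold are all assumptions.

```typescript
// Hypothetical heuristic. Returns the file the agent appears stuck on,
// or null if no file dominates the recent edit history.
function detectDoomLoop(
  editedFiles: readonly string[],
  windowSize = 6,
  threshold = 4,
): string | null {
  // Only the most recent edits matter.
  const recent = editedFiles.slice(-windowSize);
  const counts = new Map<string, number>();
  for (const file of recent) {
    const n = (counts.get(file) ?? 0) + 1;
    counts.set(file, n);
    if (n >= threshold) return file;
  }
  return null;
}
```

A hook built on this could inject a nudge to step back whenever the result is non-null, which treats repeated edits to one file as a signal that the agent has left its phase discipline.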

Open problems

Phase transitions are implicit in this repo. The investigate → verify → ship chain isn’t formally modeled — it’s emergent from skill invocations. A formal phase state machine would let hooks observe transitions and enforce invariants.
No fast-path allowlist for trivial tasks. The full rule-reminder pipeline runs on every prompt, even for “what time is it?”. Wasted context.
No exit criteria for plan mode. Agent can loop in plan mode indefinitely. Claude Code handles this at the UI level; this repo doesn’t model it.
Coordinator pattern is not implemented in this repo. If the user moves to multi-agent work, a coordinator skill with restricted tools would be the next addition.
