Memory

Memory is knowledge that persists across sessions — facts, decisions, lessons, preferences, code conventions — anything the agent should not re-learn every time the user returns.

Memory is different from context (factor: context.md). Context is the live conversation window; memory is the long-term store that seeds new contexts.


The core problem

Without memory, agents repeat the same mistakes across sessions. Users re-explain conventions, re-paste style guides, re-describe the project. The agent’s “intelligence” resets to zero at each session start.

The naive fix — dump everything into the system prompt — hits the context window limit and wastes tokens on irrelevant history. The real problem is selective recall: load what’s relevant to this session, leave the rest on disk.


Patterns observed in Claude Code

Tiered memory

Memory is organized into layers with different loading strategies:

  • Compact index — capped at 200 lines, always in context. Tells the agent what memory exists without loading it all.
  • Topic-specific files — loaded on demand when the index says they’re relevant.
  • Full transcripts — remain on disk; searched only when the topic file points to specific exchanges.

Loading all memory equally wastes tokens, hits size limits, and buries useful information under noise. With tiering, the session-start cost is the size of the index rather than of the whole store: a 10 MB memory store costs only a few kilobytes of context up front.

Trade-off: complexity in deciding what goes where, and keeping the index synchronized when new memories land.
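The first two tiers can be sketched as follows. This is a minimal illustration, not Claude Code's implementation: the IndexEntry shape, the keyword matching, and the function names are assumptions made for the example. Only the 200-line cap comes from the text above.

```typescript
// Hypothetical shapes; none of these names come from Claude Code.
interface IndexEntry {
  topic: string;      // short label shown in the always-loaded index
  file: string;       // topic file to load on demand
  keywords: string[]; // cues that make the topic relevant (assumed mechanism)
}

const INDEX_LINE_CAP = 200; // cap described in the text above

// Tier 1: render the compact index, one line per entry, truncated to the cap.
function renderIndex(entries: IndexEntry[]): string {
  return entries
    .slice(0, INDEX_LINE_CAP)
    .map((e) => `- ${e.topic} -> ${e.file}`)
    .join("\n");
}

// Tier 2: pick only the topic files whose keywords appear in the request.
function selectTopicFiles(entries: IndexEntry[], request: string): string[] {
  const text = request.toLowerCase();
  return entries
    .filter((e) => e.keywords.some((k) => text.includes(k.toLowerCase())))
    .map((e) => e.file);
}

const index: IndexEntry[] = [
  { topic: "Testing conventions", file: "testing.md", keywords: ["test", "jest"] },
  { topic: "Deployment", file: "deploy.md", keywords: ["deploy", "release"] },
];

const indexText = renderIndex(index);
const toLoad = selectTopicFiles(index, "Add a test for the parser");
// toLoad contains only "testing.md"; deploy.md stays on disk.
```

Keyword matching is the simplest possible relevance check. In practice the agent itself may decide which topic files to open after reading the index.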

Dream consolidation

Background processes periodically review, deduplicate, prune, and reorganize agent memory during idle time. In the Claude Code leak, this was called autoDream mode, containing:

  • 8 phases of memory management
  • 5 types of context compaction
  • Duplicate merging
  • Contradiction pruning
  • Index tightness maintenance

The Dream background task (one of 7 task types, alongside local_bash, local_agent, remote_agent, etc.) runs between sessions with 4 sub-phases: orient → gather → consolidate → prune.

Why background: consolidation burns tokens, so it runs during idle time, not on the critical path of a user turn.

Risk: consolidation can delete information the user still needs. There is no hard retention policy; the agent manages its own memory, which is a known gap.
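A consolidate step that merges duplicates and resolves contradictions might look like the sketch below. It assumes each memory carries a key naming its subject and an update timestamp. Both fields, and the newest-wins rule, are hypothetical choices for illustration and are not taken from the Dream task. To soften the deletion risk, superseded entries are returned to the caller instead of being dropped.

```typescript
// Hypothetical memory shape for this sketch.
interface MemoryNote {
  key: string;       // what the memory is about, e.g. "frontend-framework"
  value: string;
  updatedAt: number; // epoch milliseconds
}

interface ConsolidationResult {
  kept: MemoryNote[];
  superseded: MemoryNote[]; // returned, not deleted, so the caller can archive them
}

// Keep the newest note per key. Exact duplicates and older contradicting
// notes both end up in `superseded`.
function consolidate(notes: MemoryNote[]): ConsolidationResult {
  const newest = new Map<string, MemoryNote>();
  const superseded: MemoryNote[] = [];
  for (const note of notes) {
    const current = newest.get(note.key);
    if (current === undefined) {
      newest.set(note.key, note);
    } else if (note.updatedAt > current.updatedAt) {
      superseded.push(current);
      newest.set(note.key, note);
    } else {
      superseded.push(note);
    }
  }
  return { kept: Array.from(newest.values()), superseded };
}

const result = consolidate([
  { key: "frontend-framework", value: "use jQuery", updatedAt: 1000 },
  { key: "frontend-framework", value: "we migrated to React", updatedAt: 2000 },
  { key: "runtime", value: "this repo uses Bun", updatedAt: 1500 },
  { key: "runtime", value: "this repo uses Bun", updatedAt: 1500 },
]);
// result.kept holds the React and Bun notes; the jQuery note and the
// duplicate Bun note are in result.superseded.
```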

Memory extraction during compaction

The critical insight: memory extraction runs alongside context compaction. When the conversation gets compressed, the system also extracts durable lessons from it. The principle:

Forget conversation, remember lessons.

Conversation detail is ephemeral; the lesson learned from the conversation is durable. At compact time:

  • The conversation gets summarized (for context)
  • Key decisions, code patterns, and user preferences get extracted (for memory)

Both happen in the same pipeline — the summarization pass and the extraction pass share the same LLM call with structured output.
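One way to picture the shared call is a single structured response that carries both artifacts. The sketch below assumes the model is asked for JSON matching a CompactionOutput shape. The type, its field names, and the parser are illustrative assumptions, and the model call itself is replaced by a hard-coded string.

```typescript
// Hypothetical output schema for a combined summarize-and-extract call.
type MemoryKind = "decision" | "code_pattern" | "preference";

interface ExtractedMemory {
  kind: MemoryKind;
  text: string;
}

interface CompactionOutput {
  summary: string;             // goes back into the context window
  memories: ExtractedMemory[]; // goes to the long-term memory store
}

const KINDS: readonly string[] = ["decision", "code_pattern", "preference"];

// Validate the raw model response before trusting either artifact.
function parseCompactionOutput(raw: string): CompactionOutput {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) throw new Error("expected a JSON object");
  const { summary, memories } = data as { summary?: unknown; memories?: unknown };
  if (typeof summary !== "string") throw new Error("missing summary");
  if (!Array.isArray(memories)) throw new Error("missing memories");
  const parsed = memories.map((m: unknown): ExtractedMemory => {
    if (typeof m !== "object" || m === null) throw new Error("malformed memory entry");
    const item = m as { kind?: unknown; text?: unknown };
    if (typeof item.kind !== "string" || KINDS.indexOf(item.kind) === -1) {
      throw new Error("unknown memory kind");
    }
    if (typeof item.text !== "string") throw new Error("memory text must be a string");
    return { kind: item.kind as MemoryKind, text: item.text };
  });
  return { summary, memories: parsed };
}

// Stand-in for the model response; a real harness would get this from the LLM.
const modelResponse = JSON.stringify({
  summary: "Refactored the parser and agreed on strict typing.",
  memories: [{ kind: "preference", text: "User prefers TypeScript strict mode" }],
});

const compaction = parseCompactionOutput(modelResponse);
// compaction.summary seeds the compacted context;
// compaction.memories is written to the memory store.
```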

Memory scopes

Three scopes, chosen per-memory:

Scope   | Lifetime                                                | Example
--------|---------------------------------------------------------|------------------------------------
User    | Global across all projects for this user                | “I prefer TypeScript strict mode”
Project | Per-directory; only loaded when CWD is in this project  | “This repo uses Bun, not Node”
Local   | Per-session; cleared at session end                     | “Just tried approach X, it failed”

The scope decides where the memory is relevant. A project-scoped memory never pollutes other project contexts.
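A scope filter along these lines would keep project memories out of unrelated sessions. The ScopedMemory and SessionInfo shapes are assumptions for the sketch, as is the use of a path-prefix check to decide whether the CWD is inside a project.

```typescript
type Scope = "user" | "project" | "local";

// Hypothetical record shape; field names are illustrative.
interface ScopedMemory {
  scope: Scope;
  text: string;
  projectRoot?: string; // set for project-scoped memories
  sessionId?: string;   // set for local (per-session) memories
}

interface SessionInfo {
  cwd: string;
  sessionId: string;
}

// True when cwd is the project root or a directory beneath it.
function isInside(cwd: string, root: string): boolean {
  const r = root.replace(/\/+$/, "");
  return cwd === r || cwd.startsWith(r + "/");
}

function visibleMemories(all: ScopedMemory[], session: SessionInfo): ScopedMemory[] {
  return all.filter((m) => {
    if (m.scope === "user") return true;
    if (m.scope === "project") {
      return m.projectRoot !== undefined && isInside(session.cwd, m.projectRoot);
    }
    return m.sessionId === session.sessionId; // local scope
  });
}

const memories: ScopedMemory[] = [
  { scope: "user", text: "I prefer TypeScript strict mode" },
  { scope: "project", text: "Use TailwindCSS", projectRoot: "/work/project-a" },
  { scope: "project", text: "This repo uses Bootstrap", projectRoot: "/work/project-b" },
  { scope: "local", text: "Just tried approach X, it failed", sessionId: "s1" },
];

const visible = visibleMemories(memories, { cwd: "/work/project-b/src", sessionId: "s2" });
// visible holds the user preference and the project-b note only.
```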

Team memory

Multiple agents on the same team (e.g., a pair-programming coordinator + 3 workers) share a memory space. Agent A extracts a memory mid-session; Agent B reads it in its next session. The result is organizational knowledge shared across agent boundaries.

Security implication: team memory must scope to the team, not leak across teams. A permission boundary lives inside the memory store.
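The permission boundary can be sketched as a store that derives the team from the caller's identity rather than accepting a team ID as a parameter. AgentIdentity and TeamMemoryStore are hypothetical names, and the sketch assumes the identity has already been authenticated by the harness.

```typescript
// Hypothetical identity; assumed to be authenticated upstream by the harness.
interface AgentIdentity {
  agentId: string;
  teamId: string;
}

interface TeamMemory {
  teamId: string;
  author: string;
  text: string;
}

class TeamMemoryStore {
  private readonly byTeam = new Map<string, TeamMemory[]>();

  // Writes always land in the caller's own team bucket.
  write(agent: AgentIdentity, text: string): void {
    const bucket = this.byTeam.get(agent.teamId) ?? [];
    bucket.push({ teamId: agent.teamId, author: agent.agentId, text });
    this.byTeam.set(agent.teamId, bucket);
  }

  // Reads take no team parameter, so a caller cannot ask for another team's bucket.
  read(agent: AgentIdentity): TeamMemory[] {
    return (this.byTeam.get(agent.teamId) ?? []).slice();
  }
}

const teamStore = new TeamMemoryStore();
const coordinator: AgentIdentity = { agentId: "coordinator", teamId: "team-1" };
const worker: AgentIdentity = { agentId: "worker-1", teamId: "team-1" };
const outsider: AgentIdentity = { agentId: "worker-9", teamId: "team-2" };

teamStore.write(coordinator, "The payments module is frozen until the audit ends");
const seenByWorker = teamStore.read(worker);     // one entry, written by the coordinator
const seenByOutsider = teamStore.read(outsider); // empty
```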

Memory decay

In memoryAge.ts, older memories get lower relevance scores, so recent memories win ties in retrieval. This mimics human memory: facts are not forgotten outright, but old ones are deprioritized in favor of recent ones.

Implication: without decay, a 2-year-old “use jQuery” memory can outrank a 2-week-old “we migrated to React” memory, and the agent then teaches outdated patterns.
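The text does not say which scoring function memoryAge.ts uses. As one plausible form, the sketch below applies exponential decay with a half-life; the 30-day default, the DatedMemory shape, and the function names are all assumptions for illustration.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical record shape for this sketch.
interface DatedMemory {
  text: string;
  relevance: number; // base relevance before decay, e.g. from a similarity search
  createdAt: number; // epoch milliseconds
}

// Halve the score every `halfLifeDays` (an assumed decay curve).
function decayedScore(relevance: number, ageMs: number, halfLifeDays = 30): number {
  return relevance * Math.pow(0.5, ageMs / (halfLifeDays * DAY_MS));
}

// Return a new array ordered by decayed score, highest first.
function rankByFreshness(memories: DatedMemory[], now: number): DatedMemory[] {
  return memories
    .slice()
    .sort(
      (a, b) =>
        decayedScore(b.relevance, now - b.createdAt) -
        decayedScore(a.relevance, now - a.createdAt),
    );
}

const now = Date.UTC(2025, 0, 1);
const ranked = rankByFreshness(
  [
    { text: "use jQuery", relevance: 1, createdAt: now - 730 * DAY_MS },
    { text: "we migrated to React", relevance: 1, createdAt: now - 14 * DAY_MS },
  ],
  now,
);
// With equal base relevance, the two-week-old React memory ranks first.
```

A shorter half-life forgets faster. A stale memory can still win if its base relevance is high enough to overcome the decay.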


Lifecycle

Session runs → tool calls accumulate → session ends
        ↓
extractMemories (background, non-blocking)
        ↓
written to scoped store (user | project | local)
        ↓
new session starts
        ↓
memoryScan + findRelevantMemories
        ↓
loadMemoryPrompt → injected into system prompt
        ↓
agent has context from day one

Key properties:

  • Extraction is non-blocking — the session ends immediately; extraction happens in the background.
  • Loading is filtered: findRelevantMemories scores each memory against the new session’s initial context and loads only the top-k.
  • Injection is at prompt build time — memories land in the system prompt, not user turns, so they don’t compete with user input for turn-level attention.
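The load side of the lifecycle can be approximated as below. The functions pickTopK and buildSystemPrompt are stand-ins written for this sketch, not the real findRelevantMemories or loadMemoryPrompt, whose signatures the source does not show. Word overlap is an assumed scoring method chosen only because it needs no dependencies.

```typescript
// Hypothetical record shape for this sketch.
interface StoredMemory {
  id: string;
  text: string;
}

// Lowercase words longer than two characters.
function tokenize(s: string): Set<string> {
  return new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 2));
}

// Count how many of the memory's words also occur in the session context.
function overlapScore(memory: string, context: string): number {
  const ctx = tokenize(context);
  let hits = 0;
  tokenize(memory).forEach((w) => {
    if (ctx.has(w)) hits += 1;
  });
  return hits;
}

// Keep at most k memories with a non-zero score, best first.
function pickTopK(memories: StoredMemory[], context: string, k: number): StoredMemory[] {
  return memories
    .map((m) => ({ m, score: overlapScore(m.text, context) }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((x) => x.m);
}

// Inject the selected memories into the system prompt at build time.
function buildSystemPrompt(base: string, memories: StoredMemory[]): string {
  if (memories.length === 0) return base;
  const lines = memories.map((m) => `- ${m.text}`).join("\n");
  return `${base}\n\nRelevant memory:\n${lines}`;
}

const stored: StoredMemory[] = [
  { id: "bun", text: "This repo uses Bun, not Node" },
  { id: "deploy", text: "Deploys go through staging first" },
  { id: "strict", text: "I prefer TypeScript strict mode" },
];

const selected = pickTopK(stored, "Set up the Bun test runner for this repo", 2);
const systemPrompt = buildSystemPrompt("You are a coding agent.", selected);
// Only the Bun memory overlaps with the request, so it alone is injected.
```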

Anti-patterns

  1. Dumping all memory into the system prompt. Fills the window with noise. Costs tokens on every turn. Hits limits fast.
  2. No scope boundaries. Cross-project memory bleed — “use TailwindCSS” memory from project A surfaces in project B that uses Bootstrap.
  3. No extraction during compaction. The conversation is summarized and then discarded; nothing persists for the next session, so everything learned in it is lost.
  4. No decay. Outdated memories outrank fresh ones.
  5. Synchronous consolidation on the critical path. User turn blocks while the agent prunes its memory store.
  6. Letting the agent self-extract into a prompt blob. The agent writes freeform “notes” that grow unbounded — no structure, no retrieval, no scoping.

Takeaways for harness engineering

  1. Separate conversation memory from knowledge memory. Conversation = within session. Knowledge = across sessions. Different stores, different lifecycles, different loading strategies.
  2. Tier the store. Index in context always; topic files on demand; full transcripts on disk. The tier is chosen by token cost, not by content.
  3. Extract at compact time. The summarization pass is already paying for an LLM call — extract lessons in the same call. One API invocation, two artifacts.
  4. Scope every memory. User / project / local. No memory should be globally visible without a reason.
  5. Decay by default. Score memories by recency; the retrieval step breaks ties with freshness.
  6. Run consolidation in the background. Never on the user’s critical path. Treat it like garbage collection.
  7. Share memory across agents in the same team — they’re solving the same problem together. But hard-scope team memory at the permission boundary.

What this repo does

This distribution relies on Claude Code’s built-in auto memory system — file-based, at ~/.claude/projects/.../memory/, with an index file (MEMORY.md) capped at 200 lines. Topic-specific memory files load on demand. The rules in rules/ serve as the “persistent instruction file” pattern — auto-loaded every session, project-scoped.

Gap in this repo: no decay scoring. Claude Code’s built-in memory doesn’t have explicit recency weighting. No dream consolidation. No memory audit interface.


Open problems

  • Retention policy. Claude Code’s Dream task prunes between sessions, but there’s no explicit retention rule (“delete memories older than 6 months unless accessed”). Who decides what gets pruned?
  • Memory conflict resolution. Two memories contradict — “use Postgres” and “we migrated to MongoDB”. Which wins? Decay helps; explicit versioning would help more.
  • Memory audit. How does the user see what the agent remembers? Commands such as /memory list and /memory forget are the obvious interface, but they are rarely built.
  • Multi-tenant isolation. Single-user design assumed everywhere. Memory stores leak across users unless explicitly namespaced — open problem for platforms.
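For the retention question, an explicit rule of the kind quoted above ("delete memories older than 6 months unless accessed") could be expressed as in this sketch. It describes no existing behavior: the RetainedMemory fields, the pinned flag, and the 180-day approximation of six months are all assumptions.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical record shape for this sketch.
interface RetainedMemory {
  text: string;
  createdAt: number;       // epoch milliseconds
  lastAccessedAt?: number; // unset if the memory was never retrieved
  pinned?: boolean;        // user-protected memories are never expired
}

interface RetentionPolicy {
  maxIdleDays: number;
}

// A memory expires when it was neither created nor accessed within the window.
function isExpired(m: RetainedMemory, now: number, policy: RetentionPolicy): boolean {
  if (m.pinned) return false;
  const lastTouched = Math.max(m.createdAt, m.lastAccessedAt ?? 0);
  return now - lastTouched > policy.maxIdleDays * DAY_MS;
}

function applyRetention(
  memories: RetainedMemory[],
  now: number,
  policy: RetentionPolicy,
): { kept: RetainedMemory[]; expired: RetainedMemory[] } {
  const kept: RetainedMemory[] = [];
  const expired: RetainedMemory[] = [];
  for (const m of memories) {
    (isExpired(m, now, policy) ? expired : kept).push(m);
  }
  return { kept, expired };
}

const today = 1000 * DAY_MS; // arbitrary fixed clock for the example
const sixMonths: RetentionPolicy = { maxIdleDays: 180 };
const outcome = applyRetention(
  [
    { text: "Old and never used", createdAt: today - 400 * DAY_MS },
    { text: "Old but used last week", createdAt: today - 400 * DAY_MS, lastAccessedAt: today - 7 * DAY_MS },
    { text: "Old but pinned by the user", createdAt: today - 400 * DAY_MS, pinned: true },
  ],
  today,
  sixMonths,
);
// Only "Old and never used" lands in outcome.expired.
```

Making the rule explicit moves the pruning decision from the agent to a policy the user can read and change.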