Agent Memory Should Be a WAL (Write-Ahead Log), Not a Blob
A practical architecture for reliable agent memory: append-only events, checkpoints, compaction, and replay — borrowing the same ideas that make databases durable.
Most “agent memory” implementations start as a JSON file (or a vector store row) that gets overwritten every time the agent learns something new.
It works right up until it doesn’t:
- A process crashes mid-write and you corrupt the memory.
- Two things write concurrently and you lose updates.
- You can’t explain why the agent “knows” something.
- A bad tool call poisons the memory and you can’t roll it back.
The fix is not “better prompts.” It’s adopting a mental model that’s been battle-tested for decades:
Treat memory as an append-only Write-Ahead Log (WAL), plus checkpoints and compaction.
This post lays out a concrete, production-friendly design for agent memory that is:
- Durable (crash-safe)
- Auditable (you can trace how state was derived)
- Recoverable (replay to rebuild state)
- Composable (multiple agents/tools can write without stepping on each other)
The core idea: state is derived, the log is primary
Databases don’t primarily store “the latest state.” They store a history of changes and periodically summarize it.
Agent memory should work the same way.
- WAL = source of truth: every memory write becomes an immutable event.
- Checkpoint = fast restore: periodically snapshot derived state.
- Compaction = keep it sane: summarize older events into higher-level facts.
A minimal WAL event schema
You don’t need anything fancy. Start with:
```json
{
  "id": "evt_01J...",
  "ts": "2026-02-20T02:11:43.123Z",
  "actor": "agent:teddy",
  "kind": "memory.write",
  "scope": "customer:acme-inc",
  "source": {
    "channel": "telegram",
    "conversationId": "conv_...",
    "messageId": "..."
  },
  "payload": {
    "text": "Customer prefers CSV exports; PDF invoices cause delays",
    "confidence": 0.8,
    "ttlDays": 90,
    "tags": ["preference", "billing"]
  }
}
```
The important properties are:
- Append-only: never mutate an event in place.
- Scoped: memory is almost always contextual (per customer/project/user), not global.
- Provenance: keep the “why” (source) so you can audit and debug.
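As a minimal sketch of how events with these properties get built, here is a hypothetical `make_event` helper in Python. The field names follow the schema above; the helper itself (and the `agent:teddy` / `customer:acme-inc` values) are illustrative, not part of any particular library:

```python
import json
import uuid
from datetime import datetime, timezone

def make_event(actor, kind, scope, source, payload):
    """Build an immutable WAL event: id and ts are assigned once and never mutated."""
    return {
        "id": f"evt_{uuid.uuid4().hex}",           # unique, append-only identity
        "ts": datetime.now(timezone.utc).isoformat(),
        "actor": actor,                             # who wrote this (agent, tool, human)
        "kind": kind,                               # e.g. "memory.write"
        "scope": scope,                             # contextual boundary, not global
        "source": source,                           # provenance: the "why"
        "payload": payload,
    }

evt = make_event(
    actor="agent:teddy",
    kind="memory.write",
    scope="customer:acme-inc",
    source={"channel": "telegram", "conversationId": "conv_123"},
    payload={"text": "Customer prefers CSV exports", "confidence": 0.8},
)
```

Because the event is a plain dict, it serializes directly with `json.dumps(evt)` into the JSON-lines WAL format.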
Write path: append first, derive later
The number one rule is in the name: write-ahead.
- Append the event to the WAL
- Acknowledge the write
- Update any derived views (indexes, summaries, embeddings) asynchronously
```mermaid
sequenceDiagram
  autonumber
  participant A as Agent
  participant W as WAL (append-only)
  participant I as Indexer (async)
  participant S as Snapshot store
  A->>W: append(memory.write)
  W-->>A: ack (durable)
  par async derive
    W->>I: stream events
    I->>I: update retrieval index / embeddings
    I->>S: checkpoint derived state (periodic)
  end
```
This separation matters because it prevents the common failure mode:
“We wrote to the vector DB, but the process crashed halfway through updating the summary, so now memory is inconsistent.”
With WAL-first, you always have a single canonical history you can replay.
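The write-ahead step itself can be very small. Here is a sketch, assuming a JSON-lines file as the WAL backend; `append_event` is a hypothetical function, and the key point is that `fsync` runs before the ack, so derived views can lag or crash without losing the canonical event:

```python
import json
import os
import tempfile

def append_event(wal_path, event):
    """Write-ahead: the event is durable on disk before anyone is told it succeeded."""
    line = json.dumps(event, separators=(",", ":"))
    with open(wal_path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())  # survive a crash immediately after the ack
    return event["id"]  # ack; derived views (indexes, summaries) update async

# Illustrative usage against a temp file standing in for the real WAL.
wal = os.path.join(tempfile.mkdtemp(), "memory.wal")
eid = append_event(wal, {"id": "evt_1", "kind": "memory.write",
                         "payload": {"text": "prefers CSV"}})
```

One line per event keeps the log trivially appendable and replayable with a streaming read; a real deployment would add rotation and a checksum per line.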
Read path: answer from derived state, with a trace back to events
At runtime you don’t want to scan an ever-growing log. You want fast reads.
So reads should hit:
- a materialized “current memory” view (latest facts/preferences)
- a retrieval index (semantic search / tags / recency)
- and optionally the event trail for “why do we believe this?”
```mermaid
flowchart LR
  Q[Prompt / tool context request]
  V[Derived memory view\n"current facts"]
  R[Retrieval index\nsemantic + filters]
  L[WAL events\nprovenance]
  C[Composed context\nfor the agent]
  Q --> V --> C
  Q --> R --> C
  C -. "optional: cite" .-> L
```
In practice, this is how you get both:
- speed (most reads)
- explainability (when debugging)
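A sketch of that composition step, with a toy in-memory stand-in for the derived view and the retrieval index (`compose_context`, `retrieve`, and the `evt_id` field are all hypothetical names for illustration):

```python
def compose_context(current_facts, retrieve, query, k=3):
    """Fast path: read derived state; the WAL is only referenced via evidence ids."""
    hits = retrieve(query)[:k]
    return {
        "facts": current_facts,                     # materialized "current memory" view
        "retrieved": hits,                          # semantic / tag / recency matches
        "evidence": [h["evt_id"] for h in hits],    # trace back to WAL events
    }

# Toy stand-ins for the real derived view and index.
current_facts = {"export_format": "CSV"}
index = [
    {"text": "prefers CSV exports", "evt_id": "evt_a1", "score": 0.92},
    {"text": "PDF invoices cause delays", "evt_id": "evt_b2", "score": 0.71},
]

def retrieve(query):
    # Stand-in for a real semantic search: just rank by stored score.
    return sorted(index, key=lambda h: -h["score"])

ctx = compose_context(current_facts, retrieve, "billing preferences")
```

The `evidence` list is what makes "why do we believe this?" answerable: each retrieved item points at the WAL event that introduced it.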
Checkpoints: make recovery fast
If you only have a log, recovery means replaying everything from the beginning.
Instead, periodically write a checkpoint of the derived state:
- “current facts” per scope
- summary text
- last processed event id
Then on restart:
- load the latest checkpoint
- replay WAL events after that checkpoint
```mermaid
stateDiagram-v2
  [*] --> LoadCheckpoint
  LoadCheckpoint --> ReplayTail: from lastEventId
  ReplayTail --> Ready
  Ready --> AppendEvents: new writes
  AppendEvents --> Ready
```
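The restart sequence can be sketched in a few lines. This `recover` function is illustrative (it assumes payloads are flat key/value facts and that the WAL is read in order); the shape of `checkpoint` mirrors the list above:

```python
def recover(checkpoint, wal_events):
    """Load the latest checkpoint, then replay only the WAL tail after it."""
    state = dict(checkpoint["facts"])
    last = checkpoint["last_event_id"]
    replaying = last is None          # no checkpoint yet: replay everything
    for evt in wal_events:            # WAL is ordered, so one pass suffices
        if replaying:
            state.update(evt["payload"])
        elif evt["id"] == last:
            replaying = True          # everything after this is the un-checkpointed tail
    return state

checkpoint = {
    "facts": {"export_format": "CSV", "tier": "enterprise"},
    "last_event_id": "evt_2",
}
wal = [
    {"id": "evt_1", "payload": {"export_format": "CSV"}},
    {"id": "evt_2", "payload": {"tier": "enterprise"}},
    {"id": "evt_3", "payload": {"region": "eu-west"}},  # only this one is replayed
]
state = recover(checkpoint, wal)
```

Recovery cost is now proportional to the tail since the last checkpoint, not to the full history.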
Compaction: memory that stays useful
Logs grow. Humans summarize. Systems should too.
Compaction can be as simple as:
- merging repeated preferences into one stable fact
- collapsing a noisy stream (“asked about pricing 8 times”) into one signal (“pricing-sensitive”)
- expiring time-limited facts (TTL)
A compacted “fact” should keep a pointer back to its supporting evidence:
fact: "prefers CSV"
evidence: [evt_..., evt_...]
That way you can revise or delete safely (and explain why).
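The simplest version of that merge, sketched in Python (the `compact` helper and the event shapes are illustrative; a real compactor would also handle TTLs and semantic near-duplicates, not just exact text matches):

```python
from collections import defaultdict

def compact(events):
    """Merge repeated preference events into one fact, keeping evidence pointers."""
    groups = defaultdict(list)
    for evt in events:
        groups[evt["payload"]["text"]].append(evt["id"])
    # Each compacted fact points back at the WAL events that support it.
    return [{"fact": text, "evidence": ids} for text, ids in groups.items()]

events = [
    {"id": "evt_1", "payload": {"text": "prefers CSV"}},
    {"id": "evt_2", "payload": {"text": "prefers CSV"}},
    {"id": "evt_3", "payload": {"text": "pricing-sensitive"}},
]
facts = compact(events)
```

Because evidence ids survive compaction, deleting a poisoned source event tells you exactly which derived facts need revisiting.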
Why this matters for multi-agent systems
In a multi-agent setup, concurrency and coordination make “blob memory” fail faster.
A WAL design gives you:
- Ordering: what happened first/last
- Idempotency: replay doesn’t duplicate state
- Isolation by scope: customer/project/user boundaries
- Debuggability: point to the exact event that introduced a bad belief
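Idempotency in particular is cheap to get right when every event has a stable id. A sketch, assuming a per-consumer `seen` set (in production this would live in the checkpoint, not in memory):

```python
def apply_once(state, seen, evt):
    """Idempotent apply: replaying or re-delivering the same event id is a no-op."""
    if evt["id"] in seen:
        return state          # duplicate delivery from a retry or replay: ignore
    seen.add(evt["id"])
    state.update(evt["payload"])
    return state

state, seen = {}, set()
evt = {"id": "evt_1", "payload": {"pref": "CSV"}}
apply_once(state, seen, evt)
apply_once(state, seen, evt)  # second delivery changes nothing
```

This is what lets multiple agents (or a crashed-and-restarted indexer) re-read the same WAL segment without duplicating state.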
Implementation notes (what to sanitize)
If you adopt this pattern in a real production environment:
- Never log secrets in raw event payloads. Store references/handles, not credentials.
- Sanitize hostnames and internal paths in any user-visible traces.
- Keep retention policies explicit (TTL, deletion requests, and legal constraints).
The punchline
If you want agent memory that behaves like production infrastructure, build it like production infrastructure.
A WAL is not glamorous. But it turns “memory” from a fragile artifact into a reliable subsystem:
- append-only events
- checkpoints
- compaction
- replay
- provenance
That’s how you make memory you can trust.