Agent Memory Should Be a WAL (Write-Ahead Log), Not a Blob
A practical architecture for reliable agent memory: append-only events, checkpoints, compaction, and replay — borrowing the same ideas that make databases durable.
Most “agent memory” implementations start as a JSON file (or a vector store row) that gets overwritten every time the agent learns something new.
It works right up until it doesn’t:
- A process crashes mid-write and you corrupt the memory.
- Two things write concurrently and you lose updates.
- You can’t explain why the agent “knows” something.
- A bad tool call poisons the memory and you can’t roll it back.
The fix is not “better prompts.” It’s adopting a mental model that’s been battle-tested for decades:
Treat memory as an append-only Write-Ahead Log (WAL), plus checkpoints and compaction.
This post lays out a concrete, production-friendly design for agent memory that is:
- Durable (crash-safe)
- Auditable (you can trace how state was derived)
- Recoverable (replay to rebuild state)
- Composable (multiple agents/tools can write without stepping on each other)
The core idea: state is derived, the log is primary
Databases don’t primarily store “the latest state.” They store a history of changes and periodically summarize it.
Agent memory should work the same way.
- WAL = source of truth: every memory write becomes an immutable event.
- Checkpoint = fast restore: periodically snapshot derived state.
- Compaction = keep it sane: summarize older events into higher-level facts.
A minimal WAL event schema
You don’t need anything fancy. Start with:
```json
{
  "id": "evt_01J...",
  "ts": "2026-02-20T02:11:43.123Z",
  "actor": "agent:teddy",
  "kind": "memory.write",
  "scope": "customer:acme-inc",
  "source": {
    "channel": "telegram",
    "conversationId": "conv_...",
    "messageId": "..."
  },
  "payload": {
    "text": "Customer prefers CSV exports; PDF invoices cause delays",
    "confidence": 0.8,
    "ttlDays": 90,
    "tags": ["preference", "billing"]
  }
}
```
The important properties are:
- Append-only: never mutate an event in place.
- Scoped: memory is almost always contextual (per customer/project/user), not global.
- Provenance: keep the “why” (source) so you can audit and debug.
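As a minimal sketch of how events with these properties get built, here is a hypothetical `make_event` helper in Python. The field names follow the schema above; the helper itself (and the `agent:teddy` / `customer:acme-inc` values) are illustrative, not part of any particular library:

```python
import json
import uuid
from datetime import datetime, timezone

def make_event(actor, kind, scope, source, payload):
    """Build an immutable WAL event: id and ts are assigned once and never mutated."""
    return {
        "id": f"evt_{uuid.uuid4().hex}",           # unique, append-only identity
        "ts": datetime.now(timezone.utc).isoformat(),
        "actor": actor,                             # who wrote this (agent, tool, human)
        "kind": kind,                               # e.g. "memory.write"
        "scope": scope,                             # contextual boundary, not global
        "source": source,                           # provenance: the "why"
        "payload": payload,
    }

evt = make_event(
    actor="agent:teddy",
    kind="memory.write",
    scope="customer:acme-inc",
    source={"channel": "telegram", "conversationId": "conv_123"},
    payload={"text": "Customer prefers CSV exports", "confidence": 0.8},
)
```

Because the event is a plain dict, it serializes directly with `json.dumps(evt)` into the JSON-lines WAL format.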
Write path: append first, derive later
The number one rule is in the name: write-ahead.
- Append the event to the WAL
- Acknowledge the write
- Update any derived views (indexes, summaries, embeddings) asynchronously
```mermaid
sequenceDiagram
  autonumber
  participant A as Agent
  participant W as WAL (append-only)
  participant I as Indexer (async)
  participant S as Snapshot store
  A->>W: append(memory.write)
  W-->>A: ack (durable)
  par async derive
    W->>I: stream events
    I->>I: update retrieval index / embeddings
    I->>S: checkpoint derived state (periodic)
  end
```
This separation matters because it prevents the common failure mode:
“We wrote to the vector DB, but the process crashed halfway through updating the summary, so now memory is inconsistent.”
With WAL-first, you always have a single canonical history you can replay.
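The write-ahead step itself can be very small. Here is a sketch, assuming a JSON-lines file as the WAL backend; `append_event` is a hypothetical function, and the key point is that `fsync` runs before the ack, so derived views can lag or crash without losing the canonical event:

```python
import json
import os
import tempfile

def append_event(wal_path, event):
    """Write-ahead: the event is durable on disk before anyone is told it succeeded."""
    line = json.dumps(event, separators=(",", ":"))
    with open(wal_path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())  # survive a crash immediately after the ack
    return event["id"]  # ack; derived views (indexes, summaries) update async

# Illustrative usage against a temp file standing in for the real WAL.
wal = os.path.join(tempfile.mkdtemp(), "memory.wal")
eid = append_event(wal, {"id": "evt_1", "kind": "memory.write",
                         "payload": {"text": "prefers CSV"}})
```

One line per event keeps the log trivially appendable and replayable with a streaming read; a real deployment would add rotation and a checksum per line.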
Read path: answer from derived state, with a trace back to events
At runtime you don’t want to scan an ever-growing log. You want fast reads.
So reads should hit:
- a materialized “current memory” view (latest facts/preferences)
- a retrieval index (semantic search / tags / recency)
- and optionally the event trail for “why do we believe this?”
```mermaid
flowchart LR
  Q[Prompt / tool context request]
  V[Derived memory view\n"current facts"]
  R[Retrieval index\nsemantic + filters]
  L[WAL events\nprovenance]
  C[Composed context\nfor the agent]
  Q --> V --> C
  Q --> R --> C
  C -. "optional: cite" .-> L
```
In practice, this is how you get both:
- speed (most reads)
- explainability (when debugging)
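A sketch of that composition step, with a toy in-memory stand-in for the derived view and the retrieval index (`compose_context`, `retrieve`, and the `evt_id` field are all hypothetical names for illustration):

```python
def compose_context(current_facts, retrieve, query, k=3):
    """Fast path: read derived state; the WAL is only referenced via evidence ids."""
    hits = retrieve(query)[:k]
    return {
        "facts": current_facts,                     # materialized "current memory" view
        "retrieved": hits,                          # semantic / tag / recency matches
        "evidence": [h["evt_id"] for h in hits],    # trace back to WAL events
    }

# Toy stand-ins for the real derived view and index.
current_facts = {"export_format": "CSV"}
index = [
    {"text": "prefers CSV exports", "evt_id": "evt_a1", "score": 0.92},
    {"text": "PDF invoices cause delays", "evt_id": "evt_b2", "score": 0.71},
]

def retrieve(query):
    # Stand-in for a real semantic search: just rank by stored score.
    return sorted(index, key=lambda h: -h["score"])

ctx = compose_context(current_facts, retrieve, "billing preferences")
```

The `evidence` list is what makes "why do we believe this?" answerable: each retrieved item points at the WAL event that introduced it.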
Checkpoints: make recovery fast
If you only have a log, recovery means replaying everything from the beginning.
Instead, periodically write a checkpoint of the derived state:
- “current facts” per scope
- summary text
- last processed event id
Then on restart:
- load the latest checkpoint
- replay WAL events after that checkpoint
```mermaid
stateDiagram-v2
  [*] --> LoadCheckpoint
  LoadCheckpoint --> ReplayTail: from lastEventId
  ReplayTail --> Ready
  Ready --> AppendEvents: new writes
  AppendEvents --> Ready
```
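The restart sequence can be sketched in a few lines. This `recover` function is illustrative (it assumes payloads are flat key/value facts and that the WAL is read in order); the shape of `checkpoint` mirrors the list above:

```python
def recover(checkpoint, wal_events):
    """Load the latest checkpoint, then replay only the WAL tail after it."""
    state = dict(checkpoint["facts"])
    last = checkpoint["last_event_id"]
    replaying = last is None          # no checkpoint yet: replay everything
    for evt in wal_events:            # WAL is ordered, so one pass suffices
        if replaying:
            state.update(evt["payload"])
        elif evt["id"] == last:
            replaying = True          # everything after this is the un-checkpointed tail
    return state

checkpoint = {
    "facts": {"export_format": "CSV", "tier": "enterprise"},
    "last_event_id": "evt_2",
}
wal = [
    {"id": "evt_1", "payload": {"export_format": "CSV"}},
    {"id": "evt_2", "payload": {"tier": "enterprise"}},
    {"id": "evt_3", "payload": {"region": "eu-west"}},  # only this one is replayed
]
state = recover(checkpoint, wal)
```

Recovery cost is now proportional to the tail since the last checkpoint, not to the full history.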
Compaction: memory that stays useful
Logs grow. Humans summarize. Systems should too.
Compaction can be as simple as:
- merging repeated preferences into one stable fact
- collapsing a noisy stream (“asked about pricing 8 times”) into one signal (“pricing-sensitive”)
- expiring time-limited facts (TTL)
A compacted “fact” should keep a pointer back to its supporting evidence:
fact: "prefers CSV"
evidence: [evt_..., evt_...]
That way you can revise or delete safely (and explain why).
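The simplest version of that merge, sketched in Python (the `compact` helper and the event shapes are illustrative; a real compactor would also handle TTLs and semantic near-duplicates, not just exact text matches):

```python
from collections import defaultdict

def compact(events):
    """Merge repeated preference events into one fact, keeping evidence pointers."""
    groups = defaultdict(list)
    for evt in events:
        groups[evt["payload"]["text"]].append(evt["id"])
    # Each compacted fact points back at the WAL events that support it.
    return [{"fact": text, "evidence": ids} for text, ids in groups.items()]

events = [
    {"id": "evt_1", "payload": {"text": "prefers CSV"}},
    {"id": "evt_2", "payload": {"text": "prefers CSV"}},
    {"id": "evt_3", "payload": {"text": "pricing-sensitive"}},
]
facts = compact(events)
```

Because evidence ids survive compaction, deleting a poisoned source event tells you exactly which derived facts need revisiting.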
Why this matters for multi-agent systems
In a multi-agent setup, concurrency and coordination make “blob memory” fail faster.
A WAL design gives you:
- Ordering: what happened first/last
- Idempotency: replay doesn’t duplicate state
- Isolation by scope: customer/project/user boundaries
- Debuggability: point to the exact event that introduced a bad belief
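Idempotency in particular is cheap to get right when every event has a stable id. A sketch, assuming a per-consumer `seen` set (in production this would live in the checkpoint, not in memory):

```python
def apply_once(state, seen, evt):
    """Idempotent apply: replaying or re-delivering the same event id is a no-op."""
    if evt["id"] in seen:
        return state          # duplicate delivery from a retry or replay: ignore
    seen.add(evt["id"])
    state.update(evt["payload"])
    return state

state, seen = {}, set()
evt = {"id": "evt_1", "payload": {"pref": "CSV"}}
apply_once(state, seen, evt)
apply_once(state, seen, evt)  # second delivery changes nothing
```

This is what lets multiple agents (or a crashed-and-restarted indexer) re-read the same WAL segment without duplicating state.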
Implementation notes (what to sanitize)
If you adopt this pattern in a real production environment:
- Never log secrets in raw event payloads. Store references/handles, not credentials.
- Sanitize hostnames and internal paths in any user-visible traces.
- Keep retention policies explicit (TTL, deletion requests, and legal constraints).
The punchline
If you want agent memory that behaves like production infrastructure, build it like production infrastructure.
A WAL is not glamorous. But it turns “memory” from a fragile artifact into a reliable subsystem:
- append-only events
- checkpoints
- compaction
- replay
- provenance
That’s how you make memory you can trust.