Agent from Scratch Part 4: Memory

Memory = Context Window Expansion

In agent design, “memory” is a word that sounds complicated but maps to something concrete: getting information into the context window that wouldn’t otherwise fit.

A 256k context window sounds large until you try to fit the world’s knowledge into it. More importantly, you shouldn’t try — attention spreads thin over large contexts. Finding a tiny detail inside a massive blob of text is like finding a footnote inside an encyclopedia. The model can do it, but not reliably.

The Early Problem

Early LLMs had tiny context windows: a few thousand tokens was typical (GPT-3 shipped with 2,048), and 64k would have been generous. With the system prompt, guidance, and conversation history taking up space, you had almost nothing left for the actual knowledge that helps the model produce good answers.

This forced a key design decision: most knowledge lives outside the context window. You retrieve what you need, when you need it.

RAG (Retrieval-Augmented Generation) is the canonical solution. Knowledge graphs are another. Both are just strategies for deciding what to pull in and when.
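To make "deciding what to pull in" concrete, here is a minimal sketch of retrieval under a size budget. It is an illustration only: the word-overlap score stands in for a real embedding search, the word count stands in for a real tokenizer, and the names `tokenize`, `retrieve`, and `budget_words` are made up for this example.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], budget_words: int) -> list[str]:
    """Return the most relevant documents that fit within a word budget."""
    query_terms = tokenize(query)
    # Score each document by how many query terms it shares.
    scored = [(len(query_terms & tokenize(doc)), doc) for doc in documents]
    # Highest overlap first; sorted() is stable, so ties keep input order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)

    selected: list[str] = []
    used = 0
    for score, doc in ranked:
        cost = len(doc.split())  # assumption: words approximate tokens
        if score == 0 or used + cost > budget_words:
            continue  # irrelevant, or would overflow the budget
        selected.append(doc)
        used += cost
    return selected


docs = [
    "The deploy script lives in tools/deploy.sh",
    "Lunch menu for Friday",
    "Deploy failures are usually caused by stale credentials",
]
context = retrieve("why did deploy fail", docs, budget_words=20)
```

The budget is the important part: relevance decides the order, and the budget decides where to stop.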

The File System Mental Model

Here’s the simplest way to think about agent memory: it’s a text file.

Think about what you can do with a text file in an OS:

Read — load it into context when relevant
Append — add new information without overwriting
Overwrite — replace when the old content is no longer valid
Concat — merge multiple memory sources

That’s the complete set of memory operations you need. No magic required.
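The four operations above map almost one-to-one onto standard file calls. This is a sketch rather than the code from the repo: the function names are hypothetical, and it assumes each memory is a UTF-8 text file.

```python
from pathlib import Path


def read_memory(path: Path) -> str:
    """Read: return the file's text, or "" if the memory does not exist yet."""
    return path.read_text(encoding="utf-8") if path.exists() else ""


def append_memory(path: Path, entry: str) -> None:
    """Append: add a new entry on its own line, leaving old entries intact."""
    with path.open("a", encoding="utf-8") as f:
        f.write(entry.rstrip("\n") + "\n")


def overwrite_memory(path: Path, content: str) -> None:
    """Overwrite: replace the whole file when the old content is stale."""
    path.write_text(content.rstrip("\n") + "\n", encoding="utf-8")


def concat_memories(paths: list[Path]) -> str:
    """Concat: merge several memory files, skipping missing or empty ones."""
    parts = [read_memory(p).strip() for p in paths]
    return "\n\n".join(part for part in parts if part)
```

Everything else, such as relevance scoring and summarization, decides when to call these four and with what text.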

Triggers and Lifecycle

Every memory system needs a trigger — some event that causes the agent to create, update, or read memory.

In my implementation, the trigger is simple: end of session. After each session, the agent:

Scans existing memories for relevance
Summarizes what happened: key actions taken, what worked, what failed
Writes a new memory entry or updates an existing one

This creates a persistent record of agent experience. Over multiple sessions, the agent builds up a structured history of its own performance.
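The three end-of-session steps can be sketched as a single function. The assumptions are loud ones: memories live in a plain dict keyed by a hypothetical `topic` string, "relevance" is an exact key match, and `summarize` is a placeholder for whatever produces the summary (in practice, probably an LLM call).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Session:
    topic: str  # hypothetical key used to match existing memories
    transcript: list[str] = field(default_factory=list)


def on_session_end(
    session: Session,
    memories: dict[str, str],
    summarize: Callable[[list[str]], str],
) -> str:
    """Run the end-of-session trigger; return "created" or "updated"."""
    # 1. Scan existing memories for relevance (here: an exact topic match).
    existing = memories.get(session.topic)
    # 2. Summarize what happened in this session.
    summary = summarize(session.transcript)
    # 3. Write a new entry, or extend the one that already exists.
    if existing is None:
        memories[session.topic] = summary
        return "created"
    memories[session.topic] = existing + "\n" + summary
    return "updated"


# Stub summarizer for illustration; a real one would condense the transcript.
def join_lines(lines: list[str]) -> str:
    return " | ".join(lines)


store: dict[str, str] = {}
on_session_end(Session("deploy", ["ran script", "failed on auth"]), store, join_lines)
on_session_end(Session("deploy", ["rotated key", "succeeded"]), store, join_lines)
```

After the second call, `store["deploy"]` holds both summaries, one per line, which is the "structured history" in miniature.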

Memory as Training Data

Here’s where it gets interesting.

The summaries your agent writes are exactly the kind of data you’d want for fine-tuning. Key decisions made, correct choices, wrong turns, recovery patterns — this is behavioral signal in a clean format.

If you can afford to fine-tune (or when fine-tuning costs drop further), the memory log from a well-designed agent becomes a natural training dataset. Your model starts embodying the patterns of whoever built the agent.
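As a rough illustration, turning a memory log into a dataset can be a small export step. The entry fields (`task`, `summary`, `worked`) and the prompt/completion record shape are assumptions made for this sketch; a real fine-tuning pipeline defines its own schema, so treat this as a template to adapt.

```python
import json


def to_training_records(memory_log: list[dict]) -> list[dict]:
    """Turn memory entries into generic prompt/completion pairs."""
    records = []
    for entry in memory_log:
        if not entry.get("worked"):
            continue  # design choice: keep only runs whose approach succeeded
        records.append({
            "prompt": f"Task: {entry['task']}",
            "completion": entry["summary"],
        })
    return records


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


log = [
    {"task": "fix deploy", "summary": "Rotated the stale key, then redeployed.", "worked": True},
    {"task": "tune cache", "summary": "Raised TTL; hit rate fell.", "worked": False},
]
dataset = to_jsonl(to_training_records(log))
```

Filtering on `worked` is one option. Keeping failed runs with an explicit label is another, if you want the model to see wrong turns as well as recoveries.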

Evaluation Closes the Loop

In production, you don’t just write memories blindly. You run an evaluation pass after each session:

Did the agent achieve the goal?
Were the actions efficient?
Were any tools misused?

Only memories that pass evaluation get committed to long-term storage. Bad runs get flagged for review, not reinforced.
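A sketch of that gate, with the three questions reduced to fields on a report. The `RunReport` shape, the `step_budget` threshold for "efficient", and the zero-tolerance rule for tool errors are all assumptions chosen for illustration; deciding the real criteria is the hard part.

```python
from dataclasses import dataclass


@dataclass
class RunReport:
    goal_achieved: bool  # did the agent achieve the goal?
    steps_taken: int
    step_budget: int     # hypothetical threshold for "efficient"
    tool_errors: int     # count of misused tool calls
    summary: str


def passes_evaluation(report: RunReport) -> bool:
    """A run passes only if it met the goal, stayed in budget, and misused no tools."""
    return (
        report.goal_achieved
        and report.steps_taken <= report.step_budget
        and report.tool_errors == 0
    )


def commit_or_flag(
    report: RunReport,
    long_term: list[str],
    review_queue: list[RunReport],
) -> None:
    """Commit passing runs to long-term storage; flag the rest for review."""
    if passes_evaluation(report):
        long_term.append(report.summary)
    else:
        review_queue.append(report)


long_term: list[str] = []
review_queue: list[RunReport] = []
commit_or_flag(RunReport(True, 4, 10, 0, "Deployed after rotating key."), long_term, review_queue)
commit_or_flag(RunReport(True, 25, 10, 2, "Deployed, but retried blindly."), long_term, review_queue)
```

The second run reached the goal but still lands in the review queue, because it blew the step budget and misused tools.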

This is the same loop that makes humans better at their jobs: do, reflect, evaluate, adjust.

The Full Picture

Session starts
  → Load relevant memories into context
  → Agent executes task using skills + context

Session ends
  → Summarize what happened
  → Evaluate quality
  → Write/update memory
  → (Optional) Flag data for fine-tuning

Simple. No exotic architecture required. The complexity is in the evaluation step — deciding what counts as a good run is the hardest part.
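The whole lifecycle fits in one function if every hard part is passed in as a callable. This is a sketch under that assumption, not the repo's code: `is_relevant`, `execute`, `summarize`, and `evaluate` are hypothetical hooks, and the stubs at the bottom exist only to show the control flow.

```python
from typing import Callable


def run_session(
    task: str,
    memories: list[str],
    is_relevant: Callable[[str, str], bool],
    execute: Callable[[str, list[str]], str],
    summarize: Callable[[str], str],
    evaluate: Callable[[str], bool],
    collect_training_data: bool = False,
) -> dict:
    """Run one session through the full memory lifecycle."""
    # Session starts: load relevant memories into context.
    context = [m for m in memories if is_relevant(task, m)]
    # The agent executes the task using that context.
    transcript = execute(task, context)

    # Session ends: summarize, evaluate, then write only if the run passed.
    summary = summarize(transcript)
    passed = evaluate(transcript)
    if passed:
        memories.append(summary)
    return {
        "context": context,
        "summary": summary,
        "committed": passed,
        # Optional: mark good runs as candidates for fine-tuning.
        "flagged_for_fine_tuning": passed and collect_training_data,
    }


memories = ["deploy: rotate key first", "cache: TTL 60s"]
result = run_session(
    "deploy service",
    memories,
    is_relevant=lambda task, memory: memory.split(":")[0] in task,
    execute=lambda task, ctx: f"did {task} with {len(ctx)} memories",
    summarize=lambda transcript: "summary: " + transcript,
    evaluate=lambda transcript: True,
)
```

Swapping the `evaluate` stub for a real check is where most of the design effort goes.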

GitHub

Implementation is at github.com/Czhang0727. The memory system is the simplest module in the repo — a reminder that the best designs usually are.

The next post covers Hermes — a real-world agent hitting the limits of this design and what I built to fix it.
