<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet href="/rss-style.xsl" type="text/xsl"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>Chenyi&apos;s Blog</title><description>Thoughts on AI, Engineering &amp; Life</description><link>https://blog.chenyi.ai/</link><language>en</language><atom:link href="https://blog.chenyi.ai/rss.xml" rel="self" type="application/rss+xml"/><lastBuildDate>Sun, 07 Jun 2026 05:31:38 GMT</lastBuildDate><generator>Astro RSS</generator><item><title>Hermes Agent: When Your Agent Has Too Many Skills</title><link>https://blog.chenyi.ai/posts/hermes-skill-management/</link><guid isPermaLink="true">https://blog.chenyi.ai/posts/hermes-skill-management/</guid><description>Real-world lesson from running Hermes at scale: too many skills makes your agent go nuts. Here&apos;s the system I built to fix it.</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;img src=&quot;https://blog.chenyi.ai/images/hermes-skill-management-diagram.png&quot; alt=&quot;Hermes Agent: When Your Agent Has Too Many Skills&quot; style=&quot;border-radius: 1rem; margin-bottom: 1rem; max-width: 100%; height: auto;&quot; /&gt;&lt;h2&gt;The Problem Nobody Talks About&lt;/h2&gt;
&lt;p&gt;When any agent has too many skills — and by &quot;too many&quot; I mean past some fuzzy threshold that depends on skill complexity and overlap — &lt;strong&gt;the agent will eventually go nuts&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Here&apos;s a concrete example. I was running Hermes and had both a Gmail skill and a Google Workspace skill. They overlapped. At some point, the Gmail skill&apos;s API went out of date. Every time the agent called it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&quot;Sorry, API failed. Let me directly fetch the web... oh, I don&apos;t have access. Let me rethink... actually, wait, you have another skill that might work...&quot;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Burning tokens. Spinning in circles. Not working.&lt;/p&gt;
&lt;p&gt;The obvious fix — manually review and clean up the skills — doesn&apos;t scale. Skills aren&apos;t static. They&apos;re more like repos: they need to be maintained. APIs change, tools break, better patterns emerge.&lt;/p&gt;
&lt;p&gt;I needed AI to maintain the skills, not me.&lt;/p&gt;
&lt;h2&gt;The Architecture&lt;/h2&gt;
&lt;p&gt;Here&apos;s the full flow of what I built (Codex and Claude Code wrote 100% of it, I just described the flows 😅):&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/hermes-skill-management-diagram.png&quot; alt=&quot;Hermes Skill Management architecture — Skills Manager at center, fed by memory components (Use Memory, Local Skill Storage/RAGized, History COT→value) and CronSignal. Outputs to Install Reduced Skills or Rewrite Skills, both feeding into MAGIC EVALUATION FOLKS!!, which triggers actual skill update and index reload.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;And the agent delegation model that makes it work:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/hermes-agent-architecture.png&quot; alt=&quot;Main Agent architecture — Main Agent coordinates with Plan, then delegates to Sub-Agent, which receives only the skills it needs from Skill Manager&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;1. Skill Manager with RAG-based Deduplication&lt;/h3&gt;
&lt;p&gt;Before installing any new skill, run it through a skill manager that checks for redundancy.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Match by keyword and semantic embedding&lt;/li&gt;
&lt;li&gt;Also use tags: &quot;google&quot;, &quot;productivity&quot;, &quot;stock trading&quot; — not just embedding similarity&lt;/li&gt;
&lt;li&gt;At ~5k skills, this runs fast enough to be practical&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If a new skill is too similar to an existing one, reject it or merge the concepts.&lt;/p&gt;
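&lt;p&gt;A minimal sketch of that redundancy check, assuming each skill carries a name, a tag list, and a one-line description. The &lt;code&gt;embed&lt;/code&gt; function is a toy bag-of-words stand-in for a real embedding model, and the 0.6 threshold and the &lt;code&gt;find_duplicates&lt;/code&gt; name are made up for illustration, not taken from the Hermes code:&lt;/p&gt;

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: bag-of-words token counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def find_duplicates(new_skill, installed, threshold=0.6):
    """Return names of installed skills that share a tag and look semantically similar."""
    new_vec = embed(new_skill["description"])
    hits = []
    for skill in installed:
        shares_tag = bool(set(new_skill["tags"]).intersection(skill["tags"]))
        score = cosine(new_vec, embed(skill["description"]))
        if shares_tag and score >= threshold:
            hits.append(skill["name"])
    return hits

installed = [
    {"name": "google-workspace", "tags": ["google", "productivity"],
     "description": "read and send gmail email and manage google calendar"},
    {"name": "stock-quotes", "tags": ["stock trading"],
     "description": "fetch stock prices and place trades"},
]
gmail = {"name": "gmail", "tags": ["google"],
         "description": "read and send gmail email"}
duplicates = find_duplicates(gmail, installed)
```

&lt;p&gt;Requiring both a shared tag and a high similarity score keeps two unrelated skills with similar wording from being merged by accident.&lt;/p&gt;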
&lt;h3&gt;2. Telemetry System&lt;/h3&gt;
&lt;p&gt;Every skill call gets logged:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Success or failure&lt;/li&gt;
&lt;li&gt;Chain-of-thought trace&lt;/li&gt;
&lt;li&gt;Token cost&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Stored in local SQL + blob storage. This is the data layer that makes everything else possible.&lt;/p&gt;
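&lt;p&gt;Roughly what that log could look like with SQLite from the Python standard library. The table name, column names, and blob paths are assumptions for illustration; the real schema may differ:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real setup would use a file path
conn.execute(
    """CREATE TABLE skill_calls (
           id INTEGER PRIMARY KEY,
           skill TEXT NOT NULL,
           success INTEGER NOT NULL,
           tokens INTEGER NOT NULL,
           cot_blob_path TEXT
       )"""
)

def log_call(skill, success, tokens, cot_blob_path=None):
    # The chain-of-thought trace itself lives in blob storage; SQL keeps only its path.
    conn.execute(
        "INSERT INTO skill_calls (skill, success, tokens, cot_blob_path) VALUES (?, ?, ?, ?)",
        (skill, int(success), tokens, cot_blob_path),
    )
    conn.commit()

log_call("gmail", False, 1800, "blobs/gmail-0001.txt")
log_call("gmail", False, 2100, "blobs/gmail-0002.txt")
log_call("google-workspace", True, 600, "blobs/gws-0001.txt")

# Per-skill totals: calls, successes, and tokens spent.
rows = conn.execute(
    "SELECT skill, COUNT(*), SUM(success), SUM(tokens) FROM skill_calls GROUP BY skill ORDER BY skill"
).fetchall()
```

&lt;p&gt;Keeping only a path to the trace in SQL keeps the table small, so the aggregate queries the audit job needs stay cheap.&lt;/p&gt;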
&lt;h3&gt;3. Installation Filter&lt;/h3&gt;
&lt;p&gt;On every new extension/plugin/skill install, the skill manager runs first. The filter compares the new skill against existing ones and reduces overlap before it lands in the system.&lt;/p&gt;
&lt;h3&gt;4. Weekly Cron Audit&lt;/h3&gt;
&lt;p&gt;A cron job (I haven&apos;t tuned the trigger yet — tell me a better one) does a delta review of the telemetry logs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Find skills with high failure rates or bloated chain-of-thought (CoT) traces&lt;/li&gt;
&lt;li&gt;Decide: modify the skill to be more efficient, or delete it&lt;/li&gt;
&lt;li&gt;If deleting, use web search to find or create a replacement&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Critical:&lt;/strong&gt; make sure your eval environment is stable before running this. You don&apos;t want the audit job to delete a working skill because it was measured during a bad network day.&lt;/p&gt;
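&lt;p&gt;One cheap guard against the bad-network-day problem is to refuse to judge a skill on too few calls. A sketch, where the &lt;code&gt;audit&lt;/code&gt; function and both thresholds are placeholders I picked for the example, not tuned values:&lt;/p&gt;

```python
def audit(stats, max_failure_rate=0.5, min_calls=20):
    """Split skills into ones to review and ones with too little data to judge.

    stats maps a skill name to a (calls, failures) pair from the telemetry log.
    """
    review, insufficient = [], []
    for skill, (calls, failures) in sorted(stats.items()):
        if min_calls > calls:
            # Too few samples: a handful of failures could just be a bad day.
            insufficient.append(skill)
        elif failures / calls > max_failure_rate:
            review.append(skill)
    return review, insufficient

weekly_stats = {
    "gmail": (40, 31),
    "google-workspace": (55, 3),
    "weather": (4, 4),
}
review, insufficient = audit(weekly_stats)
```

&lt;p&gt;Here the weather skill failed every call but only ran four times, so it is held back instead of being flagged for deletion.&lt;/p&gt;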
&lt;h3&gt;5. Main Agent Restructure&lt;/h3&gt;
&lt;p&gt;The main agent no longer holds specific skills directly. Instead:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Main agent receives a task&lt;/li&gt;
&lt;li&gt;Spawns a sub-agent&lt;/li&gt;
&lt;li&gt;Sub-agent calls the skill manager to install what it needs&lt;/li&gt;
&lt;li&gt;Sub-agent executes&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The main agent&apos;s only job is &lt;strong&gt;managing other agents and planning&lt;/strong&gt;. I&apos;m still thinking about whether planning and execution should be split further — probably over-engineering at this stage.&lt;/p&gt;
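&lt;p&gt;The delegation steps above can be sketched in a few classes. Everything here is hypothetical scaffolding (the class names, the hard-coded planner, the in-memory skill library) meant only to show that the main agent holds no skills itself:&lt;/p&gt;

```python
SKILL_LIBRARY = {
    "gmail": "how to read and send mail",
    "calendar": "how to create events",
    "stock-quotes": "how to fetch prices",
}

class SkillManager:
    def install(self, names):
        # Hand back only the requested skills, never the whole library.
        return {name: SKILL_LIBRARY[name] for name in names if name in SKILL_LIBRARY}

class SubAgent:
    def __init__(self, task, needed_skills, manager):
        self.task = task
        self.skills = manager.install(needed_skills)

    def run(self):
        return f"done: {self.task} using {sorted(self.skills)}"

class MainAgent:
    """Plans and delegates; holds no skills of its own."""

    def __init__(self, manager):
        self.manager = manager

    def plan(self, task):
        # Placeholder planner: a real one would ask the model which skills the task needs.
        return ["gmail"] if "email" in task else []

    def handle(self, task):
        sub_agent = SubAgent(task, self.plan(task), self.manager)
        return sub_agent.run()

result = MainAgent(SkillManager()).handle("send an email to the team")
```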
&lt;h2&gt;Why This Changes the Design Fundamentally&lt;/h2&gt;
&lt;p&gt;The skill management system forces a question I hadn&apos;t thought about clearly before: &lt;strong&gt;what should the main agent be good at?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;My answer: not much, specifically. The main agent should be good at delegation and planning. Everything else — tool use, skill selection, domain expertise — gets handled by specialized sub-agents that spin up with exactly the skills they need.&lt;/p&gt;
&lt;p&gt;This is closer to how real teams work. A good manager doesn&apos;t know how to do every job on the team. They know who to call and what to ask for.&lt;/p&gt;
&lt;h2&gt;What&apos;s Left (TODO)&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Better trigger for the audit cron (weekly is arbitrary)&lt;/li&gt;
&lt;li&gt;Web search integration for auto-replacing deleted skills&lt;/li&gt;
&lt;li&gt;Eval environment stability before running automated cleanup&lt;/li&gt;
&lt;li&gt;Split planning into a separate agent (maybe)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;On the Birth Announcement&lt;/h2&gt;
&lt;p&gt;Yes, I buried the lede. My first baby, Grace, was just born. Between her and a top-priority work task, the Agent from Scratch series is delayed. But I&apos;m not giving up on it. Q.Q&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;This is the Hermes architecture as of June 2026. The code is on &lt;a href=&quot;https://github.com/Czhang0727&quot;&gt;GitHub&lt;/a&gt; — Claude Code wrote it, I just had the ideas.&lt;/em&gt;&lt;/p&gt;
</content:encoded><category>Agent from Scratch</category><category>AI Agents</category><category>Hermes</category><category>Skill Management</category><category>RAG</category><category>Telemetry</category><author>Chenyi Zhang</author></item><item><title>Hermes Agent: Building Real Multi-Agent Support</title><link>https://blog.chenyi.ai/posts/hermes-multi-agent/</link><guid isPermaLink="true">https://blog.chenyi.ai/posts/hermes-multi-agent/</guid><description>HermesAgent has a built-in delegate_task tool. I found the problem with it — and built process-isolated sub-agents that actually retain what they learn.</description><pubDate>Fri, 15 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;img src=&quot;https://blog.chenyi.ai/images/hermes-agent-architecture.png&quot; alt=&quot;Hermes Agent: Building Real Multi-Agent Support&quot; style=&quot;border-radius: 1rem; margin-bottom: 1rem; max-width: 100%; height: auto;&quot; /&gt;&lt;h2&gt;The Problem with Hermes&apos;s Built-in Multi-Agent&lt;/h2&gt;
&lt;p&gt;HermesAgent ships with &lt;code&gt;delegate_task&lt;/code&gt; — it spins up sub-agents in-process, fast and simple. But look at the source code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;DELEGATE_BLOCKED_TOOLS = frozenset({&quot;delegate_task&quot;, &quot;clarify&quot;, &quot;memory&quot;, ...})
child = AIAgent(..., skip_memory=True, ...)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Every insight a sub-agent develops &lt;strong&gt;dies when the thread exits&lt;/strong&gt;. The swarm does work, but never gets smarter.&lt;/p&gt;
&lt;p&gt;That&apos;s the fundamental problem. Sub-agents are disposable compute, not collaborative intelligence. I wanted something different.&lt;/p&gt;
&lt;h2&gt;What I Built Instead&lt;/h2&gt;
&lt;p&gt;Each sub-agent is a &lt;strong&gt;complete Hermes instance&lt;/strong&gt; — own OS process, own config, own state, full memory access.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/hermes-agent-architecture.png&quot; alt=&quot;Main Agent delegates to Sub-Agent via Skill Manager, injecting only the skills needed&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;The Lifecycle&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;Spawn → Execute → Handoff → Complete → Merge Learnings → Cleanup
&lt;/code&gt;&lt;/pre&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Spawn&lt;/strong&gt;: &lt;code&gt;spawn-agent.sh&lt;/code&gt; snapshots the main agent&apos;s config into an isolated instance&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Execute&lt;/strong&gt;: The sub-agent runs with full autonomy — no restricted tools, real memory&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Handoff&lt;/strong&gt;: Sub-agent writes a structured handoff with findings, memory updates, and skill recommendations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Complete&lt;/strong&gt;: &lt;code&gt;complete-agent.sh&lt;/code&gt; validates the handoff, sends results via message queue, deletes the instance directory immediately&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Merge&lt;/strong&gt;: The main agent absorbs learnings through the native memory pipeline&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Instances are ephemeral. Learnings are permanent.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;Mistakes I Made Along the Way&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Zombie agents in the registry.&lt;/strong&gt; Strict bash mode + missing handoff file = the cleanup script exits early, leaving dead entries behind. Fixed with graceful degradation — always clean up the registry, even on failure.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agent ignored my sub-agent skill.&lt;/strong&gt; Given a choice between native &lt;code&gt;delegate_task&lt;/code&gt; and my shell script approach, the LLM picked the simpler option every time. The model naturally gravitates to the path of least resistance. Fixed by adding a Decision Guide explaining when each approach is appropriate — now the agent knows when to use the lightweight in-process delegate vs. when to spin up a full isolated instance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Wrong API keys.&lt;/strong&gt; The spawn script was pulling from the global Hermes install instead of the project-local agent. Fixed to fork from the running instance so the sub-agent inherits the correct context.&lt;/p&gt;
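&lt;p&gt;The zombie-registry fix boils down to one pattern: put the cleanup where it runs no matter what. The real scripts are bash; this is the same idea sketched in Python, with a hypothetical in-memory registry and a made-up &lt;code&gt;handoff.json&lt;/code&gt; file name:&lt;/p&gt;

```python
import json
import shutil
import tempfile
from pathlib import Path

registry = {}

def spawn(agent_id):
    instance_dir = Path(tempfile.mkdtemp(prefix=f"{agent_id}-"))
    registry[agent_id] = instance_dir
    return instance_dir

def complete(agent_id):
    """Return the handoff dict, or None if the sub-agent never wrote one."""
    instance_dir = registry[agent_id]
    try:
        handoff_file = instance_dir / "handoff.json"
        if not handoff_file.exists():
            return None  # degrade gracefully instead of aborting
        return json.loads(handoff_file.read_text())
    finally:
        # Runs on success, on a missing handoff, and on a parse error alike.
        shutil.rmtree(instance_dir, ignore_errors=True)
        registry.pop(agent_id, None)

good = spawn("agent-a")
(good / "handoff.json").write_text(json.dumps({"findings": ["gmail API is stale"]}))
handoff = complete("agent-a")

spawn("agent-b")  # this one never writes a handoff
missing = complete("agent-b")
```

&lt;p&gt;After both calls the registry is empty and both instance directories are gone, even though the second agent never produced a handoff.&lt;/p&gt;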
&lt;h2&gt;Why This Matters&lt;/h2&gt;
&lt;p&gt;The core insight: &lt;strong&gt;learning shouldn&apos;t be scoped to a thread lifetime&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;If you&apos;re building a multi-agent system and your sub-agents can&apos;t retain what they discover, you&apos;re running an expensive stateless compute cluster, not a system that gets smarter over time.&lt;/p&gt;
&lt;p&gt;Process isolation costs more than in-process threads. But it buys you:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Real memory that persists across the agent&apos;s lifetime&lt;/li&gt;
&lt;li&gt;No cross-contamination between concurrent agents&lt;/li&gt;
&lt;li&gt;Clean handoff artifacts you can inspect and audit&lt;/li&gt;
&lt;li&gt;Agents that actually accumulate knowledge&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All experiments done with &lt;a href=&quot;https://qoder.dev&quot;&gt;Qoder&lt;/a&gt;&apos;s expert mode — highly recommended for long-running agentic tasks where you want the agent to make mistakes, learn, and fix them autonomously.&lt;/p&gt;
&lt;h2&gt;GitHub&lt;/h2&gt;
&lt;p&gt;Full implementation: &lt;a href=&quot;https://github.com/Czhang0727/agent-from-scratch&quot;&gt;github.com/Czhang0727/agent-from-scratch&lt;/a&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Next: how skill management keeps the main agent sane as the number of skills grows.&lt;/em&gt;&lt;/p&gt;
</content:encoded><category>Agent from Scratch</category><category>AI Agents</category><category>Hermes</category><category>Multi-Agent</category><category>Sub-Agents</category><author>Chenyi Zhang</author></item><item><title>Agent from Scratch Part 3: Skills</title><link>https://blog.chenyi.ai/posts/part-3-skills/</link><guid isPermaLink="true">https://blog.chenyi.ai/posts/part-3-skills/</guid><description>Skills are user manuals for your agent&apos;s tools. Get them wrong and your agent spends more time confused than working.</description><pubDate>Sun, 10 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;What is a Skill?&lt;/h2&gt;
&lt;p&gt;A skill is a user manual for a tool — or a chain of tools.&lt;/p&gt;
&lt;p&gt;If the model isn&apos;t powerful enough to figure out tool usage on its own, a skill also includes examples. Think of it like onboarding documentation: &quot;here&apos;s what this tool does, here&apos;s when to use it, here&apos;s a concrete example.&quot;&lt;/p&gt;
&lt;p&gt;Unlike a one-time prompt, skills are designed to be read repeatedly. Your agent will reach for them on every relevant task.&lt;/p&gt;
&lt;h2&gt;The Pile of Manuals Problem&lt;/h2&gt;
&lt;p&gt;Now imagine your agent has 50 user manuals in front of it. It needs to pick the right one before it can do anything.&lt;/p&gt;
&lt;p&gt;Two problems emerge immediately:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Ambiguity kills accuracy.&lt;/strong&gt; If two skills are too similar — say, two different ways to fetch weather data — the model has no reliable way to pick. It&apos;ll guess, and it&apos;ll guess wrong sometimes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Context burns tokens.&lt;/strong&gt; Loading every skill into the context window is wasteful and degrades focus. The more irrelevant content the model has to wade through, the noisier its reasoning becomes.&lt;/p&gt;
&lt;p&gt;Modern agent design spends a lot of effort solving the skill selection problem before skill loading ever happens.&lt;/p&gt;
&lt;h2&gt;Skill Selection: Index Before Load&lt;/h2&gt;
&lt;p&gt;The right pattern is: &lt;strong&gt;select index, then load skill&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Think about driving a car. You don&apos;t need the manual for how to fix the engine just because you&apos;re making a left turn. If your agent is writing a document, it doesn&apos;t need the stock trading skill loaded into memory.&lt;/p&gt;
&lt;p&gt;The goal is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fast&lt;/strong&gt; — retrieval should not be the bottleneck&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Accurate&lt;/strong&gt; — wrong skill = wrong tool = failed task&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In my implementation, I skip the naive &quot;dump all skills into context&quot; approach and instead use indexed selection — match the task to the right skill before injecting anything.&lt;/p&gt;
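&lt;p&gt;To make the pattern concrete, here is a small sketch of index-then-load. The index contents, the keyword-overlap scoring, and the function names are assumptions for the example; a production index would more likely use embeddings:&lt;/p&gt;

```python
SKILL_INDEX = {
    "weather": ["weather", "forecast", "temperature"],
    "stock-trading": ["stock", "trade", "portfolio"],
    "doc-writing": ["document", "write", "draft"],
}

def load_skill(name):
    # Stand-in for reading the full skill file from disk.
    return f"(full text of the {name} skill)"

def select_skills(task, top_k=1):
    """Score every index entry by keyword overlap and return the best matches."""
    words = set(task.lower().split())
    scored = [(len(words.intersection(keywords)), name) for name, keywords in SKILL_INDEX.items()]
    ranked = sorted((pair for pair in scored if pair[0] > 0), reverse=True)
    return [name for _, name in ranked[:top_k]]

def build_context(task):
    # Only the selected skills are ever loaded; the rest stay on disk.
    return [load_skill(name) for name in select_skills(task)]

context = build_context("write a draft document about the launch")
```

&lt;p&gt;The index entries are tiny compared to the skill files, so scanning all of them costs far fewer tokens than loading even one irrelevant skill.&lt;/p&gt;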
&lt;h2&gt;Skill Selection as Reinforcement&lt;/h2&gt;
&lt;p&gt;Here&apos;s an interesting insight: learning skill selection from human behavior is exactly what Meta&apos;s &quot;distill from human&quot; approach does at scale.&lt;/p&gt;
&lt;p&gt;When a human expert picks the right tool for a job, that decision carries signal. If you capture those decisions — which skill was chosen, what was the context, did it succeed — you can train a model to make better choices over time.&lt;/p&gt;
&lt;p&gt;The data you accumulate from real agent runs becomes a natural fine-tuning dataset. Your agent literally gets better at picking the right skill the more it works.&lt;/p&gt;
&lt;h2&gt;What&apos;s in a Skill File?&lt;/h2&gt;
&lt;p&gt;In practice, a skill is a plain text file. It can include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Tool definition&lt;/strong&gt; — what the tool does, its parameters, return values&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Usage instructions&lt;/strong&gt; — when to call it, what to avoid&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Chaining examples&lt;/strong&gt; — how to combine it with other tools&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Failure modes&lt;/strong&gt; — common errors and how to recover&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Images work too, as long as your processor model handles multimodal input.&lt;/p&gt;
&lt;h2&gt;Key Design Principles&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;One skill, one job.&lt;/strong&gt; Overlapping skills cause ambiguity. Deduplicate aggressively.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Index before load.&lt;/strong&gt; Never inject skills you don&apos;t need for the current task.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Skills are maintained, not set-and-forget.&lt;/strong&gt; APIs change, tools break, better patterns emerge. Treat your skills like code.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Capture selection signal.&lt;/strong&gt; Every time your agent picks (or fails to pick) the right skill, that&apos;s training data.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;GitHub&lt;/h2&gt;
&lt;p&gt;The implementation is at &lt;a href=&quot;https://github.com/Czhang0727&quot;&gt;github.com/Czhang0727&lt;/a&gt; — skills, selection logic, and the full agent scaffold.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Part 4 covers memory — how agents extend context beyond what fits in the window.&lt;/em&gt;&lt;/p&gt;
</content:encoded><category>Agent from Scratch</category><category>AI Agents</category><category>LLM</category><category>Agent Design</category><category>Skills</category><author>Chenyi Zhang</author></item><item><title>Agent from Scratch Part 4: Memory</title><link>https://blog.chenyi.ai/posts/part-4-memory/</link><guid isPermaLink="true">https://blog.chenyi.ai/posts/part-4-memory/</guid><description>Memory in agents is just expanding the context window. Here&apos;s the simple mental model that makes it practical.</description><pubDate>Sun, 10 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Memory = Context Window Expansion&lt;/h2&gt;
&lt;p&gt;In agent design, &quot;memory&quot; is a word that sounds complicated but maps to something concrete: &lt;strong&gt;getting information into the context window that wouldn&apos;t otherwise fit&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;A 256k context window sounds large until you try to fit the world&apos;s knowledge into it. More importantly, you shouldn&apos;t try — attention spreads thin over large contexts. Finding a tiny detail inside a massive blob of text is like finding a footnote inside an encyclopedia. The model can do it, but not reliably.&lt;/p&gt;
&lt;h2&gt;The Early Problem&lt;/h2&gt;
&lt;p&gt;Early LLMs had tiny context windows: a few thousand tokens was typical (GPT-3 shipped with 2,048), and 64k would have been generous. With the system prompt, guidance, and conversation history taking up space, you had almost nothing left for the actual knowledge that helps the model produce good answers.&lt;/p&gt;
&lt;p&gt;This forced a key design decision: most knowledge lives outside the context window. You retrieve what you need, when you need it.&lt;/p&gt;
&lt;p&gt;RAG (Retrieval-Augmented Generation) is the canonical solution. So is the knowledge graph. Both are just strategies for deciding what to pull in and when.&lt;/p&gt;
&lt;h2&gt;The File System Mental Model&lt;/h2&gt;
&lt;p&gt;Here&apos;s the simplest way to think about agent memory: &lt;strong&gt;it&apos;s a text file&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Think about what you can do with a text file in an OS:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Read&lt;/strong&gt; — load it into context when relevant&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Append&lt;/strong&gt; — add new information without overwriting&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Overwrite&lt;/strong&gt; — replace when the old content is no longer valid&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Concat&lt;/strong&gt; — merge multiple memory sources&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That&apos;s the complete set of memory operations you need. No magic required.&lt;/p&gt;
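&lt;p&gt;Taken literally, the four operations fit in a dozen lines. This is a sketch using &lt;code&gt;pathlib&lt;/code&gt; from the standard library; the file names and the one-entry-per-line layout are assumptions for the example:&lt;/p&gt;

```python
import tempfile
from pathlib import Path

def read(path):
    return path.read_text() if path.exists() else ""

def append(path, entry):
    with path.open("a") as f:
        f.write(entry + "\n")

def overwrite(path, content):
    path.write_text(content + "\n")

def concat(paths):
    return "".join(read(p) for p in paths)

memory_dir = Path(tempfile.mkdtemp())
prefs = memory_dir / "preferences.txt"
history = memory_dir / "history.txt"

append(prefs, "user prefers short answers")
append(history, "session 1: gmail skill failed")
append(history, "session 2: switched to workspace skill")
overwrite(prefs, "user prefers detailed answers")  # the old preference is no longer valid

merged = concat([prefs, history])
```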
&lt;h2&gt;Triggers and Lifecycle&lt;/h2&gt;
&lt;p&gt;Every memory system needs a trigger — some event that causes the agent to create, update, or read memory.&lt;/p&gt;
&lt;p&gt;In my implementation, the trigger is simple: &lt;strong&gt;end of session&lt;/strong&gt;. After each session, the agent:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Scans existing memories for relevance&lt;/li&gt;
&lt;li&gt;Summarizes what happened: key actions taken, what worked, what failed&lt;/li&gt;
&lt;li&gt;Writes a new memory entry or updates an existing one&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This creates a persistent record of agent experience. Over multiple sessions, the agent builds up a structured history of its own performance.&lt;/p&gt;
&lt;h2&gt;Memory as Training Data&lt;/h2&gt;
&lt;p&gt;Here&apos;s where it gets interesting.&lt;/p&gt;
&lt;p&gt;The summaries your agent writes are exactly the kind of data you&apos;d want for fine-tuning. Key decisions made, correct choices, wrong turns, recovery patterns — this is behavioral signal in a clean format.&lt;/p&gt;
&lt;p&gt;If you can afford to fine-tune (or when fine-tuning costs drop further), the memory log from a well-designed agent becomes a natural training dataset. Your model starts embodying the patterns of whoever built the agent.&lt;/p&gt;
&lt;h2&gt;Evaluation Closes the Loop&lt;/h2&gt;
&lt;p&gt;In production, you don&apos;t just write memories blindly. You run an evaluation pass after each session:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Did the agent achieve the goal?&lt;/li&gt;
&lt;li&gt;Were the actions efficient?&lt;/li&gt;
&lt;li&gt;Were any tools misused?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Only memories that pass evaluation get committed to long-term storage. Bad runs get flagged for review, not reinforced.&lt;/p&gt;
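&lt;p&gt;A sketch of that gate, assuming each session record carries the three answers as plain fields. The field names and the step budget are invented for the example, and a real evaluator would be far less trivial:&lt;/p&gt;

```python
def evaluate(session):
    """Apply the three checks: goal achieved, efficient actions, no tool misuse."""
    return (
        session["goal_achieved"]
        and session["max_steps"] >= session["steps"]
        and not session["misused_tools"]
    )

long_term_memory = []
review_queue = []

def end_of_session(session):
    if evaluate(session):
        long_term_memory.append(session["summary"])
    else:
        # Failed runs are kept for a human to look at, not written to memory.
        review_queue.append(session["summary"])

end_of_session({"summary": "sent report via workspace skill", "goal_achieved": True,
                "steps": 4, "max_steps": 10, "misused_tools": []})
end_of_session({"summary": "looped on stale gmail API", "goal_achieved": False,
                "steps": 25, "max_steps": 10, "misused_tools": ["gmail"]})
```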
&lt;p&gt;This is the same loop that makes humans better at their jobs: do, reflect, evaluate, adjust.&lt;/p&gt;
&lt;h2&gt;The Full Picture&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;Session starts
  → Load relevant memories into context
  → Agent executes task using skills + context

Session ends
  → Summarize what happened
  → Evaluate quality
  → Write/update memory
  → (Optional) Flag data for fine-tuning
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Simple. No exotic architecture required. The complexity is in the evaluation step — deciding what counts as a good run is the hardest part.&lt;/p&gt;
&lt;h2&gt;GitHub&lt;/h2&gt;
&lt;p&gt;Implementation is at &lt;a href=&quot;https://github.com/Czhang0727&quot;&gt;github.com/Czhang0727&lt;/a&gt;. The memory system is the simplest module in the repo — a reminder that the best designs usually are.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;The next post covers Hermes — a real-world agent hitting the limits of this design and what I built to fix it.&lt;/em&gt;&lt;/p&gt;
</content:encoded><category>Agent from Scratch</category><category>AI Agents</category><category>LLM</category><category>Memory</category><category>RAG</category><category>Agent Design</category><author>Chenyi Zhang</author></item><item><title>Agent from Scratch Part 2: Orchestration</title><link>https://blog.chenyi.ai/posts/part-2-orchestration/</link><guid isPermaLink="true">https://blog.chenyi.ai/posts/part-2-orchestration/</guid><description>IO is hooked up but the model just answers questions. Orchestration is how you turn a chatbot into an agent that actually does things.</description><pubDate>Fri, 01 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The Problem with Raw LLM&lt;/h2&gt;
&lt;p&gt;Now that IO is hooked up, you&apos;d think the agent should work. It doesn&apos;t.&lt;/p&gt;
&lt;p&gt;A raw LLM is pretty much Q&amp;amp;A — there&apos;s no skill, no action. It just answers your input with predicted tokens. Impressive, but useless as an agent.&lt;/p&gt;
&lt;p&gt;To resolve that, we need prompt engineering. This is probably the &lt;strong&gt;only truly unique part&lt;/strong&gt; of an LLM-powered agent system. Everything else borrows from existing software patterns.&lt;/p&gt;
&lt;h2&gt;Two Types of Prompts&lt;/h2&gt;
&lt;p&gt;In my implementation I gave the agent two core prompts: &lt;strong&gt;emotional support&lt;/strong&gt; and &lt;strong&gt;productivity&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The difference comes down to what we ask the agent to do:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Emotional support prompt&lt;/strong&gt;: &quot;Say something nice, be supportive.&quot; The prompt helps the agent recognize that its job is comfort, not tasks. From the model&apos;s point of view, we&apos;ve provided context, so it can make a better prediction.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Productivity prompt&lt;/strong&gt;: Way more complex. This is where the &quot;harness system&quot; lives.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;The Harness System&lt;/h2&gt;
&lt;p&gt;Harness engineering = creating a bash-style execution environment where:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We have skills (bash commands)&lt;/li&gt;
&lt;li&gt;We define how to trigger them (exact match vs. model-generated)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In my example, I created a set of skill schemas. They&apos;re a bit old-fashioned compared to the plain-text skills I&apos;ll cover later, but both do the same thing at their core.&lt;/p&gt;
&lt;p&gt;The full execution loop looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;LLM reads local env
  → finds function it can use
  → understands the task
  → does it
  → validates and responds to user
  → user annotates (correct / incorrect)
  → agent learns from execution
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The abstraction isn&apos;t that different from humans: fail more, learn more. And eventually there will be an &quot;aha moment.&quot;&lt;/p&gt;
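&lt;p&gt;The two trigger styles from the harness can be sketched as one dispatcher: try an exact command match first, and only fall back to the model when nothing matches. The skill table, the JSON reply format, and &lt;code&gt;fake_model&lt;/code&gt; are all assumptions for the example, not the schemas from my repo:&lt;/p&gt;

```python
import json

def list_files(path="."):
    return f"listing {path}"

def echo(text=""):
    return text

SKILLS = {"ls": list_files, "echo": echo}

def dispatch(user_text, model=None):
    """Try an exact command match first; otherwise let the model pick a skill."""
    command = user_text.strip()
    if command in SKILLS:
        return SKILLS[command]()
    if model is None:
        return "no matching skill"
    # The model is expected to answer with JSON such as {"skill": "echo", "args": {...}}.
    call = json.loads(model(user_text, sorted(SKILLS)))
    if call["skill"] not in SKILLS:
        return "model chose an unknown skill"
    return SKILLS[call["skill"]](**call["args"])

def fake_model(user_text, skill_names):
    # Stand-in for a real LLM call, hard-coded for the demo.
    return json.dumps({"skill": "echo", "args": {"text": "hello"}})

exact = dispatch("ls")
generated = dispatch("please repeat hello back to me", model=fake_model)
```

&lt;p&gt;Exact matches cost no tokens and cannot pick the wrong skill, so it makes sense to check them before asking the model.&lt;/p&gt;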
&lt;h2&gt;Distillation is the Real Secret&lt;/h2&gt;
&lt;p&gt;I almost forgot to mention the most important thing: &lt;strong&gt;&quot;learn from other people&apos;s success or failure&quot;&lt;/strong&gt; is the best way to describe what good orchestration enables.&lt;/p&gt;
&lt;p&gt;When you capture agent execution logs (what the agent tried, whether it worked, what the user annotated), you have a distillation dataset. That&apos;s the same kind of data powerful models are fine-tuned on: human-annotated traces of good decisions.&lt;/p&gt;
&lt;h2&gt;Keep It Simple&lt;/h2&gt;
&lt;p&gt;Data is king. Keep the flow simple but logical. Let the agent figure out the best way to do things — don&apos;t over-engineer the orchestration layer.&lt;/p&gt;
&lt;p&gt;A complex orchestration system you built becomes a constraint the agent has to work around. A simple harness the agent can reason about is a tool the agent can use.&lt;/p&gt;
&lt;h2&gt;GitHub&lt;/h2&gt;
&lt;p&gt;Full code at &lt;a href=&quot;https://github.com/Czhang0727/agent-from-scratch&quot;&gt;github.com/Czhang0727/agent-from-scratch&lt;/a&gt;.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Part 3 covers skills — the user manuals that tell your agent which tools to use and when.&lt;/em&gt;&lt;/p&gt;
</content:encoded><category>Agent from Scratch</category><category>AI Agents</category><category>LLM</category><category>Orchestration</category><category>Prompt Engineering</category><author>Chenyi Zhang</author></item><item><title>Agent from Scratch Part 1: IO</title><link>https://blog.chenyi.ai/posts/part-1-io/</link><guid isPermaLink="true">https://blog.chenyi.ai/posts/part-1-io/</guid><description>Building an agent starts with one question: how does it talk to the world? Text in, text out — and everything else is just a plugin.</description><pubDate>Fri, 10 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;img src=&quot;https://blog.chenyi.ai/images/agent-io-diagram.png&quot; alt=&quot;Agent from Scratch Part 1: IO&quot; style=&quot;border-radius: 1rem; margin-bottom: 1rem; max-width: 100%; height: auto;&quot; /&gt;&lt;blockquote&gt;
&lt;p&gt;I lost my wisdom teeth today, so let&apos;s make it simple...&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;What is IO for an Agent?&lt;/h2&gt;
&lt;p&gt;IO defines how your agent system can explore or communicate with its external environment.&lt;/p&gt;
&lt;p&gt;It&apos;s not a real human — it won&apos;t see, smell, or feel. Anything going &lt;strong&gt;in&lt;/strong&gt; to the agent, and anything coming &lt;strong&gt;out&lt;/strong&gt;, is plain bits.&lt;/p&gt;
&lt;p&gt;Here&apos;s the bare minimum IO you need for an agent:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Text input&lt;/li&gt;
&lt;li&gt;Text output&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;That&apos;s it. You can build a lot with just that.&lt;/p&gt;
&lt;h2&gt;Processors: Not Just Neural Nets&lt;/h2&gt;
&lt;p&gt;Before machine learning took over, processors were rule-based. Believe it or not, these systems still run today: when you call your bank and hear &quot;Press 1 for balance, Press 2 for transfers,&quot; that&apos;s a rule-based agent. I&apos;ll cover processors in a later section. For now, let&apos;s focus on IO.&lt;/p&gt;
&lt;h2&gt;Multimodal: Making IO Cooler&lt;/h2&gt;
&lt;p&gt;Want to go beyond text? &quot;Multimodal support&quot; just means your IO bus handles more data types. Video, image, voice — these are already solved problems:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Image viewer&lt;/li&gt;
&lt;li&gt;Video player&lt;/li&gt;
&lt;li&gt;MP3 player&lt;/li&gt;
&lt;li&gt;Microphone input drivers&lt;/li&gt;
&lt;li&gt;Image transformer&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;None of these are new. They&apos;ve been around for decades, and they perfectly meet agent needs. The trick is making your IO bus &lt;strong&gt;generalized&lt;/strong&gt; — built to accept more input types via plugins over time.&lt;/p&gt;
&lt;p&gt;Think about where this goes: agents will soon have physical bodies. IoT sensors will feed into the same IO bus. The abstraction that handles voice today will handle temperature sensors tomorrow.&lt;/p&gt;
&lt;h2&gt;Design Principle: Generalize Your IO Bus&lt;/h2&gt;
&lt;p&gt;Don&apos;t hardcode IO types. Build a plugin-friendly bus where new input/output channels can be added without touching core agent logic.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/agent-io-diagram.png&quot; alt=&quot;Agent IO architecture — External Input flows into Agent, which connects to Memory, External Tools, and Guidelines, then outputs via Agent Output&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Your agent&apos;s intelligence lives in the middle. The IO bus is just plumbing — but design it well and you only build it once.&lt;/p&gt;
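&lt;p&gt;A minimal sketch of a plugin-friendly bus, assuming every input plugin normalizes its payload to text. The &lt;code&gt;IOBus&lt;/code&gt; class and its method names are hypothetical, not the API from the repo:&lt;/p&gt;

```python
class IOBus:
    """Routes raw payloads to whichever plugin registered for their type."""

    def __init__(self):
        self._inputs = {}

    def register_input(self, kind, handler):
        self._inputs[kind] = handler

    def receive(self, kind, payload):
        if kind not in self._inputs:
            raise ValueError(f"no input plugin registered for {kind!r}")
        # Every plugin normalizes its payload to text before the agent sees it.
        return self._inputs[kind](payload)

bus = IOBus()
bus.register_input("text", lambda payload: payload)
# Added later without touching IOBus or the agent core:
bus.register_input("temperature", lambda celsius: f"sensor reading: {celsius} C")

text_in = bus.receive("text", "hello")
sensor_in = bus.receive("temperature", 21.5)
```

&lt;p&gt;Adding the temperature sensor took one &lt;code&gt;register_input&lt;/code&gt; call and no change to the bus itself, which is the whole point of the design.&lt;/p&gt;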
&lt;h2&gt;GitHub&lt;/h2&gt;
&lt;p&gt;Full implementation at &lt;a href=&quot;https://github.com/Czhang0727/agent-from-scratch&quot;&gt;github.com/Czhang0727/agent-from-scratch&lt;/a&gt;.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Part 2 covers orchestration — once IO is hooked up, how do you get the model to actually do things?&lt;/em&gt;&lt;/p&gt;
</content:encoded><category>Agent from Scratch</category><category>AI Agents</category><category>LLM</category><category>Agent Design</category><category>IO</category><author>Chenyi Zhang</author></item><item><title>Agent from Scratch Part 0: What Is an Agent?</title><link>https://blog.chenyi.ai/posts/part-0-overview/</link><guid isPermaLink="true">https://blog.chenyi.ai/posts/part-0-overview/</guid><description>Starting from first principles — an agent is just a workflow that thinks like a human. Here&apos;s the 10,000ft view before we build it.</description><pubDate>Wed, 01 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;img src=&quot;https://blog.chenyi.ai/images/agent-io-diagram.png&quot; alt=&quot;Agent from Scratch Part 0: What Is an Agent?&quot; style=&quot;border-radius: 1rem; margin-bottom: 1rem; max-width: 100%; height: auto;&quot; /&gt;&lt;p&gt;I&apos;m starting to build a general agent framework from scratch, sharing what I&apos;ve learned over the past few years. Let&apos;s start from the very beginning.&lt;/p&gt;
&lt;h2&gt;What Is an Agent?&lt;/h2&gt;
&lt;p&gt;IMO, an agent is a &lt;strong&gt;workflow that can think like a human&lt;/strong&gt; and do what a human can do. The concept predates LLMs: backend system design already had stateful agents.&lt;/p&gt;
&lt;p&gt;The only reason &quot;agents&quot; are popular now is Large Models. We finally found a moment when agent design could be generalized — not hand-crafted for each narrow task.&lt;/p&gt;
&lt;h2&gt;The 10,000ft View: An Agent Is a PC&lt;/h2&gt;
&lt;p&gt;Back to old-fashioned computing: we have IO, a CPU, and storage.&lt;/p&gt;
&lt;p&gt;An agent maps almost perfectly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;CPU&lt;/strong&gt; → LLM&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;IO&lt;/strong&gt; → connector to external devices (tools, APIs, sensors)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Storage&lt;/strong&gt; → memory&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Yep, it&apos;s that simple.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/agent-io-diagram.png&quot; alt=&quot;Agent architecture — External Input flows into Agent, which connects to Memory, External Tools, and Guidelines, producing Agent Output&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Over time, engineers added fancy stuff to improve each component:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Better CPU → better models&lt;/li&gt;
&lt;li&gt;Larger bandwidth → larger context windows&lt;/li&gt;
&lt;li&gt;More applications → more skills / MCP servers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Nothing fundamentally changed.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;The Agent Heartbeat&lt;/h2&gt;
&lt;p&gt;Here&apos;s the pseudocode for agent orchestration. If you know how OpenClaw works, this is pretty much the heartbeat:&lt;/p&gt;
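&lt;pre&gt;&lt;code&gt;import time

while True:
    time.sleep(1)  # poll once per second (time.sleep takes seconds)
    user_input = read_input(context)  # renamed to avoid shadowing the built-in input()
    intent_and_plan = think(context, user_input)
    execution_result = do(context, intent_and_plan)
    # this phase can sometimes be async
    evaluation(context, execution_result)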
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Simple loop: read, think, do, evaluate. Repeat.&lt;/p&gt;
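&lt;p&gt;To try the loop without a real model, one option is to inject the four phases as plain callables and bound the number of ticks. This is only a sketch: &lt;code&gt;heartbeat&lt;/code&gt;, &lt;code&gt;max_ticks&lt;/code&gt;, and &lt;code&gt;interval&lt;/code&gt; are hypothetical names, and skipping a tick when there is no input is an assumption, not something the loop above specifies.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import time


def heartbeat(read_input, think, do, evaluate, context, max_ticks, interval=0.0):
    # Bounded variant of the loop so that it terminates in a test run.
    results = []
    for _ in range(max_ticks):
        time.sleep(interval)  # seconds between polls
        user_input = read_input(context)
        if user_input is None:
            continue  # assumption: nothing to do on an empty tick
        plan = think(context, user_input)
        result = do(context, plan)
        evaluate(context, result)
        results.append(result)
    return results
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Passing stub functions for each phase makes it possible to exercise the read, think, do, evaluate order before any LLM is wired in.&lt;/p&gt;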
&lt;h2&gt;The Event-Driven Upgrade&lt;/h2&gt;
&lt;p&gt;There&apos;s a known problem with &lt;code&gt;sleep&lt;/code&gt;: the loop wastes resources polling while nothing is happening. The solution? Go event-driven, just like JavaScript.&lt;/p&gt;
&lt;p&gt;Claude Code&apos;s internals indicate they&apos;re doing the same thing. So the loop evolves:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;User interaction side:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pub_sub_client = PubSubClient()

user_input = read_user_input()
pub_sub_client.send(&quot;user_input&quot;, user_input)
result = pub_sub_client.subscribe(topic=&quot;task_result&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Consumer (agent) side:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;user_input = pub_sub_client.subscribe(topic=&quot;user_input&quot;)
intent_and_plan = think(context, user_input)
execution_result = do(context, intent_and_plan)
pub_sub_client.send(&quot;task_result&quot;, execution_result)
# this phase can sometimes be async
evaluation(context, execution_result)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Clean decoupling. The agent becomes a proper event consumer.&lt;/p&gt;
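&lt;p&gt;The pub/sub client above is left abstract. One way to experiment with the pattern is an in-process stand-in built on the standard library, sketched below. Everything here is an assumption for illustration: the class is not a real broker client, and &lt;code&gt;agent_step&lt;/code&gt; is a hypothetical helper that mirrors the consumer side.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import queue
from collections import defaultdict


class PubSubClient:
    # In-process stand-in for a real broker: one FIFO queue per topic.
    def __init__(self):
        self._topics = defaultdict(queue.Queue)

    def send(self, topic, message):
        self._topics[topic].put(message)

    def subscribe(self, topic, timeout=None):
        # Blocks until a message arrives instead of polling with sleep.
        return self._topics[topic].get(timeout=timeout)


def agent_step(client, think, do, context, in_topic, out_topic):
    # Hypothetical consumer step: wait for input, plan, act, publish.
    user_input = client.subscribe(in_topic)
    plan = think(context, user_input)
    result = do(context, plan)
    client.send(out_topic, result)
    return result
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because &lt;code&gt;subscribe&lt;/code&gt; blocks on the queue, the consumer sits idle until a message shows up, which is the resource saving the event-driven version is after. A real deployment would swap this class for an actual message broker.&lt;/p&gt;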
&lt;h2&gt;What&apos;s Coming&lt;/h2&gt;
&lt;p&gt;In this series, I&apos;ll dig deeper into each component:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;IO&lt;/strong&gt; — how the agent talks to the world&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Orchestration&lt;/strong&gt; — prompt engineering and the harness system&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Skills&lt;/strong&gt; — user manuals for tools&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory&lt;/strong&gt; — expanding the context window&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-agent&lt;/strong&gt; — when one agent isn&apos;t enough&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Track progress and raise issues / PRs at &lt;a href=&quot;https://github.com/Czhang0727/agent-from-scratch&quot;&gt;github.com/Czhang0727/agent-from-scratch&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>AI Agents</category><category>LLM</category><category>Agent Design</category><category>Agent from Scratch</category><author>Chenyi Zhang</author></item></channel></rss>