<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://blog.chenyi.ai/</id>
    <title>Chenyi's Blog</title>
    <updated>2026-06-06T00:00:00.000Z</updated>
    <generator>Astro - Feed Library</generator>
    <author>
        <name>Chenyi Zhang</name>
        <email>chenyi@character.ai</email>
        <uri>https://blog.chenyi.ai/</uri>
    </author>
    <link rel="alternate" href="https://blog.chenyi.ai/"/>
    <link rel="self" href="https://blog.chenyi.ai/atom.xml"/>
    <subtitle>Thoughts on AI, Engineering &amp; Life</subtitle>
    <icon>https://blog.chenyi.ai/favicon.ico</icon>
    <rights>All rights reserved 2026, Chenyi's Blog</rights>
    <entry>
        <title type="html"><![CDATA[Hermes Agent: When Your Agent Has Too Many Skills]]></title>
        <id>https://blog.chenyi.ai/posts/hermes-skill-management/</id>
        <link href="https://blog.chenyi.ai/posts/hermes-skill-management/"/>
        <updated>2026-06-06T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Real-world lesson from running Hermes at scale: too many skills makes your agent go nuts. Here's the system I built to fix it.]]></summary>
        <content type="html"><![CDATA[<img src="https://blog.chenyi.ai/images/hermes-skill-management-diagram.png" alt="Hermes Agent: When Your Agent Has Too Many Skills" style="border-radius: 1rem; margin-bottom: 1rem; width: 100%; object-fit: cover;" /><h2>The Problem Nobody Talks About</h2>
<p>When any agent has too many skills — and by “too many” I mean past some fuzzy threshold that depends on skill complexity and overlap — <strong>the agent will eventually go nuts</strong>.</p>
<p>Here’s a concrete example. I was running Hermes and had both a Gmail skill and a Google Workspace skill. They overlapped. At some point, the Gmail skill’s API went out of date. Every time the agent called it:</p>
<blockquote>
<p><em>“Sorry, API failed. Let me directly fetch the web… oh, I don’t have access. Let me rethink… actually, wait, you have another skill that might work…”</em></p>
</blockquote>
<p>Burning tokens. Spinning in circles. Not working.</p>
<p>The obvious fix — manually review and clean up the skills — doesn’t scale. Skills aren’t static. They’re more like repos: they need to be maintained. APIs change, tools break, better patterns emerge.</p>
<p>I needed AI to maintain the skills, not me.</p>
<h2>The Architecture</h2>
<p>Here’s what I built (Codex and Claude Code wrote 100% of it; I just described the flows 😅). This is the full flow:</p>
<p><img src="https://blog.chenyi.ai/images/hermes-skill-management-diagram.png" alt="Hermes Skill Management architecture — Skills Manager at center, fed by memory components (Use Memory, Local Skill Storage/RAGized, History COT→value) and CronSignal. Outputs to Install Reduced Skills or Rewrite Skills, both feeding into MAGIC EVALUATION FOLKS!!, which triggers actual skill update and index reload." /></p>
<p>And the agent delegation model that makes it work:</p>
<p><img src="https://blog.chenyi.ai/images/hermes-agent-architecture.png" alt="Main Agent architecture — Main Agent coordinates with Plan, then delegates to Sub-Agent, which receives only the skills it needs from Skill Manager" /></p>
<h3>1. Skill Manager with RAG-based Deduplication</h3>
<p>Before installing any new skill, run it through a skill manager that checks for redundancy.</p>
<ul>
<li>Match by keyword and semantic embedding</li>
<li>Also use tags: “google”, “productivity”, “stock trading” — not just embedding similarity</li>
<li>At ~5k skills, this runs fast enough to be practical</li>
</ul>
<p>If a new skill is too similar to an existing one, reject it or merge the concepts.</p>
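<p>A minimal sketch of the redundancy check. It assumes a bag-of-words cosine similarity as a stand-in for a real embedding model. The <code>Skill</code> class, the 0.7/0.3 blend of text and tag similarity, and the 0.5 threshold are all hypothetical, untuned choices, not the Hermes implementation.</p>
<pre><code class="language-python">import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str
    tags: set = field(default_factory=set)


def bag_of_words(text: str) -> Counter:
    # Lowercased word counts: a cheap stand-in for an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tag_overlap(a: set, b: set) -> float:
    # Jaccard similarity of tag sets such as {"google", "productivity"}.
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


def install_decision(new: Skill, existing: list, threshold: float = 0.5, text_weight: float = 0.7) -> str:
    """Return 'install', or which existing skill to reject against or merge with."""
    new_vec = bag_of_words(f"{new.name} {new.description}")
    best, best_score = None, 0.0
    for skill in existing:
        text_sim = cosine(new_vec, bag_of_words(f"{skill.name} {skill.description}"))
        score = text_weight * text_sim + (1 - text_weight) * tag_overlap(new.tags, skill.tags)
        if score > best_score:
            best, best_score = skill, score
    if best is not None and best_score >= threshold:
        return f"reject-or-merge with {best.name}"
    return "install"
</code></pre>
<p>With real embeddings, you would cache the vectors of installed skills so each install check costs one embedding call plus a scan.</p>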
<h3>2. Telemetry System</h3>
<p>Every skill call gets logged:</p>
<ul>
<li>Success or failure</li>
<li>Chain-of-thought trace</li>
<li>Token cost</li>
</ul>
<p>All of it goes into a local SQL database plus blob storage. This is the data layer that makes everything else possible.</p>
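<p>One way to sketch this data layer. It assumes SQLite for the SQL side and a local directory of JSON files as the “blob storage”. The <code>Telemetry</code> class, the table name, and the columns are hypothetical.</p>
<pre><code class="language-python">import json
import sqlite3
import tempfile
import uuid
from pathlib import Path
from typing import Optional


class Telemetry:
    """Logs every skill call: metadata in SQLite, chain-of-thought traces as blob files."""

    def __init__(self, db_path: str = ":memory:", blob_dir: Optional[str] = None):
        self.conn = sqlite3.connect(db_path)
        # Hypothetical blob store: a local directory of JSON files.
        self.blob_dir = Path(blob_dir or tempfile.mkdtemp(prefix="skill-cot-"))
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS skill_calls (
                   id TEXT PRIMARY KEY,
                   skill TEXT NOT NULL,
                   success INTEGER NOT NULL,
                   tokens INTEGER NOT NULL,
                   cot_path TEXT NOT NULL,
                   ts TEXT DEFAULT CURRENT_TIMESTAMP)"""
        )

    def log(self, skill: str, success: bool, tokens: int, cot: str) -> str:
        call_id = uuid.uuid4().hex
        # Traces can be long, so they go to blob storage and SQL keeps only a pointer.
        cot_path = self.blob_dir / f"{call_id}.json"
        cot_path.write_text(json.dumps({"skill": skill, "cot": cot}))
        self.conn.execute(
            "INSERT INTO skill_calls (id, skill, success, tokens, cot_path) VALUES (?, ?, ?, ?, ?)",
            (call_id, skill, int(success), tokens, str(cot_path)),
        )
        self.conn.commit()
        return call_id
</code></pre>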
<h3>3. Installation Filter</h3>
<p>On every new extension/plugin/skill install, the skill manager runs first. The filter compares the new skill against existing ones and reduces overlap before it lands in the system.</p>
<h3>4. Weekly Cron Audit</h3>
<p>A cron job (I haven’t tuned the trigger yet — tell me a better one) does a delta review of the telemetry logs:</p>
<ul>
<li>Find skills with high failure rates or bloated COT</li>
<li>Decide: modify the skill to be more efficient, or delete it</li>
<li>If deleting, use web search to find or create a replacement</li>
</ul>
<p><strong>Critical:</strong> make sure your eval environment is stable before running this. You don’t want the audit job to delete a working skill because it was measured during a bad network day.</p>
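<p>A sketch of the delta review. It assumes telemetry rows have been loaded into <code>CallRecord</code> objects. Mapping high failure rates to “delete” and bloated traces (approximated here by average token cost) to “rewrite” is my assumption, and every threshold is a placeholder. The <code>min_calls</code> guard keeps the job from judging a skill on a handful of calls, but it does not replace a stable eval environment.</p>
<pre><code class="language-python">from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    skill: str
    success: bool
    tokens: int
    ts: float  # unix timestamp of the call


def audit(records, since: float, min_calls: int = 20,
          max_failure_rate: float = 0.5, max_avg_tokens: int = 4000) -> dict:
    """Delta review: only calls newer than `since` are considered."""
    stats = defaultdict(lambda: {"calls": 0, "failures": 0, "tokens": 0})
    for r in records:
        if r.ts > since:
            s = stats[r.skill]
            s["calls"] += 1
            s["failures"] += 0 if r.success else 1
            s["tokens"] += r.tokens

    decisions = {}
    for skill, s in stats.items():
        if min_calls > s["calls"]:
            decisions[skill] = "keep (not enough data)"
        elif s["failures"] / s["calls"] > max_failure_rate:
            decisions[skill] = "delete and search for replacement"
        elif s["tokens"] / s["calls"] > max_avg_tokens:
            decisions[skill] = "rewrite to be more efficient"
        else:
            decisions[skill] = "keep"
    return decisions
</code></pre>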
<h3>5. Main Agent Restructure</h3>
<p>The main agent no longer holds specific skills directly. Instead:</p>
<ol>
<li>Main agent receives a task</li>
<li>Spawns a sub-agent</li>
<li>Sub-agent calls the skill manager to install what it needs</li>
<li>Sub-agent executes</li>
</ol>
<p>The main agent’s only job is <strong>managing other agents and planning</strong>. I’m still thinking about whether planning and execution should be split further — probably over-engineering at this stage.</p>
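<p>A toy sketch of this delegation model. <code>MainAgent</code>, <code>SubAgent</code>, and <code>SkillManager</code> are hypothetical names, the planner just splits on <code>" then "</code>, and keyword lookup stands in for the real skill manager. The point is that only the sub-agent ever holds skills.</p>
<pre><code class="language-python">class SkillManager:
    """Hypothetical registry mapping each skill name to its trigger keywords."""

    def __init__(self, catalog: dict):
        self.catalog = catalog

    def skills_for(self, task: str) -> list:
        words = set(task.lower().split())
        return sorted(name for name, keys in self.catalog.items() if words.intersection(keys))


class SubAgent:
    def __init__(self, task: str, manager: SkillManager):
        self.task = task
        # The sub-agent asks the skill manager for only the skills this task needs.
        self.skills = manager.skills_for(task)

    def run(self) -> str:
        # A real sub-agent would call an LLM with just these skills in context.
        return f"{self.task!r} done with {self.skills or 'no skills'}"


class MainAgent:
    """Holds no skills itself: it only plans and delegates."""

    def __init__(self, manager: SkillManager):
        self.manager = manager

    def plan(self, task: str) -> list:
        # Placeholder planner: one subtask per clause joined by " then ".
        return [part.strip() for part in task.split(" then ")]

    def handle(self, task: str) -> list:
        return [SubAgent(subtask, self.manager).run() for subtask in self.plan(task)]
</code></pre>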
<h2>Why This Changes the Design Fundamentally</h2>
<p>The skill management system forces a question I hadn’t thought about clearly before: <strong>what should the main agent be good at?</strong></p>
<p>My answer: not much, specifically. The main agent should be good at delegation and planning. Everything else — tool use, skill selection, domain expertise — gets handled by specialized sub-agents that spin up with exactly the skills they need.</p>
<p>This is closer to how real teams work. A good manager doesn’t know how to do every job on the team. They know who to call and what to ask for.</p>
<h2>What’s Left (TODO)</h2>
<ul>
<li>Better trigger for the audit cron (weekly is arbitrary)</li>
<li>Web search integration for auto-replacing deleted skills</li>
<li>Eval environment stability before running automated cleanup</li>
<li>Split planning into a separate agent (maybe)</li>
</ul>
<h2>On the Birth Announcement</h2>
<p>Yes, I buried the lede. My first baby Grace was just born. Between her and a TOP urgent work task, the Agent from Scratch series is delayed. But I’m not giving up on it. Q.Q</p>
<hr />
<p><em>This is the Hermes architecture as of June 2026. The code is on <a href="https://github.com/Czhang0727">GitHub</a> — Claude Code wrote it, I just had the ideas.</em></p>
]]></content>
        <author>
            <name>Chenyi Zhang</name>
            <email>chenyi@character.ai</email>
            <uri>https://blog.chenyi.ai/</uri>
        </author>
        <category label="Agent from Scratch"/>
        <category label="AI Agents"/>
        <category label="Hermes"/>
        <category label="Skill Management"/>
        <category label="RAG"/>
        <category label="Telemetry"/>
        <published>2026-06-06T00:00:00.000Z</published>
    </entry>
    <entry>
        <title type="html"><![CDATA[Hermes Agent: Building Real Multi-Agent Support]]></title>
        <id>https://blog.chenyi.ai/posts/hermes-multi-agent/</id>
        <link href="https://blog.chenyi.ai/posts/hermes-multi-agent/"/>
        <updated>2026-05-15T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[HermesAgent has a built-in delegate_task tool. I found the problem with it — and built process-isolated sub-agents that actually retain what they learn.]]></summary>
        <content type="html"><![CDATA[<img src="https://blog.chenyi.ai/images/hermes-agent-architecture.png" alt="Hermes Agent: Building Real Multi-Agent Support" style="border-radius: 1rem; margin-bottom: 1rem; width: 100%; object-fit: cover;" /><h2>The Problem with Hermes’s Built-in Multi-Agent</h2>
<p>HermesAgent ships with <code>delegate_task</code> — it spins up sub-agents in-process, fast and simple. But look at the source code:</p>
<pre><code class="language-python">DELEGATE_BLOCKED_TOOLS = frozenset({"delegate_task", "clarify", "memory", ...})
child = AIAgent(..., skip_memory=True, ...)
</code></pre>
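<p>Because <code>memory</code> is a blocked tool and every child is created with <code>skip_memory=True</code>, every insight a sub-agent develops <strong>dies when the thread exits</strong>. The swarm does work, but never gets smarter.</p>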
<p>That’s the fundamental problem. Sub-agents are disposable compute, not collaborative intelligence. I wanted something different.</p>
<h2>What I Built Instead</h2>
<p>Each sub-agent is a <strong>complete Hermes instance</strong> — own OS process, own config, own state, full memory access.</p>
<p><img src="https://blog.chenyi.ai/images/hermes-agent-architecture.png" alt="Main Agent delegates to Sub-Agent via Skill Manager, injecting only the skills needed" /></p>
<h3>The Lifecycle</h3>
<pre><code>Spawn → Execute → Handoff → Complete → Merge Learnings → Cleanup
</code></pre>
<ol>
<li><strong>Spawn</strong>: <code>spawn-agent.sh</code> snapshots the main agent’s config into an isolated instance</li>
<li><strong>Execute</strong>: The sub-agent runs with full autonomy — no restricted tools, real memory</li>
<li><strong>Handoff</strong>: The sub-agent writes a structured handoff with findings, memory updates, and skill recommendations</li>
<li><strong>Complete</strong>: <code>complete-agent.sh</code> validates the handoff, sends the results via a message queue, and deletes the instance directory immediately</li>
<li><strong>Merge</strong>: The main agent absorbs learnings through the native memory pipeline</li>
</ol>
<p><strong>Instances are ephemeral. Learnings are permanent.</strong></p>
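<p>What “validates the handoff” could look like, sketched in Python rather than the shell script the post uses. The field names mirror the three parts of the handoff described above, but they and the JSON format are assumptions.</p>
<pre><code class="language-python">import json
from dataclasses import dataclass

REQUIRED_FIELDS = ("findings", "memory_updates", "skill_recommendations")


@dataclass
class Handoff:
    findings: list
    memory_updates: list
    skill_recommendations: list


def load_handoff(raw: str) -> Handoff:
    """Parse a sub-agent's handoff and reject it if anything is missing or malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("handoff must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"handoff missing fields: {missing}")
    for name in REQUIRED_FIELDS:
        if not isinstance(data[name], list):
            raise ValueError(f"{name} must be a list")
    return Handoff(**{name: data[name] for name in REQUIRED_FIELDS})
</code></pre>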
<h2>Mistakes I Made Along the Way</h2>
<p><strong>Zombie agents in the registry.</strong> Strict bash mode + missing handoff file = the cleanup script exits early, leaving dead entries behind. Fixed with graceful degradation — always clean up the registry, even on failure.</p>
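<p>The same graceful-degradation idea as a Python sketch (the real fix lives in the bash cleanup script). A <code>try</code>/<code>finally</code> guarantees the registry entry and instance directory are removed on every path, including a missing handoff or a crash while parsing it. The dict registry and the <code>handoff.json</code> filename are hypothetical.</p>
<pre><code class="language-python">import json
import shutil
import tempfile
from pathlib import Path


def complete_agent(agent_id: str, registry: dict, instance_dir: Path) -> dict:
    """Finish a sub-agent; its registry entry is removed even when the handoff is missing."""
    try:
        handoff_file = instance_dir / "handoff.json"
        if not handoff_file.exists():
            return {"agent": agent_id, "status": "failed", "reason": "missing handoff"}
        return {"agent": agent_id, "status": "ok", "handoff": json.loads(handoff_file.read_text())}
    finally:
        # Runs on every path (success, early return, exception): no zombie entries.
        registry.pop(agent_id, None)
        shutil.rmtree(instance_dir, ignore_errors=True)


if __name__ == "__main__":
    instance = Path(tempfile.mkdtemp(prefix="subagent-"))
    registry = {"agent-1": str(instance)}
    print(complete_agent("agent-1", registry, instance))  # no handoff was written
    print(registry)  # {}
</code></pre>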
<p><strong>Agent ignored my sub-agent skill.</strong> Given a choice between native <code>delegate_task</code> and my shell script approach, the LLM picked the simpler option every time. The model naturally gravitates to the path of least resistance. Fixed by adding a Decision Guide explaining when each approach is appropriate — now the agent knows when to use the lightweight in-process delegate vs. when to spin up a full isolated instance.</p>
<p><strong>Wrong API keys.</strong> The spawn script was pulling from the global Hermes install instead of the project-local agent. Fixed by forking from the running instance, so the sub-agent inherits the correct context.</p>
<h2>Why This Matters</h2>
<p>The core insight: <strong>learning shouldn’t be scoped to a thread lifetime</strong>.</p>
<p>If you’re building a multi-agent system and your sub-agents can’t retain what they discover, you’re running an expensive stateless compute cluster, not a system that gets smarter over time.</p>
<p>Process isolation costs more than in-process threads. But it buys you:</p>
<ul>
<li>Real memory that persists across the agent’s lifetime</li>
<li>No cross-contamination between concurrent agents</li>
<li>Clean handoff artifacts you can inspect and audit</li>
<li>Agents that actually accumulate knowledge</li>
</ul>
<p>All experiments were done with <a href="https://qoder.dev">Qoder</a>’s expert mode, which I highly recommend for long-running agentic tasks where you want the agent to make mistakes, learn, and fix them autonomously.</p>
<h2>GitHub</h2>
<p>Full implementation: <a href="https://github.com/Czhang0727/agent-from-scratch">github.com/Czhang0727/agent-from-scratch</a></p>
<hr />
<p><em>Next: how skill management keeps the main agent sane as the number of skills grows.</em></p>
]]></content>
        <author>
            <name>Chenyi Zhang</name>
            <email>chenyi@character.ai</email>
            <uri>https://blog.chenyi.ai/</uri>
        </author>
        <category label="Agent from Scratch"/>
        <category label="AI Agents"/>
        <category label="Hermes"/>
        <category label="Multi-Agent"/>
        <category label="Sub-Agents"/>
        <published>2026-05-15T00:00:00.000Z</published>
    </entry>
    <entry>
        <title type="html"><![CDATA[Agent from Scratch Part 3: Skills]]></title>
        <id>https://blog.chenyi.ai/posts/part-3-skills/</id>
        <link href="https://blog.chenyi.ai/posts/part-3-skills/"/>
        <updated>2026-05-10T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Skills are user manuals for your agent's tools. Get them wrong and your agent spends more time confused than working.]]></summary>
        <content type="html"><![CDATA[<h2>What is a Skill?</h2>
<p>A skill is a user manual for a tool — or a chain of tools.</p>
<p>If the model isn’t powerful enough to figure out tool usage on its own, a skill also includes examples. Think of it like onboarding documentation: “here’s what this tool does, here’s when to use it, here’s a concrete example.”</p>
<p>Unlike a one-time prompt, skills are designed to be read repeatedly. Your agent will reach for them on every relevant task.</p>
<h2>The Pile of Manuals Problem</h2>
<p>Now imagine your agent has 50 user manuals in front of it. It needs to pick the right one before it can do anything.</p>
<p>Two problems emerge immediately:</p>
<p><strong>1. Ambiguity kills accuracy.</strong> If two skills are too similar — say, two different ways to fetch weather data — the model has no reliable way to pick. It’ll guess, and it’ll guess wrong sometimes.</p>
<p><strong>2. Context burns tokens.</strong> Loading every skill into the context window is wasteful and degrades focus. The more irrelevant content the model has to wade through, the noisier its reasoning becomes.</p>
<p>Modern agent design spends a lot of effort solving the skill selection problem before skill loading ever happens.</p>
<h2>Skill Selection: Index Before Load</h2>
<p>The right pattern is: <strong>select from the index, then load the skill</strong>.</p>
<p>Think about driving a car. You don’t need the manual for how to fix the engine just because you’re making a left turn. Likewise, if your agent is writing a document, it doesn’t need the stock trading skill loaded into its context.</p>
<p>The goal is:</p>
<ul>
<li><strong>Fast</strong> — retrieval should not be the bottleneck</li>
<li><strong>Accurate</strong> — wrong skill = wrong tool = failed task</li>
</ul>
<p>In my implementation, I skip the naive “dump all skills into context” approach and instead use indexed selection — match the task to the right skill before injecting anything.</p>
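<p>A minimal sketch of index-then-load. It assumes each skill’s first line is a one-line summary. The selector scores summaries by word overlap with the task, a deliberately naive stand-in for whatever retrieval you actually use; the skill names and texts are made up.</p>
<pre><code class="language-python">import re

# Hypothetical skill store: name -> full text, whose first line is a one-line summary.
SKILLS = {
    "stock_trading": "Place and query stock trades.\nFull manual: parameters, examples, failure modes.",
    "doc_writer": "Draft and edit documents.\nFull manual: parameters, examples, failure modes.",
}


def build_index(skills: dict) -> dict:
    # The index holds only summaries, never full skill bodies.
    return {name: text.splitlines()[0] for name, text in skills.items()}


def words(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def select_skill(task: str, index: dict):
    """Return the skill whose summary shares the most words with the task, or None."""
    scores = {name: len(words(task).intersection(words(summary))) for name, summary in index.items()}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] else None


def context_for(task: str, skills: dict) -> str:
    # Select from the index first; only the chosen skill's full text is loaded.
    name = select_skill(task, build_index(skills))
    return skills[name] if name else ""
</code></pre>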
<h2>Skill Selection as Reinforcement</h2>
<p>Here’s an interesting insight: learning skill selection from human behavior is exactly what Meta’s “distill from human” approach does at scale.</p>
<p>When a human expert picks the right tool for a job, that decision carries signal. If you capture those decisions — which skill was chosen, what was the context, did it succeed — you can train a model to make better choices over time.</p>
<p>The data you accumulate from real agent runs becomes a natural fine-tuning dataset. Your agent literally gets better at picking the right skill the more it works.</p>
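<p>A sketch of capturing that signal. It assumes each run records the task, the candidate skills, the chosen skill, and the outcome. The prompt/completion JSONL layout is a hypothetical fine-tuning format; check what your training stack expects.</p>
<pre><code class="language-python">import json
from dataclasses import dataclass


@dataclass
class Selection:
    task: str          # what the agent was asked to do
    candidates: list   # skills it could choose from
    chosen: str        # the skill it picked
    succeeded: bool    # outcome of the run


def to_finetune_jsonl(log: list) -> str:
    """Keep successful selections as prompt/completion pairs, one JSON object per line."""
    lines = []
    for s in log:
        if s.succeeded:
            prompt = f"Task: {s.task}\nSkills: {', '.join(s.candidates)}\nPick one:"
            lines.append(json.dumps({"prompt": prompt, "completion": s.chosen}))
    return "\n".join(lines)
</code></pre>
<p>Failed selections are signal too; you could keep them as negative examples instead of dropping them.</p>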
<h2>What’s in a Skill File?</h2>
<p>In practice, a skill is a plain text file. It can include:</p>
<ul>
<li><strong>Tool definition</strong> — what the tool does, its parameters, return values</li>
<li><strong>Usage instructions</strong> — when to call it, what to avoid</li>
<li><strong>Chaining examples</strong> — how to combine it with other tools</li>
<li><strong>Failure modes</strong> — common errors and how to recover</li>
</ul>
<p>Images work too, as long as your processor model handles multimodal input.</p>
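<p>A sketch of one possible layout. It assumes a markdown-style plain-text file with one section per item in the list above. The <code>SkillFile</code> class and section names are illustrative, not a standard.</p>
<pre><code class="language-python">from dataclasses import dataclass, field


@dataclass
class SkillFile:
    name: str
    tool_definition: str                                # what the tool does, parameters, return values
    usage: str                                          # when to call it, what to avoid
    chaining_examples: list = field(default_factory=list)
    failure_modes: dict = field(default_factory=dict)   # error -> recovery step

    def render(self) -> str:
        """Render as the plain-text file the agent reads."""
        parts = [f"# {self.name}", "## Tool definition", self.tool_definition, "## Usage", self.usage]
        if self.chaining_examples:
            parts.append("## Chaining examples")
            parts += [f"- {ex}" for ex in self.chaining_examples]
        if self.failure_modes:
            parts.append("## Failure modes")
            parts += [f"- {err}: {fix}" for err, fix in self.failure_modes.items()]
        return "\n".join(parts)
</code></pre>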
<h2>Key Design Principles</h2>
<ol>
<li><strong>One skill, one job.</strong> Overlapping skills cause ambiguity. Deduplicate aggressively.</li>
<li><strong>Index before load.</strong> Never inject skills you don’t need for the current task.</li>
<li><strong>Skills are maintained, not set-and-forget.</strong> APIs change, tools break, better patterns emerge. Treat your skills like code.</li>
<li><strong>Capture selection signal.</strong> Every time your agent picks (or fails to pick) the right skill, that’s training data.</li>
</ol>
<h2>GitHub</h2>
<p>The implementation is at <a href="https://github.com/Czhang0727">github.com/Czhang0727</a> — skills, selection logic, and the full agent scaffold.</p>
<hr />
<p><em>Part 4 covers memory — how agents extend context beyond what fits in the window.</em></p>
]]></content>
        <author>
            <name>Chenyi Zhang</name>
            <email>chenyi@character.ai</email>
            <uri>https://blog.chenyi.ai/</uri>
        </author>
        <category label="Agent from Scratch"/>
        <category label="AI Agents"/>
        <category label="LLM"/>
        <category label="Agent Design"/>
        <category label="Skills"/>
        <published>2026-05-10T00:00:00.000Z</published>
    </entry>
    <entry>
        <title type="html"><![CDATA[Agent from Scratch Part 4: Memory]]></title>
        <id>https://blog.chenyi.ai/posts/part-4-memory/</id>
        <link href="https://blog.chenyi.ai/posts/part-4-memory/"/>
        <updated>2026-05-10T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Memory in agents is just expanding the context window. Here's the simple mental model that makes it practical.]]></summary>
        <content type="html"><![CDATA[<h2>Memory = Context Window Expansion</h2>
<p>In agent design, “memory” is a word that sounds complicated but maps to something concrete: <strong>getting information into the context window that wouldn’t otherwise fit</strong>.</p>
<p>A 256k context window sounds large until you try to fit the world’s knowledge into it. More importantly, you shouldn’t try — attention spreads thin over large contexts. Finding a tiny detail inside a massive blob of text is like finding a footnote inside an encyclopedia. The model can do it, but not reliably.</p>
<h2>The Early Problem</h2>
<p>Early LLMs had tiny context windows; GPT-3 shipped with just 2,048 tokens. With the system prompt, guidance, and conversation history taking up space, you had almost nothing left for the actual knowledge that helps the model produce good answers.</p>
<p>This forced a key design decision: most knowledge lives outside the context window. You retrieve what you need, when you need it.</p>
<p>RAG (Retrieval-Augmented Generation) is the canonical solution. So is the knowledge graph. Both are just strategies for deciding what to pull in and when.</p>
<h2>The File System Mental Model</h2>
<p>Here’s the simplest way to think about agent memory: <strong>it’s a text file</strong>.</p>
<p>Think about what you can do with a text file in an OS:</p>
<ul>
<li><strong>Read</strong> — load it into context when relevant</li>
<li><strong>Append</strong> — add new information without overwriting</li>
<li><strong>Overwrite</strong> — replace when the old content is no longer valid</li>
<li><strong>Concat</strong> — merge multiple memory sources</li>
</ul>
<p>That’s the complete set of memory operations you need. No magic required.</p>
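<p>A minimal sketch of those four operations on a plain text file. The <code>MemoryFile</code> class is hypothetical; a real system would add locking if several agents write to the same file.</p>
<pre><code class="language-python">from pathlib import Path


class MemoryFile:
    """Agent memory as a plain text file: read, append, overwrite, concat."""

    def __init__(self, path):
        self.path = Path(path)
        self.path.touch(exist_ok=True)

    def read(self) -> str:
        return self.path.read_text()

    def append(self, entry: str) -> None:
        # Add new information without touching what is already there.
        with self.path.open("a") as f:
            f.write(entry.rstrip("\n") + "\n")

    def overwrite(self, content: str) -> None:
        # Replace everything when the old content is no longer valid.
        self.path.write_text(content)

    @staticmethod
    def concat(*memories: "MemoryFile") -> str:
        # Merge several memory sources into one block for the context window.
        return "".join(m.read() for m in memories)
</code></pre>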
<h2>Triggers and Lifecycle</h2>
<p>Every memory system needs a trigger — some event that causes the agent to create, update, or read memory.</p>
<p>In my implementation, the trigger is simple: <strong>end of session</strong>. After each session, the agent:</p>
<ol>
<li>Scans existing memories for relevance</li>
<li>Summarizes what happened: key actions taken, what worked, what failed</li>
<li>Writes a new memory entry or updates an existing one</li>
</ol>
<p>This creates a persistent record of agent experience. Over multiple sessions, the agent builds up a structured history of its own performance.</p>
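<p>A sketch of the end-of-session trigger. It assumes memories are keyed by topic, so “scan for relevance” becomes a lookup. <code>summarize</code> is a placeholder for an LLM call, and <code>SessionLog</code> is a hypothetical structure.</p>
<pre><code class="language-python">from dataclasses import dataclass


@dataclass
class SessionLog:
    topic: str      # e.g. the skill or project the session was about
    actions: list   # (action, succeeded) pairs


def summarize(log: SessionLog) -> str:
    # Placeholder for an LLM call that writes the summary.
    worked = [a for a, ok in log.actions if ok]
    failed = [a for a, ok in log.actions if not ok]
    return f"worked: {', '.join(worked) or 'none'}; failed: {', '.join(failed) or 'none'}"


def on_session_end(log: SessionLog, memories: dict) -> dict:
    """Scan existing memories, summarize the session, then update or create an entry."""
    summary = summarize(log)
    if log.topic in memories:
        memories[log.topic] += "\n" + summary   # relevant memory exists: update it
    else:
        memories[log.topic] = summary           # nothing relevant: write a new entry
    return memories
</code></pre>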
<h2>Memory as Training Data</h2>
<p>Here’s where it gets interesting.</p>
<p>The summaries your agent writes are exactly the kind of data you’d want for fine-tuning. Key decisions made, correct choices, wrong turns, recovery patterns — this is behavioral signal in a clean format.</p>
<p>If you can afford to fine-tune (or when fine-tuning costs drop further), the memory log from a well-designed agent becomes a natural training dataset. Your model starts embodying the patterns of whoever built the agent.</p>
<h2>Evaluation Closes the Loop</h2>
<p>In production, you don’t just write memories blindly. You run an evaluation pass after each session:</p>
<ul>
<li>Did the agent achieve the goal?</li>
<li>Were the actions efficient?</li>
<li>Were any tools misused?</li>
</ul>
<p>Only memories that pass evaluation get committed to long-term storage. Bad runs get flagged for review, not reinforced.</p>
<p>This is the same loop that makes humans better at their jobs: do, reflect, evaluate, adjust.</p>
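<p>A sketch of the evaluation gate. It assumes something upstream (an LLM judge, heuristics, or a human) has already answered the three questions above. The names are hypothetical.</p>
<pre><code class="language-python">from dataclasses import dataclass


@dataclass
class Evaluation:
    goal_achieved: bool
    efficient: bool
    tools_misused: bool

    def passed(self) -> bool:
        return self.goal_achieved and self.efficient and not self.tools_misused


def commit_or_flag(summary: str, ev: Evaluation, long_term: list, review_queue: list) -> bool:
    """Only memories from runs that pass evaluation reach long-term storage."""
    if ev.passed():
        long_term.append(summary)
        return True
    review_queue.append(summary)  # bad runs are held for review, not reinforced
    return False
</code></pre>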
<h2>The Full Picture</h2>
<pre><code>Session starts
  → Load relevant memories into context
  → Agent executes task using skills + context

Session ends
  → Summarize what happened
  → Evaluate quality
  → Write/update memory
  → (Optional) Flag data for fine-tuning
</code></pre>
<p>Simple. No exotic architecture required. The complexity is in the evaluation step — deciding what counts as a good run is the hardest part.</p>
<h2>GitHub</h2>
<p>Implementation is at <a href="https://github.com/Czhang0727">github.com/Czhang0727</a>. The memory system is the simplest module in the repo — a reminder that the best designs usually are.</p>
<hr />
<p><em>The next post covers Hermes — a real-world agent hitting the limits of this design and what I built to fix it.</em></p>
]]></content>
        <author>
            <name>Chenyi Zhang</name>
            <email>chenyi@character.ai</email>
            <uri>https://blog.chenyi.ai/</uri>
        </author>
        <category label="Agent from Scratch"/>
        <category label="AI Agents"/>
        <category label="LLM"/>
        <category label="Memory"/>
        <category label="RAG"/>
        <category label="Agent Design"/>
        <published>2026-05-10T00:00:00.000Z</published>
    </entry>
    <entry>
        <title type="html"><![CDATA[Agent from Scratch Part 2: Orchestration]]></title>
        <id>https://blog.chenyi.ai/posts/part-2-orchestration/</id>
        <link href="https://blog.chenyi.ai/posts/part-2-orchestration/"/>
        <updated>2026-05-01T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[IO is hooked up but the model just answers questions. Orchestration is how you turn a chatbot into an agent that actually does things.]]></summary>
        <content type="html"><![CDATA[<h2>The Problem with Raw LLM</h2>
<p>Now that IO is hooked up, you’d think the agent should work. It doesn’t.</p>
<p>A raw LLM is pretty much Q&amp;A — there’s no skill, no action. It just answers your input with predicted tokens. Impressive, but useless as an agent.</p>
<p>To resolve that, we need prompt engineering. This is probably the <strong>only truly unique part</strong> of an LLM-powered agent system. Everything else borrows from existing software patterns.</p>
<h2>Two Types of Prompts</h2>
<p>In my implementation I gave the agent two core prompts: <strong>emotional support</strong> and <strong>productivity</strong>.</p>
<p>The difference lies in what we ask the agent to do:</p>
<ul>
<li>
<p><strong>Emotional support prompt</strong>: “Say something nice, be supportive.” The prompt helps the agent recognize that its job is comfort, not tasks. From the model’s point of view, we’ve provided context, so it can make a better prediction.</p>
</li>
<li>
<p><strong>Productivity prompt</strong>: Way more complex. This is where the “harness system” lives.</p>
</li>
</ul>
<h2>The Harness System</h2>
<p>Harness engineering = creating a bash-style execution environment where:</p>
<ul>
<li>We have skills (bash commands)</li>
<li>We define how to trigger them (exact match vs. model-generated)</li>
</ul>
<p>In my example, I created a set of skill schemas. Schemas are a bit old-fashioned compared to the plain-text skills I’ll cover later, but at their core both do the same thing.</p>
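<p>A sketch of a tiny harness. It assumes skills are schemas that map to command-line invocations. Exact matches use a <code>/name arg</code> syntax made up for illustration; anything else falls through to <code>model_pick</code>, a placeholder for the LLM choosing from the schema descriptions. Commands run via <code>subprocess.run</code> with an argument list, so user text is never interpreted by a shell.</p>
<pre><code class="language-python">import subprocess
from dataclasses import dataclass


@dataclass
class SkillSchema:
    name: str
    description: str
    argv: list  # command template; "{arg}" is replaced by the user argument


SKILLS = {
    "echo": SkillSchema("echo", "Print text back", ["echo", "{arg}"]),
    "disk": SkillSchema("disk", "Show disk usage", ["df", "-h"]),
}


def model_pick(text: str):
    # Placeholder for the LLM choosing a skill from the schema descriptions.
    return None


def trigger(text: str):
    """Exact match on '/name arg' first; otherwise fall back to a model-generated choice."""
    if text.startswith("/"):
        name, _, arg = text[1:].partition(" ")
        if name in SKILLS:
            return SKILLS[name], arg
    return model_pick(text), text


def run(text: str) -> str:
    skill, arg = trigger(text)
    if skill is None:
        return "no skill matched"
    argv = [part.replace("{arg}", arg) for part in skill.argv]
    result = subprocess.run(argv, capture_output=True, text=True, timeout=10)
    return result.stdout.strip()
</code></pre>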
<p>The full execution loop looks like this:</p>
<pre><code>LLM reads local env
  → finds function it can use
  → understands the task
  → does it
  → validates and responds to user
  → user annotates (correct / incorrect)
  → agent learns from execution
</code></pre>
<p>The abstraction isn’t that different from humans: fail more, learn more. And eventually there will be an “aha moment.”</p>
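<p>A sketch of one pass through that loop. It assumes tools are plain Python callables and that “finding a function” means spotting a tool name in the task. The understanding step, which would be the LLM’s job, is skipped; <code>Agent</code> and <code>Trace</code> are hypothetical names. The annotated traces are what the agent learns from.</p>
<pre><code class="language-python">from dataclasses import dataclass, field


@dataclass
class Trace:
    task: str
    tool: str
    output: str
    valid: bool
    annotation: str = ""  # "correct" / "incorrect", filled in by the user


@dataclass
class Agent:
    tools: dict                                 # the "local env": tool name -> callable
    traces: list = field(default_factory=list)  # what the agent learns from

    def step(self, task: str) -> Trace:
        # Find a function it can use (naive: the tool name appears in the task).
        tool = next((name for name in self.tools if name in task), None)
        output = self.tools[tool](task) if tool else ""
        # Validate before responding: here, "valid" just means some output came back.
        trace = Trace(task, tool or "none", output, valid=bool(output))
        self.traces.append(trace)
        return trace

    def annotate(self, trace: Trace, correct: bool) -> None:
        trace.annotation = "correct" if correct else "incorrect"
</code></pre>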
<h2>Distillation is the Real Secret</h2>
<p>I almost forgot to mention the most important thing: <strong>“learn from other people’s success or failure”</strong> is the best way to describe what good orchestration enables.</p>
<p>When you capture agent execution logs (what it tried, whether it worked, what the user annotated), you have a distillation dataset. That’s the same kind of data strong models are trained on: human-annotated traces of good decisions.</p>
<h2>Keep It Simple</h2>
<p>Data is king. Keep the flow simple but logical. Let the agent figure out the best way to do things — don’t over-engineer the orchestration layer.</p>
<p>A complex orchestration system you built becomes a constraint the agent has to work around. A simple harness the agent can reason about is a tool the agent can use.</p>
<h2>GitHub</h2>
<p>Full code at <a href="https://github.com/Czhang0727/agent-from-scratch">github.com/Czhang0727/agent-from-scratch</a>.</p>
<hr />
<p><em>Part 3 covers skills — the user manuals that tell your agent which tools to use and when.</em></p>
]]></content>
        <author>
            <name>Chenyi Zhang</name>
            <email>chenyi@character.ai</email>
            <uri>https://blog.chenyi.ai/</uri>
        </author>
        <category label="Agent from Scratch"/>
        <category label="AI Agents"/>
        <category label="LLM"/>
        <category label="Orchestration"/>
        <category label="Prompt Engineering"/>
        <published>2026-05-01T00:00:00.000Z</published>
    </entry>
    <entry>
        <title type="html"><![CDATA[Agent from Scratch Part 1: IO]]></title>
        <id>https://blog.chenyi.ai/posts/part-1-io/</id>
        <link href="https://blog.chenyi.ai/posts/part-1-io/"/>
        <updated>2026-04-10T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Building an agent starts with one question: how does it talk to the world? Text in, text out — and everything else is just a plugin.]]></summary>
        <content type="html"><![CDATA[<img src="https://blog.chenyi.ai/images/agent-io-diagram.png" alt="Agent from Scratch Part 1: IO" style="border-radius: 1rem; margin-bottom: 1rem; width: 100%; object-fit: cover;" /><blockquote>
<p>I lost my wisdom teeth today, so let’s make it simple…</p>
</blockquote>
<h2>What is IO for an Agent?</h2>
<p>IO defines how your agent system can explore or communicate with its external environment.</p>
<p>It’s not a real human — it won’t see, smell, or feel. Anything going <strong>in</strong> to the agent, and anything coming <strong>out</strong>, is plain bits.</p>
<p>Here’s the bare minimum IO you need for an agent:</p>
<ol>
<li>Text input</li>
<li>Text output</li>
</ol>
<p>That’s it. You can build a lot with just that.</p>
<h2>Processors: Not Just Neural Nets</h2>
<p>Before machine learning took over, processors were rule-based. Believe it or not, these systems still run today: when you call your bank and hear “Press 1 for balance, Press 2 for transfers,” that’s a rule-based agent. I’ll cover rule-based processors later. For now, let’s focus on IO.</p>
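<p>To make that concrete, a sketch of a rule-based processor: text in, text out, no model involved. The menu entries are made up.</p>
<pre><code class="language-python">MENU = {
    "1": "Balance lookup selected.",
    "2": "Transfer selected. Enter the amount.",
}
PROMPT = "Press 1 for balance, press 2 for transfers."


def ivr(text_in: str) -> str:
    """Rule-based processor: a fixed lookup from input text to output text."""
    return MENU.get(text_in.strip(), PROMPT)
</code></pre>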
<h2>Multimodal: Making IO Cooler</h2>
<p>Want to go beyond text? “Multimodal support” just means your IO bus handles more data types. Video, image, voice — these are already solved problems:</p>
<ol>
<li>Image viewer</li>
<li>Video player</li>
<li>MP3 player</li>
<li>Microphone input drivers</li>
<li>Image transformer</li>
</ol>
<p>None of these are new. They’ve been around for decades, and they perfectly meet agent needs. The trick is making your IO bus <strong>generalized</strong> — built to accept more input types via plugins over time.</p>
<p>Think about where this goes: agents will soon have physical bodies. IoT sensors will feed into the same IO bus. The abstraction that handles voice today will handle temperature sensors tomorrow.</p>
<h2>Design Principle: Generalize Your IO Bus</h2>
<p>Don’t hardcode IO types. Build a plugin-friendly bus where new input/output channels can be added without touching core agent logic.</p>
<p><img src="https://blog.chenyi.ai/images/agent-io-diagram.png" alt="Agent IO architecture — External Input flows into Agent, which connects to Memory, External Tools, and Guidelines, then outputs via Agent Output" /></p>
<p>Your agent’s intelligence lives in the middle. The IO bus is just plumbing — but design it well and you only build it once.</p>
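<p>A sketch of a plugin-friendly input bus. It assumes every channel is normalized to text before it reaches the agent core. <code>IOBus</code> and the decoder functions are hypothetical; an output bus would mirror it with encoders.</p>
<pre><code class="language-python">from typing import Callable, Dict


class IOBus:
    """Routes input by modality; new channels are plugins, core logic never changes."""

    def __init__(self):
        self.decoders: Dict[str, Callable[[bytes], str]] = {}

    def register(self, modality: str, decoder: Callable[[bytes], str]) -> None:
        self.decoders[modality] = decoder

    def receive(self, modality: str, payload: bytes) -> str:
        if modality not in self.decoders:
            raise ValueError(f"no plugin registered for {modality!r}")
        # Everything reaching the agent core is normalized to text.
        return self.decoders[modality](payload)


bus = IOBus()
bus.register("text", lambda b: b.decode("utf-8"))
# Adding a sensor later is one more plugin, not a change to the agent.
bus.register("temperature", lambda b: f"temp={float(b.decode()):.1f}C")
</code></pre>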
<h2>GitHub</h2>
<p>Full implementation at <a href="https://github.com/Czhang0727/agent-from-scratch">github.com/Czhang0727/agent-from-scratch</a>.</p>
<hr />
<p><em>Part 2 covers orchestration — once IO is hooked up, how do you get the model to actually do things?</em></p>
]]></content>
        <author>
            <name>Chenyi Zhang</name>
            <email>chenyi@character.ai</email>
            <uri>https://blog.chenyi.ai/</uri>
        </author>
        <category label="Agent from Scratch"/>
        <category label="AI Agents"/>
        <category label="LLM"/>
        <category label="Agent Design"/>
        <category label="IO"/>
        <published>2026-04-10T00:00:00.000Z</published>
    </entry>
    <entry>
        <title type="html"><![CDATA[Agent from Scratch Part 0: What Is an Agent?]]></title>
        <id>https://blog.chenyi.ai/posts/part-0-overview/</id>
        <link href="https://blog.chenyi.ai/posts/part-0-overview/"/>
        <updated>2026-04-01T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Starting from first principles — an agent is just a workflow that thinks like a human. Here's the 10,000ft view before we build it.]]></summary>
        <content type="html"><![CDATA[<img src="https://blog.chenyi.ai/images/agent-io-diagram.png" alt="Agent from Scratch Part 0: What Is an Agent?" style="border-radius: 1rem; margin-bottom: 1rem; width: 100%; object-fit: cover;" /><p>I’m starting to build a general agent framework from scratch, sharing what I’ve learned over the past few years. Let’s start from the very beginning.</p>
<h2>What Is an Agent?</h2>
<p>IMO, an agent is a <strong>workflow that can think like a human</strong> — do what a human can do. That concept existed even before LLMs, when we had stateful agents in backend system design.</p>
<p>The only reason “agents” are popular now is Large Models. We finally found a moment when agent design could be generalized — not hand-crafted for each narrow task.</p>
<h2>The 10,000ft View: An Agent Is a PC</h2>
<p>Back to old-fashioned computing: we have IO, a CPU, and storage.</p>
<p>An agent maps almost perfectly:</p>
<ul>
<li><strong>CPU</strong> → LLM</li>
<li><strong>IO</strong> → connector to external devices (tools, APIs, sensors)</li>
<li><strong>Storage</strong> → memory</li>
</ul>
<p>Yep, it’s that simple.</p>
<p><img src="https://blog.chenyi.ai/images/agent-io-diagram.png" alt="Agent architecture — External Input flows into Agent, which connects to Memory, External Tools, and Guidelines, producing Agent Output" /></p>
<p>Over time, engineers added fancy stuff to make each component faster:</p>
<ul>
<li>Better CPU → better models</li>
<li>Larger bandwidth → larger context windows</li>
<li>More applications → more skills / MCP servers</li>
</ul>
<p><strong>Nothing fundamentally changed.</strong></p>
<h2>The Agent Heartbeat</h2>
<p>Here’s pseudocode for agent orchestration. If you know how OpenClaw works, this is pretty much its heartbeat:</p>
<pre><code class="language-python">import time

while True:
    time.sleep(1)  # wait one second between polls
    user_input = read_input(context)
    intent_and_plan = think(context, user_input)
    execution_result = do(context, intent_and_plan)
    # this phase can sometimes run asynchronously
    evaluation(context, execution_result)
</code></pre>
<p>Simple loop: read, think, do, evaluate. Repeat.</p>
<h2>The Event-Driven Upgrade</h2>
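<p>There’s a known problem with <code>sleep</code>: the loop keeps waking up and burning resources even when nothing has happened, and a new input can still wait up to a full interval before it is handled. The solution? Event-driven, just like JavaScript.</p>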
<p>From what I can tell, Claude Code’s internals do the same thing. So the loop evolves:</p>
<p><strong>User interaction side:</strong></p>
<pre><code class="language-python">pub_sub_client = PubSubClient()  # hypothetical pub/sub client

user_input = read_user_input()
pub_sub_client.send("user_input", user_input)
result = pub_sub_client.subscribe("task_result")
</code></pre>
<p><strong>Consumer (agent) side:</strong></p>
<pre><code class="language-python">user_input = pub_sub_client.subscribe("user_input")
intent_and_plan = think(context, user_input)
execution_result = do(context, intent_and_plan)
pub_sub_client.send("task_result", execution_result)
# evaluation can run asynchronously, off the response path
evaluation(context, execution_result)
</code></pre>
<p>Clean decoupling. The agent becomes a proper event consumer.</p>
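<p>A runnable version of the same pattern, using the standard library’s <code>queue.Queue</code> and <code>threading</code> as an in-process stand-in for a real pub/sub system. <code>think</code>, <code>do</code>, and <code>evaluate</code> are placeholders for the LLM call, tool execution, and evaluation.</p>
<pre><code class="language-python">import queue
import threading

# In-process stand-in for a pub/sub broker: one queue per topic.
topics = {"user_input": queue.Queue(), "task_result": queue.Queue()}


def think(context: dict, user_input: str) -> str:
    return f"plan for {user_input!r}"  # placeholder for the LLM call


def do(context: dict, plan: str) -> str:
    return f"executed {plan}"  # placeholder for tool execution


def evaluate(context: dict, result: str) -> None:
    context.setdefault("history", []).append(result)  # placeholder: just record it


def agent_consumer(context: dict) -> None:
    while True:
        user_input = topics["user_input"].get()  # blocks until an event arrives; no polling
        if user_input is None:  # sentinel value shuts the consumer down
            break
        result = do(context, think(context, user_input))
        topics["task_result"].put(result)
        evaluate(context, result)  # runs after the result is already published


ctx = {}
worker = threading.Thread(target=agent_consumer, args=(ctx,), daemon=True)
worker.start()
topics["user_input"].put("summarize my inbox")
result = topics["task_result"].get(timeout=5)
print(result)  # executed plan for 'summarize my inbox'
topics["user_input"].put(None)
worker.join()
</code></pre>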
<h2>What’s Coming</h2>
<p>In this series, I’ll dig deeper into each component:</p>
<ul>
<li><strong>IO</strong> — how the agent talks to the world</li>
<li><strong>Orchestration</strong> — prompt engineering and the harness system</li>
<li><strong>Skills</strong> — user manuals for tools</li>
<li><strong>Memory</strong> — expanding the context window</li>
<li><strong>Multi-agent</strong> — when one agent isn’t enough</li>
</ul>
<p>Track progress and raise issues / PRs at <a href="https://github.com/Czhang0727/agent-from-scratch">github.com/Czhang0727/agent-from-scratch</a>.</p>
]]></content>
        <author>
            <name>Chenyi Zhang</name>
            <email>chenyi@character.ai</email>
            <uri>https://blog.chenyi.ai/</uri>
        </author>
        <category label="AI Agents"/>
        <category label="LLM"/>
        <category label="Agent Design"/>
        <category label="Agent from Scratch"/>
        <published>2026-04-01T00:00:00.000Z</published>
    </entry>
</feed>