pi: The Minimal Coding Agent

pi is the most interesting bet in the agent CLI category right now. Mario Zechner built it after souring on the existing tools, and the argument for it is best made in his two recent talks, both worth watching in full:

“I Hated Every Coding Agent, So I Built My Own”: the case against the existing harnesses and the design rationale for pi.
“Building pi in a World of Slop”: pi’s design plus a broader argument about what agents are doing to codebases.

What is it

pi is roughly four packages: a provider abstraction, an agent loop, a custom terminal UI Mario claims is around six hundred lines, and the coding agent itself. The system prompt fits on a slide. The tool set is read, write, edit, and bash. There are no built-in helpers for MCP, sub-agents, plan mode, or todos. Anything along those lines is a TypeScript extension you write (or ask the agent to write) and hot-reload into the running session.

pi ships its own developer docs and a folder of example extensions inside the source tree. When you ask the agent to add a feature (a sub-agent system, a custom compaction strategy, a new slash command), it reads those docs and examples and writes the extension itself.
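
To make the shape of a four-tool harness concrete, here is a minimal sketch of an agent loop that dispatches read, write, edit, and bash calls. It is not pi's code: every type and function name below (`Tool`, `Model`, `makeTools`, `runAgent`, `scriptedModel`) is hypothetical, the "disk" is an in-memory map, the bash tool is a stub that runs nothing, and the model is a scripted stand-in so the example runs offline.

```typescript
// Hypothetical sketch of a four-tool agent loop; none of these names come from pi.
type ToolCall = { tool: string; args: Record<string, string> };
type ModelTurn = { text: string; calls: ToolCall[] };
type Model = (transcript: string[]) => ModelTurn;
type Tool = (args: Record<string, string>) => string;

// An in-memory "disk" keeps the example self-contained and free of side effects.
function makeTools(files: Map<string, string>): Record<string, Tool> {
  return {
    read: ({ path }) => files.get(path) ?? `error: ${path} not found`,
    write: ({ path, content }) => {
      files.set(path, content);
      return `wrote ${path}`;
    },
    edit: ({ path, oldText, newText }) => {
      const current = files.get(path);
      if (current === undefined) return `error: ${path} not found`;
      if (!current.includes(oldText)) return `error: text not found in ${path}`;
      // Replace the first occurrence only; the callback avoids `$` pattern expansion.
      files.set(path, current.replace(oldText, () => newText));
      return `edited ${path}`;
    },
    // A real harness would spawn a shell here; this stub only echoes the command.
    bash: ({ command }) => `(stub) would run: ${command}`,
  };
}

// The loop: ask the model, run whatever tools it requested, feed results back.
function runAgent(
  model: Model,
  tools: Record<string, Tool>,
  task: string,
  maxTurns = 10,
): string[] {
  const transcript: string[] = [`user: ${task}`];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = model(transcript);
    transcript.push(`assistant: ${reply.text}`);
    if (reply.calls.length === 0) break; // no tool calls means the model is done
    for (const call of reply.calls) {
      const tool = tools[call.tool];
      const result = tool ? tool(call.args) : `error: unknown tool ${call.tool}`;
      transcript.push(`tool(${call.tool}): ${result}`);
    }
  }
  return transcript;
}

// A scripted stand-in for a real model, so the loop can run without a provider.
function scriptedModel(turns: ModelTurn[]): Model {
  let i = 0;
  return () => turns[i++] ?? { text: "done", calls: [] };
}

const files = new Map<string, string>([["greet.ts", "console.log('helo');"]]);
const transcript = runAgent(
  scriptedModel([
    {
      text: "Fixing the typo.",
      calls: [{ tool: "edit", args: { path: "greet.ts", oldText: "helo", newText: "hello" } }],
    },
    { text: "Fixed.", calls: [] },
  ]),
  makeTools(files),
  "fix the typo in greet.ts",
);
console.log(transcript.join("\n"));
```

The point of the sketch is how little there is: a dispatch table and a loop. Everything a heavier harness adds (planning modes, todos, sub-agents) would have to live outside this core, which is the role the post describes extensions playing in pi.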

The case against existing harnesses

Mario’s argument against Claude Code (at 5:00) is that it stopped being predictable, and the leak backs him up: the orchestration is mostly prompt strings, which can change between any two releases. My own experience matches. Claude Code’s quality varies day to day in ways the version number doesn’t explain, and a workflow that worked yesterday can quietly start producing worse output today.

He’s harder on opencode (at 11:42), citing compaction logic that he says breaks the prompt cache and LSP feedback that interrupts the model mid-refactor.

The point that stuck with me is the TerminalBench thesis (at 16:07). The benchmark’s own reference harness, Terminus, scores near the top of the leaderboard with nothing but a tmux session and keystroke I/O. If that little tooling can compete, the question worth asking about every other harness is what the rest of it is buying.

The argument about codebases

The second talk steps back from pi and makes a broader argument about what agents are doing to projects. He calls the problem “compounding booboos” (at 12:59): agents introduce small mistakes faster than humans can catch them, and unlike humans they feel no pain, so they keep going. Review agents help but don’t solve it.

The closing section (at 16:13) is an argument for slowing down: write important code by hand, use agents for boring scoped tasks, read the code they generate. Worth watching even for engineers who plan to stay on Claude Code.

How pi is built

A few of the design choices that follow from the minimalism:

The system prompt is small enough to fit on a slide. Frontier models are already trained to know what a coding agent is; the prompt does not need to teach them.
pi runs without approval dialogues by default. The intended safety answer is containerisation, not per-action prompts.
Sessions are trees rather than linear chats. You can branch off the main thread, summarise a sub-thread, and bring just the summary back.
Extensions are TypeScript files that hot-reload during a session, so the agent can be asked to modify its own behaviour and the change takes effect immediately.
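
The tree-shaped session idea can be sketched as a small data structure. This is an illustration under assumptions, not pi's actual session format: `SessionNode`, `SessionTree`, `append`, `context`, and `mergeSummary` are hypothetical names, and the summariser is a plain function standing in for a model call.

```typescript
// Hypothetical session tree; names and shapes are illustrative, not pi's data model.
interface SessionNode {
  id: number;
  parent: SessionNode | null;
  role: "user" | "assistant" | "summary";
  text: string;
}

class SessionTree {
  private nextId = 0;
  readonly nodes: SessionNode[] = [];

  // Append a message under `parent`; appending under an older node creates a branch.
  append(parent: SessionNode | null, role: SessionNode["role"], text: string): SessionNode {
    const node: SessionNode = { id: this.nextId++, parent, role, text };
    this.nodes.push(node);
    return node;
  }

  // The context for a leaf is the path from the root down to that leaf.
  context(leaf: SessionNode): SessionNode[] {
    const path: SessionNode[] = [];
    for (let n: SessionNode | null = leaf; n !== null; n = n.parent) path.unshift(n);
    return path;
  }

  // Summarise the messages after `forkPoint` up to `leaf`, and attach only
  // that summary under `target` on another thread.
  mergeSummary(
    forkPoint: SessionNode,
    leaf: SessionNode,
    target: SessionNode,
    summarise: (msgs: SessionNode[]) => string,
  ): SessionNode {
    const path = this.context(leaf);
    const start = path.indexOf(forkPoint);
    if (start === -1) throw new Error("leaf is not a descendant of forkPoint");
    return this.append(target, "summary", summarise(path.slice(start + 1)));
  }
}

const tree = new SessionTree();
const ask = tree.append(null, "user", "Refactor the parser");
const plan = tree.append(ask, "assistant", "Plan: split the tokenizer out first");

// Side branch: investigate a failing test without polluting the main thread.
const side1 = tree.append(plan, "user", "Why does test_lexer fail?");
const side2 = tree.append(side1, "assistant", "Off-by-one in the column counter");

// The main thread continues from `plan`, carrying only a summary of the side branch.
const summary = tree.mergeSummary(
  plan,
  side2,
  plan,
  (msgs) => `Side investigation (${msgs.length} messages): ${msgs[msgs.length - 1].text}`,
);
const mainContext = tree.context(summary).map((n) => n.text);
console.log(mainContext);
```

In this sketch the main thread's context ends up with three entries (the request, the plan, and the one-line summary) while the two side-branch messages stay in the tree but out of the context window. That is the trade the tree model offers: the exploration is preserved, but only its conclusion costs tokens.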

What I want to test

The thing I most want to find out is how pi’s extension model compares to Claude Code’s. CLAUDE.md plus skills plus hooks has been the most useful part of my setup over the past year. It’s where I encode commit conventions, deployment rules, the way I want PRs reviewed, the documents I want generated automatically. Those workflows have changed how I work day to day, and a lot of the value of staying on Claude Code is in those files rather than in the model itself.

So the question for pi isn’t whether the four-tool core works on small tasks. It’s whether the TypeScript-extension model lets me build the same kind of personal workflow scaffolding, and whether the fact that the agent itself can write and reload those extensions makes the iteration loop tighter than editing CLAUDE.md and hoping the next session picks it up.

Other things to find out when I install pi:

How small is the system prompt actually, and does the model behave well with that little scaffolding?
How does the custom TUI compare to Claude Code on day-to-day feel?
Does pi work with non-frontier models? The final post pairs it with local Gemma.