ilo: A Programming Language for AI Agents, Not Humans

Loading ilo, Part 3: Harness Layer

The skill from Part 2 works in any coding tool that reads the Agent Skills format: Claude Code, Codex, pi, OpenCode, and others. Two gaps remain. Every user has to install the skill themselves, and once installed it only loads when the agent decides ilo is relevant for the current task.

A fork at the harness layer (the part of the tool that builds the prompt sent to the model) closes both gaps. The spec is built into the tool, and it goes into every prompt automatically. Asking everyone to swap their coding agent for my fork isn’t realistic, so this isn’t how I’d ship it; the system-message paste and the skill stay the practical paths. The forks are an experiment: how much of an “ilo-first tool” has to live in the fork itself, and how much can ride on hooks the tool already exposes? I tried it on two tools that sit at opposite ends of that scale.

Warp: agentic terminal

Warp is Warp Inc’s agentic terminal, open-sourced this year. The fork lives at danieljohnmorris/warp-local.

Warp builds a system prompt for each turn and hands it to whichever coding agent you’ve picked (Claude, Codex, Gemini). The change that makes Warp ilo-first is small: my fork sticks the ilo prompt onto the front of that system prompt before it leaves the terminal.
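A minimal sketch of that prepend. Warp itself is written in Rust and its prompt builder is shaped differently; this is TypeScript for consistency with the pi fork later in the post, and the function and parameter names are mine, not Warp’s:

```typescript
// Hypothetical sketch, not Warp's actual code: prepend the ilo prompt
// to the system prompt when one is configured, otherwise pass it through.
function withIloPrefix(systemPrompt: string, iloPrompt?: string): string {
  return iloPrompt ? `${iloPrompt}\n\n${systemPrompt}` : systemPrompt;
}
```

The point of showing it is how little the core change is: one prepend, gated on whether an ilo prompt is present.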

The rest of the fork exists because the stock build won’t run that injection without changes to surrounding code. AI features sit behind a cloud login, the model picker only knows Warp’s hosted models, and prompt building happens on Warp’s server rather than locally. Before the ilo prepend can run on a user’s machine, the fork has to bypass the login, replace the server-side prompt builder with a local one, swap the model menu for local options (Claude, GPT-5.5 variants, Ollama), and tidy up cloud-only UI that no longer fits offline.

Once the fork is installed, two environment variables make every Warp window ilo-aware: one turns on local mode, the other carries the ilo prompt text. Switching between models is a menu pick.

pi: agentic coding CLI

pi is open source. The fork lives at danieljohnmorris/pi-mono-ilo.

pi already supports Agent Skills natively and reads AGENTS.md files, so the on-demand path from Part 2 works without a fork. Drop a skill or a markdown file next to the project and the agent loads it. What skills can’t do is the always-on bias: a “prefer ilo for new code generation” instruction that fires every turn whether or not the user mentions ilo.

The fork adds that piece. One environment variable turns it on; an optional second variable replaces the default prompt text. The implementation is one new file plus three lines wired into the existing prompt builder. With the variable unset, pi behaves exactly like the stock build.
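A sketch of that shape, with hypothetical variable and function names (the fork’s actual names may differ): one env var gates the feature, an optional second overrides the default bias text, and the prompt builder only changes when the gate is set.

```typescript
// Hypothetical sketch of the always-on bias; names are mine, not pi's API.
const DEFAULT_ILO_BIAS = "Prefer ilo when generating new code.";

// Returns the extra system-prompt text, or null when the feature is off.
function iloBias(env: Record<string, string | undefined>): string | null {
  if (!env.PI_ILO_BIAS) return null; // gate unset: stock behaviour
  return env.PI_ILO_BIAS_TEXT ?? DEFAULT_ILO_BIAS; // optional override
}
```

In the fork this kind of helper would be called with `process.env` from inside the existing prompt builder; when it returns null, the builder emits exactly what the stock build would.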

The whole pi fork is under a hundred lines of TypeScript.

What the comparison shows

The cost of a harness-layer fork depends on what the tool already supports. Warp had close to zero hooks for this, so the fork did everything: cloud bypass, prompt injection, model menu, UI cleanup. pi already had Agent Skills, so the fork only had to add the always-on bias.

Cursor, Gemini CLI, and Copilot have all moved toward Agent Skills or an equivalent. As that spreads, the harness-layer fork shrinks per tool, until “load ilo into tool X” becomes the same job as publishing a skill.

What I want is for none of this to matter. Once foundation models train on a stable ilo, an agent will already know the language without a paste, a skill, or a prompt prefix. The three Loading ilo posts record where the language sits today.

The open question is whether harness-layer ilo measurably reduces tokens-per-task on a real coding session.