The programming languages I’ve worked in were designed for people. The syntax, the error messages, the indentation, all of it optimised for a brain that reads left-to-right and cares about whether code “looks nice.”
AI agents don’t care about any of that. They produce tokens sequentially and consume them from a finite context window, and every token spent on generating, reading, or retrying costs real time and real money.
ilo is the language I’ve been building to test what happens when the human-centric assumptions are dropped.
The metric that changes everything
I settled on a single metric: total tokens from intent to working code.
Total cost = spec loading + generation + context loading + error feedback + retries
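The formula above is just a sum over the stages of the loop. As a minimal sketch (the stage names come from the formula; the numbers and the function itself are my illustration, not ilo's actual harness):

```python
# Illustrative sketch of the metric. Token counts here are hypothetical.

def total_tokens(spec_loading, generation, context_loading,
                 error_feedback, retries):
    """Sum token spend across every stage of the intent-to-working-code
    loop. `retries` is the tokens burned on repeated attempts."""
    return (spec_loading + generation + context_loading
            + error_feedback + retries)

# A design change is accepted only if it lowers this number.
cost = total_tokens(spec_loading=800, generation=350,
                    context_loading=1200, error_feedback=0, retries=0)
print(cost)  # 2350
```

Note that a feature can win by shrinking any term: a denser spec cuts spec loading, a stricter grammar cuts error feedback and retries.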
That’s it. Every design decision gets evaluated against this number. If a feature reduces it, it’s in. If it increases it, no matter how “elegant” or “readable”, it’s out.
This is an unusual way to design a language. You end up making decisions that would get pushback in a PL design forum.
The name
I called it ilo, which is Toki Pona for “tool.” Toki Pona is a constructed language built around radical minimalism. About 120 words, 14 phonemes. Complex ideas expressed by combining simple terms. It constrains human expression to force clarity of thought.
ilo does the same for machine programmers.
What makes it different
A few principles drive the design:
Token-conservative. Not just "short syntax" but short end-to-end. A terse syntax that causes more retries is worse than a verbose one that works first time.
Constrained. Small vocabulary, closed world, one way to do things. When an agent generates the next token, fewer valid options means fewer wrong choices means fewer retries. The language becomes a set of rails.
Self-contained. Each unit carries its own context: dependencies, types, rules. An agent working on function A shouldn’t need to load functions B through Z.
Language-agnostic. All English keywords are replaced with single-character sigils: ? for match, ! for error, ~ for ok, @ for iteration. Agents learned the sigil set from a short spec with 10/10 accuracy. Structural tokens outperform English keywords because they can't be confused with variable names.
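The four sigils named above form a closed lookup table. A minimal sketch (the `gloss` helper is my illustration; only the four sigil-to-keyword pairs come from the text):

```python
# The four sigils from the spec; everything else here is illustrative.
SIGILS = {
    "?": "match",
    "!": "error",
    "~": "ok",
    "@": "iteration",
}

def gloss(sigil: str) -> str:
    """Translate a structural sigil back to its English keyword.
    Raises KeyError for anything outside the closed vocabulary --
    a closed world rejects unknown tokens rather than guessing."""
    return SIGILS[sigil]

# None of the sigils is a legal identifier character, which is the point:
# they can never collide with a variable name.
print(all(not s.isidentifier() for s in SIGILS))  # True
```

The collision-freedom check at the end is what "structural tokens outperform English keywords" cashes out to: a keyword like `match` can also be a variable name; `?` never can.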
Where the experiments stand
Too early to draw a firm conclusion, but the experiments so far have been informative. The winning syntax variant uses 0.33x the tokens of Python while maintaining perfect generation accuracy across four diverse task types.
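The post doesn't show the measurement harness, but the headline number reduces to a ratio of summed per-task token counts. A sketch with made-up counts chosen to land near the reported figure (the task names and every number below are my assumptions):

```python
# Hypothetical per-task token counts; the real measurements aren't
# published here. Chosen only to illustrate how the ratio is computed.
python_tokens = {"parse": 310, "transform": 420, "io": 280, "errors": 350}
ilo_tokens    = {"parse": 102, "transform": 139, "io":  92, "errors": 116}

ratio = sum(ilo_tokens.values()) / sum(python_tokens.values())
print(f"{ratio:.2f}x")  # 0.33x on these made-up counts
```

The important caveat from the design principles applies: a ratio like this only matters if generation accuracy holds, since tokens saved on syntax are wiped out by a single retry loop.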
Whether that translates into a real-world tool or remains a research curiosity is the question I’m trying to answer.