Ongoing

ilo: A Programming Language for AI Agents, Not Humans

Creator · 2026 · 4 min read

A performant language designed for LLMs to write. Optimised for token and character length, and rethought for non-human use cases.

Overview

ilo is a language that AI agents write code in. Not a framework for building agents. A compile target - the language an LLM outputs when it needs to express a program as cheaply and correctly as possible. The only metric is total tokens from intent to working code: spec loading + generation + context loading + error feedback + retries.

Problem

When an LLM writes Python, verbose syntax wastes tokens, ambiguous grammar causes retries, and human-readable formatting burns context window. Every token spent generating, reading, or retrying costs real time and money. Agents need a language optimised for them to write in, not for humans to read.

Constraints

  • Must be learnable by an LLM from a short spec alone (until foundation models train on the language natively)
  • Must achieve lower token cost than Python for equivalent programs
  • Must maintain high generation accuracy (10/10) across diverse task types

Approach

I built nine syntax variants and benchmarked them all with Claude Haiku, measuring token count, character count, and cold-LLM generation accuracy - can Haiku write correct ilo programs from just a spec? I let the data pick the winner, then built a full VM in Rust. There are now four execution backends: a tree-walking interpreter, a register VM, a hand-rolled ARM64 JIT, and a Cranelift JIT (the default, with interpreter fallback).

Key Decisions

Prefix notation instead of infix

(a * b) + c  →  +*a b c

Eliminates parentheses at every nesting level. Across 25 expression patterns: 22% fewer tokens, 42% fewer characters vs infix.
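The saving compounds with depth. Extrapolating the same pattern one level deeper (illustrative; not taken verbatim from the spec):

((a * b) + c) * d  →  *+*a b c d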

Positional arguments instead of named parameters

tot p:n q:n r:n>n;s=*p q;t=*s r;+s t

Eliminates parens, colons, and repeated parameter names. This was the single largest token reduction across all variants. I expected parameter-swap errors, but they never materialised: 10/10 accuracy across all task types.
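Unpacked, my reading of the example above (the :n annotations and the >n return marker are inferred from the spec excerpt):

tot p:n q:n r:n>n  →  define tot: number parameters p, q, r; number result
s=*p q             →  s = p * q
t=*s r             →  t = s * r
+s t               →  return s + t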

Single-character sigils instead of English keywords

?<x 0 !neg ~pos    (? conditional, ! effect, ~ transform)

Sigils can't be confused with variable names or hallucinated into natural-language variations.

Static verifier before execution

verify: undefined variable 'y' in 'f'
  hint: did you mean 'x'?

All calls resolve, all types align, all dependencies exist - everything is checked before anything runs. The verifier reports all errors at once with did-you-mean hints, catching malformed programs before execution and cutting retry cycles.
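The hint half of this is cheap to build. A minimal Rust sketch of a did-you-mean pass (illustrative only - the names and distance threshold here are hypothetical, and the real verifier also resolves calls, types, and dependencies):

// Hypothetical sketch: suggest the closest in-scope name for an
// undefined variable, using Levenshtein edit distance.
fn edit_distance(a: &str, b: &str) -> usize {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut cur = vec![i];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            cur.push((prev[j] + 1).min(cur[j - 1] + 1).min(prev[j - 1] + cost));
        }
        prev = cur;
    }
    prev[b.len()]
}

fn did_you_mean<'a>(undefined: &str, in_scope: &[&'a str]) -> Option<&'a str> {
    in_scope.iter().copied()
        .map(|name| (edit_distance(undefined, name), name))
        .filter(|&(d, _)| d <= 2)              // only suggest near misses
        .min_by_key(|&(d, _)| d)
        .map(|(_, name)| name)
}

fn main() {
    // Mirrors the error above: 'y' is undefined, 'x' is in scope.
    assert_eq!(did_you_mean("y", &["x", "total"]), Some("x"));
}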

NaN boxing for value representation

Every value fits in 8 bytes. Numbers are zero-cost (just raw double bits). The stack becomes Vec<u64> with contiguous memory and no heap chasing.
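A minimal sketch of the idea in Rust (illustrative - this is the textbook NaN-boxing layout, not necessarily ilo's exact encoding). Doubles are stored as their raw bits; everything else is tagged into the quiet-NaN bit space, which real numbers never occupy:

const QNAN: u64 = 0x7ffc_0000_0000_0000;
const TAG_NIL: u64 = 1;

#[derive(Clone, Copy)]
struct Value(u64);

impl Value {
    fn number(n: f64) -> Value { Value(n.to_bits()) }    // zero-cost: raw bits
    fn nil() -> Value { Value(QNAN | TAG_NIL) }
    fn is_number(self) -> bool { self.0 & QNAN != QNAN } // outside the NaN space
    fn as_number(self) -> f64 { f64::from_bits(self.0) }
}

fn main() {
    // A stack of Values is a flat Vec of u64s: contiguous, no heap chasing.
    let stack: Vec<Value> = vec![Value::number(3.5), Value::nil()];
    assert!(stack[0].is_number() && !stack[1].is_number());
    assert_eq!(stack[0].as_number(), 3.5);
}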

Register-based VM instead of stack-based

Switching from a stack-based to a register-based design reduced instruction count by 67% and improved performance by 31%. Fewer dispatches matter more than simpler instructions.
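To make the dispatch arithmetic concrete, a toy Rust comparison (hypothetical instruction set, not ilo's actual bytecode). Evaluating +*a b c costs a stack machine five dispatches (push a, push b, mul, push c, add) but a register machine two, because instructions name their operands instead of staging them:

enum Op {
    Mul { dst: usize, a: usize, b: usize },
    Add { dst: usize, a: usize, b: usize },
}

fn run(code: &[Op], regs: &mut [f64]) {
    for op in code {
        match *op {                        // one dispatch per instruction
            Op::Mul { dst, a, b } => regs[dst] = regs[a] * regs[b],
            Op::Add { dst, a, b } => regs[dst] = regs[a] + regs[b],
        }
    }
}

fn main() {
    // r0 = a, r1 = b, r2 = c; +*a b c in two dispatches
    let mut regs = [2.0, 3.0, 4.0, 0.0];
    run(&[
        Op::Mul { dst: 3, a: 0, b: 1 },    // r3 = a * b
        Op::Add { dst: 3, a: 3, b: 2 },    // r3 = r3 + c
    ], &mut regs);
    assert_eq!(regs[3], 10.0);
}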

Tech Stack

Rust · ilo · Language Design · AI Agents · JIT

Result & Impact

  • Token efficiency: 0.33x Python (287 tokens vs 871)
  • Character efficiency: 0.22x Python (787 chars vs 3635)
  • LLM generation accuracy: 10/10 across 4 task types
  • VM performance: 83ns/call register VM, 2ns/call JIT (tot benchmark)

An LLM given only the spec writes correct ilo programs with no prior training. Tested across workflow, data pipeline, decision, and API orchestration tasks. Caveat: the current instruction set is small (arithmetic, matching, basic control flow). These numbers reflect a simple benchmark. As the vocabulary and instruction set expand, performance characteristics will change.

Learnings

  • Positional arguments are the single biggest token saver. I expected parameter-swap errors, but they never appeared: 10/10 accuracy across all task types.
  • Prefix notation compounds savings at every nesting level. The deeper the expression, the more tokens saved.
  • Abbreviations don't save tokens: most tokenisers already encode common English words as a single token. What does cost tokens is hyphens - a hyphenated name is always at least two tokens because the hyphen forces a split.
  • Spec quality matters more than syntax cleverness. Better examples in the spec moved accuracy from 8/10 to 10/10.
  • Reporting all verification errors at once with hints is cheaper than letting agents discover them through execution retries.