Shorter variable names should mean fewer tokens. ord is shorter than order, dc is shorter than discount. Fewer characters, fewer tokens, lower cost.
ilo had an entire naming convention built around this. Documented rules for truncation, vowel-dropping, initial extraction.
Counting actual tokens with the tokenizer produced different numbers.
The surprise
Here are some common words and their token counts in cl100k_base (the tokenizer used by GPT-4; Claude uses its own tokenizer, but common English words are single tokens there too):
order → 1 token
ord → 1 token
discount → 1 token
dc → 1 token
customer → 1 token
cs → 1 token
amount → 1 token
amt → 1 token
payment → 1 token
pmt → 1 token
Every common English word is already a single token, so abbreviating it saves nothing. The tokenizer was trained on English text and these words are in its vocabulary.
It gets worse. An unusual abbreviation can split into more tokens than the word it replaces, and even with longer words the savings are unreliable:
environment → 1 token
env → 1 token
verification → 2 tokens (verific + ation)
vrf → 1 token (saves here, but rare case)
Most of the time, abbreviation is a net zero. Occasionally it helps with longer compound words. But the common case, the names you use most, shows no difference.
The data
This is why idea8-ultra-dense (285 tokens, full-length names) and idea9-ultra-dense-short (287 tokens, the short-naming convention applied) have nearly identical token counts despite idea9 being 114 characters shorter. The token count barely moved; only the character count changed:
idea8: 285 tokens, 901 chars
idea9: 287 tokens, 787 chars (2 more tokens, 114 fewer chars)
The naming convention saved characters but cost two tokens, probably because an unusual abbreviation occasionally splits into more pieces than the word it replaces.
What saves tokens
The big wins are structural, not cosmetic:
Positional arguments. reserve(items:items) → reserve items eliminates the parentheses, the colon, and the repeated parameter name. This is the single biggest token saver across all nine syntax variants.
Implicit last-result matching. x=call(arg);match x{err e:handle(e)} → call arg;?{!e:handle e}. No intermediate variable binding needed when you immediately match.
Dropping keywords. let x = expr → x=expr. The let keyword costs a token every time and adds zero information when the syntax is unambiguous.
These structural changes moved the needle from 1.06x Python (idea1) to 0.33x Python (idea8). The naming convention moved it from 0.33x to… 0.33x.
Why this happens
Tokenizers are trained on billions of words of text. Common English words, the ones you’d naturally use as variable names, are already encoded as a single token, so abbreviating them rarely helps and sometimes splits the abbreviation into more tokens than the original.
The real savings come from reducing the structure of your code: fewer delimiters, fewer keywords, fewer redundant names. That’s where the 3x token reduction lives. Not in replacing customer with cs.
I still use the short-naming convention in ilo because it saves characters (which affects context window usage in a different way), but the token-count numbers above show it doesn’t save tokens.