ilo: A Programming Language for AI Agents, Not Humans

Cranelift vs LLVM for a Hobby JIT: Why I Chose Cranelift

After writing a hand-rolled ARM64 JIT for ilo, the next question was obvious: what about a real compiler framework? The hand-rolled backend is fast and small, but it’s aarch64-only and will get painful as the instruction set grows.

I built the same JIT backend twice - once with Cranelift, once with LLVM via inkwell. Both take the same input (ilo’s register VM bytecode), both emit the same output (native machine code for the host platform). The code is structurally identical. The experience of building and shipping them was not.

The task

Both backends do the same thing: walk the bytecode, emit one IR instruction per opcode, let the framework handle register allocation and instruction selection.

For tot(10, 20, 30) - two multiplies and an add - the translation is mechanical:

VM bytecode:       MUL_NN R3,R0,R1 | MUL_NN R4,R3,R2 | ADD_NN R5,R3,R4 | RET R5
Cranelift IR:      fmul v3,v0,v1   | fmul v4,v3,v2   | fadd v5,v3,v4   | return v5
LLVM IR:           fmul %3,%0,%1   | fmul %4,%3,%2   | fadd %5,%3,%4   | ret %5
Native (ARM64):    fmul d3,d0,d1   | fmul d4,d3,d2   | fadd d0,d3,d4   | ret

Both frameworks produce the same 4 native instructions and run tot in 2ns.
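The walk itself can be sketched in plain Rust. This is a sketch, not ilo's actual code: the `Op` enum and `translate` function are hypothetical stand-ins, and the "emitted IR" here is just text where a real backend would make Cranelift or LLVM builder calls.

```rust
// Hypothetical bytecode representation; field `a` is the destination
// register, `b` and `c` are the operands.
#[derive(Clone, Copy)]
enum Op {
    MulNn { a: usize, b: usize, c: usize }, // R[a] = R[b] * R[c]
    AddNn { a: usize, b: usize, c: usize }, // R[a] = R[b] + R[c]
    Ret { a: usize },                       // return R[a]
}

// Walk the bytecode, one IR instruction per opcode. A real backend
// would call builder.ins().fmul(..) / build_float_mul(..) here.
fn translate(code: &[Op]) -> Vec<String> {
    code.iter()
        .map(|op| match *op {
            Op::MulNn { a, b, c } => format!("fmul v{a},v{b},v{c}"),
            Op::AddNn { a, b, c } => format!("fadd v{a},v{b},v{c}"),
            Op::Ret { a } => format!("return v{a}"),
        })
        .collect()
}
```

Feeding it the tot bytecode above reproduces the Cranelift IR row of the table.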

Cranelift: 248 lines

Cranelift is written in pure Rust, built by the Wasmtime team. The API maps cleanly to what a bytecode compiler needs: declare variables, emit instructions, let the framework figure out the rest.

// Cranelift: one opcode → one IR instruction
OP_MUL_NN => {
    let bv = builder.use_var(vars[b]);
    let cv = builder.use_var(vars[c]);
    let result = builder.ins().fmul(bv, cv);
    builder.def_var(vars[a], result);
}

Setup is a JITBuilder with a signature describing the function’s parameter and return types. Variables are declared upfront, mapped 1:1 from VM registers. The framework handles SSA construction, register allocation, and instruction selection.

let mut flag_builder = settings::builder();
flag_builder.set("opt_level", "speed").ok()?;
let isa = cranelift_native::builder().ok()?
    .finish(settings::Flags::new(flag_builder)).ok()?;
let mut module = JITModule::new(JITBuilder::with_isa(isa, default_libcall_names()));

Building is cargo build --features cranelift. No system dependencies. It compiles on any platform Cranelift supports (ARM64, x86_64). Total backend size: 248 lines.

LLVM: 208 lines

LLVM via inkwell follows the same pattern. The translation loop is nearly identical:

// LLVM: one opcode → one IR instruction
OP_MUL_NN => {
    let result = builder.build_float_mul(regs[b], regs[c], "mul").ok()?;
    regs[a] = result;
}

The line count is shorter because LLVM’s builder is slightly more compact - you pass values directly rather than declaring variables. But the real difference is everything around the code.

Building requires LLVM 18 installed on the system. On macOS that’s brew install llvm@18 plus setting LLVM_SYS_180_PREFIX. On CI it means adding LLVM to every build environment. The inkwell crate pins to a specific LLVM major version, so upgrading LLVM means upgrading inkwell means checking for API breakage.

// LLVM memory management: deliberately leak the execution engine
let engine = module.create_jit_execution_engine(OptimizationLevel::Aggressive).ok()?;
let func_ptr = engine.get_function_address("jit_func").ok()? as *const u8;
std::mem::forget(engine);  // keep the code alive

The execution engine owns the compiled code. If it gets dropped, the function pointer becomes invalid. For a one-shot JIT there’s no clean way around this - you leak the engine and keep the context alive. It works.
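The keep-alive-by-leaking pattern can be demonstrated with std alone. In this sketch a heap buffer (holding the arm64 `ret` encoding, 0xD65F03C0 little-endian) stands in for the engine's code memory; the function name is hypothetical.

```rust
use std::mem;

// Stand-in for the execution engine: a heap buffer that owns "the code".
// Dropping it would free the memory behind `ptr`; forgetting it leaks
// the allocation, so the raw pointer stays valid for the life of the
// process - exactly the trade the LLVM backend makes.
fn leak_and_keep_pointer() -> *const u8 {
    let code: Box<[u8]> = vec![0xC0, 0x03, 0x5F, 0xD6].into_boxed_slice();
    let ptr = code.as_ptr();
    mem::forget(code); // deliberate leak: the bytes are never freed
    ptr
}
```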

The comparison

For ilo’s current instruction set (pure-numeric arithmetic), both produce identical output. The tot benchmark runs at 2ns with either backend.

Custom JIT (arm64)    2ns    (351 lines, aarch64 only)
Cranelift JIT         2ns    (248 lines, cross-platform)
LLVM JIT              2ns    (208 lines, requires LLVM 18)

Where LLVM’s optimiser would pull ahead is with more complex code - loops, branches, function inlining, vectorisation. For straight-line floating-point arithmetic, there’s nothing to optimise. Both frameworks emit the obvious instructions.

Why Cranelift won

Zero system dependencies. cargo build --features cranelift works on a fresh checkout. No brew install, no LLVM_SYS environment variables, no CI matrix entries per platform. This matters more than anything else for a project with one contributor.

Clean memory model. Cranelift’s JITModule owns the compiled code. Keep the module alive, the function pointer is valid. Drop the module, the code is freed. No leaking, no forget, no dangling pointer anxiety.
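The contrast can be mirrored in plain Rust. This toy `FakeModule` (hypothetical, not Cranelift's actual API) ties the returned code to a borrow of the owner, so the borrow checker enforces "keep the module alive, the pointer is valid" at compile time:

```rust
// Toy stand-in for JITModule: the struct owns the code buffer.
struct FakeModule {
    code: Vec<u8>,
}

impl FakeModule {
    fn new() -> Self {
        // arm64 `ret`, little-endian, as placeholder "compiled code".
        FakeModule { code: vec![0xC0, 0x03, 0x5F, 0xD6] }
    }

    // The returned slice borrows from the module, so dropping the
    // module while the code is still in use is a compile error - not
    // a dangling pointer at runtime.
    fn finalized_code(&self) -> &[u8] {
        &self.code
    }
}
```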

Fast compilation. Cranelift is designed for JIT workloads - compilation speed matters. LLVM’s -O2 pipeline is heavier and slower to compile, which doesn’t matter for tot (4 instructions) but would matter if ilo starts JIT-compiling functions at startup.

Cross-platform for free. The same code runs on ARM64 and x86_64. The hand-rolled JIT is locked to Apple Silicon. LLVM is also cross-platform, but with the system dependency cost.

Where LLVM would win. If ilo’s JIT grew to handle loops with induction variables, or needed auto-vectorisation, or had to optimise across function boundaries - LLVM’s optimiser would justify the dependency. For straight-line arithmetic, it’s overhead with no payoff.

The LLVM backend stays behind a feature flag. It’s useful for validation (does Cranelift produce the same result as LLVM?) and as a benchmark ceiling. The default path is Cranelift.
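That validation use is ordinary differential testing. A minimal sketch, with the two compiled entry points stubbed as plain Rust functions and closures (all names here are hypothetical - in ilo they would be the Cranelift- and LLVM-compiled function pointers for the same bytecode):

```rust
// Reference semantics of tot's bytecode: R3 = a*b, R4 = R3*c, R5 = R3+R4.
fn tot_reference(a: f64, b: f64, c: f64) -> f64 {
    let r3 = a * b;
    let r4 = r3 * c;
    r3 + r4
}

// Run both backends on the same inputs and demand bit-identical results.
fn differential_check<F, G>(lhs: F, rhs: G, cases: &[(f64, f64, f64)]) -> bool
where
    F: Fn(f64, f64, f64) -> f64,
    G: Fn(f64, f64, f64) -> f64,
{
    cases.iter().all(|&(a, b, c)| lhs(a, b, c) == rhs(a, b, c))
}
```

Exact `==` on floats is deliberate here: both backends perform the same floating-point operations in the same order, so any difference is a codegen bug, not rounding.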

The hand-rolled JIT still exists

For the current instruction set, the hand-rolled ARM64 backend is the simplest option on Apple Silicon: zero dependencies, zero framework overhead, 351 lines. It’s the default when running on aarch64 without the cranelift feature flag.

As the instruction set grows - branching, loops, function calls - the hand-rolled approach will hit a wall. Cranelift’s SSA model and block-based IR are designed for exactly that. The plan is to keep the hand-rolled path for pure-numeric cases and switch to Cranelift once branching and loops are in the instruction set.
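That dispatch plan could be expressed with `cfg!` checks. A sketch with hypothetical names - the interpreter fallback on non-aarch64 builds without the feature flag is my assumption, not something the post states:

```rust
// Hypothetical backend selection matching the plan: Cranelift when the
// feature flag is on, the hand-rolled JIT on aarch64 otherwise, and an
// assumed interpreter fallback everywhere else.
fn pick_backend() -> &'static str {
    if cfg!(feature = "cranelift") {
        "cranelift"
    } else if cfg!(target_arch = "aarch64") {
        "hand-rolled-arm64"
    } else {
        "interpreter" // assumed fallback when no JIT backend applies
    }
}
```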