A zero-allocation Go decimal library with no big.Int fallback: 128-bit fixed-point arithmetic that benchmarks ~35% faster (benchstat geomean) than the fastest existing library I could find, with exact allocation counts and overflow correctness enforced by the test suite.
The fun part was making it fast: every division by a power of ten (rescaling, rounding, formatting) uses precomputed multiply-high reciprocals instead of hardware DIV. That means the Möller–Granlund 2-by-1 trick plus magic constants, all generated and re-proven against big.Int. On my M1 Pro, benchstat reports it ~35% faster (geomean) than udecimal, the fastest library I could find, and ~9x faster than shopspring.
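To give a feel for the reciprocal idea, here is a minimal sketch on plain uint64 values. It is not the library's code and it is not the Möller–Granlund 2-by-1 routine; it uses the simpler Granlund-Montgomery round-up method for invariant divisors, and the names `magic`, `newMagic` and `div` are made up for illustration. It generates a constant for each power of ten that fits in 64 bits and checks the result against ordinary division.

```go
package main

import (
	"fmt"
	"math/bits"
)

// magic holds a precomputed reciprocal for one divisor (hypothetical type,
// not taken from the library).
type magic struct {
	m     uint64 // low 64 bits of the 65-bit multiplier 2^64 + m
	shift uint   // post-shift, equal to ceil(log2(d)) - 1
}

// newMagic derives the constants for a divisor d >= 2.
func newMagic(d uint64) magic {
	l := uint(bits.Len64(d - 1)) // ceil(log2(d))
	// m = floor(2^64 * (2^l - d) / d) + 1.
	// When l == 64 the shift yields 0 in Go, so the subtraction wraps to
	// 2^64 - d, which is exactly the value we want.
	hi := (uint64(1) << l) - d
	q, _ := bits.Div64(hi, 0, d) // hi < d always holds, so this cannot panic
	return magic{m: q + 1, shift: l - 1}
}

// div computes n / d with one multiply-high, one add, and two shifts.
func (g magic) div(n uint64) uint64 {
	t, _ := bits.Mul64(g.m, n) // t = high 64 bits of m*n, and t <= n
	// (n + t) can overflow 64 bits, so halve the difference first.
	return (t + ((n - t) >> 1)) >> g.shift
}

func main() {
	pow10 := uint64(1)
	for i := 0; i < 19; i++ { // 10^1 through 10^19 all fit in a uint64
		pow10 *= 10
		g := newMagic(pow10)
		for _, n := range []uint64{0, 1, pow10 - 1, pow10, pow10 + 1, ^uint64(0)} {
			if got, want := g.div(n), n/pow10; got != want {
				panic(fmt.Sprintf("d=%d n=%d: got %d, want %d", pow10, n, got, want))
			}
		}
	}
	fmt.Println("all reciprocals agree with hardware division")
}
```

The spot checks in `main` only sample a few inputs per divisor. Proving the constants for every input, as the post describes doing against big.Int, is a separate and stronger step.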
How: Claude Code with Ultracode + Fable 5 for 12 hours.