Numbers a CFO would believe.
Every figure on this page is independently verifiable from your own machine. If our reported savings don’t match yours, we refund the month — no questions.
Same context, every frontier model.
We re-ran the SWE-bench Verified harness against five models with the same input shape. Tetris' compression layer is model-agnostic, so the gain carries over to whichever model you switch to.
| Model | Input rate | Tokens (before → after) | Compression | Pass@1 Δ | $/task saved |
|---|---|---|---|---|---|
| Claude Opus 4.7 | $15 / Mtok | 63,113 → 18,289 | 71.0% | +0.04 | $0.6724 |
| Claude Sonnet 4.6 | $3 / Mtok | 63,113 → 18,832 | 70.2% | +0.06 | $0.1328 |
| GPT-5.5 | $5 / Mtok | 63,113 → 19,107 | 69.7% | +0.03 | $0.2200 |
| GPT-5 | $1.25 / Mtok | 63,113 → 19,440 | 69.2% | +0.02 | $0.0546 |
| Claude Haiku 4.5 | $1 / Mtok | 63,113 → 19,718 | 68.8% | +0.01 | $0.0434 |
All five runs replay the same input session against each target model, which is why the uncompressed count is 63,113 tokens in every row. The dollar gain scales with the model's input price.
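The Compression and $/task saved columns follow directly from the token counts and input rates in the table. As a minimal sketch (the function names are ours, chosen for illustration, and are not part of the Tetris CLI), the two columns can be recomputed like this:

```python
# Recompute the Compression and $/task saved columns from the table's inputs.
# Rates are input prices in dollars per million tokens, copied from the table.
ROWS = [
    ("Claude Opus 4.7", 15.00, 63_113, 18_289),
    ("Claude Sonnet 4.6", 3.00, 63_113, 18_832),
    ("GPT-5.5", 5.00, 63_113, 19_107),
    ("GPT-5", 1.25, 63_113, 19_440),
    ("Claude Haiku 4.5", 1.00, 63_113, 19_718),
]


def compression_pct(before: int, after: int) -> float:
    """Share of input tokens removed, as a percentage."""
    return (1 - after / before) * 100


def saved_per_task(rate_per_mtok: float, before: int, after: int) -> float:
    """Dollars of input spend avoided on one task."""
    return (before - after) * rate_per_mtok / 1_000_000


for model, rate, before, after in ROWS:
    print(f"{model:<18} {compression_pct(before, after):5.1f}%  "
          f"${saved_per_task(rate, before, after):.4f}")
```

Because the formula is linear in the input rate, the same token reduction is worth 15 times more on a $15 / Mtok model than on a $1 / Mtok model.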
Same recipe, every codebase shape.
We picked four codebases that stress different strategies: a Python web app (Django), a monolithic C/CUDA file (llm.c), a Rust workspace (ripgrep), and a TypeScript framework with deep dependency graphs (Next.js).
| Codebase | Lang | Files in context | Tokens before → after | Compression | Dominant strategy |
|---|---|---|---|---|---|
| Django formsets fix | Python | 312 | 28,419 → 12,284 | 56.8% | Pattern dedup + smart code packing |
| llm.c grad clipping | C / CUDA | 14 | 46,229 → 11,216 | 75.7% | Smart code packing |
| ripgrep regex feature | Rust | 1,848 | 63,113 → 18,289 | 71.0% | Smart code packing + relevance pruning |
| Next.js middleware | TypeScript | 2,104 | 71,802 → 19,884 | 72.3% | Relevance-ranked import graph + dedup |
See /examples for full session traces and failure-mode analysis.
Strategy ledger.
Each row is one (strategy, ratio) combination scored on the SWE-bench Verified suite. We ship every approach. The pipeline picks the right one per file at runtime.
| Strategy | Ratio | Achieved | Δ pass@1 | $/task | p50 | p95 |
|---|---|---|---|---|---|---|
| rome_prune | 4 | 1.00 | +0.040 | $0.0075 | 537 ms | 1238 ms |
| dedupe | 4 | 1.01 | +0.038 | $0.0075 | 530 ms | 1244 ms |
| repo_graph_rank | 8 | 1.00 | +0.036 | $0.0075 | 575 ms | 1307 ms |
| ast_pack | 8 | 1.00 | +0.014 | $0.0075 | 586 ms | 1292 ms |
| truncate_head | 4 | 8.74 | −0.046 | $0.0065 | 470 ms | 965 ms |
| truncate_head | 8 | 12.02 | −0.060 | $0.0065 | 514 ms | 994 ms |
Rows with a negative Δ pass@1 (the truncate_head rows) are shipped intentionally as the floor strategy: a last-resort budget squeeze. The pipeline picks them only when no semantic strategy fits the budget.
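The fallback rule can be pictured with a small sketch. This is not the shipped selection logic: the `Candidate` type, the `pick` function, the per-file token estimates, and the "highest Δ pass@1 wins" tie-break are all assumptions made for illustration. Only the strategy names and Δ pass@1 values come from the ledger.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    strategy: str
    tokens_out: int     # hypothetical estimate of tokens emitted for this file
    delta_pass1: float  # Δ pass@1 taken from the ledger
    semantic: bool      # False only for the truncate_head floor strategy


def pick(candidates: list[Candidate], budget: int) -> Optional[Candidate]:
    """Prefer any semantic strategy that fits the budget; otherwise use the floor."""
    semantic_fit = [c for c in candidates if c.semantic and c.tokens_out <= budget]
    if semantic_fit:
        return max(semantic_fit, key=lambda c: c.delta_pass1)
    floor_fit = [c for c in candidates if not c.semantic and c.tokens_out <= budget]
    if floor_fit:
        return max(floor_fit, key=lambda c: c.delta_pass1)
    return None  # nothing fits, the caller has to raise the budget


# Token estimates below are invented for the example.
CANDIDATES = [
    Candidate("rome_prune", 4_200, +0.040, True),
    Candidate("dedupe", 4_600, +0.038, True),
    Candidate("ast_pack", 3_900, +0.014, True),
    Candidate("truncate_head", 1_200, -0.046, False),
]

print(pick(CANDIDATES, budget=5_000).strategy)  # a semantic strategy fits
print(pick(CANDIDATES, budget=1_500).strategy)  # only the floor fits
```

The point of the two-stage filter is that truncate_head never competes with a semantic strategy on equal terms; it is considered only after every semantic option has been ruled out by the budget.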
What this saves a real team.
Conservative numbers. We assumed each developer averages 80 agent-runs a day, 22 working days a month. No team uses 100% Opus — we modelled a 30/50/20 mix of Opus 4.7 / Sonnet 4.6 / Haiku 4.5.
| Team size | Agent-runs / mo | Tokens before | Tokens after | Cost before | Cost after | Saved / mo |
|---|---|---|---|---|---|---|
| 1 dev | 1,760 | 111 M | 32 M | $1,261 | $367 | $894 |
| 5 devs | 8,800 | 555 M | 161 M | $6,308 | $1,838 | $4,470 |
| 20 devs | 35,200 | 2.22 B | 644 M | $25,235 | $7,353 | $17,882 |
| 100 devs | 176,000 | 11.11 B | 3.22 B | $126,178 | $36,765 | $89,413 |
Math: tokens_in × (0.30·$15 + 0.50·$3 + 0.20·$1) / 1M.
63,113 input tokens / agent-run before, 18,289 after, applied at team scale.
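The arithmetic above can be sketched in a few lines. This is a minimal sketch that applies the one-line formula literally: it prices input tokens only, using the 30/50/20 mix, the 80 runs a day and 22 working days, and the per-run token counts stated on this page. The function names are ours, for illustration. It reproduces the agent-run and token columns and makes the blended input rate explicit ($6.20 per million tokens). It does not model anything beyond the formula, so its dollar output should not be read as a reproduction of the cost columns, which may include terms the formula does not show.

```python
RUNS_PER_DEV_PER_DAY = 80
WORKING_DAYS_PER_MONTH = 22
TOKENS_BEFORE = 63_113  # input tokens per agent-run, uncompressed
TOKENS_AFTER = 18_289   # input tokens per agent-run, compressed

# (share of runs, input rate in $ per million tokens)
MIX = {
    "Claude Opus 4.7": (0.30, 15.0),
    "Claude Sonnet 4.6": (0.50, 3.0),
    "Claude Haiku 4.5": (0.20, 1.0),
}


def blended_rate(mix: dict[str, tuple[float, float]]) -> float:
    """Weighted-average input price in $ per million tokens."""
    return sum(share * rate for share, rate in mix.values())


def monthly_runs(devs: int) -> int:
    return devs * RUNS_PER_DEV_PER_DAY * WORKING_DAYS_PER_MONTH


def monthly_tokens(devs: int, tokens_per_run: int) -> int:
    return monthly_runs(devs) * tokens_per_run


def input_cost(tokens: int, rate_per_mtok: float) -> float:
    """tokens_in x blended rate / 1M, exactly as the formula is written."""
    return tokens * rate_per_mtok / 1_000_000


rate = blended_rate(MIX)
print(f"blended input rate: ${rate:.2f} / Mtok")
for devs in (1, 5, 20, 100):
    before = monthly_tokens(devs, TOKENS_BEFORE)
    after = monthly_tokens(devs, TOKENS_AFTER)
    print(f"{devs:>3} devs  {monthly_runs(devs):>7,} runs  "
          f"{before / 1e6:>9,.0f} M -> {after / 1e6:>8,.0f} M tokens  "
          f"input-only cost ${input_cost(before, rate):,.0f} -> ${input_cost(after, rate):,.0f}")
```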
Reproduce every number on this page.
No black box. No "trust us." Clone the repo, run the harness, get the same signed results.
# 1. install tetris & the bench harness
curl -fsSL https://get.tetris.codes | sh
# 2. run every strategy × ratio combination
cargo run -p tetris-bench --release -- \
--dataset swe-bench-verified \
--strategies all \
--ratios 2,4,8
# 3. verify the signed savings log byte-for-byte
tetris savings verify --against bench/out/savings.tetrislog
The CI gate fails if pass@1 regresses by more than 2 percentage points
against the v0.0.36 baseline checked into bench/out/baseline/.
Pricing tables are signed with Ed25519. Changing a price requires a key rotation,
so reported savings can't drift silently.
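The regression gate described above boils down to one comparison. A minimal sketch, assuming pass@1 is stored as a fraction between 0 and 1 and using a hypothetical `gate_passes` helper (the real check lives in the bench harness and may differ):

```python
MAX_REGRESSION_PP = 2.0  # percentage points, per the CI rule above


def gate_passes(baseline_pass1: float, candidate_pass1: float,
                max_regression_pp: float = MAX_REGRESSION_PP) -> bool:
    """Return False when pass@1 drops by more than the allowed percentage points."""
    regression_pp = (baseline_pass1 - candidate_pass1) * 100
    # Small tolerance so a drop of exactly 2.0 pp is not failed by float rounding.
    return regression_pp <= max_regression_pp + 1e-9


# Illustrative values only, not measured results.
print(gate_passes(0.50, 0.49))  # 1 pp drop: within the limit
print(gate_passes(0.50, 0.47))  # 3 pp drop: gate fails
print(gate_passes(0.50, 0.55))  # improvement: always passes
```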