How we measure
Savings are a per-session record. Signed. Auditable. Reproducible on your machine.
1. Unit of measurement
Every prompt routed through tetris is one compression session.
Each produces a CompressionTrace:
- tokens_before: what the assistant was about to send.
- tokens_after: what we sent.
- cost_before_cents: tokens_before × model_input_cost.
- cost_after_cents: tokens_after × model_input_cost.
- latency_ms: wall-clock time tetris took.
- decisions[]: every node kept, dropped, or merged, with reasons.
Session savings = cost_before_cents − cost_after_cents. Lifetime savings = the sum over all sessions.
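As a rough illustration, the record and the savings arithmetic can be sketched in Python. The field names come from the list above. The class layout, the helper functions, and the example price ($3 per million input tokens, i.e. 0.0003 cents per token) are assumptions made for this sketch, not tetris's actual implementation or any provider's real rate.

```python
from dataclasses import dataclass, field


@dataclass
class CompressionTrace:
    """Illustrative shape of one session record (layout is an assumption)."""
    tokens_before: int
    tokens_after: int
    cost_before_cents: float
    cost_after_cents: float
    latency_ms: float
    decisions: list = field(default_factory=list)


def make_trace(tokens_before, tokens_after, cents_per_token, latency_ms=0.0):
    """Hypothetical helper: price both counts at the same input rate."""
    return CompressionTrace(
        tokens_before=tokens_before,
        tokens_after=tokens_after,
        cost_before_cents=tokens_before * cents_per_token,
        cost_after_cents=tokens_after * cents_per_token,
        latency_ms=latency_ms,
    )


def session_savings_cents(trace):
    """Savings for one session: cost before minus cost after."""
    return trace.cost_before_cents - trace.cost_after_cents


def lifetime_savings_cents(traces):
    """Lifetime savings: the sum of per-session savings."""
    return sum(session_savings_cents(t) for t in traces)


# Made-up numbers: 120k tokens compressed to 40k at 0.0003 cents per token.
example = make_trace(120_000, 40_000, cents_per_token=0.0003)
print(round(session_savings_cents(example), 2))
```

Both costs use the same per-token rate, so the savings depend only on the token difference and the price.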
2. Tokenized by the destination model
The tokenizer is the one the target model uses. Not a proxy.
| Model | Tokenizer |
|---|---|
| Claude Sonnet 4.5, Opus 4 | Anthropic claude |
| GPT-5, GPT-4o | OpenAI o200k_base |
| Gemini Pro 2.5 | Google gemini |
| Other | cl100k_base fallback (flagged in trace) |
Post-compression token counts use the same tokenizer. A second count against
cl100k_base is also written to the trace so you can detect drift between releases.
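The lookup and the second count described above might look like the sketch below. Everything here is hypothetical: the model ID strings, the abbreviated registry, the dictionary keys in the result, and above all the tokenizers, which are toy stand-ins (whitespace splitting and 4-character chunks) so the example runs without any vendor library.

```python
FALLBACK = "cl100k_base"

# Hypothetical, abbreviated registry: model ID -> tokenizer label.
TOKENIZER_BY_MODEL = {
    "claude-sonnet-4.5": "anthropic-claude",
    "gemini-2.5-pro": "google-gemini",
}

# Toy stand-ins for real tokenizers; each maps text to a list of tokens.
TOKENIZERS = {
    "anthropic-claude": lambda text: text.split(),
    "google-gemini": lambda text: text.split(),
    "cl100k_base": lambda text: [text[i:i + 4] for i in range(0, len(text), 4)],
}


def pick_tokenizer(model, registry=TOKENIZER_BY_MODEL):
    """Return (tokenizer_label, used_fallback) for a model ID."""
    label = registry.get(model)
    if label is None:
        return FALLBACK, True  # unknown model: fall back and flag it
    return label, False


def count_tokens(text, model, tokenizers=TOKENIZERS):
    """Count with the model's tokenizer and, separately, with the reference one."""
    label, used_fallback = pick_tokenizer(model)
    return {
        "tokenizer": label,
        "fallback": used_fallback,
        "tokens": len(tokenizers[label](text)),
        "tokens_cl100k_base": len(tokenizers[FALLBACK](text)),
    }


print(count_tokens("keep the failing test", "claude-sonnet-4.5"))
print(count_tokens("keep the failing test", "some-unlisted-model"))
```

Keeping the reference count next to the native one means a change in either tokenizer shows up as a shift in their ratio for the same text.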
3. Prices from the provider, not us
Input prices are pinned per release in pricing.json, taken from each provider's
public pricing page. Historical sessions are priced at the rate in effect when they ran.
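Pricing a historical session at the rate in effect on its date comes down to a lookup in a dated price history. The sketch below assumes such a history is available as a sorted list. The layout of pricing.json is not specified here, so this structure, the function names, and the prices are invented for illustration.

```python
import bisect
from datetime import date

# Hypothetical history for one model:
# (effective_from, cents per million input tokens). Values are made up.
PRICE_HISTORY = [
    (date(2025, 1, 1), 300),
    (date(2025, 9, 1), 250),
]


def rate_on(day, history=PRICE_HISTORY):
    """Return the rate in effect on `day` (latest entry starting on or before it)."""
    starts = [start for start, _ in history]
    i = bisect.bisect_right(starts, day) - 1
    if i < 0:
        raise ValueError(f"no price on record for {day}")
    return history[i][1]


def cost_cents(tokens, day, history=PRICE_HISTORY):
    """Price `tokens` input tokens at the rate that applied on `day`."""
    return tokens * rate_on(day, history) / 1_000_000


print(cost_cents(2_000_000, date(2025, 6, 15)))  # priced at the older rate
print(cost_cents(2_000_000, date(2025, 9, 1)))   # priced at the newer rate
```

A later price change never rewrites the cost of an earlier session, because the lookup is keyed on the session date rather than on the current release.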
4. The trace is the evidence
Every session is signed by your binary and appended to savings.log
under ~/.tetris/. Dump or verify it:
tetris savings --since 2026-01-01 --format json
tetris savings verify # replay traces + re-check signatures
verify recomputes tokens_after from the signed trace and
compares it with the recorded value. Any discrepancy prints the session ID. It has never found one.
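To make the two checks concrete, here is a minimal sketch of a verify pass over one log entry. The signature scheme and trace encoding are not described here, so the sketch makes loud assumptions: an HMAC-SHA256 over canonical JSON stands in for the real signature, each decision carries hypothetical `action` and `tokens` fields, and the replayed count is the sum of tokens over nodes that were not dropped.

```python
import hashlib
import hmac
import json


def sign(trace, key):
    """Assumed scheme: HMAC-SHA256 over a canonical JSON encoding of the trace."""
    payload = json.dumps(trace, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(entry, key):
    """Return a list of problems for one log entry; an empty list means it checks out."""
    problems = []
    trace = entry["trace"]
    if not hmac.compare_digest(sign(trace, key), entry["signature"]):
        problems.append("bad signature")
    # Replay: tokens that survive are those on nodes that were not dropped.
    replayed = sum(d["tokens"] for d in trace["decisions"] if d["action"] != "dropped")
    if replayed != trace["tokens_after"]:
        problems.append("tokens_after mismatch")
    return problems


key = b"hypothetical-local-key"
trace = {
    "session_id": "example-1",
    "tokens_before": 900,
    "tokens_after": 500,
    "decisions": [
        {"node": "a", "action": "kept", "tokens": 300},
        {"node": "b", "action": "dropped", "tokens": 400},
        {"node": "c", "action": "merged", "tokens": 200},
    ],
}
entry = {"trace": trace, "signature": sign(trace, key)}
print(verify(entry, key))
```

The signature check catches edits to the log, and the replay catches a trace whose recorded total does not follow from its own decisions.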
5. What counts, what doesn't
- Sessions that grew tokens are recorded, counted as 0.
- Sessions whose pass@1 eval failed (bench runs) count as 0.
- Cached prompt tokens priced at cached rates after first call.
We do not inflate by charging full rate for cached tokens or assuming any kept token would have been sent twice.
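The counting rules above can be expressed as two small functions. This is a sketch under assumptions: the function names and parameters are hypothetical, rates are given in cents per million tokens, and the example prices are invented.

```python
def cost_cents(tokens, cached_tokens, rate, cached_rate):
    """Price uncached tokens at `rate` and cached ones at `cached_rate`.

    Both rates are in cents per million tokens (an assumed unit).
    """
    uncached = tokens - cached_tokens
    return (uncached * rate + cached_tokens * cached_rate) / 1_000_000


def counted_savings_cents(cost_before, cost_after, eval_passed=True):
    """Apply the counting rules: failed evals and grown sessions count as 0."""
    if not eval_passed:
        return 0.0
    return max(0.0, cost_before - cost_after)


# Made-up example: 100k tokens, 80k of them cached, compressed to 60k (50k cached).
before = cost_cents(100_000, 80_000, rate=300, cached_rate=30)
after = cost_cents(60_000, 50_000, rate=300, cached_rate=30)
print(counted_savings_cents(before, after))
print(counted_savings_cents(after, before))          # session grew: counts as 0
print(counted_savings_cents(before, after, False))   # eval failed: counts as 0
```

Pricing cached tokens at the cached rate on the "before" side is what keeps the baseline from being overstated.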
6. The honest bit
If your session fits in 32k tokens, we save you nothing.
Real savings come from large contexts — long refactors, repo-level reviews,
monorepo PRs. Trivial chats round to zero and the TUI shows —.
7. Front-page benchmark
The Pareto chart is generated from the last successful CI run of our internal bench harness on SWE-bench Verified.
tetris bench --suite swe-bench-verified --ratio 8
tetris bench --suite swe-bench-verified --ratio 8 --compressor llmlingua-2
Not a hand-drawn artist's impression. CSV + SVG land in out/.
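For readers who want to recompute a Pareto front from the CSV themselves, a minimal sketch follows. The CSV columns are not documented here, so the sketch assumes each row reduces to a pair (compression ratio, pass@1) where higher is better on both axes. The numbers are invented placeholders, not benchmark results.

```python
def pareto_front(points):
    """Keep the points that no other point dominates (higher is better on both axes)."""
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)


# Placeholder (ratio, pass@1) pairs, made up for illustration only.
runs = [(2, 0.50), (4, 0.48), (8, 0.45), (8, 0.40), (4, 0.44)]
print(pareto_front(runs))
```

A point is dropped only when some other run is at least as good on both axes and strictly better on one, which is the usual definition of dominance.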
8. Dispute?
Email savings@tetris.codes with a session ID
from tetris savings list. We confirm, find the bug, or refund the Pro month.
All three have happened.