How we measure

Savings are a per-session record. Signed. Auditable. Reproducible on your machine.

1. Unit of measurement

Every prompt routed through tetris is one compression session. Each produces a CompressionTrace:

tokens_before — what the assistant was about to send.
tokens_after — what we sent.
cost_before_cents — tokens_before × model_input_cost.
cost_after_cents — tokens_after × model_input_cost.
latency_ms — wall-clock time tetris took.
decisions[] — every node kept, dropped, or merged, with reasons.

Session savings = cost_before_cents − cost_after_cents. Lifetime savings = the sum over all sessions.
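The arithmetic above can be sketched in a few lines of Python. This is an illustrative model, not the actual trace schema: the class mirrors only the numeric fields listed, and `model_input_cost` is assumed to be expressed in cents per input token. The numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class CompressionTrace:
    """Illustrative subset of the trace fields listed above."""
    tokens_before: int
    tokens_after: int
    model_input_cost: float  # assumption: cents per input token

    @property
    def cost_before_cents(self) -> float:
        return self.tokens_before * self.model_input_cost

    @property
    def cost_after_cents(self) -> float:
        return self.tokens_after * self.model_input_cost

    @property
    def savings_cents(self) -> float:
        # Session savings = cost before - cost after
        return self.cost_before_cents - self.cost_after_cents


def lifetime_savings_cents(traces):
    """Lifetime savings are the sum of per-session savings."""
    return sum(t.savings_cents for t in traces)


# Made-up sessions, priced at 0.0003 cents per token (300 cents per million).
traces = [
    CompressionTrace(tokens_before=120_000, tokens_after=30_000, model_input_cost=0.0003),
    CompressionTrace(tokens_before=80_000, tokens_after=40_000, model_input_cost=0.0003),
]
print(round(lifetime_savings_cents(traces), 2))
```

Deriving the cost fields from the token counts, rather than storing them separately, keeps the two from disagreeing in this sketch; a real trace may store both.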

2. Tokenized by the destination model

The tokenizer is the one the target model uses. Not a proxy.

Model	Tokenizer
Claude Sonnet 4.5, Opus 4	Anthropic `claude`
GPT-5, GPT-4o	OpenAI `o200k_base`
Gemini 2.5 Pro	Google `gemini`
Other	`cl100k_base` fallback (flagged in trace)

Post-compression token counts use the same tokenizer as the pre-compression counts. A second count against cl100k_base is also written to the trace so you can detect tokenizer drift between releases.
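A minimal sketch of the selection-with-fallback rule and one way the second cl100k_base count could be used to spot drift. Everything here is hypothetical: the mapping is a partial hand copy of the table, the function names are invented for illustration, and the 2% tolerance is an arbitrary choice, not a tetris default.

```python
FALLBACK = "cl100k_base"

# Assumption: a partial, hand-written copy of the table above.
TOKENIZER_BY_MODEL = {
    "Claude Sonnet 4.5": "claude",
    "Opus 4": "claude",
}


def pick_tokenizer(model: str) -> tuple[str, bool]:
    """Return (tokenizer_name, used_fallback)."""
    name = TOKENIZER_BY_MODEL.get(model)
    if name is None:
        # Unknown model: fall back, and let the caller flag it in the trace.
        return FALLBACK, True
    return name, False


def native_to_reference_ratio(native_tokens: int, cl100k_tokens: int) -> float:
    """Ratio of the destination-model count to the cl100k_base count."""
    if cl100k_tokens <= 0:
        raise ValueError("reference count must be positive")
    return native_tokens / cl100k_tokens


def drifted(ratio_old: float, ratio_new: float, tolerance: float = 0.02) -> bool:
    """True when the ratio moved by more than `tolerance` (relative) between releases."""
    return abs(ratio_new - ratio_old) / ratio_old > tolerance
```

Because cl100k_base is a fixed reference, a shift in this ratio for the same text between two releases suggests that the destination tokenizer changed.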

3. Prices from the provider, not us

Input prices are pinned per release in pricing.json, taken from each provider's public pricing page. Historical sessions are priced at the rate in effect when they ran.
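Pricing a historical session at the rate in effect when it ran amounts to a lookup by date. The sketch below assumes a pricing.json layout that is not documented here: the `effective_from` and `input_cents_per_mtok` keys, the model name, and the prices are all invented for illustration.

```python
import json
from bisect import bisect_right
from datetime import date

# Hypothetical pricing.json content; the real schema may differ.
PRICING_JSON = """
{
  "example-model": [
    {"effective_from": "2025-01-01", "input_cents_per_mtok": 300},
    {"effective_from": "2025-09-01", "input_cents_per_mtok": 250}
  ]
}
"""


def price_at(pricing: dict, model: str, session_date: date) -> int:
    """Return the input price (cents per million tokens) in effect on session_date."""
    # ISO dates sort correctly as strings, so sort first, then parse.
    entries = sorted(pricing[model], key=lambda e: e["effective_from"])
    starts = [date.fromisoformat(e["effective_from"]) for e in entries]
    # Index of the last entry whose start date is on or before the session date.
    i = bisect_right(starts, session_date) - 1
    if i < 0:
        raise LookupError(f"no price for {model} on {session_date}")
    return entries[i]["input_cents_per_mtok"]


pricing = json.loads(PRICING_JSON)
```

With this layout, a later price change appends a new entry and never rewrites an old one, so re-pricing an old session always returns the same figure.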

4. The trace is the evidence

Every session is signed by your binary and appended to savings.log under ~/.tetris/. Dump or verify it:

tetris savings --since 2026-01-01 --format json
tetris savings verify   # replay traces + re-check signatures

verify re-computes tokens_after from the signed trace and compares it with the recorded value. Any discrepancy prints the session ID. It has never found one.
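To make the two checks concrete, here is a rough sketch of a verify-style pass over one record. It is not the tetris implementation: HMAC-SHA256 stands in for whatever signature scheme the binary actually uses, and the record layout (`trace`, `signature`, and a per-decision `tokens_out` field) is hypothetical.

```python
import hashlib
import hmac
import json


def canonical(trace: dict) -> bytes:
    """Serialize deterministically so the same trace always signs the same bytes."""
    return json.dumps(trace, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(trace: dict, key: bytes) -> str:
    # Assumption: HMAC-SHA256 is a stand-in for the real signature scheme.
    return hmac.new(key, canonical(trace), hashlib.sha256).hexdigest()


def verify_record(record: dict, key: bytes) -> list[str]:
    """Return a list of problems; an empty list means the record checks out."""
    trace = record["trace"]
    problems = []
    if not hmac.compare_digest(sign(trace, key), record["signature"]):
        problems.append("signature mismatch")
    # Hypothetical decision shape: an action plus the tokens it contributed to the output.
    recomputed = sum(d["tokens_out"] for d in trace["decisions"] if d["action"] != "dropped")
    if recomputed != trace["tokens_after"]:
        problems.append("tokens_after mismatch")
    return problems


key = b"demo-key"  # placeholder, not a real key
trace = {
    "session_id": "demo-1",
    "tokens_after": 150,
    "decisions": [
        {"action": "kept", "tokens_out": 100},
        {"action": "dropped", "tokens_out": 0},
        {"action": "merged", "tokens_out": 50},
    ],
}
record = {"trace": trace, "signature": sign(trace, key)}
print(verify_record(record, key))
```

The signature check catches edits made to the log after the fact; the replay check catches a trace whose recorded total disagrees with its own decisions.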

5. What counts, what doesn't

Sessions where the token count grew are recorded but count as 0 savings.
Sessions whose pass@1 eval failed (bench runs) count as 0.
Cached prompt tokens are priced at the cached rate after the first call.

We do not inflate savings by pricing cached tokens at the full rate or by assuming a kept token would have been sent twice.
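These counting rules can be written as two small functions. This is a sketch under assumptions: the function names, the split into fresh and cached tokens, and the per-token rates are illustrative, not taken from tetris.

```python
def input_cost_cents(fresh_tokens: int, cached_tokens: int,
                     rate: float, cached_rate: float) -> float:
    """Price cached tokens at the cached rate, everything else at the full rate."""
    return fresh_tokens * rate + cached_tokens * cached_rate


def counted_savings_cents(cost_before: float, cost_after: float,
                          eval_passed: bool = True) -> float:
    """Apply the counting rules: failed evals and growth both count as 0."""
    if not eval_passed:
        return 0.0  # bench run whose pass@1 eval failed
    # A session that grew has a negative difference; clamp it to 0.
    return max(0.0, cost_before - cost_after)


# Made-up example: 1,000 fresh and 9,000 cached tokens,
# with the cached rate set to one tenth of the full rate.
cost = input_cost_cents(1_000, 9_000, rate=0.0003, cached_rate=0.00003)
```

Clamping at zero means a session that grew cannot add to the lifetime total, and in this sketch it does not subtract from it either.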

6. The honest bit

If your session fits in 32k tokens, we save you nothing.

Real savings come from large contexts — long refactors, repo-level reviews, monorepo PRs. Trivial chats round to zero and the TUI shows —.

7. Front-page benchmark

The Pareto chart is generated from the last successful CI run of our internal bench harness on SWE-bench Verified.

tetris bench --suite swe-bench-verified --ratio 8
tetris bench --suite swe-bench-verified --ratio 8 --compressor llmlingua-2

Not a hand-drawn artist's impression. CSV + SVG land in out/.

8. Dispute?

Email savings@tetris.codes with a session ID from tetris savings list. We confirm, find the bug, or refund the Pro month. All three have happened.