How Tetris works

Intelligent compression.
Trained models and flows.
Runs 100% on your machine.

A coding-agent hook that compresses your context before every API call. 25–55% cheaper, up to 87% smaller context, no code ever leaves your machine.

Works with your existing AI subscription. Local hook — no code upload, no middleman. The model sees what your agent sends; we see none of it.

Context compounds.
Your bill follows.

Every coding-agent call re-ingests everything it has ever read — your full repo context, tool outputs, prior edits — as raw input tokens. On a large codebase that compounds fast with every turn of the conversation.

Developers either eat the cost or manually prune context, which defeats the purpose of using an agent. Neither is acceptable.

Context tokens per call · vanilla Claude Code · same task
Call 1: 1,200
Call 2: 2,480
Call 3: 4,160
Call 4: 5,920
Call 5: 8,000

Each call re-ingests all prior context. Cost compounds with every turn.
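The compounding above can be sketched in a few lines. The per-turn token counts are the illustrative figures from the chart, not measurements:

```python
# Illustrative sketch of context compounding. Each call re-sends the
# entire history, so input tokens per call equal the running total of
# everything read so far. Per-turn figures are hypothetical.
new_tokens_per_call = [1_200, 1_280, 1_680, 1_760, 2_080]

context = 0
per_call_input = []
for fresh in new_tokens_per_call:
    context += fresh                 # new repo/tool output this turn
    per_call_input.append(context)   # the whole history is re-ingested

print(per_call_input)       # [1200, 2480, 4160, 5920, 8000]
print(sum(per_call_input))  # 21760 tokens billed across 5 calls
```

Five calls on this toy session bill 21,760 input tokens for only 8,000 tokens of distinct content.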

In-process.
Before the payload leaves.

Tetris installs as a PreToolUse hook inside Claude Code. Before any context payload leaves your machine, Tetris intercepts it, runs its compression passes, produces a signed receipt, then forwards the leaner payload.

Anthropic receives the same kind of request it always would — just smaller. Tetris is not a proxy, not a cloud relay, not middleware you have to route traffic through. It's in-process.

Execution path · per agent call

1. Agent task arrives
   Claude Code receives your prompt. Context payload assembled from repo state.

2. Tetris intercepts
   PreToolUse hook fires. Compression passes run locally in < 2 s. Signed receipt produced.

3. Leaner payload forwarded
   Compressed context sent to Anthropic. Identical request shape — fewer tokens.

4. Savings logged & signed
   SHA-256 receipt written locally. Bytes in, bytes out, strategies applied, elapsed time.
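For reference, a PreToolUse hook registers through Claude Code's hooks settings. A minimal sketch, assuming a hypothetical `tetris compress` command (an empty matcher applies the hook to every tool):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "",
        "hooks": [
          { "type": "command", "command": "tetris compress" }
        ]
      }
    ]
  }
}
```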

Trained compression models.
Intelligent routing flows.

Tetris uses trained compression models and intelligent routing flows — not generic byte-compression — to understand your code semantically and reduce it with precision. Four strategies run in sequence, each targeting a different kind of waste.

01

Deduplication

Strips repeated import blocks, shared type declarations, and identical snippets across files without losing semantic meaning. Runs first — clears the easy wins.

Avg savings: −18%
02

AST-aware pruning

Understands your code's structure. Stubs function bodies while preserving signatures and type annotations, so the model knows what exists without reading every line of every file.

Avg savings: −34%
03

Context truncation

Drops least-relevant conversation history while keeping task-critical context intact, using a trained relevance model that scores each prior turn against the current task.

Avg savings: −22%
04

Batch consolidation

Collapses multiple read, glob, and grep calls into a single search query, reducing API round trips and the carry-forward context that accumulates between them.

Avg reduction: −4 calls
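Of the four passes, AST-aware pruning is the easiest to picture. A minimal sketch using Python's standard `ast` module (not Tetris's actual implementation, which is unknown) that stubs function bodies while keeping signatures and type annotations:

```python
import ast

def stub_function_bodies(source: str) -> str:
    """Replace every function body with `...`, keeping signatures,
    type annotations, and module structure intact."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            node.body = [ast.Expr(ast.Constant(...))]  # stub the body
    return ast.unparse(tree)

code = '''
def handle_auth(token: str) -> bool:
    user = lookup(token)
    return user is not None and user.active
'''
print(stub_function_bodies(code))
# def handle_auth(token: str) -> bool:
#     ...
```

The model still sees that `handle_auth` exists and what it takes and returns; the implementation lines never reach the context window.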

Not every call deserves a frontier model.

Compression reduces how many tokens your agent sends. Routing reduces what you pay per token. Together, they attack your bill from both directions.

Tetris classifies every agent call before it fires. Exploration tasks — reading files, searching, listing — get routed to a lighter model automatically. Only generation and editing use your frontier model. You never configure this. It just works.

Compression alone
−55%
Routing alone
−38%
Both together
−72%

Measured on a 12-file refactor session · claude-sonnet-4-6 as frontier
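The combined figure is consistent with the two effects composing multiplicatively: routing discounts the spend that compression leaves behind. A quick check of the arithmetic:

```python
compression = 0.55  # fraction of input tokens removed
routing = 0.38      # fraction of remaining spend saved by cheaper models

remaining_spend = (1 - compression) * (1 - routing)  # 0.45 * 0.62
combined_savings = 1 - remaining_spend
print(f"{combined_savings:.0%}")  # 72%
```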

Task classification · per call
Lighter model ~45% of calls
File reads & directory listings
Search & grep operations
Context lookups & symbol resolution
Dependency graph traversal
Frontier model ~55% of calls
Code generation & writing
Multi-file edits & refactors
Test writing & bug fixes
Architecture & planning responses
Classification runs locally in < 1 ms. No latency added.
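The routing decision itself can be as simple as a lookup on the tool being called, which is why it fits in under a millisecond. A hypothetical sketch; tool and model names are illustrative, not Tetris's actual tables:

```python
# Exploration tools go to a lighter model; everything else stays on
# the frontier model. All names here are illustrative.
LIGHTER_TOOLS = {"read_file", "list_dir", "grep", "glob", "resolve_symbol"}

def pick_model(tool_name: str) -> str:
    return "lighter-model" if tool_name in LIGHTER_TOOLS else "frontier-model"

print(pick_model("grep"))       # lighter-model
print(pick_model("edit_file"))  # frontier-model
```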

Same refactor.
Two call graphs.

"Rename handleAuth across src/auth/." — same task, vanilla vs. Tetris.

Without Tetris $0.142 · 34 s
API calls
9
Input tokens
47,200
Cost
$0.142
read_file("src/auth/index.ts") 2,840 tok
read_file("src/auth/handlers.ts") 1,920 tok
read_file("src/auth/middleware.ts") 1,480 tok
grep("handleAuth", "src/") 3,200 tok
read_file("src/auth/utils.ts") 920 tok
edit_file("src/auth/index.ts") 9,160 tok
edit_file("src/auth/handlers.ts") 10,640 tok
edit_file("src/auth/middleware.ts") 11,320 tok
edit_file("src/auth/utils.ts") 5,720 tok
Each edit call re-ingests all prior tool outputs.
With Tetris $0.025 · 9 s
API calls
2
Input tokens
8,200
Cost
$0.025
⊗ Tetris · PreToolUse intercepted
Dedupe −1,240 tok · 3 duplicate import blocks removed
AST prune −3,180 tok · 8 fn bodies stubbed, signatures kept
Batch consolidate 4 reads + 1 grep → 1 search call
search("handleAuth", scope="src/auth/") 1,100 tok
batch_edit(4 files · fuzzy match) 1,820 tok
Original: 47,200 tok
Compressed: 8,200 tok
Savings: 82.6%
SHA-256: 3f2b6f…a9c7e ✓
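A receipt like the one above can be reproduced with nothing but the standard library. A sketch, with hypothetical field names, of hashing both payloads so the logged sizes are independently checkable:

```python
import hashlib

def make_receipt(raw: bytes, compressed: bytes, strategies: list[str]) -> dict:
    # Hash both payloads so anyone holding them can recompute the
    # digests and confirm the logged byte counts. Field names are
    # hypothetical, not Tetris's actual schema.
    return {
        "bytes_in": len(raw),
        "bytes_out": len(compressed),
        "sha256_in": hashlib.sha256(raw).hexdigest(),
        "sha256_out": hashlib.sha256(compressed).hexdigest(),
        "strategies": strategies,
    }

r = make_receipt(b"x" * 47_200, b"x" * 8_200, ["dedupe", "ast_prune", "batch"])
print(f"savings {1 - r['bytes_out'] / r['bytes_in']:.1%}")  # savings 82.6%
```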

Built for teams that can't afford to leak.

1. Your source code never leaves your machine.
   Compression runs entirely in-process inside Claude Code. No code is transmitted to Tetris servers at any point.

2. We receive zero telemetry about your code's contents.
   The only data we collect: bytes in, bytes out, and elapsed time. Nothing structural. Nothing semantic.

3. Anthropic sees the same compressed request structure it would see anyway.
   No new parties enter the data path. Anthropic receives your request — just smaller.

4. Every run produces a SHA-256 signed savings log you can verify independently.
   We don't ask you to trust us. We give you a cryptographic receipt for every session.

5. Outbound calls use modern TLS only.
   The only outbound traffic is version checks and license pings, both over modern TLS. Auditable, no hidden surface.

One command.
Works today.

The free tier covers your first $100 of measured savings each month. If it saves you nothing, it costs you nothing.

// Install
$ curl -fsSL https://get.tetris.codes | sh