How Tetris works

Intelligent compression.
Trained models and flows.
Runs 100% on your machine.

A coding-agent hook that compresses your context before every API call. 25–55% cheaper, up to 87% smaller context, no code ever leaves your machine.

Works with your existing AI subscription. Local hook — no code upload, no middleman. The model sees what your agent sends; we see none of it.

Context compounds.
Your bill follows.

Every coding-agent call re-ingests everything it has ever read — your full repo context, tool outputs, prior edits — as raw input tokens. On a large codebase that compounds fast with every turn of the conversation.

Developers either eat the cost or manually prune context, which defeats the purpose of using an agent. Neither is acceptable.

Context tokens per call · vanilla Claude Code · same task
Call 1: 1,200
Call 2: 2,480
Call 3: 4,160
Call 4: 5,920
Call 5: 8,000

Each call re-ingests all prior context. Cost compounds with every turn.
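The compounding above can be sketched in a few lines. The per-turn token counts are the illustrative figures from the chart, not measurements:

```python
# Illustrative sketch of context compounding. Each call re-sends the
# entire history, so input tokens per call equal the running total of
# everything read so far. Per-turn figures are hypothetical.
new_tokens_per_call = [1_200, 1_280, 1_680, 1_760, 2_080]

context = 0
per_call_input = []
for fresh in new_tokens_per_call:
    context += fresh                 # new repo/tool output this turn
    per_call_input.append(context)   # the whole history is re-ingested

print(per_call_input)       # [1200, 2480, 4160, 5920, 8000]
print(sum(per_call_input))  # 21760 tokens billed across 5 calls
```

Five calls on this toy session bill 21,760 input tokens for only 8,000 tokens of distinct content.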

In-process.
Before the payload leaves.

Tetris installs as a PreToolUse hook inside Claude Code. Before any context payload leaves your machine, Tetris intercepts it, runs its compression passes, produces a signed receipt, then forwards the leaner payload.

Anthropic receives the same kind of request it always would — just smaller. Tetris is not a proxy, not a cloud relay, not middleware you have to route traffic through. It's in-process.

Execution path · per agent call

1. Agent task arrives
   Claude Code receives your prompt. Context payload assembled from repo state.

2. Tetris intercepts
   PreToolUse hook fires. Compression passes run locally in < 2 s. Signed receipt produced.

3. Leaner payload forwarded
   Compressed context sent to Anthropic. Identical request shape — fewer tokens.

4. Savings logged & signed
   SHA-256 receipt written locally. Bytes in, bytes out, strategies applied, elapsed time.
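For reference, a PreToolUse hook registers through Claude Code's hooks settings. A minimal sketch, assuming a hypothetical `tetris compress` command (an empty matcher applies the hook to every tool):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "",
        "hooks": [
          { "type": "command", "command": "tetris compress" }
        ]
      }
    ]
  }
}
```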

Trained compression models.
Intelligent routing flows.

Tetris uses trained compression models and intelligent routing flows — not generic byte-compression — to understand your code semantically and reduce it with precision. Four strategies run in sequence, each targeting a different kind of waste.

01

Deduplication

Strips repeated import blocks, shared type declarations, and identical snippets across files without losing semantic meaning. Runs first — clears the easy wins.

Avg savings: −18%
02

AST-aware pruning

Understands your code's structure. Stubs function bodies while preserving signatures and type annotations, so the model knows what exists without reading every line of every file.

Avg savings: −34%
03

Context truncation

Drops least-relevant conversation history while keeping task-critical context intact, using a trained relevance model that scores each prior turn against the current task.

Avg savings: −22%
04

Batch consolidation

Collapses multiple read, glob, and grep calls into a single search query, reducing API round trips and the carry-forward context that accumulates between them.

Avg reduction: −4 calls
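Of the four passes, AST-aware pruning is the easiest to picture. A minimal sketch using Python's standard `ast` module (not Tetris's actual implementation, which is unknown) that stubs function bodies while keeping signatures and type annotations:

```python
import ast

def stub_function_bodies(source: str) -> str:
    """Replace every function body with `...`, keeping signatures,
    type annotations, and module structure intact."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            node.body = [ast.Expr(ast.Constant(...))]  # stub the body
    return ast.unparse(tree)

code = '''
def handle_auth(token: str) -> bool:
    user = lookup(token)
    return user is not None and user.active
'''
print(stub_function_bodies(code))
# def handle_auth(token: str) -> bool:
#     ...
```

The model still sees that `handle_auth` exists and what it takes and returns; the implementation lines never reach the context window.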

Not every call deserves a frontier model.

Compression reduces how many tokens your agent sends. Routing reduces what you pay per token. Together, they attack your bill from both directions.

Tetris classifies every agent call before it fires. Exploration tasks — reading files, searching, listing — get routed to a lighter model automatically. Only generation and editing use your frontier model. You never configure this. It just works.

Compression alone
−55%
Routing alone
−38%
Both together
−72%

Measured on a 12-file refactor session · claude-sonnet-4-6 as frontier
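The combined figure is consistent with the two effects composing multiplicatively: routing discounts the spend that compression leaves behind. A quick check of the arithmetic:

```python
compression = 0.55  # fraction of input tokens removed
routing = 0.38      # fraction of remaining spend saved by cheaper models

remaining_spend = (1 - compression) * (1 - routing)  # 0.45 * 0.62
combined_savings = 1 - remaining_spend
print(f"{combined_savings:.0%}")  # 72%
```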

Task classification · per call
Lighter model ~45% of calls
File reads & directory listings
Search & grep operations
Context lookups & symbol resolution
Dependency graph traversal
Frontier model ~55% of calls
Code generation & writing
Multi-file edits & refactors
Test writing & bug fixes
Architecture & planning responses
Classification runs locally in < 1 ms. No latency added.
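The routing decision itself can be as simple as a lookup on the tool being called, which is why it fits in under a millisecond. A hypothetical sketch; tool and model names are illustrative, not Tetris's actual tables:

```python
# Exploration tools go to a lighter model; everything else stays on
# the frontier model. All names here are illustrative.
LIGHTER_TOOLS = {"read_file", "list_dir", "grep", "glob", "resolve_symbol"}

def pick_model(tool_name: str) -> str:
    return "lighter-model" if tool_name in LIGHTER_TOOLS else "frontier-model"

print(pick_model("grep"))       # lighter-model
print(pick_model("edit_file"))  # frontier-model
```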

Same refactor.
Two call graphs.

"Rename handleAuth across src/auth/." — same task, vanilla vs. Tetris.

Without Tetris $0.142 · 34 s
API calls
9
Input tokens
47,200
Cost
$0.142
read_file("src/auth/index.ts") 2,840 tok
read_file("src/auth/handlers.ts") 1,920 tok
read_file("src/auth/middleware.ts") 1,480 tok
grep("handleAuth", "src/") 3,200 tok
read_file("src/auth/utils.ts") 920 tok
edit_file("src/auth/index.ts") 9,160 tok
edit_file("src/auth/handlers.ts") 10,640 tok
edit_file("src/auth/middleware.ts") 11,320 tok
edit_file("src/auth/utils.ts") 5,720 tok
Each edit call re-ingests all prior tool outputs.
With Tetris $0.025 · 9 s
API calls
2
Input tokens
8,200
Cost
$0.025
⊗ Tetris · PreToolUse intercepted
Dedupe −1,240 tok · 3 duplicate import blocks removed
AST prune −3,180 tok · 8 fn bodies stubbed, signatures kept
Batch consolidate 4 reads + 1 grep → 1 search call
search("handleAuth", scope="src/auth/") 1,100 tok
batch_edit(4 files · fuzzy match) 1,820 tok
Original: 47,200 tok
Compressed: 8,200 tok
Savings: 82.6%
SHA-256: 3f2b6f…a9c7e ✓
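A receipt like the one above can be reproduced with nothing but the standard library. A sketch, with hypothetical field names, of hashing both payloads so the logged sizes are independently checkable:

```python
import hashlib

def make_receipt(raw: bytes, compressed: bytes, strategies: list[str]) -> dict:
    # Hash both payloads so anyone holding them can recompute the
    # digests and confirm the logged byte counts. Field names are
    # hypothetical, not Tetris's actual schema.
    return {
        "bytes_in": len(raw),
        "bytes_out": len(compressed),
        "sha256_in": hashlib.sha256(raw).hexdigest(),
        "sha256_out": hashlib.sha256(compressed).hexdigest(),
        "strategies": strategies,
    }

r = make_receipt(b"x" * 47_200, b"x" * 8_200, ["dedupe", "ast_prune", "batch"])
print(f"savings {1 - r['bytes_out'] / r['bytes_in']:.1%}")  # savings 82.6%
```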

Built for teams that can't afford to leak.

1. Your source code never leaves your machine.
   Compression runs entirely in-process inside Claude Code. No code is transmitted to Tetris servers at any point.

2. We receive zero telemetry about your code's contents.
   The only data we collect: bytes in, bytes out, and elapsed time. Nothing structural. Nothing semantic.

3. Anthropic sees the same compressed request structure it would see anyway.
   No new parties enter the data path. Anthropic receives your request — just smaller.

4. Every run produces a SHA-256 signed savings log you can verify independently.
   We don't ask you to trust us. We give you a cryptographic receipt for every session.

5. Outbound calls use modern TLS only.
   The only outbound traffic is version checks and license pings, both over modern TLS. Auditable, no hidden surface.

One command.
Works today.

The free tier covers your first $100 of measured savings each month. If it saves you nothing, it costs you nothing.

// Install
$ curl -fsSL https://get.tetris.codes | sh