The new standard of context compression.

Every oversized prompt burns GPU cycles, grid power, and cooling water on tokens that never needed to run. SuperCompress cuts ~65% of context before inference — keeping what matters, lowering cost, and giving resources back.

SuperCompress launch video preview

~65% fewer tokens · query-aware · CPU-only policy

Every larger model turn pulls more from the grid: more power, more water for cooling, more strain on resources already running thin. We send millions of tokens that never needed processing through GPUs and watch the cost add up in silence. We cannot scale intelligence by burning through what we have left. SuperCompress cuts wasted tokens before inference, keeping what matters, using less compute, and giving resources back to the planet.

Measure the compute you can skip

Paste real context and your question. SuperCompress drops everything it safely can — keeping only what matters for the answer.

Methodology

Documented assumptions you can adjust

2,500 tok/GPU-s · 150W GPU · 55% KV share · 0.417 kg CO₂/kWh. Illustrative — not live metering.
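
To show how the assumptions above might turn removed tokens into energy and CO₂ figures, here is a minimal sketch. The function name and the conversion formula are assumptions for illustration, not the dashboard's actual code. The 55% KV share is left out because the page does not say how it enters the estimate.

```python
def estimate_prefill_savings(
    tokens_removed: int,
    tok_per_gpu_s: float = 2500.0,   # documented assumption: 2,500 tok/GPU-s
    gpu_watts: float = 150.0,        # documented assumption: 150 W GPU
    kg_co2_per_kwh: float = 0.417,   # documented assumption: grid carbon intensity
) -> dict:
    """Illustrative estimate only; this is not live metering."""
    # GPU time that the removed tokens would have needed during prefill.
    gpu_seconds = tokens_removed / tok_per_gpu_s
    # watts * seconds = joules, and 3.6e6 joules = 1 kWh.
    kwh = gpu_seconds * gpu_watts / 3.6e6
    return {
        "gpu_seconds": gpu_seconds,
        "kwh": kwh,
        "kg_co2": kwh * kg_co2_per_kwh,
    }


savings = estimate_prefill_savings(tokens_removed=2_500_000)
print(savings)
```

With these defaults, 2.5 million removed tokens correspond to 1,000 GPU-seconds, about 0.042 kWh and roughly 17 g of CO₂. Change any default to match your own hardware and grid.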

Quality

Savings without dropping answers

Truncation saves tokens but can drop critical lines. SuperCompress targets both KV-cache reduction and oracle recall.

  • 100% oracle recall vs ~25% baselines
  • Failure case demo at budget 0.1
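
The sketch below shows one way to measure oracle recall against position-based baselines. It assumes oracle recall means the fraction of answer-critical lines that survive compression, which the page does not define. The helper names and the toy line positions are hypothetical and are not the benchmark setup.

```python
def oracle_recall(kept: set[int], oracle: set[int]) -> float:
    """Fraction of answer-critical (oracle) line indices that were kept."""
    if not oracle:
        return 1.0
    return len(kept & oracle) / len(oracle)


def truncate_head(n_lines: int, budget: float) -> set[int]:
    """Truncation baseline: keep only the first budget * n_lines lines."""
    k = max(1, round(n_lines * budget))
    return set(range(k))


def fifo_tail(n_lines: int, budget: float) -> set[int]:
    """FIFO baseline: evict the oldest lines first, keeping the most recent."""
    k = max(1, round(n_lines * budget))
    return set(range(n_lines - k, n_lines))


# Toy case: 20 lines with 4 answer-critical lines scattered through the context.
oracle = {2, 7, 12, 17}
print(oracle_recall(truncate_head(20, 0.35), oracle))  # 0.25
print(oracle_recall(fifo_tail(20, 0.35), oracle))      # 0.25
```

Both baselines choose lines by position alone, so in this toy layout each keeps only one of the four critical lines. A query-aware policy can instead spend the same budget on the lines that matter.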

Deployment

CPU eviction before GPU inference

A ~5K-parameter network runs on CPU. Its sub-millisecond latency is small next to the GPU prefill time it saves on long contexts.

  • pip install · MIT license
  • Live hosted API + dashboard

Try a preset or bring your own context

Press Enter in the question field, or ⌘/Ctrl+Enter in the context field.

Runs locally with the trained SuperCompress policy. Your question defines what’s important — the engine evicts low-score lines until only answer-critical context remains.
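
The sketch below illustrates query-aware eviction in miniature. The trained SuperCompress policy is not reproduced here: the scorer is a simple word-overlap heuristic standing in for it, and every function name and the fixed-budget rule are assumptions for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for real tokenization."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score_line(line: str, query_terms: set[str]) -> float:
    """Hypothetical scorer: share of query terms that appear in the line."""
    if not query_terms:
        return 0.0
    return len(tokenize(line) & query_terms) / len(query_terms)


def compress(context: str, query: str, budget: float = 0.35) -> str:
    """Keep the highest-scoring lines, up to budget * line count, in original order."""
    lines = [ln for ln in context.splitlines() if ln.strip()]
    if not lines:
        return ""
    query_terms = tokenize(query)
    keep_n = max(1, round(len(lines) * budget))
    # Rank line indices by score, highest first; ties favor earlier lines.
    ranked = sorted(
        range(len(lines)),
        key=lambda i: (-score_line(lines[i], query_terms), i),
    )
    kept = sorted(ranked[:keep_n])  # restore document order
    return "\n".join(lines[i] for i in kept)


context = """The deploy window opens at 09:00 UTC.
Lunch is served in the east wing.
Rollback requires approval from the on-call lead.
The office plants are watered on Fridays."""
print(compress(context, "Who approves a rollback?", budget=0.25))
# prints: Rollback requires approval from the on-call lead.
```

The shape is the point here: score each line against the question, evict from the bottom, and return the survivors in their original order. A learned scorer would replace `score_line`.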

Token & environmental impact

  • Input tokens
  • Tokens after SuperCompress
  • Tokens removed
  • Tokens saved
  • Context size saved
  • Answer quality retained
  • Electricity before (est. prefill)
  • Electricity after (est. prefill)
  • Power saved
  • Water saved (est. datacenter cooling)
  • CO₂ saved

Cut waste. Keep the answer.

On long contexts, adaptive mode typically removes 85–95% of tokens while keeping query-critical lines. At a fixed 35% budget, SuperCompress hits 100% oracle recall on benchmark seeds — FIFO and truncation retain ~25%.

Oracle recall at a fixed 35% budget (8 seeds): SuperCompress 100%, H2O ~98%, FIFO and truncation ~25%.
Adaptive KV savings on real long-context presets (query-aware adaptive mode): typically 85–95%.

Quick start

curl -X POST https://trysupercompress.vercel.app/api/v1/compress \
  -H "X-API-Key: sc_live_…" \
  -H "Content-Type: application/json" \
  -d '{"context":"…","query":"…"}'

Production API, metered in your dashboard. Requires an sc_live_… key. Python: pip install git+https://github.com/arjunkshah/supercompress.git · API reference
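
For callers who prefer Python over curl, here is a minimal sketch of the same request using only the standard library. The helper names are hypothetical, the JSON content type is an assumption, and the response schema is not documented on this page, so the parsed body is returned as-is. This is not the API of the pip-installable package.

```python
import json
import urllib.request

API_URL = "https://trysupercompress.vercel.app/api/v1/compress"


def build_compress_request(context: str, query: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl quick start."""
    body = json.dumps({"context": context, "query": query}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            # Assumption: the endpoint expects a JSON content type.
            "Content-Type": "application/json",
        },
    )


def compress_remote(context: str, query: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed JSON body, whatever its shape."""
    req = build_compress_request(context, query, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Call `compress_remote(context, query, api_key)` with your own sc_live_… key. Building the request separately from sending it makes the payload easy to inspect or test without spending metered calls.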

Build with less compute from day one.

Less context. Same answers. Lower grid load. Open source · MIT.

Get API key · Read the docs · View on GitHub