Methodology
Documented assumptions you can adjust
2,500 tokens per GPU-second · 150 W GPU · 55% KV share · 0.417 kg CO₂/kWh. Illustrative, not live metering.
- Tokens saved = original − kept per compression
- Full model in Environment guide
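As a rough illustration of how these assumptions could combine, the sketch below converts tokens saved into GPU-seconds, energy, and CO₂. It is a simplified reading of the figures above, not the full model from the Environment guide. The function name `estimate_savings` is hypothetical, and the 55% KV share is left out because this page does not say how it enters the calculation.

```python
# Illustrative constants copied from the assumptions listed above.
TOKENS_PER_GPU_SECOND = 2_500
GPU_POWER_WATTS = 150
GRID_KG_CO2_PER_KWH = 0.417
JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6 MJ


def estimate_savings(original_tokens: int, kept_tokens: int) -> dict[str, float]:
    """Hypothetical helper: rough per-compression savings estimate.

    Assumes savings scale linearly with tokens removed. The 55% KV share
    is deliberately not applied here (its role is not documented on this page).
    """
    tokens_saved = original_tokens - kept_tokens
    gpu_seconds = tokens_saved / TOKENS_PER_GPU_SECOND
    energy_kwh = gpu_seconds * GPU_POWER_WATTS / JOULES_PER_KWH
    return {
        "tokens_saved": tokens_saved,
        "gpu_seconds": gpu_seconds,
        "energy_kwh": energy_kwh,
        "kg_co2": energy_kwh * GRID_KG_CO2_PER_KWH,
    }
```

Because every constant is a module-level name, adjusting an assumption (for example a different grid intensity) is a one-line change.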
Every oversized prompt burns GPU cycles, grid power, and cooling water on tokens that never needed to run. SuperCompress cuts ~65% of context before inference — keeping what matters, lowering cost, and giving resources back.
~65% fewer tokens · query-aware · CPU-only policy
Paste real context and your question. SuperCompress drops everything it safely can — keeping only what matters for the answer.
Quality
Truncation can save tokens but lose critical lines. SuperCompress targets both KV reduction and oracle recall.
Deployment
The ~5K-parameter policy network runs on CPU. Its sub-millisecond latency is small next to the GPU prefill time it saves on long contexts.
Try a preset or bring your own context
Runs locally with the trained SuperCompress policy. Your question defines what’s important — the engine evicts low-score lines until only answer-critical context remains.
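To make the idea of query-driven eviction concrete, here is a minimal sketch. It is not the trained SuperCompress policy: a toy word-overlap score stands in for the learned scorer, the budget is counted in lines rather than tokens, and the names `score_line` and `evict_low_score_lines` are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for real tokenization."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score_line(line: str, query_terms: set[str]) -> float:
    """Toy relevance score: fraction of query terms that appear in the line."""
    if not query_terms:
        return 0.0
    return len(tokenize(line) & query_terms) / len(query_terms)


def evict_low_score_lines(context: str, query: str, budget: float = 0.35) -> str:
    """Keep roughly `budget` of the non-empty lines, evicting the lowest-scoring first."""
    lines = [ln for ln in context.splitlines() if ln.strip()]
    if not lines:
        return ""
    query_terms = tokenize(query)
    keep_n = max(1, int(len(lines) * budget))
    # Rank by score (highest first); ties go to the earlier line.
    ranked = sorted(
        range(len(lines)),
        key=lambda i: (-score_line(lines[i], query_terms), i),
    )
    # Restore original order so the kept context still reads naturally.
    kept = sorted(ranked[:keep_n])
    return "\n".join(lines[i] for i in kept)
```

Keeping the top-scoring lines is equivalent to repeatedly evicting the lowest-scoring one until the budget is met; the sketch uses a single sort for brevity.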
Token & environmental impact
On long contexts, adaptive mode typically removes 85–95% of tokens while keeping query-critical lines. At a fixed 35% budget, SuperCompress reaches 100% oracle recall on benchmark seeds, while the FIFO and truncation baselines retain only ~25%.
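For readers unfamiliar with the metric, the sketch below shows one way oracle recall could be computed and why position-based baselines can score poorly. It assumes oracle recall means the fraction of oracle-marked, query-critical lines that survive compression; that definition, the helper names, and the toy indices are illustrative and are not taken from the SuperCompress benchmark.

```python
def oracle_recall(kept: set[int], oracle: set[int]) -> float:
    """Fraction of oracle-critical line indices that were kept (assumed definition)."""
    if not oracle:
        return 1.0
    return len(kept & oracle) / len(oracle)


def truncation_keep(n_lines: int, budget: float) -> set[int]:
    """Truncation baseline: keep the first lines that fit the budget."""
    keep_n = max(1, int(n_lines * budget))
    return set(range(keep_n))


def fifo_keep(n_lines: int, budget: float) -> set[int]:
    """FIFO baseline: evict the oldest lines first, so the newest survive."""
    keep_n = max(1, int(n_lines * budget))
    return set(range(n_lines - keep_n, n_lines))


# Toy example: 10 lines, with critical lines scattered through the context.
toy_oracle = {1, 4, 8}
truncation_score = oracle_recall(truncation_keep(10, 0.35), toy_oracle)
fifo_score = oracle_recall(fifo_keep(10, 0.35), toy_oracle)
```

In this toy setup both baselines keep only one of the three critical lines, because neither looks at the query when deciding what to drop.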
curl -X POST https://trysupercompress.vercel.app/api/v1/compress \
  -H "X-API-Key: sc_live_…" \
  -H "Content-Type: application/json" \
  -d '{"context":"…","query":"…"}'
Production API, metered in your dashboard. Requires an sc_live_… key.
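The same request can be made from Python using only the standard library. This is a sketch of the curl call above, not the official client: the helper names `build_request` and `send` are hypothetical, the `Content-Type` header is an assumption, and the response is returned as raw text because its schema is not described on this page.

```python
import json
import urllib.request

API_URL = "https://trysupercompress.vercel.app/api/v1/compress"


def build_request(context: str, query: str, api_key: str) -> urllib.request.Request:
    """Build the POST request; nothing is sent until it is passed to send()."""
    body = json.dumps({"context": context, "query": query}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "X-API-Key": api_key,  # your sc_live_… key
            "Content-Type": "application/json",  # assumed; the body is JSON
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

To call the API, pass the result of `build_request(context, query, api_key)` to `send`. Reading the key from an environment variable keeps it out of source control.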
Python: pip install git+https://github.com/arjunkshah/supercompress.git
Less context. Same answers. Lower grid load. Open source · MIT.