SuperCompress
The new standard for context compression.
AI is something we know and love.
But every answer has a physical cost.
Behind the interface are data centers drawing electricity, cooling water, and grid capacity at an accelerating rate.
The scale is already difficult to ignore.
Sources: IEA Energy and AI; U.S. Department of Energy / Lawrence Berkeley National Laboratory.
Agents process the same context again and again.
- Every turn can resend the entire conversation.
- Documents and tool outputs accumulate as the agent works.
- The GPU still processes lines that have nothing to do with the current question.
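The cost of resending can be sketched with simple arithmetic. This is a minimal illustration, assuming every turn adds the same number of tokens and the full history is resent each time; the function name and the 500-token figure are made up for the example, not measurements.

```python
def cumulative_tokens(tokens_per_turn: int, turns: int) -> int:
    """Total tokens processed when every turn resends all earlier turns."""
    total = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn  # the new turn is appended to the history
        total += context            # the whole history is processed again
    return total


new_tokens = 500 * 20                    # tokens actually written over 20 turns
processed = cumulative_tokens(500, 20)   # tokens the model had to read
print(new_tokens, processed)             # prints: 10000 105000
```

Under these assumptions the processed total grows with the square of the number of turns, while the new content grows only linearly.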
Removing tokens is easy. Keeping the right ones is hard.
Send everything, and every turn pays to process context that may never affect the answer.
Truncate, and the one critical line may sit in the middle of the text that gets removed.
Summarize, and another model call adds cost and may rewrite details the answer depends on.
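The risk of cutting text purely by position can be shown with a small sketch. It assumes one common strategy, keeping the first and last few lines and dropping the middle; `truncate_middle` and the sample lines are hypothetical and stand in for whatever rule an application might use.

```python
def truncate_middle(lines: list[str], head: int, tail: int) -> list[str]:
    """Keep the first `head` and last `tail` lines; drop everything between."""
    if head + tail >= len(lines):
        return list(lines)  # nothing needs to be removed
    return lines[:head] + lines[len(lines) - tail:]


context = [f"log entry {i}" for i in range(10)]
context[5] = "ERROR: disk quota exceeded"  # the one line the answer depends on

kept = truncate_middle(context, head=2, tail=2)
print("ERROR: disk quota exceeded" in kept)  # prints: False
```

The rule never looks at the question, so it has no way to know that the dropped line was the one that mattered.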
SuperCompress.
Learned context compression that keeps what matters before the prompt reaches the language model.
A smarter memory layer for AI agents.
SuperCompress sits between your application and your language model. It reduces the prompt without replacing the model or changing the workflow.
- Query-aware. It keeps context for the question being asked now.
- Model-agnostic. It works with hosted and open-weight models.
- CPU-first. It does not spend another LLM call to save tokens.
Less context goes in. The important context stays.
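Where such a layer sits can be sketched as a thin wrapper around the model call. This is an illustration only: `answer_with_compression`, `compress`, and `call_model` are hypothetical names, not the SuperCompress API, and the prompt layout is an assumption.

```python
from typing import Callable


def answer_with_compression(
    context: str,
    query: str,
    compress: Callable[[str, str], str],
    call_model: Callable[[str], str],
) -> str:
    """Shrink the context for this query, then call the model as usual."""
    focused = compress(context, query)            # query-aware reduction
    prompt = f"{focused}\n\nQuestion: {query}"    # assumed prompt layout
    return call_model(prompt)                     # any hosted or local model
```

Because the model is passed in as a plain callable, the same wrapper works with any model client, and the rest of the application does not change.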
Score every line. Preserve the signal.
- Context and question enter together. The current query defines what information is relevant.
- Each line receives a relevance score. The policy uses nine features, including recency, position, and query overlap.
- The strongest lines stay in order. The focused context is sent to the language model for inference.
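The flow above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the SuperCompress policy: it uses only the three features named in the text, the feature definitions are guesses, and the weights are hand-picked placeholders where the real policy is learned. All function names are hypothetical.

```python
import re

# Hand-picked placeholder weights; the text describes a learned policy.
WEIGHTS = {"overlap": 0.7, "recency": 0.2, "edge": 0.1}


def tokenize(text: str) -> set[str]:
    """Lowercase word and number tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score_line(line: str, index: int, total: int, query_terms: set[str]) -> float:
    """Combine three assumed features into one relevance score."""
    terms = tokenize(line)
    # Query overlap: share of query terms that appear in the line.
    overlap = len(terms & query_terms) / len(query_terms) if query_terms else 0.0
    # Recency: later lines score higher.
    recency = (index + 1) / total
    # Position: lines near either end of the context score higher.
    edge = abs(2 * index / (total - 1) - 1) if total > 1 else 1.0
    return (WEIGHTS["overlap"] * overlap
            + WEIGHTS["recency"] * recency
            + WEIGHTS["edge"] * edge)


def compress(context: str, query: str, keep: int) -> str:
    """Score every non-empty line and keep the top `keep` in original order."""
    lines = [line for line in context.splitlines() if line.strip()]
    query_terms = tokenize(query)
    scored = [(score_line(line, i, len(lines), query_terms), i)
              for i, line in enumerate(lines)]
    # Highest score first; ties go to the earlier line.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:max(keep, 0)]
    kept = sorted(i for _, i in top)  # restore document order
    return "\n".join(lines[i] for i in kept)


context = """The deploy script lives in tools/deploy.sh.
Lunch is at noon on Fridays.
The database password rotates every 30 days.
The office plants need water.
Staging uses the blue cluster."""

print(compress(context, "How often does the database password rotate?", keep=2))
```

Sorting the kept indices before joining is what keeps the surviving lines in their original order, so the model still reads them as a coherent excerpt rather than a ranked list.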