Token Compression
The model has a fixed context window. Token compression is how OpenHuman keeps long conversations, large Memory Trees, and bulky tool results from hitting that ceiling.
What gets compressed
| Source | Method |
|---|---|
| Web Search results | Snippet extraction - keeps the top 3 results, drops the rest |
| Web Scraper output | Strip and truncate: input capped at 1 MB, output capped at 50 K |
| Memory recall results | Semantic deduplication before passing chunks to the model |
| Long tool outputs | Line-number truncation with a "see file" hint |
| Conversation history | Summary re-write when turns exceed the window |
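
As an illustration of the "Long tool outputs" row, here is a minimal sketch of line-based truncation with a "see file" hint. The function name `truncate_with_hint`, its parameters, and the exact wording of the hint are assumptions made for this example; they are not OpenHuman's actual API.

```python
def truncate_with_hint(output: str, max_lines: int, saved_path: str) -> str:
    """Keep the first max_lines lines and point the model at the full file.

    Hypothetical helper: the name, signature, and hint text are assumptions.
    """
    lines = output.splitlines()
    if len(lines) <= max_lines:
        # Short outputs pass through unchanged.
        return output
    omitted = len(lines) - max_lines
    kept = lines[:max_lines]
    # The hint tells the model where the untruncated output lives.
    kept.append(f"[{omitted} more lines omitted; see file {saved_path}]")
    return "\n".join(kept)
```

The hint costs one extra line, and it lets the model request the full output only when it actually needs it.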
How it works
Raw input → Filter (ads, nav, boilerplate) → Chunk → Dedupe → Summarize (if over limit) → Model
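
The stages above can be sketched roughly as follows. This is an illustrative outline, not OpenHuman's implementation: every function name is hypothetical, the boilerplate markers are made up, token counts are approximated by whitespace-separated words, deduplication uses normalized exact matching as a stand-in for semantic deduplication, and the default "summarizer" simply cuts to the budget where a real one would rewrite the text.

```python
from typing import Callable, List, Optional

# Hypothetical markers; a real filter would be far more thorough.
BOILERPLATE_MARKERS = ("advertisement", "cookie", "subscribe")


def filter_lines(raw: str) -> str:
    """Filter: drop blank lines and lines that look like ads or nav."""
    keep = [
        line for line in raw.splitlines()
        if line.strip()
        and not any(m in line.lower() for m in BOILERPLATE_MARKERS)
    ]
    return "\n".join(keep)


def chunk(text: str, max_words: int) -> List[str]:
    """Chunk: split into pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def dedupe(chunks: List[str]) -> List[str]:
    """Dedupe: exact match on normalized text (stand-in for semantic dedupe)."""
    seen = set()
    unique = []
    for c in chunks:
        key = " ".join(c.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(c)
    return unique


def compress(raw: str, budget_words: int, chunk_words: int = 50,
             summarize: Optional[Callable[[str], str]] = None) -> str:
    """Run Filter -> Chunk -> Dedupe, then Summarize only if over budget."""
    text = "\n".join(dedupe(chunk(filter_lines(raw), chunk_words)))
    if len(text.split()) > budget_words:
        if summarize is None:
            # Placeholder summarizer: keep the first budget_words words.
            summarize = lambda t: " ".join(t.split()[:budget_words])
        text = summarize(text)
    return text
```

Summarization runs last and only when the deduplicated text is still over budget, so the cheaper stages get the first chance to shrink the input.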
Configuration
| Flag | Default | What it does |
|---|---|---|
| MAX_SEARCH_RESULTS | 3 | Results kept per search |
| MAX_SCRAPE_BYTES | 1 MB | Input cap per page |
| MAX_MEMORY_CHUNKS | 20 | Chunks recalled per query |
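
This page does not say how these flags are set. As a sketch only, assuming they are read from environment-style key/value pairs, the defaults in the table could be modeled like this. `CompressionConfig` and `load_config` are hypothetical names, and treating "1 MB" as 1,000,000 bytes is also an assumption.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CompressionConfig:
    # Defaults mirror the table above.
    max_search_results: int = 3
    max_scrape_bytes: int = 1_000_000  # "1 MB"; decimal megabyte is assumed
    max_memory_chunks: int = 20


def load_config(env: Mapping[str, str]) -> CompressionConfig:
    """Build a config from a mapping, falling back to the defaults."""
    defaults = CompressionConfig()

    def read(name: str, default: int) -> int:
        value = int(env.get(name, default))
        if value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value}")
        return value

    return CompressionConfig(
        max_search_results=read("MAX_SEARCH_RESULTS", defaults.max_search_results),
        max_scrape_bytes=read("MAX_SCRAPE_BYTES", defaults.max_scrape_bytes),
        max_memory_chunks=read("MAX_MEMORY_CHUNKS", defaults.max_memory_chunks),
    )
```

Under that assumption, calling `load_config(os.environ)` would pick up any overrides and leave unset flags at their defaults.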
See also
- Web Search - Search results before compression
- Web Scraper - Page content before compression
- Memory Tools - Recall results before compression