Token Compression

The model has a fixed context window. Token compression is how OpenHuman keeps long conversations, large Memory Trees, and bulky tool results from hitting that ceiling.

What gets compressed

Source | Method
Web Search results | Snippet extraction: keeps the top 3 results, drops the rest
Web Scraper output | Strip and truncate: 1 MB input cap, 50 K output cap
Memory recall results | Semantic deduplication before passing chunks to the model
Long tool outputs | Line-number truncation with a "see file" hint
Conversation history | Summary rewrite when turns exceed the window
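
As an illustration of the "Long tool outputs" row, here is a minimal sketch of line-based truncation that ends with a "see file" hint. The function name `truncate_lines`, its parameters, and the wording of the hint are assumptions made for this example, not OpenHuman's actual API.

```python
def truncate_lines(text: str, max_lines: int = 200, path: str = "output.txt") -> str:
    """Keep the first max_lines lines and point the reader at the full file.

    Hypothetical helper: the name, defaults, and hint format are illustrative.
    """
    lines = text.splitlines()
    if len(lines) <= max_lines:
        # Short outputs pass through unchanged.
        return text
    omitted = len(lines) - max_lines
    kept = lines[:max_lines]
    # Replace the dropped tail with a single pointer line.
    kept.append(f"[... {omitted} more lines omitted; see file {path} for the full output]")
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "\n".join(f"line {i}" for i in range(1, 11))
    print(truncate_lines(sample, max_lines=3, path="build.log"))
```

The model still learns that more output exists and where to find it, while the context cost of the tool call stays bounded.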

How it works

Raw input → Filter (ads, nav, boilerplate) → Chunk → Dedupe → Summarize (if over limit) → Model
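
The stages above can be sketched as a chain of small functions. This is a simplified illustration under stated assumptions, not the actual implementation: every function name, the boilerplate patterns, and the size limits are hypothetical. Two stages are deliberately swapped for simpler stand-ins: exact-match deduplication on normalized text replaces semantic deduplication, and a plain truncation replaces the model-generated summary unless a `summarize` callable is supplied.

```python
import re
from typing import Callable, List, Optional

# Hypothetical boilerplate patterns; a real filter would be far more thorough.
BOILERPLATE = re.compile(
    r"^(skip to main content|advertisement|cookie settings|subscribe)\b",
    re.IGNORECASE,
)


def filter_lines(raw: str) -> str:
    """Filter stage: drop blank lines and lines that look like ads or navigation."""
    kept = [
        ln for ln in raw.splitlines()
        if ln.strip() and not BOILERPLATE.match(ln.strip())
    ]
    return "\n".join(kept)


def chunk(text: str, max_chars: int = 200) -> List[str]:
    """Chunk stage: group whole lines into chunks of roughly max_chars characters."""
    chunks: List[str] = []
    current = ""
    for line in text.splitlines():
        if current and len(current) + len(line) + 1 > max_chars:
            chunks.append(current)
            current = line
        else:
            current = f"{current}\n{line}" if current else line
    if current:
        chunks.append(current)
    return chunks


def dedupe(chunks: List[str]) -> List[str]:
    """Dedupe stage (stand-in): exact match on case- and whitespace-normalized text."""
    seen = set()
    unique: List[str] = []
    for c in chunks:
        key = " ".join(c.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(c)
    return unique


def compress(
    raw: str,
    limit_chars: int = 1000,
    summarize: Optional[Callable[[str], str]] = None,
) -> str:
    """Run Filter -> Chunk -> Dedupe, then Summarize only if still over the limit."""
    text = "\n\n".join(dedupe(chunk(filter_lines(raw))))
    if len(text) > limit_chars:
        # Stand-in summarizer: truncate. A real one would call a model.
        summarize = summarize or (lambda t: t[:limit_chars])
        text = summarize(text)
    return text
```

Keeping the summarize step last and conditional means the cheaper stages run on every input, and the expensive rewrite only runs when filtering and deduplication were not enough.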

Configuration

Flag | Default | What it does
MAX_SEARCH_RESULTS | 3 | Results kept per search
MAX_SCRAPE_BYTES | 1 MB | Input cap per page
MAX_MEMORY_CHUNKS | 20 | Chunks recalled per query
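
The sketch below shows one way these three flags and their defaults could be modeled. It assumes the flags are supplied as environment variables and that "1 MB" means 1,048,576 bytes; neither detail is confirmed by this page. `CompressionConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CompressionConfig:
    """Defaults taken from the table above; the class itself is illustrative."""
    max_search_results: int = 3
    max_scrape_bytes: int = 1024 * 1024  # assumption: 1 MB read as 1 MiB
    max_memory_chunks: int = 20


def load_config(env: Optional[Mapping[str, str]] = None) -> CompressionConfig:
    """Read overrides from a mapping (os.environ by default), else use defaults."""
    env = os.environ if env is None else env
    defaults = CompressionConfig()
    return CompressionConfig(
        max_search_results=int(env.get("MAX_SEARCH_RESULTS", defaults.max_search_results)),
        max_scrape_bytes=int(env.get("MAX_SCRAPE_BYTES", defaults.max_scrape_bytes)),
        max_memory_chunks=int(env.get("MAX_MEMORY_CHUNKS", defaults.max_memory_chunks)),
    )


if __name__ == "__main__":
    cfg = load_config({"MAX_SEARCH_RESULTS": "5"})
    results = [f"result {i}" for i in range(10)]
    # Apply the cap the same way the table describes: keep the first N results.
    print(results[: cfg.max_search_results])
```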

See also