Local AI (optional)
OpenHuman can run a local model on your machine for workloads where keeping data on-device matters: memory embeddings, summary-tree building, background reasoning loops, and any chat or reasoning workloads you explicitly route to it. Local AI is opt-in and off by default.
What runs locally when you turn it on
| Workload | Default model |
|---|---|
| Memory embeddings | all-minilm:latest |
| Summary-tree building | gemma3:1b-it-qat |
| Heartbeat loop | Small chat model |
| Learning / reflection | Small chat model |
| Subconscious | Small chat model |
What stays in the cloud
| Workload | Why |
|---|---|
| Chat | Frontier-model quality; runs locally only if you explicitly route it |
| Reasoning | Stronger multi-step quality; runs locally only if you explicitly route it |
| Vision | Requires more compute |
| STT / TTS | Backend-proxied |
How it works
OpenHuman supports two local provider paths:
- Ollama - for the bundled model lifecycle and embeddings
- LM Studio - via its local OpenAI-compatible server
For Ollama, OpenHuman talks to its OpenAI-compatible /v1 endpoint. If Ollama is not reachable, requests transparently fall back to the remote provider.
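The fallback described above can be approximated in a few lines. This is a minimal sketch, not OpenHuman's implementation: it assumes Ollama's default address (`http://localhost:11434`), uses Ollama's OpenAI-compatible `/v1/models` route as a reachability probe, and builds an OpenAI-style embeddings request. The `Endpoint` type, the helper names, and the remote URL and model name are hypothetical placeholders.

```python
import http.client
import json
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Endpoint:
    base_url: str         # OpenAI-compatible base URL, ending in /v1
    embedding_model: str  # model name understood by that server


# Ollama listens on port 11434 by default and serves its OpenAI-compatible API under /v1.
LOCAL = Endpoint("http://localhost:11434/v1", "all-minilm:latest")
# Hypothetical remote provider; a real one would also need an API key header.
REMOTE = Endpoint("https://remote.example.com/v1", "remote-embedding-model")


def ollama_reachable(timeout: float = 0.5) -> bool:
    """Probe the local server's /v1/models route; any connection or HTTP error counts as down."""
    try:
        with urllib.request.urlopen(f"{LOCAL.base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    # URLError, HTTPError and timeouts subclass OSError; HTTPException covers malformed replies.
    except (OSError, http.client.HTTPException):
        return False


def pick_endpoint(is_reachable: Callable[[], bool] = ollama_reachable) -> Endpoint:
    """Prefer local Ollama; fall back to the remote provider when it is not reachable."""
    return LOCAL if is_reachable() else REMOTE


def build_embeddings_request(endpoint: Endpoint, text: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style POST /embeddings request."""
    body = json.dumps({"model": endpoint.embedding_model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{endpoint.base_url}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen` and read the vector from `data[0]["embedding"]` in the JSON response, which follows the OpenAI embeddings format. Probing on every call keeps the sketch simple; a real client would more likely cache the reachability result briefly.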
Opting in
In the desktop app, go to Settings → AI & Skills → Local AI and choose a preset:
- "Embeddings only"
- "Memory + reflection"
- "Everything local"
What you'll need
- Ollama or LM Studio installed and running locally
- Enough disk space for the models (~700 MB for gemma3, ~23 MB for all-minilm)
- 8 GB+ RAM recommended, 16 GB+ ideal
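If you use Ollama, you can check that the server is running and that both default models are already pulled before opting in. This is a minimal sketch, assuming Ollama's default address and its native `GET /api/tags` endpoint, which lists locally available models under a `models` array with a `name` field per entry. The helper names are hypothetical, and LM Studio users would check their models in LM Studio instead.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"
# Default local models from the table of workloads that run locally.
REQUIRED_MODELS = ("all-minilm:latest", "gemma3:1b-it-qat")


def missing_models(tags: dict, required: tuple[str, ...] = REQUIRED_MODELS) -> list[str]:
    """Return the required model tags that are absent from an /api/tags response."""
    installed = {m.get("name") for m in tags.get("models", [])}
    return [name for name in required if name not in installed]


def check_ollama(timeout: float = 2.0) -> list[str]:
    """Fetch /api/tags from the local server and report which models still need pulling.

    Raises urllib.error.URLError if Ollama is not running.
    """
    with urllib.request.urlopen(OLLAMA_TAGS_URL, timeout=timeout) as resp:
        return missing_models(json.load(resp))
```

Any model this reports can be fetched with `ollama pull`, for example `ollama pull gemma3:1b-it-qat`.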
See also
- Memory Tree - What local embeddings power
- Model Routing - How workloads are routed
- Privacy & Security - What moves on-device