Local AI (optional)

OpenHuman can run a local model on your machine for workloads where keeping data on-device matters: memory embeddings, summary-tree building, background reasoning loops, and any chat or reasoning requests you explicitly route to it. Local AI is opt-in and off by default.

What runs local when you turn it on

Workload                 Default model
Memory embeddings        all-minilm:latest
Summary-tree building    gemma3:1b-it-qat
Heartbeat loop           Small chat model
Learning / reflection    Small chat model
Subconscious             Small chat model
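
As an illustration of the memory-embeddings workload, here is a minimal sketch of an embeddings request sent to a local Ollama through its OpenAI-compatible API. It assumes Ollama's default address (localhost:11434) and that embeddings go through the /v1/embeddings route. The helper names build_embedding_request and embed are hypothetical and are not part of OpenHuman.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; /v1/embeddings is its
# OpenAI-compatible embeddings route.
OLLAMA_EMBEDDINGS_URL = "http://localhost:11434/v1/embeddings"


def build_embedding_request(
    texts: list[str], model: str = "all-minilm:latest"
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style embeddings request."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_EMBEDDINGS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: list[str]) -> list[list[float]]:
    """Send the request to a running Ollama; return one vector per input text."""
    with urllib.request.urlopen(build_embedding_request(texts), timeout=30) as resp:
        payload = json.load(resp)
    # OpenAI-style responses carry the vectors under data[i]["embedding"].
    return [item["embedding"] for item in payload["data"]]
```

Calling embed(["some note"]) only works while Ollama is running and the model has been pulled. Building the request separately makes the payload easy to inspect without touching the network.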

What stays in the cloud by default

Workload     Why
Chat         Frontier reasoning quality unless configured otherwise
Reasoning    Stronger multi-step quality
Vision       Requires more compute
STT / TTS    Backend-proxied

How it works

OpenHuman supports two local provider paths:

  • Ollama - for bundled model lifecycle and embeddings
  • LM Studio - through its local OpenAI-compatible server

For Ollama, OpenHuman talks to its OpenAI-compatible /v1 endpoint. If Ollama is not reachable, requests transparently fall back to the remote provider.
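
The fallback behaviour described above can be sketched as a reachability probe followed by a choice of base URL. This is an illustrative sketch, not OpenHuman's actual implementation: the local URL assumes Ollama's default port, REMOTE_BASE_URL is a placeholder, and the function names is_reachable and pick_base_url are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable

# Assumption: Ollama's default local address with its OpenAI-compatible prefix.
LOCAL_BASE_URL = "http://localhost:11434/v1"
# Hypothetical placeholder, not a real OpenHuman endpoint.
REMOTE_BASE_URL = "https://remote-provider.example/v1"


def is_reachable(base_url: str, timeout: float = 1.0) -> bool:
    """Return True if GET {base_url}/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or malformed URL.
        return False


def pick_base_url(
    local: str = LOCAL_BASE_URL,
    remote: str = REMOTE_BASE_URL,
    probe: Callable[[str], bool] = is_reachable,
) -> str:
    """Prefer the local endpoint; fall back to the remote one if the probe fails."""
    return local if probe(local) else remote
```

Passing the probe in as a parameter keeps the routing decision testable without a running Ollama, for example pick_base_url(probe=lambda url: False) returns the remote URL.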

Opting in

In the desktop app: Settings → AI & Skills → Local AI

You can choose presets:

  • "Embeddings only"
  • "Memory + reflection"
  • "Everything local"

What you'll need

  • Ollama or LM Studio installed and running locally
  • Enough disk space for the models (~700 MB for gemma3, ~23 MB for all-minilm)
  • 8 GB+ RAM recommended, 16 GB+ ideal

See also