
Voice (STT & TTS)

OpenHuman includes a voice layer: the agent can read text aloud, and you can speak instead of typing.

Speech-to-Text (STT)

  • Captures audio from your microphone on demand
  • Streams the captured audio to the backend for transcription
  • Supports multiple languages
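
As a rough illustration of streaming captured audio, the sketch below splits a buffer of raw audio into fixed-size chunks that could be sent one at a time. It is not OpenHuman's actual implementation: the function name `iter_chunks`, the PCM format, and the default chunk size (3200 bytes, which is 100 ms of 16 kHz, 16-bit mono audio) are all assumptions chosen for the example.

```python
from typing import Iterator


def iter_chunks(pcm: bytes, chunk_size: int = 3200) -> Iterator[bytes]:
    """Yield successive slices of a raw audio buffer.

    Hypothetical helper: the default of 3200 bytes assumes 16 kHz,
    16-bit mono PCM (32000 bytes per second, so 100 ms per chunk).
    The final chunk may be shorter than chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


# One second of silence in the assumed format, split for streaming.
one_second = bytes(32000)
chunks = list(iter_chunks(one_second))
```

Sending small chunks as they are captured, rather than one large recording at the end, is what lets transcription begin while you are still speaking.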

Text-to-Speech (TTS)

  • Streams generated audio directly to your speakers
  • Audio is not stored: it is generated, played, and discarded
  • Supports multiple voices

Voice settings

From Settings → Voice:

  • Microphone - select input device
  • Voice model - choose a voice profile
  • Language - STT language preference
  • Wake word - optional "Hey OpenHuman" activation (default off)
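
To make the four options concrete, here is a minimal sketch of how they could be modeled as a settings object. The class name, field names, and default values are hypothetical and do not reflect OpenHuman's real configuration format; the only default taken from this page is that the wake word is off.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceSettings:
    """Hypothetical model of the options under Settings → Voice."""

    # Input device name; None stands for the system default (assumption).
    microphone: Optional[str] = None
    # Voice profile used for TTS; "default" is a placeholder value.
    voice_model: str = "default"
    # STT language preference; "en" is a placeholder value.
    language: str = "en"
    # "Hey OpenHuman" activation, documented as off by default.
    wake_word_enabled: bool = False


# Example: prefer German transcription and leave everything else alone.
settings = VoiceSettings(language="de")
```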

Privacy note

Audio buffers are processed locally and not written to disk. See Privacy & Security.

See also