Voice Mode

MIT License


# Voice Mode Configuration Reference

Voice Mode can be configured through environment variables to customize its behavior.

## Required Configuration

### OpenAI API Key

```bash
export OPENAI_API_KEY="your-api-key"
```

This is the only required configuration for basic functionality.

## Speech Services Configuration

### Speech-to-Text (STT)

| Variable | Default | Description |
|----------|---------|-------------|
| `STT_BASE_URL` | OpenAI API | Custom STT endpoint (OpenAI-compatible) |
| `STT_MODEL` | `"whisper-1"` | STT model to use |
| `STT_LANGUAGE` | auto-detect | Language code (e.g., "en", "es") |
| `VOICEMODE_STT_AUDIO_FORMAT` | `"mp3"` | Audio format for STT upload |
| `VOICEMODE_WHISPER_MODEL` | `"large-v2"` | Local Whisper model to use (tiny, base, small, medium, large-v2, large-v3) |

### Text-to-Speech (TTS)

| Variable | Default | Description |
|----------|---------|-------------|
| `TTS_BASE_URL` | OpenAI API | Custom TTS endpoint (OpenAI-compatible) |
| `TTS_MODEL` | `"tts-1"` | TTS model to use |
| `TTS_VOICE` | `"alloy"` | Default voice for TTS |
| `TTS_VOICES` | all available | Comma-separated list of allowed voices |
| `TTS_MODELS` | all available | Comma-separated list of allowed models |
| `VOICEMODE_TTS_AUDIO_FORMAT` | `"pcm"` | Audio format for TTS streaming |

### Audio Format Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `VOICEMODE_AUDIO_FORMAT` | `"pcm"` | Default audio format for all operations |
| `VOICEMODE_TTS_AUDIO_FORMAT` | `"pcm"` | Override for TTS only |
| `VOICEMODE_STT_AUDIO_FORMAT` | `"mp3"` | Override for STT upload |
| `VOICEMODE_OPUS_BITRATE` | `"32000"` | Opus codec bitrate |
| `VOICEMODE_MP3_BITRATE` | `"64k"` | MP3 codec bitrate |

Supported formats: `pcm`, `mp3`, `wav`, `flac`, `aac`, `opus`

## LiveKit Configuration

For room-based voice communication:

| Variable | Default | Description |
|----------|---------|-------------|
| `LIVEKIT_URL` | none | LiveKit server URL (e.g., "wss://your-app.livekit.cloud") |
| `LIVEKIT_API_KEY` | none | LiveKit API key |
| `LIVEKIT_API_SECRET` | none | LiveKit API secret |

## Audio Feedback Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `VOICE_MODE_AUDIO_FEEDBACK` | `"true"` | Enable/disable audio feedback sounds |
| `VOICE_MODE_FEEDBACK_STYLE` | `"whisper"` | Feedback style: "whisper" or "shout" |

## Feature Flags

| Variable | Default | Description |
|----------|---------|-------------|
| `VOICE_ALLOW_EMOTIONS` | `"false"` | Enable emotional TTS (requires gpt-4o-mini-tts) |
| `VOICEMODE_DEBUG` | `"false"` | Enable debug logging |
| `VOICEMODE_SAVE_AUDIO` | `"false"` | Save all audio files |
| `VOICEMODE_DISABLE_SILENCE_DETECTION` | `"false"` | Disable automatic silence detection |

## Audio Recording Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `VOICEMODE_SAMPLE_RATE` | `16000` | Audio sample rate in Hz |
| `VOICEMODE_CHANNELS` | `1` | Number of audio channels |
| `VOICEMODE_CHUNK_DURATION` | `0.03` | Audio chunk duration in seconds |

## Silence Detection Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `VOICEMODE_VAD_MODE` | `3` | WebRTC VAD aggressiveness (0-3) |
| `VOICEMODE_SILENCE_THRESHOLD` | `0.01` | Audio level threshold for silence |
| `VOICEMODE_SILENCE_DURATION` | `0.7` | Seconds of silence before stopping |
| `VOICEMODE_MIN_SPEECH_DURATION` | `0.3` | Minimum speech duration in seconds |

## File Locations

| Variable | Default | Description |
|----------|---------|-------------|
| `VOICEMODE_SAVE_DIR` | `~/voicemode_audio/` | Directory for saved audio files |
| `VOICEMODE_LOG_DIR` | `~/.voicemode/logs/` | Directory for log files |

## Provider Registry Configuration

Multiple provider URLs can be configured:

```bash
# Primary providers
export TTS_BASE_URL="http://127.0.0.1:8880/v1"
export STT_BASE_URL="http://127.0.0.1:2022/v1"

# Additional providers (comma-separated)
export TTS_BASE_URLS="http://127.0.0.1:8880/v1,https://api.openai.com/v1"
export STT_BASE_URLS="http://127.0.0.1:2022/v1,https://api.openai.com/v1"
```

The system will automatically failover to working providers.

## Example Configurations

### Local-Only Setup

```bash
export OPENAI_API_KEY="not-needed-for-local"
export STT_BASE_URL="http://127.0.0.1:2022/v1"
export TTS_BASE_URL="http://127.0.0.1:8880/v1"
```

### Local Whisper with Different Model

```bash
export OPENAI_API_KEY="not-needed-for-local"
export STT_BASE_URL="http://127.0.0.1:2022/v1"
export VOICEMODE_WHISPER_MODEL="base.en"  # Use smaller model for faster processing
```

### High-Quality Cloud Setup

```bash
export OPENAI_API_KEY="your-key"
export TTS_MODEL="tts-1-hd"
export TTS_VOICE="nova"
```

### Emotional TTS Setup

```bash
export OPENAI_API_KEY="your-key"
export VOICE_ALLOW_EMOTIONS="true"
export TTS_MODEL="gpt-4o-mini-tts"
```
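
### Pre-Flight Check (Sketch)

Because every setting is a plain environment variable, a typo in a value can go unnoticed until runtime. The snippet below is a minimal sketch of a shell check against two of the documented constraints: the supported audio format list and the 0-3 range for `VOICEMODE_VAD_MODE`. It is not part of Voice Mode; the function names `is_supported_format` and `check_voicemode_env` are hypothetical, and the fallback values are simply the defaults listed in this reference.

```bash
#!/usr/bin/env bash
# Hypothetical helper script; Voice Mode does not ship these functions.

# Return 0 if $1 is one of the documented audio formats.
is_supported_format() {
  case "$1" in
    pcm|mp3|wav|flac|aac|opus) return 0 ;;
    *) return 1 ;;
  esac
}

# Check a few variables against the documented values.
# Prints one line per problem and returns non-zero if any was found.
check_voicemode_env() {
  local status=0
  # Fall back to the documented defaults when a variable is unset.
  local fmt="${VOICEMODE_AUDIO_FORMAT:-pcm}"
  local vad="${VOICEMODE_VAD_MODE:-3}"

  if ! is_supported_format "$fmt"; then
    echo "unsupported VOICEMODE_AUDIO_FORMAT: $fmt"
    status=1
  fi

  case "$vad" in
    0|1|2|3) ;;
    *)
      echo "VOICEMODE_VAD_MODE must be 0-3, got: $vad"
      status=1
      ;;
  esac

  return "$status"
}
```

After sourcing the script, running `check_voicemode_env` in the shell that will launch Voice Mode reports any value outside the documented set. The same pattern could be extended to the other variables, for example `VOICE_MODE_FEEDBACK_STYLE`.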
