Skip to main content
Glama

Voice Mode

by mbailey
configuration.md•7.89 kB
# VoiceMode Configuration Guide VoiceMode provides flexible configuration through environment variables and configuration files, following standard precedence rules while maintaining sensible defaults. *Note: The python package is called `voice-mode` but ## Quick Start VoiceMode works out of the box with minimal configuration: ### Cloud Setup (Easiest) ```bash # Just need an OpenAI API key export OPENAI_API_KEY="your-api-key" ``` ### Local Setup ```bash # Run local services (Whisper + Kokoro) voicemode kokoro start voicemode whisper start # VoiceMode auto-detects them! ``` ### Hybrid Setup (Recommended) ```bash # Use local services with cloud fallback export OPENAI_API_KEY="your-api-key" # Fallback # Local services auto-detected when running ``` ## Configuration System ### Configuration Precedence VoiceMode follows standard configuration precedence (highest to lowest): 1. **Environment variables** - Always win 2. **Project config** - `./voicemode.env` in current directory 3. **User config** - `~/.voicemode/voicemode.env` 4. **Auto-discovered services** - Running local services 5. **Built-in defaults** - Sensible fallbacks ### Configuration Files VoiceMode automatically creates `~/.voicemode/voicemode.env` on first run with basic settings. This file uses shell export format: ```bash # ~/.voicemode/voicemode.env example export OPENAI_API_KEY="sk-..." export VOICEMODE_VOICES="af_sky,nova" export VOICEMODE_DEBUG=false ``` ### MCP Configuration When used as an MCP server, add to your Claude or other MCP client configuration: ```json { "mcpServers": { "voicemode": { "command": "uvx", "args": ["voice-mode"], "env": { "OPENAI_API_KEY": "your-key-here", "VOICEMODE_VOICES": "nova,shimmer" } } } } ``` ## Configuration Reference ### API Keys and Authentication ```bash # OpenAI API Key (for cloud TTS/STT) OPENAI_API_KEY=sk-... # LiveKit credentials (for room-based voice) LIVEKIT_API_KEY=devkey # Default for local dev LIVEKIT_API_SECRET=secret # Default for local dev ``` ### Voice Services #### Text-to-Speech (TTS) ```bash # TTS Service URLs (comma-separated, tried in order) VOICEMODE_TTS_BASE_URLS=http://127.0.0.1:8880/v1,https://api.openai.com/v1 # Voice preferences (comma-separated) # OpenAI: alloy, echo, fable, onyx, nova, shimmer # Kokoro: af_sky, af_sarah, am_adam, bf_emma, etc. VOICEMODE_VOICES=af_sky,nova,alloy # TTS Models (comma-separated) # OpenAI: tts-1, tts-1-hd, gpt-4o-mini-tts VOICEMODE_TTS_MODELS=tts-1-hd,tts-1 # Default TTS voice and model VOICEMODE_TTS_VOICE=nova VOICEMODE_TTS_MODEL=tts-1-hd # Speech speed (0.25 to 4.0) VOICEMODE_TTS_SPEED=1.0 ``` #### Speech-to-Text (STT) ```bash # STT Service URLs VOICEMODE_STT_BASE_URLS=http://127.0.0.1:2022/v1,https://api.openai.com/v1 # Whisper configuration VOICEMODE_WHISPER_MODEL=large-v2 # Model size VOICEMODE_WHISPER_LANGUAGE=auto # Language detection VOICEMODE_WHISPER_PORT=2022 # Server port ``` ### Audio Configuration ```bash # Audio formats VOICEMODE_AUDIO_FORMAT=pcm # Global default VOICEMODE_TTS_AUDIO_FORMAT=pcm # TTS-specific VOICEMODE_STT_AUDIO_FORMAT=mp3 # STT-specific # Supported formats: pcm, opus, mp3, wav, flac, aac # Quality settings VOICEMODE_OPUS_BITRATE=32000 # Opus bitrate (bps) VOICEMODE_MP3_BITRATE=64k # MP3 bitrate VOICEMODE_AAC_BITRATE=64k # AAC bitrate VOICEMODE_SAMPLE_RATE=24000 # Sample rate (Hz) ``` ### Audio Feedback ```bash # Chimes when recording starts/stops VOICEMODE_AUDIO_FEEDBACK=true VOICEMODE_FEEDBACK_STYLE=whisper # or "shout" # Silence around chimes (for Bluetooth) VOICEMODE_CHIME_PRE_DELAY=1.0 # Seconds before VOICEMODE_CHIME_POST_DELAY=0.5 # Seconds after ``` ### Voice Activity Detection ```bash # VAD Aggressiveness (0-3) # 0: Least aggressive (captures more) # 3: Most aggressive (filters more) VOICEMODE_VAD_AGGRESSIVENESS=2 # Silence detection VOICEMODE_SILENCE_THRESHOLD=3.0 # Seconds of silence VOICEMODE_MIN_RECORDING_TIME=0.5 # Minimum recording VOICEMODE_MAX_RECORDING_TIME=120.0 # Maximum recording ``` ### LiveKit Configuration ```bash # Server settings LIVEKIT_URL=ws://127.0.0.1:7880 LIVEKIT_PORT=7880 # Room settings VOICEMODE_LIVEKIT_ROOM_PREFIX=voicemode VOICEMODE_LIVEKIT_AUTO_CREATE=true ``` ### Local Service Paths ```bash # Kokoro TTS VOICEMODE_KOKORO_PORT=8880 VOICEMODE_KOKORO_MODELS_DIR=~/Models/kokoro VOICEMODE_KOKORO_CACHE_DIR=~/.voicemode/cache/kokoro # Service directories VOICEMODE_DATA_DIR=~/.voicemode VOICEMODE_LOG_DIR=~/.voicemode/logs VOICEMODE_CACHE_DIR=~/.voicemode/cache ``` ### Debugging and Logging ```bash # Debug mode (verbose logging, saves all files) VOICEMODE_DEBUG=true # Logging levels VOICEMODE_LOG_LEVEL=info # debug, info, warning, error VOICEMODE_SAVE_ALL=false # Save all audio files VOICEMODE_SAVE_RECORDINGS=false # Save input recordings VOICEMODE_SAVE_TTS=false # Save TTS output # Event logging VOICEMODE_EVENT_LOG=false # Log all events VOICEMODE_CONVERSATION_LOG=false # Log conversations ``` ### Development Settings ```bash # Skip TTS for faster development VOICEMODE_SKIP_TTS=false # Disable specific features VOICEMODE_DISABLE_SILENCE_DETECTION=false VOICEMODE_DISABLE_VAD=false ``` ## Voice Preferences System VoiceMode supports project-specific voice preferences. Create a `.voicemode.env` file in your project root: ```bash # Project-specific voices for a game export VOICEMODE_VOICES="onyx,fable" export VOICEMODE_TTS_SPEED=0.9 ``` This allows different projects to have different voice settings without changing global configuration. ## Service Auto-Discovery VoiceMode automatically discovers running local services: 1. **Whisper STT**: Checks `http://127.0.0.1:2022/v1` 2. **Kokoro TTS**: Checks `http://127.0.0.1:8880/v1` 3. **LiveKit**: Checks `ws://127.0.0.1:7880` No configuration needed when services run on default ports! ## Configuration Philosophy VoiceMode balances MCP compliance with user convenience: - **Host configuration is authoritative** - Environment variables always win - **Reasonable defaults** - Works without any configuration - **Progressive disclosure** - Simple configs for basic use, advanced options available - **File-based convenience** - Edit familiar config files instead of multiple host configs ## Common Configurations ### Privacy-Focused Local Setup ```bash # No cloud services, everything local export VOICEMODE_TTS_BASE_URLS=http://127.0.0.1:8880/v1 export VOICEMODE_STT_BASE_URLS=http://127.0.0.1:2022/v1 export VOICEMODE_VOICES=af_sky ``` ### High-Quality Cloud Setup ```bash # Best quality with OpenAI export OPENAI_API_KEY=sk-... export VOICEMODE_TTS_MODEL=tts-1-hd export VOICEMODE_VOICES=nova,alloy ``` ## Troubleshooting Configuration ### Check Active Configuration ```bash # List all configuration keys voicemode config list # Get specific settings voicemode config get VOICEMODE_TTS_VOICE voicemode config get OPENAI_API_KEY ``` ### Configuration Not Working? 1. **Check precedence**: Environment variables override files 2. **Verify syntax**: Use `export VAR=value` format in files 3. **Check permissions**: Ensure config files are readable 4. **Test services**: Verify local services are running 5. **Enable debug**: Set `VOICEMODE_DEBUG=true` for details ### Reset Configuration ```bash # Backup and recreate default config mv ~/.voicemode/voicemode.env ~/.voicemode/voicemode.env.backup # Edit the configuration file to reset voicemode config edit ``` ## Security Considerations - **Never commit API keys** to version control - **Use environment variables** for sensitive data in production - **Restrict file permissions**: `chmod 600 ~/.voicemode/voicemode.env` - **Rotate keys regularly** if exposed - **Use local services** for sensitive audio data

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mbailey/voicemode'

If you have feedback or need assistance with the MCP directory API, please join our Discord server