Voice Mode

MIT License

troubleshooting.md•6.43 kB

# Voicemode Troubleshooting

## Audio Issues

### User being cut off mid-sentence

**Problem:** Silence detection stops recording too early.

**Solutions:**
1. Increase `listen_duration_min`:
   ```python
   converse("What's on your mind?", listen_duration_min=5.0)
   ```
2. Decrease VAD aggressiveness:
   ```python
   converse("Tell me more", vad_aggressiveness=0)
   ```
3. Disable silence detection entirely:
   ```python
   converse("Please describe in detail", disable_silence_detection=True)
   ```

### Background noise triggering false starts

**Problem:** VAD detects non-speech as speech.

**Solutions:**
1. Increase VAD aggressiveness:
   ```python
   converse("Can you hear me?", vad_aggressiveness=3)
   ```
2. Check environment noise levels
3. Use better quality microphone

### Audio chimes getting cut off

**Problem:** Bluetooth or audio system delays.

**Solutions:**
1. Add leading silence:
   ```python
   converse("Hello", chime_leading_silence=1.0)
   ```
2. Add trailing silence:
   ```python
   converse("Hello", chime_trailing_silence=0.5)
   ```
3. Add both:
   ```python
   converse("Hello", chime_leading_silence=1.0, chime_trailing_silence=0.5)
   ```

### No audio output

**Problem:** TTS not playing.

**Solutions:**
1. Check if `skip_tts` is enabled
2. Verify `VOICEMODE_SKIP_TTS` env var
3. Force TTS:
   ```python
   converse("Test message", skip_tts=False)
   ```
4. Check audio output device settings
5. Verify TTS service endpoint

## Voice Activity Detection (VAD)

### Understanding VAD levels

- **0 (Least aggressive):** Captures everything, including background noise
- **1 (Low):** Slightly stricter, good for quiet environments
- **2 (Balanced - default):** Good for normal home/office
- **3 (Most aggressive):** Strict speech detection, filters most noise

### When to adjust VAD

| Environment | Recommended Setting | Reason |
|-------------|---------------------|--------|
| Silent room | 0-1 | Don't miss soft speech |
| Home office | 2 | Balanced (default) |
| Busy office | 2-3 | Filter typing, conversations |
| Cafe/public | 3 | Filter heavy background noise |
| Outdoors | 3 | Filter wind, traffic |
| Dictation mode | 0-1 + high `listen_duration_min` | Allow thinking pauses |

## Connection Issues

### STT/TTS endpoint errors

**Problem:** Cannot connect to speech services.

**Check:**
1. Services expose OpenAI-compatible endpoints:
   - `/v1/audio/transcriptions` (STT)
   - `/v1/audio/speech` (TTS)
2. Environment variables are set correctly:
   - `OPENAI_API_KEY` (if using OpenAI)
   - Service URLs for Whisper/Kokoro
3. Network connectivity to services
4. Service logs for errors

### Transport issues

**Problem:** LiveKit or local transport failing.

**Solutions:**
1. Try different transport:
   ```python
   # Force local transport
   converse("Test", transport="local")

   # Force LiveKit
   converse("Test", transport="livekit")
   ```
2. Check LiveKit room configuration
3. Verify microphone permissions

### Timeout issues

**Problem:** Operations timing out.

**Solutions:**
1. Increase listen duration:
   ```python
   converse("Please elaborate", listen_duration_max=300)
   ```
2. Check network latency to services
3. Verify services are responding

## Voice Quality Issues

### Incorrect pronunciation

**Problem:** Words mispronounced, especially for non-English.

**Solutions:**
1. For non-English, use Kokoro with appropriate voice:
   ```python
   converse("Bonjour", voice="ff_siwis", tts_provider="kokoro")
   ```
2. See `voicemode-languages` resource for language-specific voices

### Robotic/unnatural voice

**Problem:** Voice sounds too mechanical.

**Solutions:**
1. Try different voice:
   ```python
   converse("Hello", voice="nova")  # OpenAI
   converse("Hello", voice="af_sky", tts_provider="kokoro")
   ```
2. Use HD model for better quality:
   ```python
   converse("Hello", tts_model="tts-1-hd")
   ```
3. Add emotional context:
   ```python
   converse(
       "I'm excited to help!",
       tts_model="gpt-4o-mini-tts",
       tts_instructions="Sound warm and friendly"
   )
   ```

### Speech too fast/slow

**Problem:** Default speed doesn't match user preference.

**Solution:** Adjust speed:
```python
# Slower
converse("Complex information", speed=0.8)

# Faster
converse("Quick update", speed=1.5)
```

## Recognition Issues

### STT not recognizing speech

**Problem:** Speech not being transcribed.

**Solutions:**
1. Check microphone is working
2. Verify microphone permissions
3. Increase recording duration:
   ```python
   converse("What do you think?", listen_duration_min=5.0)
   ```
4. Disable silence detection to see if it's a VAD issue:
   ```python
   converse("Testing", disable_silence_detection=True, listen_duration_max=10)
   ```

### Incorrect transcriptions

**Problem:** Speech transcribed wrong.

**Solutions:**
1. Speak more clearly
2. Reduce background noise
3. Adjust VAD for your environment
4. Use better quality microphone
5. Check STT service configuration

## Performance Issues

### Slow response times

**Problem:** Long delays between speaking and response.

**Causes:**
1. Network latency to STT/TTS services
2. Heavy service load
3. Large audio files

**Solutions:**
1. Use lower quality audio format if possible
2. Check service response times
3. Consider local STT/TTS services
4. Use `skip_tts=True` for development:
   ```python
   converse("Quick test", skip_tts=True)
   ```

## Common Mistakes

### Using coral voice

❌ **Don't:** `voice="coral"` - Not supported

✅ **Do:** Use supported voices (nova, shimmer, af_sky, etc.)
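
One way to catch the coral mistake before a call is made is a small guard around the `voice` argument. The sketch below is hypothetical: `check_voice` and `KNOWN_UNSUPPORTED_VOICES` are illustrative names, not part of Voicemode, and the set contains only the one voice this guide names as unsupported.

```python
from typing import Optional

# Hypothetical helper, not part of Voicemode. The set holds only the voice
# this guide names as unsupported; extend it from the parameter reference.
KNOWN_UNSUPPORTED_VOICES = {"coral"}


def check_voice(voice: Optional[str]) -> Optional[str]:
    """Return the voice unchanged, or raise if it is known to be unsupported.

    None means "let the system auto-select" and is always allowed.
    """
    if voice is None:
        return None
    if voice.strip().lower() in KNOWN_UNSUPPORTED_VOICES:
        raise ValueError(
            f"voice={voice!r} is not supported; try nova, shimmer, or af_sky"
        )
    return voice
```

Passing the result of `check_voice(...)` as the `voice` argument would surface the problem as an early `ValueError` rather than a TTS failure.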
### Not specifying voice for non-English

❌ **Don't:** `converse("Bonjour")`

✅ **Do:** `converse("Bonjour", voice="ff_siwis", tts_provider="kokoro")`

### Setting listen_duration_max too low

❌ **Don't:** `listen_duration_max=5` for complex questions

✅ **Do:** Use default (120) or higher for long responses

### Overriding defaults unnecessarily

❌ **Don't:** Specify `voice`, `tts_provider`, `tts_model` without reason

✅ **Do:** Let system auto-select unless specific need

## Getting More Help

If issues persist:
1. Check service logs for errors
2. Verify environment configuration
3. Test with minimal parameters first
4. Add parameters one at a time to isolate issue

## See Also

- `voicemode-parameters` - Full parameter reference
- `voicemode-patterns` - Best practices
- `voicemode-languages` - Language-specific configuration
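
The isolation procedure under Getting More Help (start minimal, then add one parameter at a time) can be sketched as a small generator. This is an illustration only: `incremental_kwargs` is a hypothetical helper, not part of Voicemode, and the loop prints each parameter set instead of calling `converse`, since how the tool is invoked depends on your client.

```python
from typing import Any, Dict, Iterator


def incremental_kwargs(params: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield parameter sets that grow by one entry at a time.

    The first set is empty (the minimal call). Each later set adds the next
    parameter in insertion order, so the first failing set points at the
    parameter that was just added.
    """
    current: Dict[str, Any] = {}
    yield dict(current)
    for name, value in params.items():
        current[name] = value
        yield dict(current)


# Parameter names and values taken from examples in this guide.
suspect = {"vad_aggressiveness": 3, "listen_duration_min": 5.0, "speed": 0.8}

for kwargs in incremental_kwargs(suspect):
    # Stand-in for a real call with these parameters.
    print(kwargs)
```

Ordering the dictionary from most to least trusted parameters keeps the early steps likely to pass, which narrows the search faster.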
