Skip to main content
Glama
vad-debugging.md4.6 kB
# VAD (Voice Activity Detection) Debugging Guide This guide explains how to debug Voice Activity Detection issues in Voice Mode. ## Enabling VAD Debug Mode To enable detailed VAD debugging output, set the environment variable: ```bash export VOICEMODE_VAD_DEBUG=true ``` This will output detailed information to stderr including: - VAD configuration at startup - Real-time speech detection decisions - State transitions (WAITING_FOR_SPEECH → SPEECH_ACTIVE) - Silence accumulation tracking - Final recording state ## Debug Output Format When VAD_DEBUG is enabled, you'll see output like: ``` [VAD_DEBUG] Starting VAD recording with config: [VAD_DEBUG] max_duration: 120.0s [VAD_DEBUG] min_duration: 2.0s [VAD_DEBUG] effective_min_duration: 2.0s [VAD_DEBUG] VAD aggressiveness: 2 [VAD_DEBUG] Silence threshold: 800ms [VAD_DEBUG] Sample rate: 24000Hz (VAD using 16000Hz) [VAD_DEBUG] Chunk duration: 30ms [VAD_DEBUG] t=0.5s: speech=False, RMS=125, state=WAITING [VAD_DEBUG] t=1.0s: speech=False, RMS=132, state=WAITING [VAD_DEBUG] t=1.5s: speech=True, RMS=1856, state=WAITING [VAD_DEBUG] STATE CHANGE: WAITING_FOR_SPEECH -> SPEECH_ACTIVE at t=1.5s [VAD_DEBUG] t=2.0s: speech=True, RMS=2134, state=ACTIVE [VAD_DEBUG] t=2.5s: speech=False, RMS=145, state=ACTIVE [VAD_DEBUG] Accumulating silence: 100/800ms, t=2.6s [VAD_DEBUG] Accumulating silence: 200/800ms, t=2.7s ... [VAD_DEBUG] Accumulating silence: 800/800ms, t=3.4s [VAD_DEBUG] STOP: silence_duration=800ms >= threshold=800ms [VAD_DEBUG] STOP: recording_duration=3.4s >= min_duration=2.0s [VAD_DEBUG] FINAL STATE: Speech was detected, recording complete ``` ## Common Issues and Solutions ### Issue: Recording Stops Before Speech **Symptom**: Recording ends with "No speech detected" even though you haven't spoken yet. **Debug with**: ```bash export VOICEMODE_VAD_DEBUG=true python scripts/test-vad-enhancement.py ``` **Look for**: Check if the VAD is incorrectly detecting noise as speech early in the recording. **Solutions**: 1. Increase VAD aggressiveness: `export VOICEMODE_VAD_AGGRESSIVENESS=3` 2. Ensure you're in a quiet environment 3. Check microphone sensitivity ### Issue: Recording Doesn't Stop After Speech **Symptom**: Recording continues for the full duration even after you stop speaking. **Debug output to check**: - Are silence periods being detected? Look for "Accumulating silence" messages - Is the silence threshold being reached? **Solutions**: 1. Decrease VAD aggressiveness: `export VOICEMODE_VAD_AGGRESSIVENESS=1` 2. Reduce silence threshold: `export VOICEMODE_SILENCE_THRESHOLD_MS=600` ### Issue: Recording Cuts Off Mid-Speech **Symptom**: Recording stops while you're still speaking. **Debug output to check**: - Look for rapid state changes between speech and silence - Check if min_duration is being respected **Solutions**: 1. Increase min_listen_duration in the converse call 2. Increase silence threshold: `export VOICEMODE_SILENCE_THRESHOLD_MS=1200` ## VAD Configuration Parameters | Parameter | Environment Variable | Default | Description | |-----------|---------------------|---------|-------------| | VAD Aggressiveness | `VOICEMODE_VAD_AGGRESSIVENESS` | 2 | 0-3, higher = more aggressive filtering | | Silence Threshold | `VOICEMODE_SILENCE_THRESHOLD_MS` | 800ms | How long to wait after speech stops | | Min Recording Duration | `VOICEMODE_MIN_RECORDING_DURATION` | 0.5s | Global minimum recording time | | min_listen_duration | Function parameter | 2.0s | Per-call minimum recording time | ## Testing VAD Enhancement Use the included test script to verify VAD behavior: ```bash # Test waiting for speech python scripts/test-vad-enhancement.py # With debug output export VOICEMODE_VAD_DEBUG=true python scripts/test-vad-enhancement.py ``` This script will: 1. Wait indefinitely for you to speak 2. Start recording when speech is detected 3. Stop after silence threshold is reached 4. Report whether speech was detected ## Implementation Details The VAD operates as a state machine with three states: 1. **WAITING_FOR_SPEECH**: - Initial state - No timeout - waits indefinitely - Transitions to SPEECH_ACTIVE when speech detected 2. **SPEECH_ACTIVE**: - Active recording state - Resets silence counter when speech detected - Transitions to SILENCE_AFTER_SPEECH when silence detected 3. **SILENCE_AFTER_SPEECH**: - Accumulates silence duration - Stops recording when: - Silence duration >= threshold AND - Total recording duration >= min_duration This ensures recordings don't cut off before speech begins or end prematurely.

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mbailey/voicemode'

If you have feedback or need assistance with the MCP directory API, please join our Discord server