# Voicemode Parameters Reference

## Core Parameters

### message (required)

**Type:** string

The message to speak to the user.

### wait_for_response

**Type:** boolean (default: true)

Whether to listen for a voice response after speaking.

## Timing Parameters

### listen_duration_max

**Type:** number (default: 120.0 seconds)

Maximum time to listen for a response. The tool handles silence detection well.

**When to override:**
- Silence detection is disabled and you need a specific timeout
- Response will be exceptionally long (>120s)
- Special timing requirements

**Usually:** Let the default and silence detection handle it.

### listen_duration_min

**Type:** number (default: 2.0 seconds)

Minimum recording time before silence detection can stop.

**Use cases:**
- Complex questions: 2-3 seconds
- Open-ended prompts: 3-5 seconds
- Quick responses: 0.5-1 second

### timeout (DEPRECATED)

Use `listen_duration_max` instead. Only applies to LiveKit transport.
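As a point of orientation, here is a minimal sketch of the arguments for one speak-and-listen call. Only the parameter names come from this reference; the example message, the chosen values, and the `converse` tool name in the trailing comment are assumptions for illustration.

```python
# Minimal sketch of arguments for a single speak-and-listen call.
# Only "message" is required; the other fields fall back to the
# defaults described above when omitted.
args = {
    "message": "I've finished the refactor. Should I run the tests now?",
    "wait_for_response": True,   # speak, then listen (default behavior)
    "listen_duration_min": 3.0,  # open-ended question: allow a pause before silence detection can stop
    # "listen_duration_max": 180.0,  # only if an exceptionally long reply is expected
}

# With an MCP client session this might be sent as (names are assumptions):
# result = await session.call_tool("converse", arguments=args)
```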
## Voice & TTS Parameters

### voice

**Type:** string (optional)

Override TTS voice selection.

**When to specify:**
- User explicitly requests a specific voice
- Speaking non-English languages (see the languages resource)

**Examples:**
- OpenAI: nova, shimmer, alloy, echo, fable, onyx
- Kokoro: af_sky, af_sarah, am_adam, ef_dora, etc.

**Important:** Never use the 'coral' voice.

### tts_provider

**Type:** "openai" | "kokoro" (optional)

TTS provider selection.

**When to specify:**
- User explicitly requests a provider
- Failover testing
- Non-English languages (usually kokoro)

**Usually:** Let the system auto-select.

### tts_model

**Type:** string (optional)

TTS model selection.

**Options:**
- `tts-1` - Standard quality (OpenAI)
- `tts-1-hd` - High definition (OpenAI)
- `gpt-4o-mini-tts` - Emotional speech support (OpenAI)

**When to specify:**
- Need HD quality
- Want emotional speech (with tts_instructions)

**Usually:** Let the system auto-select.

### tts_instructions

**Type:** string (optional)

Tone/style instructions for emotional speech.

**Requirements:** Only works with `tts_model="gpt-4o-mini-tts"`

**Examples:**
- "Speak in a cheerful tone"
- "Sound angry"
- "Be extremely sad"
- "Sound urgent and concerned"

**Note:** Uses the OpenAI API and incurs costs (~$0.02/minute)

### speed

**Type:** number (0.25 to 4.0, optional)

Speech playback rate.

**Examples:**
- 0.5 = half speed
- 1.0 = normal speed (default)
- 1.5 = 1.5x speed
- 2.0 = double speed

**Supported by:** Both OpenAI and Kokoro

## Audio & Silence Detection

### disable_silence_detection

**Type:** boolean (default: false)

Disable automatic silence detection.

**When to use:**
- User reports being cut off
- Noisy environments
- Dictation mode where pauses are expected

**Usually:** Leave enabled (false).

### vad_aggressiveness

**Type:** integer 0-3 (optional)

Voice Activity Detection strictness level.

**Levels:**
- `0` - Least aggressive; includes more audio, may include non-speech
- `1` - Slightly stricter filtering
- `2` - Balanced (default); good for most environments
- `3` - Most aggressive; strict detection, may cut off soft speech

**When to adjust:**
- Quiet room: use 0-1 to catch all speech
- Normal home/office: use the default (2)
- Noisy cafe/outdoors: use 3

### chime_leading_silence

**Type:** number (seconds, optional)

Time to add before the audio chime starts.

**Use case:** Bluetooth devices that need an audio buffer (e.g., 1.0 seconds)

**Default:** Uses the VOICEMODE_CHIME_LEADING_SILENCE env var (0.1s)

### chime_trailing_silence

**Type:** number (seconds, optional)

Time to add after the audio chime ends.

**Use case:** Prevent chime cutoff (e.g., 0.5 seconds)

**Default:** Uses the VOICEMODE_CHIME_TRAILING_SILENCE env var (0.2s)

## Audio Format & Feedback

### audio_format

**Type:** string (optional)

Override the audio format.

**Options:** pcm, mp3, wav, flac, aac, opus

**Default:** Uses the VOICEMODE_TTS_AUDIO_FORMAT env var

### chime_enabled

**Type:** boolean | string (optional)

Enable or disable audio feedback chimes.

**Default:** Uses the VOICEMODE_CHIME_ENABLED env var

### skip_tts

**Type:** boolean (optional)

Skip text-to-speech and show text only.

**Values:**
- `true` - Skip TTS; faster, text-only response
- `false` - Always use TTS
- `null` (default) - Follow the VOICEMODE_SKIP_TTS env var

**Use cases:**
- Rapid development iterations
- When voice isn't needed
- Text-only mode

## Transport Parameters

### transport

**Type:** "auto" | "local" | "livekit" (default: "auto")

Transport method selection.

**Options:**
- `auto` - Try LiveKit first, fall back to local
- `local` - Direct microphone access
- `livekit` - Room-based communication

### room_name

**Type:** string (optional)

LiveKit room name.

**Only for:** livekit transport

**Default:** Auto-discovered if empty

## Endpoint Requirements

STT/TTS services must expose OpenAI-compatible endpoints. Whisper/Kokoro must serve:

- `/v1/audio/transcriptions` (STT)
- `/v1/audio/speech` (TTS)

Connection errors will clearly report the attempted endpoints.
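For a rough illustration of the endpoint requirement, the sketch below probes a local service for the two OpenAI-compatible routes. The base URL and port are assumptions; point it at wherever Whisper or Kokoro is actually listening, and read the status codes loosely (a 404 suggests the route is missing, while a validation error for the empty body suggests it is served).

```python
# Rough sketch: probe a local STT/TTS service for the OpenAI-compatible
# routes voicemode expects. The base URL below is an assumption.
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:8880"  # assumed local service address

for path in ("/v1/audio/transcriptions", "/v1/audio/speech"):
    url = BASE_URL + path
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            print(f"{url}: HTTP {response.status}")
    except urllib.error.HTTPError as err:
        # Any HTTP response still tells us whether the route is served.
        print(f"{url}: HTTP {err.code}")
    except OSError as err:
        print(f"{url}: not reachable ({err})")
```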
