Voice Mode

voicemode
docs
.archive

audio-format-migration.md•5.64 KiB

# Audio Format Migration Guide ## Overview Voice Mode now uses **PCM** audio format by default for TTS streaming. This change provides: - **Zero encoding latency** - No compression overhead for real-time streaming - **Best streaming performance** - Direct audio data without conversion - **Maximum compatibility** - Works with all audio systems - **Instant playback** - No decoding required For STT uploads and audio saving, compressed formats like Opus are still available. **Important Note**: While Opus was originally intended for streaming due to its low-latency design, in practice it requires full buffering before playback. PCM is the only format that truly supports progressive streaming for TTS. ## Quick Start For most users, no action is required. Voice Mode will automatically use PCM format for TTS streaming, providing the best real-time performance. ### To Use Compressed Formats If you prefer compressed formats (trading latency for smaller file sizes): ```bash export VOICEMODE_TTS_AUDIO_FORMAT="opus" # or mp3, aac, etc. ``` Or add to your MCP configuration: ```json { "mcpServers": { "voice-mode": { "command": "uvx", "args": ["voice-mode"], "env": { "OPENAI_API_KEY": "your-key", "VOICEMODE_TTS_AUDIO_FORMAT": "opus" } } } } ``` ## Configuration Options ### Basic Configuration ```bash # Set default format for all operations export VOICEMODE_AUDIO_FORMAT="pcm" # Options: pcm, opus, mp3, wav, flac, aac # PCM is default for TTS streaming (best performance) export VOICEMODE_TTS_AUDIO_FORMAT="pcm" ``` ### Advanced Configuration ```bash # Different formats for TTS and STT export VOICEMODE_TTS_AUDIO_FORMAT="pcm" # For text-to-speech (default) export VOICEMODE_STT_AUDIO_FORMAT="opus" # For speech-to-text upload # Quality settings (for compressed formats) export VOICEMODE_OPUS_BITRATE="32000" # Opus bitrate (default: 32kbps) export VOICEMODE_MP3_BITRATE="64k" # MP3 bitrate (default: 64k) export VOICEMODE_AAC_BITRATE="64k" # AAC bitrate (default: 64k) ``` ## Provider Compatibility Voice Mode automatically validates format compatibility with your providers: | Provider | TTS Formats | STT Formats | |----------|-------------|-------------| | OpenAI | opus, mp3, aac, flac, wav, pcm | mp3, opus, wav, flac, m4a, webm | | Kokoro (local) | mp3, wav | N/A | | Whisper.cpp (local) | N/A | wav, mp3, opus, flac, m4a | If you select an unsupported format, Voice Mode will automatically fallback to a compatible format. ## Migration from Existing Setup ### Checking Your Current Setup If you have existing audio files saved with `VOICEMODE_SAVE_AUDIO=true`, they are likely in MP3 or Opus format. You can check: ```bash ls ~/voicemode_audio/ ``` ### Gradual Migration You can run multiple formats side-by-side: 1. Keep existing compressed audio files 2. TTS streaming uses PCM for best performance 3. STT uploads can use compressed formats 4. All formats work seamlessly together ### Converting Existing Files To convert existing MP3 files to Opus (optional): ```bash # Using ffmpeg for file in ~/voicemode_audio/*.mp3; do ffmpeg -i "$file" -c:a libopus -b:a 32k "${file%.mp3}.opus" done ``` ## Troubleshooting ### Issue: "Provider doesn't support format" Voice Mode will automatically fallback to a supported format. You'll see a log message like: ``` Format 'opus' not supported by kokoro, using 'mp3' instead ``` Note: PCM is universally supported for streaming. ### Issue: "Audio playback issues" Some older systems might have issues with Opus playback. Try: 1. Update your audio libraries: ```bash # Ubuntu/Debian sudo apt update && sudo apt install libopus0 libopusfile0 # macOS brew install opus opus-tools ``` 2. Or switch to a compressed format: ```bash export VOICEMODE_TTS_AUDIO_FORMAT="mp3" ``` ### Issue: "Larger file sizes than expected" Opus files might appear larger if saved in an OGG container. The actual audio data is still compressed efficiently. ## Format Comparison | Format | File Size* | Quality | Latency | Best For | |--------|-----------|---------|---------|----------| | PCM | N/A (streaming) | Uncompressed | Zero | TTS streaming (default) | | Opus | Smallest (100KB) | Excellent for voice | High (buffering required) | STT uploads, saving | | MP3 | Medium (500KB) | Good | Low | Wide compatibility | | AAC | Medium (450KB) | Good | Low | Apple ecosystem | | FLAC | Large (2MB) | Lossless | Low | Archival | | WAV | Largest (5MB) | Uncompressed | Zero | Local processing | *Approximate sizes for 1 minute of speech ## Benefits of PCM for Streaming 1. **Zero Latency**: No encoding/decoding overhead 2. **Best Performance**: Direct audio playback 3. **Universal Support**: Works on all systems 4. **Streaming Optimized**: No buffering for format conversion 5. **Real-time Ready**: Perfect for live conversations ## Benefits of Opus for Uploads 1. **Bandwidth Efficiency**: Crucial for cloud API calls 2. **Small File Size**: 50-80% smaller than MP3 3. **Voice Optimized**: Designed for speech 4. **Wide Platform Support**: Works on modern systems 5. **Future-proof**: Active development ## Changing Default Formats To change from PCM streaming to compressed formats: 1. Set environment variables: ```bash # For TTS streaming (consider latency impact) export VOICEMODE_TTS_AUDIO_FORMAT="opus" # For STT uploads (already uses compression by default) export VOICEMODE_STT_AUDIO_FORMAT="mp3" ``` 2. Or update your MCP configuration as shown above 3. Restart your MCP client PCM provides the best streaming performance, but compressed formats are useful for: - Reducing bandwidth usage - Saving audio files - STT uploads to cloud services

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mbailey/voicemode'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

audio-format-migration.md•5.64 KiB