Skip to main content
Glama
audio-format-migration.md5.77 kB
# Audio Format Migration Guide ## Overview Voice Mode now uses **PCM** audio format by default for TTS streaming. This change provides: - **Zero encoding latency** - No compression overhead for real-time streaming - **Best streaming performance** - Direct audio data without conversion - **Maximum compatibility** - Works with all audio systems - **Instant playback** - No decoding required For STT uploads and audio saving, compressed formats like Opus are still available. **Important Note**: While Opus was originally intended for streaming due to its low-latency design, in practice it requires full buffering before playback. PCM is the only format that truly supports progressive streaming for TTS. ## Quick Start For most users, no action is required. Voice Mode will automatically use PCM format for TTS streaming, providing the best real-time performance. ### To Use Compressed Formats If you prefer compressed formats (trading latency for smaller file sizes): ```bash export VOICEMODE_TTS_AUDIO_FORMAT="opus" # or mp3, aac, etc. ``` Or add to your MCP configuration: ```json { "mcpServers": { "voice-mode": { "command": "uvx", "args": ["voice-mode"], "env": { "OPENAI_API_KEY": "your-key", "VOICEMODE_TTS_AUDIO_FORMAT": "opus" } } } } ``` ## Configuration Options ### Basic Configuration ```bash # Set default format for all operations export VOICEMODE_AUDIO_FORMAT="pcm" # Options: pcm, opus, mp3, wav, flac, aac # PCM is default for TTS streaming (best performance) export VOICEMODE_TTS_AUDIO_FORMAT="pcm" ``` ### Advanced Configuration ```bash # Different formats for TTS and STT export VOICEMODE_TTS_AUDIO_FORMAT="pcm" # For text-to-speech (default) export VOICEMODE_STT_AUDIO_FORMAT="opus" # For speech-to-text upload # Quality settings (for compressed formats) export VOICEMODE_OPUS_BITRATE="32000" # Opus bitrate (default: 32kbps) export VOICEMODE_MP3_BITRATE="64k" # MP3 bitrate (default: 64k) export VOICEMODE_AAC_BITRATE="64k" # AAC bitrate (default: 64k) ``` ## Provider Compatibility Voice Mode automatically validates format compatibility with your providers: | Provider | TTS Formats | STT Formats | |----------|-------------|-------------| | OpenAI | opus, mp3, aac, flac, wav, pcm | mp3, opus, wav, flac, m4a, webm | | Kokoro (local) | mp3, wav | N/A | | Whisper.cpp (local) | N/A | wav, mp3, opus, flac, m4a | If you select an unsupported format, Voice Mode will automatically fallback to a compatible format. ## Migration from Existing Setup ### Checking Your Current Setup If you have existing audio files saved with `VOICEMODE_SAVE_AUDIO=true`, they are likely in MP3 or Opus format. You can check: ```bash ls ~/voicemode_audio/ ``` ### Gradual Migration You can run multiple formats side-by-side: 1. Keep existing compressed audio files 2. TTS streaming uses PCM for best performance 3. STT uploads can use compressed formats 4. All formats work seamlessly together ### Converting Existing Files To convert existing MP3 files to Opus (optional): ```bash # Using ffmpeg for file in ~/voicemode_audio/*.mp3; do ffmpeg -i "$file" -c:a libopus -b:a 32k "${file%.mp3}.opus" done ``` ## Troubleshooting ### Issue: "Provider doesn't support format" Voice Mode will automatically fallback to a supported format. You'll see a log message like: ``` Format 'opus' not supported by kokoro, using 'mp3' instead ``` Note: PCM is universally supported for streaming. ### Issue: "Audio playback issues" Some older systems might have issues with Opus playback. Try: 1. Update your audio libraries: ```bash # Ubuntu/Debian sudo apt update && sudo apt install libopus0 libopusfile0 # macOS brew install opus opus-tools ``` 2. Or switch to a compressed format: ```bash export VOICEMODE_TTS_AUDIO_FORMAT="mp3" ``` ### Issue: "Larger file sizes than expected" Opus files might appear larger if saved in an OGG container. The actual audio data is still compressed efficiently. ## Format Comparison | Format | File Size* | Quality | Latency | Best For | |--------|-----------|---------|---------|----------| | PCM | N/A (streaming) | Uncompressed | Zero | TTS streaming (default) | | Opus | Smallest (100KB) | Excellent for voice | High (buffering required) | STT uploads, saving | | MP3 | Medium (500KB) | Good | Low | Wide compatibility | | AAC | Medium (450KB) | Good | Low | Apple ecosystem | | FLAC | Large (2MB) | Lossless | Low | Archival | | WAV | Largest (5MB) | Uncompressed | Zero | Local processing | *Approximate sizes for 1 minute of speech ## Benefits of PCM for Streaming 1. **Zero Latency**: No encoding/decoding overhead 2. **Best Performance**: Direct audio playback 3. **Universal Support**: Works on all systems 4. **Streaming Optimized**: No buffering for format conversion 5. **Real-time Ready**: Perfect for live conversations ## Benefits of Opus for Uploads 1. **Bandwidth Efficiency**: Crucial for cloud API calls 2. **Small File Size**: 50-80% smaller than MP3 3. **Voice Optimized**: Designed for speech 4. **Wide Platform Support**: Works on modern systems 5. **Future-proof**: Active development ## Changing Default Formats To change from PCM streaming to compressed formats: 1. Set environment variables: ```bash # For TTS streaming (consider latency impact) export VOICEMODE_TTS_AUDIO_FORMAT="opus" # For STT uploads (already uses compression by default) export VOICEMODE_STT_AUDIO_FORMAT="mp3" ``` 2. Or update your MCP configuration as shown above 3. Restart your MCP client PCM provides the best streaming performance, but compressed formats are useful for: - Reducing bandwidth usage - Saving audio files - STT uploads to cloud services

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mbailey/voicemode'

If you have feedback or need assistance with the MCP directory API, please join our Discord server