Voice Mode

MIT License


provider-endpoint-lists.md•6.37 kB

# Provider Base URL Lists Specification

## Overview

This specification describes a system for configuring multiple TTS and STT base URLs as comma-separated lists, with automatic discovery, failover, and provider auto-detection.

## Design Principles

1. **OpenAI API Compatibility**: All endpoints must be OpenAI API-compatible
2. **Graceful Degradation**: Skip missing or unreachable endpoints instead of failing
3. **Priority-Based Selection**: Use URLs in the order specified by the user
4. **Transparent to LLM**: The LLM doesn't need to know which provider is being used

## Environment Variables

### Core Configuration (No Backward Compatibility)

```bash
# Comma-separated list of TTS base URLs (tried in order)
VOICEMODE_TTS_BASE_URLS=http://127.0.0.1:8880/v1,https://api.openai.com/v1

# Comma-separated list of STT base URLs (tried in order)
VOICEMODE_STT_BASE_URLS=http://127.0.0.1:2022/v1,https://api.openai.com/v1

# Comma-separated list of preferred TTS voices (tried in order of availability)
VOICEMODE_VOICES=af_sky,nova,alloy

# Comma-separated list of preferred TTS models (optional)
VOICEMODE_TTS_MODELS=tts-1,gpt-4o-mini-tts

# API key for authentication (required)
OPENAI_API_KEY=sk-...
```

## Discovery Process

### On Startup

1. **Iterate through each base URL** in `VOICEMODE_TTS_BASE_URLS` and `VOICEMODE_STT_BASE_URLS`
2. **Health Check**: Verify endpoint is reachable
3. **Model Discovery**: Query `/v1/models` endpoint
4. **Voice Discovery** (TTS only):
   - If URL contains "openai.com" → assume OpenAI voices: `["alloy", "echo", "fable", "nova", "onyx", "shimmer"]`
   - Otherwise → try `/v1/audio/voices` (Kokoro endpoint)
   - If voices endpoint fails but health check passes → assume OpenAI voices
5. **Build Registry**: Store discovered capabilities for runtime use

### Voice Discovery Logic

```python
from typing import List

import httpx
from openai import AsyncOpenAI

OPENAI_VOICES = ["alloy", "echo", "fable", "nova", "onyx", "shimmer"]


async def discover_voices(base_url: str, client: AsyncOpenAI) -> List[str]:
    """Discover available voices for a TTS endpoint."""
    # OpenAI doesn't have a voices endpoint, use known list
    if "openai.com" in base_url:
        return list(OPENAI_VOICES)

    # Try Kokoro-style voices endpoint. The path is relative to the client's
    # base URL, which is assumed to already end in /v1.
    try:
        response = await client.get("/audio/voices", cast_to=httpx.Response)
        return response.json()["voices"]
    except Exception:
        # If endpoint doesn't exist but server is healthy, assume OpenAI voices
        return list(OPENAI_VOICES)
```

## Registry Structure

The registry stores discovered capabilities for each base URL:

```json
{
  "tts": {
    "http://127.0.0.1:8880/v1": {
      "healthy": true,
      "models": ["tts-1"],
      "voices": ["af_sky", "af_sarah", "am_adam", "af_nicole", "am_michael"],
      "last_health_check": "2024-01-20T10:30:00Z",
      "response_time_ms": 45
    },
    "https://api.openai.com/v1": {
      "healthy": true,
      "models": ["tts-1", "tts-1-hd", "gpt-4o-mini-tts"],
      "voices": ["alloy", "echo", "fable", "nova", "onyx", "shimmer"],
      "last_health_check": "2024-01-20T10:30:00Z",
      "response_time_ms": 120
    }
  },
  "stt": {
    "http://127.0.0.1:2022/v1": {
      "healthy": true,
      "models": ["whisper-1"],
      "last_health_check": "2024-01-20T10:30:00Z",
      "response_time_ms": 30
    }
  }
}
```

## Selection Algorithm

When a TTS request is made:

1. **Iterate through healthy endpoints** in the order specified by `VOICEMODE_TTS_BASE_URLS`
2. **Find first endpoint** that supports the requested voice (or first preferred voice)
3. **Use that endpoint** for the request

### Selection Priority

1. User-specified voice/model/provider (if provided)
2. First available voice from `VOICEMODE_VOICES`
3. First available model from `VOICEMODE_TTS_MODELS`
4. First healthy endpoint from `VOICEMODE_TTS_BASE_URLS`

### Example Selection

Given:

```bash
VOICEMODE_TTS_BASE_URLS=http://127.0.0.1:8880/v1,https://api.openai.com/v1
VOICEMODE_VOICES=af_sky,nova,alloy
```

If 127.0.0.1:8880 is healthy and has `af_sky`, use it. Otherwise, check whether OpenAI has `nova` or `alloy`.

## Registry Updates

### When to Update

1. **On startup**: Full discovery of all endpoints
2. **On request failure**: Health check the failed endpoint
3. **Manual refresh**: Via MCP tool/command
4. **No periodic refresh**: Not needed for typical use

### Failure Handling

When a request fails:

1. Mark endpoint as unhealthy in registry
2. Retry with next available endpoint
3. Run health check on failed endpoint
4. Update registry based on health check result

## LLM Integration

The LLM can query the registry to see available options:

```python
from typing import Any, Dict

# `registry` is assumed to be the module-level registry built during discovery.


async def get_voice_registry() -> Dict[str, Any]:
    """Return the current provider registry for LLM inspection."""
    return {
        "tts": {
            url: {
                "healthy": info["healthy"],
                "models": info["models"],
                "voices": info["voices"],
                "response_time_ms": info["response_time_ms"]
            }
            for url, info in registry["tts"].items()
        },
        "stt": {
            url: {
                "healthy": info["healthy"],
                "models": info["models"],
                "response_time_ms": info["response_time_ms"]
            }
            for url, info in registry["stt"].items()
        }
    }
```

## Configuration Examples

### Minimal Configuration

```bash
# Only API key required - defaults to OpenAI
OPENAI_API_KEY=sk-...
```

### Local Development

```bash
VOICEMODE_TTS_BASE_URLS=http://127.0.0.1:8880/v1,https://api.openai.com/v1
VOICEMODE_STT_BASE_URLS=http://127.0.0.1:2022/v1,https://api.openai.com/v1
VOICEMODE_VOICES=af_sky,nova,alloy
OPENAI_API_KEY=sk-...
```

### Production with Fallback

```bash
VOICEMODE_TTS_BASE_URLS=http://tts-prod.internal/v1,http://tts-backup.internal/v1,https://api.openai.com/v1
VOICEMODE_STT_BASE_URLS=http://stt-prod.internal/v1,https://api.openai.com/v1
VOICEMODE_VOICES=nova,alloy,echo
VOICEMODE_TTS_MODELS=gpt-4o-mini-tts,tts-1-hd,tts-1
OPENAI_API_KEY=sk-...
```

## Implementation Notes

1. **Remove all legacy environment variables** (TTS_BASE_URL, STT_BASE_URL, etc.)
2. **No provider-specific code** - everything uses OpenAI API
3. **Graceful fallback** - if primary fails, try next URL
4. **Fast selection** - use pre-discovered registry, no discovery during requests
5. **Simple configuration** - just list URLs and preferences
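
The selection algorithm and failure handling described in this specification could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the helper names `parse_list`, `select_tts_endpoint`, and `request_with_failover`, as well as the `send` callback, are hypothetical. The sketch also assumes that voice preference outranks URL order, which is one reading of the selection priority list, and it omits the follow-up health check after a failure.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def parse_list(value: str) -> List[str]:
    """Split a comma-separated setting, dropping blanks and stray whitespace."""
    return [item.strip() for item in value.split(",") if item.strip()]


def select_tts_endpoint(
    base_urls: List[str],
    preferred_voices: List[str],
    registry: Dict[str, Dict[str, Any]],
    requested_voice: Optional[str] = None,
) -> Optional[Tuple[str, str]]:
    """Return (base_url, voice) for the first healthy endpoint with a wanted voice.

    Assumption: a user-requested voice overrides the preference list, and
    voices are tried in preference order before falling back on URL order.
    """
    wanted = [requested_voice] if requested_voice else preferred_voices
    for voice in wanted:
        for url in base_urls:  # URL order is the user's priority order
            info = registry.get(url)
            if info and info["healthy"] and voice in info["voices"]:
                return url, voice
    return None


def request_with_failover(
    base_urls: List[str],
    preferred_voices: List[str],
    registry: Dict[str, Dict[str, Any]],
    send: Callable[[str, str], Any],
) -> Any:
    """Call send(url, voice), marking failed endpoints unhealthy and retrying.

    `send` is a hypothetical callback that performs the actual TTS request.
    The loop ends because every failure removes one endpoint from selection.
    """
    while True:
        choice = select_tts_endpoint(base_urls, preferred_voices, registry)
        if choice is None:
            raise RuntimeError("no healthy TTS endpoint offers a preferred voice")
        url, voice = choice
        try:
            return send(url, voice)
        except Exception:
            # Step 1 of failure handling: mark unhealthy, then retry.
            registry[url]["healthy"] = False
```

With the TTS portion of the registry from the example above, calling `select_tts_endpoint` with the parsed values of `VOICEMODE_TTS_BASE_URLS` and `VOICEMODE_VOICES` picks the local endpoint while it is healthy and falls through to the OpenAI endpoint once it is marked unhealthy. Because selection reads only the pre-discovered registry, no network discovery happens on the request path.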
