Uses the ONNX runtime to run the Kokoro TTS model, enabling high-quality text-to-speech conversion without requiring an API key.
Speech MCP Server
A Model Context Protocol server that provides text-to-speech capabilities using the Kokoro TTS model.
Configuration
The server can be configured using the following environment variables:
| Variable | Description | Default | Valid Range |
|---|---|---|---|
| MCP_DEFAULT_SPEECH_SPEED | Default speed multiplier for text-to-speech | 1.1 | 0.5 to 2.0 |
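As a rough illustration of how a setting like this could be read and validated, here is a minimal Python sketch. It assumes that a missing, unparsable, or out-of-range value falls back to the documented default; the README does not say how the server actually treats invalid values, and the function name resolve_speech_speed is hypothetical.

```python
import os

DEFAULT_SPEED = 1.1              # documented default
MIN_SPEED, MAX_SPEED = 0.5, 2.0  # documented valid range


def resolve_speech_speed(env=os.environ):
    """Return the speed multiplier taken from MCP_DEFAULT_SPEECH_SPEED.

    Assumption: invalid or out-of-range values fall back to the default.
    """
    raw = env.get("MCP_DEFAULT_SPEECH_SPEED")
    if raw is None:
        return DEFAULT_SPEED
    try:
        speed = float(raw)
    except ValueError:
        return DEFAULT_SPEED
    # NaN and infinity fail this comparison and also fall back.
    if not (MIN_SPEED <= speed <= MAX_SPEED):
        return DEFAULT_SPEED
    return speed
```

Falling back rather than clamping keeps a typo such as 11 from silently becoming the maximum speed; clamping would be an equally reasonable choice.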
In Cursor:
Features
- 🎯 High-quality text-to-speech using the Kokoro TTS model
- 🗣️ Multiple voice options available
- 🎛️ Customizable speech parameters (voice, speed)
- 🔌 MCP-compliant interface
- 📦 Easy installation and setup
- 🚀 No API key required
Installation
Usage
Run the server:
The server provides the following MCP tools:
- text_to_speech: Basic text-to-speech conversion
- text_to_speech_with_options: Text-to-speech with customizable speed
- list_voices: List all available voices
- get_model_status: Check the initialization status of the TTS model
Development
Available Tools
1. text_to_speech
Converts text to speech using the default settings.
2. text_to_speech_with_options
Converts text to speech with customizable parameters.
3. list_voices
Lists all available voices for text-to-speech.
4. get_model_status
Check the current status of the TTS model initialization. This is particularly useful when first starting the server, as the model needs to be downloaded and initialized.
Response example:
Possible status values:
- uninitialized: Model initialization hasn't started
- initializing: Model is being downloaded and initialized
- ready: Model is ready to use
- error: An error occurred during initialization
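A client that starts right after the server may want to wait until the status leaves the uninitialized and initializing states. The sketch below shows one way to poll for that in Python. The get_status argument is a hypothetical callable that invokes the get_model_status tool and returns the status string; how it does so depends on your MCP client and is not shown here.

```python
import time

TERMINAL_STATES = {"ready", "error"}


def wait_for_model(get_status, timeout=120.0, interval=1.0, sleep=time.sleep):
    """Poll get_status() until the model is ready, has failed, or time runs out.

    get_status is a hypothetical callable wrapping the get_model_status tool.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"model still {status!r} after {timeout} s")
        sleep(interval)
```

The timeout and interval defaults are arbitrary; a first run that has to download the model may need a longer timeout.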
Testing
You can test the server using the MCP Inspector or by sending raw JSON messages:
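MCP messages are JSON-RPC 2.0, and tools are invoked with the tools/call method. The Python sketch below builds such requests as single-line JSON strings, which is the shape expected over a stdio transport. The tool names come from this README, but the "text" argument name is an assumption and should be checked against the schema the server reports.

```python
import json


def tool_call_message(request_id, tool_name, arguments=None):
    """Build a JSON-RPC 2.0 tools/call request as a single-line JSON string."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
    }
    return json.dumps(message)


# The "text" argument name is an assumption, not taken from the server's schema.
print(tool_call_message(1, "get_model_status"))
print(tool_call_message(2, "text_to_speech", {"text": "Hello, world"}))
```

A real session begins with the MCP initialize handshake, so these messages only succeed after that exchange has completed.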
Integration with Claude Desktop
To use this server with Claude Desktop, add the following to your Claude Desktop config file (~/Library/Application Support/Claude/claude_desktop_config.json):
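Claude Desktop lists MCP servers under an mcpServers key, with a command, its args, and an optional env map per server. The Python sketch below adds such an entry to a config dictionary and prints the resulting JSON. The server name "speech" and the command and args values are placeholders, not this project's actual launch command; only MCP_DEFAULT_SPEECH_SPEED comes from this README.

```python
import json


def add_server_entry(config, name, command, args, env=None):
    """Return a copy of a Claude Desktop config dict with one more MCP server."""
    servers = dict(config.get("mcpServers", {}))
    entry = {"command": command, "args": list(args)}
    if env:
        entry["env"] = dict(env)
    servers[name] = entry
    return {**config, "mcpServers": servers}


# Placeholder values: substitute the real launch command for this server.
config = add_server_entry(
    {},
    name="speech",
    command="<command>",
    args=["<path-or-package>"],
    env={"MCP_DEFAULT_SPEECH_SPEED": "1.1"},
)
print(json.dumps(config, indent=2))
```

Building the entry on a copy leaves any servers already present in the config untouched.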
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT License - see the LICENSE file for details.
Troubleshooting
Model Initialization Issues
The server automatically attempts to download and initialize the TTS model on startup. If you encounter initialization errors:
- The server will automatically retry up to 3 times, with a cleanup between attempts.
- Use the get_model_status tool to monitor initialization progress and any errors.
- If initialization fails after all retries, try manually removing the model files:
The get_model_status tool now includes retry information in its response.
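The retry behavior described in this section (up to 3 retries, with a cleanup between attempts) can be sketched in Python as follows. This is an illustration, not the server's implementation: initialize and cleanup are hypothetical callables, "up to 3 retries" is read as one initial attempt plus three retries, and the returned dictionary is an invented shape rather than the actual get_model_status response.

```python
def initialize_with_retries(initialize, cleanup, max_retries=3):
    """Call initialize(); on failure run cleanup() and try again.

    Assumption: max_retries counts retries after the first attempt.
    The returned dict is illustrative, not the server's response format.
    """
    retries = 0
    while True:
        try:
            initialize()
            return {"status": "ready", "retries": retries, "error": None}
        except Exception as exc:
            last_error = exc
        if retries >= max_retries:
            return {"status": "error", "retries": retries, "error": str(last_error)}
        cleanup()  # e.g. remove partially downloaded model files
        retries += 1
```

Running the cleanup before each retry means a corrupted partial download cannot cause every subsequent attempt to fail the same way.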