Speech MCP Server

by hammeiam

A Model Context Protocol server that provides text-to-speech capabilities using the Kokoro TTS model.

Configuration

The server can be configured using the following environment variables:

| Variable | Description | Default | Valid Range |
| --- | --- | --- | --- |
| MCP_DEFAULT_SPEECH_SPEED | Default speed multiplier for text-to-speech | 1.1 | 0.5 to 2.0 |
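As an illustration of how the documented default and valid range could be applied to the environment variable, here is a minimal sketch. The helper name `resolveSpeechSpeed` is hypothetical, and falling back to the default on invalid or out-of-range input is an assumption; the server's actual handling of bad values is not documented here.

```typescript
// Hypothetical helper, not taken from the speech-mcp-server source.
const DEFAULT_SPEED = 1.1; // documented default
const MIN_SPEED = 0.5; // documented lower bound
const MAX_SPEED = 2.0; // documented upper bound

function resolveSpeechSpeed(raw: string | undefined): number {
  // Unset or blank variable: use the documented default.
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_SPEED;
  }
  const parsed = Number(raw);
  // Assumption: non-numeric or out-of-range values fall back to the default.
  if (!Number.isFinite(parsed) || parsed < MIN_SPEED || parsed > MAX_SPEED) {
    return DEFAULT_SPEED;
  }
  return parsed;
}
```

In a Node.js process the call site would typically be `resolveSpeechSpeed(process.env.MCP_DEFAULT_SPEECH_SPEED)`.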

In Cursor:

    {
      "mcpServers": {
        "speech": {
          "command": "npx",
          "args": ["-y", "speech-mcp-server"],
          "env": {
            "MCP_DEFAULT_SPEECH_SPEED": "1.3"
          }
        }
      }
    }

Features

  • 🎯 High-quality text-to-speech using Kokoro TTS model
  • 🗣️ Multiple voice options available
  • 🎛️ Customizable speech parameters (voice, speed)
  • 🔌 MCP-compliant interface
  • 📦 Easy installation and setup
  • 🚀 No API key required

Installation

    # Using npm
    npm install speech-mcp-server

    # Using pnpm (recommended)
    pnpm add speech-mcp-server

    # Using yarn
    yarn add speech-mcp-server

Usage

Run the server:

    # Using default configuration
    npm start

    # With custom speech speed
    MCP_DEFAULT_SPEECH_SPEED=1.5 npm start

The server provides the following MCP tools:

  • text_to_speech: Basic text-to-speech conversion
  • text_to_speech_with_options: Text-to-speech with customizable speed
  • list_voices: List all available voices
  • get_model_status: Check the initialization status of the TTS model

Development

    # Clone the repository
    git clone <your-repo-url>
    cd speech-mcp-server

    # Install dependencies
    pnpm install

    # Start development server with auto-reload
    pnpm dev

    # Build the project
    pnpm build

    # Run linting
    pnpm lint

    # Format code
    pnpm format

    # Test with MCP Inspector
    pnpm inspector

Available Tools

1. text_to_speech

Converts text to speech using the default settings.

The voice argument is optional.

    {
      "type": "request",
      "id": "1",
      "method": "call_tool",
      "params": {
        "name": "text_to_speech",
        "arguments": {
          "text": "Hello world",
          "voice": "af_bella"
        }
      }
    }

2. text_to_speech_with_options

Converts text to speech with customizable parameters.

The voice and speed arguments are optional; speed must be between 0.5 and 2.0.

    {
      "type": "request",
      "id": "1",
      "method": "call_tool",
      "params": {
        "name": "text_to_speech_with_options",
        "arguments": {
          "text": "Hello world",
          "voice": "af_bella",
          "speed": 1.0
        }
      }
    }
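A client may want to validate arguments before sending them. The sketch below builds a request in the same envelope this README shows and rejects a speed outside the documented 0.5 to 2.0 range. The names `SpeechOptions`, `ToolCallRequest`, and `buildSpeechRequest` are hypothetical, and client-side validation is an assumption for illustration; clients built on an MCP SDK normally generate the wire format themselves.

```typescript
// Hypothetical types and helper, not part of speech-mcp-server.
interface SpeechOptions {
  text: string;
  voice?: string; // optional, e.g. "af_bella"
  speed?: number; // optional, 0.5 to 2.0
}

interface ToolCallRequest {
  type: "request";
  id: string;
  method: "call_tool";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildSpeechRequest(id: string, options: SpeechOptions): ToolCallRequest {
  if (options.text.trim() === "") {
    throw new Error("text must not be empty");
  }
  const speed = options.speed;
  if (speed !== undefined && (!Number.isFinite(speed) || speed < 0.5 || speed > 2.0)) {
    throw new RangeError("speed must be between 0.5 and 2.0");
  }
  // Only include optional arguments that were actually provided.
  const args: Record<string, unknown> = { text: options.text };
  if (options.voice !== undefined) args["voice"] = options.voice;
  if (speed !== undefined) args["speed"] = speed;
  return {
    type: "request",
    id,
    method: "call_tool",
    params: { name: "text_to_speech_with_options", arguments: args },
  };
}
```

Calling `JSON.stringify(buildSpeechRequest("1", { text: "Hello world", voice: "af_bella", speed: 1.0 }))` yields a single-line message suitable for piping to the server's standard input.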

3. list_voices

Lists all available voices for text-to-speech.

    {
      "type": "request",
      "id": "1",
      "method": "list_voices",
      "params": {}
    }

4. get_model_status

Check the current status of the TTS model initialization. This is particularly useful when first starting the server, as the model needs to be downloaded and initialized.

    {
      "type": "request",
      "id": "1",
      "method": "call_tool",
      "params": {
        "name": "get_model_status",
        "arguments": {}
      }
    }

Response example:

    {
      "content": [
        {
          "type": "text",
          "text": "Model status: initializing (5s elapsed)"
        }
      ]
    }

Possible status values:

  • uninitialized: Model initialization hasn't started
  • initializing: Model is being downloaded and initialized
  • ready: Model is ready to use
  • error: An error occurred during initialization
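Because the status arrives as free text, a client that waits for the model to become ready has to extract the status keyword itself. The following sketch parses the text shown in the response example. The name `parseModelStatus` is hypothetical, and the message format is inferred only from the examples in this README, so treat the regular expression as an assumption rather than a documented contract.

```typescript
// Status values listed in this README.
type ModelStatus = "uninitialized" | "initializing" | "ready" | "error";

const KNOWN_STATUSES: readonly ModelStatus[] = [
  "uninitialized",
  "initializing",
  "ready",
  "error",
];

interface ParsedStatus {
  status: ModelStatus;
  elapsedSeconds?: number; // present only when the text reports elapsed time
}

// Hypothetical parser; the "Model status: <status> (<n>s elapsed...)" shape
// is inferred from the README examples.
function parseModelStatus(text: string): ParsedStatus | null {
  const match = /^Model status:\s*(\w+)(?:\s*\((\d+)s elapsed)?/.exec(text);
  if (match === null) return null;
  const status = KNOWN_STATUSES.find((s) => s === match[1]);
  if (status === undefined) return null; // unknown keyword
  const parsed: ParsedStatus = { status };
  if (match[2] !== undefined) parsed.elapsedSeconds = Number(match[2]);
  return parsed;
}
```

A caller could poll get_model_status at an interval and stop once the parsed status is "ready" or "error".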

Testing

You can test the server using the MCP Inspector or by sending raw JSON messages:

    # List available tools
    echo '{"type":"request","id":"1","method":"list_tools","params":{}}' | node dist/index.js

    # List available voices
    echo '{"type":"request","id":"2","method":"list_voices","params":{}}' | node dist/index.js

    # Convert text to speech
    echo '{"type":"request","id":"3","method":"call_tool","params":{"name":"text_to_speech","arguments":{"text":"Hello world","voice":"af_bella"}}}' | node dist/index.js

Integration with Claude Desktop

To use this server with Claude Desktop, add the following to your Claude Desktop config file (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):

    {
      "mcpServers": {
        "speech": {
          "command": "npx",
          "args": ["@decodershq/speech-mcp-server"]
        }
      }
    }

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see the LICENSE file for details.

Troubleshooting

Model Initialization Issues

The server automatically attempts to download and initialize the TTS model on startup. If you encounter initialization errors:

  1. The server will automatically retry up to 3 times with a cleanup between attempts
  2. Use the get_model_status tool to monitor initialization progress and any errors
  3. If initialization fails after all retries, try manually removing the model files:
     # Remove model files (macOS/Linux)
     rm -rf ~/.npm/_npx/**/node_modules/@huggingface/transformers/.cache/onnx-community/Kokoro-82M-v1.0-ONNX/onnx/model_quantized.onnx
     rm -rf ~/.cache/huggingface/transformers/onnx-community/Kokoro-82M-v1.0-ONNX/onnx/model_quantized.onnx

     # Then restart the server
     npm start

While the server is retrying, the get_model_status response also includes retry information:

    {
      "content": [
        {
          "type": "text",
          "text": "Model status: initializing (5s elapsed, retry 1/3)"
        }
      ]
    }
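
The retry-with-cleanup behavior described above can be pictured with a small generic sketch. This is not the server's implementation: the function name `initializeWithRetry` and its parameters are hypothetical, and it assumes "retry up to 3 times" means one initial attempt followed by at most three retries, with cleanup run before each retry.

```typescript
// Hypothetical sketch of retry-with-cleanup; not taken from the server source.
async function initializeWithRetry<T>(
  initialize: () => Promise<T>, // e.g. download and load the model
  cleanup: () => Promise<void>, // e.g. remove partially downloaded files
  maxRetries = 3,
): Promise<T> {
  let lastError: unknown;
  // Attempt 0 is the initial try; attempts 1..maxRetries are retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await initialize();
    } catch (error) {
      lastError = error;
      // Clean up only if another attempt will follow.
      if (attempt < maxRetries) {
        await cleanup();
      }
    }
  }
  // All attempts failed: surface the most recent error.
  throw lastError;
}
```

With this shape, a caller that exhausts all retries receives the last error, which matches the point at which the manual cleanup steps above become relevant.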