Speech MCP Server

A Model Context Protocol server that provides text-to-speech capabilities using the Kokoro TTS model.

Configuration

The server can be configured using the following environment variables:

Variable	Description	Default	Valid Range
`MCP_DEFAULT_SPEECH_SPEED`	Default speed multiplier for text-to-speech	1.1	0.5 to 2.0
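
The table implies that the variable is parsed and range-checked at startup. A minimal sketch of how that could look, assuming that missing, non-numeric, or out-of-range values fall back to the default (the server may instead clamp or reject them). `resolveSpeechSpeed` is a hypothetical helper, not part of the server's API:

```typescript
// Hypothetical helper: resolve the default speech speed from an environment map.
// Assumption: invalid or out-of-range values fall back to the documented default.
const DEFAULT_SPEED = 1.1;
const MIN_SPEED = 0.5;
const MAX_SPEED = 2.0;

function resolveSpeechSpeed(env: Record<string, string | undefined>): number {
  const raw = env["MCP_DEFAULT_SPEECH_SPEED"];
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_SPEED; // variable not set
  }
  const parsed = Number(raw);
  if (!Number.isFinite(parsed) || parsed < MIN_SPEED || parsed > MAX_SPEED) {
    return DEFAULT_SPEED; // not a number, or outside 0.5 to 2.0
  }
  return parsed;
}
```

In a Node.js process this would be called as `resolveSpeechSpeed(process.env)`.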

In Cursor:

{
  "mcpServers": {
    "speech": {
      "command": "npx",
      "args": [
        "-y",
        "speech-mcp-server"
      ],
      "env": {
        "MCP_DEFAULT_SPEECH_SPEED": "1.3"
      }
    }
  }
}
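
Environment values in this kind of JSON config are strings, so a numeric setting such as the speech speed has to be serialized. As an illustration, a small sketch that generates the entry programmatically. `buildSpeechServerConfig` and `McpServerEntry` are hypothetical names, and the output follows the `mcpServers` shape of the Cursor example:

```typescript
// Hypothetical helper that builds the "speech" entry of an MCP client config.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

function buildSpeechServerConfig(speed?: number): { mcpServers: Record<string, McpServerEntry> } {
  const entry: McpServerEntry = {
    command: "npx",
    args: ["-y", "speech-mcp-server"],
  };
  if (speed !== undefined) {
    // Environment variables are always strings, so the number is serialized.
    entry.env = { MCP_DEFAULT_SPEECH_SPEED: String(speed) };
  }
  return { mcpServers: { speech: entry } };
}

// Print the config as JSON, ready to paste into the client's config file.
console.log(JSON.stringify(buildSpeechServerConfig(1.3), null, 2));
```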

Features

🎯 High-quality text-to-speech using Kokoro TTS model
🗣️ Multiple voice options available
🎛️ Customizable speech parameters (voice, speed)
🔌 MCP-compliant interface
📦 Easy installation and setup
🚀 No API key required

Installation

# Using npm
npm install speech-mcp-server

# Using pnpm (recommended)
pnpm add speech-mcp-server

# Using yarn
yarn add speech-mcp-server

Usage

Run the server:

# Using default configuration
npm start

# With custom speech speed
MCP_DEFAULT_SPEECH_SPEED=1.5 npm start

The server provides the following MCP tools:

text_to_speech: Basic text-to-speech conversion
text_to_speech_with_options: Text-to-speech with customizable speed
list_voices: List all available voices
get_model_status: Check the initialization status of the TTS model

Development

# Clone the repository
git clone <your-repo-url>
cd speech-mcp-server

# Install dependencies
pnpm install

# Start development server with auto-reload
pnpm dev

# Build the project
pnpm build

# Run linting
pnpm lint

# Format code
pnpm format

# Test with MCP Inspector
pnpm inspector

Available Tools

1. text_to_speech

Converts text to speech using the default settings. The voice argument is optional.

{
  "type": "request",
  "id": "1",
  "method": "call_tool",
  "params": {
    "name": "text_to_speech",
    "arguments": {
      "text": "Hello world",
      "voice": "af_bella"
    }
  }
}

2. text_to_speech_with_options

Converts text to speech with customizable parameters. The voice and speed arguments are optional; speed accepts values from 0.5 to 2.0.

{
  "type": "request",
  "id": "1",
  "method": "call_tool",
  "params": {
    "name": "text_to_speech_with_options",
    "arguments": {
      "text": "Hello world",
      "voice": "af_bella",
      "speed": 1.0
    }
  }
}
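
The argument shapes of the two speech tools can be written down as a type. The following is a sketch inferred from the request examples, not the server's actual schema. `TextToSpeechOptions` and `validateOptions` are hypothetical names, and rejecting empty text is an assumption:

```typescript
// Argument shape inferred from the request examples; not the server's actual schema.
interface TextToSpeechOptions {
  text: string;
  voice?: string; // e.g. "af_bella"
  speed?: number; // 0.5 to 2.0
}

// Returns a list of problems; an empty list means the arguments look valid.
function validateOptions(args: TextToSpeechOptions): string[] {
  const problems: string[] = [];
  if (args.text.trim().length === 0) {
    // Assumption: empty text is not meaningful input.
    problems.push("text must not be empty");
  }
  if (
    args.speed !== undefined &&
    (!Number.isFinite(args.speed) || args.speed < 0.5 || args.speed > 2.0)
  ) {
    problems.push("speed must be between 0.5 and 2.0");
  }
  return problems;
}
```

Checking arguments on the client side before sending a request avoids a round trip for values the documented range already rules out.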

3. list_voices

Lists all available voices for text-to-speech.

{
  "type": "request",
  "id": "1",
  "method": "call_tool",
  "params": {
    "name": "list_voices",
    "arguments": {}
  }
}

4. get_model_status

Checks the current status of the TTS model initialization. This is particularly useful when first starting the server, because the model has to be downloaded and initialized before it can be used.

{
  "type": "request",
  "id": "1",
  "method": "call_tool",
  "params": {
    "name": "get_model_status",
    "arguments": {}
  }
}

Response example:

{
  "content": [{
    "type": "text",
    "text": "Model status: initializing (5s elapsed)"
  }]
}

Possible status values:

uninitialized: Model initialization hasn't started
initializing: Model is being downloaded and initialized
ready: Model is ready to use
error: An error occurred during initialization
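
Because the status arrives as free text, a client that wants to act on it has to parse the string. A minimal sketch, assuming the text always follows the pattern of the documented examples (including the optional retry suffix shown under Troubleshooting) and that statuses without timing information appear as a bare `Model status: ready`. `parseModelStatus` is a hypothetical helper:

```typescript
type ModelStatus = "uninitialized" | "initializing" | "ready" | "error";

interface ParsedStatus {
  status: ModelStatus;
  elapsedSeconds?: number;
  retry?: { attempt: number; max: number };
}

const KNOWN_STATUSES: readonly ModelStatus[] = ["uninitialized", "initializing", "ready", "error"];

// Assumption: the text format matches the documented response examples.
function parseModelStatus(text: string): ParsedStatus | null {
  const match = /^Model status: (\w+)(?: \((\d+)s elapsed(?:, retry (\d+)\/(\d+))?\))?/.exec(text);
  if (match === null) return null;
  const status = match[1] as ModelStatus;
  if (!KNOWN_STATUSES.includes(status)) return null; // unknown status word
  const parsed: ParsedStatus = { status };
  if (match[2] !== undefined) parsed.elapsedSeconds = Number(match[2]);
  if (match[3] !== undefined && match[4] !== undefined) {
    parsed.retry = { attempt: Number(match[3]), max: Number(match[4]) };
  }
  return parsed;
}
```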

Testing

You can test the server using the MCP Inspector or by sending raw JSON messages:

# List available tools
echo '{"type":"request","id":"1","method":"list_tools","params":{}}' | node dist/index.js

# List available voices
echo '{"type":"request","id":"2","method":"call_tool","params":{"name":"list_voices","arguments":{}}}' | node dist/index.js

# Convert text to speech
echo '{"type":"request","id":"3","method":"call_tool","params":{"name":"text_to_speech","arguments":{"text":"Hello world","voice":"af_bella"}}}' | node dist/index.js
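
For comparison, the MCP specification frames messages as JSON-RPC 2.0: tools are enumerated with `tools/list` and invoked with `tools/call`, after an `initialize` handshake. Clients built on an MCP SDK produce that framing automatically. Whether this server also accepts the simplified request shape used in the commands above has not been verified here, so the sketch below only shows the specification's framing. `buildToolCall` is a hypothetical helper:

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// Builds a tools/call request as defined by the MCP specification.
function buildToolCall(id: number, name: string, args: Record<string, unknown> = {}): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The stdio transport expects one JSON message per line.
console.log(JSON.stringify(buildToolCall(1, "text_to_speech", { text: "Hello world", voice: "af_bella" })));
```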

Integration with Claude Desktop

To use this server with Claude Desktop, add the following to your Claude Desktop config file (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "speech": {
      "command": "npx",
      "args": ["@decodershq/speech-mcp-server"]
    }
  }
}

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see the LICENSE file for details.

Troubleshooting

Model Initialization Issues

The server automatically attempts to download and initialize the TTS model on startup. If you encounter initialization errors:

The server will automatically retry up to 3 times with a cleanup between attempts
Use the get_model_status tool to monitor initialization progress and any errors
If initialization fails after all retries, try manually removing the model files:

# Remove model files (macOS/Linux)
rm -rf ~/.npm/_npx/**/node_modules/@huggingface/transformers/.cache/onnx-community/Kokoro-82M-v1.0-ONNX/onnx/model_quantized.onnx
rm -rf ~/.cache/huggingface/transformers/onnx-community/Kokoro-82M-v1.0-ONNX/onnx/model_quantized.onnx

# Then restart the server
npm start

While retries are in progress, the get_model_status response includes the retry count:

{
  "content": [{
    "type": "text",
    "text": "Model status: initializing (5s elapsed, retry 1/3)"
  }]
}
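
Monitoring initialization amounts to polling `get_model_status` until it reports `ready` or `error`. A minimal sketch of such a loop: `getStatusText` is a hypothetical callback standing in for whatever client call invokes the tool and returns the text of the first content item, and the interval and attempt limit are arbitrary choices:

```typescript
// Hypothetical polling helper; resolves to true once the model is ready,
// and to false on an error status or when the attempt limit is reached.
async function waitForModelReady(
  getStatusText: () => Promise<string>,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const text = await getStatusText();
    if (text.startsWith("Model status: ready")) return true;
    if (text.startsWith("Model status: error")) return false;
    // Still uninitialized or initializing: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // gave up while the model was still initializing
}
```

Matching on the status prefix keeps the loop independent of the elapsed-time and retry details that follow it.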
