Advanced TTS MCP Server
Utilizes FFmpeg for audio conversion, supporting multiple output formats (WAV, MP3, FLAC, OGG) for the synthesized speech
Offers a Node.js implementation for running the TTS server, with TypeScript support for type-safe interactions
Uses ONNX Runtime for the Kokoro neural voice models, providing high-quality text-to-speech synthesis with multiple voices and emotional expressions
Provides a Python API for interacting with the TTS engine, supporting various speech synthesis operations and batch processing
A high-quality, feature-rich Text-to-Speech MCP server with a native TypeScript implementation. Designed for professional applications that require natural, expressive speech synthesis with advanced controls and zero external dependencies.
Features
Advanced Voice Control
10 High-Quality Voices - Male and female voices with distinct personalities
Emotion Control - Neutral, happy, excited, calm, serious, casual, confident
Dynamic Pacing - Natural, conversational, presentation, tutorial, narrative modes
Speed & Volume - Precise control from 0.25x to 3.0x speed, 0.1x to 2.0x volume
Professional Capabilities
Streaming Audio - Real-time synthesis and playback
Batch Processing - Handle multiple text segments efficiently
Multiple Formats - WAV, MP3, FLAC, OGG output support
Natural Speech Enhancement - Automatic pause insertion and emotion markers
Queue Management - Handle multiple concurrent requests
MCP Integration
6 Powerful Tools - Complete synthesis, batch processing, voice management
2 Rich Resources - Voice capabilities and usage examples
Real-time Status - Track processing progress and manage requests
File Management - Save, list, and organize audio outputs
Quick Start
Option 1: Deploy to Smithery.ai (Recommended)
One-Click Deployment to Smithery Platform
Deploy Now: Visit Smithery.ai and import this repository
Configure: Set your preferred voice and speech settings
Use Instantly: Access via Claude Desktop or any MCP-compatible client
Benefits:
Zero setup required
Automatic scaling and updates
No model downloads needed
Enterprise-grade hosting
Full Smithery Deployment Guide
Option 2: Local Installation
Prerequisites:
Node.js 18+
Installation:
Clone the repository
git clone https://github.com/samihalawa/advanced-tts-mcp.git
cd advanced-tts-mcp
Install dependencies
npm install
Configure Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "advanced-tts": {
      "command": "node",
      "args": ["dist/index.js"],
      "cwd": "/path/to/advanced-tts-mcp"
    }
  }
}
Start using!
# Build TypeScript
npm run build
# Start server
npm start
Restart Claude Desktop and start synthesizing with natural, expressive voices.
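If Claude Desktop already has other servers configured, the entry above can be merged in rather than pasted over the file. The following is a minimal sketch under that assumption; `add_tts_server` is a hypothetical helper, not part of this project, and the location of `claude_desktop_config.json` depends on your platform.

```python
import json
from pathlib import Path


def add_tts_server(config_path: Path, repo_dir: str) -> dict:
    """Merge an "advanced-tts" entry into a Claude Desktop config file.

    Hypothetical helper: the entry mirrors the JSON shown above.
    """
    # Start from the existing file so other configured servers are preserved.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["advanced-tts"] = {
        "command": "node",
        "args": ["dist/index.js"],
        "cwd": repo_dir,
    }
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

Calling `add_tts_server(Path("claude_desktop_config.json"), "/path/to/advanced-tts-mcp")` writes the merged file and returns the resulting dictionary.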
Available Voices
Voice ID | Name | Gender | Description |
af_heart | Heart | Female | Warm, friendly voice (default) |
| Sky | Female | Clear, bright voice |
| Bella | Female | Elegant, sophisticated voice |
af_sarah | Sarah | Female | Professional, confident voice |
| Nicole | Female | Gentle, soothing voice |
am_adam | Adam | Male | Strong, authoritative voice |
| Michael | Male | Friendly, approachable voice |
| Emma | Female | Young, energetic voice |
| Isabella | Female | Mature, expressive voice |
| Lewis | Male | Deep, resonant voice |
Usage Examples
Basic Synthesis
# Simple text-to-speech
await synthesize_speech(
    text="Hello! Welcome to Advanced TTS.",
    voice_id="af_heart"
)
Emotional Expression
# Excited announcement
await synthesize_speech(
    text="This is amazing news! You're going to love this new feature!",
    voice_id="af_heart",
    emotion="excited",
    pacing="conversational",
    speed=1.1
)
Professional Presentation
# Tutorial narration
await synthesize_speech(
    text="Step one: Open your browser. Step two: Navigate to the website.",
    voice_id="am_adam",
    emotion="calm",
    pacing="tutorial",
    speed=0.9
)
Batch Processing
# Multiple segments with pauses
await batch_synthesize(
    segments=[
        "Welcome to our presentation.",
        "Today we'll cover three main topics.",
        "Let's begin with the first topic."
    ],
    voice_id="af_sarah",
    emotion="confident",
    pacing="presentation",
    merge_output=True,
    segment_pause=1.0,
    save_file=True
)
Available Tools
synthesize_speech
Convert text to natural speech with full control over voice characteristics.
Parameters:
text - Text to synthesize (max 10,000 chars)
voice_id - Voice selection (see table above)
speed - Speech rate (0.25-3.0)
emotion - Voice emotion (neutral, happy, excited, calm, serious, casual, confident)
pacing - Speech style (natural, conversational, presentation, tutorial, narrative, fast, slow)
volume - Audio volume (0.1-2.0)
output_format - File format (wav, mp3, flac, ogg)
save_file - Save to file (boolean)
filename - Custom filename
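The limits listed above can be checked on the client side before a request is sent. This is a minimal sketch, not the server's own validation: `SynthesisRequest` is a hypothetical class, and every default except the documented `af_heart` voice is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values copied from the parameter list above.
EMOTIONS = {"neutral", "happy", "excited", "calm", "serious", "casual", "confident"}
PACING_STYLES = {"natural", "conversational", "presentation", "tutorial",
                 "narrative", "fast", "slow"}
OUTPUT_FORMATS = {"wav", "mp3", "flac", "ogg"}
MAX_TEXT_CHARS = 10_000


@dataclass
class SynthesisRequest:
    """Hypothetical container for synthesize_speech arguments."""

    text: str
    voice_id: str = "af_heart"       # documented default voice
    speed: float = 1.0               # assumed default
    emotion: str = "neutral"         # assumed default
    pacing: str = "natural"          # assumed default
    volume: float = 1.0              # assumed default
    output_format: str = "wav"       # assumed default
    save_file: bool = False          # assumed default
    filename: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges and option lists.
        if not self.text or len(self.text) > MAX_TEXT_CHARS:
            raise ValueError(f"text must be 1-{MAX_TEXT_CHARS} characters")
        if not 0.25 <= self.speed <= 3.0:
            raise ValueError("speed must be between 0.25 and 3.0")
        if not 0.1 <= self.volume <= 2.0:
            raise ValueError("volume must be between 0.1 and 2.0")
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {self.emotion!r}")
        if self.pacing not in PACING_STYLES:
            raise ValueError(f"unknown pacing: {self.pacing!r}")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unknown output_format: {self.output_format!r}")
```

For example, `SynthesisRequest(text="Hello", emotion="excited", speed=1.1)` passes validation, while `speed=3.5` raises `ValueError`.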
batch_synthesize
Process multiple text segments efficiently with optional merging.
Parameters:
segments - List of text segments
merge_output - Combine into single file
segment_pause - Pause between segments (0.0-5.0s)
All synthesis parameters from above
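To illustrate what merge_output and segment_pause imply, here is a minimal sketch that joins WAV segments with silence in between, using only Python's standard `wave` module. It is not the server's implementation: `merge_wav_segments` is a hypothetical function, and it assumes all segments are PCM WAV data with the same channel count, sample width (16-bit or wider), and sample rate.

```python
import io
import wave


def merge_wav_segments(segments: list, segment_pause: float = 1.0) -> bytes:
    """Concatenate WAV byte strings, inserting silence between them."""
    if not segments:
        raise ValueError("segments must not be empty")
    if not 0.0 <= segment_pause <= 5.0:
        raise ValueError("segment_pause must be between 0.0 and 5.0 seconds")

    out = io.BytesIO()
    params = None
    with wave.open(out, "wb") as writer:
        for index, data in enumerate(segments):
            with wave.open(io.BytesIO(data), "rb") as reader:
                fmt = (reader.getnchannels(), reader.getsampwidth(),
                       reader.getframerate())
                if params is None:
                    # The first segment decides the output format.
                    params = fmt
                    writer.setnchannels(fmt[0])
                    writer.setsampwidth(fmt[1])
                    writer.setframerate(fmt[2])
                elif fmt != params:
                    raise ValueError("all segments must share one audio format")
                if index > 0:
                    # Zero bytes are silence for signed PCM (16-bit or wider).
                    silence_frames = int(segment_pause * fmt[2])
                    writer.writeframes(b"\x00" * silence_frames * fmt[0] * fmt[1])
                writer.writeframes(reader.readframes(reader.getnframes()))
    return out.getvalue()
```

The merged result is itself a WAV byte string, so it can be written to disk or passed on for format conversion.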
get_voices
Retrieve complete voice information and capabilities.
get_status
Check processing status for synthesis requests.
cancel_request
Cancel active synthesis operations.
list_output_files
Browse saved audio files with metadata.
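Under the hood, an MCP client invokes any of these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that message. `build_tool_call` is a hypothetical helper, and a real client (usually an MCP SDK) must also complete the initialization handshake and handle the transport.

```python
import json
from itertools import count

# Simple incrementing request ids; any unique id would do.
_request_ids = count(1)


def build_tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Example: the arguments reuse parameter names from the lists above.
request = build_tool_call(
    "synthesize_speech",
    {"text": "Hello! Welcome to Advanced TTS.", "voice_id": "af_heart"},
)
```

The same helper works for the other tools, for example `build_tool_call("get_voices", {})`.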
Voice Controls
Emotions
Neutral - Standard, professional tone
Happy - Upbeat, cheerful expression
Excited - Enthusiastic, energetic delivery
Calm - Relaxed, soothing tone
Serious - Formal, authoritative delivery
Casual - Relaxed, conversational style
Confident - Assured, professional tone
Pacing Styles
Natural - Balanced, human-like rhythm
Conversational - Casual discussion pace
Presentation - Professional speaking rhythm
Tutorial - Educational, clear delivery
Narrative - Storytelling pace
Fast - Quick delivery (1.2x base speed)
Slow - Deliberate delivery (0.8x base speed)
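The Fast and Slow multipliers raise the question of how pacing combines with the speed parameter. One plausible reading, sketched below, multiplies the two and clamps the result to the documented 0.25-3.0 range. Both the multiplication and the clamping are assumptions, as is treating every other pacing style as 1.0x; `effective_speed` is a hypothetical function.

```python
# Multipliers stated in the list above; other styles are assumed to be 1.0x.
PACING_MULTIPLIERS = {"fast": 1.2, "slow": 0.8}
MIN_SPEED, MAX_SPEED = 0.25, 3.0


def effective_speed(speed: float = 1.0, pacing: str = "natural") -> float:
    """Combine speed with the pacing multiplier and clamp to the valid range."""
    multiplier = PACING_MULTIPLIERS.get(pacing, 1.0)
    return max(MIN_SPEED, min(MAX_SPEED, speed * multiplier))
```

Under these assumptions, a request with `speed=3.0` and `pacing="fast"` stays at the 3.0 ceiling instead of exceeding it.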
Audio Formats
Format | Quality | Use Case |
WAV | Uncompressed | Highest quality, editing |
MP3 | Compressed | Web, streaming, sharing |
FLAC | Lossless | Archival, high-quality storage |
OGG | Compressed | Open source alternative |
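If you need to convert a saved WAV file to one of the other formats yourself, FFmpeg can do it from the command line. The sketch below builds such a command. It is an illustration, not how the server invokes FFmpeg: the helper names are hypothetical, and the chosen encoders (`libmp3lame`, `libvorbis`) are assumptions that depend on how your FFmpeg was built.

```python
import shutil
import subprocess

# Assumed encoder per output format; availability depends on the FFmpeg build.
CODEC_ARGS = {
    "wav": ["-c:a", "pcm_s16le"],
    "mp3": ["-c:a", "libmp3lame"],
    "flac": ["-c:a", "flac"],
    "ogg": ["-c:a", "libvorbis"],
}


def build_ffmpeg_command(source: str, target: str, output_format: str) -> list:
    """Return the argument list for converting source into target."""
    if output_format not in CODEC_ARGS:
        raise ValueError(f"unsupported format: {output_format!r}")
    # -y overwrites an existing target file without prompting.
    return ["ffmpeg", "-y", "-i", source, *CODEC_ARGS[output_format], target]


def convert(source: str, target: str, output_format: str) -> None:
    """Run the conversion, failing early if ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_command(source, target, output_format), check=True)
```

Keeping command construction separate from execution makes the mapping easy to inspect or adjust before anything is run.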
Configuration
Environment Variables
# Model paths (optional)
KOKORO_MODEL_PATH=./kokoro-v1.0.onnx
KOKORO_VOICES_PATH=./voices-v1.0.bin
# Output settings
TTS_OUTPUT_DIR=./audio_output
TTS_MAX_QUEUE_SIZE=100
# Audio settings
TTS_DEFAULT_VOICE=af_heart
TTS_ENABLE_STREAMING=true
Server Configuration
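The environment variables listed above can be gathered into one settings object. The following is a minimal sketch, not the project's own loader: `TTSSettings` and `load_settings` are hypothetical names, and using the example values as fallbacks assumes they are the actual defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TTSSettings:
    """Hypothetical settings object mirroring the variables above."""

    model_path: str
    voices_path: str
    output_dir: str
    max_queue_size: int
    default_voice: str
    enable_streaming: bool


def load_settings(env: Mapping[str, str] = os.environ) -> TTSSettings:
    """Read settings from env, falling back to the example values."""
    streaming = env.get("TTS_ENABLE_STREAMING", "true").strip().lower()
    return TTSSettings(
        model_path=env.get("KOKORO_MODEL_PATH", "./kokoro-v1.0.onnx"),
        voices_path=env.get("KOKORO_VOICES_PATH", "./voices-v1.0.bin"),
        output_dir=env.get("TTS_OUTPUT_DIR", "./audio_output"),
        max_queue_size=int(env.get("TTS_MAX_QUEUE_SIZE", "100")),
        default_voice=env.get("TTS_DEFAULT_VOICE", "af_heart"),
        # Accept a few common spellings of "true"; anything else is False.
        enable_streaming=streaming in {"1", "true", "yes"},
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise, for example `load_settings({"TTS_MAX_QUEUE_SIZE": "5"})`.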
config = ServerConfig(
    model_path="./kokoro-v1.0.onnx",
    voices_path="./voices-v1.0.bin",
    output_dir="./audio_output",
    max_queue_size=100,
    enable_streaming=True,
    default_voice="af_heart"
)
Architecture
├── src/advanced_tts/
│   ├── __init__.py      # Package initialization
│   ├── server.py        # MCP server implementation
│   ├── engine.py        # Kokoro TTS engine wrapper
│   ├── models.py        # Data models and validation
│   └── utils.py         # Utility functions
├── pyproject.toml       # Project configuration
├── README.md            # Documentation
└── LICENSE              # MIT License
Contributing
Contributions welcome! Areas for improvement:
Additional voice models
Real-time streaming synthesis
Advanced audio effects
Multi-language support
Performance optimizations
License
MIT License - see LICENSE for details.
Acknowledgments
Kokoro TTS - High-quality neural voice synthesis
MCP Protocol - Seamless AI model integration
FastMCP - Efficient server framework
Developed by Sami Halawa
Transform your text into natural, expressive speech with Advanced TTS MCP Server.