Utilizes FFmpeg for audio conversion, supporting multiple output formats (WAV, MP3, FLAC, OGG) for the synthesized speech
Offers Node.js implementation for running the TTS server, with TypeScript support for type-safe interactions
Uses ONNX runtime for the Kokoro neural voice models, providing high-quality text-to-speech synthesis with multiple voices and emotional expressions
Provides Python API for interacting with the TTS engine, supporting various speech synthesis operations and batch processing
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Advanced TTS MCP Server read this welcome message in a friendly female voice with excited emotion".
That's it! The server will respond to your query, and you can continue using it as needed.
Advanced TTS MCP Server
A high-quality, feature-rich Text-to-Speech MCP server with native TypeScript implementation. Designed for professional applications requiring natural, expressive speech synthesis with advanced controls and zero external dependencies.
Features
Advanced Voice Control
10 High-Quality Voices - Male and female voices with distinct personalities
Emotion Control - Neutral, happy, excited, calm, serious, casual, confident
Dynamic Pacing - Natural, conversational, presentation, tutorial, narrative modes
Speed & Volume - Precise control from 0.25x to 3.0x speed, 0.1x to 2.0x volume
Professional Capabilities
Streaming Audio - Real-time synthesis and playback
Batch Processing - Handle multiple text segments efficiently
Multiple Formats - WAV, MP3, FLAC, OGG output support
Natural Speech Enhancement - Automatic pause insertion and emotion markers
Queue Management - Handle multiple concurrent requests
MCP Integration
6 Powerful Tools - Complete synthesis, batch processing, voice management
2 Rich Resources - Voice capabilities and usage examples
Real-time Status - Track processing progress and manage requests
File Management - Save, list, and organize audio outputs
Quick Start
Option 1: Deploy to Smithery.ai (Recommended)
One-Click Deployment to Smithery Platform
Deploy Now: Visit Smithery.ai and import this repository
Configure: Set your preferred voice and speech settings
Use Instantly: Access via Claude Desktop or any MCP-compatible client
Benefits:
Zero setup required
Automatic scaling and updates
No model downloads needed
Enterprise-grade hosting
Full Smithery Deployment Guide
Option 2: Local Installation
Prerequisites:
Node.js 18+
Installation:
Clone the repository
git clone https://github.com/samihalawa/advanced-tts-mcp.git
cd advanced-tts-mcp
Install dependencies
npm install
Configure Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "advanced-tts": {
      "command": "node",
      "args": ["dist/index.js"],
      "cwd": "/path/to/advanced-tts-mcp"
    }
  }
}
Start using!
# Build TypeScript
npm run build
# Start server
npm start
Restart Claude Desktop and start synthesizing with natural, expressive voices.
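If claude_desktop_config.json already lists other servers, the advanced-tts entry from the Configure Claude Desktop step has to be merged into it rather than pasted over the file. The following is a minimal Python sketch of such a merge, assuming the "mcpServers" layout shown in that step. The helper name add_server_entry is hypothetical and not part of this project.

```python
import json
from pathlib import Path


def add_server_entry(config_path: Path, repo_dir: str) -> dict:
    """Merge the advanced-tts entry into a Claude Desktop config (hypothetical helper)."""
    # Start from the existing config if the file is present, otherwise from an empty one.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Same entry as in the configuration step; repo_dir stands in for /path/to/advanced-tts-mcp.
    servers["advanced-tts"] = {
        "command": "node",
        "args": ["dist/index.js"],
        "cwd": repo_dir,
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Existing entries under "mcpServers" are left untouched; only the "advanced-tts" key is added or overwritten.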
Available Voices
| Voice ID | Name | Gender | Description |
| af_heart | Heart | Female | Warm, friendly voice (default) |
| | Sky | Female | Clear, bright voice |
| | Bella | Female | Elegant, sophisticated voice |
| af_sarah | Sarah | Female | Professional, confident voice |
| | Nicole | Female | Gentle, soothing voice |
| am_adam | Adam | Male | Strong, authoritative voice |
| | Michael | Male | Friendly, approachable voice |
| | Emma | Female | Young, energetic voice |
| | Isabella | Female | Mature, expressive voice |
| | Lewis | Male | Deep, resonant voice |
Usage Examples
Basic Synthesis
# Simple text-to-speech
await synthesize_speech(
    text="Hello! Welcome to Advanced TTS.",
    voice_id="af_heart"
)
Emotional Expression
# Excited announcement
await synthesize_speech(
    text="This is amazing news! You're going to love this new feature!",
    voice_id="af_heart",
    emotion="excited",
    pacing="conversational",
    speed=1.1
)
Professional Presentation
# Tutorial narration
await synthesize_speech(
    text="Step one: Open your browser. Step two: Navigate to the website.",
    voice_id="am_adam",
    emotion="calm",
    pacing="tutorial",
    speed=0.9
)
Batch Processing
# Multiple segments with pauses
await batch_synthesize(
    segments=[
        "Welcome to our presentation.",
        "Today we'll cover three main topics.",
        "Let's begin with the first topic."
    ],
    voice_id="af_sarah",
    emotion="confident",
    pacing="presentation",
    merge_output=True,
    segment_pause=1.0,
    save_file=True
)
Available Tools
synthesize_speech
Convert text to natural speech with full control over voice characteristics.
Parameters:
text - Text to synthesize (max 10,000 chars)
voice_id - Voice selection (see table above)
speed - Speech rate (0.25-3.0)
emotion - Voice emotion (neutral, happy, excited, calm, serious, casual, confident)
pacing - Speech style (natural, conversational, presentation, tutorial, narrative, fast, slow)
volume - Audio volume (0.1-2.0)
output_format - File format (wav, mp3, flac, ogg)
save_file - Save to file (boolean)
filename - Custom filename
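The ranges and allowed values in the synthesize_speech parameter list can be checked on the client side before a request is sent. The sketch below is one way to do that in Python. The SynthesisRequest class and its validate method are hypothetical, and every default except the af_heart voice is an assumption, since the parameter list does not state defaults.

```python
from dataclasses import dataclass
from typing import List

# Allowed values copied from the synthesize_speech parameter list.
EMOTIONS = {"neutral", "happy", "excited", "calm", "serious", "casual", "confident"}
PACING_STYLES = {"natural", "conversational", "presentation", "tutorial",
                 "narrative", "fast", "slow"}
OUTPUT_FORMATS = {"wav", "mp3", "flac", "ogg"}
MAX_TEXT_CHARS = 10_000


@dataclass
class SynthesisRequest:
    """Hypothetical client-side mirror of the synthesize_speech parameters."""
    text: str
    voice_id: str = "af_heart"   # documented default voice
    speed: float = 1.0           # assumed default
    emotion: str = "neutral"     # assumed default
    pacing: str = "natural"      # assumed default
    volume: float = 1.0          # assumed default
    output_format: str = "wav"   # assumed default

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the request looks valid."""
        errors = []
        if not 0 < len(self.text) <= MAX_TEXT_CHARS:
            errors.append(f"text must be 1-{MAX_TEXT_CHARS} characters")
        if not 0.25 <= self.speed <= 3.0:
            errors.append("speed must be between 0.25 and 3.0")
        if not 0.1 <= self.volume <= 2.0:
            errors.append("volume must be between 0.1 and 2.0")
        if self.emotion not in EMOTIONS:
            errors.append(f"unknown emotion: {self.emotion}")
        if self.pacing not in PACING_STYLES:
            errors.append(f"unknown pacing: {self.pacing}")
        if self.output_format not in OUTPUT_FORMATS:
            errors.append(f"unknown output_format: {self.output_format}")
        return errors
```

Returning a list of messages instead of raising on the first problem lets a caller report every invalid field at once.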
batch_synthesize
Process multiple text segments efficiently with optional merging.
Parameters:
segments - List of text segments
merge_output - Combine into single file
segment_pause - Pause between segments (0.0-5.0s)
All synthesis parameters from above
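When merge_output is enabled, segment_pause adds silence to the merged file. The sketch below estimates how much, assuming a pause is inserted only between consecutive segments and not before the first or after the last one. That assumption and the helper name total_pause_seconds are illustrative and not taken from the server code.

```python
from typing import Sequence


def total_pause_seconds(segments: Sequence[str], segment_pause: float) -> float:
    """Estimate the silence added when segments are merged (hypothetical helper)."""
    if not 0.0 <= segment_pause <= 5.0:
        raise ValueError("segment_pause must be between 0.0 and 5.0 seconds")
    # Pauses sit between segments, so n segments have n - 1 gaps.
    gaps = max(len(segments) - 1, 0)
    return gaps * segment_pause
```

For the three segments and segment_pause=1.0 used in the Batch Processing example, this gives 2.0 seconds of added silence.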
get_voices
Retrieve complete voice information and capabilities.
get_status
Check processing status for synthesis requests.
cancel_request
Cancel active synthesis operations.
list_output_files
Browse saved audio files with metadata.
Voice Controls
Emotions
Neutral - Standard, professional tone
Happy - Upbeat, cheerful expression
Excited - Enthusiastic, energetic delivery
Calm - Relaxed, soothing tone
Serious - Formal, authoritative delivery
Casual - Relaxed, conversational style
Confident - Assured, professional tone
Pacing Styles
Natural - Balanced, human-like rhythm
Conversational - Casual discussion pace
Presentation - Professional speaking rhythm
Tutorial - Educational, clear delivery
Narrative - Storytelling pace
Fast - Quick delivery (1.2x base speed)
Slow - Deliberate delivery (0.8x base speed)
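The Fast and Slow styles are described as multiples of the base speed. One plausible reading is that the pacing multiplier combines with the speed parameter by multiplication. The sketch below illustrates that reading. The multiplication, the 1.0 multiplier for the other styles, and the clamping to the 0.25-3.0 speed range are all assumptions and are not confirmed by the server code.

```python
# Multipliers stated for the Fast and Slow pacing styles; other styles are assumed to be 1.0.
PACING_MULTIPLIER = {"fast": 1.2, "slow": 0.8}


def effective_speed(speed: float, pacing: str = "natural") -> float:
    """Combine speed and pacing into one rate, clamped to 0.25-3.0 (illustrative only)."""
    rate = speed * PACING_MULTIPLIER.get(pacing, 1.0)
    return min(max(rate, 0.25), 3.0)
```

Under these assumptions, speed=1.0 with pacing="fast" yields 1.2, while speed=3.0 with pacing="fast" stays capped at 3.0.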
Audio Formats
| Format | Quality | Use Case |
| WAV | Uncompressed | Highest quality, editing |
| MP3 | Compressed | Web, streaming, sharing |
| FLAC | Lossless | Archival, high-quality storage |
| OGG | Compressed | Open source alternative |
Configuration
Environment Variables
# Model paths (optional)
KOKORO_MODEL_PATH=./kokoro-v1.0.onnx
KOKORO_VOICES_PATH=./voices-v1.0.bin
# Output settings
TTS_OUTPUT_DIR=./audio_output
TTS_MAX_QUEUE_SIZE=100
# Audio settings
TTS_DEFAULT_VOICE=af_heart
TTS_ENABLE_STREAMING=true
Server Configuration
config = ServerConfig(
    model_path="./kokoro-v1.0.onnx",
    voices_path="./voices-v1.0.bin",
    output_dir="./audio_output",
    max_queue_size=100,
    enable_streaming=True,
    default_voice="af_heart"
)
Architecture
├── src/advanced_tts/
│   ├── __init__.py          # Package initialization
│   ├── server.py            # MCP server implementation
│   ├── engine.py            # Kokoro TTS engine wrapper
│   ├── models.py            # Data models and validation
│   └── utils.py             # Utility functions
├── pyproject.toml           # Project configuration
├── README.md                # Documentation
└── LICENSE                  # MIT License
Contributing
Contributions welcome! Areas for improvement:
Additional voice models
Real-time streaming synthesis
Advanced audio effects
Multi-language support
Performance optimizations
License
MIT License - see LICENSE for details.
Acknowledgments
Kokoro TTS - High-quality neural voice synthesis
MCP Protocol - Seamless AI model integration
FastMCP - Efficient server framework
Developed by
Transform your text into natural, expressive speech with Advanced TTS MCP Server.