Advanced TTS MCP Server
A high-quality, feature-rich Text-to-Speech MCP server implemented natively in TypeScript. It is designed for professional applications that need natural, expressive speech synthesis with advanced controls and zero external dependencies.
✨ Features
🎯 Advanced Voice Control
- 10 High-Quality Voices - Male and female voices with distinct personalities
- Emotion Control - Neutral, happy, excited, calm, serious, casual, confident
- Dynamic Pacing - Natural, conversational, presentation, tutorial, narrative, fast, and slow modes
- Speed & Volume - Precise control from 0.25x to 3.0x speed, 0.1x to 2.0x volume
🚀 Professional Capabilities
- Streaming Audio - Real-time synthesis and playback
- Batch Processing - Handle multiple text segments efficiently
- Multiple Formats - WAV, MP3, FLAC, OGG output support
- Natural Speech Enhancement - Automatic pause insertion and emotion markers
- Queue Management - Handle multiple concurrent requests
🔧 MCP Integration
- 6 Powerful Tools - Complete synthesis, batch processing, voice management
- 2 Rich Resources - Voice capabilities and usage examples
- Real-time Status - Track processing progress and manage requests
- File Management - Save, list, and organize audio outputs
🚀 Quick Start
Option 1: Deploy to Smithery.ai (Recommended)
🎯 One-Click Deployment to Smithery Platform
- Deploy Now: Visit Smithery.ai and import this repository
- Configure: Set your preferred voice and speech settings
- Use Instantly: Access via Claude Desktop or any MCP-compatible client
Benefits:
- ✅ Zero setup required
- ✅ Automatic scaling and updates
- ✅ No model downloads needed
- ✅ Enterprise-grade hosting
📋 Full Smithery Deployment Guide →
Option 2: Local Installation
Prerequisites:
- Node.js 18+
Installation:
- Clone the repository
- Install dependencies
- Configure Claude Desktop
Add the server to your claude_desktop_config.json.
- Start using!
Restart Claude Desktop and start synthesizing with natural, expressive voices.
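For the configuration step, Claude Desktop reads MCP servers from an `mcpServers` object in claude_desktop_config.json, where each entry names a `command` and its `args`. The TypeScript sketch below builds and prints such an entry. The server key `advanced-tts`, the use of `node` as the launcher, and the path to `dist/index.js` are all assumptions; substitute the name you prefer and the actual entry point of your local build.

```typescript
// Sketch: build a claude_desktop_config.json entry for this server.
// ASSUMPTIONS: the key "advanced-tts", the "node" launcher, and the entry-point path
// are hypothetical; replace them with your own clone location and build output.

interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>; // optional environment variables for the server process
}

interface ClaudeDesktopConfig {
  mcpServers: Record<string, McpServerEntry>;
}

function buildConfig(entryPoint: string): ClaudeDesktopConfig {
  return {
    mcpServers: {
      // Hypothetical server name; any unique key works.
      "advanced-tts": {
        command: "node",
        args: [entryPoint],
      },
    },
  };
}

// Hypothetical absolute path to the compiled server entry point.
const config = buildConfig("/absolute/path/to/advanced-tts-mcp/dist/index.js");

// Paste the printed JSON into claude_desktop_config.json (merge with any existing servers).
console.log(JSON.stringify(config, null, 2));
```

If the file already lists other servers, add the new key alongside them rather than replacing the whole `mcpServers` object.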
🎙️ Available Voices
Voice ID | Name | Gender | Description |
---|---|---|---|
af_heart | Heart | Female | Warm, friendly voice (default) |
af_sky | Sky | Female | Clear, bright voice |
af_bella | Bella | Female | Elegant, sophisticated voice |
af_sarah | Sarah | Female | Professional, confident voice |
af_nicole | Nicole | Female | Gentle, soothing voice |
am_adam | Adam | Male | Strong, authoritative voice |
am_michael | Michael | Male | Friendly, approachable voice |
bf_emma | Emma | Female | Young, energetic voice |
bf_isabella | Isabella | Female | Mature, expressive voice |
bm_lewis | Lewis | Male | Deep, resonant voice |
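Because the server advertises type-safe interactions, a client may want the voice table available as TypeScript types. The sketch below mirrors the table as a typed catalog with a runtime guard for untrusted input. The names `VOICE_IDS`, `VoiceId`, `isVoiceId`, and `voicesByGender` are illustrative and are not the server's exported API.

```typescript
// Sketch: a typed catalog of the voices in the table above (data copied from the table).
// All identifiers here are hypothetical client-side helpers.

type Gender = "Female" | "Male";

interface VoiceInfo {
  name: string;
  gender: Gender;
  description: string;
}

const VOICE_IDS = [
  "af_heart", "af_sky", "af_bella", "af_sarah", "af_nicole",
  "am_adam", "am_michael", "bf_emma", "bf_isabella", "bm_lewis",
] as const;

type VoiceId = (typeof VOICE_IDS)[number];

const VOICES: Record<VoiceId, VoiceInfo> = {
  af_heart: { name: "Heart", gender: "Female", description: "Warm, friendly voice (default)" },
  af_sky: { name: "Sky", gender: "Female", description: "Clear, bright voice" },
  af_bella: { name: "Bella", gender: "Female", description: "Elegant, sophisticated voice" },
  af_sarah: { name: "Sarah", gender: "Female", description: "Professional, confident voice" },
  af_nicole: { name: "Nicole", gender: "Female", description: "Gentle, soothing voice" },
  am_adam: { name: "Adam", gender: "Male", description: "Strong, authoritative voice" },
  am_michael: { name: "Michael", gender: "Male", description: "Friendly, approachable voice" },
  bf_emma: { name: "Emma", gender: "Female", description: "Young, energetic voice" },
  bf_isabella: { name: "Isabella", gender: "Female", description: "Mature, expressive voice" },
  bm_lewis: { name: "Lewis", gender: "Male", description: "Deep, resonant voice" },
};

const DEFAULT_VOICE: VoiceId = "af_heart";

// Runtime guard: narrows an arbitrary string (e.g. from user input) to a known voice ID.
function isVoiceId(value: string): value is VoiceId {
  return (VOICE_IDS as readonly string[]).includes(value);
}

// Lists voice IDs for one gender, in table order.
function voicesByGender(gender: Gender): VoiceId[] {
  return VOICE_IDS.filter((id) => VOICES[id].gender === gender);
}

console.log(voicesByGender("Male"));
```

Keeping the IDs in a single `as const` array means the union type, the guard, and the lookup record cannot drift apart.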
📚 Usage Examples
Basic Synthesis
Emotional Expression
Professional Presentation
Batch Processing
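The four scenarios above (basic synthesis, emotional expression, a professional presentation, and batch processing) could be sent by any MCP client as `tools/call` requests. The sketch below builds them; the JSON-RPC envelope follows the MCP specification, while the texts, voices, and filename are illustrative, and treating `segments` as an array of strings is an assumption.

```typescript
// Sketch: the usage scenarios expressed as MCP "tools/call" JSON-RPC requests.
// Argument values are illustrative; `segments` as string[] is an assumption.

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// Basic synthesis: default voice and settings.
const basic = toolCall("synthesize_speech", { text: "Hello! Welcome to the Advanced TTS server." });

// Emotional expression: choose a voice and an emotion.
const emotional = toolCall("synthesize_speech", {
  text: "We just shipped the new release!",
  voice_id: "af_sky",
  emotion: "excited",
});

// Professional presentation: presentation pacing, saved as MP3 under a hypothetical filename.
const presentation = toolCall("synthesize_speech", {
  text: "Good morning. Today we will review the quarterly results.",
  voice_id: "am_adam",
  emotion: "confident",
  pacing: "presentation",
  output_format: "mp3",
  save_file: true,
  filename: "quarterly_intro",
});

// Batch processing: several segments merged into one file, one second apart.
const batch = toolCall("batch_synthesize", {
  segments: ["Chapter one.", "It was a quiet morning.", "Then the phone rang."],
  merge_output: true,
  segment_pause: 1.0,
  voice_id: "bm_lewis",
  pacing: "narrative",
});

for (const request of [basic, emotional, presentation, batch]) {
  console.log(JSON.stringify(request));
}
```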
🛠️ Available Tools
synthesize_speech
Convert text to natural speech with full control over voice characteristics.
Parameters:
- text - Text to synthesize (max 10,000 characters)
- voice_id - Voice selection (see table above)
- speed - Speech rate (0.25-3.0)
- emotion - Voice emotion (neutral, happy, excited, calm, serious, casual, confident)
- pacing - Speech style (natural, conversational, presentation, tutorial, narrative, fast, slow)
- volume - Audio volume (0.1-2.0)
- output_format - File format (wav, mp3, flac, ogg)
- save_file - Save to file (boolean)
- filename - Custom filename
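A client can reject out-of-range requests before they reach the server. The sketch below checks `synthesize_speech` arguments against the limits listed above; the interface and function names are illustrative, and the server may apply its own, possibly different, validation.

```typescript
// Sketch: client-side validation of synthesize_speech arguments.
// Limits come from the parameter list above; names are hypothetical.

const EMOTIONS = ["neutral", "happy", "excited", "calm", "serious", "casual", "confident"] as const;
const PACINGS = ["natural", "conversational", "presentation", "tutorial", "narrative", "fast", "slow"] as const;
const FORMATS = ["wav", "mp3", "flac", "ogg"] as const;

interface SynthesizeSpeechArgs {
  text: string;
  voice_id?: string;
  speed?: number;
  emotion?: (typeof EMOTIONS)[number];
  pacing?: (typeof PACINGS)[number];
  volume?: number;
  output_format?: (typeof FORMATS)[number];
  save_file?: boolean;
  filename?: string;
}

const MAX_TEXT_LENGTH = 10000;

// True when the value is absent or within [min, max]; NaN fails both comparisons.
function inRange(value: number | undefined, min: number, max: number): boolean {
  return value === undefined || (value >= min && value <= max);
}

// Runtime membership check, useful when arguments arrive as untyped JSON.
function isOneOf(value: string | undefined, allowed: readonly string[]): boolean {
  return value === undefined || allowed.includes(value);
}

function validateSynthesizeArgs(args: SynthesizeSpeechArgs): string[] {
  const errors: string[] = [];
  // String.length counts UTF-16 code units, which can exceed the visible character count.
  if (args.text.length > MAX_TEXT_LENGTH) errors.push(`text exceeds ${MAX_TEXT_LENGTH} characters`);
  if (!inRange(args.speed, 0.25, 3.0)) errors.push("speed must be between 0.25 and 3.0");
  if (!inRange(args.volume, 0.1, 2.0)) errors.push("volume must be between 0.1 and 2.0");
  if (!isOneOf(args.emotion, EMOTIONS)) errors.push(`unknown emotion: ${args.emotion}`);
  if (!isOneOf(args.pacing, PACINGS)) errors.push(`unknown pacing: ${args.pacing}`);
  if (!isOneOf(args.output_format, FORMATS)) errors.push(`unknown output_format: ${args.output_format}`);
  return errors;
}

// Reports a single error, for the out-of-range speed.
console.log(validateSynthesizeArgs({ text: "Hello", speed: 3.5, volume: 1.0 }));
```

Returning a list of messages, rather than throwing on the first problem, lets a UI show every invalid field at once.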
batch_synthesize
Process multiple text segments efficiently with optional merging.
Parameters:
- segments - List of text segments
- merge_output - Combine into a single file
- segment_pause - Pause between segments (0.0-5.0 s)
- All synthesis parameters from above
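Long documents need to be cut into segments before calling `batch_synthesize`. The sketch below packs whole sentences into segments up to a length limit and builds the batch arguments. It assumes that `segments` is an array of strings, that each segment is subject to the same 10,000-character limit as `text`, and that splitting on `.`, `!`, and `?` suits your content; the helper names and the 0.5 s default pause are hypothetical.

```typescript
// Sketch: split long text into segments for batch_synthesize.
// ASSUMPTIONS: segments is string[], each segment shares synthesize_speech's 10,000-character
// limit, and simple punctuation-based sentence splitting is acceptable. Names are hypothetical.

const MAX_SEGMENT_LENGTH = 10000;

function splitIntoSegments(text: string, maxLength: number = MAX_SEGMENT_LENGTH): string[] {
  // A "sentence" is a run of non-terminators followed by terminators (or the end of the text).
  const sentences = (text.match(/[^.!?]+(?:[.!?]+|$)/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const segments: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= maxLength) {
      current = candidate; // the sentence still fits in the current segment
      continue;
    }
    if (current) segments.push(current);
    // A single sentence longer than the limit is hard-split into fixed-size chunks.
    let rest = sentence;
    while (rest.length > maxLength) {
      segments.push(rest.slice(0, maxLength));
      rest = rest.slice(maxLength);
    }
    current = rest;
  }
  if (current) segments.push(current);
  return segments;
}

// Hypothetical helper; 0.5 s is an arbitrary default within the documented 0.0-5.0 s range.
function buildBatchArgs(text: string, segmentPause = 0.5) {
  if (!(segmentPause >= 0 && segmentPause <= 5)) {
    throw new RangeError("segment_pause must be between 0.0 and 5.0 seconds");
  }
  return { segments: splitIntoSegments(text), merge_output: true, segment_pause: segmentPause };
}

console.log(buildBatchArgs("Chapter one. It was a quiet morning. Then the phone rang.", 1.0));
```

Packing several sentences per segment keeps the number of segments low, while never cutting a sentence unless it alone exceeds the limit.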
get_voices
Retrieve complete voice information and capabilities.
get_status
Check processing status for synthesis requests.
cancel_request
Cancel active synthesis operations.
list_output_files
Browse saved audio files with metadata.
🎛️ Voice Controls
Emotions
- Neutral - Standard, professional tone
- Happy - Upbeat, cheerful expression
- Excited - Enthusiastic, energetic delivery
- Calm - Relaxed, soothing tone
- Serious - Formal, authoritative delivery
- Casual - Relaxed, conversational style
- Confident - Assured, professional tone
Pacing Styles
- Natural - Balanced, human-like rhythm
- Conversational - Casual discussion pace
- Presentation - Professional speaking rhythm
- Tutorial - Educational, clear delivery
- Narrative - Storytelling pace
- Fast - Quick delivery (1.2x base speed)
- Slow - Deliberate delivery (0.8x base speed)
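The fast and slow styles are described as multiples of the base speed. One plausible way they combine with the `speed` parameter is sketched below. The 1.2x and 0.8x factors come from the list above; treating the other styles as 1.0x, multiplying by `speed`, and clamping to the documented 0.25-3.0 range are assumptions, and the server's actual behaviour may differ.

```typescript
// Sketch: combining a pacing style with the speed parameter.
// ASSUMPTIONS: non-fast/slow styles use 1.0x, the factor multiplies `speed`,
// and the result is clamped to the documented 0.25-3.0 range.

type Pacing = "natural" | "conversational" | "presentation" | "tutorial" | "narrative" | "fast" | "slow";

const PACING_MULTIPLIER: Record<Pacing, number> = {
  natural: 1.0,
  conversational: 1.0,
  presentation: 1.0,
  tutorial: 1.0,
  narrative: 1.0,
  fast: 1.2, // from the documentation
  slow: 0.8, // from the documentation
};

const MIN_SPEED = 0.25;
const MAX_SPEED = 3.0;

function effectiveSpeed(speed: number, pacing: Pacing = "natural"): number {
  const raw = speed * PACING_MULTIPLIER[pacing];
  return Math.min(MAX_SPEED, Math.max(MIN_SPEED, raw));
}

console.log(effectiveSpeed(1.0, "fast"));
console.log(effectiveSpeed(3.0, "fast")); // clamped to the 3.0 ceiling
```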
🎵 Audio Formats
Format | Quality | Use Case |
---|---|---|
WAV | Uncompressed | Highest quality, editing |
MP3 | Compressed | Web, streaming, sharing |
FLAC | Lossless | Archival, high-quality storage |
OGG | Compressed | Open source alternative |
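When saved files are served over HTTP or named by a client, each format needs a MIME type and extension. The sketch below records commonly used MIME types for the four formats; the `lossless` flag for OGG assumes Vorbis or Opus content (an Ogg container can also carry lossless audio), and the helper names are hypothetical.

```typescript
// Sketch: metadata for the four output formats in the table above.
// MIME types are commonly used values; names and the OGG lossless flag are assumptions.

type OutputFormat = "wav" | "mp3" | "flac" | "ogg";

interface FormatInfo {
  mimeType: string;
  lossless: boolean;
}

const FORMAT_INFO: Record<OutputFormat, FormatInfo> = {
  wav: { mimeType: "audio/wav", lossless: true }, // uncompressed PCM
  mp3: { mimeType: "audio/mpeg", lossless: false },
  flac: { mimeType: "audio/flac", lossless: true },
  ogg: { mimeType: "audio/ogg", lossless: false }, // assuming Vorbis or Opus content
};

// Appends the format's extension unless the filename already ends with it (case-insensitive).
function withExtension(filename: string, format: OutputFormat): string {
  const ext = `.${format}`;
  return filename.toLowerCase().endsWith(ext) ? filename : `${filename}${ext}`;
}

console.log(withExtension("quarterly_intro", "mp3"), FORMAT_INFO.mp3.mimeType);
```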
🔧 Configuration
Environment Variables
Server Configuration
🏗️ Architecture
🤝 Contributing
Contributions welcome! Areas for improvement:
- Additional voice models
- Real-time streaming synthesis
- Advanced audio effects
- Multi-language support
- Performance optimizations
📄 License
MIT License - see LICENSE for details.
🙏 Acknowledgments
- Kokoro TTS - High-quality neural voice synthesis
- Model Context Protocol (MCP) - Seamless AI model integration
- FastMCP - Efficient server framework
Developed by Sami Halawa
Transform your text into natural, expressive speech with Advanced TTS MCP Server.