# Advanced TTS MCP Server

A high-quality, feature-rich Text-to-Speech MCP server with native TypeScript implementation. Designed for professional applications requiring natural, expressive speech synthesis with advanced controls and zero external dependencies.

## ✨ Features

### 🎯 **Advanced Voice Control**

- **10 High-Quality Voices** - Male and female voices with distinct personalities
- **Emotion Control** - Neutral, happy, excited, calm, serious, casual, confident
- **Dynamic Pacing** - Natural, conversational, presentation, tutorial, narrative modes
- **Speed & Volume** - Precise control from 0.25x to 3.0x speed, 0.1x to 2.0x volume

### 🚀 **Professional Capabilities**

- **Streaming Audio** - Real-time synthesis and playback
- **Batch Processing** - Handle multiple text segments efficiently
- **Multiple Formats** - WAV, MP3, FLAC, OGG output support
- **Natural Speech Enhancement** - Automatic pause insertion and emotion markers
- **Queue Management** - Handle multiple concurrent requests

### 🔧 **MCP Integration**

- **6 Powerful Tools** - Complete synthesis, batch processing, voice management
- **2 Rich Resources** - Voice capabilities and usage examples
- **Real-time Status** - Track processing progress and manage requests
- **File Management** - Save, list, and organize audio outputs

## 🚀 Quick Start

### Option 1: Deploy to Smithery.ai (Recommended)

**🎯 One-Click Deployment to Smithery Platform**

1. **Deploy Now**: Visit [Smithery.ai](https://smithery.ai) and import this repository
2. **Configure**: Set your preferred voice and speech settings
3. **Use Instantly**: Access via Claude Desktop or any MCP-compatible client

**Benefits:**

- ✅ Zero setup required
- ✅ Automatic scaling and updates
- ✅ No model downloads needed
- ✅ Enterprise-grade hosting

**[📋 Full Smithery Deployment Guide →](DEPLOYMENT.md)**

### Option 2: Local Installation

**Prerequisites:**

- Node.js 18+

**Installation:**

1. **Clone the repository**

   ```bash
   git clone https://github.com/samihalawa/advanced-tts-mcp.git
   cd advanced-tts-mcp
   ```

2. **Install dependencies**

   ```bash
   npm install
   ```

3. **Configure Claude Desktop**

   Add to your `claude_desktop_config.json`:

   ```json
   {
     "mcpServers": {
       "advanced-tts": {
         "command": "node",
         "args": ["dist/index.js"],
         "cwd": "/path/to/advanced-tts-mcp"
       }
     }
   }
   ```

4. **Start using!**

   ```bash
   # Build TypeScript
   npm run build

   # Start server
   npm start
   ```

Restart Claude Desktop and start synthesizing with natural, expressive voices.

## 🎙️ Available Voices

| Voice ID | Name | Gender | Description |
|----------|------|--------|-------------|
| `af_heart` | Heart | Female | Warm, friendly voice (default) |
| `af_sky` | Sky | Female | Clear, bright voice |
| `af_bella` | Bella | Female | Elegant, sophisticated voice |
| `af_sarah` | Sarah | Female | Professional, confident voice |
| `af_nicole` | Nicole | Female | Gentle, soothing voice |
| `am_adam` | Adam | Male | Strong, authoritative voice |
| `am_michael` | Michael | Male | Friendly, approachable voice |
| `bf_emma` | Emma | Female | Young, energetic voice |
| `bf_isabella` | Isabella | Female | Mature, expressive voice |
| `bm_lewis` | Lewis | Male | Deep, resonant voice |

## 📚 Usage Examples

### Basic Synthesis

```python
# Simple text-to-speech
await synthesize_speech(
    text="Hello! Welcome to Advanced TTS.",
    voice_id="af_heart"
)
```

### Emotional Expression

```python
# Excited announcement
await synthesize_speech(
    text="This is amazing news! You're going to love this new feature!",
    voice_id="af_heart",
    emotion="excited",
    pacing="conversational",
    speed=1.1
)
```

### Professional Presentation

```python
# Tutorial narration
await synthesize_speech(
    text="Step one: Open your browser. Step two: Navigate to the website.",
    voice_id="am_adam",
    emotion="calm",
    pacing="tutorial",
    speed=0.9
)
```

### Batch Processing

```python
# Multiple segments with pauses
await batch_synthesize(
    segments=[
        "Welcome to our presentation.",
        "Today we'll cover three main topics.",
        "Let's begin with the first topic."
    ],
    voice_id="af_sarah",
    emotion="confident",
    pacing="presentation",
    merge_output=True,
    segment_pause=1.0,
    save_file=True
)
```

## 🛠️ Available Tools

### `synthesize_speech`

Convert text to natural speech with full control over voice characteristics.

**Parameters:**

- `text` - Text to synthesize (max 10,000 chars)
- `voice_id` - Voice selection (see table above)
- `speed` - Speech rate (0.25-3.0)
- `emotion` - Voice emotion (neutral, happy, excited, calm, serious, casual, confident)
- `pacing` - Speech style (natural, conversational, presentation, tutorial, narrative, fast, slow)
- `volume` - Audio volume (0.1-2.0)
- `output_format` - File format (wav, mp3, flac, ogg)
- `save_file` - Save to file (boolean)
- `filename` - Custom filename

### `batch_synthesize`

Process multiple text segments efficiently with optional merging.

**Parameters:**

- `segments` - List of text segments
- `merge_output` - Combine into single file
- `segment_pause` - Pause between segments (0.0-5.0s)
- All synthesis parameters from above

### `get_voices`

Retrieve complete voice information and capabilities.

### `get_status`

Check processing status for synthesis requests.

### `cancel_request`

Cancel active synthesis operations.

### `list_output_files`

Browse saved audio files with metadata.
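
### Checking Parameters Before a Call

The ranges documented for `synthesize_speech` can be checked on the client side before a request is sent. The sketch below is illustrative only: `validate_synthesis_args` is a hypothetical helper, not part of this server, and every default other than the `af_heart` voice (`speed=1.0`, `emotion="neutral"`, `pacing="natural"`, `volume=1.0`, `output_format="wav"`) is an assumption. The allowed values are copied from the voice table and parameter list above.

```python
# Allowed values copied from the README's voice table and parameter list.
VOICES = {
    "af_heart", "af_sky", "af_bella", "af_sarah", "af_nicole",
    "am_adam", "am_michael", "bf_emma", "bf_isabella", "bm_lewis",
}
EMOTIONS = {"neutral", "happy", "excited", "calm", "serious", "casual", "confident"}
PACING = {"natural", "conversational", "presentation", "tutorial",
          "narrative", "fast", "slow"}
FORMATS = {"wav", "mp3", "flac", "ogg"}
MAX_TEXT_CHARS = 10_000


def validate_synthesis_args(text, voice_id="af_heart", speed=1.0,
                            emotion="neutral", pacing="natural",
                            volume=1.0, output_format="wav"):
    """Hypothetical helper: return a list of problems (empty means all in range).

    Defaults other than voice_id are assumptions, not documented behavior.
    """
    problems = []
    if not text or not text.strip():
        problems.append("text is empty")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text has {len(text)} chars (max {MAX_TEXT_CHARS})")
    if voice_id not in VOICES:
        problems.append(f"unknown voice_id: {voice_id!r}")
    if not 0.25 <= speed <= 3.0:
        problems.append(f"speed {speed} is outside 0.25-3.0")
    if emotion not in EMOTIONS:
        problems.append(f"unknown emotion: {emotion!r}")
    if pacing not in PACING:
        problems.append(f"unknown pacing: {pacing!r}")
    if not 0.1 <= volume <= 2.0:
        problems.append(f"volume {volume} is outside 0.1-2.0")
    if output_format not in FORMATS:
        problems.append(f"unknown output_format: {output_format!r}")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every out-of-range argument at once; the server may apply its own validation independently.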
## 🎛️ Voice Controls

### Emotions

- **Neutral** - Standard, professional tone
- **Happy** - Upbeat, cheerful expression
- **Excited** - Enthusiastic, energetic delivery
- **Calm** - Relaxed, soothing tone
- **Serious** - Formal, authoritative delivery
- **Casual** - Relaxed, conversational style
- **Confident** - Assured, professional tone

### Pacing Styles

- **Natural** - Balanced, human-like rhythm
- **Conversational** - Casual discussion pace
- **Presentation** - Professional speaking rhythm
- **Tutorial** - Educational, clear delivery
- **Narrative** - Storytelling pace
- **Fast** - Quick delivery (1.2x base speed)
- **Slow** - Deliberate delivery (0.8x base speed)

## 🎵 Audio Formats

| Format | Quality | Use Case |
|--------|---------|----------|
| **WAV** | Uncompressed | Highest quality, editing |
| **MP3** | Compressed | Web, streaming, sharing |
| **FLAC** | Lossless | Archival, high-quality storage |
| **OGG** | Compressed | Open source alternative |

## 🔧 Configuration

### Environment Variables

```bash
# Model paths (optional)
KOKORO_MODEL_PATH=./kokoro-v1.0.onnx
KOKORO_VOICES_PATH=./voices-v1.0.bin

# Output settings
TTS_OUTPUT_DIR=./audio_output
TTS_MAX_QUEUE_SIZE=100

# Audio settings
TTS_DEFAULT_VOICE=af_heart
TTS_ENABLE_STREAMING=true
```

### Server Configuration

```python
config = ServerConfig(
    model_path="./kokoro-v1.0.onnx",
    voices_path="./voices-v1.0.bin",
    output_dir="./audio_output",
    max_queue_size=100,
    enable_streaming=True,
    default_voice="af_heart"
)
```

## 🏗️ Architecture

```
├── src/advanced_tts/
│   ├── __init__.py          # Package initialization
│   ├── server.py            # MCP server implementation
│   ├── engine.py            # Kokoro TTS engine wrapper
│   ├── models.py            # Data models and validation
│   └── utils.py             # Utility functions
├── pyproject.toml           # Project configuration
├── README.md                # Documentation
└── LICENSE                  # MIT License
```

## 🤝 Contributing

Contributions welcome! Areas for improvement:

- Additional voice models
- Real-time streaming synthesis
- Advanced audio effects
- Multi-language support
- Performance optimizations

## 📄 License

MIT License - see [LICENSE](LICENSE) for details.

## 🙏 Acknowledgments

- **Kokoro TTS** - High-quality neural voice synthesis
- **MCP Protocol** - Seamless AI model integration
- **FastMCP** - Efficient server framework

---

**Developed by [Sami Halawa](https://github.com/samihalawa)**

*Transform your text into natural, expressive speech with Advanced TTS MCP Server.*
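
## Appendix: Reading the Environment Variables

The environment variables listed under Configuration can be collected into one typed object at startup. This is a minimal sketch, not the server's own loader: `TTSSettings` and `load_settings` are hypothetical names, and treating the example values from the Configuration section as fallback defaults is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TTSSettings:
    """Hypothetical settings container; field names mirror the documented variables."""
    model_path: str
    voices_path: str
    output_dir: str
    max_queue_size: int
    default_voice: str
    enable_streaming: bool


def load_settings(env=None):
    """Build settings from a mapping of environment variables.

    Fallback values are the examples shown in the README; using them as
    defaults is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    streaming = env.get("TTS_ENABLE_STREAMING", "true").strip().lower()
    return TTSSettings(
        model_path=env.get("KOKORO_MODEL_PATH", "./kokoro-v1.0.onnx"),
        voices_path=env.get("KOKORO_VOICES_PATH", "./voices-v1.0.bin"),
        output_dir=env.get("TTS_OUTPUT_DIR", "./audio_output"),
        # int() raises ValueError on a non-numeric queue size, failing fast.
        max_queue_size=int(env.get("TTS_MAX_QUEUE_SIZE", "100")),
        default_voice=env.get("TTS_DEFAULT_VOICE", "af_heart"),
        enable_streaming=streaming in {"1", "true", "yes"},
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests, for example `load_settings({"TTS_ENABLE_STREAMING": "false"})`.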
