# Advanced TTS MCP Server
A high-quality, feature-rich Text-to-Speech MCP server with a native TypeScript implementation. Designed for professional applications that require natural, expressive speech synthesis with advanced controls and zero external dependencies.
## Features
### **Advanced Voice Control**
- **10 High-Quality Voices** - Male and female voices with distinct personalities
- **Emotion Control** - Neutral, happy, excited, calm, serious, casual, confident
- **Dynamic Pacing** - Natural, conversational, presentation, tutorial, narrative, fast, and slow modes
- **Speed & Volume** - Speed from 0.25x to 3.0x and volume from 0.1x to 2.0x
### **Professional Capabilities**
- **Streaming Audio** - Real-time synthesis and playback
- **Batch Processing** - Handle multiple text segments efficiently
- **Multiple Formats** - WAV, MP3, FLAC, OGG output support
- **Natural Speech Enhancement** - Automatic pause insertion and emotion markers
- **Queue Management** - Handle multiple concurrent requests
### **MCP Integration**
- **6 Powerful Tools** - Speech synthesis, batch processing, voice listing, status tracking, request cancellation, and output file browsing
- **2 Rich Resources** - Voice capabilities and usage examples
- **Real-time Status** - Track processing progress and manage requests
- **File Management** - Save, list, and organize audio outputs
## Quick Start
### Option 1: Deploy to Smithery.ai (Recommended)
**One-Click Deployment to the Smithery Platform**
1. **Deploy Now**: Visit [Smithery.ai](https://smithery.ai) and import this repository
2. **Configure**: Set your preferred voice and speech settings
3. **Use Instantly**: Access via Claude Desktop or any MCP-compatible client
**Benefits:**
- Zero setup required
- Automatic scaling and updates
- No model downloads needed
- Enterprise-grade hosting
**[Full Smithery Deployment Guide](DEPLOYMENT.md)**
### Option 2: Local Installation
**Prerequisites:**
- Node.js 18+
**Installation:**
1. **Clone the repository**
```bash
git clone https://github.com/samihalawa/advanced-tts-mcp.git
cd advanced-tts-mcp
```
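2. **Install dependencies and build**
```bash
npm install
# Compile the TypeScript sources to dist/
npm run build
```
3. **Configure Claude Desktop**
Add the following to your `claude_desktop_config.json`, replacing `/path/to/advanced-tts-mcp` with the absolute path to your clone:
```json
{
  "mcpServers": {
    "advanced-tts": {
      "command": "node",
      "args": ["/path/to/advanced-tts-mcp/dist/index.js"],
      "cwd": "/path/to/advanced-tts-mcp"
    }
  }
}
```
4. **Start using!**
Restart Claude Desktop and start synthesizing with natural, expressive voices; Claude Desktop launches the server itself from this configuration. To run the server on its own (for example, to check that it starts), use:
```bash
npm start
```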
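If you prefer to script step 3, the sketch below merges the server entry into an existing Claude Desktop configuration without touching other servers. It is an illustrative helper, not part of this project: `add_tts_server` and `update_config_file` are hypothetical names. It writes an absolute path to `dist/index.js` in `args` and also sets `cwd`, so it works whether or not `cwd` is honored.

```python
import json
from pathlib import Path


def add_tts_server(config: dict, repo_dir: str) -> dict:
    """Return a copy of a Claude Desktop config with the advanced-tts entry added.

    Other servers are preserved; an existing "advanced-tts" entry is replaced.
    """
    entry_script = str(Path(repo_dir) / "dist" / "index.js")
    servers = dict(config.get("mcpServers", {}))  # shallow copy: input is not mutated
    servers["advanced-tts"] = {"command": "node", "args": [entry_script], "cwd": repo_dir}
    return {**config, "mcpServers": servers}


def update_config_file(config_path: str, repo_dir: str) -> None:
    """Read claude_desktop_config.json (if present), add the entry, and write it back."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_tts_server(config, repo_dir), indent=2))
```

Usage: `update_config_file("/path/to/claude_desktop_config.json", "/path/to/advanced-tts-mcp")`, then restart Claude Desktop.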
## Available Voices
| Voice ID | Name | Gender | Description |
|----------|------|--------|-------------|
| `af_heart` | Heart | Female | Warm, friendly voice (default) |
| `af_sky` | Sky | Female | Clear, bright voice |
| `af_bella` | Bella | Female | Elegant, sophisticated voice |
| `af_sarah` | Sarah | Female | Professional, confident voice |
| `af_nicole` | Nicole | Female | Gentle, soothing voice |
| `am_adam` | Adam | Male | Strong, authoritative voice |
| `am_michael` | Michael | Male | Friendly, approachable voice |
| `bf_emma` | Emma | Female | Young, energetic voice |
| `bf_isabella` | Isabella | Female | Mature, expressive voice |
| `bm_lewis` | Lewis | Male | Deep, resonant voice |
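Before calling the server, a client can catch typos in `voice_id` locally. The following is a minimal sketch that assumes the ten IDs in the table are the complete set; `VOICES`, `resolve_voice`, and `voices_by_gender` are hypothetical helpers, not part of the server.

```python
from typing import List, Optional

# Voice ID -> (name, gender), copied from the table above.
VOICES = {
    "af_heart": ("Heart", "Female"),
    "af_sky": ("Sky", "Female"),
    "af_bella": ("Bella", "Female"),
    "af_sarah": ("Sarah", "Female"),
    "af_nicole": ("Nicole", "Female"),
    "am_adam": ("Adam", "Male"),
    "am_michael": ("Michael", "Male"),
    "bf_emma": ("Emma", "Female"),
    "bf_isabella": ("Isabella", "Female"),
    "bm_lewis": ("Lewis", "Male"),
}
DEFAULT_VOICE = "af_heart"  # documented default


def resolve_voice(voice_id: Optional[str] = None) -> str:
    """Return a valid voice ID, falling back to the default when none is given."""
    if voice_id is None:
        return DEFAULT_VOICE
    if voice_id not in VOICES:
        known = ", ".join(sorted(VOICES))
        raise ValueError(f"Unknown voice_id {voice_id!r}; expected one of: {known}")
    return voice_id


def voices_by_gender(gender: str) -> List[str]:
    """List voice IDs whose gender column matches, e.g. "Male"."""
    return [vid for vid, (_, g) in VOICES.items() if g == gender]
```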
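## Usage Examples
The examples below show each tool's name and arguments as Python-style pseudocode; the exact calling syntax depends on your MCP client.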
### Basic Synthesis
```python
# Simple text-to-speech
await synthesize_speech(
text="Hello! Welcome to Advanced TTS.",
voice_id="af_heart"
)
```
### Emotional Expression
```python
# Excited announcement
await synthesize_speech(
text="This is amazing news! You're going to love this new feature!",
voice_id="af_heart",
emotion="excited",
pacing="conversational",
speed=1.1
)
```
### Professional Presentation
```python
# Tutorial narration
await synthesize_speech(
text="Step one: Open your browser. Step two: Navigate to the website.",
voice_id="am_adam",
emotion="calm",
pacing="tutorial",
speed=0.9
)
```
### Batch Processing
```python
# Multiple segments with pauses
await batch_synthesize(
segments=[
"Welcome to our presentation.",
"Today we'll cover three main topics.",
"Let's begin with the first topic."
],
voice_id="af_sarah",
emotion="confident",
pacing="presentation",
merge_output=True,
segment_pause=1.0,
save_file=True
)
```
## Available Tools
### `synthesize_speech`
Convert text to natural speech with full control over voice characteristics.
**Parameters:**
- `text` - Text to synthesize (max 10,000 chars)
- `voice_id` - Voice selection (see table above)
- `speed` - Speech rate (0.25-3.0)
- `emotion` - Voice emotion (neutral, happy, excited, calm, serious, casual, confident)
- `pacing` - Speech style (natural, conversational, presentation, tutorial, narrative, fast, slow)
- `volume` - Audio volume (0.1-2.0)
- `output_format` - File format (wav, mp3, flac, ogg)
- `save_file` - Save to file (boolean)
- `filename` - Custom filename
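The documented limits can be checked client-side before a request is sent. This sketch returns a dictionary suitable as the `arguments` of an MCP `tools/call` request for `synthesize_speech`. `build_synthesis_args` is a hypothetical helper, and its defaults (speed 1.0, `neutral`, `natural`, volume 1.0, `wav`, `save_file=False`) and its rejection of empty text are assumptions, not documented server behavior.

```python
from typing import Any, Dict, Optional

EMOTIONS = {"neutral", "happy", "excited", "calm", "serious", "casual", "confident"}
PACINGS = {"natural", "conversational", "presentation", "tutorial", "narrative", "fast", "slow"}
FORMATS = {"wav", "mp3", "flac", "ogg"}
MAX_TEXT_CHARS = 10_000


def build_synthesis_args(
    text: str,
    voice_id: str = "af_heart",
    speed: float = 1.0,          # assumed default
    emotion: str = "neutral",    # assumed default
    pacing: str = "natural",     # assumed default
    volume: float = 1.0,         # assumed default
    output_format: str = "wav",  # assumed default
    save_file: bool = False,     # assumed default
    filename: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate against the documented ranges and return synthesize_speech arguments."""
    if not text or len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text must be 1-{MAX_TEXT_CHARS} characters")
    if not 0.25 <= speed <= 3.0:
        raise ValueError("speed must be between 0.25 and 3.0")
    if not 0.1 <= volume <= 2.0:
        raise ValueError("volume must be between 0.1 and 2.0")
    for name, value, allowed in (
        ("emotion", emotion, EMOTIONS),
        ("pacing", pacing, PACINGS),
        ("output_format", output_format, FORMATS),
    ):
        if value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    args: Dict[str, Any] = {
        "text": text, "voice_id": voice_id, "speed": speed, "emotion": emotion,
        "pacing": pacing, "volume": volume, "output_format": output_format,
        "save_file": save_file,
    }
    if filename is not None:  # only send a custom filename when one is given
        args["filename"] = filename
    return args
```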
### `batch_synthesize`
Process multiple text segments efficiently with optional merging.
**Parameters:**
- `segments` - List of text segments
- `merge_output` - Combine into single file
- `segment_pause` - Pause between segments (0.0-5.0s)
- All `synthesize_speech` parameters listed above
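Conceptually, `merge_output` with `segment_pause` concatenates the per-segment audio with silence in between. The sketch below illustrates that idea on raw sample lists; it is not the server's implementation, and the mono float samples, the 24 kHz sample rate, and the 0.5 s default pause are assumptions. `merge_segments` is a hypothetical helper.

```python
from typing import List, Sequence


def merge_segments(
    segments: Sequence[Sequence[float]],
    segment_pause: float = 0.5,  # assumed default; documented range is 0.0-5.0 s
    sample_rate: int = 24_000,   # assumed sample rate
) -> List[float]:
    """Concatenate segment audio, inserting segment_pause seconds of silence between segments."""
    if not 0.0 <= segment_pause <= 5.0:
        raise ValueError("segment_pause must be between 0.0 and 5.0 seconds")
    silence = [0.0] * round(segment_pause * sample_rate)
    merged: List[float] = []
    for i, samples in enumerate(segments):
        if i > 0:
            merged.extend(silence)  # pause only between segments, not before or after
        merged.extend(samples)
    return merged
```

With three segments, as in the batch example above, two pauses are inserted.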
### `get_voices`
Retrieve complete voice information and capabilities.
### `get_status`
Check processing status for synthesis requests.
### `cancel_request`
Cancel active synthesis operations.
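`get_status` and `cancel_request` imply a request lifecycle. A minimal in-memory sketch of such a lifecycle follows. The status names (`queued`, `processing`, `completed`, `cancelled`, `unknown`), the integer request IDs, and tying `max_size` to `TTS_MAX_QUEUE_SIZE` are all assumptions for illustration; `RequestQueue` is a hypothetical class, not the server's API.

```python
import itertools
from typing import Dict

ACTIVE = ("queued", "processing")  # hypothetical states that still count against the limit


class RequestQueue:
    """In-memory tracker for synthesis requests with a size limit, status lookup, and cancellation."""

    def __init__(self, max_size: int = 100) -> None:
        self.max_size = max_size  # assumption: mirrors TTS_MAX_QUEUE_SIZE
        self._next_id = itertools.count(1)
        self._status: Dict[int, str] = {}

    def _active_count(self) -> int:
        return sum(1 for s in self._status.values() if s in ACTIVE)

    def submit(self) -> int:
        """Register a new request and return its ID; refuse when the queue is full."""
        if self._active_count() >= self.max_size:
            raise RuntimeError("synthesis queue is full")
        request_id = next(self._next_id)
        self._status[request_id] = "queued"
        return request_id

    def start(self, request_id: int) -> None:
        if self._status.get(request_id) == "queued":
            self._status[request_id] = "processing"

    def complete(self, request_id: int) -> None:
        if self._status.get(request_id) == "processing":
            self._status[request_id] = "completed"

    def get_status(self, request_id: int) -> str:
        return self._status.get(request_id, "unknown")

    def cancel(self, request_id: int) -> bool:
        """Cancel a queued or processing request; return False if it is finished or unknown."""
        if self._status.get(request_id) in ACTIVE:
            self._status[request_id] = "cancelled"
            return True
        return False
```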
### `list_output_files`
Browse saved audio files with metadata.
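To inspect saved files outside an MCP client, something like the following approximates what `list_output_files` reports. The metadata fields, the newest-first ordering, and the `./audio_output` default (taken from the `TTS_OUTPUT_DIR` example under Configuration) are assumptions; `list_audio_files` is a hypothetical helper.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Union

AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}  # the four documented formats


def list_audio_files(output_dir: Union[str, Path] = "./audio_output") -> List[Dict[str, object]]:
    """Return name, format, size, and modification time for each audio file, newest first."""
    directory = Path(output_dir)
    if not directory.is_dir():
        return []
    entries = []
    for path in directory.iterdir():
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            stat = path.stat()
            entries.append((stat.st_mtime, {
                "name": path.name,
                "format": path.suffix.lower().lstrip("."),
                "size_bytes": stat.st_size,
                "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
            }))
    entries.sort(key=lambda entry: entry[0], reverse=True)  # sort on the numeric mtime
    return [info for _, info in entries]
```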
## Voice Controls
### Emotions
- **Neutral** - Standard, professional tone
- **Happy** - Upbeat, cheerful expression
- **Excited** - Enthusiastic, energetic delivery
- **Calm** - Relaxed, soothing tone
- **Serious** - Formal, authoritative delivery
- **Casual** - Relaxed, conversational style
- **Confident** - Assured, professional tone
### Pacing Styles
- **Natural** - Balanced, human-like rhythm
- **Conversational** - Casual discussion pace
- **Presentation** - Professional speaking rhythm
- **Tutorial** - Educational, clear delivery
- **Narrative** - Storytelling pace
- **Fast** - Quick delivery (1.2x base speed)
- **Slow** - Deliberate delivery (0.8x base speed)
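The Fast and Slow styles are described as multiples of the base speed. Assuming the multiplier is applied to the `speed` argument, and that the other styles leave speed unchanged (an assumption; they may shape pauses instead), the effective rate can be computed as below. `effective_speed` is a hypothetical helper.

```python
# Speed multipliers for the two pacing styles documented with one.
PACING_SPEED_FACTORS = {"fast": 1.2, "slow": 0.8}


def effective_speed(speed: float, pacing: str = "natural") -> float:
    """Apply the pacing multiplier to the requested speed (assumed to be multiplicative)."""
    factor = PACING_SPEED_FACTORS.get(pacing, 1.0)  # assumption: other styles keep 1.0x
    return speed * factor
```

For example, `speed=1.1` with `pacing="fast"` gives about 1.32x. Whether the server clamps a product such as 3.0 × 1.2 = 3.6 back into the 0.25-3.0 range is not documented here.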
## Audio Formats
| Format | Quality | Use Case |
|--------|---------|----------|
| **WAV** | Uncompressed | Highest quality, editing |
| **MP3** | Compressed | Web, streaming, sharing |
| **FLAC** | Lossless | Archival, high-quality storage |
| **OGG** | Compressed | Open source alternative |
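WAV output is uncompressed PCM, which is why it suits editing. To illustrate what such a file contains, the sketch below writes mono float samples as 16-bit PCM using only Python's standard `wave` module. The 24 kHz sample rate and 16-bit depth are assumptions, not documented properties of this server's output; `write_wav` is a hypothetical helper, and MP3, FLAC, and OGG would need an encoder library, so they are not shown.

```python
import io
import math
import struct
import wave
from typing import BinaryIO, Sequence


def write_wav(samples: Sequence[float], out: BinaryIO, sample_rate: int = 24_000) -> None:
    """Write mono float samples in [-1.0, 1.0] to `out` as uncompressed 16-bit PCM WAV."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)  # keep values in range
    frames = b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)  # little-endian int16
    with wave.open(out, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


# Example: a 0.1 s, 440 Hz test tone rendered into an in-memory buffer.
buffer = io.BytesIO()
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 24_000) for n in range(2_400)]
write_wav(tone, buffer)
```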
## Configuration
### Environment Variables
```bash
# Model paths (optional)
KOKORO_MODEL_PATH=./kokoro-v1.0.onnx
KOKORO_VOICES_PATH=./voices-v1.0.bin
# Output settings
TTS_OUTPUT_DIR=./audio_output
TTS_MAX_QUEUE_SIZE=100
# Audio settings
TTS_DEFAULT_VOICE=af_heart
TTS_ENABLE_STREAMING=true
```
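One way to read these variables into typed settings is sketched below. It treats the example values above as fallbacks when a variable is unset, which is an assumption, and accepts `1`, `true`, `yes`, or `on` (case-insensitive) as true for `TTS_ENABLE_STREAMING`. `TTSSettings` and `load_settings` are hypothetical names, not the package's `ServerConfig`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TTSSettings:
    model_path: Optional[str]
    voices_path: Optional[str]
    output_dir: str
    max_queue_size: int
    default_voice: str
    enable_streaming: bool


def load_settings(env: Mapping[str, str] = os.environ) -> TTSSettings:
    """Read the variables above; fallbacks mirror the example values (an assumption)."""
    return TTSSettings(
        model_path=env.get("KOKORO_MODEL_PATH"),  # optional: None when unset
        voices_path=env.get("KOKORO_VOICES_PATH"),  # optional: None when unset
        output_dir=env.get("TTS_OUTPUT_DIR", "./audio_output"),
        max_queue_size=int(env.get("TTS_MAX_QUEUE_SIZE", "100")),
        default_voice=env.get("TTS_DEFAULT_VOICE", "af_heart"),
        enable_streaming=env.get("TTS_ENABLE_STREAMING", "true").strip().lower()
        in ("1", "true", "yes", "on"),
    )
```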
### Server Configuration
```python
config = ServerConfig(
model_path="./kokoro-v1.0.onnx",
voices_path="./voices-v1.0.bin",
output_dir="./audio_output",
max_queue_size=100,
enable_streaming=True,
default_voice="af_heart"
)
```
## Architecture
```
├── src/advanced_tts/
│   ├── __init__.py      # Package initialization
│   ├── server.py        # MCP server implementation
│   ├── engine.py        # Kokoro TTS engine wrapper
│   ├── models.py        # Data models and validation
│   └── utils.py         # Utility functions
├── pyproject.toml       # Project configuration
├── README.md            # Documentation
└── LICENSE              # MIT License
```
## Contributing
Contributions welcome! Areas for improvement:
- Additional voice models
- Real-time streaming synthesis
- Advanced audio effects
- Multi-language support
- Performance optimizations
## License
MIT License - see [LICENSE](LICENSE) for details.
## Acknowledgments
- **Kokoro TTS** - High-quality neural voice synthesis
- **MCP Protocol** - Seamless AI model integration
- **FastMCP** - Efficient server framework
---
**Developed by [Sami Halawa](https://github.com/samihalawa)**
*Transform your text into natural, expressive speech with Advanced TTS MCP Server.*