Advanced TTS MCP Server

A high-quality, feature-rich Text-to-Speech MCP server with a native TypeScript implementation. It is designed for professional applications that require natural, expressive speech synthesis with advanced controls and zero external dependencies.

✨ Features

🎯 Advanced Voice Control

  • 10 High-Quality Voices - Male and female voices with distinct personalities

  • Emotion Control - Neutral, happy, excited, calm, serious, casual, confident

  • Dynamic Pacing - Natural, conversational, presentation, tutorial, narrative modes

  • Speed & Volume - Precise control from 0.25x to 3.0x speed, 0.1x to 2.0x volume

🚀 Professional Capabilities

  • Streaming Audio - Real-time synthesis and playback

  • Batch Processing - Handle multiple text segments efficiently

  • Multiple Formats - WAV, MP3, FLAC, OGG output support

  • Natural Speech Enhancement - Automatic pause insertion and emotion markers

  • Queue Management - Handle multiple concurrent requests

🔧 MCP Integration

  • 6 Powerful Tools - Complete synthesis, batch processing, voice management

  • 2 Rich Resources - Voice capabilities and usage examples

  • Real-time Status - Track processing progress and manage requests

  • File Management - Save, list, and organize audio outputs

🚀 Quick Start

🎯 One-Click Deployment to Smithery Platform

  1. Deploy Now: Visit Smithery.ai and import this repository

  2. Configure: Set your preferred voice and speech settings

  3. Use Instantly: Access via Claude Desktop or any MCP-compatible client

Benefits:

  • ✅ Zero setup required

  • ✅ Automatic scaling and updates

  • ✅ No model downloads needed

  • ✅ Enterprise-grade hosting

📋 Full Smithery Deployment Guide →

Option 2: Local Installation

Prerequisites:

  • Node.js 18+

Installation:

  1. Clone the repository

git clone https://github.com/samihalawa/advanced-tts-mcp.git
cd advanced-tts-mcp
  2. Install dependencies

npm install
  3. Configure Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "advanced-tts": {
      "command": "node",
      "args": ["dist/index.js"],
      "cwd": "/path/to/advanced-tts-mcp"
    }
  }
}
  4. Build and start the server

# Build TypeScript
npm run build

# Start server
npm start

Restart Claude Desktop and start synthesizing with natural, expressive voices.
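
If claude_desktop_config.json already lists other servers, the advanced-tts entry from the configuration step has to be merged into the existing "mcpServers" object rather than pasted over it. A minimal sketch of that merge follows; `add_server_entry` is a hypothetical helper written for this illustration and is not part of the project.

```python
import json


def add_server_entry(config: dict, repo_dir: str) -> dict:
    """Return a copy of a Claude Desktop config with the advanced-tts entry added.

    Hypothetical helper: existing servers are preserved, and an existing
    "advanced-tts" entry (if any) is overwritten.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["advanced-tts"] = {
        "command": "node",
        "args": ["dist/index.js"],
        "cwd": repo_dir,  # path to the cloned repository
    }
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Example input: a config that already contains another (made-up) server.
    existing = {"mcpServers": {"other-server": {"command": "npx", "args": ["other"]}}}
    print(json.dumps(add_server_entry(existing, "/path/to/advanced-tts-mcp"), indent=2))
```

The function works on plain dictionaries, so reading and writing the actual config file is left to the caller.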

🎙️ Available Voices

| Voice ID | Name | Gender | Description |
| --- | --- | --- | --- |
| af_heart | Heart | Female | Warm, friendly voice (default) |
| af_sky | Sky | Female | Clear, bright voice |
| af_bella | Bella | Female | Elegant, sophisticated voice |
| af_sarah | Sarah | Female | Professional, confident voice |
| af_nicole | Nicole | Female | Gentle, soothing voice |
| am_adam | Adam | Male | Strong, authoritative voice |
| am_michael | Michael | Male | Friendly, approachable voice |
| bf_emma | Emma | Female | Young, energetic voice |
| bf_isabella | Isabella | Female | Mature, expressive voice |
| bm_lewis | Lewis | Male | Deep, resonant voice |

📚 Usage Examples

Basic Synthesis

# Simple text-to-speech
await synthesize_speech(
    text="Hello! Welcome to Advanced TTS.",
    voice_id="af_heart"
)
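
The call above is shorthand for an MCP tool invocation. On the wire, an MCP client sends a JSON-RPC 2.0 request with the method `tools/call`, the tool name, and its arguments. The sketch below only builds that message to show its shape; `tool_call_request` is a hypothetical helper, and in practice the MCP client (for example Claude Desktop) constructs and sends the request for you.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per request


def tool_call_request(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    request = tool_call_request(
        "synthesize_speech",
        {"text": "Hello! Welcome to Advanced TTS.", "voice_id": "af_heart"},
    )
    print(json.dumps(request, indent=2))
```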

Emotional Expression

# Excited announcement
await synthesize_speech(
    text="This is amazing news! You're going to love this new feature!",
    voice_id="af_heart",
    emotion="excited",
    pacing="conversational",
    speed=1.1
)

Professional Presentation

# Tutorial narration
await synthesize_speech(
    text="Step one: Open your browser. Step two: Navigate to the website.",
    voice_id="am_adam", 
    emotion="calm",
    pacing="tutorial",
    speed=0.9
)

Batch Processing

# Multiple segments with pauses
await batch_synthesize(
    segments=[
        "Welcome to our presentation.",
        "Today we'll cover three main topics.", 
        "Let's begin with the first topic."
    ],
    voice_id="af_sarah",
    emotion="confident",
    pacing="presentation",
    merge_output=True,
    segment_pause=1.0,
    save_file=True
)
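
Batch processing is also a natural fit for text longer than the 10,000-character limit documented for the `text` parameter. The sketch below splits a long text into sentence-aligned segments that could be passed as `segments`. It is an illustration only: `split_into_segments` is a hypothetical helper, the sentence detection is a simple punctuation-based heuristic, and applying the same limit to each batch segment is an assumption.

```python
import re

MAX_CHARS = 10_000  # limit documented for the `text` parameter


def split_into_segments(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Greedily pack whole sentences into segments of at most max_chars."""
    # Heuristic: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Splitting at sentence boundaries keeps pauses between segments from landing in the middle of a sentence.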

🛠️ Available Tools

synthesize_speech

Convert text to natural speech with full control over voice characteristics.

Parameters:

  • text - Text to synthesize (max 10,000 chars)

  • voice_id - Voice selection (see table above)

  • speed - Speech rate (0.25-3.0)

  • emotion - Voice emotion (neutral, happy, excited, calm, serious, casual, confident)

  • pacing - Speech style (natural, conversational, presentation, tutorial, narrative, fast, slow)

  • volume - Audio volume (0.1-2.0)

  • output_format - File format (wav, mp3, flac, ogg)

  • save_file - Save to file (boolean)

  • filename - Custom filename
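
The documented ranges and allowed values can be checked on the client side before a request is sent. The following is a minimal sketch, not the server's own validation: `validate_synthesis_args` is a hypothetical helper, it checks only the arguments that are present, and it assumes that an empty `text` is invalid.

```python
RANGES = {"speed": (0.25, 3.0), "volume": (0.1, 2.0)}
CHOICES = {
    "voice_id": {"af_heart", "af_sky", "af_bella", "af_sarah", "af_nicole",
                 "am_adam", "am_michael", "bf_emma", "bf_isabella", "bm_lewis"},
    "emotion": {"neutral", "happy", "excited", "calm", "serious", "casual", "confident"},
    "pacing": {"natural", "conversational", "presentation", "tutorial",
               "narrative", "fast", "slow"},
    "output_format": {"wav", "mp3", "flac", "ogg"},
}


def validate_synthesis_args(args: dict) -> list[str]:
    """Return a list of problems; an empty list means the args fit the documented limits."""
    problems: list[str] = []
    text = args.get("text")
    # Assumption: text is required and must not be empty.
    if not isinstance(text, str) or not 0 < len(text) <= 10_000:
        problems.append("text: required, 1 to 10,000 characters")
    for key, (low, high) in RANGES.items():
        if key in args and not low <= args[key] <= high:
            problems.append(f"{key}: must be between {low} and {high}")
    for key, allowed in CHOICES.items():
        if key in args and args[key] not in allowed:
            problems.append(f"{key}: must be one of {sorted(allowed)}")
    return problems
```

Returning a list of messages instead of raising on the first error lets a caller report every problem at once.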

batch_synthesize

Process multiple text segments efficiently with optional merging.

Parameters:

  • segments - List of text segments

  • merge_output - Combine into single file

  • segment_pause - Pause between segments (0.0-5.0s)

  • All synthesis parameters from above
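
When `merge_output` is enabled, `segment_pause` affects the length of the combined file. The sketch below estimates that length under the assumption that a pause is inserted only between consecutive segments (not before the first or after the last); the server's actual behavior may differ, and `merged_duration` is a hypothetical helper.

```python
def merged_duration(segment_seconds: list[float], segment_pause: float) -> float:
    """Estimate the merged output length in seconds.

    Assumption: one pause between each pair of consecutive segments.
    """
    if not 0.0 <= segment_pause <= 5.0:
        raise ValueError("segment_pause must be between 0.0 and 5.0 seconds")
    if not segment_seconds:
        return 0.0
    return sum(segment_seconds) + segment_pause * (len(segment_seconds) - 1)
```

For three segments of 2.0, 3.0, and 2.5 seconds with a 1.0-second pause, the estimate is 7.5 seconds of speech plus two pauses, or 9.5 seconds.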

get_voices

Retrieve complete voice information and capabilities.

get_status

Check processing status for synthesis requests.

cancel_request

Cancel active synthesis operations.

list_output_files

Browse saved audio files with metadata.

🎛️ Voice Controls

Emotions

  • Neutral - Standard, professional tone

  • Happy - Upbeat, cheerful expression

  • Excited - Enthusiastic, energetic delivery

  • Calm - Relaxed, soothing tone

  • Serious - Formal, authoritative delivery

  • Casual - Relaxed, conversational style

  • Confident - Assured, professional tone

Pacing Styles

  • Natural - Balanced, human-like rhythm

  • Conversational - Casual discussion pace

  • Presentation - Professional speaking rhythm

  • Tutorial - Educational, clear delivery

  • Narrative - Storytelling pace

  • Fast - Quick delivery (1.2x base speed)

  • Slow - Deliberate delivery (0.8x base speed)
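
The Fast and Slow styles are described as multipliers on the base speed, which raises the question of how they interact with an explicit `speed` value. The sketch below shows one plausible reading, stated as assumptions: the pacing multiplier is applied on top of `speed`, all other pacing styles leave the rate unchanged, and the result is clamped to the documented 0.25 to 3.0 range. `effective_speed` is a hypothetical helper, not the server's implementation.

```python
# Multipliers taken from the list above; other styles are assumed to be 1.0.
PACING_MULTIPLIER = {"fast": 1.2, "slow": 0.8}

MIN_SPEED, MAX_SPEED = 0.25, 3.0  # documented range for `speed`


def effective_speed(speed: float, pacing: str) -> float:
    """Combine speed and pacing multiplicatively, then clamp (assumed behavior)."""
    raw = speed * PACING_MULTIPLIER.get(pacing, 1.0)
    return min(MAX_SPEED, max(MIN_SPEED, raw))
```

Under these assumptions, `speed=1.0` with `pacing="fast"` yields 1.2, while `speed=3.0` with `pacing="fast"` stays capped at 3.0.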

🎵 Audio Formats

| Format | Quality | Use Case |
| --- | --- | --- |
| WAV | Uncompressed | Highest quality, editing |
| MP3 | Compressed | Web, streaming, sharing |
| FLAC | Lossless | Archival, high-quality storage |
| OGG | Compressed | Open-source alternative |
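
When saved files are served or post-processed, it helps to keep the file extension and MIME type consistent with the chosen `output_format`. The sketch below maps the four formats to commonly used MIME types and builds an output path; `output_path` is a hypothetical helper, and the server's own file naming may differ.

```python
from pathlib import Path

# Commonly used MIME types for the four supported formats.
MIME_TYPES = {
    "wav": "audio/wav",
    "mp3": "audio/mpeg",
    "flac": "audio/flac",
    "ogg": "audio/ogg",
}


def output_path(output_dir: str, filename: str, output_format: str) -> Path:
    """Return output_dir/filename with its extension replaced by the format."""
    fmt = output_format.lower()
    if fmt not in MIME_TYPES:
        raise ValueError(f"unsupported format: {output_format!r}")
    # Path.stem drops any extension the caller already put on the filename.
    return Path(output_dir) / f"{Path(filename).stem}.{fmt}"
```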

🔧 Configuration

Environment Variables

# Model paths (optional)
KOKORO_MODEL_PATH=./kokoro-v1.0.onnx
KOKORO_VOICES_PATH=./voices-v1.0.bin

# Output settings
TTS_OUTPUT_DIR=./audio_output
TTS_MAX_QUEUE_SIZE=100

# Audio settings  
TTS_DEFAULT_VOICE=af_heart
TTS_ENABLE_STREAMING=true

Server Configuration

config = ServerConfig(
    model_path="./kokoro-v1.0.onnx",
    voices_path="./voices-v1.0.bin", 
    output_dir="./audio_output",
    max_queue_size=100,
    enable_streaming=True,
    default_voice="af_heart"
)
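
The environment variables and the configuration object carry the same settings, so one can be derived from the other. The sketch below reads the variables into a small settings object. `EnvSettings` and `load_settings` are hypothetical names used for illustration (they are not the project's `ServerConfig`), and the fallback values are simply the example values shown above, which is an assumption about the real defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class EnvSettings:
    """Hypothetical container mirroring the documented settings."""
    model_path: str
    voices_path: str
    output_dir: str
    max_queue_size: int
    default_voice: str
    enable_streaming: bool


def load_settings(env: Mapping[str, str] = os.environ) -> EnvSettings:
    """Read settings from environment variables, falling back to the example values."""
    streaming = env.get("TTS_ENABLE_STREAMING", "true").strip().lower()
    return EnvSettings(
        model_path=env.get("KOKORO_MODEL_PATH", "./kokoro-v1.0.onnx"),
        voices_path=env.get("KOKORO_VOICES_PATH", "./voices-v1.0.bin"),
        output_dir=env.get("TTS_OUTPUT_DIR", "./audio_output"),
        max_queue_size=int(env.get("TTS_MAX_QUEUE_SIZE", "100")),
        default_voice=env.get("TTS_DEFAULT_VOICE", "af_heart"),
        enable_streaming=streaming in {"1", "true", "yes"},
    )
```

Passing the mapping as a parameter keeps the function easy to test without touching the real process environment.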

🏗️ Architecture

├── src/advanced_tts/
│   ├── __init__.py          # Package initialization
│   ├── server.py            # MCP server implementation
│   ├── engine.py            # Kokoro TTS engine wrapper
│   ├── models.py            # Data models and validation
│   └── utils.py             # Utility functions
├── pyproject.toml           # Project configuration
├── README.md                # Documentation
└── LICENSE                  # MIT License

🀝 Contributing

Contributions welcome! Areas for improvement:

  • Additional voice models

  • Real-time streaming synthesis

  • Advanced audio effects

  • Multi-language support

  • Performance optimizations

📄 License

MIT License - see LICENSE for details.

🙏 Acknowledgments

  • Kokoro TTS - High-quality neural voice synthesis

  • MCP Protocol - Seamless AI model integration

  • FastMCP - Efficient server framework


Transform your text into natural, expressive speech with Advanced TTS MCP Server.
