Advanced TTS MCP Server


A high-quality, feature-rich Text-to-Speech MCP server with native TypeScript implementation. Designed for professional applications requiring natural, expressive speech synthesis with advanced controls and zero external dependencies.

✨ Features

🎯 Advanced Voice Control

  • 10 High-Quality Voices - Male and female voices with distinct personalities

  • Emotion Control - Neutral, happy, excited, calm, serious, casual, confident

  • Dynamic Pacing - Natural, conversational, presentation, tutorial, and narrative modes, plus fast and slow variants

  • Speed & Volume - Precise control from 0.25x to 3.0x speed, 0.1x to 2.0x volume

🚀 Professional Capabilities

  • Streaming Audio - Real-time synthesis and playback

  • Batch Processing - Handle multiple text segments efficiently

  • Multiple Formats - WAV, MP3, FLAC, OGG output support

  • Natural Speech Enhancement - Automatic pause insertion and emotion markers

  • Queue Management - Handle multiple concurrent requests

🔧 MCP Integration

  • 6 Powerful Tools - Complete synthesis, batch processing, voice management

  • 2 Rich Resources - Voice capabilities and usage examples

  • Real-time Status - Track processing progress and manage requests

  • File Management - Save, list, and organize audio outputs

🚀 Quick Start

Option 1: Deploy to Smithery.ai (Recommended)

🎯 One-Click Deployment to Smithery Platform

  1. Deploy Now: Visit Smithery.ai and import this repository

  2. Configure: Set your preferred voice and speech settings

  3. Use Instantly: Access via Claude Desktop or any MCP-compatible client

Benefits:

  • ✅ Zero setup required

  • ✅ Automatic scaling and updates

  • ✅ No model downloads needed

  • ✅ Enterprise-grade hosting

📋 Full Smithery Deployment Guide →

Option 2: Local Installation

Prerequisites:

  • Node.js 18+

Installation:

  1. Clone the repository

    git clone https://github.com/samihalawa/advanced-tts-mcp.git
    cd advanced-tts-mcp

  2. Install dependencies

    npm install

  3. Configure Claude Desktop

Add to your claude_desktop_config.json:

    {
      "mcpServers": {
        "advanced-tts": {
          "command": "node",
          "args": ["dist/index.js"],
          "cwd": "/path/to/advanced-tts-mcp"
        }
      }
    }

  4. Start using!

    # Build TypeScript
    npm run build

    # Start server
    npm start

Restart Claude Desktop and start synthesizing with natural, expressive voices.
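
If you would rather not edit claude_desktop_config.json by hand, the entry shown above can be generated. The following is a minimal sketch, not part of this project: the helper names `build_server_entry` and `merge_into_config` are hypothetical, and it assumes the same `command`, `args`, and `cwd` keys used in the configuration above.

```python
import json
from pathlib import Path


def build_server_entry(repo_dir: str) -> dict:
    """Return the "advanced-tts" entry with an absolute working directory."""
    return {
        "command": "node",
        "args": ["dist/index.js"],
        "cwd": str(Path(repo_dir).resolve()),
    }


def merge_into_config(existing: dict, repo_dir: str) -> dict:
    """Add or replace the advanced-tts entry, leaving other servers untouched."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers["advanced-tts"] = build_server_entry(repo_dir)
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Print the merged configuration; writing it to disk is left to the user.
    print(json.dumps(merge_into_config({"mcpServers": {}}, "."), indent=2))
```

Resolving `cwd` to an absolute path avoids depending on the directory Claude Desktop happens to start from.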

🎙️ Available Voices

| Voice ID | Name | Gender | Description |
|----------|------|--------|-------------|
| af_heart | Heart | Female | Warm, friendly voice (default) |
| af_sky | Sky | Female | Clear, bright voice |
| af_bella | Bella | Female | Elegant, sophisticated voice |
| af_sarah | Sarah | Female | Professional, confident voice |
| af_nicole | Nicole | Female | Gentle, soothing voice |
| am_adam | Adam | Male | Strong, authoritative voice |
| am_michael | Michael | Male | Friendly, approachable voice |
| bf_emma | Emma | Female | Young, energetic voice |
| bf_isabella | Isabella | Female | Mature, expressive voice |
| bm_lewis | Lewis | Male | Deep, resonant voice |
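
A client may want to validate a voice ID before sending a request. The sketch below copies the voice list above into a lookup table; `VOICES`, `voices_by_gender`, and `resolve_voice` are hypothetical names for illustration, not part of the server's API. The get_voices tool remains the authoritative source.

```python
from typing import Dict, List, Optional, Tuple

# Voice ID -> (name, gender), copied from the voice list in this README.
VOICES: Dict[str, Tuple[str, str]] = {
    "af_heart": ("Heart", "Female"),
    "af_sky": ("Sky", "Female"),
    "af_bella": ("Bella", "Female"),
    "af_sarah": ("Sarah", "Female"),
    "af_nicole": ("Nicole", "Female"),
    "am_adam": ("Adam", "Male"),
    "am_michael": ("Michael", "Male"),
    "bf_emma": ("Emma", "Female"),
    "bf_isabella": ("Isabella", "Female"),
    "bm_lewis": ("Lewis", "Male"),
}
DEFAULT_VOICE = "af_heart"


def voices_by_gender(gender: str) -> List[str]:
    """Return voice IDs whose gender matches, ignoring case."""
    wanted = gender.lower()
    return [vid for vid, (_, g) in VOICES.items() if g.lower() == wanted]


def resolve_voice(voice_id: Optional[str]) -> str:
    """Fall back to the default voice; reject IDs that are not listed."""
    if voice_id is None:
        return DEFAULT_VOICE
    if voice_id not in VOICES:
        raise ValueError(f"unknown voice_id: {voice_id!r}")
    return voice_id
```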

📚 Usage Examples

Basic Synthesis

    # Simple text-to-speech
    await synthesize_speech(
        text="Hello! Welcome to Advanced TTS.",
        voice_id="af_heart"
    )

Emotional Expression

    # Excited announcement
    await synthesize_speech(
        text="This is amazing news! You're going to love this new feature!",
        voice_id="af_heart",
        emotion="excited",
        pacing="conversational",
        speed=1.1
    )

Professional Presentation

    # Tutorial narration
    await synthesize_speech(
        text="Step one: Open your browser. Step two: Navigate to the website.",
        voice_id="am_adam",
        emotion="calm",
        pacing="tutorial",
        speed=0.9
    )

Batch Processing

    # Multiple segments with pauses
    await batch_synthesize(
        segments=[
            "Welcome to our presentation.",
            "Today we'll cover three main topics.",
            "Let's begin with the first topic."
        ],
        voice_id="af_sarah",
        emotion="confident",
        pacing="presentation",
        merge_output=True,
        segment_pause=1.0,
        save_file=True
    )

🛠️ Available Tools

synthesize_speech

Convert text to natural speech with full control over voice characteristics.

Parameters:

  • text - Text to synthesize (max 10,000 chars)

  • voice_id - Voice selection (see table above)

  • speed - Speech rate (0.25-3.0)

  • emotion - Voice emotion (neutral, happy, excited, calm, serious, casual, confident)

  • pacing - Speech style (natural, conversational, presentation, tutorial, narrative, fast, slow)

  • volume - Audio volume (0.1-2.0)

  • output_format - File format (wav, mp3, flac, ogg)

  • save_file - Save to file (boolean)

  • filename - Custom filename
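
The limits listed above can be checked on the client side before a request is sent. This is a minimal sketch under stated assumptions: `SynthesisRequest` is a hypothetical class, not the server's own model, and every default other than the af_heart voice (speed 1.0, neutral emotion, natural pacing, volume 1.0, wav output) is assumed rather than documented. Rejecting empty text is also an assumption.

```python
from dataclasses import dataclass

EMOTIONS = {"neutral", "happy", "excited", "calm", "serious", "casual", "confident"}
PACING_STYLES = {"natural", "conversational", "presentation", "tutorial",
                 "narrative", "fast", "slow"}
OUTPUT_FORMATS = {"wav", "mp3", "flac", "ogg"}
MAX_TEXT_CHARS = 10_000


@dataclass(frozen=True)
class SynthesisRequest:
    text: str
    voice_id: str = "af_heart"   # documented default voice
    speed: float = 1.0           # assumed default
    emotion: str = "neutral"     # assumed default
    pacing: str = "natural"      # assumed default
    volume: float = 1.0          # assumed default
    output_format: str = "wav"   # assumed default

    def __post_init__(self) -> None:
        # Each check mirrors one documented parameter limit.
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if len(self.text) > MAX_TEXT_CHARS:
            raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
        if not 0.25 <= self.speed <= 3.0:
            raise ValueError("speed must be between 0.25 and 3.0")
        if not 0.1 <= self.volume <= 2.0:
            raise ValueError("volume must be between 0.1 and 2.0")
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {self.emotion!r}")
        if self.pacing not in PACING_STYLES:
            raise ValueError(f"unknown pacing: {self.pacing!r}")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unknown output_format: {self.output_format!r}")
```

Validating in `__post_init__` means an invalid request cannot be constructed, so errors surface before any network call.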

batch_synthesize

Process multiple text segments efficiently with optional merging.

Parameters:

  • segments - List of text segments

  • merge_output - Combine into single file

  • segment_pause - Pause between segments (0.0-5.0s)

  • All synthesis parameters from above
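
To illustrate what merge_output and segment_pause describe, the sketch below concatenates WAV segments with silence between them using only Python's standard `wave` module. It is not the server's implementation: `merge_wav_segments` is a hypothetical helper, and it assumes uncompressed PCM segments that share the same channel count, sample width, and sample rate.

```python
import io
import wave
from typing import Sequence, Tuple


def _read_wav(data: bytes) -> Tuple[Tuple[int, int, int], bytes]:
    """Return ((channels, sample_width, sample_rate), raw PCM frames)."""
    with wave.open(io.BytesIO(data), "rb") as reader:
        p = reader.getparams()
        return (p.nchannels, p.sampwidth, p.framerate), reader.readframes(p.nframes)


def merge_wav_segments(segments: Sequence[bytes], segment_pause: float) -> bytes:
    """Join WAV byte strings, inserting segment_pause seconds of silence between them."""
    if not segments:
        raise ValueError("at least one segment is required")
    if not 0.0 <= segment_pause <= 5.0:
        raise ValueError("segment_pause must be between 0.0 and 5.0 seconds")

    decoded = [_read_wav(s) for s in segments]
    fmt = decoded[0][0]
    if any(f != fmt for f, _ in decoded):
        raise ValueError("all segments must share the same audio format")

    channels, width, rate = fmt
    # 8-bit WAV is unsigned, so silence is 0x80; wider samples are signed (zero).
    silent_byte = b"\x80" if width == 1 else b"\x00"
    silence = silent_byte * (int(round(segment_pause * rate)) * width * channels)

    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(width)
        writer.setframerate(rate)
        for index, (_, frames) in enumerate(decoded):
            if index > 0:
                writer.writeframes(silence)
            writer.writeframes(frames)
    return out.getvalue()
```

With three segments and `segment_pause=1.0`, the merged file is two seconds longer than the segments combined, because silence is inserted only between segments.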

get_voices

Retrieve complete voice information and capabilities.

get_status

Check processing status for synthesis requests.

cancel_request

Cancel active synthesis operations.

list_output_files

Browse saved audio files with metadata.
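
MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below only builds the request body; `tool_call_request` is a hypothetical helper, and transport, session initialization, and response handling are left to whichever MCP client library you use.

```python
import json
from itertools import count

# Monotonic request IDs, as JSON-RPC expects a unique id per pending request.
_request_ids = count(1)


def tool_call_request(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request for the named tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


if __name__ == "__main__":
    print(tool_call_request(
        "synthesize_speech",
        {"text": "Hello! Welcome to Advanced TTS.", "voice_id": "af_heart"},
    ))
```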

🎛️ Voice Controls

Emotions

  • Neutral - Standard, professional tone

  • Happy - Upbeat, cheerful expression

  • Excited - Enthusiastic, energetic delivery

  • Calm - Relaxed, soothing tone

  • Serious - Formal, authoritative delivery

  • Casual - Relaxed, conversational style

  • Confident - Assured, professional tone

Pacing Styles

  • Natural - Balanced, human-like rhythm

  • Conversational - Casual discussion pace

  • Presentation - Professional speaking rhythm

  • Tutorial - Educational, clear delivery

  • Narrative - Storytelling pace

  • Fast - Quick delivery (1.2x base speed)

  • Slow - Deliberate delivery (0.8x base speed)
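
The fast and slow multipliers suggest how pacing and the speed parameter might combine. The following sketch is an assumption, not the server's documented behavior: it multiplies the requested speed by the pacing factor, treats every other pacing style as 1.0x, and clamps the result to the 0.25-3.0 speed range. `effective_speed` is a hypothetical name.

```python
# Only the fast and slow factors are documented; other styles are assumed to be 1.0x.
PACING_MULTIPLIER = {"fast": 1.2, "slow": 0.8}
MIN_SPEED, MAX_SPEED = 0.25, 3.0


def effective_speed(speed: float, pacing: str) -> float:
    """Scale speed by the pacing factor and clamp it to the supported range."""
    rate = speed * PACING_MULTIPLIER.get(pacing, 1.0)
    return max(MIN_SPEED, min(MAX_SPEED, rate))
```

Under these assumptions, a request with speed 3.0 and fast pacing stays at 3.0 rather than exceeding the documented maximum.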

🎵 Audio Formats

| Format | Quality | Use Case |
|--------|---------|----------|
| WAV | Uncompressed | Highest quality, editing |
| MP3 | Compressed | Web, streaming, sharing |
| FLAC | Lossless | Archival, high-quality storage |
| OGG | Compressed | Open source alternative |

🔧 Configuration

Environment Variables

    # Model paths (optional)
    KOKORO_MODEL_PATH=./kokoro-v1.0.onnx
    KOKORO_VOICES_PATH=./voices-v1.0.bin

    # Output settings
    TTS_OUTPUT_DIR=./audio_output
    TTS_MAX_QUEUE_SIZE=100

    # Audio settings
    TTS_DEFAULT_VOICE=af_heart
    TTS_ENABLE_STREAMING=true

Server Configuration

    config = ServerConfig(
        model_path="./kokoro-v1.0.onnx",
        voices_path="./voices-v1.0.bin",
        output_dir="./audio_output",
        max_queue_size=100,
        enable_streaming=True,
        default_voice="af_heart"
    )
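
The environment variables and the configuration fields above line up one to one. The sketch below shows one way that mapping could be read. `Settings` and `load_settings` are hypothetical stand-ins, not the project's `ServerConfig`, and the fallback values are the example values from this README, assumed here to be the defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    model_path: str
    voices_path: str
    output_dir: str
    max_queue_size: int
    default_voice: str
    enable_streaming: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build settings from environment variables, falling back to README values."""
    env = os.environ if env is None else env
    streaming = env.get("TTS_ENABLE_STREAMING", "true").strip().lower()
    return Settings(
        model_path=env.get("KOKORO_MODEL_PATH", "./kokoro-v1.0.onnx"),
        voices_path=env.get("KOKORO_VOICES_PATH", "./voices-v1.0.bin"),
        output_dir=env.get("TTS_OUTPUT_DIR", "./audio_output"),
        max_queue_size=int(env.get("TTS_MAX_QUEUE_SIZE", "100")),
        default_voice=env.get("TTS_DEFAULT_VOICE", "af_heart"),
        # Accept a few common truthy spellings; anything else disables streaming.
        enable_streaming=streaming in {"1", "true", "yes", "on"},
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to test without touching the real environment.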

🏗️ Architecture

    ├── src/advanced_tts/
    │   ├── __init__.py      # Package initialization
    │   ├── server.py        # MCP server implementation
    │   ├── engine.py        # Kokoro TTS engine wrapper
    │   ├── models.py        # Data models and validation
    │   └── utils.py         # Utility functions
    ├── pyproject.toml       # Project configuration
    ├── README.md            # Documentation
    └── LICENSE              # MIT License

🤝 Contributing

Contributions welcome! Areas for improvement:

  • Additional voice models

  • Real-time streaming synthesis

  • Advanced audio effects

  • Multi-language support

  • Performance optimizations

📄 License

MIT License - see LICENSE for details.

🙏 Acknowledgments

  • Kokoro TTS - High-quality neural voice synthesis

  • MCP Protocol - Seamless AI model integration

  • FastMCP - Efficient server framework



Transform your text into natural, expressive speech with Advanced TTS MCP Server.
