Text to Speech MCP Server

by JosefGold

🎤 Text to Speech MCP Server

Where your agent finally learns to speak up for itself

Welcome to the Text to Speech (TTS) MCP Server – a sophisticated yet charmingly chaotic text-to-speech MCP server that transforms your boring written words into magnificent audible experiences.
Because who needs human vocal cords when you have Python and some very fancy AI models?

🚀 What Does This Do?

This delightful contraption takes your text and makes it speak through your computer's speakers using OpenAI's cutting-edge TTS models. It's like having a personal narrator, except they never get tired, never ask for coffee breaks, and never judge your terrible programming jokes.

Features That Actually Matter

  • Speak MCP Tool: Gives your agent the ability to voice any given text in one of several available voices

  • Instructions for Delivery: Provide optional instructions to guide delivery, character, pacing, tone, and emotion

  • Model Selection: OpenAI TTS model can be configured via environment variables (default: gpt-4o-mini-tts)

  • Blocking/Non-Blocking Mode: Speak commands can either return immediately for continued agent operation while sound is playing (default) or return only after the sound finishes for a more controlled workflow

  • Queue-Based Audio Playback: Agents can queue up messages to wait patiently in line and be played in sequence
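The interplay between the queue and the blocking/non-blocking modes can be sketched as below. This is a minimal illustration, not the server's actual code: the `SpeechQueue` class and the `play` callable are hypothetical names, and a plain list append stands in for real synthesis and playback.

```python
import queue
import threading


class SpeechQueue:
    """Plays queued messages one at a time on a background thread."""

    def __init__(self, play):
        # `play` is a hypothetical callable that synthesizes and plays one
        # message and returns only when playback has finished.
        self._play = play
        self._items = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # Single worker: messages are played strictly in arrival order.
        while True:
            text, done = self._items.get()
            try:
                self._play(text)
            finally:
                done.set()  # wake a caller waiting in blocking mode

    def speak(self, text, blocking=False):
        done = threading.Event()
        self._items.put((text, done))
        if blocking:
            done.wait()  # return only after this message has been played
        return done


played = []
speech = SpeechQueue(played.append)      # stand-in for real audio playback
speech.speak("first")                    # non-blocking: returns immediately
speech.speak("second", blocking=True)    # returns once "second" has played
print(played)  # ['first', 'second']
```

Because a single worker drains the queue, a blocking call also implies that every message queued before it has finished playing.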

🛠️ Installation & Setup

Prerequisites

  • Python 3.10+

  • An OpenAI API key (the magic ingredient)

  • PortAudio (required for PyAudio to work properly)

  • A sense of humor (optional but recommended)

Quick Start

  1. Install PortAudio:

    # macOS
    brew install portaudio
    # Linux (Debian/Ubuntu)
    sudo apt-get install portaudio19-dev
    # Windows
    pip install pipwin && pipwin install pyaudio
  2. Clone this repository:

    git clone <your-repo-url>
    cd tts-mcp
  3. Create a virtual environment (because global installs are for rebels):

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  4. Install dependencies:

    pip install -r requirements.txt
  5. Set up your environment variables:

    cp env.template .env
    # Edit .env and add your OpenAI API key

    Or set directly:

    export OPENAI_API_KEY="your-secret-key-here"
  6. Configure MCP in your Cursor settings with the provided mcp-config.json. Example:

    {
      "mcpServers": {
        "tts-server": {
          "command": "/absolute/path/to/tts-mcp/.venv/bin/python",
          "args": ["/absolute/path/to/tts-mcp/tts_mcp_server.py"],
          "cwd": "/absolute/path/to/tts-mcp",
          "env": {
            "PYTHONPATH": "/absolute/path/to/tts-mcp"
          }
        }
      }
    }

    Replace paths with your local repo and venv.

  7. Start making your computer talk!
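If you would rather not hand-edit the paths in step 6, the config can be generated from the repository location. The sketch below is only a convenience, assuming the layout shown in the example config; `build_mcp_config` is a hypothetical helper, not something shipped with the repo.

```python
import json
from pathlib import PurePosixPath


def build_mcp_config(repo_dir):
    """Return the MCP config dict for a checkout located at `repo_dir`."""
    repo = PurePosixPath(repo_dir)
    return {
        "mcpServers": {
            "tts-server": {
                # Interpreter inside the virtual environment from step 3.
                "command": str(repo / ".venv" / "bin" / "python"),
                "args": [str(repo / "tts_mcp_server.py")],
                "cwd": str(repo),
                "env": {"PYTHONPATH": str(repo)},
            }
        }
    }


config = build_mcp_config("/absolute/path/to/tts-mcp")
print(json.dumps(config, indent=2))
```

The sketch assumes a POSIX-style layout. On Windows the virtual environment's interpreter normally lives at `.venv\Scripts\python.exe`, so the `command` entry would need adjusting.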

🎭 Voice Options

Choose your narrator wisely:

  • alloy: Neutral, balanced tone (default)

  • ash: Warm, expressive; friendly support vibes

  • ballad: Smooth narrator; long-form storytelling

  • coral: Bright, upbeat; cheerful promos

  • echo: Clear and professional, like a news anchor

  • fable: Warm and storytelling, perfect for bedtime code reviews

  • onyx: Deep and authoritative, for when your code needs to sound important

  • nova: Bright and energetic, like your enthusiasm before debugging

  • sage: Calm, measured; helpful explainer

  • shimmer: Soft and gentle, for when you need to break bad news about production bugs

  • verse: Dramatic, theatrical; trailer read

🎪 Usage Examples

Basic Usage

    # Non-blocking (default) - returns immediately
    speak("Hello, world! I'm now audible!")

    # Blocking - waits for completion
    speak("This message will finish before I return", blocking=True)

    # With specific voice
    speak("I'm feeling dramatic today!", voice="fable")

    # With delivery instructions
    speak(
        "You're doing great—let's take this one step at a time.",
        voice="shimmer",
        instructions="Speak in a warm, reassuring and unhurried tone and pace"
    )

In Cursor with MCP

Just tell Cursor to use the speak tool in your conversations.
You can suggest a voice and style instructions for maintaining a consistent character.

⚙️ Configuration

Environment variables:

  • OPENAI_API_KEY (required): Your OpenAI API key

  • TTS_MODEL (optional): Defaults to gpt-4o-mini-tts. Other options include tts-1 and tts-1-hd, though those models do not support "instructions" or some of the voices.

  • LOG_LEVEL (optional): DEBUG, INFO (default), WARNING, ERROR
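One way these variables could be read and applied is sketched below. It is an illustration under stated assumptions rather than the server's real logic: `load_settings` and `build_request` are hypothetical names, the fallback to INFO for an unrecognized LOG_LEVEL is an assumption, and the request keys simply mirror the `model`, `voice`, `input`, and `instructions` parameters of OpenAI's speech endpoint.

```python
import os

LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}
# Models that, per the note above, do not support "instructions".
NO_INSTRUCTIONS = {"tts-1", "tts-1-hd"}


def load_settings(env=os.environ):
    """Read the three documented variables, applying the documented defaults."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    log_level = env.get("LOG_LEVEL", "INFO").upper()
    if log_level not in LOG_LEVELS:
        log_level = "INFO"  # assumption: fall back rather than fail
    return {
        "api_key": api_key,
        "model": env.get("TTS_MODEL", "gpt-4o-mini-tts"),
        "log_level": log_level,
    }


def build_request(settings, text, voice="alloy", instructions=None):
    """Assemble speech request parameters, dropping unsupported instructions."""
    request = {"model": settings["model"], "voice": voice, "input": text}
    if instructions and settings["model"] not in NO_INSTRUCTIONS:
        request["instructions"] = instructions
    return request


settings = load_settings({"OPENAI_API_KEY": "sk-example", "TTS_MODEL": "tts-1"})
request = build_request(settings, "Hello", voice="echo", instructions="Calm")
print(request)  # {'model': 'tts-1', 'voice': 'echo', 'input': 'Hello'}
```

Dropping `instructions` silently keeps a tts-1 setup working; a stricter variant could raise an error instead so the mismatch is visible.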

🧰 Troubleshooting

  • No audio / no default output device:

    • Set a system default output device and restart the MCP server.

    • macOS: System Settings → Sound → Output.

  • PyAudio install issues:

    • macOS: brew install portaudio then pip install -r requirements.txt

    • Linux (Debian/Ubuntu): sudo apt-get install portaudio19-dev then pip install pyaudio

    • Windows: pip install pipwin && pipwin install pyaudio

  • Missing API key:

    • Ensure .env contains OPENAI_API_KEY=... or export it in your shell.

  • High latency or choppy audio:

    • Close other audio apps; verify system output device; keep blocking=False if you need responsiveness.

  • Logs:

    • Logs stream to stderr and to tts_mcp_server.log. Tail with:

      tail -f tts_mcp_server.log

🙏 Acknowledgments

  • Cursor for writing 95% of the code here

  • Coffee, for making everything else possible


Remember: With great text-to-speech power comes great responsibility. Use your new vocal abilities wisely, and try not to annoy your coworkers too much.

Pro tip: If your computer starts talking back to you without being prompted, it might be time to take a break. Or update your Python version. Probably the latter.

This project is licensed under the BSD 3-Clause License.
