Text to Speech MCP Server

by JosefGold
# 🎤 Text to Speech MCP Server

*Where your agent finally learns to speak up for itself*

Welcome to the Text to Speech (TTS) MCP Server – a sophisticated yet charmingly chaotic text-to-speech MCP server that transforms your boring written words into magnificent audible experiences. Because who needs human vocal cords when you have Python and some very fancy AI models?

## 🚀 What Does This Do?

This delightful contraption takes your text and reads it aloud through your computer's speakers using OpenAI's cutting-edge TTS models. It's like having a personal narrator, except they never get tired, never ask for coffee breaks, and never judge your terrible programming jokes.

### Features That Actually Matter

- **Speak MCP Tool**: Gives your agent the ability to voice any text in one of several available voices
- **Instructions for Delivery**: Pass optional `instructions` to guide delivery, character, pacing, tone, and emotion
- **Model Selection**: The OpenAI TTS model is configurable via environment variables (default: `gpt-4o-mini-tts`)
- **Blocking/Non-Blocking Mode**: By default, `speak` returns immediately so the agent can keep working while the audio plays; with `blocking=True` it returns only after playback finishes, for a more controlled workflow
- **Queue-Based Audio Playback**: Agents can queue up several messages, which wait patiently in line and play in sequence

## 🛠️ Installation & Setup

### Prerequisites

- Python 3.10+
- An OpenAI API key (the magic ingredient)
- PortAudio (required for PyAudio to work properly)
- A sense of humor (optional but recommended)

### Quick Start

1. Install PortAudio:

   ```bash
   # macOS
   brew install portaudio
   ```

   ```bash
   # Linux (Debian/Ubuntu)
   sudo apt-get install portaudio19-dev
   ```

   ```bash
   # Windows
   pip install pipwin && pipwin install pyaudio
   ```

2. Clone this repository:

   ```bash
   git clone <your-repo-url>
   cd tts-mcp
   ```

3. Create a virtual environment (because global installs are for rebels):

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

4. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

5. Set up your environment variables:

   ```bash
   cp env.template .env
   # Edit .env and add your OpenAI API key
   ```

   Or set the key directly in your shell:

   ```bash
   export OPENAI_API_KEY="your-secret-key-here"
   ```

6. Configure MCP in your Cursor settings using the provided `mcp-config.json`. Example:

   ```json
   {
     "mcpServers": {
       "tts-server": {
         "command": "/absolute/path/to/tts-mcp/.venv/bin/python",
         "args": ["/absolute/path/to/tts-mcp/tts_mcp_server.py"],
         "cwd": "/absolute/path/to/tts-mcp",
         "env": {
           "PYTHONPATH": "/absolute/path/to/tts-mcp"
         }
       }
     }
   }
   ```

   > _Replace the paths with your local repository and virtual environment._

7. Start making your computer talk!

## 🎭 Voice Options

Choose your narrator wisely:

- **alloy**: Neutral, balanced tone (default)
- **ash**: Warm and expressive; friendly support vibes
- **ballad**: Smooth narrator; great for long-form storytelling
- **coral**: Bright and upbeat; cheerful promos
- **echo**: Clear and professional, like a news anchor
- **fable**: Warm storyteller, perfect for bedtime code reviews
- **nova**: Bright and energetic, like your enthusiasm before debugging
- **onyx**: Deep and authoritative, for when your code needs to sound important
- **sage**: Calm and measured; a helpful explainer
- **shimmer**: Soft and gentle, for when you need to break bad news about production bugs
- **verse**: Dramatic and theatrical; movie-trailer reads

## 🎪 Usage Examples

### Basic Usage

```python
# Illustrative calls to the `speak` MCP tool (made by the agent through MCP,
# not imported as a Python function)

# Non-blocking (default): returns immediately
speak("Hello, world! I'm now audible!")

# Blocking: waits until playback completes
speak("This message will finish before I return", blocking=True)

# With a specific voice
speak("I'm feeling dramatic today!", voice="fable")

# With delivery instructions
speak(
    "You're doing great—let's take this one step at a time.",
    voice="shimmer",
    instructions="Speak in a warm, reassuring, unhurried tone and pace",
)
```

### In Cursor with MCP

Just ask Cursor to use the `speak` tool in your conversations. You can suggest a voice and style instructions to keep a consistent character.

## ⚙️ Configuration

Environment variables:

- `OPENAI_API_KEY` (required): Your OpenAI API key
- `TTS_MODEL` (optional): Defaults to `gpt-4o-mini-tts`. Other options include `tts-1` and `tts-1-hd`, but those models do not support `instructions` or some of the voices
- `LOG_LEVEL` (optional): `DEBUG`, `INFO` (default), `WARNING`, or `ERROR`

## 🧰 Troubleshooting

- No audio / no default output device:
  - Set a system default output device and restart the MCP server.
  - macOS: System Settings → Sound → Output.
- PyAudio install issues:
  - macOS: `brew install portaudio`, then `pip install -r requirements.txt`
  - Linux (Debian/Ubuntu): `sudo apt-get install portaudio19-dev`, then `pip install pyaudio`
  - Windows: `pip install pipwin && pipwin install pyaudio`
- Missing API key:
  - Ensure `.env` contains `OPENAI_API_KEY=...`, or export it in your shell.
- High latency or choppy audio:
  - Close other audio apps, verify the system output device, and keep `blocking=False` if you need responsiveness.
- Logs:
  - Logs stream to stderr and to `tts_mcp_server.log`. Tail them with:

    ```bash
    tail -f tts_mcp_server.log
    ```

## 🙏 Acknowledgments

- Cursor, for writing 95% of the code here
- Coffee, for making everything else possible

---

*Remember: With great text-to-speech power comes great responsibility. Use your new vocal abilities wisely, and try not to annoy your coworkers too much.*

**Pro tip**: If your computer starts talking back to you without being prompted, it might be time to take a break. Or update your Python version. Probably the latter.

_This project is licensed under the BSD 3-Clause License. See the [LICENSE](./LICENSE) file for details._
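
This README describes queue-based playback with a blocking/non-blocking switch but does not show how the server implements it. As an illustration only, here is a minimal sketch of that pattern, assuming a single background worker thread that drains a `queue.Queue`. `PlaybackQueue` and the injected `play` callable are hypothetical names, not this project's API; a real `play` would call the OpenAI TTS API and write the audio to PyAudio.

```python
import logging
import queue
import threading
from typing import Callable, Tuple


class PlaybackQueue:
    """Hypothetical sketch: plays queued messages one at a time, in order."""

    def __init__(self, play: Callable[[str], None]) -> None:
        # `play` stands in for "synthesize speech and play it"; injecting it
        # keeps this sketch runnable without OpenAI or PyAudio.
        self._play = play
        self._items: "queue.Queue[Tuple[str, threading.Event]]" = queue.Queue()
        # A single daemon worker guarantees sequential, first-in-first-out playback.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self) -> None:
        while True:
            text, done = self._items.get()
            try:
                self._play(text)
            except Exception:
                # Keep the worker alive if a single message fails to play.
                logging.getLogger(__name__).exception("Playback failed for %r", text)
            finally:
                done.set()  # release a caller blocked on this message
                self._items.task_done()

    def speak(self, text: str, blocking: bool = False) -> None:
        """Enqueue `text`; if `blocking`, return only after it has been played."""
        done = threading.Event()
        self._items.put((text, done))
        if blocking:
            done.wait()

    def join(self) -> None:
        """Wait until every queued message has been played."""
        self._items.join()
```

Using exactly one worker is what keeps messages in order: each item starts only after the previous one finishes, and a blocking caller simply waits on the event attached to its own message.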
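
The Configuration section lists three environment variables and their defaults. The following is a hedged sketch of how a server might read them, assuming the values are already in the process environment (loading `.env` is left out here). `TTSConfig`, `load_config`, and `supports_instructions` are hypothetical helpers; the model names and defaults come from this README.

```python
import logging
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_MODEL = "gpt-4o-mini-tts"
# Per the Configuration section, these models don't support `instructions`.
MODELS_WITHOUT_INSTRUCTIONS = frozenset({"tts-1", "tts-1-hd"})
LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


@dataclass(frozen=True)
class TTSConfig:
    api_key: str
    model: str
    log_level: int  # a `logging` level constant, e.g. logging.INFO


def load_config(env: Optional[Mapping[str, str]] = None) -> TTSConfig:
    """Hypothetical helper: read the documented variables, applying defaults."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or export it")
    # An empty or missing TTS_MODEL falls back to the default model.
    model = env.get("TTS_MODEL", "").strip() or DEFAULT_MODEL
    level_name = env.get("LOG_LEVEL", "INFO").strip().upper()
    if level_name not in LOG_LEVELS:
        raise ValueError(f"LOG_LEVEL must be one of {LOG_LEVELS}, got {level_name!r}")
    return TTSConfig(api_key, model, getattr(logging, level_name))


def supports_instructions(model: str) -> bool:
    """True unless the model is one the README lists as lacking `instructions`."""
    return model not in MODELS_WITHOUT_INSTRUCTIONS
```

With a check like `supports_instructions(config.model)`, a caller could drop or warn about `instructions` up front rather than sending a parameter the chosen model does not support.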
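
The Troubleshooting notes say logs go to stderr and to `tts_mcp_server.log`. Presumably stderr is used because, when an MCP server runs over the stdio transport (as in the Cursor config in Quick Start), stdout is reserved for protocol messages. A minimal sketch of that dual-handler setup with the standard `logging` module; `setup_logging` and the logger name `tts_mcp` are assumptions, not taken from the project.

```python
import logging
import sys

LOG_FILE = "tts_mcp_server.log"  # file name from the Troubleshooting section


def setup_logging(level: int = logging.INFO, log_file: str = LOG_FILE) -> logging.Logger:
    """Hypothetical helper: send log records to stderr and to a log file."""
    logger = logging.getLogger("tts_mcp")  # logger name is an assumption
    logger.setLevel(level)
    # Remove and close old handlers so repeated calls don't duplicate output.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    logger.propagate = False  # don't also pass records to the root logger
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    handlers = [
        # stderr, never stdout: with the stdio transport, stdout carries MCP messages.
        logging.StreamHandler(sys.stderr),
        # delay=True opens the file only when the first record is written.
        logging.FileHandler(log_file, encoding="utf-8", delay=True),
    ]
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Mapping `LOG_LEVEL` onto the `level` argument (for example `getattr(logging, "DEBUG")`) would connect this to the Configuration section.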
