# 🎤 Text to Speech MCP Server
*Where your agent finally learns to speak up for itself*
Welcome to the Text to Speech (TTS) MCP Server: a sophisticated yet charmingly chaotic MCP server that transforms your boring written words into magnificent audible experiences.
Because who needs human vocal cords when you have Python and some very fancy AI models?
## 🚀 What Does This Do?
This delightful contraption takes your text and makes it speak through your computer's speakers using OpenAI's cutting-edge TTS models. It's like having a personal narrator, except they never get tired, never ask for coffee breaks, and never judge your terrible programming jokes.
### Features That Actually Matter
- **Speak MCP Tool**: Gives your agent the ability to voice any given text in one of several available voices
- **Instructions for Delivery**: Provide optional `instructions` to guide delivery, character, pacing, tone, and emotion
- **Model Selection**: The OpenAI TTS model is configurable via the `TTS_MODEL` environment variable (default: `gpt-4o-mini-tts`)
- **Blocking/Non-Blocking Mode**: A speak command can return immediately, so the agent keeps working while the sound plays (default), or return only after playback finishes, for a more controlled workflow
- **Queue-Based Audio Playback**: Agents can queue up messages to wait patiently in line and be played in sequence
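
To make the blocking and queueing behaviour concrete, here is a minimal sketch of how a single worker thread can play messages in order while letting callers choose whether to wait. It is not the server's actual code: `SpeechQueue` and the injected `play` callable are hypothetical names, and `play` stands in for whatever performs synthesis and audio output.

```python
import logging
import queue
import threading
from typing import Callable


class SpeechQueue:
    """Plays queued messages one at a time on a background thread."""

    def __init__(self, play: Callable[[str], None]) -> None:
        self._play = play  # hypothetical stand-in for synthesis + audio output
        self._jobs: queue.Queue = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self) -> None:
        while True:
            text, done = self._jobs.get()  # FIFO: messages play in arrival order
            try:
                self._play(text)
            except Exception:
                logging.exception("playback failed")
            finally:
                done.set()  # wake a blocking caller even if playback failed

    def speak(self, text: str, blocking: bool = False) -> None:
        done = threading.Event()
        self._jobs.put((text, done))
        if blocking:
            done.wait()  # return only after this message has been played


played = []
speech = SpeechQueue(play=played.append)
speech.speak("first")                  # returns immediately
speech.speak("second", blocking=True)  # returns once "second" has played
```

Because one worker drains a FIFO queue, a blocking call also implies that every earlier message has already finished.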
## 🛠️ Installation & Setup
### Prerequisites
- Python 3.10+
- An OpenAI API key (the magic ingredient)
- PortAudio (required for PyAudio to work properly)
- A sense of humor (optional but recommended)
### Quick Start
1. Install PortAudio:
```bash
# macOS
brew install portaudio
```
```bash
# Linux (Debian/Ubuntu)
sudo apt-get install portaudio19-dev
```
```bash
# Windows
pip install pipwin && pipwin install pyaudio
```
2. Clone this repository:
```bash
git clone <your-repo-url>
cd tts-mcp
```
3. Create a virtual environment (because global installs are for rebels):
```bash
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
```
4. Install dependencies:
```bash
pip install -r requirements.txt
```
5. Set up your environment variables:
```bash
cp env.template .env
# Edit .env and add your OpenAI API key
```
Or set directly:
```bash
export OPENAI_API_KEY="your-secret-key-here"
```
6. Configure MCP in your Cursor settings with the provided `mcp-config.json`. Example:
```json
{
  "mcpServers": {
    "tts-server": {
      "command": "/absolute/path/to/tts-mcp/.venv/bin/python",
      "args": ["/absolute/path/to/tts-mcp/tts_mcp_server.py"],
      "cwd": "/absolute/path/to/tts-mcp",
      "env": { "PYTHONPATH": "/absolute/path/to/tts-mcp" }
    }
  }
}
```
> _Replace paths with your local repo and venv._
7. Start making your computer talk!
## 🎭 Voice Options
Choose your narrator wisely:
- **alloy**: Neutral, balanced tone (default)
- **ash**: Warm and expressive, with friendly support vibes
- **ballad**: Smooth narrator, for long-form storytelling
- **coral**: Bright and upbeat, for cheerful promos
- **echo**: Clear and professional, like a news anchor
- **fable**: Warm and storytelling, perfect for bedtime code reviews
- **onyx**: Deep and authoritative, for when your code needs to sound important
- **nova**: Bright and energetic, like your enthusiasm before debugging
- **sage**: Calm and measured, a helpful explainer
- **shimmer**: Soft and gentle, for when you need to break bad news about production bugs
- **verse**: Dramatic and theatrical, like a trailer read
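
Agents occasionally get creative with voice names. As a rough sketch (not necessarily how this server handles it), a requested voice could be normalized against the list above and fall back to the default. `KNOWN_VOICES`, `DEFAULT_VOICE`, and `resolve_voice` are hypothetical names used only for illustration.

```python
from typing import Optional

# Voice names copied from the list above; "alloy" is the documented default.
KNOWN_VOICES = {
    "alloy", "ash", "ballad", "coral", "echo", "fable",
    "onyx", "nova", "sage", "shimmer", "verse",
}
DEFAULT_VOICE = "alloy"


def resolve_voice(requested: Optional[str]) -> str:
    """Return a known voice name, falling back to the default."""
    if requested is None:
        return DEFAULT_VOICE
    name = requested.strip().lower()  # tolerate stray whitespace and casing
    return name if name in KNOWN_VOICES else DEFAULT_VOICE


print(resolve_voice(" Fable "))
print(resolve_voice("robot-overlord"))
```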
## 🎪 Usage Examples
### Basic Usage
```python
# Non-blocking (default) - returns immediately
speak("Hello, world! I'm now audible!")

# Blocking - waits for completion
speak("This message will finish before I return", blocking=True)

# With specific voice
speak("I'm feeling dramatic today!", voice="fable")

# With delivery instructions
speak(
    "You're doing great—let's take this one step at a time.",
    voice="shimmer",
    instructions="Speak in a warm, reassuring and unhurried tone and pace",
)
```
### In Cursor with MCP
Just tell Cursor to use the `speak` tool in your conversations.
You can suggest a voice and style instructions for maintaining a consistent character.
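
Under the hood, an MCP client invokes a tool by sending a JSON-RPC `tools/call` request. The sketch below builds such a request for the `speak` tool, mostly to show what Cursor sends on your behalf. The argument names `voice`, `instructions`, and `blocking` follow the examples above; the name `text` for the message itself is an assumption, and `build_speak_call` is a hypothetical helper, so check the tool schema the server advertises before relying on either.

```python
import json
from typing import Optional


def build_speak_call(
    request_id: int,
    text: str,
    voice: Optional[str] = None,
    instructions: Optional[str] = None,
    blocking: bool = False,
) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for the `speak` tool."""
    # "text" is an assumed argument name; the README does not spell it out.
    arguments = {"text": text, "blocking": blocking}
    if voice is not None:
        arguments["voice"] = voice
    if instructions is not None:
        arguments["instructions"] = instructions
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "speak", "arguments": arguments},
    }


request = build_speak_call(1, "Build finished.", voice="sage")
print(json.dumps(request, indent=2))
```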
## ⚙️ Configuration
Environment variables:
- `OPENAI_API_KEY` (required): Your OpenAI API key
- `TTS_MODEL` (optional): Defaults to `gpt-4o-mini-tts`. Other options include `tts-1` and `tts-1-hd`, but those models do not support `instructions` or some of the voices
- `LOG_LEVEL` (optional): `DEBUG`, `INFO` (default), `WARNING`, `ERROR`
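
For illustration, the variables listed above could be read and applied roughly like this. This is a sketch under assumptions, not the server's actual code: `Settings`, `load_settings`, and `speech_request` are hypothetical names, and the request dictionary simply mirrors the `model`, `voice`, `input`, and `instructions` parameters of OpenAI's speech endpoint.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Models that, per the note above, do not support delivery instructions.
NO_INSTRUCTIONS_MODELS = {"tts-1", "tts-1-hd"}


@dataclass(frozen=True)
class Settings:
    api_key: str
    model: str = "gpt-4o-mini-tts"
    log_level: str = "INFO"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration, failing early if the API key is missing."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or export it")
    return Settings(
        api_key=api_key,
        model=env.get("TTS_MODEL", "gpt-4o-mini-tts"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


def speech_request(
    settings: Settings,
    text: str,
    voice: str = "alloy",
    instructions: Optional[str] = None,
) -> dict:
    """Build request parameters, dropping instructions where unsupported."""
    request = {"model": settings.model, "voice": voice, "input": text}
    if instructions and settings.model not in NO_INSTRUCTIONS_MODELS:
        request["instructions"] = instructions
    return request


# An explicit mapping keeps the example runnable without a real key.
settings = load_settings({"OPENAI_API_KEY": "your-secret-key-here", "TTS_MODEL": "tts-1"})
print(speech_request(settings, "Hello!", instructions="Sound calm"))
```

Silently dropping `instructions` is one possible design; raising an error instead would make the model limitation more visible to the agent.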
## 🧰 Troubleshooting
- No audio / no default output device:
  - Set a system default output device and restart the MCP server.
  - macOS: System Settings → Sound → Output.
- PyAudio install issues:
  - macOS: `brew install portaudio` then `pip install -r requirements.txt`
  - Linux (Debian/Ubuntu): `sudo apt-get install portaudio19-dev` then `pip install pyaudio`
  - Windows: `pip install pipwin && pipwin install pyaudio`
- Missing API key:
  - Ensure `.env` contains `OPENAI_API_KEY=...` or export it in your shell.
- High latency or choppy audio:
  - Close other audio apps; verify system output device; keep `blocking=False` if you need responsiveness.
- Logs:
  - Logs stream to stderr and to `tts_mcp_server.log`. Tail with:
    ```bash
    tail -f tts_mcp_server.log
    ```
## 🙏 Acknowledgments
- Cursor for writing 95% of the code here
- Coffee, for making everything else possible
---
*Remember: With great text-to-speech power comes great responsibility. Use your new vocal abilities wisely, and try not to annoy your coworkers too much.*
**Pro tip**: If your computer starts talking back to you without being prompted, it might be time to take a break. Or update your Python version. Probably the latter.
_This project is licensed under the BSD 3-Clause License. See the [LICENSE](./LICENSE) file for details._