MCP FishAudio Server

by da-okazaki
# Fish Audio MCP Server

<div align="center">
  <img src="./dcos/icon_fish-audio.webp" alt="Fish Audio Logo" width="300" height="300" />
</div>

[![npm version](https://badge.fury.io/js/@alanse%2Ffish-audio-mcp-server.svg)](https://badge.fury.io/js/@alanse%2Ffish-audio-mcp-server)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

An MCP (Model Context Protocol) server that provides seamless integration between Fish Audio's Text-to-Speech API and LLMs like Claude, enabling natural language-driven speech synthesis.

## What is Fish Audio?

[Fish Audio](https://fish.audio/) is a cutting-edge Text-to-Speech platform that offers:

- 🌊 **State-of-the-art voice synthesis** with natural-sounding output
- 🎯 **Voice cloning capabilities** to create custom voice models
- 🌍 **Multilingual support** including English, Japanese, Chinese, and more
- ⚡ **Low-latency streaming** for real-time applications
- 🎨 **Fine-grained control** over speech prosody and emotions

This MCP server brings Fish Audio's powerful capabilities directly to your LLM workflows.

## Features

- 🎙️ **High-Quality TTS**: Leverage Fish Audio's state-of-the-art TTS models
- 🌊 **Streaming Support**: Real-time audio streaming for low-latency applications
- 🎨 **Multiple Voices**: Support for custom voice models via reference IDs
- 🎯 **Smart Voice Selection**: Select voices by ID, name, or tags
- 📚 **Voice Library Management**: Configure and manage multiple voice references
- 🔧 **Flexible Configuration**: Environment variable-based configuration
- 📦 **Multiple Audio Formats**: Support for MP3, WAV, PCM, and Opus
- 🚀 **Easy Integration**: Simple setup with any MCP-compatible client

## Quick Start

### Installation

You can run this MCP server directly using npx:

```bash
npx @alanse/fish-audio-mcp-server
```

Or install it globally:

```bash
npm install -g @alanse/fish-audio-mcp-server
```

### Configuration

1. Get your Fish Audio API key from [Fish Audio](https://fish.audio/)

2. Set up environment variables:

   ```bash
   export FISH_API_KEY=your_fish_audio_api_key_here
   ```

3. Add to your MCP settings configuration:

#### Single Voice Mode (Simple)

```json
{
  "mcpServers": {
    "fish-audio": {
      "command": "npx",
      "args": ["-y", "@alanse/fish-audio-mcp-server"],
      "env": {
        "FISH_API_KEY": "your_fish_audio_api_key_here",
        "FISH_MODEL_ID": "speech-1.6",
        "FISH_REFERENCE_ID": "your_voice_reference_id_here",
        "FISH_OUTPUT_FORMAT": "mp3",
        "FISH_STREAMING": "false",
        "FISH_LATENCY": "balanced",
        "FISH_MP3_BITRATE": "128",
        "FISH_AUTO_PLAY": "false",
        "AUDIO_OUTPUT_DIR": "~/.fish-audio-mcp/audio_output"
      }
    }
  }
}
```

#### Multiple Voice Mode (Advanced)

```json
{
  "mcpServers": {
    "fish-audio": {
      "command": "npx",
      "args": ["-y", "@alanse/fish-audio-mcp-server"],
      "env": {
        "FISH_API_KEY": "your_fish_audio_api_key_here",
        "FISH_MODEL_ID": "speech-1.6",
        "FISH_REFERENCES": "[{'reference_id':'id1','name':'Alice','tags':['female','english']},{'reference_id':'id2','name':'Bob','tags':['male','japanese']},{'reference_id':'id3','name':'Carol','tags':['female','japanese','anime']}]",
        "FISH_DEFAULT_REFERENCE": "id1",
        "FISH_OUTPUT_FORMAT": "mp3",
        "FISH_STREAMING": "false",
        "FISH_LATENCY": "balanced",
        "FISH_MP3_BITRATE": "128",
        "FISH_AUTO_PLAY": "false",
        "AUDIO_OUTPUT_DIR": "~/.fish-audio-mcp/audio_output"
      }
    }
  }
}
```

## Environment Variables

| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `FISH_API_KEY` | Your Fish Audio API key | - | Yes |
| `FISH_MODEL_ID` | TTS model to use (s1, speech-1.5, speech-1.6) | `s1` | Optional |
| `FISH_REFERENCE_ID` | Default voice reference ID (single reference mode) | - | Optional |
| `FISH_REFERENCES` | Multiple voice references (see below) | - | Optional |
| `FISH_DEFAULT_REFERENCE` | Default reference ID when using multiple references | - | Optional |
| `FISH_OUTPUT_FORMAT` | Default audio format (mp3, wav, pcm, opus) | `mp3` | Optional |
| `FISH_STREAMING` | Enable streaming mode (HTTP/WebSocket) | `false` | Optional |
| `FISH_LATENCY` | Latency mode (normal, balanced) | `balanced` | Optional |
| `FISH_MP3_BITRATE` | MP3 bitrate (64, 128, 192) | `128` | Optional |
| `FISH_AUTO_PLAY` | Auto-play audio and enable real-time playback | `false` | Optional |
| `AUDIO_OUTPUT_DIR` | Directory for audio file output | `~/.fish-audio-mcp/audio_output` | Optional |

### Configuring Multiple Voice References

You can configure multiple voice references in two ways:

#### JSON Array Format (Recommended)

Use the `FISH_REFERENCES` environment variable with a JSON array:

```bash
FISH_REFERENCES='[
  {"reference_id":"id1","name":"Alice","tags":["female","english"]},
  {"reference_id":"id2","name":"Bob","tags":["male","japanese"]},
  {"reference_id":"id3","name":"Carol","tags":["female","japanese","anime"]}
]'
FISH_DEFAULT_REFERENCE="id1"
```

#### Individual Format (Backward Compatibility)

Use numbered environment variables:

```bash
FISH_REFERENCE_1_ID=id1
FISH_REFERENCE_1_NAME=Alice
FISH_REFERENCE_1_TAGS=female,english
FISH_REFERENCE_2_ID=id2
FISH_REFERENCE_2_NAME=Bob
FISH_REFERENCE_2_TAGS=male,japanese
```

## Usage

Once configured, the Fish Audio MCP server provides two tools to LLMs.

### Tool 1: `fish_audio_tts`

Generates speech from text using Fish Audio's TTS API.

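
For orientation, an MCP client reaches this tool through a JSON-RPC `tools/call` request. The sketch below builds such a request in TypeScript. `buildTtsRequest` and `ToolCallRequest` are hypothetical names introduced for illustration and are not part of this package; the argument names come from the parameter list for this tool.

```typescript
// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper (an assumption for this sketch): wraps the text and any
// optional tool parameters into a request addressed to `fish_audio_tts`.
function buildTtsRequest(
  id: number,
  text: string,
  options: Record<string, unknown> = {},
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "fish_audio_tts", arguments: { text, ...options } },
  };
}

// Example: ask for Alice's voice in MP3 format.
const request = buildTtsRequest(1, "Hello, world!", {
  reference_name: "Alice",
  format: "mp3",
});

// The serialized form is what travels over the MCP transport.
const payload = JSON.stringify(request);
```

In practice the LLM client constructs this request itself; the sketch only shows what ends up on the wire.
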
#### Parameters

- `text` (required): Text to convert to speech (max 10,000 characters)
- `reference_id` (optional): Voice model reference ID
- `reference_name` (optional): Select voice by name
- `reference_tag` (optional): Select voice by tag
- `streaming` (optional): Enable streaming mode
- `format` (optional): Output format (mp3, wav, pcm, opus)
- `mp3_bitrate` (optional): MP3 bitrate (64, 128, 192)
- `normalize` (optional): Enable text normalization (default: true)
- `latency` (optional): Latency mode (normal, balanced)
- `output_path` (optional): Custom output file path
- `auto_play` (optional): Automatically play the generated audio
- `websocket_streaming` (optional): Use WebSocket streaming instead of HTTP
- `realtime_play` (optional): Play audio in real-time during WebSocket streaming

**Voice Selection Priority**: reference_id > reference_name > reference_tag > default

### Tool 2: `fish_audio_list_references`

Lists all configured voice references.

#### Parameters

No parameters required.

#### Returns

- List of configured voice references with their IDs, names, and tags
- Default reference ID

### Examples

#### Basic Text-to-Speech

```
User: "Generate speech saying 'Hello, world! Welcome to Fish Audio TTS.'"
Claude: I'll generate speech for that text using Fish Audio TTS.
[Uses fish_audio_tts tool with text parameter]
Result: Audio file saved to ./audio_output/tts_2025-01-03T10-30-00.mp3
```

#### Using Custom Voice by ID

```
User: "Generate speech with voice model xyz123 saying 'This is a custom voice test'"
Claude: I'll generate speech using the specified voice model.
[Uses fish_audio_tts tool with text and reference_id parameters]
Result: Audio generated with custom voice model xyz123
```

#### Using Voice by Name

```
User: "Use Alice's voice to say 'Hello from Alice'"
Claude: I'll generate speech using Alice's voice.
[Uses fish_audio_tts tool with reference_name: "Alice"]
Result: Audio generated with Alice's voice
```

#### Using Voice by Tag

```
User: "Generate Japanese speech saying 'こんにちは' with an anime voice"
Claude: I'll generate Japanese speech with an anime-style voice.
[Uses fish_audio_tts tool with reference_tag: "anime"]
Result: Audio generated with anime voice style
```

#### List Available Voices

```
User: "What voices are available?"
Claude: I'll list all configured voice references.
[Uses fish_audio_list_references tool]
Result:
- Alice (id: id1) - Tags: female, english [Default]
- Bob (id: id2) - Tags: male, japanese
- Carol (id: id3) - Tags: female, japanese, anime
```

#### HTTP Streaming Mode

```
User: "Generate a long speech in streaming mode about the benefits of AI"
Claude: I'll generate the speech in streaming mode for faster response.
[Uses fish_audio_tts tool with streaming: true]
Result: Streaming audio saved to ./audio_output/tts_2025-01-03T10-35-00.mp3
```

#### WebSocket Real-time Streaming

```
User: "Stream and play in real-time: 'Welcome to the future of AI'"
Claude: I'll stream the speech via WebSocket and play it in real-time.
[Uses fish_audio_tts tool with websocket_streaming: true, realtime_play: true]
Result: Audio streamed and played in real-time via WebSocket
```

## Development

### Local Development

1. Clone the repository:

   ```bash
   git clone https://github.com/da-okazaki/mcp-fish-audio-server.git
   cd mcp-fish-audio-server
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Create `.env` file:

   ```bash
   cp .env.example .env
   # Edit .env with your API key
   ```

4. Build the project:

   ```bash
   npm run build
   ```

5. Run in development mode:

   ```bash
   npm run dev
   ```

### Testing

Run the test suite:

```bash
npm test
```

### Project Structure

```
mcp-fish-audio-server/
├── src/
│   ├── index.ts          # MCP server entry point
│   ├── tools/
│   │   └── tts.ts        # TTS tool implementation
│   ├── services/
│   │   └── fishAudio.ts  # Fish Audio API client
│   ├── types/
│   │   └── index.ts      # TypeScript definitions
│   └── utils/
│       └── config.ts     # Configuration management
├── tests/                # Test files
├── audio_output/         # Default audio output directory
├── package.json
├── tsconfig.json
└── README.md
```

## API Documentation

### Fish Audio Service

The service provides two main methods:

1. **generateSpeech**: Standard TTS generation
   - Returns audio buffer
   - Suitable for short texts
   - Lower memory usage

2. **generateSpeechStream**: Streaming TTS generation
   - Returns audio stream
   - Suitable for long texts
   - Real-time processing

### Error Handling

The server handles various error scenarios:

- **INVALID_API_KEY**: Invalid or missing API key
- **NETWORK_ERROR**: Connection issues with Fish Audio API
- **INVALID_PARAMS**: Invalid request parameters
- **QUOTA_EXCEEDED**: API rate limit exceeded
- **SERVER_ERROR**: Fish Audio server errors

## Troubleshooting

### Common Issues

1. **"FISH_API_KEY environment variable is required"**
   - Ensure you've set the `FISH_API_KEY` environment variable
   - Check that the API key is valid

2. **"Network error: Unable to reach Fish Audio API"**
   - Check your internet connection
   - Verify Fish Audio API is accessible
   - Check for proxy/firewall issues

3. **"Text length exceeds maximum limit"**
   - Split long texts into smaller chunks
   - Maximum supported length is 10,000 characters

4. **Audio files not appearing**
   - Check the `AUDIO_OUTPUT_DIR` path exists
   - Ensure write permissions for the directory

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- [Fish Audio](https://fish.audio/) for providing the excellent TTS API
- [Anthropic](https://anthropic.com/) for creating the Model Context Protocol
- The MCP community for inspiration and examples

## Support

For issues, questions, or contributions, please visit the [GitHub repository](https://github.com/da-okazaki/mcp-fish-audio-server).

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for a detailed list of changes.
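
The voice selection priority stated in the Usage section (reference_id > reference_name > reference_tag > default) can be sketched roughly as follows. This is an illustrative reconstruction, not the server's actual code. The `VoiceReference` and `VoiceQuery` types and the `resolveVoice` function are hypothetical, and both the case-insensitive name match and the fall-through on a failed match are assumptions.

```typescript
// Hypothetical types mirroring the entries of FISH_REFERENCES.
interface VoiceReference {
  reference_id: string;
  name?: string;
  tags?: string[];
}

interface VoiceQuery {
  reference_id?: string;
  reference_name?: string;
  reference_tag?: string;
}

// Resolve a voice following: reference_id > reference_name > reference_tag > default.
function resolveVoice(
  query: VoiceQuery,
  references: VoiceReference[],
  defaultId?: string,
): string | undefined {
  // 1. An explicit ID wins outright.
  if (query.reference_id) return query.reference_id;

  // 2. Next, try a match by name (case-insensitive: an assumption).
  if (query.reference_name) {
    const wanted = query.reference_name.toLowerCase();
    for (const ref of references) {
      if (ref.name && ref.name.toLowerCase() === wanted) return ref.reference_id;
    }
  }

  // 3. Then take the first reference that carries the requested tag.
  if (query.reference_tag) {
    const tag = query.reference_tag;
    for (const ref of references) {
      if (ref.tags?.some((t) => t === tag)) return ref.reference_id;
    }
  }

  // 4. Otherwise fall back to the configured default, which may be undefined.
  return defaultId;
}

// The three voices used throughout this README.
const voices: VoiceReference[] = [
  { reference_id: "id1", name: "Alice", tags: ["female", "english"] },
  { reference_id: "id2", name: "Bob", tags: ["male", "japanese"] },
  { reference_id: "id3", name: "Carol", tags: ["female", "japanese", "anime"] },
];

const animeVoice = resolveVoice({ reference_tag: "anime" }, voices, "id1");
const nameBeatsTag = resolveVoice(
  { reference_name: "Bob", reference_tag: "anime" },
  voices,
  "id1",
);
```

With the sample data, `animeVoice` resolves to Carol's ID and `nameBeatsTag` resolves to Bob's ID, because a name match outranks a tag match.
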
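
The Troubleshooting section recommends splitting texts that exceed the 10,000-character limit into smaller chunks. One possible way to do that on the caller's side is sketched below. `splitText` is a hypothetical helper, not something this package exports, and it is an assumption that the limit is counted the same way as JavaScript's `String.length`.

```typescript
// Split text into chunks of at most `maxChars` characters, preferring to cut
// at a sentence boundary, then at a space, and only as a last resort mid-word.
function splitText(text: string, maxChars: number = 10000): string[] {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");

  const chunks: string[] = [];
  let rest = text.trim();

  while (rest.length > maxChars) {
    const head = rest.slice(0, maxChars);

    // Index of the last sentence terminator inside the allowed span.
    const sentenceEnd = Math.max(
      head.lastIndexOf(". "),
      head.lastIndexOf("! "),
      head.lastIndexOf("? "),
      head.lastIndexOf("。"),
    );
    const lastSpace = head.lastIndexOf(" ");

    let cut = maxChars; // hard cut when no boundary is found
    if (sentenceEnd > 0) cut = sentenceEnd + 1; // keep the terminator in the chunk
    else if (lastSpace > 0) cut = lastSpace;

    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }

  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

// Small limit used here only to make the behaviour easy to follow.
const parts = splitText("One. Two. Three.", 10);
```

Each element of the result can then be passed as the `text` argument of a separate `fish_audio_tts` call.
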
