voice-mcp
Integrates with Alibaba Cloud's DashScope service for voice synthesis and cloning using the Qwen-TTS Voice Cloning API.
Integrates with ElevenLabs Text to Speech API for high-quality voice synthesis with options for style tags, voice settings, and language selection.
An MCP (Model Context Protocol) server for AI voice synthesis with an inline audio player. Give your AI assistant a custom cloned voice!
Fork Notice
This repository is a fork of garan0613/voice-mcp, released under the MIT License.
This fork lives at Yinglianchun/voice-mcp and keeps the original MCP speak(text) behavior while adding provider switching, ElevenLabs support, and a live visualizer panel.
What Changed in This Fork
- Added TTS_PROVIDER switching between DashScope/CosyVoice and ElevenLabs.
- Kept the old speak(text) call compatible and extended it to speak(text, style?, raw_tags?).
- Added ElevenLabs TTS support with configurable model, output format, voice settings, and optional v3 audio tags.
- Added style-to-tag mapping for ElevenLabs v3, while stripping raw audio tags before DashScope/CosyVoice calls.
- Added /status fields for provider, model, voice, configuration state, and audio tag availability.
- Added /panel, a breathing audio visualizer that listens for the latest MCP speak result.
- Added /events/latest so the panel can receive the newest generated voice and text.
- Added ElevenLabs history loading through /history?id=...
- Added line-style captions, playback-linked caption timing when ElevenLabs timing data is available, and MP3 download from the panel.
Features
- Custom Voice Cloning: use the DashScope Qwen-TTS Voice Cloning API or ElevenLabs TTS with your own cloned voice.
- Inline Audio Player: WeChat-style player with waveform visualization.
- Breathing Visualizer Panel: open /panel to listen for the latest MCP speak output.
- Transcript Toggle: show or hide the spoken text.
- Dark Mode Support: automatic theme adaptation.
- Cloudflare Workers: fast, serverless deployment.
Demo
When you call the speak tool, you get:
- A sleek audio player with a play/pause button
- An animated waveform that follows playback progress
- A duration display
- An expandable transcript
Quick Start
1. Clone the repository
git clone https://github.com/Yinglianchun/voice-mcp.git
cd voice-mcp
2. Install dependencies
npm install
3. Configure the TTS provider
Set the provider. If TTS_PROVIDER is not set, the Worker uses DashScope.
npx wrangler secret put TTS_PROVIDER # dashscope or elevenlabs
DashScope / CosyVoice
You'll need an Alibaba Cloud DashScope account with Qwen-TTS Voice Cloning access.
Add your secrets to Cloudflare:
npx wrangler secret put DASHSCOPE_API_KEY
npx wrangler secret put VOICE_ID
npx wrangler secret put BOT_NAME # Optional, defaults to "AI"
Optional:
npx wrangler secret put TTS_MODEL # Default: qwen3-tts-vc-2026-01-22
ElevenLabs
Add your ElevenLabs secrets to Cloudflare:
npx wrangler secret put ELEVENLABS_API_KEY
npx wrangler secret put ELEVENLABS_VOICE_ID
npx wrangler secret put ELEVENLABS_VOICE_ID_ZH # Optional: used when text contains Chinese
npx wrangler secret put ELEVENLABS_VOICE_ID_EN # Optional: used for English text
Optional:
npx wrangler secret put ELEVENLABS_MODEL_ID # Default: eleven_v3
npx wrangler secret put ELEVENLABS_OUTPUT_FORMAT # Default: mp3_44100_128
npx wrangler secret put ELEVENLABS_LANGUAGE_CODE # Example: zh
npx wrangler secret put ELEVENLABS_LANGUAGE_CODE_ZH # Default with zh voice: zh
npx wrangler secret put ELEVENLABS_LANGUAGE_CODE_EN # Default with en voice: en
npx wrangler secret put ELEVENLABS_STABILITY # Example: 0.36
npx wrangler secret put ELEVENLABS_STYLE # Example: 0.85
npx wrangler secret put ELEVENLABS_SPEED # Example: 1.20
eleven_v3 supports audio tags such as [whispers], [sighs], and [laughs].
eleven_multilingual_v2 is a steadier choice for ordinary reading.
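Wrangler secrets arrive in the Worker as strings, so the numeric overrides (ELEVENLABS_STABILITY, ELEVENLABS_STYLE, ELEVENLABS_SPEED) have to be parsed before they go into an ElevenLabs voice_settings object. A minimal sketch, assuming unset or non-numeric values are simply skipped; VoiceSettingEnv and buildVoiceSettings are hypothetical names, not taken from this repository:

```typescript
// Hypothetical env shape: only the three optional numeric secrets listed above.
interface VoiceSettingEnv {
  ELEVENLABS_STABILITY?: string;
  ELEVENLABS_STYLE?: string;
  ELEVENLABS_SPEED?: string;
}

// Subset of the ElevenLabs voice_settings fields that this sketch fills in.
interface VoiceSettings {
  stability?: number;
  style?: number;
  speed?: number;
}

// Parse a secret as a finite number; unset, blank, or non-numeric values yield undefined.
function parseNumber(value: string | undefined): number | undefined {
  if (value === undefined || value.trim() === "") return undefined;
  const n = Number(value);
  return isFinite(n) ? n : undefined;
}

// Build an override object that contains only the values that parsed successfully.
function buildVoiceSettings(env: VoiceSettingEnv): VoiceSettings {
  const settings: VoiceSettings = {};
  const stability = parseNumber(env.ELEVENLABS_STABILITY);
  const style = parseNumber(env.ELEVENLABS_STYLE);
  const speed = parseNumber(env.ELEVENLABS_SPEED);
  if (stability !== undefined) settings.stability = stability;
  if (style !== undefined) settings.style = style;
  if (speed !== undefined) settings.speed = speed;
  return settings;
}
```

With the example values above, buildVoiceSettings({ ELEVENLABS_STABILITY: "0.36", ELEVENLABS_SPEED: "1.20" }) yields { stability: 0.36, speed: 1.2 }; leaving a field out lets the voice's own default apply instead of sending a guessed value.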
4. Deploy
npx wrangler deploy
5. Connect to Claude.ai
Go to Settings -> Connectors -> Add Connector
Enter your Worker URL:
https://your-worker.workers.dev/mcp
Done! The speak tool is now available.
Configuration
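| Variable | Required | Description |
|---|---|---|
| TTS_PROVIDER | No | TTS provider: dashscope or elevenlabs (default: dashscope) |
| DASHSCOPE_API_KEY | DashScope | Your DashScope API key |
| VOICE_ID | DashScope | The cloned voice ID (Qwen-TTS VC) |
| BOT_NAME | No | Display name (default: "AI") |
| TTS_MODEL | No | DashScope TTS model (default: qwen3-tts-vc-2026-01-22) |
| ELEVENLABS_API_KEY | ElevenLabs | Your ElevenLabs API key |
| ELEVENLABS_VOICE_ID | ElevenLabs | Default/fallback ElevenLabs voice ID |
| ELEVENLABS_VOICE_ID_ZH | No | Chinese ElevenLabs voice ID; auto-selected when text contains Chinese |
| ELEVENLABS_VOICE_ID_EN | No | English ElevenLabs voice ID; auto-selected for English text |
| ELEVENLABS_MODEL_ID | No | ElevenLabs model (default: eleven_v3) |
| ELEVENLABS_OUTPUT_FORMAT | No | ElevenLabs output format (default: mp3_44100_128) |
| ELEVENLABS_LANGUAGE_CODE | No | ElevenLabs request language code, such as zh |
| ELEVENLABS_LANGUAGE_CODE_ZH | No | Chinese request language code; defaults to zh |
| ELEVENLABS_LANGUAGE_CODE_EN | No | English request language code; defaults to en |
| ELEVENLABS_STABILITY | No | ElevenLabs voice setting override, such as 0.36 |
|  | No | ElevenLabs voice setting override |
| ELEVENLABS_STYLE | No | ElevenLabs voice setting override, such as 0.85 |
|  | No | ElevenLabs voice setting override |
| ELEVENLABS_SPEED | No | ElevenLabs voice setting override, such as 1.20 |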
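The provider default and the Chinese/English voice auto-selection described above could be resolved roughly as follows. This is a sketch under stated assumptions, not the Worker's actual code: Chinese is detected by any character in the CJK Unified Ideographs block (U+4E00 to U+9FFF), any other text is treated as English, a per-language voice uses its per-language code before falling back to zh/en, and the function names are hypothetical.

```typescript
// Hypothetical env shape mirroring the provider and ElevenLabs voice/language variables.
interface ProviderEnv {
  TTS_PROVIDER?: string;
  ELEVENLABS_VOICE_ID?: string;
  ELEVENLABS_VOICE_ID_ZH?: string;
  ELEVENLABS_VOICE_ID_EN?: string;
  ELEVENLABS_LANGUAGE_CODE?: string;
  ELEVENLABS_LANGUAGE_CODE_ZH?: string;
  ELEVENLABS_LANGUAGE_CODE_EN?: string;
}

type Provider = "dashscope" | "elevenlabs";

interface ElevenLabsVoice {
  voiceId?: string;
  languageCode?: string;
}

// Unset or unrecognized TTS_PROVIDER values fall back to DashScope.
function resolveProvider(env: ProviderEnv): Provider {
  const value = (env.TTS_PROVIDER || "").trim().toLowerCase();
  return value === "elevenlabs" ? "elevenlabs" : "dashscope";
}

// Approximate check: true if any character is in the CJK Unified Ideographs block.
function containsChinese(text: string): boolean {
  return /[\u4e00-\u9fff]/.test(text);
}

// Prefer the language-specific voice; otherwise use the default/fallback voice.
function resolveElevenLabsVoice(env: ProviderEnv, text: string): ElevenLabsVoice {
  const chinese = containsChinese(text);
  if (chinese && env.ELEVENLABS_VOICE_ID_ZH) {
    return { voiceId: env.ELEVENLABS_VOICE_ID_ZH, languageCode: env.ELEVENLABS_LANGUAGE_CODE_ZH || "zh" };
  }
  if (!chinese && env.ELEVENLABS_VOICE_ID_EN) {
    return { voiceId: env.ELEVENLABS_VOICE_ID_EN, languageCode: env.ELEVENLABS_LANGUAGE_CODE_EN || "en" };
  }
  // Fallback voice: send a language code only if one was configured explicitly.
  return { voiceId: env.ELEVENLABS_VOICE_ID, languageCode: env.ELEVENLABS_LANGUAGE_CODE };
}
```

Keeping the fallback path free of a hard-coded language code avoids forcing zh or en onto a voice whose language is unknown.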
API Endpoints
| Endpoint | Description |
|---|---|
| /mcp | MCP server (SSE protocol) |
| /panel | Breathing voice visualizer that listens for MCP speak results |
| /events/latest | Latest generated voice event for the visualizer |
| /history?id=... | Load an ElevenLabs history item into the visualizer |
|  | Direct audio file |
|  | Direct audio file with optional style |
|  | Preserve ElevenLabs v3 audio tags |
| /status | Health check |
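A sketch of how a Worker might dispatch the known paths (/mcp, /panel, /events/latest, /history?id=...) to handlers. The route names are hypothetical, the direct-audio endpoints are left out because their paths are not given here, and treating /status as the health-check path is an assumption based on the /status fields mentioned earlier.

```typescript
// Hypothetical route labels; a real Worker would map each to a handler.
type Route =
  | { name: "mcp" }
  | { name: "panel" }
  | { name: "latestEvent" }
  | { name: "history"; id: string }
  | { name: "status" }
  | { name: "notFound" };

// Match on the URL path; /history additionally requires a non-empty id query parameter.
function matchRoute(rawUrl: string): Route {
  const url = new URL(rawUrl);
  switch (url.pathname) {
    case "/mcp":
      return { name: "mcp" };
    case "/panel":
      return { name: "panel" };
    case "/events/latest":
      return { name: "latestEvent" };
    case "/history": {
      const id = url.searchParams.get("id");
      return id ? { name: "history", id } : { name: "notFound" };
    }
    case "/status":
      return { name: "status" };
    default:
      return { name: "notFound" };
  }
}
```

In a fetch handler this would be called as matchRoute(request.url), with the returned name selecting the response.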
The MCP speak tool accepts:
speak(text: string, style?: string, raw_tags?: boolean)
Existing speak(text) calls remain compatible.
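To illustrate why old calls keep working, here is a hypothetical normalizer that fills in defaults, so speak(text) and speak(text, style, raw_tags) reach the same code path. The names are illustrative, and rejecting empty text is an assumption of this sketch rather than documented behavior.

```typescript
// Argument shape taken from the tool signature above.
interface SpeakArgs {
  text: string;
  style?: string;
  raw_tags?: boolean;
}

// Normalized form with every optional field resolved (hypothetical).
interface NormalizedSpeakArgs {
  text: string;
  style: string | null;
  rawTags: boolean;
}

function normalizeSpeakArgs(args: SpeakArgs): NormalizedSpeakArgs {
  const text = args.text.trim();
  if (text === "") {
    throw new Error("speak: text must not be empty");
  }
  // A missing or blank style means "no style"; raw_tags defaults to false.
  const style = args.style !== undefined && args.style.trim() !== "" ? args.style.trim() : null;
  return { text, style, rawTags: args.raw_tags === true };
}
```

An old-style call such as normalizeSpeakArgs({ text: "The weather today is sunny and warm" }) simply ends up with style null and rawTags false.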
When the MCP speak tool succeeds, the Worker stores the latest voice event for
/panel. Keep /panel open while using speak; when a new voice arrives, the
visualizer loads it and enables playback.
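The handoff between speak and /panel can be sketched as a store that stamps each successful result with an increasing sequence number, plus a panel-side check that loads only events newer than the last one played. The names and fields (including audioUrl) are hypothetical. The in-memory store is for illustration only: separate Worker isolates do not share memory, so a deployed version would need shared storage such as Workers KV or a Durable Object.

```typescript
// Hypothetical event shape; the real stored fields may differ.
interface VoiceEvent {
  seq: number; // increases with every successful speak call
  text: string;
  audioUrl: string; // hypothetical pointer to the generated audio
}

// Illustration-only store held in memory within a single isolate.
class LatestVoiceStore {
  private latest: VoiceEvent | null = null;
  private nextSeq = 1;

  // Called after a speak call succeeds.
  publish(text: string, audioUrl: string): VoiceEvent {
    this.latest = { seq: this.nextSeq++, text, audioUrl };
    return this.latest;
  }

  // What a GET /events/latest response would carry.
  getLatest(): VoiceEvent | null {
    return this.latest;
  }
}

// Panel side: load an event only if it is newer than the last one played.
function shouldLoad(lastSeenSeq: number, event: VoiceEvent | null): boolean {
  return event !== null && event.seq > lastSeenSeq;
}
```

Comparing sequence numbers rather than text means repeating the same sentence still counts as a new voice.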
With ElevenLabs, the Worker uses the speech-with-timing API to store line-level caption cues for
playback sync; providers without timing data fall back to approximate caption progress.
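Both caption paths can be sketched in a few lines: grouping character-level timing into line cues, and an even split by character count when no timing is available. The CharacterAlignment field names are assumed to follow the alignment object in ElevenLabs' text-to-speech-with-timestamps response; check them against the current API reference. The line-boundary rule (newline or sentence-ending punctuation) is also an assumption, and a naive one: a decimal point such as 1.20 ends a line too.

```typescript
// Character-level timing, assumed to match ElevenLabs' alignment object.
interface CharacterAlignment {
  characters: string[];
  character_start_times_seconds: number[];
  character_end_times_seconds: number[];
}

interface CaptionCue {
  text: string;
  start: number; // seconds
  end: number; // seconds
}

// Naive line boundary: newline or sentence-ending punctuation (Chinese or Latin).
const LINE_END = /[\n。！？.!?]/;

// Each cue spans from its first non-space character's start to its last character's end.
function cuesFromAlignment(a: CharacterAlignment): CaptionCue[] {
  const cues: CaptionCue[] = [];
  let line = "";
  let start = -1;
  let end = 0;
  const flush = () => {
    if (start >= 0 && line.trim() !== "") cues.push({ text: line.trim(), start, end });
    line = "";
    start = -1;
  };
  for (let i = 0; i < a.characters.length; i++) {
    const ch = a.characters[i];
    if (ch !== "\n") {
      // Skip leading whitespace so a cue starts on a spoken character.
      if (start < 0 && ch.trim() !== "") start = a.character_start_times_seconds[i];
      if (start >= 0) {
        line += ch;
        end = a.character_end_times_seconds[i];
      }
    }
    if (LINE_END.test(ch)) flush();
  }
  flush();
  return cues;
}

// Fallback without timing data: share the clip duration across lines by character count.
function approximateCues(lines: string[], durationSec: number): CaptionCue[] {
  const total = lines.reduce((n, l) => n + l.length, 0) || 1;
  let t = 0;
  return lines.map((text) => {
    const start = t;
    t += (text.length * durationSec) / total;
    return { text, start, end: t };
  });
}
```

The panel can then highlight whichever cue contains the audio element's current playback time, whichever path produced the cues.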
When TTS_PROVIDER=elevenlabs and ELEVENLABS_MODEL_ID=eleven_v3, raw_tags=true
passes text through unchanged. Without raw_tags=true, supported styles map to
ElevenLabs v3 audio tags:
| Style | Audio tag |
|---|---|
DashScope/CosyVoice and non-v3 ElevenLabs calls strip raw audio tags before sending text to the provider.
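The tag rules above can be sketched as a single text-preparation step. The style names in STYLE_TAGS are hypothetical placeholders (the tags themselves are the examples given under Quick Start), the bracket-matching regex is naive and removes any bracketed text, and stripping existing tags before adding a style tag on eleven_v3 is an assumption about what happens when raw_tags is not set.

```typescript
type Provider = "dashscope" | "elevenlabs";

// Hypothetical style names mapped to the v3 audio tags mentioned earlier.
const STYLE_TAGS: Record<string, string> = {
  whisper: "[whispers]",
  sigh: "[sighs]",
  laugh: "[laughs]",
};

// Naive audio-tag matcher: any [bracketed] run without a newline, plus trailing spaces.
const AUDIO_TAG = /\[[^\]\n]*\]\s*/g;

function stripAudioTags(text: string): string {
  return text.replace(AUDIO_TAG, "").trim();
}

function prepareText(text: string, provider: Provider, modelId: string, style?: string, rawTags = false): string {
  const supportsTags = provider === "elevenlabs" && modelId === "eleven_v3";
  // DashScope/CosyVoice and non-v3 ElevenLabs models never receive tags.
  if (!supportsTags) return stripAudioTags(text);
  // raw_tags=true on eleven_v3: pass text through unchanged.
  if (rawTags) return text;
  // Own-property check so names like "constructor" are not treated as styles.
  const tag =
    style !== undefined && Object.prototype.hasOwnProperty.call(STYLE_TAGS, style) ? STYLE_TAGS[style] : undefined;
  const clean = stripAudioTags(text);
  return tag ? `${tag} ${clean}` : clean;
}
```

Unknown styles fall through to plain text here, which keeps a bad style value from producing a literal bracket in the spoken output.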
Tech Stack
- Cloudflare Workers: serverless runtime
- MCP SDK: Model Context Protocol
- DashScope Qwen-TTS VC: voice synthesis
- ElevenLabs Text to Speech: voice synthesis
- ext-apps: inline UI rendering
License
MIT. This fork preserves the upstream license from garan0613/voice-mcp.