Provides Docker Compose setup for easily running the required VOICEVOX engine locally.
The MCP server is implemented in TypeScript for type safety and developer experience.
Uses Zod for runtime schema validation within the MCP server implementation.
voicevox-mcp
This project is an MCP (Model Context Protocol) server that works with the VOICEVOX engine to synthesize speech and retrieve speaker information. It is implemented in TypeScript and uses the MCP SDK.
Features
- Get speaker information from the VOICEVOX engine (/speakers)
- Synthesize speech from text with a specified speaker and play it locally (/speak)
- Local playback is Mac only (it uses the afplay command)
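As a rough illustration of what these features involve on the engine side, the sketch below assumes the standard VOICEVOX engine HTTP API (`GET /speakers`, `POST /audio_query`, then `POST /synthesis`). The helper names `buildEngineUrl` and `synthesize` are hypothetical and are not taken from this project's source; the actual implementation may be structured differently.

```typescript
// Hypothetical helper (not from this project): builds an engine URL with query parameters.
function buildEngineUrl(
  baseUrl: string,
  path: string,
  params: Record<string, string | number> = {},
): string {
  const url = new URL(path, baseUrl);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Sketch of the two-step synthesis flow, assuming the standard engine endpoints.
// Step 1 asks the engine for an audio query; step 2 turns that query into WAV data.
async function synthesize(
  baseUrl: string,
  text: string,
  speaker: number,
): Promise<ArrayBuffer> {
  const queryRes = await fetch(
    buildEngineUrl(baseUrl, "/audio_query", { text, speaker }),
    { method: "POST" },
  );
  if (!queryRes.ok) {
    throw new Error(`audio_query failed with status ${queryRes.status}`);
  }
  const audioQuery: unknown = await queryRes.json();

  const synthRes = await fetch(buildEngineUrl(baseUrl, "/synthesis", { speaker }), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(audioQuery),
  });
  if (!synthRes.ok) {
    throw new Error(`synthesis failed with status ${synthRes.status}`);
  }
  return synthRes.arrayBuffer(); // WAV bytes, ready to be written to a file and played
}
```

Calling `synthesize("http://localhost:50021", "hello", 1)` would only succeed while the engine is running; the speaker ID here is a placeholder and should come from the speaker list.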
Setup
Start the VOICEVOX engine (Docker recommended)
This will start the VOICEVOX engine on localhost:50021.
Install dependencies and build
Usage
Example Cursor configuration
Set VOICEVOX_API_URL as needed.
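One way a server could read this setting is sketched below. The function name `resolveApiUrl` is hypothetical, and the fallback to `http://localhost:50021` is an assumption based on the default engine address mentioned in this README, not confirmed behavior of this project.

```typescript
// Assumed default: the address the Docker setup exposes the engine on.
const DEFAULT_VOICEVOX_API_URL = "http://localhost:50021";

// Hypothetical helper: picks the engine base URL from an environment map.
function resolveApiUrl(env: Record<string, string | undefined>): string {
  const configured = (env["VOICEVOX_API_URL"] ?? "").trim();
  // Fall back to the default when the variable is unset or blank.
  const url = configured.length > 0 ? configured : DEFAULT_VOICEVOX_API_URL;
  // Drop trailing slashes so paths such as "/speakers" can be appended safely.
  return url.replace(/\/+$/, "");
}
```

In a Node.js process this would typically be called as `resolveApiUrl(process.env)`.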
- You can get a list of speakers from your MCP client using the speakers tool.
- The speak tool can synthesize text to speech and play it back locally (Mac is recommended as it uses the afplay command).
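To pick a speaker for the speak tool, a client needs a style ID from the speaker list. The sketch below assumes the engine's `/speakers` response is an array of speakers, each with a `name`, a `speaker_uuid`, and a `styles` array of `{ name, id }` entries. The type and function names are hypothetical, and the exact shape returned by this project's speakers tool may differ.

```typescript
// Assumed subset of the fields returned by the engine's /speakers endpoint.
interface SpeakerStyle {
  name: string;
  id: number;
}

interface Speaker {
  name: string;
  speaker_uuid: string;
  styles: SpeakerStyle[];
}

// One row per (speaker, style) pair; `id` is the value passed as the speaker ID.
interface StyleEntry {
  speaker: string;
  style: string;
  id: number;
}

// Hypothetical helper: flattens the nested speaker list into selectable entries.
function flattenStyles(speakers: Speaker[]): StyleEntry[] {
  const entries: StyleEntry[] = [];
  for (const speaker of speakers) {
    for (const style of speaker.styles) {
      entries.push({ speaker: speaker.name, style: style.name, id: style.id });
    }
  }
  return entries;
}
```

Flattening keeps the lookup simple: each entry carries the one numeric ID that synthesis needs, alongside human-readable names for display.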
Main Dependencies
@modelcontextprotocol/sdk
zod
typescript
Notes and future improvements
- Speech synthesis will not be available unless the VOICEVOX engine is running on localhost:50021.
- If you are using an environment other than Mac, replace the afplay call with an audio player available on your platform.
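One possible way to adapt playback to other platforms is to choose the player command from the platform name. This is a sketch only: `playerCommandFor` is a hypothetical helper, and using `aplay` on Linux is an assumption (it requires the ALSA utilities to be installed). This project itself only documents afplay.

```typescript
interface PlayerCommand {
  command: string;
  args: string[];
}

// Hypothetical helper: maps a Node.js platform string to a WAV player command.
function playerCommandFor(platform: string, wavPath: string): PlayerCommand {
  switch (platform) {
    case "darwin":
      // macOS ships afplay, the player this README refers to.
      return { command: "afplay", args: [wavPath] };
    case "linux":
      // Assumption: aplay from the ALSA utilities is available.
      return { command: "aplay", args: [wavPath] };
    default:
      throw new Error(`No audio player configured for platform: ${platform}`);
  }
}
```

In a Node.js process the result could be passed to `spawn(command, args)` from `node:child_process`, with `process.platform` as the first argument.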
License
MIT License
local-only server
The server can only run on the client's local machine because it depends on local resources.
A Model Context Protocol server that integrates with VOICEVOX engine to provide text-to-speech synthesis and speaker information retrieval, allowing users to generate and play voice audio from text.