Voicevox MCP Server

This server lets you use VOICEVOX-compatible speech synthesis engines (AivisSpeech / VOICEVOX / COEIROINK) through MCP (Model Context Protocol). For example, you can use it for speech synthesis in agent mode with Claude 3.7 in Cursor.
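VOICEVOX-compatible engines expose an HTTP API where synthesis usually takes two calls. `POST /audio_query` builds a query from text and a speaker ID, and `POST /synthesis` turns that query into WAV audio. The sketch below shows that flow. It assumes the engine implements the standard VOICEVOX ENGINE endpoints. The helper names are hypothetical and are not taken from this server's source.

```typescript
// Sketch: VOICEVOX-style two-step synthesis (audio_query -> synthesis).
// Assumes a VOICEVOX ENGINE-compatible API; all function names are hypothetical.

// Remove trailing slashes so endpoint paths can be appended safely.
function baseUrl(apiUrl: string): string {
  return apiUrl.replace(/\/+$/, "");
}

// Step 1 URL: POST /audio_query?text=...&speaker=...
function audioQueryUrl(apiUrl: string, text: string, speakerId: number): string {
  const params = new URLSearchParams({ text, speaker: String(speakerId) });
  return `${baseUrl(apiUrl)}/audio_query?${params.toString()}`;
}

// Step 2 URL: POST /synthesis?speaker=...
function synthesisUrl(apiUrl: string, speakerId: number): string {
  const params = new URLSearchParams({ speaker: String(speakerId) });
  return `${baseUrl(apiUrl)}/synthesis?${params.toString()}`;
}

// Runs both steps and returns the WAV bytes produced by the engine.
async function synthesize(apiUrl: string, text: string, speakerId: number): Promise<Uint8Array> {
  // Step 1: the engine converts text into an editable query (accent phrases, speed, pitch, ...).
  const queryRes = await fetch(audioQueryUrl(apiUrl, text, speakerId), { method: "POST" });
  if (!queryRes.ok) throw new Error(`audio_query failed: HTTP ${queryRes.status}`);
  const query: unknown = await queryRes.json();

  // Step 2: send the query back unchanged and receive audio/wav.
  const wavRes = await fetch(synthesisUrl(apiUrl, speakerId), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(query),
  });
  if (!wavRes.ok) throw new Error(`synthesis failed: HTTP ${wavRes.status}`);
  return new Uint8Array(await wavRes.arrayBuffer());
}
```

For example, `await synthesize("http://localhost:50021", "こんにちは", 1)` would resolve to WAV bytes that can be written to a file and played. Adjusting fields of the query object between the two calls, such as the speaking speed, is the reason the API is split into two steps.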

Prerequisites

Windows environment

Node.js 18 or higher
VOICEVOX ENGINE or another compatible engine, running locally (e.g. VOICEVOX ENGINE at http://localhost:50021)
VLC media player (the vlc executable must be on your PATH)
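VLC is required on Windows because the synthesized audio is played by an external program. Below is a minimal sketch of how a Node.js process might do this. It assumes VLC's standard command-line options `--intf dummy` (no GUI) and `--play-and-exit`. Whether this server invokes VLC exactly this way is an assumption, and the helper names are hypothetical.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper: arguments for a headless, play-once VLC run.
function vlcArgs(wavPath: string): string[] {
  return ["--intf", "dummy", "--play-and-exit", wavPath];
}

// Plays a WAV file with the `vlc` executable found on PATH.
// Rejects if VLC cannot be started or exits with a non-zero code.
function playWithVlc(wavPath: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn("vlc", vlcArgs(wavPath), { stdio: "ignore" });
    // An ENOENT error here usually means VLC is not on PATH.
    child.on("error", reject);
    child.on("exit", (code) => {
      if (code === 0) resolve();
      else reject(new Error(`vlc exited with code ${code}`));
    });
  });
}
```

If `playWithVlc` rejects with `ENOENT`, the PATH is the first thing to check.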

Docker environment (WSL2)

Docker and Docker Compose
WSL2
VOICEVOX ENGINE or another compatible engine (running locally or in Docker)
A Linux environment with PulseAudio and SDL2 installed (sudo apt install libsdl2-dev pulseaudio-utils pulseaudio)
Permission to access /mnt/wslg

Installation and Configuration

Clone the repository

git clone https://github.com/Dosugamea/voicevox-mcp-server.git
cd voicevox-mcp-server

Install dependencies

npm install

Set environment variables: create a .env file by copying .env_example and edit the settings as needed:

VOICEVOX_API_URL=http://localhost:50021
VOICEVOX_SPEAKER_ID=1
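These two variables tell the server where the engine is and which voice to use. The following sketch reads them with fallbacks, assuming the values shown in this README (`http://localhost:50021` and speaker `1`) as defaults. The `loadConfig` name and its validation are illustrative and are not taken from the source. The environment is passed in as a parameter, so the function can be called with `process.env` or with a plain object.

```typescript
interface VoicevoxConfig {
  apiUrl: string;
  speakerId: number;
}

// Hypothetical loader; the fallbacks mirror the example values in this README.
function loadConfig(env: Record<string, string | undefined>): VoicevoxConfig {
  const apiUrl = env.VOICEVOX_API_URL?.trim() || "http://localhost:50021";
  const rawId = env.VOICEVOX_SPEAKER_ID?.trim() || "1";
  // Speaker (style) IDs are non-negative integers; fail early on anything else.
  if (!/^\d+$/.test(rawId)) {
    throw new Error(`VOICEVOX_SPEAKER_ID must be a non-negative integer, got "${rawId}"`);
  }
  return { apiUrl, speakerId: Number(rawId) };
}
```

A typical call is `loadConfig(process.env)` after the .env file has been loaded.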

How to run

Running in a Windows environment

Start the server separately from the editor with the following commands:

npm run build
npm start

Running in a Docker environment

No separate launch is needed, because the editor starts the container itself. The server runs in stdio mode, so running it directly from a terminal is not useful.
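In stdio mode, the MCP client (here, the editor) writes JSON-RPC 2.0 messages to the server's stdin, one message per line, and reads replies from stdout. The sketch below builds the first messages a client sends, following the MCP specification's `initialize` handshake. The protocol version string and the client name and version are example values, and the helper names are hypothetical.

```typescript
// One newline-delimited JSON-RPC request, as an MCP stdio client writes it.
// JSON.stringify never emits raw newlines, so each message stays on one line.
function jsonRpcLine(id: number, method: string, params: object): string {
  return JSON.stringify({ jsonrpc: "2.0", id, method, params }) + "\n";
}

// Notifications carry no id and expect no response.
function jsonRpcNotification(method: string): string {
  return JSON.stringify({ jsonrpc: "2.0", method }) + "\n";
}

// First message of a session: the client announces itself and its protocol version.
// "2024-11-05" and the clientInfo values are example values.
function initializeRequest(id: number): string {
  return jsonRpcLine(id, "initialize", {
    protocolVersion: "2024-11-05",
    capabilities: {},
    clientInfo: { name: "example-client", version: "0.0.1" },
  });
}
```

On connect, the editor roughly does the following: it writes `initializeRequest(1)`, waits for the reply, sends `jsonRpcNotification("notifications/initialized")`, and then issues requests such as `tools/list`.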

How to configure

When running in a Windows environment

Add the following to mcp.json. The connection can be unstable; if it drops, reconnect.

        "voicevox": {
            "url": "http://localhost:10100/sse"
        }

When running in a Docker environment

Add the following to mcp.json. (This configuration has not been tested in the author's environment.)

{
    "mcpServers": {
        "voicevox": {
            "command": "cmd",
            "args": [
                "/c",
                "docker",
                "run",
                "-i",
                "--rm",
                "-v",
                "/mnt/wslg:/mnt/wslg",
                "-e",
                "PULSE_SERVER",
                "-e",
                "SDL_AUDIODRIVER",
                "-e",
                "VOICEVOX_API_URL",
                "-e",
                "VOICEVOX_SPEAKER_ID",
                "your-local-docker-image-name"
            ],
            "env": {
                "PULSE_SERVER": "unix:/mnt/wslg/PulseServer",
                "SDL_AUDIODRIVER": "pulseaudio",
                "VOICEVOX_API_URL": "http://host.docker.internal:50031",
                "VOICEVOX_SPEAKER_ID": "919692871"
            }
        }
    }
}

About Speaker ID

The speaker ID depends on the engine and the voice models you are using. The default is "1" (Zundamon, Amaama style, in the standard VOICEVOX ENGINE). To use a different voice, set the environment variable VOICEVOX_SPEAKER_ID.

You can check the list of speaker IDs at the /speakers endpoint of the VOICEVOX ENGINE API. Example: curl http://localhost:50021/speakers
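The `/speakers` response is a JSON array of speakers, and each speaker has a `styles` array. The numeric `id` of a style is the value that goes into `VOICEVOX_SPEAKER_ID`. The sketch below flattens the response into a readable list. It assumes the usual VOICEVOX ENGINE field names (`name`, `styles[].name`, `styles[].id`), and its helper names are hypothetical.

```typescript
interface SpeakerStyle {
  name: string;
  id: number;
}

interface Speaker {
  name: string;
  styles: SpeakerStyle[];
}

// Flattens a GET /speakers response into "speaker (style): id" lines.
function styleIdList(speakers: Speaker[]): string[] {
  return speakers.flatMap((s) => s.styles.map((st) => `${s.name} (${st.name}): ${st.id}`));
}

// Fetches the list from a running engine, e.g. listSpeakerIds("http://localhost:50021").
async function listSpeakerIds(apiUrl: string): Promise<string[]> {
  const res = await fetch(`${apiUrl.replace(/\/+$/, "")}/speakers`);
  if (!res.ok) throw new Error(`GET /speakers failed: HTTP ${res.status}`);
  return styleIdList((await res.json()) as Speaker[]);
}
```

Each line shows the number to copy into VOICEVOX_SPEAKER_ID for that voice and style.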

Troubleshooting

Connection error with VOICEVOX: Make sure that VOICEVOX ENGINE is running and that the API URL is set correctly.
No sound playing: Make sure VLC is properly installed and on your PATH.
Audio output problems in the Docker environment: Check that PulseAudio is configured correctly.
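Some of these checks can be scripted. The following sketch probes the engine's `GET /version` endpoint, which exists in VOICEVOX ENGINE and is assumed to exist in the other engines. It also checks whether the PulseAudio socket named by a simple `unix:` form of `PULSE_SERVER` exists. The helper names are hypothetical.

```typescript
import { existsSync } from "node:fs";

// Extracts the socket path from a PULSE_SERVER value such as "unix:/mnt/wslg/PulseServer".
// Returns null for other forms (e.g. "tcp:host:port"), which this sketch does not check.
function pulseSocketPath(pulseServer: string): string | null {
  return pulseServer.startsWith("unix:") ? pulseServer.slice("unix:".length) : null;
}

// True if the socket file exists; run it inside the container or WSL2 where audio should play.
function pulseSocketExists(pulseServer: string): boolean {
  const socketPath = pulseSocketPath(pulseServer);
  return socketPath !== null && existsSync(socketPath);
}

// Resolves to the engine's version string, or throws if the engine is unreachable.
async function engineVersion(apiUrl: string): Promise<string> {
  const res = await fetch(`${apiUrl.replace(/\/+$/, "")}/version`);
  if (!res.ok) throw new Error(`GET /version failed: HTTP ${res.status}`);
  return (await res.json()) as string;
}
```

If `engineVersion` throws, the problem is the engine or VOICEVOX_API_URL. If `pulseSocketExists("unix:/mnt/wslg/PulseServer")` is false inside the container, the /mnt/wslg volume mount is the likely cause.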

Developer Information

To contribute to the source code, please create an issue or submit a pull request.
To report bugs or request features, please use the Issues feature on GitHub.

License

MIT License
