Voicevox MCP Server

by Dosugamea
MIT License


This server exposes VOICEVOX-compatible speech synthesis engines (AivisSpeech, VOICEVOX, COEIROINK) over MCP (Model Context Protocol). It lets an agent, for example Claude 3.7 running in Cursor's agent mode, speak through one of these engines.

Prerequisites

Windows environment

Docker environment (WSL2)

  • Docker and Docker Compose
  • WSL2
  • VOICEVOX ENGINE or a compatible engine (running locally or in Docker)
  • A Linux environment with libsdl2-dev, pulseaudio-utils, and pulseaudio installed (sudo apt install libsdl2-dev pulseaudio-utils pulseaudio)
  • Permission to access /mnt/wslg

Installation and Configuration

  1. Clone the repository
git clone https://github.com/Dosugamea/voicevox-mcp-server.git
cd voicevox-mcp-server
  2. Install dependencies
npm install
  3. Set environment variables. Create a .env file by copying .env_example and modify the settings as needed:
VOICEVOX_API_URL=http://localhost:50021
VOICEVOX_SPEAKER_ID=1
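
The following is a minimal sketch of how these two variables could be read and validated in TypeScript. It is not taken from this repository: the `VoicevoxConfig` type and the `loadConfig` function are hypothetical names, and the fallback values simply mirror the example .env contents above.

```typescript
// Hypothetical shape for the two settings named in .env_example.
interface VoicevoxConfig {
  apiUrl: string;
  speakerId: number;
}

// Reads the settings from an environment-like record.
// The fallbacks mirror the example .env values; they are an assumption,
// not necessarily what the server itself does.
function loadConfig(env: Record<string, string | undefined>): VoicevoxConfig {
  // Drop trailing slashes so paths such as "/speakers" can be appended safely.
  const apiUrl = (env["VOICEVOX_API_URL"] ?? "http://localhost:50021").replace(/\/+$/, "");

  // Accept only non-negative integers for the speaker ID.
  const rawId = (env["VOICEVOX_SPEAKER_ID"] ?? "1").trim();
  if (!/^\d+$/.test(rawId)) {
    throw new Error(`VOICEVOX_SPEAKER_ID must be a non-negative integer, got "${rawId}"`);
  }

  return { apiUrl, speakerId: Number(rawId) };
}
```

In a Node.js process this could be called as `loadConfig(process.env)` once the .env file has been loaded into the environment.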

How to run it

Running in a Windows environment

Start the server separately from the editor with the following commands.

npm run build
npm start

Running in a Docker environment

No separate start step is needed: the editor launches the container from its MCP configuration. The server runs in stdio mode, so it cannot be started on its own.

How to set it up

When running in a Windows environment

Add the following to mcp.json. The connection can be unstable, so reconnect if it drops.

"voicevox": { "url": "http://localhost:10100/sse" }

When running in a Docker environment

Add the following to mcp.json. (This setup has not been verified in the author's environment.)

{
  "tools": {
    "voicevox": {
      "command": "cmd",
      "args": [
        "/c", "docker", "run", "-i", "--rm",
        "-v", "/mnt/wslg:/mnt/wslg",
        "-e", "PULSE_SERVER",
        "-e", "SDL_AUDIODRIVER",
        "-e", "VOICEVOX_API_URL",
        "-e", "VOICEVOX_SPEAKER_ID",
        "your-local-docker-image-name"
      ],
      "env": {
        "PULSE_SERVER": "unix:/mnt/wslg/PulseServer",
        "SDL_AUDIODRIVER": "pulseaudio",
        "VOICEVOX_API_URL": "http://host.docker.internal:50031",
        "VOICEVOX_SPEAKER_ID": "919692871"
      }
    }
  }
}

About Speaker ID

The speaker ID depends on the VOICEVOX voice model you are using. By default, "1" (Shikoku Metan) is used. To use another speaker, change the VOICEVOX_SPEAKER_ID environment variable.

You can list the available speaker IDs with the /speakers endpoint of the VOICEVOX ENGINE API. Example:
curl http://localhost:50021/speakers
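
The /speakers response is a JSON array in which each speaker carries a list of styles, and the numeric `id` of a style is the value to put in VOICEVOX_SPEAKER_ID. As a rough sketch, the parsed response could be flattened into a readable list like this. The `listStyleIds` helper and the sample data are made up for illustration, and only the fields used here are typed.

```typescript
// Subset of the fields in one /speakers entry (other fields are omitted).
interface SpeakerStyle {
  name: string;
  id: number;
}
interface Speaker {
  name: string;
  styles: SpeakerStyle[];
}

// Hypothetical helper: one "id: speaker (style)" row per style.
function listStyleIds(speakers: Speaker[]): string[] {
  const rows: string[] = [];
  for (const speaker of speakers) {
    for (const style of speaker.styles) {
      rows.push(`${style.id}: ${speaker.name} (${style.name})`);
    }
  }
  return rows;
}

// Made-up sample in the same shape; real names and IDs come from the engine.
const sample: Speaker[] = [
  { name: "Speaker A", styles: [{ name: "Normal", id: 10 }, { name: "Whisper", id: 11 }] },
  { name: "Speaker B", styles: [{ name: "Normal", id: 20 }] },
];
```

Calling `listStyleIds(sample)` yields one row per style, so the ID for a given voice and style can be read off directly.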

Troubleshooting

  • Connection error with VOICEVOX: Make sure VOICEVOX ENGINE is running and that the API URL is set correctly.
  • No sound playing: Make sure VLC is properly installed and on your PATH.
  • Audio output problem in the Docker environment: Check that pulseaudio is configured correctly.
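
For the first item, one quick way to confirm that the engine is reachable is to request its /version endpoint. The sketch below assumes the engine exposes GET /version, as VOICEVOX ENGINE does. `checkEngine` and `FetchLike` are hypothetical names, and the fetch function is passed in so that the sketch does not depend on a particular runtime.

```typescript
// Minimal view of a fetch-style function; the global fetch in Node.js 18+ fits this shape.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  text(): Promise<string>;
}>;

// Hypothetical helper: returns a human-readable diagnosis instead of throwing.
async function checkEngine(apiUrl: string, fetchFn: FetchLike): Promise<string> {
  const url = `${apiUrl.replace(/\/+$/, "")}/version`;
  try {
    const res = await fetchFn(url);
    if (!res.ok) {
      return `Engine responded with HTTP ${res.status} at ${url}`;
    }
    return `Engine reachable, version ${await res.text()}`;
  } catch (err) {
    // Network-level failure: wrong host or port, or the engine is not running.
    return `Cannot reach engine at ${url}: ${String(err)}`;
  }
}
```

With Node.js 18 or later this could be run as `checkEngine("http://localhost:50021", fetch).then(console.log)`, using whatever URL is set in VOICEVOX_API_URL.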

Developer Information

  • To contribute to the source code, please create an issue or submit a pull request.
  • To report bugs or request features, please use the Issues feature on GitHub.

License

MIT License


