
VOICEVOX MCP Server

voicevox-mcp

This project is an MCP (Model Context Protocol) server that works with the VOICEVOX engine to synthesize speech and retrieve speaker information. It is implemented in TypeScript and uses the MCP SDK.

Features

  • Get speaker information from the VOICEVOX engine (/speakers)
  • Synthesize speech from text with a specified speaker and play it locally (/speak)
    • macOS only
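
As a rough illustration of the speaker lookup, the sketch below queries the engine's /speakers endpoint and flattens the result into one row per voice style. It assumes the engine listens on http://localhost:50021 and that each speaker entry carries a styles array with name and id fields, as the VOICEVOX engine's HTTP API does. The helper names flattenStyles and listStyles are hypothetical and are not taken from this repository.

```typescript
// Shape of one entry returned by the engine's GET /speakers endpoint.
// Only the fields used here are listed (assumed from the VOICEVOX engine API).
interface SpeakerStyle {
  name: string;
  id: number;
}

interface Speaker {
  name: string;
  speaker_uuid: string;
  styles: SpeakerStyle[];
}

// One row per (speaker, style) pair.
interface StyleRow {
  speaker: string;
  style: string;
  id: number;
}

// Hypothetical helper: flatten the nested speaker list into rows.
function flattenStyles(speakers: Speaker[]): StyleRow[] {
  const rows: StyleRow[] = [];
  for (const speaker of speakers) {
    for (const style of speaker.styles) {
      rows.push({ speaker: speaker.name, style: style.name, id: style.id });
    }
  }
  return rows;
}

// Hypothetical helper: fetch the speaker list from a running engine.
async function listStyles(baseUrl = "http://localhost:50021"): Promise<StyleRow[]> {
  const res = await fetch(`${baseUrl}/speakers`);
  if (!res.ok) {
    throw new Error(`GET /speakers failed with status ${res.status}`);
  }
  return flattenStyles((await res.json()) as Speaker[]);
}
```

The engine identifies a voice by its style id, so the id column is presumably the kind of value a setting such as SPEAKER_ID refers to.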

Setup

Starting the VOICEVOX engine (Docker recommended)

docker compose up -d

This will start the VOICEVOX engine on localhost:50021.
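
To confirm that the engine is actually reachable before wiring up the MCP server, something like the following could be used. It is a minimal sketch that assumes the engine's /version endpoint (which returns the version as a JSON string) and the default address above. The names versionUrl and engineVersion are hypothetical.

```typescript
// Hypothetical helper: build the URL of the engine's version endpoint.
function versionUrl(baseUrl: string): string {
  // Drop trailing slashes so that "http://host:50021/" also works.
  return `${baseUrl.replace(/\/+$/, "")}/version`;
}

// Resolves to the engine's version string, or null if the engine
// cannot be reached or answers with an error status.
async function engineVersion(baseUrl = "http://localhost:50021"): Promise<string | null> {
  try {
    const res = await fetch(versionUrl(baseUrl));
    if (!res.ok) return null;
    return (await res.json()) as string;
  } catch {
    return null; // connection refused, DNS failure, and similar errors
  }
}
```

For example, engineVersion().then((v) => console.log(v ?? "engine not reachable")) prints either the version or a short notice.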

Install and build dependencies

npm install
npm run build

How to use

Example Cursor configuration

{
  "mcpServers": {
    "voicevox-mcp": {
      "command": "node",
      "args": ["${Path to Repository}/dist/index.js"],
      "env": {
        "SPEAKER_ID": 8,
        "SPEED_SCALE": 1.2,
        "VOICEVOX_API_URL": "http://localhost:50021"
      }
    }
  }
}

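Replace ${Path to Repository} with the path to your copy of this repository, and change VOICEVOX_API_URL if your VOICEVOX engine runs at a different address.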
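
Environment variables reach a Node.js process as strings, so a server reading these settings has to convert SPEAKER_ID and SPEED_SCALE to numbers. The sketch below shows one way that could look. The loadConfig helper and its fallback values are assumptions for illustration only and are not the repository's actual defaults.

```typescript
interface VoicevoxConfig {
  apiUrl: string;
  speakerId: number;
  speedScale: number;
}

// Hypothetical helper: read the three settings from an env-like object.
// The fallback values are assumptions, not this repository's defaults.
function loadConfig(env: Record<string, string | undefined>): VoicevoxConfig {
  // Parse a numeric setting, falling back when it is missing or not a number.
  const num = (value: string | undefined, fallback: number): number => {
    if (value === undefined || value.trim() === "") return fallback;
    const n = Number(value);
    return Number.isFinite(n) ? n : fallback;
  };
  return {
    apiUrl: env["VOICEVOX_API_URL"] ?? "http://localhost:50021",
    speakerId: num(env["SPEAKER_ID"], 1),
    speedScale: num(env["SPEED_SCALE"], 1.0),
  };
}
```

In a real process this would be called as loadConfig(process.env).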

  • You can get a list of speakers from your MCP client using the speakers tool.
  • The speak tool synthesizes speech from text and plays it back locally (macOS is recommended because playback uses the afplay command).
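
The flow behind a tool like speak can be sketched as three steps: request an audio query for the text, send that query to the synthesis endpoint to get WAV data, then play the file with afplay. The code below is a minimal sketch under those assumptions, using the VOICEVOX engine's /audio_query and /synthesis endpoints. The names buildUrls and speak, the temporary file name, and the default values are hypothetical, and this is not necessarily how the repository implements it.

```typescript
import { execFile } from "node:child_process";
import { writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: build the two engine URLs needed for one utterance.
function buildUrls(baseUrl: string, text: string, speaker: number) {
  const query = new URLSearchParams({ text, speaker: String(speaker) });
  return {
    audioQuery: `${baseUrl}/audio_query?${query.toString()}`,
    synthesis: `${baseUrl}/synthesis?speaker=${speaker}`,
  };
}

// Sketch: text -> audio query -> WAV bytes -> temp file -> afplay (macOS only).
async function speak(
  text: string,
  speaker: number,
  speedScale = 1.0,
  baseUrl = "http://localhost:50021",
): Promise<void> {
  const urls = buildUrls(baseUrl, text, speaker);

  // Step 1: ask the engine for the synthesis parameters for this text.
  const queryRes = await fetch(urls.audioQuery, { method: "POST" });
  if (!queryRes.ok) throw new Error(`audio_query failed: ${queryRes.status}`);
  const audioQuery = (await queryRes.json()) as Record<string, unknown>;
  audioQuery["speedScale"] = speedScale; // adjust the speaking rate

  // Step 2: turn the parameters into WAV audio.
  const synthRes = await fetch(urls.synthesis, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(audioQuery),
  });
  if (!synthRes.ok) throw new Error(`synthesis failed: ${synthRes.status}`);
  const wav = Buffer.from(await synthRes.arrayBuffer());

  // Step 3: write the audio to a temporary file and play it with afplay.
  const file = join(tmpdir(), `voicevox-${Date.now()}.wav`);
  await writeFile(file, wav);
  await new Promise<void>((resolve, reject) => {
    execFile("afplay", [file], (err) => (err ? reject(err) : resolve()));
  });
}
```

With the engine running, a call such as speak("こんにちは", 8, 1.2) would synthesize and play one utterance using the values from the configuration example.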

Main Dependencies

  • @modelcontextprotocol/sdk
  • zod
  • typescript

Precautions

  • Future improvements
    • Speech synthesis is unavailable unless the VOICEVOX engine is running on localhost:50021.
    • On environments other than macOS, replace the afplay call with an equivalent audio player command.
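
One way to adapt the playback step to other systems is to choose the player command from the platform name. The sketch below is only an assumption about what such a change could look like: afplay is the command this server uses on macOS, while aplay (from alsa-utils) on Linux and PowerShell's System.Media.SoundPlayer on Windows are suggested substitutes that depend on what is installed. The playerFor helper is hypothetical.

```typescript
interface PlayerCommand {
  command: string;
  args: string[];
}

// Hypothetical helper: pick a WAV player for the given platform name
// (the values match Node's process.platform). The Linux and Windows
// choices are assumptions, not part of this repository.
function playerFor(platform: string, file: string): PlayerCommand {
  switch (platform) {
    case "darwin":
      return { command: "afplay", args: [file] };
    case "linux":
      return { command: "aplay", args: [file] }; // requires alsa-utils
    case "win32": {
      // Escape single quotes for a PowerShell single-quoted string.
      const quoted = file.replace(/'/g, "''");
      return {
        command: "powershell",
        args: [
          "-NoProfile",
          "-Command",
          `(New-Object System.Media.SoundPlayer '${quoted}').PlaySync()`,
        ],
      };
    }
    default:
      throw new Error(`No audio player configured for platform "${platform}"`);
  }
}
```

The result can be passed to execFile from node:child_process, for example with const { command, args } = playerFor(process.platform, file).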

License

MIT License


