MCP MeloTTS Audio Generator

by aardpro

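MCP MeloTTS Audio Generator

This project is a Model Context Protocol (MCP) server that gives AI assistants the ability to generate speech audio from text, based on MeloTTS.

Long text is automatically split by punctuation and length into segments of no more than 100 characters. A WAV file is generated for each segment, and ffmpeg then concatenates the segments losslessly into one complete audio file.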
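The splitting step described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name `split_text` and the punctuation set are assumptions, and the real server may choose its boundaries differently.

```python
import re

# Assumed set of Chinese and Western punctuation marks used as split points.
_PUNCT = "。！？；，、.!?;,"


def split_text(text: str, limit: int = 100) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring punctuation boundaries."""
    # Break after each punctuation mark so the mark stays with its clause.
    pattern = f"(?<=[{re.escape(_PUNCT)}])"
    pieces = [p for p in re.split(pattern, text.strip()) if p.strip()]

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        # A single clause longer than the limit is cut purely by length.
        while len(piece) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:limit])
            piece = piece[limit:]
        # Start a new chunk when adding this clause would exceed the limit.
        if len(current) + len(piece) > limit:
            chunks.append(current)
            current = piece
        else:
            current += piece
    if current:
        chunks.append(current)
    return chunks
```

Joining the returned chunks reproduces the stripped input, so no text is lost between segments.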

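✨ Features

Text-to-speech: generates high-quality speech from text using MeloTTS
Automatic segmentation: long text is split automatically, so a single overly long input does not cause a failure
Lossless concatenation: uses ffmpeg to join the segment WAV files into one complete file
Configurable speed, language, device, and speaker
Cross-platform: Windows, Linux, macOS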
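One common way to join WAV segments without re-encoding is ffmpeg's concat demuxer with stream copy. The sketch below assumes that approach; the helper names are hypothetical, and the source does not show the exact ffmpeg invocation the server uses.

```python
import subprocess
import tempfile
from pathlib import Path


def concat_list_text(wav_paths: list[str]) -> str:
    """Build the list-file text read by ffmpeg's concat demuxer."""
    lines = []
    for path in wav_paths:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: str, output: str) -> list[str]:
    """Return the ffmpeg argument list for a stream-copy concatenation."""
    # -f concat selects the concat demuxer, -safe 0 allows absolute paths,
    # and -c copy copies the audio stream without re-encoding.
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]


def concat_wavs(wav_paths: list[str], output: str) -> None:
    """Concatenate WAV files that share the same sample rate and format."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(concat_list_text(wav_paths))
        list_file = handle.name
    try:
        subprocess.run(concat_command(list_file, output), check=True)
    finally:
        Path(list_file).unlink(missing_ok=True)
```

Relative entries in the list file are resolved against the list file's own directory, so passing absolute segment paths is the safer choice here.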

🚀 Installation

Using uvx (recommended)

docker run -d -p 127.0.0.1:9900:8888 kisaragi29/melotts:latest
uvx mcp-melotts@latest

Using pip

pip install --upgrade mcp-melotts

Install from source

git clone https://github.com/aardpro/mcp-melotts.git
cd mcp-melotts
pip install -e .

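⚙️ Configuration

Prerequisites

The following must be installed and working on the development machine:

ffmpeg (used to concatenate audio)
MeloTTS (available either as the local Python package melo.api or as the Docker Gradio service)

Pre-start checks

When the MCP server is enabled, it automatically checks the following:

Whether a MeloTTS HTTP service is running on local port 9900 (used for HTTP mode)
Whether ffmpeg can be invoked on the system (used for lossless concatenation)

If either check fails, an error is raised and startup is aborted.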
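To run the same two checks by hand before enabling the server, a standard-library sketch along these lines may help. The function names are hypothetical and the server's own check may be implemented differently; this version only confirms that something accepts TCP connections on the port and that an ffmpeg executable is on PATH.

```python
import shutil
import socket


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


def port_open(host: str = "127.0.0.1", port: int = 9900, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(host: str = "127.0.0.1", port: int = 9900) -> None:
    """Raise RuntimeError listing every failed prerequisite."""
    problems = []
    if not port_open(host, port):
        problems.append(f"nothing is listening on {host}:{port}")
    if not ffmpeg_available():
        problems.append("ffmpeg was not found on PATH")
    if problems:
        raise RuntimeError("pre-start check failed: " + "; ".join(problems))
```

Calling `preflight()` with no arguments checks port 9900 on the local machine, matching the port used throughout this README.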

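Configuration file

The built-in configuration file is located at src/main/config.json and supports these settings:

defaultLanguage: default language code (ZH/EN/JP/ES)
defaultSpeed: default speech speed
defaultDevice: default device (cpu or cuda:0)
defaultSpeaker: default speaker label
chunkSizeLimit: maximum segment length (100 recommended)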
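As an illustration of how these keys might be read, the sketch below loads a JSON file and falls back to defaults for anything missing. The default values are taken from the tool parameter defaults listed in this README; `load_config` itself is a hypothetical helper, not the project's loader.

```python
import json
from pathlib import Path

# Fallback values, taken from the documented parameter defaults.
DEFAULTS = {
    "defaultLanguage": "ZH",
    "defaultSpeed": 1.0,
    "defaultDevice": "cpu",
    "defaultSpeaker": "ZH",
    "chunkSizeLimit": 100,
}


def load_config(path: str) -> dict:
    """Load config.json, keeping known keys and filling gaps with defaults."""
    config = dict(DEFAULTS)
    file = Path(path)
    if file.is_file():
        loaded = json.loads(file.read_text(encoding="utf-8"))
        # Ignore keys this sketch does not know about.
        config.update({k: v for k, v in loaded.items() if k in DEFAULTS})
    return config
```

For example, a file containing only `{"defaultSpeed": 0.8}` yields a slower default speed while every other setting keeps its fallback value.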

MCP client configuration

Add the following to your MCP client configuration (for example Claude Desktop or Cursor):

Option 1: Using uvx

{
  "mcpServers": {
    "McpMeloTTS": {
      "command": "uvx",
      "args": ["mcp-melotts"]
    }
  }
}

Option 2: Using pip

{
  "mcpServers": {
    "McpMeloTTS": {
      "command": "mcp-melotts"
    }
  }
}

Option 3: Windows

{
  "mcpServers": {
    "McpMeloTTS": {
      "command": "cmd",
      "args": [
        "/c",
        "chcp 65001 >nul && uvx mcp-melotts"
      ]
    }
  }
}

Option 4: Linux/macOS

{
  "mcpServers": {
    "McpMeloTTS": {
      "command": "python",
      "args": ["-m", "main"]
    }
  }
}

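🛠️ Available Tools

`mcp_melotts_generate_audio`

Generates a speech file from text. Suited to requests such as "generate an audio file from this text". Supports automatic segmentation and lossless ffmpeg concatenation, and offers two invocation modes: the local melo.api and the Docker Gradio HTTP interface.

Parameters:

text (string, required): the text to convert to speech
language (string, optional): language code (default ZH)
speaker (string, optional): speaker label (default ZH)
speed (number, optional): speech speed (default 1.0)
device (string, optional): cpu or cuda:0 (default cpu)
split_sentences (boolean, optional): whether to segment the text automatically (default true)
output_dir (string, required): output directory
target_filename (string, optional): final output file name; defaults to a timestamp-based name
use_http_api (boolean, optional): whether to use the Docker Gradio HTTP interface (default false)
api_base_url (string, optional): base URL of the HTTP interface (e.g. http://localhost:9900)
fn_index (number, optional): Gradio function index (default 1)
session_hash (string, optional): Gradio session hash (generated automatically if omitted)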
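When calls are assembled programmatically, a small client-side helper can catch a missing required field or a misspelled parameter before the request is sent. The sketch below is hypothetical and not part of the project; it only encodes the parameter names listed above.

```python
from typing import Any

TOOL_NAME = "mcp_melotts_generate_audio"

# Optional parameter names, as documented in the list above.
OPTIONAL_PARAMS = {
    "language", "speaker", "speed", "device", "split_sentences",
    "target_filename", "use_http_api", "api_base_url", "fn_index",
    "session_hash",
}


def build_tool_call(text: str, output_dir: str, **optional: Any) -> dict:
    """Return a tool-call payload, rejecting empty or unknown fields."""
    if not text.strip():
        raise ValueError("text is required and must not be empty")
    if not output_dir:
        raise ValueError("output_dir is required")
    unknown = set(optional) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    arguments: dict[str, Any] = {"text": text, "output_dir": output_dir}
    # Leave out unset options so the server applies its own defaults.
    arguments.update({k: v for k, v in optional.items() if v is not None})
    return {"name": TOOL_NAME, "arguments": arguments}
```

Calling `build_tool_call("你好", "./audio", speed=0.9)` produces a payload with the same `name` and `arguments` layout used in this README's call examples.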

Example call (local melo.api mode):

{
  "name": "mcp_melotts_generate_audio",
  "arguments": {
    "text": "我最近在学习machine learning，希望能够在未来的artificial intelligence领域有所建树。",
    "language": "ZH",
    "speaker": "ZH",
    "speed": 1.0,
    "output_dir": "./audio"
  }
}

Example call (Docker Gradio HTTP mode):

{
  "name": "mcp_melotts_generate_audio",
  "arguments": {
    "text": "请将以下内容朗读为音频……",
    "language": "ZH",
    "speed": 1.0,
    "output_dir": "./audio",
    "use_http_api": true,
    "api_base_url": "http://localhost:9900",
    "fn_index": 1
  }
}

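Note: HTTP mode requires the Docker container to be running first with port 9900 exposed. The script follows generation progress through /queue/join and /queue/data and then downloads the audio.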
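The progress stream from /queue/data is typically a sequence of server-sent event lines, each carrying a JSON message. The sketch below shows one way to pick the final result out of such a stream. The message shapes (`msg`, `success`, `output.data`) are assumptions modelled on Gradio's queue messages and can differ between Gradio versions; both function names are hypothetical.

```python
import json
import uuid
from typing import Iterable, Optional


def new_session_hash() -> str:
    """Generate a short random identifier to use as the session hash."""
    return uuid.uuid4().hex[:11]


def find_completed_output(sse_lines: Iterable[str]) -> Optional[list]:
    """Return the output data of the first completed event, or None."""
    for raw in sse_lines:
        line = raw.strip()
        # Event payloads arrive on lines prefixed with "data:".
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        if event.get("msg") == "process_completed":
            if not event.get("success", True):
                raise RuntimeError("the service reported a failed generation")
            return event.get("output", {}).get("data")
    return None
```

In a real client the lines would come from a streaming HTTP response, and the returned data would point to the audio file to download.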

⚡ Quick Start

Run in a virtual environment

python -m venv .venv
.venv/Scripts/python.exe -m pip install -U pip
.venv/Scripts/pip.exe install -r requirements.txt
.venv/Scripts/python.exe -m main

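One-click start (Windows)

Double-click or run start_server.bat in the project root. It automatically creates .venv, installs the dependencies, and starts the MCP server inside the virtual environment.

Test script for generating audio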

.venv/Scripts/python.exe test-generate-audio.py --input test-text.txt --output-dir output --use-http-api --api-base-url http://localhost:9900

The printed path is the generated WAV file (e.g. output/1767514828_93829.wav).
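The sample name above looks like a Unix timestamp in seconds followed by a five-digit suffix. A name of that shape could be produced as sketched below. This is a guess from the example file name only: the random suffix and the helper name `default_filename` are assumptions, not the project's documented behavior.

```python
import random
import time
from typing import Optional


def default_filename(now: Optional[float] = None) -> str:
    """Build a '<unix seconds>_<5 digits>.wav' file name."""
    timestamp = int(time.time() if now is None else now)
    # The random suffix keeps names distinct within the same second.
    suffix = random.randint(10000, 99999)
    return f"{timestamp}_{suffix}.wav"
```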

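💡 Usage Examples

Once configured, you can ask your AI assistant directly:

"Generate an audio file from this Chinese text"
"Read the following text aloud in Japanese, at a slightly slower pace"
"Split this long article, generate speech for it, and merge the result into one file"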

💻 Development

Set up the development environment

git clone https://github.com/aardpro/mcp-melotts.git
cd mcp-melotts
pip install -e ".[dev]"

Run the tests

pytest

Build the package

Build and upload in one step:

rm -rf dist && pip install build && python -m build && pip install twine && twine upload dist/*

Or build only:

pip install build
python -m build

Publish to PyPI

pip install twine
twine upload dist/*

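Release steps after making changes

After modifying the project, follow these steps to publish an updated version:

Increase the version number in pyproject.toml
Install the build dependencies:
```
pip install build twine
```
Build the package:
```
python -m build
```
Test the built package locally (optional but recommended):
```
pip install dist/mcp_melotts-*.whl
```
Upload to PyPI:
```
twine upload dist/*
```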

📂 Project Structure

mcp-melotts/
├── src/
│   └── main/
│       ├── __init__.py
│       ├── __main__.py
│       └── server.py
├── pyproject.toml
├── README.md
└── LICENSE

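❓ FAQ

Configuration problems

Make sure ffmpeg and MeloTTS are installed and that melo.api can be imported in Python.

Audio generation fails

If audio generation fails, check:

Whether MeloTTS (the Python package melo) is installed correctly
Whether the language and speaker label are set correctly
Whether ffmpeg is available on the system PATH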

📄 License

MIT License - see the LICENSE file for details.

🤝 Contributing

Pull requests to improve this project are welcome!
