Which integrations are available for this server?

Uses FFmpeg to convert and process audio in WAV, MP3, M4A, FLAC, OGG, and AAC.
Integrates with Google Speech Recognition for speech-to-text conversion across multiple languages.
Supports OpenAI Whisper as a remote API option for high-accuracy transcription.
Integrates with CMU Sphinx for lightweight, offline speech recognition.

How do I use Voice to Text MCP Server?

1. Click "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it shows a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Voice to Text MCP Server transcribe this meeting recording in English".

The server will respond to your query, and you can continue using it as needed.

Voice to Text MCP Server

by gongjiaben

Python

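Voice to Text MCP Server

A voice-to-text MCP server that supports multiple audio formats and recognition engines.

Features

🎯 Core features

Multiple engines: remote API calls (Alibaba Cloud Bailian, OpenAI Whisper, iFlytek (Xunfei), and others), Google Speech Recognition, CMU Sphinx
Multiple formats: WAV, MP3, M4A, FLAC, OGG, AAC
Multiple languages: Chinese, English, Japanese, Korean, French, German, Spanish, Russian
Batch processing: transcribe several audio files in one call
Live progress: detailed progress information during transcription
No local models: recognition runs through remote API calls, so no large models need to be downloaded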

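🛠️ Tools

transcribe_audio_file: transcribe an audio file
transcribe_audio_data: transcribe audio data
transcribe_with_remote_api: transcribe audio through a remote API
batch_transcribe: transcribe multiple files in a batch
analyze_audio_file: inspect an audio file's properties
convert_audio_file_format: convert between audio formats
get_supported_formats: list the supported formats

📚 Resources

audio://info/{file_path}: get information about an audio file
audio://formats: get the supported audio formats

💡 Prompt templates

Voice-to-text assistant
Audio format conversion assistant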

Installation

Using uv (recommended)

# Clone the project
git clone <repository-url>
cd DW_MCP_Server

# Install dependencies
uv sync

# Run the server
uv run python main.py

Using pip

# Install dependencies
pip install -r requirements.txt

# Run the server
python main.py

Usage

1. Start the server

# Development mode
uv run mcp dev main.py

# Or run it directly
uv run python main.py

2. Install in Claude Desktop

uv run mcp install main.py

3. Examples

Transcribing a single audio file

# Use Google Speech Recognition
result = await transcribe_audio_file(
    file_path="/path/to/audio.wav",
    language="zh-CN",
    engine="google"
)

# Use a remote API (requires a configured API key)
result = await transcribe_audio_file(
    file_path="/path/to/audio.mp3",
    language="zh-CN",
    engine="remote_api"
)

# Call a remote API directly
result = await transcribe_with_remote_api(
    file_path="/path/to/audio.wav",
    api_type="bailian",  # supported: bailian, openai, xunfei
    api_key="your_api_key",
    api_url="your_api_url",
    language="zh-CN"
)

Batch transcription

file_paths = [
    "/path/to/audio1.wav",
    "/path/to/audio2.mp3",
    "/path/to/audio3.m4a"
]

results = await batch_transcribe(
    file_paths=file_paths,
    language="zh-CN",
    engine="whisper"
)
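The README does not show how batch_transcribe reports progress or handles a file that fails. As a rough illustration only, the sketch below processes files one at a time, records a per-file error instead of aborting, and calls a progress callback after each file. The names batch_with_progress, run_batch, transcribe_one, and on_progress are hypothetical and are not part of this server's API.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional


async def batch_with_progress(
    file_paths: List[str],
    transcribe_one: Callable[[str], Awaitable[str]],
    on_progress: Optional[Callable[[int, int, str], None]] = None,
) -> Dict[str, str]:
    """Transcribe files sequentially (hypothetical helper, not the server's code)."""
    results: Dict[str, str] = {}
    total = len(file_paths)
    for done, path in enumerate(file_paths, start=1):
        try:
            results[path] = await transcribe_one(path)
        except Exception as exc:
            # Record the failure and keep going with the remaining files.
            results[path] = f"ERROR: {exc}"
        if on_progress is not None:
            on_progress(done, total, path)
    return results


def run_batch(
    file_paths: List[str],
    transcribe_one: Callable[[str], Awaitable[str]],
) -> Dict[str, str]:
    """Synchronous convenience wrapper for callers without an event loop."""
    return asyncio.run(batch_with_progress(file_paths, transcribe_one))
```

Sequential processing keeps the progress count simple; a real implementation might instead run requests concurrently, subject to the remote API's rate limits.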

Analyzing an audio file

info = await analyze_audio_file("/path/to/audio.wav")
print(f"Format: {info.format}")
print(f"Duration: {info.duration} s")
print(f"Sample rate: {info.sample_rate} Hz")

Converting audio formats

output_path = await convert_audio_file_format(
    input_path="/path/to/audio.mp3",
    output_path="/path/to/output.wav",
    target_format="wav"
)
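The README says the server relies on FFmpeg for format conversion but does not show how. As a minimal sketch, assuming the conversion simply shells out to the ffmpeg command-line tool, it could look like the code below. The helpers build_ffmpeg_command and convert_with_ffmpeg are hypothetical names introduced for illustration.

```python
import subprocess
from pathlib import Path
from typing import List

# Input formats listed in this README.
SUPPORTED_INPUT = {"wav", "mp3", "m4a", "flac", "ogg", "aac"}


def build_ffmpeg_command(input_path: str, output_path: str) -> List[str]:
    """Build an FFmpeg argument list (hypothetical helper, not the server's code)."""
    ext = Path(input_path).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_INPUT:
        raise ValueError(f"Unsupported input format: {ext!r}")
    # -y overwrites an existing output file; FFmpeg infers the target
    # container from the output file's extension.
    return ["ffmpeg", "-y", "-i", input_path, output_path]


def convert_with_ffmpeg(input_path: str, output_path: str) -> str:
    """Run FFmpeg and return the output path; raises CalledProcessError on failure."""
    subprocess.run(
        build_ffmpeg_command(input_path, output_path),
        check=True,
        capture_output=True,
    )
    return output_path
```

Passing the arguments as a list rather than a shell string avoids quoting problems with paths that contain spaces.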

Supported formats

Input formats

WAV
MP3
M4A
FLAC
OGG
AAC

Output formats

WAV
MP3
TXT (transcript text)
SRT (subtitle file)
VTT (WebVTT subtitles)
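SRT is listed as an output format, but the README does not describe how subtitles are generated. The sketch below shows how timed segments could be rendered as SRT text. It assumes the transcription step yields (start_seconds, end_seconds, text) tuples, which the README does not confirm; both function names are hypothetical.

```python
from typing import Iterable, Tuple


def format_srt_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Turn (start, end, text) tuples into SRT cues numbered from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

WebVTT output would be similar: it starts with a WEBVTT header line and uses a period instead of a comma before the milliseconds.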

Supported languages

Chinese (zh-CN)
English (en-US)
Japanese (ja-JP)
Korean (ko-KR)
French (fr-FR)
German (de-DE)
Spanish (es-ES)
Russian (ru-RU)
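Callers may pass language codes in slightly different spellings, such as zh_cn or EN-us. As an illustration, a client could normalize and validate the code against the list above before calling the server. The function normalize_language is a hypothetical helper, and whether the server itself accepts such variants is not stated in the README.

```python
# Language codes listed in this README, mapped to English names.
SUPPORTED_LANGUAGES = {
    "zh-CN": "Chinese",
    "en-US": "English",
    "ja-JP": "Japanese",
    "ko-KR": "Korean",
    "fr-FR": "French",
    "de-DE": "German",
    "es-ES": "Spanish",
    "ru-RU": "Russian",
}


def normalize_language(code: str) -> str:
    """Return the canonical code, tolerating case and underscore variants."""
    wanted = code.strip().replace("_", "-").lower()
    for canonical in SUPPORTED_LANGUAGES:
        if canonical.lower() == wanted:
            return canonical
    raise ValueError(f"Unsupported language code: {code!r}")
```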

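Recognition engine comparison

Engine	Advantages	Disadvantages	Best suited for
Remote API (Bailian/OpenAI/iFlytek)	High accuracy, multilingual, no local model required	Requires a network connection and an API key	Online applications
Google Speech Recognition	High accuracy, multilingual	Requires a network connection	Online applications
CMU Sphinx	Fully offline, lightweight	Relatively low accuracy	Embedded devices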
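The comparison above can be read as a simple decision rule: prefer a remote API when a key is configured, fall back to Google Speech Recognition when only a network connection is available, and use CMU Sphinx when offline. The sketch below encodes that rule. The function choose_engine is hypothetical, and the identifier "sphinx" is an assumption; only "google" and "remote_api" appear as engine values in this README.

```python
def choose_engine(has_network: bool, has_api_key: bool) -> str:
    """Pick an engine name from the trade-offs in the comparison table."""
    if has_network and has_api_key:
        # Highest accuracy, but needs both connectivity and credentials.
        return "remote_api"
    if has_network:
        # Needs connectivity only.
        return "google"
    # Fully offline; lower accuracy. "sphinx" is an assumed identifier.
    return "sphinx"
```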

Configuration

Environment variables

# Set the default language
export DEFAULT_LANGUAGE=zh-CN

# Set the default engine
export DEFAULT_ENGINE=remote_api

# Set the default API type
export DEFAULT_API_TYPE=bailian

# Configure the API key and URL
export BAILIAN_API_KEY=your_bailian_api_key
export BAILIAN_API_URL=https://bailian.aliyuncs.com/v1/audio/transcriptions

Server configuration

# Edit the server configuration in main.py
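To illustrate how these variables might be consumed, the sketch below reads them into a small config object. It is not the server's actual loading code. The fallback values mirror the example exports above and are assumptions, as is the idea that other API types follow the same <TYPE>_API_KEY and <TYPE>_API_URL naming; the README only shows the BAILIAN_ variables.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TranscriptionConfig:
    language: str
    engine: str
    api_type: str
    api_key: Optional[str]
    api_url: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> TranscriptionConfig:
    """Read settings from the environment (hypothetical helper)."""
    api_type = env.get("DEFAULT_API_TYPE", "bailian")  # assumed fallback
    prefix = api_type.upper()  # e.g. "bailian" -> "BAILIAN" (assumed pattern)
    return TranscriptionConfig(
        language=env.get("DEFAULT_LANGUAGE", "zh-CN"),    # assumed fallback
        engine=env.get("DEFAULT_ENGINE", "remote_api"),   # assumed fallback
        api_type=api_type,
        api_key=env.get(f"{prefix}_API_KEY"),
        api_url=env.get(f"{prefix}_API_URL"),
    )
```

Accepting any mapping rather than reading os.environ directly makes the loader easy to exercise with a plain dictionary.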
mcp = FastMCP(
    "语音转文字服务",
    dependencies=["speechrecognition", "pydub", "openai-whisper", "torch"]
)

Development

Install development dependencies

uv sync --extra dev

Run the tests

uv run pytest

Format the code

uv run black main.py
uv run isort main.py

Type checking

uv run mypy main.py

Troubleshooting

Common issues

Incorrect API key configuration

# Check the environment variables
echo $BAILIAN_API_KEY
echo $BAILIAN_API_URL

# Or pass the values directly in code
result = await transcribe_with_remote_api(
    file_path="audio.wav",
    api_key="your_api_key",
    api_url="your_api_url"
)

Unsupported audio format

# Install ffmpeg
# Windows: download ffmpeg and add it to PATH
# macOS: brew install ffmpeg
# Linux: sudo apt install ffmpeg
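Before digging into a format error, it can help to confirm that FFmpeg is actually reachable from the Python process. A minimal check, using only the standard library, might look like this; ffmpeg_available is a hypothetical helper name.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the executable can be found on PATH."""
    return shutil.which(executable) is not None
```

If this returns False after installation, the PATH seen by the server process probably differs from the one in your shell.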

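Network connection errors
- Check the network connection
- Check that the API URL is correct
- Consider switching to the offline engine (CMU Sphinx)

Debug logging

# Enable verbose logging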
import logging
logging.basicConfig(level=logging.DEBUG)

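Contributing

Issues and pull requests are welcome!

Development guide

Fork the project
Create a feature branch
Commit your changes
Push to the branch
Open a pull request

License

MIT License

Changelog

v0.1.0

Initial release
Support for Google Speech Recognition, Whisper, and CMU Sphinx
Support for multiple audio formats
Support for batch processing
Progress feedback

Contact

For questions or suggestions, get in touch by:

Opening an issue
Sending an email
Joining the discussion group

Note: remote APIs require an API key and URL. Set the corresponding environment variables, or pass them as parameters in the call, before use. Mainstream speech recognition APIs such as Alibaba Cloud Bailian, OpenAI Whisper, and iFlytek are recommended.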
