Voice Recognition MCP Service
This service provides voice recognition and text extraction capabilities through both stdio and MCP modes.
Features
- Voice recognition from file
- Voice recognition from base64 encoded data
- Text extraction
- Support for both stdio and MCP modes
- Structured voice recognition results
- AIO protocol compliant responses
Project Structure
- voice_service.py: Core service implementation
- stdio_server.py: stdio mode entry point
- mcp_server.py: MCP mode entry point
- build.py: Build script for executables
- build_exec.sh: Build execution script
- test_*.sh: Test scripts for different functionalities
Installation
- Clone the repository:
- Install dependencies:
- Set up environment variables in .env:
Usage
stdio Mode
- Run the service:
- Send JSON-RPC requests via stdin:
- Or use the executable:
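For the JSON-RPC step above (sending requests via stdin), a request line could be assembled roughly as follows. This is a minimal sketch, not the service's documented interface: the method name `recognize_voice` and the parameter name `audio_base64` are hypothetical placeholders, and only the JSON-RPC 2.0 envelope (`jsonrpc`, `id`, `method`, `params`) is standard. The help response's methods list is the place to look up the real method names.

```python
import base64
import json


def build_request(audio_bytes: bytes, request_id: int = 1) -> str:
    """Return one JSON-RPC 2.0 request line carrying base64-encoded audio."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        # Hypothetical method and parameter names, for illustration only.
        "method": "recognize_voice",
        "params": {
            "audio_base64": base64.b64encode(audio_bytes).decode("ascii"),
        },
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Placeholder bytes stand in for real audio data.
    print(build_request(b"\x00\x01placeholder-audio"))
```

The printed line can be piped to the stdio entry point once the placeholder names are replaced with the ones the service actually exposes.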
MCP Mode
- Run the service:
- Or use the executable:
Response Format
The service follows the AIO protocol for response formatting. Here are examples of different response types:
Voice Recognition Response
Help Information Response
Error Response
Response Fields
The service provides three types of responses:
- Voice Recognition Response (using the output field):
Field | Description | Example Value |
---|---|---|
type | Response type | "voice" |
message | Status message | "Voice processed successfully" |
text | Recognized text content | "test test test" |
metadata | Additional information | See below |
- Help Information Response (using the result field):
Field | Description | Example Value |
---|---|---|
type | Service type | "voice_service" |
description | Service description | "This service provides..." |
author | Service author | "AIO-2030" |
version | Service version | "1.0.0" |
github | GitHub repository URL | "https://github.com/..." |
transport | Supported transport modes | ["stdio"] |
methods | Available methods | See methods list |
- Error Response (using the output field):
Field | Description | Example Value |
---|---|---|
type | Response type | "error" |
message | Error message | "503 Server Error: Service Unavailable" |
error_code | HTTP status code | 503 |
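A client can tell the three response types apart by checking which top-level field is present and what its type value is. The sketch below assumes the decoded response is a JSON object that carries either an output or a result object with the fields listed above; the function name `classify_response` and the exact envelope shape are assumptions, not part of the service.

```python
from typing import Any, Dict


def classify_response(response: Dict[str, Any]) -> str:
    """Return "voice", "error", "help", or "unknown" for a decoded response."""
    output = response.get("output")
    if isinstance(output, dict):
        # Voice and error responses both use the output field;
        # the type value distinguishes them.
        if output.get("type") == "error":
            return "error"
        if output.get("type") == "voice":
            return "voice"
    if isinstance(response.get("result"), dict):
        # Help information is returned in the result field.
        return "help"
    return "unknown"


if __name__ == "__main__":
    sample = {
        "output": {
            "type": "error",
            "message": "503 Server Error: Service Unavailable",
            "error_code": 503,
        }
    }
    print(classify_response(sample))
```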
Metadata Fields
The metadata field in voice recognition responses contains:
Field | Description | Example Value |
---|---|---|
language | Language code | "en" |
emotion | Emotion state | "unknown" |
audio_type | Audio type | "speech" |
speaker | Speaker identifier | "woitn" |
raw_text | Original recognized text | "test test test" |
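On the client side, the metadata object can be mapped onto a small typed structure. This is a minimal sketch: the class name `VoiceMetadata` and the default values used for missing keys are assumptions, while the field names come from the table above.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class VoiceMetadata:
    # Defaults are illustrative fallbacks for keys missing from a response.
    language: str = "unknown"
    emotion: str = "unknown"
    audio_type: str = "unknown"
    speaker: str = "unknown"
    raw_text: str = ""

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "VoiceMetadata":
        """Build an instance from a metadata dict, ignoring unknown keys."""
        names = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in names})


if __name__ == "__main__":
    meta = VoiceMetadata.from_dict(
        {"language": "en", "emotion": "unknown", "audio_type": "speech",
         "raw_text": "test test test"}
    )
    print(meta)
```

Ignoring unknown keys keeps the parser tolerant if the service adds metadata fields later.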
Building Executables
- Make the build script executable:
- Build stdio mode executable:
- Build MCP mode executable:
The executables will be created at:
- stdio mode: dist/voice_stdio
- MCP mode: dist/voice_mcp
Testing
Run the test scripts:
License
This project is licensed under the MIT License - see the LICENSE file for details.
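Provides voice recognition and text extraction, supports stdio and MCP modes, and processes audio files or base64-encoded data to return structured results that include language, emotion, and speaker information.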