Voice Recognition MCP Service
This service provides voice recognition and text extraction capabilities through both stdio and MCP modes.
Features
Voice recognition from file
Voice recognition from base64 encoded data
Text extraction
Support for both stdio and MCP modes
Structured voice recognition results
AIO protocol compliant responses
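For the base64 input path, a client has to turn raw audio bytes into text before sending them. The following is a minimal sketch using only the Python standard library; the helper names `encode_audio` and `decode_audio` are illustrative and not part of the service.

```python
import base64


def encode_audio(data: bytes) -> str:
    """Return raw audio bytes as base64 text suitable for a JSON payload."""
    return base64.b64encode(data).decode("ascii")


def decode_audio(encoded: str) -> bytes:
    """Reverse encode_audio; raises binascii.Error on malformed input."""
    return base64.b64decode(encoded, validate=True)


# Placeholder bytes standing in for the contents of an audio file.
sample = b"RIFF\x00\x00\x00\x00WAVE"
encoded = encode_audio(sample)
```

In practice the bytes would come from reading an audio file in binary mode, for example `open(path, "rb").read()`.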
Project Structure
- `voice_service.py` - Core service implementation
- `stdio_server.py` - stdio mode entry point
- `mcp_server.py` - MCP mode entry point
- `build.py` - Build script for executables
- `build_exec.sh` - Build execution script
- `test_*.sh` - Test scripts for different functionalities
Installation
Clone the repository:
Install dependencies:
Set up environment variables in `.env`:
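The variable names the service expects are not listed in this document. As a rough sketch of how a `.env` file of `KEY=value` lines can be read with the standard library, the snippet below uses the hypothetical names `VOICE_API_URL` and `VOICE_API_KEY`; substitute whatever names the service actually requires.

```python
import os


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path: str = ".env") -> dict[str, str]:
    """Read a .env file and export its values without overriding existing ones."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env(handle.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


# Hypothetical variable names, used only for illustration.
example = '# credentials\nVOICE_API_URL="https://example.invalid/v1"\nVOICE_API_KEY=abc123\n'
settings = parse_env(example)
```

Keeping credentials in `.env` rather than in source files means the file can be excluded from version control.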
Usage
stdio Mode
Run the service:
Send JSON-RPC requests via stdin:
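A request line for stdio mode can be assembled as a JSON-RPC 2.0 object. The sketch below shows the general envelope only: the method name `recognize_voice` and the `file_path` parameter are assumptions made for illustration, so check the service's help response for the real method list.

```python
import json
from itertools import count

# Monotonic request ids so responses can be matched to requests.
_ids = count(1)


def build_request(method: str, params: dict) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line of JSON."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": method,
        "params": params,
    }
    return json.dumps(request)


# Hypothetical method and parameter names.
line = build_request("recognize_voice", {"file_path": "sample.wav"})
print(line)
```

The printed line can then be written to the standard input of the stdio entry point.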
Or use the executable:
MCP Mode
Run the service:
Or use the executable:
Response Format
The service follows the AIO protocol for response formatting. Here are examples of different response types:
Voice Recognition Response
Help Information Response
Error Response
Response Fields
The service provides three types of responses:
Voice Recognition Response (using the `output` field):

| Field | Description | Example Value |
|-------|-------------|---------------|
| type | Response type | "voice" |
| message | Status message | "Voice processed successfully" |
| text | Recognized text content | "test test test" |
| metadata | Additional information | See below |

Help Information Response (using the `result` field):

| Field | Description | Example Value |
|-------|-------------|---------------|
| type | Service type | "voice_service" |
| description | Service description | "This service provides..." |
| author | Service author | "AIO-2030" |
| version | Service version | "1.0.0" |
| github | GitHub repository URL | "https://github.com/..." |
| transport | Supported transport modes | ["stdio"] |
| methods | Available methods | See methods list |

Error Response (using the `output` field):

| Field | Description | Example Value |
|-------|-------------|---------------|
| type | Response type | "error" |
| message | Error message | "503 Server Error: Service Unavailable" |
| error_code | HTTP status code | 503 |
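A client can tell the three response types apart by checking which field is present and, for `output`, what its `type` is. The sketch below assumes the `output` or `result` field sits at the top level of the decoded response object, which this document does not confirm; the function name `summarize_response` is illustrative.

```python
def summarize_response(response: dict) -> str:
    """Return a one-line summary for a decoded service response."""
    # Help information arrives under "result".
    if "result" in response:
        info = response["result"]
        return f"help: {info.get('type')} v{info.get('version')}"

    # Voice and error responses both arrive under "output".
    payload = response.get("output", {})
    if payload.get("type") == "error":
        return f"error {payload.get('error_code')}: {payload.get('message')}"
    if payload.get("type") == "voice":
        return f"text: {payload.get('text')}"
    return "unrecognized response"


voice = {"output": {"type": "voice", "message": "Voice processed successfully",
                    "text": "test test test", "metadata": {}}}
error = {"output": {"type": "error",
                    "message": "503 Server Error: Service Unavailable",
                    "error_code": 503}}
print(summarize_response(voice))
print(summarize_response(error))
```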
Metadata Fields
The `metadata` field in voice recognition responses contains:

| Field | Description | Example Value |
|-------|-------------|---------------|
| language | Language code | "en" |
| emotion | Emotion state | "unknown" |
| audio_type | Audio type | "speech" |
| speaker | Speaker identifier | "woitn" |
| raw_text | Original recognized text | "test test test" |
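On the client side, the metadata fields can be mapped onto a small typed record. This is a sketch only: the class name `VoiceMetadata` and the fallback defaults for missing keys are assumptions, not behavior documented by the service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceMetadata:
    """Typed view of the metadata object in a voice recognition response."""
    language: str = "unknown"
    emotion: str = "unknown"
    audio_type: str = "unknown"
    speaker: str = "unknown"
    raw_text: str = ""

    @classmethod
    def from_dict(cls, data: dict) -> "VoiceMetadata":
        # Keep only the documented keys; ignore anything unexpected.
        known = {name: data[name] for name in cls.__dataclass_fields__ if name in data}
        return cls(**known)


meta = VoiceMetadata.from_dict({
    "language": "en",
    "emotion": "unknown",
    "audio_type": "speech",
    "speaker": "woitn",
    "raw_text": "test test test",
})
print(meta.language, meta.audio_type)
```

Ignoring unknown keys keeps the client tolerant if the service later adds fields to `metadata`.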
Building Executables
Make the build script executable:
Build stdio mode executable:
Build MCP mode executable:
The executables will be created at:
- stdio mode: `dist/voice_stdio`
- MCP mode: `dist/voice_mcp`
Testing
Run the test scripts:
License
This project is licensed under the MIT License - see the LICENSE file for details.