Voice Recognition MCP Service
This service provides voice recognition and text extraction capabilities through both stdio and MCP modes.
Features
- Voice recognition from file
- Voice recognition from base64 encoded data
- Text extraction
- Support for both stdio and MCP modes
- Structured voice recognition results
- AIO protocol compliant responses
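For the base64 input path, the audio bytes have to be turned into text before they can be placed in a request. A minimal sketch using only the Python standard library; the helper name `encode_audio` is hypothetical and not part of this service's API, and the sample bytes are a placeholder for real audio data:

```python
import base64


def encode_audio(audio_bytes: bytes) -> str:
    """Return the audio bytes as an ASCII base64 string (hypothetical helper)."""
    return base64.b64encode(audio_bytes).decode("ascii")


# Placeholder bytes stand in for real audio read with open(path, "rb").read().
sample = b"RIFF\x00\x00\x00\x00WAVE"
encoded = encode_audio(sample)
```

The resulting string is plain ASCII, so it can be embedded directly in a JSON payload.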
Project Structure
- voice_service.py - Core service implementation
- stdio_server.py - stdio mode entry point
- mcp_server.py - MCP mode entry point
- build.py - Build script for executables
- build_exec.sh - Build execution script
- test_*.sh - Test scripts for different functionalities
Installation
- Clone the repository:
- Install dependencies:
- Set up environment variables in .env:
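The variable names the service expects are not listed here, so the following is only a sketch of how a .env file of `KEY=VALUE` lines can be read. `VOICE_API_KEY` and `VOICE_API_URL` are placeholder names chosen for illustration, and `parse_env` is a hypothetical helper, not the loader the service itself uses:

```python
import os


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Placeholder names and values; substitute the keys the service actually reads.
example = "# credentials\nVOICE_API_KEY='secret'\nVOICE_API_URL=https://example.invalid/v1\n"
config = parse_env(example)

# A variable already exported in the shell wins over the file value in this sketch.
api_url = os.environ.get("VOICE_API_URL", config["VOICE_API_URL"])
```

Keeping credentials in .env rather than in source files means they stay out of version control as long as .env is ignored by Git.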
Usage
stdio Mode
- Run the service:
- Send JSON-RPC requests via stdin:
- Or use the executable:
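The request payloads are not shown here, so the sketch below only illustrates the general shape of a JSON-RPC request written as one line of text. It assumes JSON-RPC 2.0 and one request per line; the method name `help` and the empty `params` are placeholders, and `make_request` is a hypothetical helper:

```python
import json
from itertools import count

# Monotonic request ids so responses can be matched to requests.
_ids = count(1)


def make_request(method: str, params: dict) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line of JSON."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}
    return json.dumps(request)


# "help" is a placeholder; use the method names the service actually reports.
line = make_request("help", {})
```

The returned string can be written to the stdin of the running service, followed by a newline.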
MCP Mode
- Run the service:
- Or use the executable:
Response Format
The service follows the AIO protocol for response formatting. Here are examples of different response types:
Voice Recognition Response
Help Information Response
Error Response
Response Fields
The service provides three types of responses:
- Voice Recognition Response (using output field):

Field | Description | Example Value |
---|---|---|
type | Response type | "voice" |
message | Status message | "Voice processed successfully" |
text | Recognized text content | "test test test" |
metadata | Additional information | See below |

- Help Information Response (using result field):

Field | Description | Example Value |
---|---|---|
type | Service type | "voice_service" |
description | Service description | "This service provides..." |
author | Service author | "AIO-2030" |
version | Service version | "1.0.0" |
github | GitHub repository URL | "https://github.com/..." |
transport | Supported transport modes | ["stdio"] |
methods | Available methods | See methods list |

- Error Response (using output field):

Field | Description | Example Value |
---|---|---|
type | Response type | "error" |
message | Error message | "503 Server Error: Service Unavailable" |
error_code | HTTP status code | 503 |
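A client can tell the three response types apart by checking which field is present and, for `output`, its `type` value. A minimal sketch, assuming `output` and `result` are top-level keys of the decoded response object (the full envelope is not shown here); `classify_response` is a hypothetical helper and the sample dicts reuse the example values listed above:

```python
def classify_response(response: dict) -> str:
    """Return "help", "voice", "error", or "unknown" for a decoded response."""
    if "result" in response:
        return "help"
    output = response.get("output")
    if isinstance(output, dict) and output.get("type") in ("voice", "error"):
        return output["type"]
    return "unknown"


# Sample payloads built from the documented example values.
voice = {"output": {"type": "voice", "message": "Voice processed successfully", "text": "test test test"}}
error = {"output": {"type": "error", "message": "503 Server Error: Service Unavailable", "error_code": 503}}
help_info = {"result": {"type": "voice_service", "version": "1.0.0"}}
```

Returning "unknown" instead of raising keeps the caller in control of how to treat responses that match none of the documented shapes.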
Metadata Fields
The metadata field in voice recognition responses contains:
Field | Description | Example Value |
---|---|---|
language | Language code | "en" |
emotion | Emotion state | "unknown" |
audio_type | Audio type | "speech" |
speaker | Speaker identifier | "woitn" |
raw_text | Original recognized text | "test test test" |
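On the client side, the metadata fields can be collected into a small typed container. This is a sketch only: the class name `VoiceMetadata` is hypothetical, and the default values used when a field is missing are assumptions rather than documented behavior:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceMetadata:
    # Defaults are assumptions for fields absent from a response.
    language: str = "unknown"
    emotion: str = "unknown"
    audio_type: str = "unknown"
    speaker: str = "unknown"
    raw_text: str = ""

    @classmethod
    def from_dict(cls, data: dict) -> "VoiceMetadata":
        """Build from a metadata dict, ignoring keys not listed in the table."""
        known = {name: data[name] for name in cls.__dataclass_fields__ if name in data}
        return cls(**known)


meta = VoiceMetadata.from_dict(
    {"language": "en", "emotion": "unknown", "audio_type": "speech", "raw_text": "test test test", "extra": 1}
)
```

Ignoring unknown keys keeps the client working if the service later adds fields to the metadata.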
Building Executables
- Make the build script executable:
- Build stdio mode executable:
- Build MCP mode executable:
The executables will be created at:
- stdio mode: dist/voice_stdio
- MCP mode: dist/voice_mcp
Testing
Run the test scripts:
License
This project is licensed under the MIT License - see the LICENSE file for details.