Voice Recognition MCP Service

This service provides voice recognition and text extraction, and can run in either stdio mode or MCP mode.

Features

Voice recognition from file
Voice recognition from base64 encoded data
Text extraction
Support for both stdio and MCP modes
Structured voice recognition results
AIO protocol compliant responses

Project Structure

voice_service.py - Core service implementation
stdio_server.py - stdio mode entry point
mcp_server.py - MCP mode entry point
build.py - Build script for executables
build_exec.sh - Build execution script
test_*.sh - Test scripts, one per feature (help, voice file input, base64 input)

Installation

Clone the repository:

git clone https://github.com/AIO-2030/mcp_voice_identify.git
cd mcp_voice_identify

Install dependencies:

pip install -r requirements.txt

Set up environment variables in .env:

API_URL=your_api_url
API_KEY=your_api_key
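
As a rough illustration of how these two settings could be read from Python, here is a minimal sketch that uses only the standard library. The helper names load_env_file and get_api_settings are hypothetical and are not taken from voice_service.py; the actual service may load its configuration differently (for example with a dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a file and return them as a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_settings(path=".env"):
    """Return (API_URL, API_KEY); real environment variables win over the file."""
    file_values = load_env_file(path)
    api_url = os.environ.get("API_URL", file_values.get("API_URL"))
    api_key = os.environ.get("API_KEY", file_values.get("API_KEY"))
    if not api_url or not api_key:
        raise RuntimeError("API_URL and API_KEY must be set in the environment or in .env")
    return api_url, api_key
```

Failing early when either value is missing makes a misconfigured .env visible at startup rather than on the first recognition request.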

Usage

stdio Mode

Run the service:

python stdio_server.py

Send JSON-RPC requests via stdin:

{
    "jsonrpc": "2.0",
    "method": "help",
    "params": {},
    "id": 1
}
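
The same request can be sent from a script. The sketch below is an illustration only: build_request and call_stdio are hypothetical helpers, and it assumes the server reads one JSON object per line from stdin, writes its reply as a single line of JSON on stdout, and exits when stdin is closed. Those framing details are not confirmed by this README.

```python
import json
import subprocess


def build_request(method, params=None, request_id=1):
    """Serialize a JSON-RPC 2.0 request as a single line of JSON."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params or {},
        "id": request_id,
    })


def call_stdio(command, method, params=None, request_id=1, timeout=30):
    """Run the server, send one request on stdin, return the first JSON object it prints."""
    request_line = build_request(method, params, request_id) + "\n"
    completed = subprocess.run(
        command, input=request_line, capture_output=True, text=True, timeout=timeout
    )
    for line in completed.stdout.splitlines():
        try:
            reply = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON (for example a log line), keep looking
        if isinstance(reply, dict):
            return reply
    raise RuntimeError(f"no JSON response; stderr was: {completed.stderr.strip()}")


# Example, run from the repository root (assumed layout):
#   call_stdio(["python", "stdio_server.py"], "help")
```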

Or use the executable:

./dist/voice_stdio

MCP Mode

Run the service:

python mcp_server.py

Or use the executable:

./dist/voice_mcp

Response Format

The service follows the AIO protocol for response formatting. Here are examples of different response types:

Voice Recognition Response

{
    "jsonrpc": "2.0",
    "output": {
        "type": "voice",
        "message": "Voice processed successfully",
        "text": "test test test",
        "metadata": {
            "language": "en",
            "emotion": "unknown",
            "audio_type": "speech",
            "speaker": "woitn",
            "raw_text": "test test test"
        }
    },
    "id": 1
}

Help Information Response

{
    "jsonrpc": "2.0",
    "result": {
        "type": "voice_service",
        "description": "This service provides voice recognition and text extraction services",
        "author": "AIO-2030",
        "version": "1.0.0",
        "github": "https://github.com/AIO-2030/mcp_voice_identify",
        "transport": ["stdio"],
        "methods": [
            {
                "name": "help",
                "description": "Show this help information."
            },
            {
                "name": "identify_voice",
                "description": "Identify voice from file",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "file_path": {
                            "type": "string",
                            "description": "Voice file path"
                        }
                    },
                    "required": ["file_path"]
                }
            },
            {
                "name": "identify_voice_base64",
                "description": "Identify voice from base64 encoded data",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "base64_data": {
                            "type": "string",
                            "description": "Base64 encoded voice data"
                        }
                    },
                    "required": ["base64_data"]
                }
            },
            {
                "name": "extract_text",
                "description": "Extract text",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "text": {
                            "type": "string",
                            "description": "Text to extract"
                        }
                    },
                    "required": ["text"]
                }
            }
        ]
    },
    "id": 1
}
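
The inputSchema entries in the help response can be used by a client to check parameters before sending a request. The following is a minimal sketch under stated assumptions: required_params, check_params, and base64_params are hypothetical helpers, and base64_data is assumed to be plain standard base64 with no data-URI prefix, which this README does not specify.

```python
import base64


def required_params(help_response, method):
    """Look up the 'required' list for a method in a help response."""
    for entry in help_response["result"]["methods"]:
        if entry["name"] == method:
            # Methods such as "help" carry no inputSchema, so nothing is required.
            return entry.get("inputSchema", {}).get("required", [])
    raise KeyError(f"unknown method: {method}")


def check_params(help_response, method, params):
    """Return params unchanged, or raise if a required parameter is missing."""
    missing = [name for name in required_params(help_response, method) if name not in params]
    if missing:
        raise ValueError(f"{method} is missing required parameters: {missing}")
    return params


def base64_params(audio_bytes):
    """Build params for identify_voice_base64 from raw audio bytes."""
    return {"base64_data": base64.b64encode(audio_bytes).decode("ascii")}
```

For example, check_params(help_response, "identify_voice_base64", base64_params(data)) passes, while an empty params dict raises ValueError before any request is sent.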

Error Response

{
    "jsonrpc": "2.0",
    "output": {
        "type": "error",
        "message": "503 Server Error: Service Unavailable",
        "error_code": 503
    },
    "id": 1
}

Response Fields

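The service returns three types of responses. Help responses carry their payload in the result field; voice recognition and error responses carry it in the output field.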

Voice Recognition Response (using output field):

Field	Description	Example Value
type	Response type	"voice"
message	Status message	"Voice processed successfully"
text	Recognized text content	"test test test"
metadata	Additional information	See below

Help Information Response (using result field):

Field	Description	Example Value
type	Service type	"voice_service"
description	Service description	"This service provides..."
author	Service author	"AIO-2030"
version	Service version	"1.0.0"
github	GitHub repository URL	"https://github.com/..."
transport	Supported transport modes	["stdio"]
methods	Available methods	See methods list

Error Response (using output field):

Field	Description	Example Value
type	Response type	"error"
message	Error message	"503 Server Error: Service Unavailable"
error_code	HTTP status code	503
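
Because the payload sits under result for help responses and under output for voice and error responses, a client may want one place that handles all three shapes. The sketch below is one possible approach, not the project's own client code; VoiceServiceError and unwrap_response are hypothetical names.

```python
class VoiceServiceError(Exception):
    """Raised when the service returns an output of type 'error'."""

    def __init__(self, message, error_code=None):
        super().__init__(message)
        self.error_code = error_code


def unwrap_response(response):
    """Return (type, payload) for a successful response; raise on an error response."""
    if "result" in response:
        payload = response["result"]
    elif "output" in response:
        payload = response["output"]
    else:
        raise ValueError("response has neither 'result' nor 'output'")
    if payload.get("type") == "error":
        raise VoiceServiceError(payload.get("message", "unknown error"), payload.get("error_code"))
    return payload.get("type"), payload
```

Turning error payloads into exceptions keeps the HTTP-style error_code available to callers without every call site having to inspect the type field.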

Metadata Fields

The metadata field in voice recognition responses contains:

Field	Description	Example Value
language	Language code	"en"
emotion	Emotion state	"unknown"
audio_type	Audio type	"speech"
speaker	Speaker identifier	"woitn"
raw_text	Original recognized text	"test test test"
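
On the client side, these fields can be mapped onto a small typed structure. This is a sketch only: VoiceMetadata is a hypothetical class, and the defaults used when a field is absent ("unknown" and an empty string) are assumptions, since this README does not say which fields are optional.

```python
from dataclasses import dataclass, fields


@dataclass
class VoiceMetadata:
    language: str = "unknown"
    emotion: str = "unknown"
    audio_type: str = "unknown"
    speaker: str = "unknown"
    raw_text: str = ""

    @classmethod
    def from_output(cls, output):
        """Build metadata from the 'output' object of a voice recognition response."""
        metadata = output.get("metadata", {})
        # Copy only the documented fields and ignore any extra keys.
        known = {f.name: metadata[f.name] for f in fields(cls) if f.name in metadata}
        return cls(**known)
```

With the voice recognition response shown earlier, VoiceMetadata.from_output(response["output"]).language evaluates to "en".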

Building Executables

Make the build script executable:

chmod +x build_exec.sh

Build stdio mode executable:

./build_exec.sh

Build MCP mode executable:

./build_exec.sh mcp

The executables will be created at:

stdio mode: dist/voice_stdio
MCP mode: dist/voice_mcp

Testing

Run the test scripts:

chmod +x test_*.sh
./test_help.sh
./test_voice_file.sh
./test_voice_base64.sh

License

This project is licensed under the MIT License - see the LICENSE file for details.
