Voice Recognition MCP Service

by yangsenessa

This service provides voice recognition and text extraction capabilities through both stdio and MCP modes.

Features

  • Voice recognition from file
  • Voice recognition from base64 encoded data
  • Text extraction
  • Support for both stdio and MCP modes
  • Structured voice recognition results
  • AIO protocol compliant responses

Project Structure

  • voice_service.py - Core service implementation
  • stdio_server.py - stdio mode entry point
  • mcp_server.py - MCP mode entry point
  • build.py - Build script for executables
  • build_exec.sh - Build execution script
  • test_*.sh - Test scripts for different functionalities

Installation

  1. Clone the repository:
git clone https://github.com/AIO-2030/mcp_voice_identify.git
cd mcp_voice_identify
  2. Install dependencies:
pip install -r requirements.txt
  3. Set up environment variables in .env:
API_URL=your_api_url
API_KEY=your_api_key
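
Before starting the service, it can help to confirm that both variables are actually set. The following is a minimal sketch using only the standard library. It assumes the .env file holds plain KEY=VALUE lines; the helper names parse_env and missing_keys are hypothetical and are not part of the project, which may load its configuration differently.

```python
from pathlib import Path

# The two variables named in the installation steps.
REQUIRED_KEYS = ("API_URL", "API_KEY")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def load_env(path: str = ".env") -> dict:
    """Read and parse the file at path; a missing file yields an empty dict."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    return parse_env(env_path.read_text(encoding="utf-8"))
```

Calling missing_keys(load_env()) from the project directory returns an empty list when both values are present.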

Usage

stdio Mode

  1. Run the service:
python stdio_server.py
  2. Send JSON-RPC requests via stdin:
{ "jsonrpc": "2.0", "method": "help", "params": {}, "id": 1 }
  3. Or use the executable:
./dist/voice_stdio
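
The stdin exchange above can also be driven from a script. This is a rough sketch, not the project's own client: it assumes the server reads one JSON request per line from stdin, writes one JSON response per line to stdout, and exits when stdin closes. The helper names build_request, server_command, and call_stdio are hypothetical.

```python
import json
import subprocess
import sys
from typing import Optional


def build_request(method: str, params: Optional[dict] = None, request_id: int = 1) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line of text."""
    payload = {"jsonrpc": "2.0", "method": method, "params": params or {}, "id": request_id}
    return json.dumps(payload)


def server_command(script: str = "stdio_server.py") -> list:
    """Build the command that runs a server script with the current interpreter."""
    return [sys.executable, script]


def call_stdio(command: list, request: str, timeout: float = 30.0) -> dict:
    """Run the process, send one request on stdin, and parse the first stdout line."""
    completed = subprocess.run(
        command,
        input=request + "\n",
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Assumption: the first non-empty stdout line is the JSON response.
    lines = [line for line in completed.stdout.splitlines() if line.strip()]
    if not lines:
        raise RuntimeError(f"no response on stdout; stderr was: {completed.stderr!r}")
    return json.loads(lines[0])
```

From the project directory, call_stdio(server_command(), build_request("help")) would request the help information. For the built executable, pass ["./dist/voice_stdio"] as the command instead.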

MCP Mode

  1. Run the service:
python mcp_server.py
  2. Or use the executable:
./dist/voice_mcp

Response Format

The service follows the AIO protocol for response formatting. Here are examples of different response types:

Voice Recognition Response

{
  "jsonrpc": "2.0",
  "output": {
    "type": "voice",
    "message": "Voice processed successfully",
    "text": "test test test",
    "metadata": {
      "language": "en",
      "emotion": "unknown",
      "audio_type": "speech",
      "speaker": "woitn",
      "raw_text": "test test test"
    }
  },
  "id": 1
}

Help Information Response

{
  "jsonrpc": "2.0",
  "result": {
    "type": "voice_service",
    "description": "This service provides voice recognition and text extraction services",
    "author": "AIO-2030",
    "version": "1.0.0",
    "github": "https://github.com/AIO-2030/mcp_voice_identify",
    "transport": ["stdio"],
    "methods": [
      {
        "name": "help",
        "description": "Show this help information."
      },
      {
        "name": "identify_voice",
        "description": "Identify voice from file",
        "inputSchema": {
          "type": "object",
          "properties": {
            "file_path": { "type": "string", "description": "Voice file path" }
          },
          "required": ["file_path"]
        }
      },
      {
        "name": "identify_voice_base64",
        "description": "Identify voice from base64 encoded data",
        "inputSchema": {
          "type": "object",
          "properties": {
            "base64_data": { "type": "string", "description": "Base64 encoded voice data" }
          },
          "required": ["base64_data"]
        }
      },
      {
        "name": "extract_text",
        "description": "Extract text",
        "inputSchema": {
          "type": "object",
          "properties": {
            "text": { "type": "string", "description": "Text to extract" }
          },
          "required": ["text"]
        }
      }
    ]
  },
  "id": 1
}
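
The method list in the help response suggests how requests for the recognition methods might be built. The sketch below assumes that the params object uses the property names from each inputSchema (file_path, base64_data) and that base64_data is standard Base64 of the raw audio bytes; neither detail is confirmed here beyond the schema itself. The function names are hypothetical.

```python
import base64
import json


def build_file_request(file_path: str, request_id: int = 1) -> str:
    """Build an identify_voice request that points at an audio file path."""
    payload = {
        "jsonrpc": "2.0",
        "method": "identify_voice",
        "params": {"file_path": file_path},
        "id": request_id,
    }
    return json.dumps(payload)


def build_base64_request(audio_bytes: bytes, request_id: int = 1) -> str:
    """Build an identify_voice_base64 request from raw audio bytes."""
    # Assumption: standard Base64 alphabet, sent as an ASCII string.
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    payload = {
        "jsonrpc": "2.0",
        "method": "identify_voice_base64",
        "params": {"base64_data": encoded},
        "id": request_id,
    }
    return json.dumps(payload)
```

Either string can be written to the stdio server's stdin as a single line.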

Error Response

{
  "jsonrpc": "2.0",
  "output": {
    "type": "error",
    "message": "503 Server Error: Service Unavailable",
    "error_code": 503
  },
  "id": 1
}

Response Fields

The service provides three types of responses:

  1. Voice Recognition Response (using output field):
    | Field | Description | Example Value |
    | --- | --- | --- |
    | type | Response type | "voice" |
    | message | Status message | "Voice processed successfully" |
    | text | Recognized text content | "test test test" |
    | metadata | Additional information | See below |
  2. Help Information Response (using result field):
    | Field | Description | Example Value |
    | --- | --- | --- |
    | type | Service type | "voice_service" |
    | description | Service description | "This service provides..." |
    | author | Service author | "AIO-2030" |
    | version | Service version | "1.0.0" |
    | github | GitHub repository URL | "https://github.com/..." |
    | transport | Supported transport modes | ["stdio"] |
    | methods | Available methods | See methods list |
  3. Error Response (using output field):
    | Field | Description | Example Value |
    | --- | --- | --- |
    | type | Response type | "error" |
    | message | Error message | "503 Server Error: Service Unavailable" |
    | error_code | HTTP status code | 503 |
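
Because help responses use the result field while voice and error responses use the output field, a client has to check both. The following is a minimal sketch of that dispatch, based only on the three response shapes shown above; classify_response is a hypothetical helper, and the "unknown" fallback is an assumption for shapes this document does not describe.

```python
def classify_response(response: dict) -> str:
    """Return "help", "voice", "error", or "unknown" for a parsed response."""
    # Help information arrives under "result".
    if isinstance(response.get("result"), dict):
        return "help"
    # Voice and error responses arrive under "output" and differ by "type".
    output = response.get("output")
    if isinstance(output, dict) and output.get("type") in ("voice", "error"):
        return output["type"]
    return "unknown"


def error_summary(response: dict) -> str:
    """Format the error_code and message of an error response."""
    output = response.get("output") or {}
    return f"{output.get('error_code')}: {output.get('message')}"
```

For the error example above, error_summary would produce a string combining the 503 code with the message text.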

Metadata Fields

The metadata field in voice recognition responses contains:

| Field | Description | Example Value |
| --- | --- | --- |
| language | Language code | "en" |
| emotion | Emotion state | "unknown" |
| audio_type | Audio type | "speech" |
| speaker | Speaker identifier | "woitn" |
| raw_text | Original recognized text | "test test test" |
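
On the client side, the metadata fields listed above could be collected into a small typed record. This is an illustrative sketch only: the VoiceMetadata class is hypothetical, it assumes all five fields are strings as in the example values, and it falls back to empty strings when a field is missing.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VoiceMetadata:
    """The five metadata fields documented for voice recognition responses."""
    language: str = ""
    emotion: str = ""
    audio_type: str = ""
    speaker: str = ""
    raw_text: str = ""

    @classmethod
    def from_output(cls, output: dict) -> "VoiceMetadata":
        """Build a record from the "output" object of a voice response."""
        metadata = output.get("metadata") or {}
        known = {field.name for field in fields(cls)}
        # Ignore any keys this document does not describe.
        return cls(**{key: value for key, value in metadata.items() if key in known})
```

Unknown keys are dropped rather than rejected, so the record keeps working if the service adds fields later.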

Building Executables

  1. Make the build script executable:
chmod +x build_exec.sh
  2. Build stdio mode executable:
./build_exec.sh
  3. Build MCP mode executable:
./build_exec.sh mcp

The executables will be created at:

  • stdio mode: dist/voice_stdio
  • MCP mode: dist/voice_mcp

Testing

Run the test scripts:

chmod +x test_*.sh
./test_help.sh
./test_voice_file.sh
./test_voice_base64.sh

License

This project is licensed under the MIT License - see the LICENSE file for details.
