Voice Recognition MCP Service

by yangsenessa
MIT License

This service provides voice recognition and text extraction capabilities through both stdio and MCP modes.

Features

  • Voice recognition from file
  • Voice recognition from base64 encoded data
  • Text extraction
  • Support for both stdio and MCP modes
  • Structured voice recognition results

Project Structure

  • voice_service.py - Core service implementation
  • stdio_server.py - stdio mode entry point
  • mcp_server.py - MCP mode entry point
  • build.py - Build script for executables
  • build_exec.sh - Build execution script
  • test_*.sh - Test scripts for different functionalities

Installation

  1. Clone the repository:
     git clone https://github.com/AIO-2030/mcp_voice_identify.git
     cd mcp_voice_identify
  2. Install dependencies:
     pip install -r requirements.txt
  3. Set up environment variables in .env:
     API_URL=your_api_url
     API_KEY=your_api_key
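
As a rough illustration of how these two settings might be read at startup, the sketch below parses KEY=VALUE lines from a .env file using only the standard library. The names load_env and get_setting are hypothetical, and the project itself may rely on a library such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env-style file (hypothetical helper)."""
    values = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for line in env_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_setting(name: str, file_values: dict) -> str:
    """Prefer a real environment variable, then fall back to the .env value."""
    value = os.environ.get(name) or file_values.get(name)
    if not value:
        raise RuntimeError(f"Missing required setting: {name}")
    return value
```

Failing early when API_URL or API_KEY is missing makes a misconfigured .env easier to spot than a later request error.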

Usage

stdio Mode

  1. Run the service:
     python stdio_server.py
  2. Send JSON-RPC requests via stdin:
     { "jsonrpc": "2.0", "method": "help", "params": {}, "id": 1 }
  3. Or use the executable:
     ./dist/voice_stdio
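
The same exchange can be driven from a script. The sketch below builds the help request shown above and pipes it to the server process. It assumes the server reads one JSON object per line and writes one JSON object per line in reply, which this README does not confirm. The function names build_request and call_stdio_server are hypothetical.

```python
import json
import subprocess


def build_request(method: str, params: dict, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request as a single line of text."""
    payload = {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    return json.dumps(payload)


def call_stdio_server(request_line: str, command=("python", "stdio_server.py")) -> dict:
    """Send one request on stdin and parse the first non-empty stdout line.

    Assumes newline-delimited JSON framing (an assumption, not documented).
    """
    completed = subprocess.run(
        list(command),
        input=request_line + "\n",
        capture_output=True,
        text=True,
        timeout=60,
    )
    for line in completed.stdout.splitlines():
        if line.strip():
            return json.loads(line)
    raise RuntimeError(f"No response received; stderr was: {completed.stderr!r}")
```

For example, call_stdio_server(build_request("help", {})) would send the help request to python stdio_server.py. Passing command=("./dist/voice_stdio",) targets the built executable instead.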

MCP Mode

  1. Run the service:
     python mcp_server.py
  2. Or use the executable:
     ./dist/voice_mcp

Voice Recognition Results

The service converts the raw label string returned by the upstream API into structured fields. The two responses below show the same result before and after that conversion:

Original API Response

{
  "jsonrpc": "2.0",
  "result": {
    "message": "input processed successfully",
    "results": "test test test",
    "label_result": "<|en|><|EMO_UNKNOWN|><|Speech|><|woitn|>test test test"
  },
  "id": 1
}

Restructured Response

{
  "jsonrpc": "2.0",
  "result": {
    "message": "input processed successfully",
    "results": "test test test",
    "label_result": {
      "lan": "en",
      "emo": "unknown",
      "type": "speech",
      "speaker": "woitn",
      "text": "test test test"
    }
  },
  "id": 1
}

Label Result Fields

The label_result field contains the following structured information:

Field     Description               Example Value
lan       Language code             "en"
emo       Emotion state             "unknown"
type      Audio type                "speech"
speaker   Speaker identifier        "woitn"
text      Recognized text content   "test test test"

Special Labels

The service recognizes and processes the following special labels in the original response:

  • <|en|> - Language code
  • <|EMO_UNKNOWN|> - Emotion state
  • <|Speech|> - Audio type
  • <|woitn|> - Speaker identifier
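
The conversion from the labeled string to the structured fields can be sketched as follows. This is not the code from voice_service.py. It is a minimal illustration that assumes the four labels always appear in the order language, emotion, audio type, speaker, and that the emotion label carries an EMO_ prefix as in the example. The function name parse_label_result is hypothetical.

```python
import re

# Matches one special label such as <|en|> and captures the inner value.
LABEL_RE = re.compile(r"<\|([^|]*)\|>")


def parse_label_result(raw: str) -> dict:
    """Split a labeled result string into structured fields (illustrative only)."""
    labels = LABEL_RE.findall(raw)
    if len(labels) < 4:
        raise ValueError(f"Expected 4 labels, found {len(labels)}: {raw!r}")
    lan, emo, audio_type, speaker = labels[:4]
    # Drop the EMO_ prefix so that EMO_UNKNOWN becomes "unknown".
    if emo.upper().startswith("EMO_"):
        emo = emo[4:]
    return {
        "lan": lan.lower(),
        "emo": emo.lower(),
        "type": audio_type.lower(),
        "speaker": speaker,
        # Whatever remains once the labels are removed is the recognized text.
        "text": LABEL_RE.sub("", raw).strip(),
    }
```

Applied to the label_result string from the original API response above, this produces the same five fields as the restructured response.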

Building Executables

  1. Make the build script executable:
     chmod +x build_exec.sh
  2. Build stdio mode executable:
     ./build_exec.sh
  3. Build MCP mode executable:
     ./build_exec.sh mcp

The executables will be created at:

  • stdio mode: dist/voice_stdio
  • MCP mode: dist/voice_mcp

Testing

Run the test scripts:

chmod +x test_*.sh
./test_help.sh
./test_voice_file.sh
./test_voice_base64.sh

License

This project is licensed under the MIT License - see the LICENSE file for details.

Provides voice recognition and text extraction capabilities with support for both stdio and MCP modes, processing audio files or base64 encoded data and returning structured results with language, emotion, and speaker information.