Voice Recognition MCP Service

This service provides voice recognition and text extraction through both stdio and MCP modes.

Features

Voice recognition from files
Voice recognition from base64-encoded data
Text extraction
Support for both stdio and MCP modes
Structured voice recognition results

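Project Structure

voice_service.py - Core service implementation
stdio_server.py - Entry point for stdio mode
mcp_server.py - Entry point for MCP mode
build.py - Build script for the executables
build_exec.sh - Shell script that runs the build
test_*.sh - Test scripts for the individual features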

Installation

Clone the repository:

git clone https://github.com/AIO-2030/mcp_voice_identify.git
cd mcp_voice_identify

Install the dependencies:

pip install -r requirements.txt

Set the environment variables in .env:

API_URL=your_api_url
API_KEY=your_api_key
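
The README does not say how the service reads these values, and the project may well rely on a dedicated library for it. As a minimal, standard-library-only sketch of what loading such a file could look like, the helpers below (parse_env and load_env are hypothetical names, not part of the project) parse KEY=VALUE lines and copy them into the process environment:

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path=".env"):
    """Copy entries from a .env file into os.environ.

    Variables already set in the environment are left untouched.
    """
    with open(path, encoding="utf-8") as fh:
        for key, value in parse_env(fh.read()).items():
            os.environ.setdefault(key, value)
```

After calling load_env(), os.environ["API_URL"] and os.environ["API_KEY"] would hold the configured values.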

Usage

stdio mode

Run the service:

python stdio_server.py

Send JSON-RPC requests via stdin:

{
    "jsonrpc": "2.0",
    "method": "help",
    "params": {},
    "id": 1
}

Or use the executable:

./dist/voice_stdio
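
To drive stdio mode from another program, the request shown above can be written to the server's stdin and the reply read from its stdout. The sketch below is one possible client, not the project's own. It assumes the server reads newline-delimited JSON, answers on stdout, and exits when stdin is closed. build_request and call_stdio are hypothetical helper names; only the "help" method comes from this README.

```python
import json
import subprocess


def build_request(method, params=None, request_id=1):
    """Serialize a JSON-RPC 2.0 request as a single line of JSON."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params or {},
        "id": request_id,
    })


def call_stdio(command, method, params=None, request_id=1, timeout=30):
    """Start the server, send one request on stdin, parse the first reply line.

    Assumption: the server exits once stdin is closed; otherwise
    subprocess.run raises TimeoutExpired after `timeout` seconds.
    """
    proc = subprocess.run(
        command,
        input=build_request(method, params, request_id) + "\n",
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    lines = proc.stdout.strip().splitlines()
    if not lines:
        raise RuntimeError("no response on stdout: " + proc.stderr.strip())
    return json.loads(lines[0])


# Example (not executed here):
# reply = call_stdio(["python", "stdio_server.py"], "help")
```

Passing ["./dist/voice_stdio"] as the command would target the built executable instead.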

MCP mode

Run the service:

python mcp_server.py

Or use the executable:

./dist/voice_mcp

Voice Recognition Results

The service returns structured voice recognition results. Example response formats are shown below.

Original API response

{
    "jsonrpc": "2.0",
    "result": {
        "message": "input processed successfully",
        "results": "test test test",
        "label_result": "<|en|><|EMO_UNKNOWN|><|Speech|><|woitn|>test test test"
    },
    "id": 1
}

Restructured response

{
    "jsonrpc": "2.0",
    "result": {
        "message": "input processed successfully",
        "results": "test test test",
        "label_result": {
            "lan": "en",
            "emo": "unknown",
            "type": "speech",
            "speaker": "woitn",
            "text": "test test test"
        }
    },
    "id": 1
}

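Label Result Fields

The label_result field contains the following structured information:

Field	Description	Sample value
lan	Language code	"en"
emo	Emotional state	"unknown"
type	Audio type	"speech"
speaker	Speaker identifier	"woitn"
text	Recognized text content	"test test test"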

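Special Labels

The service recognizes and processes the following special labels in the original response:

<|en|> - language code
<|EMO_UNKNOWN|> - emotional state
<|Speech|> - audio type
<|woitn|> - speaker identifier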
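
The conversion from the raw label_result string to the structured object can be sketched as follows. This is an illustration inferred from the single example in this README, not the service's actual code: it assumes the labels always appear in the order language, emotion, audio type, speaker, that the emotion label carries an EMO_ prefix, and that the emotion and type values are lowercased. parse_label_result is a hypothetical name.

```python
import re

# Matches one special label such as <|en|> and captures its content.
LABEL = re.compile(r"<\|([^|]*)\|>")


def parse_label_result(raw):
    """Split a raw label_result string into structured fields.

    Assumption: labels are positional (language, emotion, type, speaker).
    Missing labels are returned as None.
    """
    labels = LABEL.findall(raw)
    text = LABEL.sub("", raw).strip()
    # Pad so that unpacking works even when fewer than four labels exist.
    lan, emo, kind, speaker = (labels + [None] * 4)[:4]
    if emo is not None and emo.upper().startswith("EMO_"):
        emo = emo[4:]
    return {
        "lan": lan,
        "emo": emo.lower() if emo else emo,
        "type": kind.lower() if kind else kind,
        "speaker": speaker,
        "text": text,
    }
```

Applied to the label_result string from the original API response above, this yields the same fields as the restructured response.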

Building Executables

Make the build script executable:

chmod +x build_exec.sh

Build the stdio mode executable:

./build_exec.sh

Build the MCP mode executable:

./build_exec.sh mcp

The executables are created at:

stdio mode: dist/voice_stdio
MCP mode: dist/voice_mcp

Testing

Run the test scripts:

chmod +x test_*.sh
./test_help.sh
./test_voice_file.sh
./test_voice_base64.sh

License

This project is licensed under the MIT License. See the LICENSE file for details.
