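Voice Recognition MCP Service

This service provides voice recognition and text extraction in both stdio mode and MCP mode.

Features

Voice recognition from files
Voice recognition from base64-encoded data
Text extraction
Support for both stdio and MCP modes
Structured voice recognition results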

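Project Structure

voice_service.py - Core service implementation
stdio_server.py - Entry point for stdio mode
mcp_server.py - Entry point for MCP mode
build.py - Build script for the executables
build_exec.sh - Shell script that runs the build
test_*.sh - Test scripts for individual features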

Installation

Clone the repository:

git clone https://github.com/AIO-2030/mcp_voice_identify.git
cd mcp_voice_identify

Install the dependencies:

pip install -r requirements.txt

Set the environment variables in .env:

API_URL=your_api_url
API_KEY=your_api_key
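
How the service itself reads these values is not shown in this README. As a rough illustration only, the sketch below parses KEY=VALUE pairs from a .env file using the standard library. The helper names load_env_file and get_api_settings are hypothetical and do not come from the repository.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file into a dict.

    Blank lines, comment lines, and lines without '=' are skipped.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_settings(path=".env"):
    """Return (API_URL, API_KEY); real environment variables win over the file."""
    file_values = load_env_file(path)
    api_url = os.environ.get("API_URL", file_values.get("API_URL"))
    api_key = os.environ.get("API_KEY", file_values.get("API_KEY"))
    if not api_url or not api_key:
        raise RuntimeError("API_URL and API_KEY must both be set")
    return api_url, api_key
```

Failing early when either value is missing tends to give a clearer error than a rejected API call later on.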

Usage

stdio mode

Run the service:

python stdio_server.py

Send a JSON-RPC request via stdin:

{
    "jsonrpc": "2.0",
    "method": "help",
    "params": {},
    "id": 1
}
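
The same request can also be sent from a script. The following is a minimal sketch, assuming the server reads one JSON document per line from stdin and writes a single JSON reply to stdout; the README does not confirm that framing. The helper names build_request and call_stdio_server are hypothetical.

```python
import json
import subprocess


def build_request(method, params=None, request_id=1):
    """Serialize a JSON-RPC 2.0 request as a single line of JSON."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params or {},
        "id": request_id,
    })


def call_stdio_server(method, params=None,
                      command=("python", "stdio_server.py"), timeout=30):
    """Send one request to the server's stdin and return the parsed reply.

    Assumption: the server accepts newline-delimited JSON and prints exactly
    one JSON document to stdout before exiting at end of input.
    """
    completed = subprocess.run(
        list(command),
        input=build_request(method, params) + "\n",
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return json.loads(completed.stdout)
```

Run from the repository root, call_stdio_server("help") would send the request shown above. If the server prints log lines to stdout, the final json.loads call would need adjusting.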

Or use the executable:

./dist/voice_stdio

MCP mode

Run the service:

python mcp_server.py

Or use the executable:

./dist/voice_mcp

Voice Recognition Results

The service returns structured voice recognition results. Example response formats are shown below.

Original API response

{
    "jsonrpc": "2.0",
    "result": {
        "message": "input processed successfully",
        "results": "test test test",
        "label_result": "<|en|><|EMO_UNKNOWN|><|Speech|><|woitn|>test test test"
    },
    "id": 1
}

Restructured response

{
    "jsonrpc": "2.0",
    "result": {
        "message": "input processed successfully",
        "results": "test test test",
        "label_result": {
            "lan": "en",
            "emo": "unknown",
            "type": "speech",
            "speaker": "woitn",
            "text": "test test test"
        }
    },
    "id": 1
}

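Label result fields

The label_result field contains the following structured information:

Field	Description	Example value
lan	Language code	"en"
emo	Emotional state	"unknown"
type	Audio type	"speech"
speaker	Speaker identifier	"woitn"
text	Recognized text content	"test test test"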

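Special labels

The service recognizes and processes the following special labels in the original response:

<|en|> - Language code
<|EMO_UNKNOWN|> - Emotional state
<|Speech|> - Audio type
<|woitn|> - Speaker identifier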
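
The conversion from the raw label string to the structured label_result can be sketched as follows. This is not the code from voice_service.py. It is an illustration that assumes the labels always lead the string in the order language, emotion, audio type, speaker, as in the example response. The function name parse_label_result is hypothetical.

```python
import re

# Matches one leading label such as <|en|> and captures its content.
LABEL_PATTERN = re.compile(r"<\|([^|]*)\|>")


def parse_label_result(raw):
    """Split leading <|...|> labels from the text and map them to named fields.

    Assumption: labels appear in the fixed order language, emotion, type,
    speaker. Missing labels are returned as empty strings.
    """
    labels = []
    pos = 0
    while True:
        match = LABEL_PATTERN.match(raw, pos)
        if match is None:
            break
        labels.append(match.group(1))
        pos = match.end()
    labels += [""] * (4 - len(labels))  # pad when fewer than four labels exist
    lan, emo, audio_type, speaker = labels[:4]
    if emo.upper().startswith("EMO_"):
        emo = emo[4:]  # "EMO_UNKNOWN" -> "UNKNOWN"
    return {
        "lan": lan.lower(),
        "emo": emo.lower(),
        "type": audio_type.lower(),
        "speaker": speaker,
        "text": raw[pos:].strip(),
    }
```

Applied to the label_result string from the original API response, this yields the same fields as the restructured response shown earlier.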

Building the Executables

Make the build script executable:

chmod +x build_exec.sh

Build the stdio mode executable:

./build_exec.sh

Build the MCP mode executable:

./build_exec.sh mcp

The executables are created at the following locations:

stdio mode: dist/voice_stdio
MCP mode: dist/voice_mcp

Testing

Run the test scripts:

chmod +x test_*.sh
./test_help.sh
./test_voice_file.sh
./test_voice_base64.sh

License

This project is licensed under the MIT License. See the LICENSE file for details.
