Voice Recognition MCP Service

This service provides voice recognition and text extraction through stdio and MCP modes.

Features

Voice recognition from audio files
Voice recognition from Base64-encoded data
Text extraction
Support for both stdio and MCP modes
Structured voice recognition results

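Project Structure

voice_service.py - Core service implementation
stdio_server.py - Entry point for stdio mode
mcp_server.py - Entry point for MCP mode
build.py - Build script for the executables
build_exec.sh - Shell script that runs the build
test_*.sh - Test scripts for the individual features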

Installation

Clone the repository.

Install the dependencies:

pip install -r requirements.txt

Set the environment variables in .env:

API_URL=your_api_url
API_KEY=your_api_key
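
How the service itself reads these values is not documented here. As a minimal sketch, assuming a plain KEY=VALUE file, the two settings could be loaded with the standard library alone. The helper names parse_env_text and load_api_settings are hypothetical and do not come from the project.

```python
import os


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_settings(path=".env", environ=None):
    """Return API_URL and API_KEY, preferring real environment variables."""
    env = os.environ if environ is None else environ
    file_values = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            file_values = parse_env_text(fh.read())
    settings = {}
    for name in ("API_URL", "API_KEY"):
        value = env.get(name) or file_values.get(name)
        if not value:
            raise RuntimeError(f"{name} is not set")
        settings[name] = value
    return settings
```

Failing early on a missing value makes a misconfigured .env visible at startup rather than at the first API call.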

Usage

stdio Mode

Run the service:

python stdio_server.py

Send a JSON-RPC request via stdin:

{
    "jsonrpc": "2.0",
    "method": "help",
    "params": {},
    "id": 1
}
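
The request above can also be built and sent from a script. The sketch below assumes that the server reads one JSON request per line from stdin, writes its response to stdout, and exits when stdin closes. None of that is confirmed by this README, and build_request and call_stdio_server are hypothetical helper names.

```python
import json
import subprocess


def build_request(method, params=None, request_id=1):
    """Serialize a JSON-RPC 2.0 request as a single line of JSON."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params or {},
        "id": request_id,
    })


def call_stdio_server(request_line, command=("python", "stdio_server.py"), timeout=30):
    """Send one request to the server process and return its raw stdout.

    Assumption: the server handles newline-delimited requests and
    terminates once stdin reaches end of file.
    """
    proc = subprocess.run(
        list(command),
        input=request_line + "\n",
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.stdout


if __name__ == "__main__":
    print(call_stdio_server(build_request("help")))
```

Only the help method is shown in this README, so the example stays with it; method names and parameters for file or Base64 input would need to be taken from the help output.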

Alternatively, use the executable:

./dist/voice_stdio

MCP Mode

Run the service:

python mcp_server.py

Alternatively, use the executable:

./dist/voice_mcp

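Voice Recognition Result

The service returns structured voice recognition results. The examples below show the response before and after restructuring.

Original API Response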

{
    "jsonrpc": "2.0",
    "result": {
        "message": "input processed successfully",
        "results": "test test test",
        "label_result": "<|en|><|EMO_UNKNOWN|><|Speech|><|woitn|>test test test"
    },
    "id": 1
}

Restructured Response

{
    "jsonrpc": "2.0",
    "result": {
        "message": "input processed successfully",
        "results": "test test test",
        "label_result": {
            "lan": "en",
            "emo": "unknown",
            "type": "speech",
            "speaker": "woitn",
            "text": "test test test"
        }
    },
    "id": 1
}

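Label Result Fields

The label_result field contains the following structured information:

Field	Description	Example Value
lan	Language code	"en"
emo	Emotional state	"unknown"
type	Audio type	"speech"
speaker	Speaker identifier	"woitn"
text	Recognized text content	"test test test"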

Special Labels

The service recognizes and processes the following special labels in the original response:

<|en|> - Language code
<|EMO_UNKNOWN|> - Emotional state
<|Speech|> - Audio type
<|woitn|> - Speaker identifier
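
The mapping from the raw label string to the structured label_result can be sketched as follows. This is an illustration rather than the code in voice_service.py: it assumes the four labels always appear in the order language, emotion, audio type, speaker, and that the EMO_ prefix is dropped and values are lowercased, as the example responses suggest. The name parse_label_result is hypothetical.

```python
import re

# Matches one special label such as <|en|> and captures its content.
LABEL_PATTERN = re.compile(r"<\|([^|]*)\|>")


def parse_label_result(raw):
    """Split a labelled result string into its structured fields."""
    labels = LABEL_PATTERN.findall(raw)
    text = LABEL_PATTERN.sub("", raw).strip()
    # Assumption: labels come in a fixed order; pad with empty strings
    # so that a shorter label list does not raise an error.
    lan, emo, audio_type, speaker = (labels + [""] * 4)[:4]
    if emo.upper().startswith("EMO_"):
        emo = emo[len("EMO_"):]
    return {
        "lan": lan,
        "emo": emo.lower(),
        "type": audio_type.lower(),
        "speaker": speaker,
        "text": text,
    }
```

Applied to the label_result string from the original API response, this yields the same five fields as the restructured response shown earlier.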

Building Executables

Make the build script executable:

chmod +x build_exec.sh

Build the stdio mode executable:

./build_exec.sh

Build the MCP mode executable:

./build_exec.sh mcp

The executables are created at the following locations:

stdio mode: dist/voice_stdio
MCP mode: dist/voice_mcp

Testing

Run the test scripts:

chmod +x test_*.sh
./test_help.sh
./test_voice_file.sh
./test_voice_base64.sh

License

This project is licensed under the MIT License. See the LICENSE file for details.
