Speech recognition technology and systems

Why this server?
This server directly provides 'voice recognition' and text extraction capabilities; voice recognition here is synonymous with speech recognition.
Voice Recognition MCP Service
Speech Processing Audio Processing
yangsenessa
License: A | Quality: - | Maintenance: D
Provides voice recognition and text extraction capabilities with support for both stdio and MCP modes, processing audio files or base64 encoded data and returning structured results with language, emotion, and speaker information.
Last updated 2025-06-17
MIT
Why this server?
This server explicitly enables 'speech-to-text transcription', which is the core function of speech recognition.
Deepgram MCP Server
Speech Processing Audio Processing Multimedia Processing
reddheeraj
License: A | Quality: - | Maintenance: D
Enables speech-to-text transcription, text-to-speech synthesis, and audio analysis using Deepgram's AI models. Supports features like speaker diarization, sentiment analysis, language detection, and various audio processing capabilities.
Last updated 2025-09-16
2
MIT
Why this server?
This server supports 'multiple speech recognition providers' and 'automatic speech-to-text transcription', directly matching the search.
MCP Video Extraction Plus
Speech Processing Multimedia Processing Audio Processing
takereshui
License: A | Quality: - | Maintenance: D
Enables video text extraction using multiple speech recognition providers including local Whisper, JianYing/CapCut, and Bilibili Cut services. Supports video downloading, audio extraction, and automatic speech-to-text transcription with configurable providers.
Last updated 2025-11-22
7
MIT
Why this server?
This server provides 'high-performance speech recognition', making it a direct fit for the user's query.
LocalVoiceMode
Speech Processing Text-to-Speech Audio Processing
DevMan57
License: F | Quality: - | Maintenance: -
A local voice interface providing high-performance speech recognition and natural text-to-speech with voice cloning capabilities. It enables AI assistants to speak, listen, and engage in character-based voice conversations through integrated MCP tools.
Last updated 2026-01-25
Why this server?
This server is a 'powerful speech-to-text MCP server' that supports various recognition engines, directly addressing speech recognition.
Voice to Text MCP Server
gongjiaben
License: F | Quality: - | Maintenance: D
A powerful speech-to-text MCP server that supports multiple audio formats and recognition engines including remote APIs (Bailian, OpenAI Whisper, iFLYTEK), Google Speech Recognition, and CMU Sphinx.
Last updated 2025-07-12
1
Why this server?
This system enables natural interaction through integrated 'speech recognition' capabilities.
MCP Agent Platform
Speech Processing Image & Video Processing Autonomous Agents
rolenet
License: A | Quality: - | Maintenance: D
A multi-agent human-computer interaction system that enables natural interaction through integrated visual recognition, speech recognition, and speech synthesis capabilities.
Last updated 2025-05-21
21
Apache 2.0
Why this server?
This server enables hands-free voice conversations using 'real-time speech recognition'.
Voice Loop MCP
Speech Processing Autonomous Agents Command Line
theonlypal
License: A | Quality: - | Maintenance: D
Enables hands-free voice conversations with Claude using real-time speech recognition and text-to-speech on macOS. Creates a self-sustaining conversation loop where Claude can autonomously listen, respond, and continue the interaction without keyboard input.
Last updated 2025-12-17
MIT
Why this server?
This server is a local voice input tool that converts 'speech to text in real-time', which is speech recognition.
vocotype
Speech Processing App Automation Command Line
233stone
License: F | Quality: - | Maintenance: B
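VocoType is a privacy-safe voice input tool that runs locally on the device. A hotkey converts speech to text in real time and automatically types it into the current application. It supports a speech-to-text MCP, AI text refinement, a custom replacement dictionary, and transcription of audio recordings and videos, making voice input more efficient and more secure.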
Last updated 2026-06-24
828
Why this server?
This server enables voice interaction through local 'speech-to-text' (Whisper), a direct match for speech recognition.
Voice MCP
Speech Processing Text-to-Speech Multimedia Processing
jochiang
License: F | Quality: - | Maintenance: D
Enables voice interaction with Claude Code through local speech-to-text (Whisper) and text-to-speech (Supertonic), allowing verbal input/output without external API calls.
Last updated 2025-12-29
1