Why this server?
This server directly provides 'voice recognition', a synonym for speech recognition, along with text extraction capabilities.
Provides voice recognition and text extraction capabilities with support for both stdio and MCP modes, processing audio files or base64 encoded data and returning structured results with language, emotion, and speaker information.
License: MIT
Why this server?
This server explicitly enables 'speech-to-text transcription', which is the core function of speech recognition.
Enables speech-to-text transcription, text-to-speech synthesis, and audio analysis using Deepgram's AI models. Supports features like speaker diarization, sentiment analysis, language detection, and various audio processing capabilities.
License: MIT
Why this server?
This server supports 'multiple speech recognition providers' and 'automatic speech-to-text transcription', directly matching the search.
Enables video text extraction using multiple speech recognition providers including local Whisper, JianYing/CapCut, and Bilibili Cut services. Supports video downloading, audio extraction, and automatic speech-to-text transcription with configurable providers.
Why this server?
This server provides 'high-performance speech recognition', making it a direct fit for the user's query.
A local voice interface providing high-performance speech recognition and natural text-to-speech with voice cloning capabilities. It enables AI assistants to speak, listen, and engage in character-based voice conversations through integrated MCP tools.
Why this server?
This server is a 'powerful speech-to-text MCP server' that supports various recognition engines, directly addressing speech recognition.
A powerful speech-to-text MCP server that supports multiple audio formats and recognition engines including remote APIs (Bailian, OpenAI Whisper, iFLYTEK), Google Speech Recognition, and CMU Sphinx.
Why this server?
This system enables natural interaction through integrated 'speech recognition' capabilities.
A multi-agent human-computer interaction system that enables natural interaction through integrated visual recognition, speech recognition, and speech synthesis capabilities.
License: Apache 2.0
Why this server?
This server enables hands-free voice conversations using 'real-time speech recognition'.
Enables hands-free voice conversations with Claude using real-time speech recognition and text-to-speech on macOS. Creates a self-sustaining conversation loop where Claude can autonomously listen, respond, and continue the interaction without keyboard input.
Why this server?
This server is a local voice input tool that converts 'speech to text in real-time', which is speech recognition.
Why this server?
This server enables voice interaction through local 'speech-to-text' (Whisper), a direct match for speech recognition.