# speech_recognition
Convert audio files to text transcriptions using the Whisper model for accessibility, documentation, or content analysis purposes.
## Instructions
Transcribe audio to text using DeepInfra OpenAI-compatible API (Whisper).
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| audio_url | Yes | URL of the audio file to download and transcribe | (none) |
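
For reference, here is a minimal sketch of invoking this tool from a Python MCP client. The launch command (`uvx mcp-deepinfra`) and the sample audio URL are assumptions for illustration, not taken from this repository:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Assumed launch command; substitute however you normally start the server.
    params = StdioServerParameters(command="uvx", args=["mcp-deepinfra"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # The arguments dict must match the input schema above.
            result = await session.call_tool(
                "speech_recognition",
                {"audio_url": "https://example.com/sample.mp3"},
            )
            print(result.content)


asyncio.run(main())
```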
## Implementation Reference
- `src/mcp_deepinfra/server.py:98-118` (handler) — the `speech_recognition` tool handler, conditionally defined and registered via the `@app.tool()` decorator. It downloads the audio from the provided URL, then sends it to DeepInfra's OpenAI-compatible Whisper API for transcription (a sketch of the `client` setup follows this list).

  ```python
  if "all" in ENABLED_TOOLS or "speech_recognition" in ENABLED_TOOLS:

      @app.tool()
      async def speech_recognition(audio_url: str) -> str:
          """Transcribe audio to text using DeepInfra OpenAI-compatible API (Whisper)."""
          model = DEFAULT_MODELS["speech_recognition"]
          try:
              async with httpx.AsyncClient(timeout=120.0) as http_client:
                  # Download the audio file
                  audio_response = await http_client.get(audio_url)
                  audio_response.raise_for_status()
                  audio_content = audio_response.content

                  # Use the OpenAI-compatible Whisper API
                  response = await client.audio.transcriptions.create(
                      model=model,
                      file=("audio.mp3", audio_content),
                  )
                  return response.text
          except Exception as e:
              return f"Error transcribing audio: {type(e).__name__}: {str(e)}"
  ```
- `src/mcp_deepinfra/server.py:31-42` (helper) — the `DEFAULT_MODELS` configuration dictionary, which defines the default model for the `speech_recognition` tool (Whisper large-v3) and is read inside the handler. Each entry can be overridden via its environment variable (an override example follows this list).

  ```python
  DEFAULT_MODELS = {
      "generate_image": os.getenv("MODEL_GENERATE_IMAGE", "Bria/Bria-3.2"),
      "text_generation": os.getenv("MODEL_TEXT_GENERATION", "meta-llama/Llama-2-7b-chat-hf"),
      "embeddings": os.getenv("MODEL_EMBEDDINGS", "sentence-transformers/all-MiniLM-L6-v2"),
      "speech_recognition": os.getenv("MODEL_SPEECH_RECOGNITION", "openai/whisper-large-v3"),
      "zero_shot_image_classification": os.getenv("MODEL_ZERO_SHOT_IMAGE_CLASSIFICATION", "openai/gpt-4o-mini"),
      "object_detection": os.getenv("MODEL_OBJECT_DETECTION", "openai/gpt-4o-mini"),
      "image_classification": os.getenv("MODEL_IMAGE_CLASSIFICATION", "openai/gpt-4o-mini"),
      "text_classification": os.getenv("MODEL_TEXT_CLASSIFICATION", "microsoft/DialoGPT-medium"),
      "token_classification": os.getenv("MODEL_TOKEN_CLASSIFICATION", "microsoft/DialoGPT-medium"),
      "fill_mask": os.getenv("MODEL_FILL_MASK", "microsoft/DialoGPT-medium"),
  }
  ```
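
The handler relies on a module-level `client` defined elsewhere in `server.py`. Below is a minimal sketch of how such a client is typically constructed against DeepInfra's OpenAI-compatible endpoint; the base URL and the `DEEPINFRA_API_KEY` variable name are assumptions, not taken from this file:

```python
import os

from openai import AsyncOpenAI

# Assumed setup: DeepInfra exposes an OpenAI-compatible API. The env var
# name and base URL below are illustrative, not copied from server.py.
client = AsyncOpenAI(
    api_key=os.getenv("DEEPINFRA_API_KEY"),
    base_url="https://api.deepinfra.com/v1/openai",
)
```

Note that passing `file=("audio.mp3", audio_content)` works because the OpenAI SDK accepts a `(filename, contents)` tuple for uploads, so the downloaded bytes never need to touch disk.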
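
Because each entry calls `os.getenv` when `server.py` is imported, the default model can be swapped without code changes, as long as the variable is set in the server process's environment. One way to do that from the MCP client sketch above is via `StdioServerParameters.env`; the model ID here is a hypothetical example, not verified against DeepInfra's catalog:

```python
from mcp import StdioServerParameters

# Assumed launch command; env= passes the override into the server process,
# where it must be present before server.py builds DEFAULT_MODELS at import.
params = StdioServerParameters(
    command="uvx",
    args=["mcp-deepinfra"],
    env={"MODEL_SPEECH_RECOGNITION": "openai/whisper-base"},  # hypothetical ID
)
```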