# speech_recognition
Convert audio files to text transcriptions using AI speech recognition. Process spoken content from audio URLs into readable text format for documentation, analysis, or accessibility purposes.
## Instructions

Transcribe audio to text using the DeepInfra OpenAI-compatible API (Whisper).
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| audio_url | Yes | URL of the audio file to download and transcribe | |
| model | No | Whisper model used for transcription | `openai/whisper-large-v3` |
## Implementation Reference
- src/mcp_deepinfra/server.py:98-117 (handler): Handler function for the `speech_recognition` tool, registered via the `@app.tool()` decorator. It downloads the audio from the provided URL and transcribes it with the Whisper model hosted on DeepInfra via the OpenAI-compatible API.

  ```python
  if "all" in ENABLED_TOOLS or "speech_recognition" in ENABLED_TOOLS:

      @app.tool()
      async def speech_recognition(audio_url: str) -> str:
          """Transcribe audio to text using DeepInfra OpenAI-compatible API (Whisper)."""
          model = DEFAULT_MODELS["speech_recognition"]
          try:
              async with httpx.AsyncClient(timeout=120.0) as http_client:
                  # Download the audio file
                  audio_response = await http_client.get(audio_url)
                  audio_response.raise_for_status()
                  audio_content = audio_response.content
                  # Use the OpenAI-compatible Whisper API
                  response = await client.audio.transcriptions.create(
                      model=model,
                      file=("audio.mp3", audio_content),
                  )
                  return response.text
          except Exception as e:
              return f"Error transcribing audio: {type(e).__name__}: {str(e)}"
  ```
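The handler's download-then-transcribe flow can be exercised in isolation. The sketch below factors the transcription step into a standalone coroutine and substitutes a fake transcriptions API for the real OpenAI client, so the happy path and the error-message format can be checked without network access; `FakeTranscriptionsAPI` and `transcribe_bytes` are illustrative stand-ins, not part of the server.

```python
import asyncio


class FakeTranscriptionsAPI:
    """Stand-in for client.audio.transcriptions: create() returns an object with .text."""

    async def create(self, model, file):
        class Result:
            text = "hello world"

        return Result()


async def transcribe_bytes(transcriptions, audio_content, model="openai/whisper-large-v3"):
    """Mirror the handler: pass raw bytes as a named file tuple, fold errors into a string."""
    try:
        response = await transcriptions.create(
            model=model,
            file=("audio.mp3", audio_content),
        )
        return response.text
    except Exception as e:
        return f"Error transcribing audio: {type(e).__name__}: {str(e)}"


result = asyncio.run(transcribe_bytes(FakeTranscriptionsAPI(), b"\x00fake-audio-bytes"))
print(result)  # hello world
```

Returning the error as a string rather than raising matches the handler above: MCP tool results are delivered to the model as text, so a readable failure message is more useful than an unhandled exception.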
- src/mcp_deepinfra/server.py:31-42 (helper): Configuration dictionary defining the default model for every tool; `speech_recognition` defaults to `openai/whisper-large-v3`.

  ```python
  DEFAULT_MODELS = {
      "generate_image": os.getenv("MODEL_GENERATE_IMAGE", "Bria/Bria-3.2"),
      "text_generation": os.getenv("MODEL_TEXT_GENERATION", "meta-llama/Llama-2-7b-chat-hf"),
      "embeddings": os.getenv("MODEL_EMBEDDINGS", "sentence-transformers/all-MiniLM-L6-v2"),
      "speech_recognition": os.getenv("MODEL_SPEECH_RECOGNITION", "openai/whisper-large-v3"),
      "zero_shot_image_classification": os.getenv("MODEL_ZERO_SHOT_IMAGE_CLASSIFICATION", "openai/gpt-4o-mini"),
      "object_detection": os.getenv("MODEL_OBJECT_DETECTION", "openai/gpt-4o-mini"),
      "image_classification": os.getenv("MODEL_IMAGE_CLASSIFICATION", "openai/gpt-4o-mini"),
      "text_classification": os.getenv("MODEL_TEXT_CLASSIFICATION", "microsoft/DialoGPT-medium"),
      "token_classification": os.getenv("MODEL_TOKEN_CLASSIFICATION", "microsoft/DialoGPT-medium"),
      "fill_mask": os.getenv("MODEL_FILL_MASK", "microsoft/DialoGPT-medium"),
  }
  ```
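Because each entry reads its value through `os.getenv` with a fallback, the default model can be swapped per deployment without code changes. A minimal sketch of that pattern, using the `MODEL_SPEECH_RECOGNITION` variable from the dictionary above (the override value `some-org/other-whisper` is a hypothetical model name, chosen only to show the mechanism):

```python
import os

# With the variable unset, the built-in default wins.
os.environ.pop("MODEL_SPEECH_RECOGNITION", None)
default_model = os.getenv("MODEL_SPEECH_RECOGNITION", "openai/whisper-large-v3")
print(default_model)  # openai/whisper-large-v3

# Setting the variable before the server imports DEFAULT_MODELS overrides it.
os.environ["MODEL_SPEECH_RECOGNITION"] = "some-org/other-whisper"
overridden_model = os.getenv("MODEL_SPEECH_RECOGNITION", "openai/whisper-large-v3")
print(overridden_model)  # some-org/other-whisper
```

Note the override must be in the environment before `server.py` is imported, since `DEFAULT_MODELS` is evaluated once at module load time.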