speech_recognition

Convert audio files to text transcriptions using the Whisper model for accessibility, documentation, or content analysis purposes.

Instructions

Transcribe audio to text using DeepInfra OpenAI-compatible API (Whisper).

Input Schema

    Name       Required  Description                          Default
    audio_url  Yes       URL of the audio file to transcribe  —
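
The schema has a single required field. A plausible rendering of the JSON Schema view (a sketch inferred from the table above; the exact schema emitted by the server may add titles or descriptions):

```json
{
  "type": "object",
  "properties": {
    "audio_url": {
      "type": "string"
    }
  },
  "required": ["audio_url"]
}
```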

Implementation Reference

  • The speech_recognition tool handler, conditionally defined and registered via the @app.tool() decorator. It downloads the audio from the provided URL, then uses DeepInfra's OpenAI-compatible Whisper API for transcription.

    if "all" in ENABLED_TOOLS or "speech_recognition" in ENABLED_TOOLS:

        @app.tool()
        async def speech_recognition(audio_url: str) -> str:
            """Transcribe audio to text using DeepInfra OpenAI-compatible API (Whisper)."""
            model = DEFAULT_MODELS["speech_recognition"]
            try:
                async with httpx.AsyncClient(timeout=120.0) as http_client:
                    # Download the audio file
                    audio_response = await http_client.get(audio_url)
                    audio_response.raise_for_status()
                    audio_content = audio_response.content

                    # Use the OpenAI-compatible Whisper API
                    response = await client.audio.transcriptions.create(
                        model=model,
                        file=("audio.mp3", audio_content),
                    )
                    return response.text
            except Exception as e:
                return f"Error transcribing audio: {type(e).__name__}: {str(e)}"
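
The `if` guard gates registration on the ENABLED_TOOLS collection: the tool is only registered when all tools are enabled or this tool is named explicitly. A minimal sketch of that gating logic (the helper name `tool_enabled` is hypothetical, not part of the server code):

```python
def tool_enabled(name: str, enabled_tools: set[str]) -> bool:
    # A tool registers when "all" is enabled or the tool is named explicitly.
    return "all" in enabled_tools or name in enabled_tools

print(tool_enabled("speech_recognition", {"all"}))                 # True
print(tool_enabled("speech_recognition", {"speech_recognition"}))  # True
print(tool_enabled("speech_recognition", {"generate_image"}))      # False
```

Tools outside ENABLED_TOOLS are never decorated with @app.tool(), so they simply do not exist on the server.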
  • Configuration dictionary DEFAULT_MODELS defining the default model for the speech_recognition tool (Whisper large-v3), used within the handler.

    DEFAULT_MODELS = {
        "generate_image": os.getenv("MODEL_GENERATE_IMAGE", "Bria/Bria-3.2"),
        "text_generation": os.getenv("MODEL_TEXT_GENERATION", "meta-llama/Llama-2-7b-chat-hf"),
        "embeddings": os.getenv("MODEL_EMBEDDINGS", "sentence-transformers/all-MiniLM-L6-v2"),
        "speech_recognition": os.getenv("MODEL_SPEECH_RECOGNITION", "openai/whisper-large-v3"),
        "zero_shot_image_classification": os.getenv("MODEL_ZERO_SHOT_IMAGE_CLASSIFICATION", "openai/gpt-4o-mini"),
        "object_detection": os.getenv("MODEL_OBJECT_DETECTION", "openai/gpt-4o-mini"),
        "image_classification": os.getenv("MODEL_IMAGE_CLASSIFICATION", "openai/gpt-4o-mini"),
        "text_classification": os.getenv("MODEL_TEXT_CLASSIFICATION", "microsoft/DialoGPT-medium"),
        "token_classification": os.getenv("MODEL_TOKEN_CLASSIFICATION", "microsoft/DialoGPT-medium"),
        "fill_mask": os.getenv("MODEL_FILL_MASK", "microsoft/DialoGPT-medium"),
    }
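
Each entry follows the same pattern: an environment-variable override with a hard-coded fallback, resolved through os.getenv. A short sketch of how the override takes effect (the `default_model` helper and the override value `distil-whisper/distil-large-v3` are illustrative, not from the server code):

```python
import os

def default_model(env_var: str, fallback: str) -> str:
    # os.getenv returns the fallback only when the variable is unset.
    return os.getenv(env_var, fallback)

# Unset: the hard-coded default wins.
os.environ.pop("MODEL_SPEECH_RECOGNITION", None)
print(default_model("MODEL_SPEECH_RECOGNITION", "openai/whisper-large-v3"))
# -> openai/whisper-large-v3

# Set: the environment variable takes precedence.
os.environ["MODEL_SPEECH_RECOGNITION"] = "distil-whisper/distil-large-v3"
print(default_model("MODEL_SPEECH_RECOGNITION", "openai/whisper-large-v3"))
# -> distil-whisper/distil-large-v3
```

This lets a deployment swap models per tool without touching the source.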

MCP directory API

We provide information about all listed MCP servers via our MCP directory API. For example, this server's record can be fetched with:

curl -X GET 'https://glama.ai/api/mcp/v1/servers/phuihock/mcp-deeinfra'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.