
speech_recognition

Convert audio files to text transcriptions using AI speech recognition. Process spoken content from audio URLs into readable text format for documentation, analysis, or accessibility purposes.

Instructions

Transcribe audio to text using DeepInfra OpenAI-compatible API (Whisper).

Input Schema

Name      | Required | Description | Default
----------|----------|-------------|--------
audio_url | Yes      |             |
model     | No       |             |
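
As a sketch, a minimal set of tool-call arguments conforming to this schema might look like the following (the URL and the commented-out model id are illustrative, not taken from the source):

```python
# Hypothetical arguments for a speech_recognition tool call.
# audio_url is required; model is optional and falls back to the server default.
args = {
    "audio_url": "https://example.com/recording.mp3",  # illustrative URL
    # "model": "openai/whisper-large-v3",  # optional override
}

# The only required field per the schema above is audio_url.
required = {"audio_url"}
missing = required - args.keys()
assert not missing, f"missing required fields: {missing}"
```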

Implementation Reference

  • Handler function for the speech_recognition tool, registered via the @app.tool() decorator. It downloads the audio from the provided URL, then transcribes it to text using the Whisper model hosted on DeepInfra via the OpenAI-compatible API.

        if "all" in ENABLED_TOOLS or "speech_recognition" in ENABLED_TOOLS:
            @app.tool()
            async def speech_recognition(audio_url: str) -> str:
                """Transcribe audio to text using DeepInfra OpenAI-compatible API (Whisper)."""
                model = DEFAULT_MODELS["speech_recognition"]
                try:
                    async with httpx.AsyncClient(timeout=120.0) as http_client:
                        # Download the audio file
                        audio_response = await http_client.get(audio_url)
                        audio_response.raise_for_status()
                        audio_content = audio_response.content

                    # Use the OpenAI-compatible Whisper API
                    response = await client.audio.transcriptions.create(
                        model=model,
                        file=("audio.mp3", audio_content),
                    )
                    return response.text
                except Exception as e:
                    return f"Error transcribing audio: {type(e).__name__}: {str(e)}"
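
Note that the handler reports failures as strings rather than raising, so a caller can distinguish a transcript from an error with a simple prefix check. A minimal sketch, assuming the error format shown in the handler (`is_transcription_error` is a hypothetical helper, not part of the server):

```python
def is_transcription_error(result: str) -> bool:
    # Matches the f-string prefix used in the handler's except branch.
    return result.startswith("Error transcribing audio:")

assert is_transcription_error("Error transcribing audio: HTTPStatusError: 404")
assert not is_transcription_error("Hello, this is the transcribed text.")
```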
  • Configuration dictionary defining default models for all tools, including 'speech_recognition', which defaults to 'openai/whisper-large-v3'.

        DEFAULT_MODELS = {
            "generate_image": os.getenv("MODEL_GENERATE_IMAGE", "Bria/Bria-3.2"),
            "text_generation": os.getenv("MODEL_TEXT_GENERATION", "meta-llama/Llama-2-7b-chat-hf"),
            "embeddings": os.getenv("MODEL_EMBEDDINGS", "sentence-transformers/all-MiniLM-L6-v2"),
            "speech_recognition": os.getenv("MODEL_SPEECH_RECOGNITION", "openai/whisper-large-v3"),
            "zero_shot_image_classification": os.getenv("MODEL_ZERO_SHOT_IMAGE_CLASSIFICATION", "openai/gpt-4o-mini"),
            "object_detection": os.getenv("MODEL_OBJECT_DETECTION", "openai/gpt-4o-mini"),
            "image_classification": os.getenv("MODEL_IMAGE_CLASSIFICATION", "openai/gpt-4o-mini"),
            "text_classification": os.getenv("MODEL_TEXT_CLASSIFICATION", "microsoft/DialoGPT-medium"),
            "token_classification": os.getenv("MODEL_TOKEN_CLASSIFICATION", "microsoft/DialoGPT-medium"),
            "fill_mask": os.getenv("MODEL_FILL_MASK", "microsoft/DialoGPT-medium"),
        }
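
Because each entry reads its model id from an environment variable with a hard-coded fallback, the Whisper model can be swapped without code changes. A minimal sketch of that override pattern (the alternative model id is illustrative, not from the source):

```python
import os

# Simulate setting the override before the server reads its config.
os.environ["MODEL_SPEECH_RECOGNITION"] = "openai/whisper-large-v3-turbo"  # illustrative

# Same lookup pattern as DEFAULT_MODELS above: the env var wins, else the default.
model = os.getenv("MODEL_SPEECH_RECOGNITION", "openai/whisper-large-v3")
assert model == "openai/whisper-large-v3-turbo"
```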


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/phuihock/mcp-deeinfra'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.