by phuihock

speech_recognition

Convert audio files to text transcriptions using the Whisper model for accessibility, documentation, or content analysis purposes.

Instructions

Transcribe audio to text using DeepInfra OpenAI-compatible API (Whisper).

Input Schema

| Name      | Required | Description | Default |
| --------- | -------- | ----------- | ------- |
| audio_url | Yes      |             |         |
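Since the tool takes a single required string argument, its JSON Schema is small. The following is a hypothetical reconstruction of that schema as a Python dict (the `description` text is an assumption, not taken from the source), with a minimal validation helper:

```python
# Hypothetical reconstruction of the speech_recognition input schema.
INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "audio_url": {
            "type": "string",
            # Description text is illustrative, not from the published schema.
            "description": "URL of the audio file to transcribe",
        },
    },
    "required": ["audio_url"],
}


def validate_args(args: dict) -> bool:
    """Minimal check: every required key is present and holds a string."""
    return all(isinstance(args.get(key), str) for key in INPUT_SCHEMA["required"])
```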

Implementation Reference

  • The speech_recognition tool handler, conditionally defined and registered via the @app.tool() decorator. It downloads the audio from the provided URL, then sends it to DeepInfra's OpenAI-compatible Whisper API for transcription.
    if "all" in ENABLED_TOOLS or "speech_recognition" in ENABLED_TOOLS:
        @app.tool()
        async def speech_recognition(audio_url: str) -> str:
            """Transcribe audio to text using DeepInfra OpenAI-compatible API (Whisper)."""
            model = DEFAULT_MODELS["speech_recognition"]
            try:
                async with httpx.AsyncClient(timeout=120.0) as http_client:
                    # Download the audio file
                    audio_response = await http_client.get(audio_url)
                    audio_response.raise_for_status()
                    audio_content = audio_response.content
                
                # Use the OpenAI-compatible Whisper API
                response = await client.audio.transcriptions.create(
                    model=model,
                    # Filename is a hard-coded format hint; the fetched audio
                    # may be in a different container format.
                    file=("audio.mp3", audio_content),
                )
                return response.text
            except Exception as e:
                return f"Error transcribing audio: {type(e).__name__}: {str(e)}"
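The handler relies on two names defined elsewhere in the module: `client` (an OpenAI-compatible async client pointed at DeepInfra) and `ENABLED_TOOLS` (the gate in the `if` guard above). The source does not show how `ENABLED_TOOLS` is built; a minimal sketch, assuming it is parsed from a comma-separated environment variable:

```python
import os


def parse_enabled_tools(raw: str) -> set[str]:
    """Split a comma-separated tool list into a set of names.

    Assumption: "all" (the hypothetical default here) enables every tool,
    matching the `"all" in ENABLED_TOOLS` check in the handler's guard.
    """
    return {t.strip() for t in raw.split(",") if t.strip()}


ENABLED_TOOLS = parse_enabled_tools(os.getenv("ENABLED_TOOLS", "all"))


def tool_enabled(name: str) -> bool:
    """Mirror the guard used before registering each @app.tool() handler."""
    return "all" in ENABLED_TOOLS or name in ENABLED_TOOLS
```

With `ENABLED_TOOLS=speech_recognition,embeddings`, only those two handlers would be registered; the default enables everything.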
  • Configuration dictionary DEFAULT_MODELS defining the default model for the speech_recognition tool (Whisper large-v3), used within the handler. Each entry can be overridden via its environment variable.
    DEFAULT_MODELS = {
        "generate_image": os.getenv("MODEL_GENERATE_IMAGE", "Bria/Bria-3.2"),
        "text_generation": os.getenv("MODEL_TEXT_GENERATION", "meta-llama/Llama-2-7b-chat-hf"),
        "embeddings": os.getenv("MODEL_EMBEDDINGS", "sentence-transformers/all-MiniLM-L6-v2"),
        "speech_recognition": os.getenv("MODEL_SPEECH_RECOGNITION", "openai/whisper-large-v3"),
        "zero_shot_image_classification": os.getenv("MODEL_ZERO_SHOT_IMAGE_CLASSIFICATION", "openai/gpt-4o-mini"),
        "object_detection": os.getenv("MODEL_OBJECT_DETECTION", "openai/gpt-4o-mini"),
        "image_classification": os.getenv("MODEL_IMAGE_CLASSIFICATION", "openai/gpt-4o-mini"),
        "text_classification": os.getenv("MODEL_TEXT_CLASSIFICATION", "microsoft/DialoGPT-medium"),
        "token_classification": os.getenv("MODEL_TOKEN_CLASSIFICATION", "microsoft/DialoGPT-medium"),
        "fill_mask": os.getenv("MODEL_FILL_MASK", "microsoft/DialoGPT-medium"),
    }
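Each entry above follows the same default-with-override pattern: the environment variable wins when set, otherwise the hard-coded model id is used. A small sketch of that behavior (the `-turbo` override value is purely illustrative):

```python
import os


def resolve_model(env_var: str, default: str) -> str:
    """Return the env var's value if set, else the hard-coded default."""
    return os.getenv(env_var, default)


# With the variable unset, the built-in default applies.
os.environ.pop("MODEL_SPEECH_RECOGNITION", None)
model = resolve_model("MODEL_SPEECH_RECOGNITION", "openai/whisper-large-v3")

# Setting the variable overrides the default (value here is illustrative).
os.environ["MODEL_SPEECH_RECOGNITION"] = "openai/whisper-large-v3-turbo"
overridden = resolve_model("MODEL_SPEECH_RECOGNITION", "openai/whisper-large-v3")
```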
