transcribe-audio
Transcribes audio from URLs, base64, or local files using Whisper, with support for large files via chunked upload and options for language, timestamps, and async processing.
Instructions
Transcribes audio via Whisper. Choose an input based on where the audio lives and how large it is:
- `audio_url` (preferred): the most token-efficient option, because the server fetches the bytes itself.
- `audio_base64`: small clips only (up to ~60KB raw per call).
- `audio_path`: works only when the MCP host shares a filesystem with the caller, which is often not the case on Claude.ai or Claude Code.
- Chunked upload: for larger payloads in sandboxed environments, use `transcribe_upload_start`, `transcribe_upload_append`, and `transcribe_upload_finalize`.

Before passing audio to Whisper, the server re-encodes it to Opus (16 kHz, mono, 16 kbps) unless `skip_compression=true`. Long audio (over 5 minutes), or any request with `async=true`, returns a `job_id`; poll `transcribe_get_job` for the result.
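The input-selection rules above can be sketched in Python as follows. This is a minimal client-side illustration, not part of the server: `choose_input` and `base64_chunks` are hypothetical helpers, the 60,000-byte threshold is an assumed concrete value for "~60KB raw", and the chunk size for `transcribe_upload_append` is assumed to match that limit (the upload tools' parameters are not documented on this page).

```python
from __future__ import annotations

import base64

# Assumed concrete value for the documented "~60KB raw" single-call limit.
MAX_INLINE_RAW_BYTES = 60_000


def choose_input(audio: bytes | None = None, url: str | None = None) -> str:
    """Pick an input mode following the documented preference order (hypothetical helper)."""
    if url is not None:
        return "audio_url"       # most token-efficient: the server fetches the bytes
    if audio is None:
        raise ValueError("need either a URL or raw audio bytes")
    if len(audio) <= MAX_INLINE_RAW_BYTES:
        return "audio_base64"    # small clip: send inline in a single call
    return "chunked_upload"      # transcribe_upload_start / _append / _finalize


def base64_chunks(audio: bytes, chunk_raw_bytes: int = MAX_INLINE_RAW_BYTES) -> list[str]:
    """Split raw audio into base64 strings, each encoding at most chunk_raw_bytes."""
    if chunk_raw_bytes <= 0:
        raise ValueError("chunk_raw_bytes must be positive")
    return [
        base64.b64encode(audio[i : i + chunk_raw_bytes]).decode("ascii")
        for i in range(0, len(audio), chunk_raw_bytes)
    ]
```

Because 60,000 is a multiple of 3, every chunk except the last encodes without `=` padding, so the chunks concatenate to exactly the base64 of the whole file. That keeps the upload correct whether the server decodes each appended chunk on its own or joins the strings first.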
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| audio_path | No | Absolute path to a local audio file on the MCP host (often unusable from sandboxed clients). | |
| audio_base64 | No | Base64-encoded audio payload (single-call max ~60KB raw; use chunked upload for larger). | |
| audio_resource_uri | No | Audio resource URI. Supported schemes: `file://` and `data:...;base64,...`. | |
| audio_url | No | HTTP(S) URL for the server to fetch. The URL must be permitted by the server's TRANSCRIPT_MCP_URL_ALLOWLIST. | |
| filename | No | Optional filename hint (used when magic-byte detection is inconclusive). | |
| skip_compression | No | If true, skip the Opus 16 kbps recompression (use when the caller has already optimized the audio). | false |
| engine | No | Transcription engine preference. 'auto' uses OpenAI first and falls back to local Whisper when available. | |
| language | No | Language code for transcription (e.g., 'en', 'es', 'fr'). | auto-detect |
| include_timestamps | No | When as_text=true, include [MM:SS] timestamps in the plain-text output. | true |
| as_text | No | If true, return only the joined transcript string; if false, return structured JSON. | false |
| async | No | If true, always enqueue an async job (returns a job_id). | false |
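The sketch below shows one way a client might assemble these arguments and handle the `job_id` returned for long or `async=true` requests. It rests on several assumptions not stated on this page: a hypothetical `call_tool(name, arguments)` function supplied by your MCP client, the tool being callable under the name in this page's title, exactly one audio source per call, and job results carrying a `status` field that eventually reads `completed` or `failed`. Check the server's actual response schema before relying on it.

```python
from __future__ import annotations

import time
from typing import Any, Callable

# Hypothetical: invokes an MCP tool by name and returns its parsed result.
CallTool = Callable[[str, dict[str, Any]], dict[str, Any]]

SOURCE_FIELDS = ("audio_path", "audio_base64", "audio_resource_uri", "audio_url")


def build_arguments(**kwargs: Any) -> dict[str, Any]:
    """Assemble arguments, requiring exactly one audio source (an assumption)."""
    sources = [f for f in SOURCE_FIELDS if kwargs.get(f) is not None]
    if len(sources) != 1:
        raise ValueError(f"expected exactly one audio source, got {sources}")
    # Drop unset options so the server applies its own defaults.
    return {k: v for k, v in kwargs.items() if v is not None}


def transcribe(call_tool: CallTool, args: dict[str, Any],
               tool_name: str = "transcribe-audio",
               poll_interval: float = 2.0, timeout: float = 600.0) -> dict[str, Any]:
    """Call the tool; if it returns a job_id, poll transcribe_get_job until done."""
    result = call_tool(tool_name, args)
    job_id = result.get("job_id")
    if job_id is None:
        return result  # short audio: the transcript came back synchronously
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = call_tool("transcribe_get_job", {"job_id": job_id})
        # Assumed terminal status values; adjust to the server's real schema.
        if job.get("status") in ("completed", "failed"):
            return job
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

Since `async` is a reserved word in Python, pass it through a dict unpack, for example `build_arguments(audio_url=url, **{"async": True})`.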