transcribe_audio
Transcribe audio to text with timestamps. Supports 13 languages, handles accents and background noise. Pay per request with Bitcoin Lightning—no signup needed.
Instructions
Transcribe audio to text with timestamps. Uses Mistral Transcription, high-accuracy speech recognition that handles accents, background noise, and overlapping speakers. Supports 13 languages: en, zh, hi, es, ar, fr, pt, ru, de, ja, ko, it, nl. Up to 512 MB / 3 hours per file. The call is asynchronous: it returns a requestId; poll with check_job_status(jobType='transcription'), then fetch the output with get_job_result. Pricing is 10 sats/min. Privacy: audio and transcripts are ephemeral. They are processed, returned, and discarded, never persisted. Pay per request with Bitcoin Lightning; no API key or signup is needed. Requires create_payment with toolName='transcribe_audio'.
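The submit, poll, fetch flow described above can be sketched roughly as follows. This is a minimal illustration, not the server's own client code: `callTool(name, args)` is a hypothetical helper standing in for whatever MCP client you use, the `status` values (`"completed"`, `"failed"`) and the arguments passed to `get_job_result` are assumptions, and rounding the price up to whole minutes is also an assumption, since the description only states 10 sats/min.

```javascript
// Price stated in the tool description: 10 sats per minute of audio.
const SATS_PER_MINUTE = 10;

// Estimate the cost of a file. Rounding up to a whole minute is an assumption.
function estimateCostSats(durationSeconds) {
  return Math.ceil(durationSeconds / 60) * SATS_PER_MINUTE;
}

// Submit the job, poll until it finishes, then fetch the transcript.
// `callTool` is a hypothetical async function: (toolName, args) => result object.
async function transcribe(callTool, args, { intervalMs = 2000, maxPolls = 60 } = {}) {
  const { requestId } = await callTool("transcribe_audio", args);

  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const { status } = await callTool("check_job_status", {
      requestId,
      jobType: "transcription",
    });

    // The status strings are assumed; adjust them to what the server returns.
    if (status === "completed") {
      return callTool("get_job_result", { requestId, jobType: "transcription" });
    }
    if (status === "failed") {
      throw new Error(`Transcription ${requestId} failed`);
    }

    // Wait before the next poll.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }

  throw new Error(`Transcription ${requestId} did not finish after ${maxPolls} polls`);
}
```

Passing `callTool` in as a parameter keeps the sketch independent of any particular client library and makes the polling loop easy to exercise with a stub.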
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| paymentId | Yes | Valid payment ID (must be paid) | |
| audioBase64 | Yes | Base64 encoded audio file | |
| language | No | Language code (e.g., 'en', 'es') | |
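As an illustration of the schema above, the arguments could be assembled like this in Node.js. The helper name `buildTranscribeArgs` is hypothetical, the client-side checks are optional conveniences rather than documented server behavior, and applying the 512 MB limit to the raw (pre-Base64) bytes is an assumption.

```javascript
// The 13 language codes listed in the tool description.
const SUPPORTED_LANGUAGES = [
  "en", "zh", "hi", "es", "ar", "fr", "pt", "ru", "de", "ja", "ko", "it", "nl",
];

// 512 MB limit from the description; assumed to apply to the raw audio bytes.
const MAX_AUDIO_BYTES = 512 * 1024 * 1024;

// Build the argument object for transcribe_audio from raw audio bytes.
function buildTranscribeArgs(paymentId, audioBytes, language) {
  if (!paymentId) throw new Error("paymentId is required");
  if (audioBytes.length === 0) throw new Error("audio is empty");
  if (audioBytes.length > MAX_AUDIO_BYTES) throw new Error("audio exceeds 512 MB");
  if (language !== undefined && !SUPPORTED_LANGUAGES.includes(language)) {
    throw new Error(`unsupported language code: ${language}`);
  }

  const args = {
    paymentId,
    audioBase64: Buffer.from(audioBytes).toString("base64"),
  };
  // language is optional, so include it only when the caller supplies one.
  if (language !== undefined) args.language = language;
  return args;
}
```

In practice `audioBytes` would typically come from `fs.readFileSync(path)`, and `paymentId` from a paid `create_payment` call.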
Implementation Reference
- index.js:14-45 (registration) The tool is registered as "transcription" (not "transcribe_audio") in the TOOLS array. No tool named "transcribe_audio" exists in this codebase; the closest match is "transcription", listed at line 22.
const TOOLS = [ "image", "video", "video_from_image", "text", "vision", "music", "tts", "transcription", "3d", "ocr", "file_convert", "email", "sms", "call", "voice_clone", "image_edit", "pdf_merge", "epub_to_audiobook", "convert_html_to_pdf", "translate_text", "extract_receipt", "ai_call", "remove_background", "upscale_image", "restore_face", "detect_nsfw", "detect_objects", "remove_object", "colorize_image", "deblur_image", ];