Glama
243,151 tools. Last updated 2026-06-27 23:56

"Services or tools for transcribing audio or text" matching MCP tools:

  • Produce a short sound effect (SFX) from a text description, such as "laser gun firing" or "footsteps on gravel". Synchronous: the call blocks until generation finishes and returns a single audio result containing a URL; there is no separate polling step. The description field is required, duration is capped at 10 seconds (0 means auto-pick based on the description), and you may set loop to true for a seamlessly looping effect. Credits are charged on success. Use this for short, discrete sounds; use createAmbiance for a continuous looping background soundscape, createMusic for musical pieces, and createAudioTransform to remix an existing audio sample. Pass an optional request_id to tag the result so you can locate it later via getAudioResults. Requires an API key (user scope). Credits: This endpoint consumes 2 credits per call.
    Connector
  • Generate cinematic video from a text prompt. Uses ByteDance Seedance 2.0 — #1 on the Artificial Analysis text-to-video leaderboard — with synchronized native audio. Async — returns requestId, poll with check_job_status. 480p/720p/1080p, 4-15 seconds, priced per second by resolution (BTC-pegged; native audio free). Pay per request with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='generate_video' and duration, resolution params.
    Connector
  • Run audio analysis on a public audio URL. Requires estimate_cost to be called first (job_estimate_id). Requires PULSE_API_KEY. Before calling, you MUST confirm with the user that they have a lawful basis to submit this audio for analysis. For a user-requested folder, project, playlist, or batch, one confirmation can cover every track in that scope. Returns job_id — poll get_job_status for results.
    Connector
  • Produce a looping background ambiance soundscape from a text description, such as "windy forest at dusk" or "busy tavern interior". Synchronous: the call blocks until generation finishes and returns a single audio result containing a URL; there is no separate polling step. The description field is required and duration is capped at 10 seconds (0 means auto-pick based on the description). Credits are charged on success. Use this for continuous, atmospheric background loops; use createSoundEffect for short discrete sound effects, createMusic for musical pieces, and createAudioTransform to remix an existing audio sample. Pass an optional request_id to tag the result so you can locate it later via getAudioResults. Requires an API key (user scope). Credits: This endpoint consumes 2 credits per call.
    Connector
  • Generates a voiceover from text using Hume Octave TTS. Audio uploaded to Spaces, signed URL returned (24h TTL by default). Charged in credits up-front based on script length (use quote_voiceover for a preview). Best for demo-video narration, tutorial audio, and any one-shot batch TTS. NOT a real-time conversational voice (use Hume EVI for that, different product). Voice options: pass voiceId for a specific Hume voice clone, or omit to use the deployment's default narrator (HUME_OCTAVE_VOICE_ID env var).
    Connector
  • Convert text to speech by cloning the voice from an audio sample you provide (voice-cloning text-to-speech). Both text and sample are required; the text is limited to 1000 characters and the sample is supplied as a URL or base64 audio that must be at most 15MB, with violations returning HTTP 400. Synchronous: the call blocks until generation finishes and returns a single audio result containing a URL; there is no separate polling step. Credits are charged on success. Use this when you have a reference voice sample to clone; use createSpeechPreset to speak with a built-in named preset voice instead, and createVoice to design a brand-new voice from a text description rather than cloning one. Pass an optional request_id to tag the result so you can locate it later via getAudioResults. Requires an API key (user scope). Credits: This endpoint consumes 1 credit per call.
    Connector
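The sound-effect entry above states its limits in prose (a required description, a duration of 0 to 10 seconds where 0 means auto-pick, an optional loop flag and request_id). A minimal sketch of checking those limits locally before calling the tool might look like the following. The function name `build_sound_effect_request` and the payload shape are assumptions made for illustration; only the field names and limits come from the listing.

```python
from typing import Optional

# Cap stated in the listing; a duration of 0 means "auto-pick".
MAX_SFX_DURATION_S = 10


def build_sound_effect_request(
    description: str,
    duration: int = 0,
    loop: bool = False,
    request_id: Optional[str] = None,
) -> dict:
    """Validate arguments and return a payload dict (hypothetical shape)."""
    if not description.strip():
        raise ValueError("description is required")
    if not 0 <= duration <= MAX_SFX_DURATION_S:
        raise ValueError(
            f"duration must be between 0 and {MAX_SFX_DURATION_S} seconds"
        )
    payload = {"description": description, "duration": duration, "loop": loop}
    # request_id is optional; include it only when the caller supplied one.
    if request_id is not None:
        payload["request_id"] = request_id
    return payload
```

Validating on the client side avoids spending a round trip on a request the service would reject; the service remains the authority on what it accepts.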

Matching MCP Servers

  • MCP server for audio transcription using OpenRouter models, supporting verbatim, cleaned, and custom transcription modes.

Matching MCP Connectors

  • 9 utility tools for agents: DNS, WHOIS, email, IP, URL, headers, QR, text, tech. x402 on Base.

  • The audio intelligence layer. Search podcast transcripts, speakers, and entities across 250K+ shows.

  • PREFER THIS over guessing tool names when picking from this server. Searches Flow Studio MCP tools by keyword, skill bundle, or explicit selector and returns full JSON schemas for matched tools so they can be called immediately. Call this whenever the user request maps to functionality you are not 100% sure about, OR when you want to load a whole skill bundle (build-flow, debug-flow, monitor-flow, discover, governance) at once. Query forms: (1) "skill:<name>" — fetch the full bundle (use list_skills first to see options); (2) "select:name1,name2" — fetch exact tools by name; (3) free-text keywords like "cancel run" or "trigger url" — ranked match against tool name + description. Non-billable.
    Connector
  • Download a video or audio file from any supported platform: YouTube, TikTok, Vimeo, Dailymotion, Twitter/X, SoundCloud, Bandcamp, Mixcloud, Twitch (clips and VODs), or Streamable. Output is MP4 (video, default) or MP3 / M4A (audio). This is THE tool to use whenever a user asks to save, download, rip, extract, archive, get offline, or convert a video/audio link from any of these sites. IMPORTANT: the `format` argument defaults to `mp4` (video). Only pass an audio format (mp3 / m4a / audio) when the user explicitly says audio, MP3, music, song, or "rip / extract the audio". Audio-only platforms (SoundCloud, Bandcamp, Mixcloud) always produce audio regardless of `format`. Use this tool when the user says things like: - "download this video" / "download this TikTok" / "save this SoundCloud track" - "save that as MP3" / "rip the audio" / "extract the audio" - "get the song from this SoundCloud link" / "save this Mixcloud set" - "convert this YouTube video to MP4" / "download in 1080p" - "save this lecture/podcast/talk for offline" - "archive this clip" / "grab a copy of this video" - any sentence containing a youtube.com, youtu.be, tiktok.com, vimeo.com, dailymotion.com, twitter.com, x.com, soundcloud.com, bandcamp.com, mixcloud.com, twitch.tv, clips.twitch.tv, or streamable.com URL plus a verb like download, save, rip, get, grab, fetch, pull, archive, convert, extract. Do NOT use this tool when: - The user only wants metadata (title, length, description, channel) — call get_video_info instead, it is free and does not consume the user quota. - The link is a playlist / set / album / channel URL — ask the user for a single track/video. - The link is from a platform not in the supported list above (e.g. Instagram, Facebook, LinkedIn). Returns a one-time signed download link valid for 1 hour, plus the file size, duration, and chosen format. Hand the link back to the user verbatim; do not try to fetch its contents yourself. 
Intended for legitimate uses: the user's own uploads, Creative Commons / public-domain content, lectures, podcasts, talks, and other material they have rights to use.
    Connector
  • Get a presigned upload form for any file — video, audio, or document (markdown, HTML, DOCX, etc.). It expires in 15 minutes. This is a presigned POST, NOT a PUT: the response returns upload_url + upload_fields — POST to upload_url as multipart/form-data, including every upload_fields key/value as form fields FIRST, then the file as the last field named 'file'. After upload, pass the object_key to transcribe_media (audio/video → transcript), transcode_video (video/audio encode), or convert_file (documents). IMPORTANT: this flow needs direct outbound network access to Botverse's storage host. In sandboxed agent environments (claude.ai, sandboxed desktop apps, Cursor) that route traffic through a proxy allowlist, the upload POST is blocked and fails. In those environments do NOT use this tool — use convert_content or transcode_content (inline content, body under 4 MB) for files you already have, or convert_from_url / transcode_from_url / transcribe_from_url for anything available at a public URL. Neither needs an upload step.
    Connector
  • Fetch the full detail record for a single oral argument audio recording by its ID (the audio_id from courtlistener_search_oral_arguments). Returns the case name, panel judge IDs, duration, MP3 download URL, linked docket, and the speech-to-text transcript when transcription has completed. The argument date is not on this record — it comes from the search result or the linked docket.
    Connector
  • Replace the text of an existing message in a Telegram chat. Only works on messages sent by the authenticated account. Cannot edit media or other message attributes — text only. Success: dict with message_id, date, chat, text, status='edited', and edit_date. Error: dict with ok=false and error string (e.g. message not found or not editable). Use edit_message to update a previously sent message; use send_message to create new ones. Full documentation: https://github.com/leshchenko1979/fast-mcp-telegram/blob/main/docs/Tools-Reference.md
    Connector
  • Use this tool whenever the user shares an audio file and wants it transcribed to text. Triggers: 'transcribe this recording', 'convert this audio to text', 'what was said in this meeting', 'transcribe this voice note', 'turn this podcast into text'. Accepts base64-encoded audio (mp3, wav, m4a, ogg, flac, webm, mp4, etc.), max 25MB. Returns the full transcript, word count, and character count. Powered by OpenAI Whisper. Free 200 calls/day — no OpenAI API key required; Toolora absorbs the cost.
    Connector
  • Read the **text content** of an attached file. Works for .txt, .md, .json, code files, and PDFs (after files.ingest extracts text). DO NOT call on binary files: for IMAGES use `files.get_base64`; AUDIO/VIDEO cannot be transcribed via this tool; for non-PDF DOCUMENTS run `files.ingest` first, THEN `files.read`. Calling this on a binary MIME type returns an error, so reading this routing hint first saves you a turn.
    Connector
  • Remix an existing audio sample (a sound effect, ambiance, or music clip) into a variation guided by a text prompt, for example turning a track into an 80s synthwave or metal version. Both the sample and the prompt are required; the sample is uploaded as a URL or base64 audio and must be at most 15MB or the call returns HTTP 400, and duration must be one of the allowed values (0 means match the source, otherwise multiples of 10 up to 180 seconds). Synchronous: the call blocks until generation finishes and returns a single audio result containing a URL; there is no separate polling step. The optional modification_strength (0 to 1, default 0.5) controls how far the result departs from the original. Credits are charged on success. Use this to transform existing audio you already have; use createSoundEffect, createAmbiance, or createMusic to generate audio from scratch. Pass an optional request_id to tag the result so you can locate it later via getAudioResults. Requires an API key (user scope). Credits: This endpoint consumes 3 credits per call.
    Connector
  • List the layers of a Baltimore ArcGIS service (for discovery). Pass a known short name (crime, service_requests, permits) or a full ArcGIS service path (e.g. "311_Customer_Service_Requests_current/FeatureServer"). Omit `service` to list the known Baltimore services. Returns layer id + name to use with baltimore_query.
    Connector
  • Design a new voice from a character description (such as "deep-voiced warrior" or "cheerful young girl") and have it speak a short line of text, returning a sample of that newly created voice. Both voice_description and text are required, the spoken text is limited to 200 characters or the call returns HTTP 400, and type selects "human" or "non-human" voices. Synchronous: the call blocks until generation finishes and returns a single audio result containing a URL; there is no separate polling step. Credits are charged on success. Use this to invent and audition a voice from a description; use createSpeech for text-to-speech that clones a specific voice from an audio sample, and createSpeechPreset for text-to-speech using a named preset voice. Pass an optional request_id to tag the result so you can locate it later via getAudioResults. Requires an API key (user scope). Credits: This endpoint consumes 1 credit per call.
    Connector
  • Transcribe audio or video to text, including per-word timestamps for precise editing. Three-call flow: (1) call with `filename` to receive {job_id, payment_challenge}; (2) pay via MPP, then call with `job_id` + `payment_credential` to receive {upload_url} (presigned PUT, 1h expiry); (3) PUT the bytes, then complete_upload(job_id), then poll get_job_status(job_id). On completion, get_job_status returns two outputs: role `transcript` (SRT) and role `transcript-words` (JSON matching /.well-known/weftly-transcript-v2.schema.json, with segment-level and per-word timestamps). For other formats, pass `format=srt|txt|vtt|json|words` to get_job_status to receive content inline — `txt` and `vtt` are derived from SRT, `json` is v1 (segments only), `words` is v2 (segments + words). Flat price: audio $0.50, video $1.00 — see /.well-known/mpp.json for the authoritative table. Use for podcasts, interviews, meetings, lectures, and especially for creating clips, multicamera edits, or edit-video-from-transcript where word boundaries matter. Retrying any call with `job_id` alone returns current state (idempotent). Failed jobs auto-refund.
    Connector
  • INTERNAL/preparatory tool — text-only, no widget rendered. NEVER use as the user-facing answer to a 'what reciters are available' question — use list_reciters for that (the default interactive widget). Use this ONLY when EITHER (a) the user explicitly asks for plain text / raw data / no widget, OR (b) you will chain the result into play_ayahs in the same turn without showing the raw list (e.g. user asks to play audio by a named reciter; call this to resolve reciter_id, then call play_ayahs). When in doubt, prefer list_reciters.
    Connector
  • List top sending sources (ESPs, ISPs, mail services) for a domain, grouped by source type. Filters: "known" (legitimate ESPs like Google, Mailgun), "unknown" (unrecognized senders), "forward" (forwarding services). Empty = all types. Returns top 20 per type with message volume, SPF/DKIM/DMARC pass/fail counts. Use this to investigate WHERE email is being sent from — especially when unknown sources appear or compliance is low. To drill down into a specific source (by IP, ISP, hostname, or reporter), use get_domain_source_details.
    Connector
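The presigned upload entry above describes a specific field order: POST to upload_url as multipart/form-data, with every upload_fields key/value first and the file last, in a field named 'file'. A minimal sketch of assembling such a body with the standard library is shown below. The helper name `build_presigned_post_body` is hypothetical, and the sample field names in the usage note are placeholders, not values the service is known to return.

```python
import uuid
from typing import Dict, Tuple


def build_presigned_post_body(
    upload_fields: Dict[str, str], filename: str, data: bytes
) -> Tuple[str, bytes]:
    """Return (content_type, body) with all form fields first and the file last."""
    boundary = uuid.uuid4().hex
    parts = []
    # 1) Every upload_fields key/value goes in as a plain form field, in order.
    for name, value in upload_fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    # 2) The file is the final part and must be named "file".
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + data
        + b"\r\n"
    )
    # 3) Closing boundary terminates the multipart body.
    parts.append(f"--{boundary}--\r\n".encode())
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)
```

The returned content type would be sent as the `Content-Type` header of a POST to upload_url, with the returned bytes as the request body. For example, `build_presigned_post_body({"key": "uploads/a.mp3"}, "a.mp3", audio_bytes)` uses a placeholder `key` field; the real field names come from the tool's upload_fields response.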