207,056 tools. Last updated 2026-06-17 18:36

"How to Convert a PowerPoint Presentation to a Video" matching MCP tools:

create_flow
Avocado AI
Create a new Avocado AI Flow pre-built with a node-graph pipeline, and return its id and direct URL so the user can open it on the canvas. You design the whole pipeline: pass the nodes and edges and the server validates socket compatibility, aligns video models to the input shape, lays the graph out left-to-right, and adds a caption per step. Edges reference nodes by 0-based index in the `nodes` array. This creates (does not run) the flow — the user runs it from the editor. Use the capability map below to choose node types, models, and handles:
You are Avo, a senior creative-workflow designer inside Avocado AI's Flow editor. The user describes a creative goal; you respond with a node-graph proposal that the editor previews on the canvas. Think like a production director: design the FULL pipeline needed to get a polished result, not the minimum number of nodes.
DESIGN PRINCIPLES — build capable, complete pipelines:
- Match the pipeline's ambition to the request. A throwaway test is 2-3 nodes; a real deliverable (an ad, a UGC video, a product shot, a music video) is usually 5-12 nodes. Use up to 24 when it genuinely helps.
- Prefer multi-stage quality: generate → refine (imageEditor) → upscale → animate, rather than a single generate node. Add an upscale step before any final image/video deliverable.
- Use BRANCHING and FAN-OUT. One output can feed many nodes: e.g. one hero image → three different video models for variations the user can pick from; one script → both a voiceover and the video prompt.
- Use PARALLEL TRACKS that converge: e.g. a voice track and an image track both feeding a lip-sync video; or a music track plus a visuals track.
- Use the `llm` node to do creative thinking inside the graph — write or expand a script, brainstorm a prompt, turn a rough idea into a detailed image/video prompt — then wire its text output into the next node.
- Pick the BEST model for each step (see the menus below). Don't leave everything on defaults — choosing models is a big part of the value.
- Set per-node settings (aspect ratio, resolution, duration, voice, variations) when the request implies them (e.g. 'vertical' → 9:16, 'short' → duration 5, '3 options' → variations 3 or three branches).
HARD RULES:
- Use only the node types listed below. Never invent new ones.
- Every edge must connect compatible socket types (text→text, image→image, audio→audio, video→video).
- Give every runnable node a short `stepLabel` ('Step N — …') — it renders as a caption beneath that node.
- `stickyNote` is only for standalone notes; never use it to caption a node (use `stepLabel`). Optionally add ONE stickyNote describing the workflow.
- Any schema field you don't need must be `null` (numbers like `variations` too).
MODEL MENUS (set the node's `model` to one of these ids):
image (text-to-image) — `model` ids:
• fal-ai/nano-banana-2 — fast, strong all-rounder (default)
• fal-ai/gpt-image-2 — best instruction-following & legible text
• fal-ai/bytedance/seedream/v5/lite/text-to-image — photoreal
• fal-ai/flux-pro/v1.1-ultra — high detail / fidelity
• fal-ai/nano-banana-pro — premium quality
• fal-ai/recraft/v4/text-to-image — design, brand, vector-style
• fal-ai/ideogram/v3 — posters & typography
imageEditor (image + prompt → edited image) — `model` ids:
• fal-ai/nano-banana-2/edit — default, multi-image (up to 14 inputs)
• openai/gpt-image-2/edit — precise instruction edits
• fal-ai/bytedance/seedream/v5/lite/edit — photoreal edits
• fal-ai/flux-pro/kontext/max/text-to-image — style / context transfer
• fal-ai/gemini-25-flash-image/edit — fast edits
(the `image` input accepts MULTIPLE connections for compositing/restyle)
imageUpscale (image → larger image) — `model` ids:
• fal-ai/topaz/upscale/image — best quality (default)
• fal-ai/recraft-crisp-upscale, fal-ai/clarity-upscaler, fal-ai/crystal-upscaler
llm (text → text) — `model` ids: claude-haiku (default), gpt-4o-mini, kimi-k2, seed-1.8. Put the instruction in `prompt`.
voice (text → speech) — pick a `voice` by name: Sarah (cheerful), Roger (deep), Laura (soft), Charlie (warm), George (bold), Callum (energetic), River (calm), Liam (reliable). The script comes from an upstream text/llm node wired into `in` — do NOT put the script in the voice node's prompt.
music (text → music) — set `duration` to one of 30,60,90,120,180,240,300 (seconds). Put the music description in `prompt`.
videoUpscale (video → sharper video) — add after a video node for final deliverables. No model field.
VIDEO node — choose `model` to match the input shape (it drives which input handles the node renders):
• Text → video: `kling3-pro`, `sora-2`, `veo3-1-fast`, `seedance-2.0-t2v`. Wire text to `prompt`.
• Image → video (I2V): `veo3-1-fast`, `kling3-pro`, `seedance-2.0-i2v`, `hailuo-pro`. Wire the image to `image`. For keyframe models (`kling-o1`, `veo3-1`) wire `start-frame` + `end-frame`.
• Lip-sync / talking-head: `fabric` (image + audio, NO prompt — never wire text into Fabric) or `infinitalk` (prompt + image + audio). Wire audio to `audio`. Audio-over-stills narration: `ltx2-audio`.
• Multi-image reference / character consistency: `vidu` (≤7), `veo3-1-ref` (≤10), `kling-elements` (2-4 ordered frames), `happy-horse-ref` (≤9). Wire EACH image to the SAME `ref-images` handle (it accepts multiple connections). Never use the plain `image` handle.
• Seedance reference (image + video + audio refs): `seedance-2.0-ref` / `seedance-2.0-ref-fast`. Wire to `ref-images` / `ref-videos` / `ref-audio`.
• Motion control (drive a character with a motion video): `kling3-motion-control`. Wire character to `image`, motion clip (videoUpload) to `motion-video`.
Edge handle hints:
- When the target has multiple typed inputs (Video, Image Editor), set `toHandle` explicitly (`prompt`, `image`, `audio`, `ref-images`, `start-frame`, `end-frame`, `motion-video`). The editor otherwise picks the first type-compatible handle, which may be the wrong slot.
- Never wire text into Fabric. Never wire a single image into a multi-ref model's `image` slot — use `ref-images`.
Available node types (id — purpose — inputs / outputs):
- text — Prompt — in: in<text> | out: out<text>
- llm — LLM — in: in<text> | out: out<text>
- upload — Upload — in: — | out: out<image>
- videoUpload — Video Upload — in: — | out: out<video>
- image — Image — in: in<text> | out: out<image>
- imageEditor — Image Editor — in: prompt<text>, image<image> | out: out<image>
- imageUpscale — Image Upscale — in: image<image> | out: out<image>
- video — Video — in: prompt<text>, image<image>, start-frame<image>, end-frame<image>, ref-images<image>, ref-videos<video>, ref-audio<audio>, audio<audio>, motion-video<video> | out: out<video>
- videoUpscale — Video Upscale — in: video<video> | out: out<video>
- voice — Voice — in: in<text> | out: out<audio>
- music — Music — in: in<text> | out: out<audio>
- stickyNote — Sticky Note — in: in<annotation> | out: out<annotation>
Edges reference nodes by index in the `nodes` array (0-based). In the examples below, any field not shown is `null`.
EXAMPLES — study the PATTERNS (multi-stage, fan-out, parallel tracks), copy the handle names exactly:
Example 1 — UGC talking-head with scripted voice + final upscale:
nodes=[
  {type:"llm",stepLabel:"Step 1 — Write a punchy 15s script",prompt:"Write a 15-second energetic UGC script for the product.",model:"claude-haiku"},
  {type:"voice",stepLabel:"Step 2 — Voiceover",voice:"George"},
  {type:"upload",stepLabel:"Step 3 — Upload character photo"},
  {type:"video",stepLabel:"Step 4 — Lip-sync video",model:"fabric"},
  {type:"videoUpscale",stepLabel:"Step 5 — Upscale to deliver"}
]
edges=[
  {fromIndex:0,toIndex:1,fromHandle:"out",toHandle:"in"},
  {fromIndex:1,toIndex:3,fromHandle:"out",toHandle:"audio"},
  {fromIndex:2,toIndex:3,fromHandle:"out",toHandle:"image"},
  {fromIndex:3,toIndex:4,fromHandle:"out",toHandle:"video"}
]
Example 2 — Text → image → refine → upscale (quality chain):
nodes=[
  {type:"text",stepLabel:"Step 1 — Prompt",prompt:"A cinematic product shot of a matte-black bottle on wet stone, golden hour"},
  {type:"image",stepLabel:"Step 2 — Generate hero",model:"fal-ai/flux-pro/v1.1-ultra",aspectRatio:"4:3"},
  {type:"imageEditor",stepLabel:"Step 3 — Add brand label",prompt:"Add a minimal embossed logo on the bottle",model:"fal-ai/nano-banana-2/edit"},
  {type:"imageUpscale",stepLabel:"Step 4 — Upscale",model:"fal-ai/topaz/upscale/image"}
]
edges=[
  {fromIndex:0,toIndex:1,fromHandle:"out",toHandle:"in"},
  {fromIndex:1,toIndex:2,fromHandle:"out",toHandle:"image"},
  {fromIndex:2,toIndex:3,fromHandle:"out",toHandle:"image"}
]
Example 3 — Fan-out: one image → three video variations (different models):
nodes=[
  {type:"upload",stepLabel:"Step 1 — Source image"},
  {type:"text",stepLabel:"Step 2 — Motion brief",prompt:"Slow cinematic push-in, gentle parallax"},
  {type:"video",stepLabel:"Variation A — Veo",model:"veo3-1-fast",aspectRatio:"9:16",duration:"5"},
  {type:"video",stepLabel:"Variation B — Kling",model:"kling3-pro",aspectRatio:"9:16",duration:"5"},
  {type:"video",stepLabel:"Variation C — Seedance",model:"seedance-2.0-i2v",aspectRatio:"9:16",duration:"5"}
]
edges=[
  {fromIndex:0,toIndex:2,fromHandle:"out",toHandle:"image"},
  {fromIndex:0,toIndex:3,fromHandle:"out",toHandle:"image"},
  {fromIndex:0,toIndex:4,fromHandle:"out",toHandle:"image"},
  {fromIndex:1,toIndex:2,fromHandle:"out",toHandle:"prompt"},
  {fromIndex:1,toIndex:3,fromHandle:"out",toHandle:"prompt"},
  {fromIndex:1,toIndex:4,fromHandle:"out",toHandle:"prompt"}
]
Example 4 — Multi-image reference video (character consistency):
nodes=[
  {type:"upload",stepLabel:"Ref 1 — Character front"},
  {type:"upload",stepLabel:"Ref 2 — Character side"},
  {type:"upload",stepLabel:"Ref 3 — Outfit detail"},
  {type:"text",stepLabel:"Scene prompt",prompt:"The character walks through a neon market at night"},
  {type:"video",stepLabel:"Generate with refs",model:"veo3-1-ref",aspectRatio:"16:9"}
]
edges=[
  {fromIndex:0,toIndex:4,fromHandle:"out",toHandle:"ref-images"},
  {fromIndex:1,toIndex:4,fromHandle:"out",toHandle:"ref-images"},
  {fromIndex:2,toIndex:4,fromHandle:"out",toHandle:"ref-images"},
  {fromIndex:3,toIndex:4,fromHandle:"out",toHandle:"prompt"}
]
Example 5 — Music video: parallel music + visuals tracks converging:
nodes=[
  {type:"music",stepLabel:"Track 1 — Score",prompt:"Dreamy lo-fi beat, 90 BPM",duration:"60"},
  {type:"text",stepLabel:"Track 2 — Scene",prompt:"A lone astronaut drifting past a glowing planet"},
  {type:"image",stepLabel:"Keyframe",model:"fal-ai/nano-banana-pro",aspectRatio:"16:9"},
  {type:"video",stepLabel:"Animate",model:"ltx2-audio",aspectRatio:"16:9"}
]
edges=[
  {fromIndex:1,toIndex:2,fromHandle:"out",toHandle:"in"},
  {fromIndex:2,toIndex:3,fromHandle:"out",toHandle:"image"},
  {fromIndex:0,toIndex:3,fromHandle:"out",toHandle:"audio"}
]
Return only the structured object — no prose, no markdown.
Connector
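The socket-compatibility rule in the create_flow description can be tried out locally before calling the tool. The Python sketch below is only an illustration, not the server's validator: `NODE_SOCKETS` and `check_edges` are hypothetical names, the socket table is transcribed from the node-type list in the description, and the node and edge data mirror its Example 1 (step labels omitted for brevity).

```python
# Hypothetical local pre-check; the real validation happens on the server.
# Each entry maps a node type to (input handles, output handles) with socket types.
NODE_SOCKETS = {
    "text": ({"in": "text"}, {"out": "text"}),
    "llm": ({"in": "text"}, {"out": "text"}),
    "upload": ({}, {"out": "image"}),
    "videoUpload": ({}, {"out": "video"}),
    "image": ({"in": "text"}, {"out": "image"}),
    "imageEditor": ({"prompt": "text", "image": "image"}, {"out": "image"}),
    "imageUpscale": ({"image": "image"}, {"out": "image"}),
    "video": (
        {"prompt": "text", "image": "image", "start-frame": "image",
         "end-frame": "image", "ref-images": "image", "ref-videos": "video",
         "ref-audio": "audio", "audio": "audio", "motion-video": "video"},
        {"out": "video"},
    ),
    "videoUpscale": ({"video": "video"}, {"out": "video"}),
    "voice": ({"in": "text"}, {"out": "audio"}),
    "music": ({"in": "text"}, {"out": "audio"}),
    "stickyNote": ({"in": "annotation"}, {"out": "annotation"}),
}


def check_edges(nodes, edges):
    """Return a list of problems; an empty list means every edge type-checks."""
    problems = []
    for n, edge in enumerate(edges):
        src, dst = edge["fromIndex"], edge["toIndex"]
        if not (0 <= src < len(nodes) and 0 <= dst < len(nodes)):
            problems.append(f"edge {n}: index out of range")
            continue
        outputs = NODE_SOCKETS[nodes[src]["type"]][1]
        inputs = NODE_SOCKETS[nodes[dst]["type"]][0]
        out_type = outputs.get(edge["fromHandle"])
        in_type = inputs.get(edge["toHandle"])
        if out_type is None or in_type is None:
            problems.append(f"edge {n}: unknown handle")
        elif out_type != in_type:
            problems.append(f"edge {n}: {out_type} -> {in_type} mismatch")
        elif nodes[dst].get("model") == "fabric" and in_type == "text":
            # The description states that Fabric takes image + audio and no prompt.
            problems.append(f"edge {n}: fabric takes no text input")
    return problems


# Example 1 from the description: script -> voice -> lip-sync -> upscale.
EXAMPLE_NODES = [
    {"type": "llm", "model": "claude-haiku"},
    {"type": "voice"},
    {"type": "upload"},
    {"type": "video", "model": "fabric"},
    {"type": "videoUpscale"},
]
EXAMPLE_EDGES = [
    {"fromIndex": 0, "toIndex": 1, "fromHandle": "out", "toHandle": "in"},
    {"fromIndex": 1, "toIndex": 3, "fromHandle": "out", "toHandle": "audio"},
    {"fromIndex": 2, "toIndex": 3, "fromHandle": "out", "toHandle": "image"},
    {"fromIndex": 3, "toIndex": 4, "fromHandle": "out", "toHandle": "video"},
]
```

Calling `check_edges(EXAMPLE_NODES, EXAMPLE_EDGES)` returns an empty list, while wiring the `llm` output straight into the video node's `audio` handle is reported as a text-to-audio mismatch.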
generate_video_to_storyboard
Avocado AI
Generate an AI video and place it directly on a user's Avocado AI storyboard. Drops a 'Generating...' placeholder on the board immediately, then the storyboard's recovery hook swaps it for the final video when generation completes (2-10 minutes). Use list_storyboards or create_storyboard first to obtain the storyboard_id. If the user has the storyboard tab open, they may need to refresh once for the video to appear (the canvas does not yet support live realtime swap from MCP). Eight models supported: seedance-2.0-t2v / -t2v-fast (text only), seedance-2.0-i2v / -i2v-fast (REQUIRE an image), kling3-standard (720p, 5-10s), kling3-pro (1080p, 5-10s), kling3-4k & kling-o3-4k (4K, 3-15s; all four Kling 3.x variants support BOTH text-to-video and image-to-video). For image-to-video: call prepare_image_upload first, then pass the returned file_id here. Pricing is per-second, varies by model and resolution.
Connector
taste
Pane
Read / write / clear the agent's freeform UI taste notes (a small markdown document of presentation preferences learned from human feedback — 'denser layout', 'no rounded corners'). ONE tool with an `action` enum: get | set | clear. Call `get` BEFORE generating a pane so prior feedback shapes the output; `set` does a whole-document replace (not append). Keep entries about UI/presentation only.
Connector
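Because `set` replaces the whole document rather than appending, adding one note means reading the current notes first and writing the merged text back. The sketch below shows that read-merge-write pattern in Python. It makes several assumptions not stated in the listing: `call_taste` is a hypothetical callable standing in for the tool, `get` is assumed to return the markdown as a string, and the `content` argument name is invented for illustration.

```python
class FakeTaste:
    """In-memory stand-in for the taste tool (illustration only)."""

    def __init__(self):
        self.doc = ""

    def __call__(self, args):
        action = args["action"]
        if action == "get":
            return self.doc
        if action == "set":
            # Whole-document replace, as the description states.
            self.doc = args["content"]  # "content" is an assumed argument name
            return self.doc
        if action == "clear":
            self.doc = ""
            return self.doc
        raise ValueError(f"unknown action: {action}")


def append_taste_note(call_taste, note):
    """Append one note by fetching the document and setting the merged result."""
    current = call_taste({"action": "get"}) or ""
    lines = [line for line in current.splitlines() if line.strip()]
    entry = f"- {note}"
    if entry not in lines:  # avoid duplicating an existing preference
        lines.append(entry)
    merged = "\n".join(lines) + "\n"
    call_taste({"action": "set", "content": merged})
    return merged
```

Calling `set` with only the new note would silently discard earlier feedback, which is why the helper always starts from `get`.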
generate_video
Swarm Tips — Aggregated AI Agent Activities
[SPEND: 5 USDC] Generate a short-form video from a prompt or URL. Costs 5 USDC (Base/Ethereum/Polygon/Solana via x402). First call without tx_signature returns `{status: "payment_required", instructions, payment_details: {chain, address, amount, memo}}` from the x402 v2 protocol — pay the indicated amount to that address on that chain, then call again with tx_signature set to the broadcast tx hash to trigger generation. Returns a session_id to poll with check_video_status. Tip: the generated video can be submitted to a Shillbot task via shillbot_submit_work to earn back more than the spend.
Connector
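The two-call payment handshake described for this tool can be sketched as follows. This is a minimal Python illustration under stated assumptions: `call_tool` and `pay` are hypothetical callables supplied by the caller (an MCP client and a wallet), and the `prompt` argument name is assumed. Only `tx_signature`, `status`, and the `payment_details` fields come from the description.

```python
def generate_video_with_payment(call_tool, pay, prompt):
    """Run the two-step flow: request, pay if asked, then retry with the tx hash."""
    # First call without tx_signature; the description says this returns
    # a payment_required response carrying the payment details.
    first = call_tool("generate_video", {"prompt": prompt})
    if first.get("status") != "payment_required":
        return first

    details = first["payment_details"]
    # pay() is a hypothetical wallet helper that broadcasts the transfer
    # and returns the transaction hash.
    tx_hash = pay(details["chain"], details["address"],
                  details["amount"], details["memo"])

    # Second call with tx_signature set triggers generation.
    return call_tool("generate_video", {"prompt": prompt, "tx_signature": tx_hash})
```

The second response is expected to carry the `session_id` that `check_video_status` polls.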
json_to_csv
IA-QA — 130+ QA & Dev Tools for AI Agents
Convert a JSON array of objects to CSV format. Automatically detects columns from all object keys. Handles quoting and escaping per RFC 4180.
Connector
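For readers who want to see what such a conversion involves, here is a small Python sketch using only the standard library. It is not the IA-QA implementation; the function name simply mirrors the tool. Columns are collected from every object in first-seen order, missing keys become empty fields, and the `csv` module handles the quoting with CRLF line endings as RFC 4180 describes.

```python
import csv
import io
import json


def json_to_csv(json_text):
    """Convert a JSON array of flat objects to CSV text."""
    rows = json.loads(json_text)

    # Collect column names from all objects, keeping first-seen order.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="",
                            lineterminator="\r\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Fields containing commas or quotes are wrapped in double quotes, and embedded quotes are doubled, so `say "hi"` is written as `"say ""hi"""`.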
ask_video
Reka
Ask a question about one or more videos with visual analysis. Most effective on focused time ranges — use start/end to specify the segment to analyze. BEFORE calling this tool, read the reka://docs/guide resource for recommended workflows.
In most cases, you should first:
- search_videos to find WHEN something happens, then pass those timestamps here as start/end
- segment_video to detect and locate specific objects
- get_transcript to read what was said
For single-video questions, pass video_id with start/end. For cross-video questions, pass videos — a list of video references with start/end each. For follow-up questions, pass conversation_id from the previous response. You can add start/end to drill into a specific moment while keeping the conversation context. Requires qa_only or full pipeline.
Connector

Matching MCP Servers

Video to Text MCP Server
Multimedia Processing, Audio Processing, Speech Processing
strzhao
License: A, Quality: B, Maintenance: D
Enables downloading videos from platforms like YouTube and converting them to text using OpenAI Whisper and ffmpeg. It supports multiple output formats including TXT, JSON, SRT, and VTT for transcriptions.
Last updated 2026-01-13
2
2
ISC
Video Convert MCP
Multimedia Processing, App Automation, Developer Tools
pickstar-2002
License: A, Quality: B, Maintenance: D
A professional video format conversion tool based on MCP protocol that supports multiple formats, batch processing, and quality control for video files.
Last updated 2025-08-07
3
40
3
MIT

Matching MCP Connectors

Content to Social
Transform any blog post or article URL into ready-to-post social media content for Twitter/X threads, LinkedIn posts, Instagram captions, Facebook posts, and email newsletters. Pay-per-event: $0.07 for all 5 platforms, $0.03 for single platform.
Out to Lunch
Daily world briefing that tells AI assistants what's actually happening right now. Leaders, conflicts, deaths, economic data, holidays. Updated daily so they stop getting current events wrong.

download_video
TubePull
Download a video or audio file from any supported platform: YouTube, TikTok, Vimeo, Dailymotion, Twitter/X, SoundCloud, Bandcamp, Mixcloud, Twitch (clips and VODs), or Streamable. Output is MP4 (video, default) or MP3 / M4A (audio). This is THE tool to use whenever a user asks to save, download, rip, extract, archive, get offline, or convert a video/audio link from any of these sites.
IMPORTANT: the `format` argument defaults to `mp4` (video). Only pass an audio format (mp3 / m4a / audio) when the user explicitly says audio, MP3, music, song, or "rip / extract the audio". Audio-only platforms (SoundCloud, Bandcamp, Mixcloud) always produce audio regardless of `format`.
Use this tool when the user says things like:
- "download this video" / "download this TikTok" / "save this SoundCloud track"
- "save that as MP3" / "rip the audio" / "extract the audio"
- "get the song from this SoundCloud link" / "save this Mixcloud set"
- "convert this YouTube video to MP4" / "download in 1080p"
- "save this lecture/podcast/talk for offline"
- "archive this clip" / "grab a copy of this video"
- any sentence containing a youtube.com, youtu.be, tiktok.com, vimeo.com, dailymotion.com, twitter.com, x.com, soundcloud.com, bandcamp.com, mixcloud.com, twitch.tv, clips.twitch.tv, or streamable.com URL plus a verb like download, save, rip, get, grab, fetch, pull, archive, convert, extract.
Do NOT use this tool when:
- The user only wants metadata (title, length, description, channel) — call get_video_info instead, it is free and does not consume the user quota.
- The link is a playlist / set / album / channel URL — ask the user for a single track/video.
- The link is from a platform not in the supported list above (e.g. Instagram, Facebook, LinkedIn).
Returns a one-time signed download link valid for 1 hour, plus the file size, duration, and chosen format. Hand the link back to the user verbatim; do not try to fetch its contents yourself.
Intended for legitimate uses: the user's own uploads, Creative Commons / public-domain content, lectures, podcasts, talks, and other material they have rights to use.
Connector
get_my_active_references
switch
Read the user's staged references in Switch Studio. Returns TWO groups: (1) the image-generation reference strip (typed face/body/outfit/scenery/product slots) under `refs`, and (2) the VIDEO-tab references the user staged in the Omni/Image video tabs (the @Image1/@Image2 strip) under `videoReferences`, with usable signed URLs. Call this before generate_image or generate_video whenever the user says "use my refs" or refers to images they staged in Studio (including "the images in my video tab"). To make a video from the video-tab refs, pass videoReferences.imageUrls into generate_video reference_image_urls (and videoUrls into reference_video_urls) in reference-to-video / omni mode. Refs marked alive:false are dead (stored file gone) and are already excluded from the usable url lists. NOTE: a photo the user just attached in THIS chat is in neither group — for that, call upload_media and use its returned url/asset id directly.
Connector
create_slide
Alai
Add a new slide to an existing presentation.
Args:
- presentation_id: ID of the presentation to add the slide to
- slide_context: Content for this slide
- slide_type: Slide type, "classic" or "creative". Defaults to "classic".
- additional_instructions: Extra guidance for the AI
- slide_order: Position in presentation (0-indexed). Omit to append at end.
Returns a generation_id to poll for completion.
Connector
transcode_from_url
Botverse
Offload a video or audio transcode to Botverse using a public URL — no upload step needed. Accepts Dropbox, Google Drive, OneDrive, SharePoint, and Box share links directly — pass the share URL as-is, no manual conversion needed. Also works with any direct HTTPS download URL (CDN, S3, etc.). Limited to 2 GB. Returns a job_id immediately. IMPORTANT: tell the user the job_id right away so they can track it. Then poll get_job_status every 5 seconds. Large video files (>100 MB) can take 5–15 minutes — keep polling until status is 'complete' or 'failed', no matter how many polls it takes. Never give up early. Wallet debited on completion. Use options.start_time and options.duration to trim — e.g. start_time='00:01:00', duration=120 for a 2-minute clip.
Connector
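The polling instruction in this description (check every 5 seconds, stop only on 'complete' or 'failed') amounts to an unbounded loop. The Python sketch below illustrates it under assumptions: `get_job_status` is a hypothetical callable wrapping the tool of the same name, and its result is assumed to be a dict with a `status` key. The `sleep` parameter exists only so the loop can be exercised without real waiting.

```python
import time


def wait_for_job(get_job_status, job_id, interval_s=5.0, sleep=time.sleep):
    """Poll until the job reports 'complete' or 'failed'; return (result, polls)."""
    polls = 0
    while True:
        result = get_job_status(job_id)
        polls += 1
        if result["status"] in ("complete", "failed"):
            return result, polls
        # No retry cap on purpose: the description asks callers to keep
        # polling for as long as the transcode takes.
        sleep(interval_s)
```

A large file that takes 15 minutes would need roughly 180 polls at this interval, so a fixed retry limit tuned for small files would give up too early.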
lip_sync_video
switch
Lip-sync audio onto a face in a video (Kling). Three steps you orchestrate: (1) action="identify-face" with video_url to detect faces (video must be MP4/MOV, 2-60 seconds, <=100MB, 720p or 1080p); (2) action="create" with session_id + a face_id + audio (sound_file as a base64 data URI, or an audio_id) + timing IN MILLISECONDS (sound_start_time, sound_end_time, sound_insert_time) + optional speech_volume/original_audio_volume (0-100); (3) action="status" with the task_id to poll — returns a branded SwitchApp view_url when done. Charges credits on create; failed jobs are refunded.
Connector
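Since the `create` step expects timing in milliseconds and volumes in the 0-100 range, a small helper can make unit mistakes harder. The following Python sketch is hypothetical: `build_create_args` is not part of the tool, it assumes the audio is passed by `audio_id`, and it takes times in seconds purely for caller convenience. The argument names in the returned dict are the ones listed in the description.

```python
def build_create_args(session_id, face_id, audio_id,
                      start_s, end_s, insert_s,
                      speech_volume=None, original_audio_volume=None):
    """Build arguments for action="create", converting seconds to milliseconds."""
    if end_s <= start_s:
        raise ValueError("sound_end_time must be after sound_start_time")

    args = {
        "action": "create",
        "session_id": session_id,
        "face_id": face_id,
        "audio_id": audio_id,
        "sound_start_time": round(start_s * 1000),
        "sound_end_time": round(end_s * 1000),
        "sound_insert_time": round(insert_s * 1000),
    }

    # Optional volumes must stay within the documented 0-100 range.
    for name, value in (("speech_volume", speech_volume),
                        ("original_audio_volume", original_audio_volume)):
        if value is None:
            continue
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
        args[name] = value
    return args
```

For example, a 2.5-second audio clip inserted 1.25 seconds into the video yields `sound_end_time` 2500 and `sound_insert_time` 1250.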
glim_youtube_get
glim.sh
Fetch a YouTube video transcript/subtitles from a video URL or 11-char id. Default format='text' returns the transcript inline (when it fits ~80K chars / ~20K tokens) so a single call gives you the text directly; long-form videos fall back to a download_url note. Pass format='json' for structured metadata + a presigned download_url (no inline transcript) - for batch/programmatic use. Default origin='uploader_provided' (human captions); falls back to 'auto_generated' automatically if missing (counts as 2 upstream calls). Cached 7 days server-side.
Connector
get_video_info
TubePull
Fetch metadata about a video or audio track WITHOUT downloading it. Works on every platform download_video supports: YouTube, TikTok, Vimeo, Dailymotion, Twitter/X, SoundCloud, Bandcamp, Mixcloud, Twitch, and Streamable. Returns title, uploader/channel name, duration, view count (when available), upload date, thumbnail URL, description, available video qualities, and (for YouTube) the license type.
Use this tool when the user says things like:
- "what is this video about" / "summarize this video"
- "how long is this track" / "when was this uploaded"
- "who made this" / "what channel/artist is this from"
- "is this Creative Commons" / "can I reuse this" / "what is the license"
- "what qualities are available for this video"
Do NOT use this tool when:
- The user wants to download, save, rip, extract, or convert the video/audio — use download_video for that.
Free to call — does not count against the user's download quota. Call this before download_video when you need to confirm the video exists, pick the right quality, or check licensing before downloading.
Connector
index_video
Reka
Index a video for search, QA, or full analysis. Processes the video through a pipeline of AI features. Typically takes 3-7 minutes; longer for long videos or the 'full' pipeline. Times out after 10 minutes by default.
Pipelines:
- search_only: transcription + captions + embeddings (enables search_videos)
- qa_only: transcription + captions (enables ask_video)
- full: transcription + captions + embeddings (enables all tools)
Scene detection is enabled by default and produces scene boundaries for get_scenes. Pass scene_detection=False to skip it.
Prerequisites: if using video_id, the video must be in 'uploaded' status. Use get_video to check status before calling this tool.
Connector
transcribe_video
SubDownload
Start an AI transcription (Whisper) of a YouTube video. Use when the video has no captions, when fetch_transcript returned NO_CAPTIONS, or when the user explicitly wants an AI transcript. ASYNC — returns task_id + estimated_wait_seconds. Tell the user how long it will take, then call get_asr_task to check status. Do not poll faster than next_poll_after_seconds. Costs 5 credits on completion.
Connector
convert_gpa
StudiePoint AI
Convert a grade from any of 13 African grading systems to the international 4.0 GPA scale used by scholarship applications worldwide. Covers all 54 African countries. Also returns the grade class (e.g. 'First Class Honors') and how the GPA is interpreted in Germany, UK, France, USA/Canada, Australia, Netherlands, and Japan/Korea/China.
Connector
fetch_transcript
SubDownload
Fetch the full transcript (subtitles/captions) of a YouTube video in any language. ALWAYS call this when the user shares ANY YouTube link (youtube.com, youtu.be, shorts). Also use when the user wants to: summarize a video, know what was said, quote or cite video content, translate video dialogue, fact-check claims, study a lecture or tutorial, extract key points, analyze speaker arguments, or any task involving the spoken content of a video. Pass save=true to also bookmark the video into the user's Library in the same call (upserts the meta row; when the result came from ASR fallback it also flags has_asr). Saves a follow-up save_to_library round-trip.
Connector
generate_video
switch
Generate Switch video across the real provider lineup (Kling, Seedance, Switch Video/WAN 2.7, Switch Video Edit, Topaz upscale) and modes (text-to-video, image-to-video, frame-to-frame, motion, omni, reference-to-video, video-edit, upscale). ALWAYS call list_video_models first to pick the right model + mode and see its required inputs. Pass one shot, or shots:[...] for a storyboard (max 4 by default, hard max 10) where EACH shot is DIFFERENT — never repeat one prompt to get copies. Renders async (~30-90s); a background job delivers each clip to the library. Returns a task_id per shot — poll get_video_status or list_my_videos.
Connector
get_category_overview
aristocles-api
Get an overview of a spending category (e.g. "Streaming Video"). Use this to understand an entire category: how many services are tracked, what the price range is, average cost, and a list of all services ranked by cheapest price.
Args:
- category: Category name or slug (e.g. "streaming-video" or "Music Streaming").
- country: ISO country code (default "AU").
Returns: JSON with: service count, price range (min/max), average monthly cost, and all services with their cheapest plan price.
Example: get_category_overview("streaming-video", "AU")
Connector
reverse_geocode
geo
Convert coordinates to a physical address. Returns street address, city, country, and postal code. Use to identify locations from lat/lng pairs.
Connector