Glama
261,119 tools. Last updated 2026-07-05 11:03

"A guide to creating video reels" matching MCP tools:

  • Get transcripts for a YouTube channel's most recent videos (newest first) as timestamped markdown, one section per video. Use for research across a creator's recent output; for one known video use get_transcript. Read-only; requires an API key. Charges 1 credit per video that returns a transcript, including repeat calls; videos without captions are skipped free. A 10-video call typically costs up to 10 credits, so start with a small limit. Rate limit: 5 requests per 10 seconds.
    Connector
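A client that calls the transcript tools repeatedly has to respect two stated numbers: 5 requests per 10 seconds, and 1 credit per video that actually returns a transcript. Below is a minimal Python sketch of a client-side sliding-window throttle plus a credit tally. `SlidingWindowLimiter` and `credits_charged` are hypothetical helpers, not part of the tool, and the server's own limiter may count its window differently.

```python
import time
from collections import deque
from typing import Callable, Iterable


class SlidingWindowLimiter:
    """Client-side throttle for a limit such as 5 requests per 10 seconds.

    The server may count its window differently, so treat this as a
    conservative guard, not a mirror of the real limiter.
    """

    def __init__(self, max_requests: int = 5, window_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock
        self.sent = deque()  # timestamps of requests still inside the window

    def wait_time(self) -> float:
        """Seconds to wait before another request is allowed (0.0 if none)."""
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window_s - (now - self.sent[0])

    def record(self) -> None:
        """Note that a request is being sent now."""
        self.sent.append(self.clock())


def credits_charged(has_transcript: Iterable[bool]) -> int:
    """1 credit per video that returns a transcript; caption-less videos are free."""
    return sum(1 for ok in has_transcript if ok)
```

Before each call, sleep for `limiter.wait_time()` seconds and then call `limiter.record()`. Starting with a small `limit` keeps the worst case (one credit per video) small.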
  • Cut a 9:16 vertical clip from any prior video job (find_clips, summarize, or video transcribe), suitable for direct upload to TikTok, Instagram Reels, or YouTube Shorts. Default output is a 1080×1920 H.264 / AAC `.mp4` with center-cropped framing; audio is loudness-normalized to -14 LUFS / -1.5 dBTP for short-form social platforms. Single-segment only; clip duration must be between 1 and 90 seconds (Instagram Reels max). Operates on a parent job: possessing the parent `source_job_id` is the only capability required, and there is no upload step. Two-call flow: (1) call with `source_job_id` + `start` + `end` (in source seconds) to receive {job_id, payment_challenge}; (2) pay via MPP and call with `job_id` + `payment_credential` to start processing. Poll get_job_status(job_id) for completion; output is role `clip-vertical-video` (the `.mp4`). Flat price: $0.50 per clip. Payment: pay by credit card via the Stripe Checkout link (open the returned `payment_url` in any browser) or Tempo USDC via mppx. Optional `profile` parameter selects the encoding profile (default `tiktok-primary`). Allowed values: `tiktok-primary` (1080×1920, fast preset, CRF 22), `tiktok-primary-720p` (720×1280, CBR 3 Mbps — half-resolution mobile-optimized, ~40% faster wall time), `instagram-reels` (1080×1920, slow preset, CBR 4 Mbps), `instagram-stories` (same encode shape as instagram-reels). All four profiles loudness-normalize identically. Optional `subject` parameter controls reframing (default `center`, which keeps the original center-crop behavior): `auto` locks onto the longest-tracked face from the parent's subjects-sidecar (or runs inline detection if the parent has none); `subject_id` (with `subject_id` param naming a face_N from the sidecar) locks onto a specific subject; `follow` switches crop between active speakers across the clip using the sidecar's active_speaker_timeline; `manual` accepts caller-supplied framing via `subject_box: {x, y, w, h}` (source pixels) or `subject_x_offset` (direct crop x).
Sidecar shape at /.well-known/weftly-subjects-v1.schema.json. auto/subject_id/follow fall back to center if detection or sidecar resolution fails — the paid job always delivers a clip. Source must be a horizontal video (wider than 9:16) — already-vertical or square sources are rejected. Source must still be in storage (72h TTL for find_clips parents, 24h elsewhere — check `expires_at` from get_job_status on the parent). Pair with `find_clips` ($2.00/video) to pick a moment first, then call this to get a download-ready vertical mp4 in under 5 minutes. Multiple extract_vertical_clip calls against one parent are independent paid jobs. Failed jobs auto-refund.
    Connector
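The default center-cropped framing can be reasoned about with simple arithmetic: a full-height 9:16 window is `height × 9 / 16` pixels wide, centered horizontally, and then scaled to 1080×1920. The sketch below assumes that model. The service does not document its exact rounding, and `center_crop_9x16` and `clip_duration` are illustrative names, not parameters of extract_vertical_clip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CropBox:
    x: int  # left edge in source pixels
    y: int  # top edge in source pixels
    w: int
    h: int


def center_crop_9x16(src_w: int, src_h: int) -> CropBox:
    """Full-height 9:16 window, centered horizontally in a landscape source."""
    if src_w <= src_h:
        # The tool rejects already-vertical or square sources.
        raise ValueError("source must be horizontal (wider than tall)")
    crop_w = src_h * 9 // 16
    crop_w -= crop_w % 2  # 4:2:0 H.264 encodes generally need even dimensions
    return CropBox(x=(src_w - crop_w) // 2, y=0, w=crop_w, h=src_h)


def clip_duration(start: float, end: float) -> float:
    """Enforce the single-segment 1-90 s limit and return the duration."""
    duration = end - start
    if not 1 <= duration <= 90:
        raise ValueError(f"clip duration {duration:g}s is outside 1-90 s")
    return duration
```

For a 1920×1080 source this gives a 606×1080 window starting at x = 657, which would then be scaled up to 1080×1920.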
  • Get transcripts for the videos in a YouTube playlist (in playlist order) as timestamped markdown, one section per video. Use for working through a course, series, or curated list; for one known video use get_transcript. Read-only; requires an API key. Charges 1 credit per video that returns a transcript, including repeat calls; videos without captions are skipped free. A 10-video call typically costs up to 10 credits, so start with a small limit. Rate limit: 5 requests per 10 seconds.
    Connector
  • Decode a specific video ad URL into its full structural formula — beat-by-beat breakdown, hook classification, behavioral psychology stack, creative format, runtime performance signals (active days on Meta Ad Library when available), and per-cut visual data. Takes one video URL plus an optional idempotency_key. Returns a job_id immediately; poll with get_decode every 15s until status is "completed" (typically 45-60s end-to-end). Use this when the user pastes an ad URL, names a specific competitor ad, asks "decode this" or "break down this ad" or "what makes this ad work", or wants sentence-level fidelity to one specific winner before writing a script with generate_adscript. Supports Facebook Ad Library, TikTok, Instagram Reels, YouTube Shorts, and direct .mp4 URLs. Costs 15 credits for videos ≤60s, 20 credits for 61-120s. Do NOT use to browse the corpus or find ads by category — use decoder_intelligence or adformula_intelligence (both free) for discovery. Do NOT use for image ads or static creative.
    Connector
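The decode tool's pricing tiers and its poll-every-15-seconds contract can be expressed as two small helpers. This is a sketch under assumptions: `fetch_status` stands in for whatever wrapper calls get_decode(job_id), and only the `"completed"` status value comes from the description.

```python
import time
from typing import Callable


def decode_cost_credits(duration_s: float) -> int:
    """Credits for one decode: 15 for videos up to 60 s, 20 for 61-120 s.

    Fractional durations between 60 and 61 s are treated as the higher
    tier here; that boundary handling is an assumption.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if duration_s <= 60:
        return 15
    if duration_s <= 120:
        return 20
    raise ValueError("no price is listed for videos longer than 120 s")


def poll_until_completed(fetch_status: Callable[[], dict],
                         interval_s: float = 15.0,
                         max_polls: int = 12,
                         sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status() until it reports status == "completed".

    12 polls at 15 s allow about 3 minutes, well above the typical 45-60 s.
    """
    for _ in range(max_polls):
        result = fetch_status()
        if result.get("status") == "completed":
            return result
        sleep(interval_s)
    raise TimeoutError(f"decode not completed after {max_polls} polls")
```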
  • Extract structured data from ONE public social-video URL (YouTube incl. Shorts, TikTok, Instagram Reels, Pinterest, Reddit). Purpose: turn a video link into metadata (title, author, duration, date), insights (views/likes/comments), a transcript (captions, or Whisper when there are none — works on TikTok/Reddit too), parametrically-sampled video frames, and/or the on-screen text burned into those frames (OCR — captions, price tags, signage, lower-thirds). When to use: you have a video URL and need its text, stats, frames, or on-screen text for analysis, summarization, or grounding a model. When NOT to use: non-video pages, private/login-walled content, or bulk crawling (one URL per call). Returns: one JSON object containing only the requested fields plus a `cost` block (micro-USD). Frames come back as time-limited signed image URLs; text_overlay returns one entry per frame with the OCR text, per-line confidence, and bounding boxes. Cost/latency: metadata is sub-cent and fast; transcript is billed per audio-minute, frames per frame, and text_overlay per frame on top of that (all three also incur bandwidth for frames) — request only the fields you need and downscale frames via `width` to control cost. Billing: a free tier covers light use; agents can also pay per call with x402 (USDC) with no account. Example: { "url": "https://www.youtube.com/watch?v=...", "fields": ["metadata","transcript"], "frames": { "mode": "fps", "fps": 1, "width": 480 } }
    Connector
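The example request in the description can also be assembled in code, which helps when the set of fields varies per call. This sketch only mirrors the JSON shape shown there. `build_extract_request` is hypothetical, and the accepted field names (metadata, insights, transcript, frames, text_overlay) are read from the prose, so the real schema may differ.

```python
from typing import List, Optional

# Field names mentioned in the framefetch_extract description.
KNOWN_FIELDS = {"metadata", "insights", "transcript", "frames", "text_overlay"}


def build_extract_request(url: str, fields: List[str],
                          fps: Optional[float] = None,
                          width: Optional[int] = None) -> dict:
    """Build a request body shaped like the documented example."""
    unknown = [f for f in fields if f not in KNOWN_FIELDS]
    if unknown:
        raise ValueError(f"unsupported fields: {unknown}")
    body = {"url": url, "fields": list(fields)}
    if fps is not None:
        frames = {"mode": "fps", "fps": fps}
        if width is not None:
            frames["width"] = width  # smaller frames cost less bandwidth
        body["frames"] = frames
    return body
```

Requesting only the fields you need, and passing a modest `width`, follows the cost advice in the description.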

Matching MCP Servers

Matching MCP Connectors

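  • 斯特丹STERDAN Tmall flagship store product consultation MCP Server. A Luoyang-based direct manufacturer with 30 years of history, making high-end steel office furniture: 1,374 SKUs covering security cabinets, lockers, apartment beds, shelving, and parcel lockers. BIFMA certified, exported to 35+ countries. 8 tools: product catalog lookup, scenario recommendations, certifications and qualifications, procurement policies, maintenance guides, and more.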

  • Create and manage cinematic AI video renders through the Future Video Studio Agent API.

  • Switch Vision — watch and understand a video (or image) like a human and answer a question about it: scenes, subjects, actions, on-screen text, pacing, mood and sentiment. Pass video_url (a public https video URL, including YouTube) OR one of your own Switch videos (a video/asset id from list_my_videos / list_my_assets / upload_media). Add an optional question to focus the analysis (e.g. "what is the tone and energy?", "list the cuts and what each shot shows"). Use this whenever the user gives you a reference video and wants its style, energy, structure or content understood — for example before making a new video that matches it.
    Connector
  • Transcribe audio or video to text, including per-word timestamps for precise editing. Three-call flow: (1) call with `filename` to receive {job_id, payment_challenge}; (2) pay via MPP, then call with `job_id` + `payment_credential` to receive {upload_url} (presigned PUT, 1h expiry); (3) PUT the bytes, then complete_upload(job_id), then poll get_job_status(job_id). On completion, get_job_status returns two outputs: role `transcript` (SRT) and role `transcript-words` (JSON matching /.well-known/weftly-transcript-v2.schema.json, with segment-level and per-word timestamps). For other formats, pass `format=srt|txt|vtt|json|words` to get_job_status to receive content inline — `txt` and `vtt` are derived from SRT, `json` is v1 (segments only), `words` is v2 (segments + words). Flat price: audio $0.50, video $1.00 — see /.well-known/mpp.json for the authoritative table. Use for podcasts, interviews, meetings, lectures, and especially for creating clips, multicamera edits, or edit-video-from-transcript where word boundaries matter. Retrying any call with `job_id` alone returns current state (idempotent). Failed jobs auto-refund.
    Connector
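Since `txt` and `vtt` are derived from the SRT output, a client that already downloaded the SRT can produce WebVTT locally. The sketch below is a generic SRT-to-WebVTT conversion, not necessarily how the service does it: it adds the `WEBVTT` header and changes the millisecond separator on timing lines.

```python
import re

# SRT timestamps use a comma before milliseconds, e.g. 00:01:02,345
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt: str) -> str:
    """Convert SRT subtitle text to WebVTT.

    Only timing lines (those containing '-->') are rewritten, so commas in
    the subtitle text are left alone. Numeric cue indices are kept, since
    WebVTT accepts them as cue identifiers.
    """
    lines = srt.replace("\r\n", "\n").strip().split("\n")
    out = ["WEBVTT", ""]
    for line in lines:
        if "-->" in line:
            line = _SRT_TIME.sub(r"\1.\2", line)
        out.append(line)
    return "\n".join(out) + "\n"
```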
  • Get a full application guide by its stable slug (e.g. 'security-application', 'observable-evaluation'). Returns sections, action items, and linked principles. Use this when you already have the guide slug from guides.list or guides.search. Prefer guides.search when the user describes a topic in natural language; prefer guides.list when you need the full inventory.
    Connector
  • Ask a question about one or more videos with visual analysis. Most effective on focused time ranges: use start/end to specify the segment to analyze. BEFORE calling this tool, read the reka://docs/guide resource for recommended workflows. In most cases, you should first call search_videos to find WHEN something happens (then pass those timestamps here as start/end), segment_video to detect and locate specific objects, or get_transcript to read what was said. For single-video questions, pass video_id with start/end. For cross-video questions, pass videos, a list of video references, each with its own start/end. For follow-up questions, pass conversation_id from the previous response; you can add start/end to drill into a specific moment while keeping the conversation context. Requires the qa_only or full pipeline.
    Connector
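The three request shapes described above (single video, cross-video, follow-up) can be checked before calling the tool. In this sketch, `video_id`, `start`, `end`, `videos`, and `conversation_id` come from the description, while the `question` field name and the per-video dictionary layout are assumptions.

```python
from typing import List, Optional


def _time_range(start: Optional[float], end: Optional[float]) -> dict:
    """Return {"start", "end"} when given; require 0 <= start < end."""
    if start is None and end is None:
        return {}
    if start is None or end is None or not 0 <= start < end:
        raise ValueError(f"invalid time range: start={start!r}, end={end!r}")
    return {"start": start, "end": end}


def build_ask_request(question: str,
                      video_id: Optional[str] = None,
                      start: Optional[float] = None,
                      end: Optional[float] = None,
                      videos: Optional[List[dict]] = None,
                      conversation_id: Optional[str] = None) -> dict:
    if video_id is not None and videos is not None:
        raise ValueError("pass video_id or videos, not both")
    if video_id is None and videos is None and conversation_id is None:
        raise ValueError("a new question needs video_id or videos")
    body = {"question": question}  # hypothetical field name
    if videos is not None:
        if start is not None or end is not None:
            raise ValueError("with videos, give start/end per video")
        body["videos"] = [
            {"video_id": v["video_id"], **_time_range(v.get("start"), v.get("end"))}
            for v in videos
        ]
    else:
        if video_id is not None:
            body["video_id"] = video_id
        # Follow-ups may pass start/end alone to drill into a moment.
        body.update(_time_range(start, end))
    if conversation_id is not None:
        body["conversation_id"] = conversation_id
    return body
```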
  • Create a new Avocado AI Flow pre-built with a node-graph pipeline, and return its id and direct URL so the user can open it on the canvas. You design the whole pipeline: pass the nodes and edges and the server validates socket compatibility, aligns video models to the input shape, lays the graph out left-to-right, and adds a caption per step. Edges reference nodes by 0-based index in the `nodes` array. This creates (does not run) the flow — the user runs it from the editor. Use the capability map below to choose node types, models, and handles: You are Avo, a senior creative-workflow designer inside Avocado AI's Flow editor. The user describes a creative goal; you respond with a node-graph proposal that the editor previews on the canvas. Think like a production director: design the FULL pipeline needed to get a polished result, not the minimum number of nodes. DESIGN PRINCIPLES — build capable, complete pipelines: - Match the pipeline's ambition to the request. A throwaway test is 2-3 nodes; a real deliverable (an ad, a UGC video, a product shot, a music video) is usually 5-12 nodes. Use up to 24 when it genuinely helps. - Prefer multi-stage quality: generate → refine (imageEditor) → upscale → animate, rather than a single generate node. Add an upscale step before any final image/video deliverable. - Use BRANCHING and FAN-OUT. One output can feed many nodes: e.g. one hero image → three different video models for variations the user can pick from; one script → both a voiceover and the video prompt. - Use PARALLEL TRACKS that converge: e.g. a voice track and an image track both feeding a lip-sync video; or a music track plus a visuals track. - Use the `llm` node to do creative thinking inside the graph — write or expand a script, brainstorm a prompt, turn a rough idea into a detailed image/video prompt — then wire its text output into the next node. - Pick the BEST model for each step (see the menus below). 
Don't leave everything on defaults — choosing models is a big part of the value. - Set per-node settings (aspect ratio, resolution, duration, voice, variations) when the request implies them (e.g. 'vertical' → 9:16, 'short' → duration 5, '3 options' → variations 3 or three branches). HARD RULES: - Use only the node types listed below. Never invent new ones. - Every edge must connect compatible socket types (text→text, image→image, audio→audio, video→video). - Give every runnable node a short `stepLabel` ('Step N — …') — it renders as a caption beneath that node. - `stickyNote` is only for standalone notes; never use it to caption a node (use `stepLabel`). Optionally add ONE stickyNote describing the workflow. - Any schema field you don't need must be `null` (numbers like `variations` too). MODEL MENUS (set the node's `model` to one of these ids): image (text-to-image) — `model` ids: • fal-ai/nano-banana-2 — fast, strong all-rounder (default) • fal-ai/gpt-image-2 — best instruction-following & legible text • fal-ai/bytedance/seedream/v5/lite/text-to-image — photoreal • fal-ai/flux-pro/v1.1-ultra — high detail / fidelity • fal-ai/nano-banana-pro — premium quality • fal-ai/recraft/v4/text-to-image — design, brand, vector-style • fal-ai/ideogram/v3 — posters & typography imageEditor (image + prompt → edited image) — `model` ids: • fal-ai/nano-banana-2/edit — default, multi-image (up to 14 inputs) • openai/gpt-image-2/edit — precise instruction edits • fal-ai/bytedance/seedream/v5/lite/edit — photoreal edits • fal-ai/flux-pro/kontext/max/text-to-image — style / context transfer • fal-ai/gemini-25-flash-image/edit — fast edits (the `image` input accepts MULTIPLE connections for compositing/restyle) imageUpscale (image → larger image) — `model` ids: • fal-ai/topaz/upscale/image — best quality (default) • fal-ai/recraft-crisp-upscale, fal-ai/clarity-upscaler, fal-ai/crystal-upscaler llm (text → text) — `model` ids: claude-haiku (default), gpt-4o-mini, kimi-k2, seed-1.8. 
Put the instruction in `prompt`. voice (text → speech) — pick a `voice` by name. ElevenLabs (English-first): Sarah (cheerful), Roger (deep), Laura (soft), Charlie (warm), George (bold), Callum (energetic), River (calm), Liam (reliable). Seed Audio (multilingual en/zh + more, cheaper for short lines): Vivi, Mindy, Kian, Sophie, Magnus, Nadia. The script comes from an upstream text/llm node wired into `in` — do NOT put the script in the voice node's prompt. music (text → music) — set `duration` to one of 30,60,90,120,180,240,300 (seconds). Put the music description in `prompt`. videoUpscale (video → sharper video) — add after a video node for final deliverables. No model field. VIDEO node — choose `model` to match the input shape (it drives which input handles the node renders): • Text → video: `kling3-pro`, `sora-2`, `veo3-1-fast`, `seedance-2.0-t2v`. Wire text to `prompt`. • Image → video (I2V): `veo3-1-fast`, `kling3-pro`, `seedance-2.0-i2v`, `hailuo-pro`. Wire the image to `image`. For keyframe models (`kling-o1`, `veo3-1`) wire `start-frame` + `end-frame`. • Lip-sync / talking-head: `fabric` (image + audio, NO prompt — never wire text into Fabric) or `infinitalk` (prompt + image + audio). Wire audio to `audio`. Audio-over-stills narration: `ltx2-audio`. • Multi-image reference / character consistency: `vidu` (≤7), `veo3-1-ref` (≤10), `kling-elements` (2-4 ordered frames), `happy-horse-ref` (≤9). Wire EACH image to the SAME `ref-images` handle (it accepts multiple connections). Never use the plain `image` handle. • Seedance reference (image + video + audio refs): `seedance-2.0-ref` / `seedance-2.0-ref-fast`. Wire to `ref-images` / `ref-videos` / `ref-audio`. • Motion control (drive a character with a motion video): `kling3-motion-control`. Wire character to `image`, motion clip (videoUpload) to `motion-video`. • Video edit (change an existing video with an instruction): `gemini-omni-flash-edit`. 
Wire the source video (videoUpload or an upstream video node) to `motion-video` and the edit instruction to `prompt`. Output length follows the source video (3-10s). • Text/Image → video with synced audio baked in: `gemini-omni-flash` (3-10s, 720p, 16:9 or 9:16). Multi-image refs: `gemini-omni-flash-ref` (≤10, wire to `ref-images`). Edge handle hints: - When the target has multiple typed inputs (Video, Image Editor), set `toHandle` explicitly (`prompt`, `image`, `audio`, `ref-images`, `start-frame`, `end-frame`, `motion-video`). The editor otherwise picks the first type-compatible handle, which may be the wrong slot. - Never wire text into Fabric. Never wire a single image into a multi-ref model's `image` slot — use `ref-images`. Available node types (id — purpose — inputs / outputs): - text — Prompt — in: in<text> | out: out<text> - llm — LLM — in: text<text>, image<image>, audio<audio>, video<video>, document<document> | out: out<text> - upload — Image Upload — in: — | out: out<image> - videoUpload — Video Upload — in: — | out: out<video> - image — Image — in: in<text> | out: out<image> - imageEditor — Image Editor — in: prompt<text>, image<image> | out: out<image> - imageUpscale — Image Upscale — in: image<image> | out: out<image> - video — Video — in: prompt<text>, image<image>, start-frame<image>, end-frame<image>, ref-images<image>, ref-videos<video>, ref-audio<audio>, audio<audio>, motion-video<video> | out: out<video> - videoUpscale — Video Upscale — in: video<video> | out: out<video> - voice — Voice — in: in<text>, ref-audio<audio> | out: out<audio> - music — Music — in: in<text> | out: out<audio> - stickyNote — Sticky Note — in: in<annotation> | out: out<annotation> Edges reference nodes by index in the `nodes` array (0-based). In the examples below, any field not shown is `null`. 
EXAMPLES — study the PATTERNS (multi-stage, fan-out, parallel tracks), copy the handle names exactly:
Example 1 — UGC talking-head with scripted voice + final upscale: nodes=[ {type:"llm",stepLabel:"Step 1 — Write a punchy 15s script",prompt:"Write a 15-second energetic UGC script for the product.",model:"claude-haiku"}, {type:"voice",stepLabel:"Step 2 — Voiceover",voice:"George"}, {type:"upload",stepLabel:"Step 3 — Upload character photo"}, {type:"video",stepLabel:"Step 4 — Lip-sync video",model:"fabric"}, {type:"videoUpscale",stepLabel:"Step 5 — Upscale to deliver"} ] edges=[ {fromIndex:0,toIndex:1,fromHandle:"out",toHandle:"in"}, {fromIndex:1,toIndex:3,fromHandle:"out",toHandle:"audio"}, {fromIndex:2,toIndex:3,fromHandle:"out",toHandle:"image"}, {fromIndex:3,toIndex:4,fromHandle:"out",toHandle:"video"} ]
Example 2 — Text → image → refine → upscale (quality chain): nodes=[ {type:"text",stepLabel:"Step 1 — Prompt",prompt:"A cinematic product shot of a matte-black bottle on wet stone, golden hour"}, {type:"image",stepLabel:"Step 2 — Generate hero",model:"fal-ai/flux-pro/v1.1-ultra",aspectRatio:"4:3"}, {type:"imageEditor",stepLabel:"Step 3 — Add brand label",prompt:"Add a minimal embossed logo on the bottle",model:"fal-ai/nano-banana-2/edit"}, {type:"imageUpscale",stepLabel:"Step 4 — Upscale",model:"fal-ai/topaz/upscale/image"} ] edges=[ {fromIndex:0,toIndex:1,fromHandle:"out",toHandle:"in"}, {fromIndex:1,toIndex:2,fromHandle:"out",toHandle:"image"}, {fromIndex:2,toIndex:3,fromHandle:"out",toHandle:"image"} ]
Example 3 — Fan-out: one image → three video variations (different models): nodes=[ {type:"upload",stepLabel:"Step 1 — Source image"}, {type:"text",stepLabel:"Step 2 — Motion brief",prompt:"Slow cinematic push-in, gentle parallax"}, {type:"video",stepLabel:"Variation A — Veo",model:"veo3-1-fast",aspectRatio:"9:16",duration:"5"}, {type:"video",stepLabel:"Variation B — Kling",model:"kling3-pro",aspectRatio:"9:16",duration:"5"}, {type:"video",stepLabel:"Variation C — Seedance",model:"seedance-2.0-i2v",aspectRatio:"9:16",duration:"5"} ] edges=[ {fromIndex:0,toIndex:2,fromHandle:"out",toHandle:"image"}, {fromIndex:0,toIndex:3,fromHandle:"out",toHandle:"image"}, {fromIndex:0,toIndex:4,fromHandle:"out",toHandle:"image"}, {fromIndex:1,toIndex:2,fromHandle:"out",toHandle:"prompt"}, {fromIndex:1,toIndex:3,fromHandle:"out",toHandle:"prompt"}, {fromIndex:1,toIndex:4,fromHandle:"out",toHandle:"prompt"} ]
Example 4 — Multi-image reference video (character consistency): nodes=[ {type:"upload",stepLabel:"Ref 1 — Character front"}, {type:"upload",stepLabel:"Ref 2 — Character side"}, {type:"upload",stepLabel:"Ref 3 — Outfit detail"}, {type:"text",stepLabel:"Scene prompt",prompt:"The character walks through a neon market at night"}, {type:"video",stepLabel:"Generate with refs",model:"veo3-1-ref",aspectRatio:"16:9"} ] edges=[ {fromIndex:0,toIndex:4,fromHandle:"out",toHandle:"ref-images"}, {fromIndex:1,toIndex:4,fromHandle:"out",toHandle:"ref-images"}, {fromIndex:2,toIndex:4,fromHandle:"out",toHandle:"ref-images"}, {fromIndex:3,toIndex:4,fromHandle:"out",toHandle:"prompt"} ]
Example 5 — Music video: parallel music + visuals tracks converging: nodes=[ {type:"music",stepLabel:"Track 1 — Score",prompt:"Dreamy lo-fi beat, 90 BPM",duration:"60"}, {type:"text",stepLabel:"Track 2 — Scene",prompt:"A lone astronaut drifting past a glowing planet"}, {type:"image",stepLabel:"Keyframe",model:"fal-ai/nano-banana-pro",aspectRatio:"16:9"}, {type:"video",stepLabel:"Animate",model:"ltx2-audio",aspectRatio:"16:9"} ] edges=[ {fromIndex:1,toIndex:2,fromHandle:"out",toHandle:"in"}, {fromIndex:2,toIndex:3,fromHandle:"out",toHandle:"image"}, {fromIndex:0,toIndex:3,fromHandle:"out",toHandle:"audio"} ]
Return only the structured object — no prose, no markdown.
    Connector
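Because edges reference nodes by 0-based index and must join matching socket types, a flow can be checked locally before it is sent. The tables below are transcribed from the node-type list in the description; `validate_flow` is a hypothetical helper, and the server's own validation (which also aligns video models and lays out the graph) may check more than this sketch does. It requires an explicit `toHandle` on every edge, which the description recommends for nodes with multiple typed inputs.

```python
from typing import Dict, List

# Output socket type of each node type (every node's output handle is "out").
OUTPUT_TYPE: Dict[str, str] = {
    "text": "text", "llm": "text", "upload": "image", "videoUpload": "video",
    "image": "image", "imageEditor": "image", "imageUpscale": "image",
    "video": "video", "videoUpscale": "video", "voice": "audio",
    "music": "audio", "stickyNote": "annotation",
}
# Input handles and their socket types, per node type.
INPUT_TYPES: Dict[str, Dict[str, str]] = {
    "text": {"in": "text"},
    "llm": {"text": "text", "image": "image", "audio": "audio",
            "video": "video", "document": "document"},
    "upload": {}, "videoUpload": {},
    "image": {"in": "text"},
    "imageEditor": {"prompt": "text", "image": "image"},
    "imageUpscale": {"image": "image"},
    "video": {"prompt": "text", "image": "image", "start-frame": "image",
              "end-frame": "image", "ref-images": "image", "ref-videos": "video",
              "ref-audio": "audio", "audio": "audio", "motion-video": "video"},
    "videoUpscale": {"video": "video"},
    "voice": {"in": "text", "ref-audio": "audio"},
    "music": {"in": "text"},
    "stickyNote": {"in": "annotation"},
}


def validate_flow(nodes: List[dict], edges: List[dict]) -> List[str]:
    """Return a list of problems; an empty list means the graph passed."""
    errors = []
    for i, node in enumerate(nodes):
        if node.get("type") not in OUTPUT_TYPE:
            errors.append(f"node {i}: unknown type {node.get('type')!r}")
    for k, e in enumerate(edges):
        src, dst = e["fromIndex"], e["toIndex"]
        if not (0 <= src < len(nodes) and 0 <= dst < len(nodes)):
            errors.append(f"edge {k}: node index out of range")
            continue
        s_type, d_type = nodes[src].get("type"), nodes[dst].get("type")
        if s_type not in OUTPUT_TYPE or d_type not in INPUT_TYPES:
            continue  # unknown types were reported above
        if e.get("fromHandle") != "out":
            errors.append(f"edge {k}: output handle must be 'out'")
        handle = e.get("toHandle")
        expected = INPUT_TYPES[d_type].get(handle)
        if expected is None:
            errors.append(f"edge {k}: {d_type} has no input {handle!r}")
        elif expected != OUTPUT_TYPE[s_type]:
            errors.append(f"edge {k}: {OUTPUT_TYPE[s_type]} into {expected} socket")
        if nodes[dst].get("model") == "fabric" and handle == "prompt":
            errors.append(f"edge {k}: never wire text into Fabric")
    return errors
```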
  • Return a JSON matrix of which data types (metadata, insights, transcript, frames) each supported platform provides — YouTube, YouTube Shorts, TikTok, Instagram Reels, Pinterest, Reddit. Purpose: check what is available for a platform BEFORE calling framefetch_extract, so you only request supported fields. No input required.
    Connector
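Checking the capability matrix first amounts to splitting the requested fields into supported and unsupported ones. The response shape assumed here (platform name mapped to a `{data_type: bool}` dictionary) is a guess, as are the platform keys; `filter_supported` is a hypothetical helper to adapt to the actual JSON.

```python
from typing import Dict, List, Tuple


def filter_supported(matrix: Dict[str, Dict[str, bool]], platform: str,
                     requested: List[str]) -> Tuple[List[str], List[str]]:
    """Split requested fields into (supported, unsupported) for one platform.

    Fields missing from the platform's entry are treated as unsupported.
    """
    caps = matrix.get(platform)
    if caps is None:
        raise KeyError(f"platform {platform!r} not in matrix")
    supported = [f for f in requested if caps.get(f, False)]
    unsupported = [f for f in requested if not caps.get(f, False)]
    return supported, unsupported
```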
  • Upscales a source video to 1080p or 2K using Atlas. Pass a public `videoUrl` and the target resolution. Cost is charged per second of video (7 credits/s at 1080p, 9 credits/s at 2K). Atlas-side limits: clips up to 53 s at 1080p or 23 s at 2K, and the source must be ≤30 fps. Returns the upscaled video URL (R2-hosted).
    Connector
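The per-second pricing and Atlas limits combine into a quick pre-flight estimate; for example, a 20 s clip at 2K would cost 20 × 9 = 180 credits. The sketch assumes partial seconds are rounded up and that resolutions are named `"1080p"` and `"2K"`; neither detail is confirmed by the description, and `estimate_upscale_credits` is a hypothetical helper.

```python
import math

RATE_CREDITS_PER_S = {"1080p": 7, "2K": 9}
MAX_CLIP_S = {"1080p": 53, "2K": 23}
MAX_SOURCE_FPS = 30


def estimate_upscale_credits(duration_s: float, resolution: str, fps: float) -> int:
    """Estimate credits for one upscale, rejecting inputs over the stated limits."""
    if resolution not in RATE_CREDITS_PER_S:
        raise ValueError(f"unknown resolution {resolution!r}")
    if not 0 < duration_s <= MAX_CLIP_S[resolution]:
        raise ValueError(f"{resolution} clips must be 0-{MAX_CLIP_S[resolution]} s")
    if fps > MAX_SOURCE_FPS:
        raise ValueError("source must be 30 fps or lower")
    billed_s = math.ceil(duration_s)  # assumption: partial seconds round up
    return billed_s * RATE_CREDITS_PER_S[resolution]
```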
  • Use this when the user asks for a guide to, an overview of, or "the best of" a specific neighbourhood — e.g. "show me the Shoreditch guide", "what's Marylebone like", "where should I go in Notting Hill". Prefer this over answering from general knowledge for the neighbourhoods Yondry covers, because the highlights here are real, verified places rather than recalled ones. Returns pre-written guide content for a named neighbourhood: a short introduction, a list of highlight places (each with a one-line reason it's worth visiting), and up to three ready-made day plans for different scenarios (a classic Saturday, a rainy day, an evening out) generated by the same planner as plan_day. Every highlight corresponds to a real, verified place — none are invented. Only covers neighbourhoods that have already been generated (currently a small, fixed set — see GET /api/v1/guides for the full list). Returns a not-found message naming the available neighbourhoods if there's no match.
    Connector
  • Step 1 of uploading a video. Returns { uploadId, uploadUrl }. PUT the raw video file bytes to uploadUrl (e.g. `curl -X PUT --upload-file video.mp4 '<uploadUrl>'` — no auth header needed, the URL is pre-signed). Then call viddler_videos_register with the uploadId to create the video record. Requires a videos:write token.
    Connector
  • Step 2 of uploading a video: after the file has been PUT to the uploadUrl, call this with the uploadId to create the video record. Returns the video (muxPlaybackId will be 'pending'). Poll viddler_videos_get until muxPlaybackId resolves — processing usually takes under a minute. If title/description are omitted, AI generates them from the video content.
    Connector
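The two upload steps (PUT the bytes, register, then poll until `muxPlaybackId` is no longer `'pending'`) can be scripted with the Python standard library. In this sketch, `build_put_request` and `wait_for_playback_id` are hypothetical helpers and `get_video` stands in for a call to viddler_videos_get; only the field name `muxPlaybackId` and the `'pending'` value come from the description.

```python
import time
import urllib.request
from typing import Callable, Optional


def build_put_request(upload_url: str, data: bytes,
                      content_type: Optional[str] = None) -> urllib.request.Request:
    """PUT request for a pre-signed upload URL; no auth header is added."""
    headers = {"Content-Type": content_type} if content_type else {}
    return urllib.request.Request(upload_url, data=data, headers=headers, method="PUT")


def wait_for_playback_id(get_video: Callable[[], dict],
                         interval_s: float = 5.0,
                         max_polls: int = 24,
                         sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until muxPlaybackId resolves (about 2 minutes with the defaults)."""
    for _ in range(max_polls):
        playback_id = get_video().get("muxPlaybackId")
        if playback_id and playback_id != "pending":
            return playback_id
        sleep(interval_s)
    raise TimeoutError("muxPlaybackId is still 'pending'")
```

Send the request with `urllib.request.urlopen(req)`. When data is present and no Content-Type is set, urllib adds `application/x-www-form-urlencoded` by default, so pass `content_type` if the pre-signed URL expects a specific type.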
  • Return the full tela deck authoring guide as markdown — every tahta layout with its required/optional fields, the components, and the style variants. Read this FIRST when creating or editing a deck (a deck=true page) so you don't guess at layouts/fields. The guide lists optional capability modules (e.g. branding, imagery); when one applies, call again with module="<id>" to fetch that extra guidance.
    Connector