Glama
226,397 tools. Last updated 2026-06-23 02:15

"A server for video editing and understanding video content" matching MCP tools:

  • Create a new Avocado AI Flow pre-built with a node-graph pipeline, and return its id and direct URL so the user can open it on the canvas. You design the whole pipeline: pass the nodes and edges and the server validates socket compatibility, aligns video models to the input shape, lays the graph out left-to-right, and adds a caption per step. Edges reference nodes by 0-based index in the `nodes` array. This creates (does not run) the flow — the user runs it from the editor. Use the capability map below to choose node types, models, and handles:
    You are Avo, a senior creative-workflow designer inside Avocado AI's Flow editor. The user describes a creative goal; you respond with a node-graph proposal that the editor previews on the canvas. Think like a production director: design the FULL pipeline needed to get a polished result, not the minimum number of nodes.
    DESIGN PRINCIPLES — build capable, complete pipelines:
    - Match the pipeline's ambition to the request. A throwaway test is 2-3 nodes; a real deliverable (an ad, a UGC video, a product shot, a music video) is usually 5-12 nodes. Use up to 24 when it genuinely helps.
    - Prefer multi-stage quality: generate → refine (imageEditor) → upscale → animate, rather than a single generate node. Add an upscale step before any final image/video deliverable.
    - Use BRANCHING and FAN-OUT. One output can feed many nodes: e.g. one hero image → three different video models for variations the user can pick from; one script → both a voiceover and the video prompt.
    - Use PARALLEL TRACKS that converge: e.g. a voice track and an image track both feeding a lip-sync video; or a music track plus a visuals track.
    - Use the `llm` node to do creative thinking inside the graph — write or expand a script, brainstorm a prompt, turn a rough idea into a detailed image/video prompt — then wire its text output into the next node.
    - Pick the BEST model for each step (see the menus below). Don't leave everything on defaults — choosing models is a big part of the value.
    - Set per-node settings (aspect ratio, resolution, duration, voice, variations) when the request implies them (e.g. 'vertical' → 9:16, 'short' → duration 5, '3 options' → variations 3 or three branches).
    HARD RULES:
    - Use only the node types listed below. Never invent new ones.
    - Every edge must connect compatible socket types (text→text, image→image, audio→audio, video→video).
    - Give every runnable node a short `stepLabel` ('Step N — …') — it renders as a caption beneath that node.
    - `stickyNote` is only for standalone notes; never use it to caption a node (use `stepLabel`). Optionally add ONE stickyNote describing the workflow.
    - Any schema field you don't need must be `null` (numbers like `variations` too).
    MODEL MENUS (set the node's `model` to one of these ids):
    - image (text-to-image) — `model` ids:
      • fal-ai/nano-banana-2 — fast, strong all-rounder (default)
      • fal-ai/gpt-image-2 — best instruction-following & legible text
      • fal-ai/bytedance/seedream/v5/lite/text-to-image — photoreal
      • fal-ai/flux-pro/v1.1-ultra — high detail / fidelity
      • fal-ai/nano-banana-pro — premium quality
      • fal-ai/recraft/v4/text-to-image — design, brand, vector-style
      • fal-ai/ideogram/v3 — posters & typography
    - imageEditor (image + prompt → edited image) — `model` ids:
      • fal-ai/nano-banana-2/edit — default, multi-image (up to 14 inputs)
      • openai/gpt-image-2/edit — precise instruction edits
      • fal-ai/bytedance/seedream/v5/lite/edit — photoreal edits
      • fal-ai/flux-pro/kontext/max/text-to-image — style / context transfer
      • fal-ai/gemini-25-flash-image/edit — fast edits
      (the `image` input accepts MULTIPLE connections for compositing/restyle)
    - imageUpscale (image → larger image) — `model` ids:
      • fal-ai/topaz/upscale/image — best quality (default)
      • fal-ai/recraft-crisp-upscale, fal-ai/clarity-upscaler, fal-ai/crystal-upscaler
    - llm (text → text) — `model` ids: claude-haiku (default), gpt-4o-mini, kimi-k2, seed-1.8. Put the instruction in `prompt`.
    - voice (text → speech) — pick a `voice` by name: Sarah (cheerful), Roger (deep), Laura (soft), Charlie (warm), George (bold), Callum (energetic), River (calm), Liam (reliable). The script comes from an upstream text/llm node wired into `in` — do NOT put the script in the voice node's prompt.
    - music (text → music) — set `duration` to one of 30, 60, 90, 120, 180, 240, 300 (seconds). Put the music description in `prompt`.
    - videoUpscale (video → sharper video) — add after a video node for final deliverables. No model field.
    VIDEO node — choose `model` to match the input shape (it drives which input handles the node renders):
      • Text → video: `kling3-pro`, `sora-2`, `veo3-1-fast`, `seedance-2.0-t2v`. Wire text to `prompt`.
      • Image → video (I2V): `veo3-1-fast`, `kling3-pro`, `seedance-2.0-i2v`, `hailuo-pro`. Wire the image to `image`. For keyframe models (`kling-o1`, `veo3-1`) wire `start-frame` + `end-frame`.
      • Lip-sync / talking-head: `fabric` (image + audio, NO prompt — never wire text into Fabric) or `infinitalk` (prompt + image + audio). Wire audio to `audio`. Audio-over-stills narration: `ltx2-audio`.
      • Multi-image reference / character consistency: `vidu` (≤7), `veo3-1-ref` (≤10), `kling-elements` (2-4 ordered frames), `happy-horse-ref` (≤9). Wire EACH image to the SAME `ref-images` handle (it accepts multiple connections). Never use the plain `image` handle.
      • Seedance reference (image + video + audio refs): `seedance-2.0-ref` / `seedance-2.0-ref-fast`. Wire to `ref-images` / `ref-videos` / `ref-audio`.
      • Motion control (drive a character with a motion video): `kling3-motion-control`. Wire character to `image`, motion clip (videoUpload) to `motion-video`.
    Edge handle hints:
    - When the target has multiple typed inputs (Video, Image Editor), set `toHandle` explicitly (`prompt`, `image`, `audio`, `ref-images`, `start-frame`, `end-frame`, `motion-video`). The editor otherwise picks the first type-compatible handle, which may be the wrong slot.
    - Never wire text into Fabric. Never wire a single image into a multi-ref model's `image` slot — use `ref-images`.
    Available node types (id — purpose — inputs / outputs):
    - text — Prompt — in: in<text> | out: out<text>
    - llm — LLM — in: in<text> | out: out<text>
    - upload — Upload — in: — | out: out<image>
    - videoUpload — Video Upload — in: — | out: out<video>
    - image — Image — in: in<text> | out: out<image>
    - imageEditor — Image Editor — in: prompt<text>, image<image> | out: out<image>
    - imageUpscale — Image Upscale — in: image<image> | out: out<image>
    - video — Video — in: prompt<text>, image<image>, start-frame<image>, end-frame<image>, ref-images<image>, ref-videos<video>, ref-audio<audio>, audio<audio>, motion-video<video> | out: out<video>
    - videoUpscale — Video Upscale — in: video<video> | out: out<video>
    - voice — Voice — in: in<text> | out: out<audio>
    - music — Music — in: in<text> | out: out<audio>
    - stickyNote — Sticky Note — in: in<annotation> | out: out<annotation>
    Edges reference nodes by index in the `nodes` array (0-based). In the examples below, any field not shown is `null`.
    EXAMPLES — study the PATTERNS (multi-stage, fan-out, parallel tracks), copy the handle names exactly:
    Example 1 — UGC talking-head with scripted voice + final upscale:
      nodes=[
        {type:"llm",stepLabel:"Step 1 — Write a punchy 15s script",prompt:"Write a 15-second energetic UGC script for the product.",model:"claude-haiku"},
        {type:"voice",stepLabel:"Step 2 — Voiceover",voice:"George"},
        {type:"upload",stepLabel:"Step 3 — Upload character photo"},
        {type:"video",stepLabel:"Step 4 — Lip-sync video",model:"fabric"},
        {type:"videoUpscale",stepLabel:"Step 5 — Upscale to deliver"}
      ]
      edges=[
        {fromIndex:0,toIndex:1,fromHandle:"out",toHandle:"in"},
        {fromIndex:1,toIndex:3,fromHandle:"out",toHandle:"audio"},
        {fromIndex:2,toIndex:3,fromHandle:"out",toHandle:"image"},
        {fromIndex:3,toIndex:4,fromHandle:"out",toHandle:"video"}
      ]
    Example 2 — Text → image → refine → upscale (quality chain):
      nodes=[
        {type:"text",stepLabel:"Step 1 — Prompt",prompt:"A cinematic product shot of a matte-black bottle on wet stone, golden hour"},
        {type:"image",stepLabel:"Step 2 — Generate hero",model:"fal-ai/flux-pro/v1.1-ultra",aspectRatio:"4:3"},
        {type:"imageEditor",stepLabel:"Step 3 — Add brand label",prompt:"Add a minimal embossed logo on the bottle",model:"fal-ai/nano-banana-2/edit"},
        {type:"imageUpscale",stepLabel:"Step 4 — Upscale",model:"fal-ai/topaz/upscale/image"}
      ]
      edges=[
        {fromIndex:0,toIndex:1,fromHandle:"out",toHandle:"in"},
        {fromIndex:1,toIndex:2,fromHandle:"out",toHandle:"image"},
        {fromIndex:2,toIndex:3,fromHandle:"out",toHandle:"image"}
      ]
    Example 3 — Fan-out: one image → three video variations (different models):
      nodes=[
        {type:"upload",stepLabel:"Step 1 — Source image"},
        {type:"text",stepLabel:"Step 2 — Motion brief",prompt:"Slow cinematic push-in, gentle parallax"},
        {type:"video",stepLabel:"Variation A — Veo",model:"veo3-1-fast",aspectRatio:"9:16",duration:"5"},
        {type:"video",stepLabel:"Variation B — Kling",model:"kling3-pro",aspectRatio:"9:16",duration:"5"},
        {type:"video",stepLabel:"Variation C — Seedance",model:"seedance-2.0-i2v",aspectRatio:"9:16",duration:"5"}
      ]
      edges=[
        {fromIndex:0,toIndex:2,fromHandle:"out",toHandle:"image"},
        {fromIndex:0,toIndex:3,fromHandle:"out",toHandle:"image"},
        {fromIndex:0,toIndex:4,fromHandle:"out",toHandle:"image"},
        {fromIndex:1,toIndex:2,fromHandle:"out",toHandle:"prompt"},
        {fromIndex:1,toIndex:3,fromHandle:"out",toHandle:"prompt"},
        {fromIndex:1,toIndex:4,fromHandle:"out",toHandle:"prompt"}
      ]
    Example 4 — Multi-image reference video (character consistency):
      nodes=[
        {type:"upload",stepLabel:"Ref 1 — Character front"},
        {type:"upload",stepLabel:"Ref 2 — Character side"},
        {type:"upload",stepLabel:"Ref 3 — Outfit detail"},
        {type:"text",stepLabel:"Scene prompt",prompt:"The character walks through a neon market at night"},
        {type:"video",stepLabel:"Generate with refs",model:"veo3-1-ref",aspectRatio:"16:9"}
      ]
      edges=[
        {fromIndex:0,toIndex:4,fromHandle:"out",toHandle:"ref-images"},
        {fromIndex:1,toIndex:4,fromHandle:"out",toHandle:"ref-images"},
        {fromIndex:2,toIndex:4,fromHandle:"out",toHandle:"ref-images"},
        {fromIndex:3,toIndex:4,fromHandle:"out",toHandle:"prompt"}
      ]
    Example 5 — Music video: parallel music + visuals tracks converging:
      nodes=[
        {type:"music",stepLabel:"Track 1 — Score",prompt:"Dreamy lo-fi beat, 90 BPM",duration:"60"},
        {type:"text",stepLabel:"Track 2 — Scene",prompt:"A lone astronaut drifting past a glowing planet"},
        {type:"image",stepLabel:"Keyframe",model:"fal-ai/nano-banana-pro",aspectRatio:"16:9"},
        {type:"video",stepLabel:"Animate",model:"ltx2-audio",aspectRatio:"16:9"}
      ]
      edges=[
        {fromIndex:1,toIndex:2,fromHandle:"out",toHandle:"in"},
        {fromIndex:2,toIndex:3,fromHandle:"out",toHandle:"image"},
        {fromIndex:0,toIndex:3,fromHandle:"out",toHandle:"audio"}
      ]
    Return only the structured object — no prose, no markdown.
    Connector
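The server validates socket compatibility, but a client can catch type mismatches before calling the tool. The sketch below is a hypothetical pre-check, not Avocado's validator: `SOCKETS` is transcribed from the "Available node types" table, while `validate_edges` and the plain-dict node/edge shapes are assumptions made for illustration.

```python
# Hypothetical client-side pre-check for an Avocado AI Flow graph.
# NOT the server's validator; it only mirrors the documented socket table.

# node type -> (input handles, output handles), each mapping handle -> socket type
SOCKETS = {
    "text": ({"in": "text"}, {"out": "text"}),
    "llm": ({"in": "text"}, {"out": "text"}),
    "upload": ({}, {"out": "image"}),
    "videoUpload": ({}, {"out": "video"}),
    "image": ({"in": "text"}, {"out": "image"}),
    "imageEditor": ({"prompt": "text", "image": "image"}, {"out": "image"}),
    "imageUpscale": ({"image": "image"}, {"out": "image"}),
    "video": (
        {"prompt": "text", "image": "image", "start-frame": "image",
         "end-frame": "image", "ref-images": "image", "ref-videos": "video",
         "ref-audio": "audio", "audio": "audio", "motion-video": "video"},
        {"out": "video"},
    ),
    "videoUpscale": ({"video": "video"}, {"out": "video"}),
    "voice": ({"in": "text"}, {"out": "audio"}),
    "music": ({"in": "text"}, {"out": "audio"}),
    "stickyNote": ({"in": "annotation"}, {"out": "annotation"}),
}


def validate_edges(nodes, edges):
    """Return a list of problems; an empty list means every edge type-checks."""
    problems = []
    for i, edge in enumerate(edges):
        src, dst = edge["fromIndex"], edge["toIndex"]
        # Edges address nodes by 0-based index into `nodes`.
        if not (0 <= src < len(nodes) and 0 <= dst < len(nodes)):
            problems.append(f"edge {i}: node index out of range")
            continue
        src_type, dst_type = nodes[src]["type"], nodes[dst]["type"]
        if src_type not in SOCKETS or dst_type not in SOCKETS:
            problems.append(f"edge {i}: unknown node type")
            continue
        out_type = SOCKETS[src_type][1].get(edge["fromHandle"])
        in_type = SOCKETS[dst_type][0].get(edge["toHandle"])
        if out_type is None or in_type is None:
            problems.append(f"edge {i}: unknown handle")
        elif out_type != in_type:
            problems.append(f"edge {i}: {out_type} -> {in_type} is incompatible")
        # Documented hard rule: never wire text into Fabric.
        if nodes[dst].get("model") == "fabric" and out_type == "text":
            problems.append(f"edge {i}: text must not be wired into Fabric")
    return problems
```

Run against Example 1 (the UGC talking-head graph), this returns an empty list; wiring the `llm` output into the Fabric node's `prompt` handle would be flagged even though text-to-text is otherwise compatible.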
  • Check the status of a transcribe or summarize job. Returns the current state and, when completed, an `outputs` array. Each output has either `content` (returned inline) or a presigned, time-limited (1 hour) `download_url`. Small text outputs (e.g. `transcript` SRT, `clip-candidates`, `summary`) come inline as `content`; larger outputs — `transcript-words` JSON for any non-trivial recording, plus video outputs like `clip-video` / `clip-vertical-video` — come as a `download_url` to fetch when needed. Optionally pass `format` (srt, txt, vtt, json, words) to get the transcript content inline in the top-level `transcript` field — `txt` and `vtt` are derived from the stored SRT; `json` is v1 (segments only); `words` is v2 (segments + per-word timestamps matching /.well-known/weftly-transcript-v2.schema.json). Poll this periodically after calling complete_upload — wait at least 60 seconds between checks. For files under 10 minutes, jobs usually complete within 1-2 minutes. For long files (1hr+), expect 10-30 minutes. Also use this to recover from lost state: if the original challenge was lost, call get_job_status(job_id) to retrieve a fresh challenge (status "awaiting_payment") or the upload URL (status "awaiting_upload").
    Connector
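The polling guidance (at least 60 seconds between checks, inline `content` versus expiring `download_url`) can be sketched as a small loop. This is an illustration under assumptions: `get_job_status` is passed in as a plain callable (a hypothetical wrapper around your MCP client), the `role` key on each output is inferred from the transcribe tool's description, and `"failed"` is an assumed status spelling.

```python
import time

# "completed", "awaiting_payment" and "awaiting_upload" are named in the tool text;
# "failed" is an assumed spelling for the failed state.
STOP_STATES = {"completed", "failed", "awaiting_payment", "awaiting_upload"}


def wait_for_job(job_id, get_job_status, interval_s=60, max_checks=60, sleep=time.sleep):
    """Poll a job until it reaches a state that needs no further polling."""
    if interval_s < 60:
        raise ValueError("wait at least 60 seconds between checks")
    for _ in range(max_checks):
        job = get_job_status(job_id)
        if job["status"] in STOP_STATES:
            return job
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} still running after {max_checks} checks")


def split_outputs(job):
    """Separate inline `content` outputs from `download_url` ones (URLs expire after 1 hour)."""
    inline, to_fetch = {}, {}
    for output in job.get("outputs", []):
        if "content" in output:
            inline[output["role"]] = output["content"]
        else:
            to_fetch[output["role"]] = output["download_url"]
    return inline, to_fetch
```

With the defaults, 60 checks at 60-second intervals cover about an hour, which leaves headroom over the 10-30 minutes quoted for recordings longer than an hour.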
  • Search podcasts (shows) or episodes from the open Podcast Index. Use when the user mentions a podcast, podcast host, audio show, or asks about a topic where podcast content adds value alongside video. type=podcast returns shows; type=episode returns recent episodes for the top-matching show and includes the RSS-declared transcript URL when the feed exposes one. Costs 1 credit.
    Connector
  • Download a video or audio file from any supported platform: YouTube, TikTok, Vimeo, Dailymotion, Twitter/X, SoundCloud, Bandcamp, Mixcloud, Twitch (clips and VODs), or Streamable. Output is MP4 (video, default) or MP3 / M4A (audio). This is THE tool to use whenever a user asks to save, download, rip, extract, archive, save for offline use, or convert a video/audio link from any of these sites. IMPORTANT: the `format` argument defaults to `mp4` (video). Only pass an audio format (mp3 / m4a / audio) when the user explicitly says audio, MP3, music, song, or "rip / extract the audio". Audio-only platforms (SoundCloud, Bandcamp, Mixcloud) always produce audio regardless of `format`.
    Use this tool when the user says things like:
    - "download this video" / "download this TikTok" / "save this SoundCloud track"
    - "save that as MP3" / "rip the audio" / "extract the audio"
    - "get the song from this SoundCloud link" / "save this Mixcloud set"
    - "convert this YouTube video to MP4" / "download in 1080p"
    - "save this lecture/podcast/talk for offline"
    - "archive this clip" / "grab a copy of this video"
    - any sentence containing a youtube.com, youtu.be, tiktok.com, vimeo.com, dailymotion.com, twitter.com, x.com, soundcloud.com, bandcamp.com, mixcloud.com, twitch.tv, clips.twitch.tv, or streamable.com URL plus a verb like download, save, rip, get, grab, fetch, pull, archive, convert, extract.
    Do NOT use this tool when:
    - The user only wants metadata (title, length, description, channel): call get_video_info instead; it is free and does not consume the user's quota.
    - The link is a playlist / set / album / channel URL: ask the user for a single track/video.
    - The link is from a platform not in the supported list above (e.g. Instagram, Facebook, LinkedIn).
    Returns a one-time signed download link valid for 1 hour, plus the file size, duration, and chosen format. Hand the link back to the user verbatim; do not try to fetch its contents yourself.
    Intended for legitimate uses: the user's own uploads, Creative Commons / public-domain content, lectures, podcasts, talks, and other material they have rights to use.
    Connector
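The format rules above reduce to a small decision: reject unsupported hosts, treat audio-only platforms as audio, and default to `mp4` unless the user explicitly asks for audio. The sketch below is hypothetical; the keyword test in `choose_format` is a deliberately crude heuristic standing in for the model's own reading of the request, and returning `"audio"` for audio-only platforms is an assumption since their output ignores `format` anyway.

```python
from urllib.parse import urlparse

# Domains taken from the tool description; subdomains such as m.youtube.com,
# clips.twitch.tv or artist.bandcamp.com match via the suffix check below.
SUPPORTED = {"youtube.com", "youtu.be", "tiktok.com", "vimeo.com", "dailymotion.com",
             "twitter.com", "x.com", "soundcloud.com", "bandcamp.com", "mixcloud.com",
             "twitch.tv", "streamable.com"}
AUDIO_ONLY = {"soundcloud.com", "bandcamp.com", "mixcloud.com"}
AUDIO_WORDS = ("audio", "mp3", "music", "song")  # crude keyword heuristic


def _matches(host, domains):
    return any(host == d or host.endswith("." + d) for d in domains)


def choose_format(url, user_request):
    """Pick the `format` argument: mp4 unless the user explicitly asked for audio."""
    host = (urlparse(url).hostname or "").lower()
    if not _matches(host, SUPPORTED):
        raise ValueError(f"unsupported platform: {host or url}")
    if _matches(host, AUDIO_ONLY):
        return "audio"  # these platforms produce audio regardless of `format`
    text = user_request.lower()
    return "mp3" if any(word in text for word in AUDIO_WORDS) else "mp4"
```

The suffix check requires a leading dot, so a host like `netflix.com` does not accidentally match `x.com`.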
  • Generate an AI video. Thirteen models: seedance-2.0-t2v / -t2v-fast (text only), seedance-2.0-i2v / -i2v-fast (REQUIRE an image), seedance-2.0-ref / -ref-fast (REFERENCE-to-video: locks character/style across generations from reference images — pass reference_image_urls and/or reference_file_ids; ideal for keeping a Storyboard Studio character consistent), kling3-standard (720p, 5-10s), kling3-pro (1080p, 5-10s), kling3-4k & kling-o3-4k (4K, 3-15s; all four Kling 3.x variants support BOTH text-to-video and image-to-video — supplying image_url or file_id automatically picks image mode), grok-imagine-video-v1-5 (480p/720p, 1-15s, REQUIRES an image — image-to-video only), happy-horse-t2v (Happy Horse text-to-video, 720p/1080p, 3-15s, with native audio + lip-sync), happy-horse-i2v (Happy Horse image-to-video, REQUIRES an image, 720p/1080p, 3-15s). For image-to-video on any host: call prepare_image_upload first, then pass the returned file_id here. Renders take 2-10 minutes; the inline result card polls for completion. Pricing is per-second, varies by model and resolution.
    Connector
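The model list implies a simple mode check: some models require an image, Kling 3.x switches to image mode when an image is supplied, and the `-ref` models need reference images. The helper below is a hypothetical sketch, not part of the tool; the full ids for the "-t2v-fast" / "-i2v-fast" / "-ref-fast" shorthands are assumed expansions, and rejecting an image on a text-only model is a conservative choice made here.

```python
# Full ids for the "-t2v-fast" / "-i2v-fast" / "-ref-fast" shorthands are assumed.
TEXT_ONLY = {"seedance-2.0-t2v", "seedance-2.0-t2v-fast", "happy-horse-t2v"}
IMAGE_REQUIRED = {"seedance-2.0-i2v", "seedance-2.0-i2v-fast",
                  "grok-imagine-video-v1-5", "happy-horse-i2v"}
REFERENCE = {"seedance-2.0-ref", "seedance-2.0-ref-fast"}
KLING_3X = {"kling3-standard", "kling3-pro", "kling3-4k", "kling-o3-4k"}


def resolve_mode(model, image_url=None, file_id=None,
                 reference_image_urls=(), reference_file_ids=()):
    """Return the generation mode implied by the model and the inputs supplied."""
    has_image = image_url is not None or file_id is not None
    has_refs = bool(reference_image_urls) or bool(reference_file_ids)
    if model in IMAGE_REQUIRED:
        if not has_image:
            raise ValueError(f"{model} requires image_url or file_id")
        return "image-to-video"
    if model in KLING_3X:
        # Supplying an image switches Kling 3.x into image mode automatically.
        return "image-to-video" if has_image else "text-to-video"
    if model in REFERENCE:
        if not has_refs:
            raise ValueError(f"{model} needs reference_image_urls and/or reference_file_ids")
        return "reference-to-video"
    if model in TEXT_ONLY:
        if has_image:
            raise ValueError(f"{model} is text-only")  # conservative choice in this sketch
        return "text-to-video"
    raise ValueError(f"unknown model: {model}")
```

For image-to-video, the `file_id` would come from `prepare_image_upload`, as the description instructs.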
  • Read the **text content** of an attached file. Works for .txt, .md, .json, code files, and PDFs (after files.ingest extracts their text). DO NOT call it on binary files: for IMAGES use `files.get_base64`; AUDIO/VIDEO cannot be transcribed with this tool; for non-PDF DOCUMENTS run `files.ingest` first, THEN files.read. Calling it on a binary MIME type returns an error, so check this routing hint before deciding and save yourself a turn.
    Connector
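The routing hint can be expressed as a lookup from MIME type to the sequence of tool calls. This is a sketch under assumptions: `route_file` is a hypothetical helper, the set of `application/*` types treated as text is illustrative, and any MIME type not matched earlier is assumed to be a document that needs `files.ingest` first.

```python
# MIME types outside text/* that this sketch still treats as plain text (assumed list).
TEXTUAL_APPLICATION = {"application/json", "application/xml", "application/javascript",
                       "application/x-yaml"}


def route_file(mime):
    """Return the ordered tool calls for reading a file of the given MIME type."""
    mime = mime.lower().split(";")[0].strip()  # drop parameters such as charset
    if mime.startswith("image/"):
        return ["files.get_base64"]
    if mime.startswith(("audio/", "video/")):
        raise ValueError("audio/video cannot be transcribed with files.read")
    if mime.startswith("text/") or mime in TEXTUAL_APPLICATION:
        return ["files.read"]
    # PDFs and other documents: extract text first, then read it.
    return ["files.ingest", "files.read"]
```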

Matching MCP Servers

Matching MCP Connectors

  • AI-powered video publishing, channel management, and monetization via open.video

  • Create and manage cinematic AI video renders through the Future Video Studio Agent API.

  • Get a presigned PUT URL to upload any file — video, audio, or document (markdown, HTML, DOCX, etc.). The URL expires in 15 minutes. PUT raw file bytes directly to the URL. After upload, pass the object_key to transcode_video (for video) or convert_file (for documents). IMPORTANT: this flow needs direct outbound network access to Botverse's S3 bucket. In sandboxed agent environments (claude.ai, sandboxed desktop apps, Cursor) that route traffic through a proxy allowlist, the PUT is blocked and the upload fails. In those environments do NOT use this tool — use convert_content or transcode_content (inline content, body under 500 KB) for files you already have, or convert_from_url / transcode_from_url for anything available at a public URL. Neither needs an upload step.
    Connector
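Uploading to the presigned URL is a single HTTP PUT of the raw bytes. The standard-library sketch below builds the request separately so it can be inspected without network access; `build_put_request` and `upload` are hypothetical helper names. Whether the URL was signed for a specific `Content-Type` is not stated, so the header value is an assumption to match against whatever the signer expects. As the description warns, this will fail in sandboxed environments that block outbound traffic to the bucket.

```python
import urllib.request


def build_put_request(upload_url, data, content_type="application/octet-stream"):
    """Build, but do not send, a PUT of raw bytes to a presigned URL."""
    # If the URL was signed for a specific Content-Type, this header must match it.
    return urllib.request.Request(
        upload_url, data=data, method="PUT", headers={"Content-Type": content_type}
    )


def upload(upload_url, data, content_type="application/octet-stream", timeout=60):
    """Send the PUT; the presigned URL expires 15 minutes after it is issued."""
    req = build_put_request(upload_url, data, content_type)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status  # urlopen raises HTTPError for non-2xx responses
```

After a successful upload, the `object_key` from the original response is what gets passed to `transcode_video` or `convert_file`.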
  • Upload one image into the user's Switch library in a single call. Pass `url` (any public https) OR `base64` + `mime`. Switch fetches/decodes it server-side, stores it, and returns a clean public URL plus the new asset id. This is THE way to use a photo the user attached in chat as a reference: pass the returned `url` directly into generate_image's reference_image_urls, OR into generate_video's image_url (image-to-video) or reference_image_urls (reference / omni video). The returned URL is provider-fetchable as-is — no presigned PUT, no curl, no confirm-upload step. Do NOT call get_my_active_references for a chat-attached photo; that strip only holds Studio-managed refs.
    Connector
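The tool accepts either a public `url` or `base64` plus `mime`, never both. A hypothetical argument builder makes that exclusivity explicit. The sketch assumes plain base64 (no `data:` URL prefix), which the description does not specify.

```python
import base64


def upload_media_args(url=None, data=None, mime=None):
    """Build arguments for upload_media: either a public https `url`, or `base64` + `mime`."""
    if (url is None) == (data is None):
        raise ValueError("pass exactly one of url or data")
    if url is not None:
        if not url.startswith("https://"):
            raise ValueError("url must be a public https URL")
        return {"url": url}
    if not mime:
        raise ValueError("mime is required when sending base64")
    # Assumption: plain base64 without a "data:" prefix.
    return {"base64": base64.b64encode(data).decode("ascii"), "mime": mime}
```

The `url` returned by the tool can then go straight into `reference_image_urls` or `image_url` on the generation tools.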
  • Transcribe audio or video to text, including per-word timestamps for precise editing. Three-call flow: (1) call with `filename` to receive {job_id, payment_challenge}; (2) pay via MPP, then call with `job_id` + `payment_credential` to receive {upload_url} (presigned PUT, 1h expiry); (3) PUT the bytes, then complete_upload(job_id), then poll get_job_status(job_id). On completion, get_job_status returns two outputs: role `transcript` (SRT) and role `transcript-words` (JSON matching /.well-known/weftly-transcript-v2.schema.json, with segment-level and per-word timestamps). For other formats, pass `format=srt|txt|vtt|json|words` to get_job_status to receive content inline — `txt` and `vtt` are derived from SRT, `json` is v1 (segments only), `words` is v2 (segments + words). Flat price: audio $0.50, video $1.00 — see /.well-known/mpp.json for the authoritative table. Use for podcasts, interviews, meetings, lectures, and especially for creating clips, multicamera edits, or edit-video-from-transcript where word boundaries matter. Retrying any call with `job_id` alone returns current state (idempotent). Failed jobs auto-refund.
    Connector
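The description says `txt` and `vtt` are derived from the stored SRT. One common way to do that conversion is shown below; it is a sketch of the general technique, not necessarily how the service does it. WebVTT needs a `WEBVTT` header and uses `.` rather than `,` before the milliseconds; plain text keeps only the lines after each cue's index and timing lines. It assumes well-formed SRT cues separated by blank lines.

```python
import re

_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt: str) -> str:
    """Convert SRT text to WebVTT: add the header and use '.' before milliseconds."""
    lines = srt.replace("\r\n", "\n").strip().split("\n")
    out = ["WEBVTT", ""]
    for line in lines:
        if "-->" in line:
            line = _SRT_TIME.sub(r"\1.\2", line)
        out.append(line)  # SRT cue numbers are valid WebVTT cue identifiers
    return "\n".join(out) + "\n"


def srt_to_txt(srt: str) -> str:
    """Keep only the spoken text: drop each cue's index and timing lines."""
    text = []
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        text.extend(line for line in block.split("\n")[2:] if line.strip())
    return "\n".join(text)
```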
  • Ask a question about one or more videos with visual analysis. Most effective on focused time ranges — use start/end to specify the segment to analyze. BEFORE calling this tool, read the reka://docs/guide resource for recommended workflows. In most cases, you should first use:
    - search_videos to find WHEN something happens, then pass those timestamps here as start/end
    - segment_video to detect and locate specific objects
    - get_transcript to read what was said
    For single-video questions, pass video_id with start/end. For cross-video questions, pass videos — a list of video references with start/end each. For follow-up questions, pass conversation_id from the previous response. You can add start/end to drill into a specific moment while keeping the conversation context. Requires the video to have been indexed with the qa_only or full pipeline.
    Connector
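The three calling patterns (single video, cross-video, follow-up) can be captured in a hypothetical argument builder. `video_id`, `videos`, `start`, `end` and `conversation_id` come from the description; the `question` field name, the per-entry keys in `videos`, and seconds as the time unit are assumptions.

```python
def ask_video_args(question, video_id=None, videos=None, start=None, end=None,
                   conversation_id=None):
    """Assemble ask_video arguments for the single-video, cross-video or follow-up case."""
    if video_id is not None and videos is not None:
        raise ValueError("pass video_id or videos, not both")
    if video_id is None and videos is None and conversation_id is None:
        raise ValueError("pass video_id, videos, or a conversation_id for a follow-up")
    if videos is not None and (start is not None or end is not None):
        raise ValueError("with videos, put start/end inside each entry")
    if start is not None and end is not None and start >= end:
        raise ValueError("start must be before end")
    args = {"question": question}  # the question field name is an assumption
    if video_id is not None:
        args["video_id"] = video_id
    if videos is not None:
        args["videos"] = [dict(v) for v in videos]  # each entry carries its own start/end
    if start is not None:
        args["start"] = start
    if end is not None:
        args["end"] = end
    if conversation_id is not None:
        args["conversation_id"] = conversation_id
    return args
```

A typical flow would take timestamps from `search_videos` and feed them in as `start`/`end`, then reuse the returned `conversation_id` for follow-ups.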
  • Generate a short video (5-10s) from a text prompt using BytePlus Seedance. Optionally accepts up to 12 image file IDs from the user's attached files (visible in the [ATTACHMENTS] block) as `reference_file_ids` for style and composition. Returns immediately with a job_id; the video is delivered back via continuation when the job completes (~30-90s for fast model, ~2-5min for pro). Reference images are temporarily re-hosted on a third-party CDN (imgbb) for the duration of generation and deleted on completion — don't submit confidential references. Gated behind a workspace opt-in flag.
    Connector
  • Browse the Gapup gold-standard content catalogue — video games, films, TV series and music. Returns franchises with their works (title, release year). When to use this tool: an agent needs structured, audited metadata for a cultural franchise, wants to resolve a title to a canonical entity, or browses a domain's catalogue before requesting enrichment. Inputs: a content domain and an optional case-insensitive name filter. Each franchise id can be passed to content_enrichment for its fine-grained tag profile.
    Connector
  • Upscales a source video to 1080p or 2K using Atlas. Pass a public `videoUrl` and the target resolution. Cost is per-second (7 cr/s @ 1080p, 9 cr/s @ 2K). Atlas-side limits: clips up to 53s at 1080p, 23s at 2K, source must be <=30fps. Returns the upscaled video URL (R2-hosted).
    Connector
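The pricing and limits reduce to simple arithmetic: credits = rate × seconds, with a per-resolution maximum duration and a 30 fps cap on the source. For example, a 10-second clip at 1080p costs 7 × 10 = 70 credits, and a 20-second clip at 2K costs 9 × 20 = 180. The estimator below is a sketch; the resolution labels are assumed spellings of the tool's parameter values, and rounding of partial seconds is not documented.

```python
# Rates and limits from the tool description; the resolution labels are assumed spellings.
CREDITS_PER_SECOND = {"1080p": 7, "2K": 9}
MAX_SECONDS = {"1080p": 53, "2K": 23}
MAX_SOURCE_FPS = 30


def upscale_cost(duration_s, resolution, source_fps):
    """Estimate Atlas upscale credits, rejecting clips outside the documented limits."""
    if resolution not in CREDITS_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution}")
    if source_fps > MAX_SOURCE_FPS:
        raise ValueError("source must be <= 30 fps")
    if duration_s > MAX_SECONDS[resolution]:
        raise ValueError(f"clip too long for {resolution}: max {MAX_SECONDS[resolution]}s")
    # How partial seconds are rounded is not documented; this returns the raw product.
    return CREDITS_PER_SECOND[resolution] * duration_s
```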
  • Index a video for search, QA, or full analysis. Processes the video through a pipeline of AI features. Typically takes 3-7 minutes; longer for long videos or the 'full' pipeline. Times out after 10 minutes by default. Pipelines:
    - search_only: transcription + captions + embeddings (enables search_videos)
    - qa_only: transcription + captions (enables ask_video)
    - full: transcription + captions + embeddings (enables all tools)
    Scene detection is enabled by default and produces scene boundaries for get_scenes. Pass scene_detection=False to skip it. Prerequisites: if using video_id, the video must be in 'uploaded' status. Use get_video to check status before calling this tool.
    Connector
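Choosing a pipeline is a matter of which downstream tools you need. The hypothetical helper below picks the narrowest pipeline that enables `search_videos` and/or `ask_video`, checks the documented 'uploaded' prerequisite, and passes `scene_detection` through. The `pipeline` argument name is an assumption; `scene_detection` is documented.

```python
def index_video_args(video_id, status, need_search, need_qa, need_scenes=True):
    """Choose the narrowest pipeline that enables the requested tools."""
    if status != "uploaded":
        raise RuntimeError(f"video {video_id} is '{status}', expected 'uploaded'")
    if need_search and need_qa:
        pipeline = "full"
    elif need_search:
        pipeline = "search_only"  # enables search_videos
    elif need_qa:
        pipeline = "qa_only"  # enables ask_video
    else:
        raise ValueError("no search or QA needed; nothing to choose")
    # `pipeline` as an argument name is an assumption; scene_detection is documented.
    return {"video_id": video_id, "pipeline": pipeline, "scene_detection": need_scenes}
```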
  • Fetch metadata about a video or audio track WITHOUT downloading it. Works on every platform download_video supports: YouTube, TikTok, Vimeo, Dailymotion, Twitter/X, SoundCloud, Bandcamp, Mixcloud, Twitch, and Streamable. Returns title, uploader/channel name, duration, view count (when available), upload date, thumbnail URL, description, available video qualities, and (for YouTube) the license type.
    Use this tool when the user says things like:
    - "what is this video about" / "summarize this video"
    - "how long is this track" / "when was this uploaded"
    - "who made this" / "what channel/artist is this from"
    - "is this Creative Commons" / "can I reuse this" / "what is the license"
    - "what qualities are available for this video"
    Do NOT use this tool when:
    - The user wants to download, save, rip, extract, or convert the video/audio — use download_video for that.
    Free to call — does not count against the user's download quota. Call this before download_video when you need to confirm the video exists, pick the right quality, or check licensing before downloading.
    Connector
  • Submit a video generation task. Submits an asynchronous video generation task: starts a Seedance generation job (text-to-video, image-to-video, or video-to-video depending on `content`) and returns a `taskId` immediately; the video is produced in the background. Poll `GET /openapi/v2/model/video/tasks/{task_id}` with the returned id until `status` is terminal to obtain the video URL. Available to CONTRACT-tier API keys only. The task cost is charged in USD from the account wallet on the first successful poll, not at submit time. At submit time the wallet balance is checked against an upper-bound cost estimate; an insufficient balance returns 402 with `X-Usd-Required-*` headers and no task is created.
    Request body:
    - `model` (string, required): `seedance-2.0` or `seedance-2.0-fast`.
    - `content` (array, required, >= 1 item): generation inputs as an OpenAI-style content array. Must include at least one text item `{"type": "text", "text": "<prompt>"}`. May also include reference media items such as `{"type": "image_url", "image_url": {"url": "https://..."}, "role": "reference_image"}` (and likewise `video_url` / `audio_url`), capped at 9 image, 3 video, and 3 audio items. `role` is one of `first_frame`, `last_frame`, `reference_image`, `reference_video`, `reference_audio`. Reference URLs must be publicly reachable.
    - `resolution` (string, optional, default `720p`): `480p`, `720p`, or `1080p`. `1080p` is not supported by `seedance-2.0-fast`. The resolution also determines the billing tier.
    - `duration` (integer, required): output length in seconds, 4-15.
    - `ratio` (string, optional): output aspect ratio: `21:9`, `16:9`, `4:3`, `1:1`, `3:4`, `9:16`, or `adaptive`.
    - `generate_audio` (boolean, optional): generate an audio track.
    - `watermark` (boolean, optional): overlay the provider watermark.
    - `service_tier` (string, optional): `flex` for cheaper offline inference.
    - `return_last_frame` (boolean, optional): also return the video's last frame on `output`.
    Any further unrecognized top-level fields are forwarded to the generation provider unchanged.
    Response `data`:
    - `taskId` (string): identifier to poll, format `task_video_<id>`.
    - `status` (string): always `pending` immediately after submit.
    ### Responses:
    **200**: Successful Response (Success Response)
    Content-Type: application/json
    **Example Response:**
    ```json
    {
      "success": true,
      "meta": {
        "requestId": "Requestid",
        "timestamp": "Timestamp"
      }
    }
    ```
    **Output Schema:**
    ```json
    {
      "properties": {
        "success": { "type": "boolean", "title": "Success", "description": "Whether the request was successful", "default": true },
        "data": { "description": "Response data payload" },
        "error": { "description": "Error details if request failed" },
        "meta": {
          "description": "Metadata for API responses.\n\nCredit fields follow the ADR-0003 parallel-fields strategy (Option 3):\n- `credits_remaining` / `credits_consumed` (int): legacy fields, rounded\n to whole credits, kept for zero-breaking-change to existing SDK clients.\n- `credits_remaining_exact` / `credits_consumed_exact` (float): new\n precision-aware fields for clients that opt in to decimal credits.\n\nSee ADR-0003 decision 5 and the \u00a78 deprecation timeline.\n\nTODO(2026-11, ADR-0003 \u00a78 +6mo): mark `credits_remaining` /\n`credits_consumed` as `deprecated=True` in their Field() definitions\nand announce in customer changelog.\nTODO(2027-05, ADR-0003 \u00a78 +12mo): remove the legacy int fields via a\nmajor-version bump of the OpenAPI surface.",
          "properties": {
            "requestId": { "type": "string", "title": "Requestid", "description": "Unique request identifier" },
            "timestamp": { "type": "string", "title": "Timestamp", "description": "Response timestamp in ISO 8601 format" },
            "total": { "title": "Total", "description": "Total number of records" },
            "page": { "title": "Page", "description": "Current page number" },
            "pageSize": { "title": "Pagesize", "description": "Number of records per page" },
            "totalPages": { "title": "Totalpages", "description": "Total number of pages" },
            "creditsRemaining": { "title": "Creditsremaining", "description": "Remaining API credits (rounded to whole credits; see creditsRemainingExact for precise value)" },
            "creditsConsumed": { "title": "Creditsconsumed", "description": "Credits consumed by this request (rounded; see creditsConsumedExact for precise value)" },
            "creditsRemainingExact": { "title": "Creditsremainingexact", "description": "Remaining API credits, precise to 1 decimal place" },
            "creditsConsumedExact": { "title": "Creditsconsumedexact", "description": "Credits consumed by this request, precise to 1 decimal place" },
            "tokensUsage": { "description": "Provider token-usage block \u2014 populated on terminal video polls only, null on every non-video endpoint. See TokensUsage for its fields." }
          },
          "type": "object",
          "required": ["requestId", "timestamp"],
          "title": "ResponseMeta"
        }
      },
      "type": "object",
      "required": ["meta"],
      "title": "OpenApiResponse[VideoGenerationSubmitData]",
      "examples": []
    }
    ```
    **422**: Validation Error
    Content-Type: application/json
    **Example Response:**
    ```json
    {
      "detail": [
        { "loc": [], "msg": "Message", "type": "Error Type", "ctx": {} }
      ]
    }
    ```
    **Output Schema:**
    ```json
    {
      "properties": {
        "detail": {
          "items": {
            "properties": {
              "loc": { "items": {}, "type": "array", "title": "Location" },
              "msg": { "type": "string", "title": "Message" },
              "type": { "type": "string", "title": "Error Type" },
              "input": { "title": "Input" },
              "ctx": { "type": "object", "title": "Context" }
            },
            "type": "object",
            "required": ["loc", "msg", "type"],
            "title": "ValidationError"
          },
          "type": "array",
          "title": "Detail"
        }
      },
      "type": "object",
      "title": "HTTPValidationError"
    }
    ```
    Connector
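The request-body rules can be enforced client-side before submitting, which avoids a round trip to a 422. The builder below is a hypothetical sketch that encodes only the documented constraints: model ids, at least one text item, the 9/3/3 media caps, allowed roles, integer duration 4-15, the resolution set (no 1080p on `seedance-2.0-fast`), and the ratio set. It does not check that a role matches its media kind, and it does not send the request, since the submit path is not given here.

```python
ROLES = {"first_frame", "last_frame", "reference_image", "reference_video", "reference_audio"}
MEDIA_CAPS = {"image_url": 9, "video_url": 3, "audio_url": 3}
RATIOS = {"21:9", "16:9", "4:3", "1:1", "3:4", "9:16", "adaptive"}


def build_seedance_body(model, prompt, duration, media=(), resolution="720p", ratio=None):
    """Build a request body for the submit endpoint; `media` holds (kind, url, role) tuples."""
    if model not in ("seedance-2.0", "seedance-2.0-fast"):
        raise ValueError(f"unknown model: {model}")
    if isinstance(duration, bool) or not isinstance(duration, int) or not 4 <= duration <= 15:
        raise ValueError("duration must be an integer from 4 to 15 seconds")
    if resolution not in ("480p", "720p", "1080p"):
        raise ValueError(f"unknown resolution: {resolution}")
    if resolution == "1080p" and model == "seedance-2.0-fast":
        raise ValueError("1080p is not supported by seedance-2.0-fast")
    if ratio is not None and ratio not in RATIOS:
        raise ValueError(f"unknown ratio: {ratio}")
    content = [{"type": "text", "text": prompt}]  # at least one text item is required
    counts = dict.fromkeys(MEDIA_CAPS, 0)
    for kind, url, role in media:
        if kind not in MEDIA_CAPS or role not in ROLES:
            raise ValueError(f"bad media item: {kind}/{role}")
        counts[kind] += 1
        if counts[kind] > MEDIA_CAPS[kind]:
            raise ValueError(f"too many {kind} items (max {MEDIA_CAPS[kind]})")
        content.append({"type": kind, kind: {"url": url}, "role": role})
    body = {"model": model, "content": content, "resolution": resolution, "duration": duration}
    if ratio is not None:
        body["ratio"] = ratio
    return body
```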
  • Read the user's staged references in Switch Studio. Returns TWO groups: (1) the image-generation reference strip (typed face/body/outfit/scenery/product slots) under `refs`, and (2) the VIDEO-tab references the user staged in the Omni/Image video tabs (the @Image1/@Image2 strip) under `videoReferences`, with usable signed URLs. Call this before generate_image or generate_video whenever the user says "use my refs" or refers to images they staged in Studio (including "the images in my video tab"). To make a video from the video-tab refs, pass videoReferences.imageUrls into generate_video reference_image_urls (and videoUrls into reference_video_urls) in reference-to-video / omni mode. Refs marked alive:false are dead (stored file gone) and are already excluded from the usable url lists. NOTE: a photo the user just attached in THIS chat is in neither group — for that, call upload_media and use its returned url/asset id directly.
    Connector
  • Fetch a YouTube video transcript/subtitles from a video URL or 11-char id. Default format='text' returns the transcript inline (when it fits ~80K chars / ~20K tokens) so a single call gives you the text directly; long-form videos fall back to a download_url note. Pass format='json' for structured metadata + a presigned download_url (no inline transcript) - for batch/programmatic use. Default origin='uploader_provided' (human captions); falls back to 'auto_generated' automatically if missing (counts as 2 upstream calls). Cached 7 days server-side.
    Connector
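The tool accepts either a URL or the 11-character video id. If you want to normalize input on the client side first, a sketch like the following extracts the id from the common URL shapes (`watch?v=`, `youtu.be/`, `/shorts/`, `/embed/`, `/live/`). `youtube_id` is a hypothetical helper, and the set of URL shapes it handles is an assumption, not an exhaustive list.

```python
import re
from urllib.parse import parse_qs, urlparse

_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def youtube_id(url_or_id):
    """Return the 11-character video id from a bare id or a common YouTube URL form."""
    s = url_or_id.strip()
    if _VIDEO_ID.fullmatch(s):
        return s
    parsed = urlparse(s)
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
                candidate = parts[1]
    if candidate and _VIDEO_ID.fullmatch(candidate):
        return candidate
    raise ValueError(f"no YouTube video id found in {url_or_id!r}")
```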
  • FREE triage tool — send whatever context you have (message content, sender info, URLs, attachments, draft replies, thread messages, image/video URLs) and get back a prioritized list of which security tools to run. No AI call, no charge, instant response. Always call this first to get the best security coverage.
    Connector