Glama
stabgan

OpenRouter MCP Multimodal Server

generate_video

Generate a video from a text description using AI. Supports async processing with optional image conditioning for first/last frames or style references.

Instructions

Generate a video from a text prompt using an OpenRouter video-generation model (default: google/veo-3.1). The tool submits an async job, polls until the job completes or max_wait_ms elapses, then downloads the result. Generation can optionally be conditioned on first/last-frame images or reference images. When save_path is provided, large outputs are saved automatically to that path, which is sandboxed.
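The submit, poll, and download flow described above can be sketched as a bounded polling loop. This is a minimal sketch under stated assumptions, not the server's implementation: `submit`, `get_status`, `JobStatus`, `PollResult`, and the state strings `"pending"`, `"completed"`, and `"failed"` are hypothetical stand-ins. Only the `max_wait_ms` and `poll_interval_ms` semantics and their defaults come from the listing. The clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class JobStatus:
    # Hypothetical status record; the server's real response shape is not documented here.
    state: str  # assumed values: "pending", "completed", "failed"
    video_url: Optional[str] = None


@dataclass
class PollResult:
    completed: bool
    job_id: str  # acts as the resumable handle when completed is False
    video_url: Optional[str] = None  # set only when the job finished in time


def wait_for_video(
    submit: Callable[[str], str],
    get_status: Callable[[str], JobStatus],
    prompt: str,
    max_wait_ms: int = 600_000,
    poll_interval_ms: int = 15_000,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> PollResult:
    """Submit a job, then poll until it completes or max_wait_ms elapses."""
    job_id = submit(prompt)
    deadline = clock() + max_wait_ms / 1000.0
    while True:
        status = get_status(job_id)
        if status.state == "completed":
            return PollResult(True, job_id, status.video_url)
        if status.state == "failed":
            raise RuntimeError(f"video job {job_id} failed")
        remaining = deadline - clock()
        if remaining <= 0:
            # Out of time: hand back the job ID so the caller can resume later.
            return PollResult(False, job_id)
        # Never sleep past the deadline, so one last poll happens right at it.
        sleep(min(poll_interval_ms / 1000.0, remaining))
```

Returning the job ID instead of raising on timeout mirrors the "resumable handle" behavior that the max_wait_ms parameter describes.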

Input Schema

Name | Required | Description | Default
prompt | Yes | Text description of the desired video. | -
model | No | Override the video model ID. | google/veo-3.1
resolution | No | 480p / 720p / 1080p / 1K / 2K / 4K (model-dependent). | -
aspect_ratio | No | 16:9 / 9:16 / 1:1 / 4:3 / 3:4 / 21:9 / 9:21 (model-dependent). | -
duration | No | Duration in seconds (model-dependent). | -
seed | No | Deterministic seed, when supported. | -
first_frame_image | No | Optional image (path, URL, or data URL) used as the first frame for image-to-video. | -
last_frame_image | No | Optional image used as the last frame for frame transitions. | -
reference_images | No | Optional style/content reference images. | -
provider | No | Provider-specific passthrough options keyed by provider slug. | -
save_path | No | Where to save the video. Routed through the OPENROUTER_OUTPUT_DIR sandbox; the file extension is auto-corrected. | -
max_wait_ms | No | Total time to wait for the async job before returning a resumable handle. | 600000 ms
poll_interval_ms | No | Polling cadence. | 15000 ms
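To show how a client might pass these parameters, the sketch below serializes an MCP `tools/call` request (JSON-RPC 2.0, with `params.name` and `params.arguments`) for generate_video. The helper name `build_generate_video_call` is hypothetical. It only checks that `prompt` is non-empty and drops options left as `None` so the server can apply its own defaults. It does not validate option names or model-dependent values such as `duration`, which is assumed here to be supported by the chosen model.

```python
import json
from typing import Any


def build_generate_video_call(request_id: int, prompt: str, **options: Any) -> str:
    """Serialize an MCP tools/call request for generate_video.

    Optional arguments left as None are omitted so the server applies its
    own defaults (e.g. max_wait_ms=600000, poll_interval_ms=15000).
    """
    if not prompt.strip():
        raise ValueError("prompt is required")
    arguments = {"prompt": prompt}
    # Keep only the options the caller actually set.
    arguments.update({k: v for k, v in options.items() if v is not None})
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "generate_video", "arguments": arguments},
    }
    return json.dumps(message)


payload = build_generate_video_call(
    1,
    "A paper boat drifting down a rainy street at dusk",
    aspect_ratio="16:9",
    duration=8,
    seed=None,  # omitted from the request
    save_path="boat.mp4",
)
print(payload)
```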
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the annotations (which are all false), the description discloses async job submission, polling, timeout handling, auto-saving with path sandboxing, and image conditioning. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph of four sentences, efficiently covering the main action and key details. It could be slightly more structured with bullet points, but no information is wasted.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (13 parameters, async behavior, optional images, sandboxing), the description covers the essential workflow: async submission, polling, auto-save, and sandboxing. It does not explain return values or error handling, and no output schema exists to fill that gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds meaningful context: default model, model-dependent constraints on resolution/duration, and auto-corrected save path extension. This adds value beyond the schema alone.
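The save_path behavior (routing through the OPENROUTER_OUTPUT_DIR sandbox and auto-correcting the extension) can be pictured with a small path check. This is a hypothetical sketch, not the server's code: the function name `sandbox_save_path`, the choice of `.mp4` as the corrected extension, and rejecting (rather than re-rooting) paths that escape the directory are all assumptions.

```python
from pathlib import Path


def sandbox_save_path(save_path: str, output_dir: str, ext: str = ".mp4") -> Path:
    """Resolve save_path inside output_dir and force the video extension.

    Hypothetical: illustrates sandboxing plus extension auto-correction,
    not the server's actual implementation.
    """
    root = Path(output_dir).resolve()
    candidate = (root / save_path).resolve()
    # Reject anything not strictly inside the sandbox, e.g. "../x.mp4",
    # an absolute path elsewhere, or the sandbox directory itself.
    if root not in candidate.parents:
        raise ValueError(f"{save_path!r} is outside {root}")
    # Replace a wrong or missing suffix with the expected one.
    if candidate.suffix.lower() != ext:
        candidate = candidate.with_suffix(ext)
    return candidate
```

Rejecting escapes outright keeps the rule easy to reason about; a server could instead strip the directory part and keep only the file name.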

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it generates a video from a text prompt using an OpenRouter model, and it distinguishes itself from sibling tools like generate_audio and generate_image by specifying video generation with async polling and optional image conditioning.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for video generation but does not explicitly state when to use this tool vs alternatives (e.g., get_video_status for status checks, analyze_video for analysis). No 'when not to use' guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/stabgan/openrouter-mcp-multimodal'
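The same GET request can be made from Python's standard library. This sketch assumes, as the curl example suggests, that the endpoint needs no authentication and returns a JSON body; the response schema is not documented here, so the decoded value is returned as-is. The helper name `fetch_server_info` is hypothetical.

```python
import json
import urllib.request
from typing import Any

SERVER_URL = "https://glama.ai/api/mcp/v1/servers/stabgan/openrouter-mcp-multimodal"


def fetch_server_info(url: str = SERVER_URL, timeout: float = 30.0) -> Any:
    """GET the directory entry and decode its JSON body (network required)."""
    request = urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server_info()` performs the request; an HTTP error status raises `urllib.error.HTTPError`.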

If you have feedback or need assistance with the MCP directory API, please join our Discord server.