Glama

generate_video

Generate or edit videos from text, images, or existing video clips. Use reference images for style and subject guidance.

Instructions

Generate or edit videos with Grok Imagine.

Text-to-video by default. Provide an image to animate (image-to-video), or
a source video to edit. Only one mode per call. Reference images can be
added to guide style and subjects. Generation polls synchronously (xAI's
default timeout is 10 minutes).

Args:
    prompt: Video description, or the edit instruction for video editing.
    model: Video model (default `grok-imagine-video`).
    image_path: Local image to use as the starting frame.
    image_url: Public image URL to use as the starting frame.
    video_path: Local video to edit (max 20 MB, .mp4, ≤ 8.7s).
    video_url: Public video URL to edit (.mp4, ≤ 8.7s).
    reference_image_paths: Local images used as style/subject references.
    reference_image_urls: Public image URLs used as style/subject references.
    duration: Video length in seconds (1–15, ignored when editing).
    aspect_ratio: Aspect ratio like `"16:9"` or `"9:16"` (ignored when editing).
    resolution: `"480p"` or `"720p"` (ignored when editing).

Returns:
    Markdown block with the generated video URL, actual duration, and a cost footer.
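
The mode rules above can be sketched as a small pre-flight check. This is only an illustration, not the server's actual code: `resolve_mode` and `check_generation_options` are hypothetical helper names, and treating any two source arguments together as an error is an assumption drawn from "Only one mode per call".

```python
from typing import Optional


def resolve_mode(
    image_path: Optional[str] = None,
    image_url: Optional[str] = None,
    video_path: Optional[str] = None,
    video_url: Optional[str] = None,
) -> str:
    """Return the mode implied by the source arguments (hypothetical helper)."""
    sources = {
        "image_path": image_path,
        "image_url": image_url,
        "video_path": video_path,
        "video_url": video_url,
    }
    given = [name for name, value in sources.items() if value]
    # Assumption: supplying more than one source argument is an error.
    if len(given) > 1:
        raise ValueError(f"Only one mode per call, got: {sorted(given)}")
    if not given:
        return "text-to-video"  # the documented default
    return "image-to-video" if given[0].startswith("image") else "video-edit"


def check_generation_options(
    mode: str,
    duration: Optional[int] = None,
    resolution: Optional[str] = None,
) -> None:
    """Check the documented ranges; both options are ignored when editing."""
    if mode == "video-edit":
        return
    if duration is not None and not 1 <= duration <= 15:
        raise ValueError("duration must be between 1 and 15 seconds")
    if resolution is not None and resolution not in ("480p", "720p"):
        raise ValueError('resolution must be "480p" or "720p"')
```

Running a check like this before the call lets an agent catch a conflicting combination locally instead of waiting on a request that may fail.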

Input Schema

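| Name | Required | Description | Default |
| --- | --- | --- | --- |
| prompt | Yes | | |
| model | No | | grok-imagine-video |
| image_path | No | | |
| image_url | No | | |
| video_path | No | | |
| video_url | No | | |
| reference_image_paths | No | | |
| reference_image_urls | No | | |
| duration | No | | |
| aspect_ratio | No | | |
| resolution | No | | |
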
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that generation polls synchronously with a 10-minute timeout, and includes constraints like max video size (20 MB, .mp4, ≤8.7s) and duration limits (1–15 seconds). Since no annotations are provided, these details fully inform the agent about behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
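
The synchronous polling with a 10-minute timeout noted in this section could look roughly like the loop below. It is a generic sketch, not the tool's implementation: `poll_until_done`, the `check` callable, and the 5-second interval are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    check: Callable[[], Optional[str]],
    timeout_s: float = 600.0,   # 10 minutes, matching the documented default
    interval_s: float = 5.0,    # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call `check` until it returns a result or the timeout elapses.

    `check` is a hypothetical status function that returns the video URL
    once generation has finished and None while it is still pending.
    """
    deadline = clock() + timeout_s
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"generation not finished after {timeout_s} s")
        sleep(interval_s)
```

Because the call blocks until the result arrives or the timeout expires, an agent should expect a single invocation to take up to the full timeout.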

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured: an introductory sentence summarizing modes, a note on polling, then bullet-pointed Args and Returns. Every sentence adds value, and the most critical information (modes) is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 11 parameters and no output schema or annotations, the description covers modes, parameter constraints, return format (Markdown block with URL, duration, cost), and polling behavior. Minor missing details include not explicitly stating that only one of image_path/image_url/video_path/video_url should be provided per call, though 'Only one mode per call' implies it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description compensates excellently with a detailed Args section explaining each parameter's purpose, constraints (e.g., max 20 MB, .mp4, ≤8.7s for video_path/video_url), and which parameters are ignored in edit mode.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Generate or edit videos with Grok Imagine' and distinguishes between text-to-video, image-to-video, and video editing modes. It explicitly says 'Only one mode per call,' making the purpose specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear guidance on when to use each mode (default text-to-video, provide image to animate, or source video to edit) and mentions reference images. However, it does not explicitly compare with sibling tools like extend_video or generate_image, nor does it state when not to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/merterbak/Grok-MCP'
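
The same request can be made from Python with only the standard library. This is a minimal sketch: `server_url` and `fetch_server` are hypothetical helper names, the owner/repository path layout is inferred from the single URL above, and the response is assumed to be JSON.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, repo: str) -> str:
    """Build the server endpoint URL (path layout inferred from the curl example)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_server(owner: str, repo: str, timeout: float = 30.0) -> dict:
    """GET the server record; assumes the API answers with a JSON object."""
    with urllib.request.urlopen(server_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_server("merterbak", "Grok-MCP")` would issue the same GET request as the curl command.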

If you have feedback or need assistance with the MCP directory API, please join our Discord server.