Glama

generate_video

Create videos from text prompts or images using AI models, supporting camera movement controls and various output resolutions.

Instructions

Generate a video from a prompt.

COST WARNING: This tool makes an API call to MiniMax, which may incur costs. Only use when explicitly requested by the user.

Args:
    model (str, optional): The model to use. One of ["T2V-01", "T2V-01-Director", "I2V-01", "I2V-01-Director", "I2V-01-live", "MiniMax-Hailuo-02"]. "T2V" models generate video from text; "I2V" models generate video from an image. "Director" variants accept camera movement instructions in the prompt. "MiniMax-Hailuo-02" is the latest model, with the best results, ultra-clear quality, and precise prompt response.
    prompt (str): The prompt to generate the video from. When using a Director model, the prompt supports 15 camera movement instructions (enumerated values):
        -Truck: [Truck left], [Truck right]
        -Pan: [Pan left], [Pan right]
        -Push: [Push in], [Pull out]
        -Pedestal: [Pedestal up], [Pedestal down]
        -Tilt: [Tilt up], [Tilt down]
        -Zoom: [Zoom in], [Zoom out]
        -Shake: [Shake]
        -Follow: [Tracking shot]
        -Static: [Static shot]
    first_frame_image (str): The image to use as the first frame. Requires an "I2V" series model.
    duration (int, optional): The duration of the video in seconds. Requires the "MiniMax-Hailuo-02" model. Allowed values: 6 or 10.
    resolution (str, optional): The resolution of the video. Requires the "MiniMax-Hailuo-02" model. One of ["768P", "1080P"].
    output_directory (str): The directory to save the video to.
    async_mode (bool, optional): Whether to use async mode. Defaults to False. If True, the video generation task is submitted asynchronously and the response returns a task_id. Use the `query_video_generation` tool to check the status of the task and retrieve the result.
Returns:
    Text content with the path to the output video file.
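
The parameter constraints listed above can be checked before making a paid call. The following is a minimal sketch, not part of the MiniMax MCP server: the function name `check_generate_video_args` and the constant names are hypothetical, and the sketch assumes each bracketed tag in a Director prompt holds exactly one camera instruction.

```python
import re
from typing import List, Optional

# Model names and camera instructions as listed in the tool description.
MODELS = [
    "T2V-01", "T2V-01-Director", "I2V-01",
    "I2V-01-Director", "I2V-01-live", "MiniMax-Hailuo-02",
]
CAMERA_INSTRUCTIONS = {
    "Truck left", "Truck right", "Pan left", "Pan right",
    "Push in", "Pull out", "Pedestal up", "Pedestal down",
    "Tilt up", "Tilt down", "Zoom in", "Zoom out",
    "Shake", "Tracking shot", "Static shot",
}


def check_generate_video_args(
    model: str = "T2V-01",
    prompt: str = "",
    first_frame_image: Optional[str] = None,
    duration: Optional[int] = None,
    resolution: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    if model not in MODELS:
        problems.append(f"unknown model: {model!r}")
    # first_frame_image is documented as requiring an I2V series model.
    if first_frame_image is not None and not model.startswith("I2V"):
        problems.append("first_frame_image requires an I2V series model")
    # duration and resolution are documented as MiniMax-Hailuo-02 only.
    if duration is not None:
        if model != "MiniMax-Hailuo-02":
            problems.append("duration requires MiniMax-Hailuo-02")
        elif duration not in (6, 10):
            problems.append("duration must be 6 or 10")
    if resolution is not None:
        if model != "MiniMax-Hailuo-02":
            problems.append("resolution requires MiniMax-Hailuo-02")
        elif resolution not in ("768P", "1080P"):
            problems.append('resolution must be "768P" or "1080P"')
    # Assumption: one instruction per [bracketed] tag, checked for Director models only.
    if "Director" in model:
        for tag in re.findall(r"\[([^\[\]]+)\]", prompt):
            if tag not in CAMERA_INSTRUCTIONS:
                problems.append(f"unknown camera instruction: [{tag}]")
    return problems
```

For example, `check_generate_video_args(model="T2V-01", first_frame_image="frame.png")` reports one problem, because the description restricts `first_frame_image` to I2V models.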

Input Schema

Name               Required  Description  Default
model              No                     T2V-01
prompt             No
first_frame_image  No
duration           No
resolution         No
output_directory   No
async_mode         No

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and delivers substantial behavioral context. It discloses cost implications (an API call to MiniMax with potential costs), async mode behavior (returns a task_id and requires a separate query tool), and output format (path to the video file). It could improve by mentioning rate limits or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately front-loaded with its core purpose and critical warnings, but the parameter documentation is verbose: the detailed enum values and camera instructions might belong in the schema descriptions instead. Some sentences could be more concise without losing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex 7-parameter tool with no annotations and no output schema, the description provides substantial context, including behavioral traits, parameter semantics, and usage constraints. It explains the return value format. Minor gaps: it does not cover error conditions, rate limits, or authentication requirements.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage for 7 parameters, the description fully compensates by providing rich semantic details for all parameters. It explains model types and their purposes (T2V, I2V, Director variants), prompt constraints with camera instructions, parameter dependencies (first_frame_image requires I2V models, duration/resolution require MiniMax-Hailuo-02), and async_mode implications.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('generate') and resource ('video from a prompt'), distinguishing it from sibling tools like text_to_image or text_to_audio. It explicitly identifies the core functionality of video generation from textual or image inputs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage context through the cost warning and the explicit instruction to 'only use when explicitly requested by the user.' It mentions the companion tool 'query_video_generation' for async mode. It does not need to differentiate from other video-related siblings, since none exist, but it also does not explain when to choose a specific model beyond listing each model's capabilities.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/swesmith-repos/MiniMax-AI__MiniMax-MCP.aa97ac39'
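
The same request can be issued from Python using only the standard library. This is a minimal sketch: the helper names `build_server_url` and `fetch_server_info` are hypothetical, and it assumes the endpoint returns a JSON body, which the page does not state.

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_server_url(server_path: str) -> str:
    """Join the API base with a server path such as 'owner/name'."""
    return f"{API_BASE}/{server_path}"


def fetch_server_info(server_path: str, timeout: float = 10.0):
    """Send a GET request and parse the body as JSON (assumed format)."""
    request = urllib.request.Request(build_server_url(server_path), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_server_info("swesmith-repos/MiniMax-AI__MiniMax-MCP.aa97ac39")` requests the same URL as the curl command.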

If you have feedback or need assistance with the MCP directory API, please join our Discord server.