
FAL Image/Video MCP Server

by RamboRogers

pixverse_image

Transform a static image into an animated video by providing a motion description prompt. Optionally set the duration and aspect ratio, and use a negative prompt to list elements the video should avoid.

Instructions

Pixverse V4.5 I2V - Advanced image-to-video

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| image_url | Yes | URL of the input image | |
| prompt | Yes | Motion description prompt | |
| duration | No | Video duration in seconds | 5 |
| aspect_ratio | No | | 16:9 |
| negative_prompt | No | What to avoid in the video | |
| cfg_scale | No | How closely to follow the prompt | |
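
As a rough illustration of the constraints behind this schema, the sketch below checks a set of arguments before a call is made. The allowed values (`'5'`/`'10'` for `duration`, three aspect ratios, `cfg_scale` between 0 and 1) are taken from the schema generation code quoted in the Implementation Reference; the function name `validatePixverseImageArgs` and the error messages are hypothetical and not part of the server.

```typescript
// Allowed values copied from the schema excerpt in the Implementation Reference.
const DURATIONS = ['5', '10'];
const ASPECT_RATIOS = ['16:9', '9:16', '1:1'];

// Hypothetical helper (not part of the server): returns a list of problems,
// or an empty list when the arguments satisfy the schema.
function validatePixverseImageArgs(args: Record<string, unknown>): string[] {
  const problems: string[] = [];

  // image_url and prompt are the only required fields.
  for (const key of ['image_url', 'prompt']) {
    const value = args[key];
    if (typeof value !== 'string' || value.length === 0) {
      problems.push(`${key} is required and must be a non-empty string`);
    }
  }

  // duration is a string enum in the schema, not a number.
  const duration = args['duration'];
  if (duration !== undefined && (typeof duration !== 'string' || DURATIONS.indexOf(duration) === -1)) {
    problems.push("duration must be the string '5' or '10'");
  }

  const aspectRatio = args['aspect_ratio'];
  if (aspectRatio !== undefined && (typeof aspectRatio !== 'string' || ASPECT_RATIOS.indexOf(aspectRatio) === -1)) {
    problems.push("aspect_ratio must be '16:9', '9:16', or '1:1'");
  }

  const negativePrompt = args['negative_prompt'];
  if (negativePrompt !== undefined && typeof negativePrompt !== 'string') {
    problems.push('negative_prompt must be a string');
  }

  // cfg_scale is a number constrained to the range 0..1.
  const cfgScale = args['cfg_scale'];
  if (cfgScale !== undefined && !(typeof cfgScale === 'number' && cfgScale >= 0 && cfgScale <= 1)) {
    problems.push('cfg_scale must be a number between 0 and 1');
  }

  return problems;
}
```

One detail worth noticing is that `duration` is declared as a string enum, so a caller passing the number `5` instead of `'5'` would not match the schema.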

Implementation Reference

  • src/index.ts:122-122 (registration)
    Registration of the 'pixverse_image' tool in the MODEL_REGISTRY.imageToVideo array, defining its ID, endpoint, name, and description.
    { id: 'pixverse_image', endpoint: 'fal-ai/pixverse/v4.5/image-to-video', name: 'Pixverse V4.5 I2V', description: 'Advanced image-to-video' },
  • Dynamic generation of input schema for all image-to-video tools, including pixverse_image, defining parameters like image_url, prompt, duration, etc.
    } else if (category === 'imageToVideo') {
      baseSchema.inputSchema.properties = {
        image_url: { type: 'string', description: 'URL of the input image' },
        prompt: { type: 'string', description: 'Motion description prompt' },
        duration: { type: 'string', enum: ['5', '10'], default: '5', description: 'Video duration in seconds' },
        aspect_ratio: { type: 'string', enum: ['16:9', '9:16', '1:1'], default: '16:9' },
        negative_prompt: { type: 'string', description: 'What to avoid in the video' },
        cfg_scale: { type: 'number', default: 0.5, minimum: 0, maximum: 1, description: 'How closely to follow the prompt' }
      };
      baseSchema.inputSchema.required = ['image_url', 'prompt'];
    }
  • Core handler function that executes the pixverse_image tool logic: destructures arguments, prepares input params, calls fal.subscribe on the model endpoint, processes video output with downloads/data URLs, and returns structured content.
    private async handleImageToVideo(args: any, model: any) {
      const { 
        image_url, 
        prompt, 
        duration = '5', 
        aspect_ratio = '16:9',
        negative_prompt,
        cfg_scale
      } = args;
    
      try {
        // Configure FAL client lazily with query config override
        configureFalClient(this.currentQueryConfig);
        const inputParams: any = { image_url, prompt };
        
        // Add optional parameters
        if (duration) inputParams.duration = duration;
        if (aspect_ratio) inputParams.aspect_ratio = aspect_ratio;
        if (negative_prompt) inputParams.negative_prompt = negative_prompt;
        if (cfg_scale !== undefined) inputParams.cfg_scale = cfg_scale;
    
        const result = await fal.subscribe(model.endpoint, { input: inputParams });
        const videoData = result.data as FalVideoResult;
        const videoProcessed = await downloadAndProcessVideo(videoData.video.url, model.id);
    
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify({
                model: model.name,
                id: model.id,
                endpoint: model.endpoint,
                input_image: image_url,
                prompt,
                video: {
                  url: videoData.video.url,
                  localPath: videoProcessed.localPath,
                  ...(videoProcessed.dataUrl && { dataUrl: videoProcessed.dataUrl }),
                  width: videoData.video.width,
                  height: videoData.video.height,
                },
                metadata: inputParams,
                download_path: DOWNLOAD_PATH,
                data_url_settings: {
                  enabled: ENABLE_DATA_URLS,
                  max_size_mb: Math.round(MAX_DATA_URL_SIZE / 1024 / 1024),
                },
                autoopen_settings: {
                  enabled: AUTOOPEN,
                  note: AUTOOPEN ? "Files automatically opened with default application" : "Auto-open disabled"
                },
              }, null, 2),
            },
          ],
        };
      } catch (error) {
        throw new Error(`${model.name} generation failed: ${error}`);
      }
    }
  • Dispatch logic in CallToolRequestSchema handler that routes calls to pixverse_image (and other imageToVideo tools) to the handleImageToVideo function.
    } else if (MODEL_REGISTRY.imageToVideo.find(m => m.id === name)) {
      return await this.handleImageToVideo(args, model);
    }
  • Helper function to retrieve the model configuration (endpoint, name, etc.) by tool ID, used before dispatching to handlers.
    function getModelById(id: string) {
      const allModels = getAllModels();
      return allModels.find(model => model.id === id);
    }
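
The lookup, dispatch, and parameter-assembly steps listed above can be sketched together without calling FAL at all. This is a simplified illustration, not the server's code: the `ModelEntry` and `ImageToVideoArgs` types, the single-entry registry, the body of `getAllModels`, and the `planCall` helper are assumptions introduced here, while the `pixverse_image` entry and the conditional parameter logic follow the excerpts.

```typescript
// Hypothetical type for a registry entry; the real server defines its own.
interface ModelEntry {
  id: string;
  endpoint: string;
  name: string;
  description: string;
}

// Only the pixverse_image entry comes from the source; other entries are omitted.
const MODEL_REGISTRY: { imageToVideo: ModelEntry[] } = {
  imageToVideo: [
    { id: 'pixverse_image', endpoint: 'fal-ai/pixverse/v4.5/image-to-video', name: 'Pixverse V4.5 I2V', description: 'Advanced image-to-video' },
  ],
};

// Assumed stand-in for getAllModels(), which the excerpt calls but does not show.
function getAllModels(): ModelEntry[] {
  return [...MODEL_REGISTRY.imageToVideo];
}

function getModelById(id: string): ModelEntry | undefined {
  return getAllModels().find(model => model.id === id);
}

// Hypothetical argument type mirroring the input schema.
interface ImageToVideoArgs {
  image_url: string;
  prompt: string;
  duration?: string;
  aspect_ratio?: string;
  negative_prompt?: string;
  cfg_scale?: number;
}

// Mirrors the handler's assembly: duration and aspect_ratio get defaults,
// negative_prompt is sent only when non-empty, cfg_scale only when provided.
function buildInputParams(args: ImageToVideoArgs): Record<string, unknown> {
  const { image_url, prompt, duration = '5', aspect_ratio = '16:9', negative_prompt, cfg_scale } = args;
  const inputParams: Record<string, unknown> = { image_url, prompt };
  if (duration) inputParams['duration'] = duration;
  if (aspect_ratio) inputParams['aspect_ratio'] = aspect_ratio;
  if (negative_prompt) inputParams['negative_prompt'] = negative_prompt;
  if (cfg_scale !== undefined) inputParams['cfg_scale'] = cfg_scale;
  return inputParams;
}

// Hypothetical helper: resolves the tool name and returns what would be
// passed to fal.subscribe(endpoint, { input }).
function planCall(name: string, args: ImageToVideoArgs): { endpoint: string; input: Record<string, unknown> } {
  const model = getModelById(name);
  if (!model) throw new Error(`Unknown tool: ${name}`);
  return { endpoint: model.endpoint, input: buildInputParams(args) };
}
```

Read this way, the handler as excerpted appears to forward `cfg_scale` only when the caller supplies it, so the schema default of 0.5 is not filled in by this code path, and an explicit `cfg_scale` of `0` is still sent because the check is `!== undefined` rather than a truthiness test.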
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions 'Advanced' but gives no concrete behavioral details: no information about processing time, rate limits, authentication requirements, output format, quality characteristics, or what makes it 'advanced' compared to basic image-to-video tools.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise at just 5 words, front-loaded with the core functionality. Every word earns its place: 'Pixverse V4.5 I2V' identifies the model, 'Advanced' suggests quality, and 'image-to-video' states the core function with zero waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex 6-parameter video generation tool with no annotations and no output schema, the description is inadequate. It doesn't explain what the tool returns, processing characteristics, quality expectations, or how it differs from other video generation tools in the sibling list, leaving significant gaps for agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 83% schema description coverage, the baseline is 3. The description adds no parameter information beyond what's in the schema - it doesn't explain relationships between parameters, provide usage examples, or clarify concepts like 'cfg_scale' beyond the schema's 'How closely to follow the prompt' description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function as 'Advanced image-to-video' with 'Pixverse V4.5 I2V' specifying the model version. It identifies the verb (image-to-video conversion) and resource (image input), but doesn't differentiate from sibling tools like 'pixverse_text' or other video generation tools in the list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. With many sibling tools for image/video generation (e.g., pixverse_text, luma_ray2_image, vidu_image), there's no indication of this tool's specific use cases, strengths, or limitations compared to others.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/RamboRogers/fal-image-video-mcp'
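
The same request could be made from TypeScript along the following lines. This is a sketch under assumptions: the `owner/slug` path layout is generalized from the single URL above, the helper names are hypothetical, and the response body is left as `unknown` because its shape is not described here. The fetch function is passed in so the sketch does not depend on a particular runtime.

```typescript
// Minimal response shape this sketch relies on; the standard fetch Response satisfies it.
interface MinimalResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type Fetcher = (url: string) => Promise<MinimalResponse>;

// Builds the directory URL used in the curl command above.
// Assumes the path is /servers/<owner>/<slug>, inferred from that one example.
function serverInfoUrl(owner: string, slug: string): string {
  return `https://glama.ai/api/mcp/v1/servers/${encodeURIComponent(owner)}/${encodeURIComponent(slug)}`;
}

// Issues a GET request and returns the parsed JSON body without assuming its shape.
async function fetchServerInfo(owner: string, slug: string, fetcher: Fetcher): Promise<unknown> {
  const response = await fetcher(serverInfoUrl(owner, slug));
  if (!response.ok) {
    throw new Error(`Directory request failed with status ${response.status}`);
  }
  return response.json();
}
```

In an environment with a global `fetch` (recent Node.js versions or a browser), it can be passed as the third argument, for example `fetchServerInfo('RamboRogers', 'fal-image-video-mcp', fetch)`.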
