FAL Image/Video MCP Server

by RamboRogers

magi

Generate videos from text prompts using FAL AI models, with customizable duration and aspect ratio options.

Instructions

Magi - Creative video generation

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| prompt | Yes | Text prompt for video generation | |
| duration | No | | |
| aspect_ratio | No | | 16:9 |
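
The schema generated for text-to-video tools (shown under Implementation Reference) also limits `duration` to 1 to 30 with a default of 5 and restricts `aspect_ratio` to five values. A minimal client-side sketch that applies those defaults and constraints before calling the tool; `normalizeMagiArgs` and `MagiArgs` are hypothetical names, not part of the server:

```typescript
// Hypothetical client-side helper mirroring the textToVideo input schema.
const ASPECT_RATIOS = ["16:9", "9:16", "1:1", "4:3", "3:4"] as const;
type AspectRatio = (typeof ASPECT_RATIOS)[number];

interface MagiArgs {
  prompt: string;
  duration: number;
  aspect_ratio: AspectRatio;
}

// Applies the schema defaults (duration 5, aspect_ratio "16:9") and
// rejects values outside the declared type, range, or enum.
function normalizeMagiArgs(raw: Record<string, unknown>): MagiArgs {
  const prompt = raw.prompt;
  if (typeof prompt !== "string") {
    throw new Error("prompt is required and must be a string");
  }
  const duration = raw.duration ?? 5;
  if (typeof duration !== "number" || !Number.isFinite(duration) || duration < 1 || duration > 30) {
    throw new Error("duration must be a number from 1 to 30");
  }
  const aspect = raw.aspect_ratio ?? "16:9";
  if (typeof aspect !== "string" || !(ASPECT_RATIOS as readonly string[]).includes(aspect)) {
    throw new Error(`aspect_ratio must be one of: ${ASPECT_RATIOS.join(", ")}`);
  }
  return { prompt, duration, aspect_ratio: aspect as AspectRatio };
}
```

For example, `normalizeMagiArgs({ prompt: "A paper boat on a rainy street" })` returns the prompt with `duration` set to 5 and `aspect_ratio` set to "16:9". The handler itself only checks that optional values are truthy, so a check like this is one way to catch out-of-range input before a generation request is spent.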

Implementation Reference

  • The core handler function that implements the execution logic for the 'magi' tool (text-to-video category). It calls the FAL API endpoint 'fal-ai/magi', processes the video output, and handles downloads, data URLs, and auto-opening.
    private async handleTextToVideo(args: any, model: any) {
      // Fall back to the schema defaults when optional arguments are omitted
      const { prompt, duration = 5, aspect_ratio = '16:9' } = args;
    
      try {
        // Configure FAL client lazily with query config override
        configureFalClient(this.currentQueryConfig);
        const inputParams: any = { prompt };
        
        // Forward optional parameters only when they are set
        if (duration) inputParams.duration = duration;
        if (aspect_ratio) inputParams.aspect_ratio = aspect_ratio;
    
        // Submit the job to the model's endpoint and wait for the result
        const result = await fal.subscribe(model.endpoint, { input: inputParams });
        const videoData = result.data as FalVideoResult;
        // Download the video locally; dataUrl is only present when data URLs are enabled
        const videoProcessed = await downloadAndProcessVideo(videoData.video.url, model.id);
    
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify({
                model: model.name,
                id: model.id,
                endpoint: model.endpoint,
                prompt,
                video: {
                  url: videoData.video.url,
                  localPath: videoProcessed.localPath,
                  ...(videoProcessed.dataUrl && { dataUrl: videoProcessed.dataUrl }),
                  width: videoData.video.width,
                  height: videoData.video.height,
                },
                metadata: inputParams,
                download_path: DOWNLOAD_PATH,
                data_url_settings: {
                  enabled: ENABLE_DATA_URLS,
                  max_size_mb: Math.round(MAX_DATA_URL_SIZE / 1024 / 1024),
                },
                autoopen_settings: {
                  enabled: AUTOOPEN,
                  note: AUTOOPEN ? "Files automatically opened with default application" : "Auto-open disabled"
                },
              }, null, 2),
            },
          ],
        };
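      } catch (error) {
        // Report the underlying message without a duplicated "Error:" prefix
        const message = error instanceof Error ? error.message : String(error);
        throw new Error(`${model.name} generation failed: ${message}`);
      }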
    }
  • Input schema definition generated for 'magi' and other text-to-video tools, specifying parameters like prompt, duration, and aspect_ratio.
    } else if (category === 'textToVideo') {
      baseSchema.inputSchema.properties = {
        prompt: { type: 'string', description: 'Text prompt for video generation' },
        duration: { type: 'number', default: 5, minimum: 1, maximum: 30 },
        aspect_ratio: { type: 'string', enum: ['16:9', '9:16', '1:1', '4:3', '3:4'], default: '16:9' },
      };
      baseSchema.inputSchema.required = ['prompt'];
  • src/index.ts:110-118 (registration)
    Model registry defining the 'magi' tool with its endpoint, name, and description in the textToVideo category.
    textToVideo: [
      { id: 'veo3', endpoint: 'fal-ai/veo3', name: 'Veo 3', description: 'Google DeepMind\'s latest with speech and audio' },
      { id: 'kling_master_text', endpoint: 'fal-ai/kling-video/v2.1/master/text-to-video', name: 'Kling 2.1 Master', description: 'Premium text-to-video with motion fluidity' },
      { id: 'pixverse_text', endpoint: 'fal-ai/pixverse/v4.5/text-to-video', name: 'Pixverse V4.5', description: 'Advanced text-to-video generation' },
      { id: 'magi', endpoint: 'fal-ai/magi', name: 'Magi', description: 'Creative video generation' },
      { id: 'luma_ray2', endpoint: 'fal-ai/luma-dream-machine/ray-2', name: 'Luma Ray 2', description: 'Latest Luma Dream Machine' },
      { id: 'wan_pro_text', endpoint: 'fal-ai/wan-pro/text-to-video', name: 'Wan Pro', description: 'Professional video effects' },
      { id: 'vidu_text', endpoint: 'fal-ai/vidu/q1/text-to-video', name: 'Vidu Q1', description: 'High-quality text-to-video' }
    ],
  • src/index.ts:402-404 (registration)
    Dynamic tool registration in the ListTools response, where the schema for 'magi' is generated and added to the available tools list.
    for (const model of MODEL_REGISTRY.textToVideo) {
      tools.push(this.generateToolSchema(model, 'textToVideo'));
    }
  • Dispatch logic in the CallTool handler that routes 'magi' calls to the textToVideo handler based on model ID.
    if (MODEL_REGISTRY.imageGeneration.find(m => m.id === name)) {
      return await this.handleImageGeneration(args, model);
    } else if (MODEL_REGISTRY.textToVideo.find(m => m.id === name)) {
      return await this.handleTextToVideo(args, model);
    } else if (MODEL_REGISTRY.imageToVideo.find(m => m.id === name)) {
      return await this.handleImageToVideo(args, model);
    }
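
The registry, the ListTools loop, and the CallTool dispatch together reduce to one lookup: find the tool name in a category array, then route to that category's handler. A self-contained sketch of that pattern, with the registry trimmed to two text-to-video entries and the real handlers replaced by hypothetical stubs that return a string:

```typescript
interface ModelEntry {
  id: string;
  endpoint: string;
  name: string;
  description: string;
}

type Category = "imageGeneration" | "textToVideo" | "imageToVideo";

// Trimmed copy of the registry; only two textToVideo entries are kept.
const MODEL_REGISTRY: Record<Category, ModelEntry[]> = {
  imageGeneration: [],
  textToVideo: [
    { id: "veo3", endpoint: "fal-ai/veo3", name: "Veo 3", description: "Google DeepMind's latest with speech and audio" },
    { id: "magi", endpoint: "fal-ai/magi", name: "Magi", description: "Creative video generation" },
  ],
  imageToVideo: [],
};

// Hypothetical stand-ins for handleImageGeneration / handleTextToVideo / handleImageToVideo.
const HANDLERS: Record<Category, (model: ModelEntry) => string> = {
  imageGeneration: (m) => `image generation via ${m.endpoint}`,
  textToVideo: (m) => `text-to-video via ${m.endpoint}`,
  imageToVideo: (m) => `image-to-video via ${m.endpoint}`,
};

// Searches the categories in the same order as the CallTool handler.
function resolveTool(name: string): { category: Category; model: ModelEntry } | undefined {
  for (const category of Object.keys(MODEL_REGISTRY) as Category[]) {
    const model = MODEL_REGISTRY[category].find((m) => m.id === name);
    if (model) return { category, model };
  }
  return undefined;
}

// Routes a tool call to its category handler; unknown names throw in this sketch.
function dispatch(name: string): string {
  const resolved = resolveTool(name);
  if (!resolved) throw new Error(`Unknown tool: ${name}`);
  return HANDLERS[resolved.category](resolved.model);
}
```

Here `dispatch("magi")` returns "text-to-video via fal-ai/magi". Because tool IDs double as tool names, adding a model to a category array is enough to both list it and route calls to it, provided IDs stay unique across categories.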

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions 'creative video generation' but fails to disclose critical behavioral traits such as whether this is a read-only or mutating operation, expected processing time, rate limits, authentication needs, or output format. For a video generation tool with no annotations, this leaves significant gaps in understanding how it behaves.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
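
The implementation reference fills in part of this gap: the handler returns a single `text` content item whose text is a JSON string containing the model, the remote video URL, a `localPath`, and the data-URL and auto-open settings, which suggests the video is saved to disk and may be opened automatically when `AUTOOPEN` is enabled. A client-side sketch for reading that payload; the `MagiResult` interface is an assumption limited to the fields `handleTextToVideo` writes:

```typescript
// Assumed shape, limited to fields that handleTextToVideo puts into the JSON text.
interface MagiResult {
  model: string;
  id: string;
  endpoint: string;
  prompt: string;
  video: {
    url: string;
    localPath?: string;
    dataUrl?: string;
    width?: number;
    height?: number;
  };
  metadata: Record<string, unknown>;
  download_path: string;
  data_url_settings: { enabled: boolean; max_size_mb: number };
  autoopen_settings: { enabled: boolean; note: string };
}

interface ToolContent {
  type: string;
  text?: string;
}

// Finds the first text item in an MCP tool result and parses it as MagiResult.
function parseMagiResult(content: ToolContent[]): MagiResult {
  const item = content.find((c) => c.type === "text");
  if (!item || typeof item.text !== "string") {
    throw new Error("tool result contains no text content");
  }
  const parsed = JSON.parse(item.text) as MagiResult;
  if (typeof parsed?.video?.url !== "string") {
    throw new Error("tool result has no video URL");
  }
  return parsed;
}
```

Only `video.url` is checked, since it is the one field the handler always reads from the FAL response; the remaining fields are passed through unvalidated.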

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise with just three words, which is efficient and front-loaded. However, it's arguably too brief, bordering on under-specified rather than optimally concise, as it lacks necessary details for a tool with 3 parameters and no annotations.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of video generation, with 3 parameters, low schema description coverage (33%), no annotations, and no output schema, the description is incomplete. It does not address key aspects such as output format, error handling, or how the tool differs from its siblings, so an agent cannot use it effectively without additional context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 33% (only the 'prompt' parameter has a description), so the description must compensate, but it adds no parameter information. It doesn't explain what 'duration' or 'aspect_ratio' mean in context, although the schema does provide constraints (the duration range and the aspect-ratio enum values). With no parameters mentioned in the description and low schema coverage, it meets the baseline but adds nothing beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
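
The 33% figure corresponds to one documented property out of three. A sketch of the same properties with descriptions added, plus a small hypothetical `descriptionCoverage` helper that computes that ratio; the description wording is illustrative, and treating `duration` as seconds is an assumption, since the schema gives no unit:

```typescript
interface PropertySchema {
  type: string;
  description?: string;
  default?: unknown;
  minimum?: number;
  maximum?: number;
  enum?: string[];
}

// Fraction of properties that carry a non-empty description.
function descriptionCoverage(properties: Record<string, PropertySchema>): number {
  const entries = Object.values(properties);
  if (entries.length === 0) return 1;
  const documented = entries.filter((p) => (p.description ?? "").trim() !== "").length;
  return documented / entries.length;
}

// Properties as currently generated for textToVideo tools.
const current: Record<string, PropertySchema> = {
  prompt: { type: "string", description: "Text prompt for video generation" },
  duration: { type: "number", default: 5, minimum: 1, maximum: 30 },
  aspect_ratio: { type: "string", enum: ["16:9", "9:16", "1:1", "4:3", "3:4"], default: "16:9" },
};

// Same properties with hypothetical descriptions; "seconds" is an assumed unit.
const documented: Record<string, PropertySchema> = {
  prompt: { type: "string", description: "Text prompt for video generation" },
  duration: {
    type: "number", default: 5, minimum: 1, maximum: 30,
    description: "Video length (assumed seconds), from 1 to 30; defaults to 5",
  },
  aspect_ratio: {
    type: "string", enum: ["16:9", "9:16", "1:1", "4:3", "3:4"], default: "16:9",
    description: "Frame shape: 16:9 landscape, 9:16 portrait, 1:1 square, 4:3 or 3:4",
  },
};

console.log(descriptionCoverage(current).toFixed(2)); // 0.33
console.log(descriptionCoverage(documented)); // 1
```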

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Creative video generation' states the general purpose but is vague about the specific action. It mentions 'video generation' which distinguishes it from image-focused siblings like hunyuan_image or stable_diffusion_35, but lacks a clear verb (e.g., 'generate videos from text prompts') and doesn't specify the resource or scope beyond 'creative'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. With many sibling tools for image and video generation (e.g., ltx_video, veo3, kling_master_text), the description offers no context about differences in capabilities, quality, or use cases, leaving the agent to guess based on names alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/RamboRogers/fal-image-video-mcp'
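
The same request from TypeScript, as a sketch assuming a runtime with a global `fetch` (Node.js 18+ or a browser). The body is returned as parsed JSON typed `unknown`, because its shape is not documented here; `serverApiUrl` and `fetchServerInfo` are hypothetical helpers:

```typescript
const GLAMA_MCP_API = "https://glama.ai/api/mcp/v1/servers";

// Builds the directory API URL for a server identified by owner and name.
function serverApiUrl(owner: string, name: string): string {
  return `${GLAMA_MCP_API}/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Fetches the server record; the JSON shape is left as unknown.
async function fetchServerInfo(owner: string, name: string): Promise<unknown> {
  const res = await fetch(serverApiUrl(owner, name), { method: "GET" });
  if (!res.ok) {
    throw new Error(`Directory API request failed: ${res.status} ${res.statusText}`);
  }
  return res.json();
}
```

Usage: `fetchServerInfo("RamboRogers", "fal-image-video-mcp").then((info) => console.log(info));`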
