FAL Image/Video MCP Server

by RamboRogers

imagen4

Generate images from text descriptions using Google's Imagen 4 model through the FAL Image/Video MCP Server, with a selectable image size and up to four images per request.

Instructions

Imagen 4 - Google's latest text-to-image model

Input Schema

Name        Required  Description                       Default
prompt      Yes       Text prompt for image generation  -
image_size  No        -                                 landscape_4_3
num_images  No        -                                 -
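
For illustration, the parameters above map onto an MCP `tools/call` request roughly as follows. This is a minimal sketch, not code from the server: the `Imagen4Args` type and the `buildToolCall` helper are hypothetical names, the `image_size` values are copied from the schema enum shown in the implementation reference, and the envelope is the JSON-RPC 2.0 `tools/call` shape that MCP clients send.

```typescript
// Hypothetical types for illustration; the server itself accepts `args: any`.
type ImageSize =
  | 'square_hd' | 'square'
  | 'portrait_4_3' | 'portrait_16_9'
  | 'landscape_4_3' | 'landscape_16_9';

interface Imagen4Args {
  prompt: string;          // required
  image_size?: ImageSize;  // optional, defaults to 'landscape_4_3'
  num_images?: number;     // optional, 1 to 4 according to the schema
}

// Wrap the tool arguments in a JSON-RPC 2.0 `tools/call` request.
function buildToolCall(id: number, args: Imagen4Args) {
  return {
    jsonrpc: '2.0' as const,
    id,
    method: 'tools/call' as const,
    params: { name: 'imagen4', arguments: args },
  };
}

const request = buildToolCall(1, {
  prompt: 'A lighthouse on a rocky coast at dusk',
  image_size: 'landscape_16_9',
  num_images: 2,
});
console.log(JSON.stringify(request, null, 2));
```

Only `prompt` is mandatory; omitting the other two fields leaves the defaults to the server.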

Implementation Reference

  • Handler function that implements the core logic for the 'imagen4' tool. It processes input arguments, calls the FAL endpoint 'fal-ai/imagen4/preview' via fal.subscribe, handles the image results, downloads/processes images, and returns formatted content.
    private async handleImageGeneration(args: any, model: any) {
      const {
        prompt,
        image_size = 'landscape_4_3',
        num_inference_steps = 25,
        guidance_scale = 3.5,
        num_images = 1,
        negative_prompt,
        safety_tolerance,
        raw,
      } = args;
    
      try {
        // Configure FAL client lazily with query config override
        configureFalClient(this.currentQueryConfig);
        const inputParams: any = { prompt };
        
        // Add common parameters
        if (image_size) inputParams.image_size = image_size;
        if (num_images > 1) inputParams.num_images = num_images;
        
        // Add model-specific parameters based on model capabilities
        if (model.id.includes('flux') || model.id.includes('stable_diffusion')) {
          if (num_inference_steps) inputParams.num_inference_steps = num_inference_steps;
          if (guidance_scale) inputParams.guidance_scale = guidance_scale;
        }
        if ((model.id.includes('stable_diffusion') || model.id === 'ideogram_v3') && negative_prompt) {
          inputParams.negative_prompt = negative_prompt;
        }
        if (model.id.includes('flux_pro') && safety_tolerance) {
          inputParams.safety_tolerance = safety_tolerance;
        }
        if (model.id === 'flux_pro_ultra' && raw !== undefined) {
          inputParams.raw = raw;
        }
    
        const result = await fal.subscribe(model.endpoint, { input: inputParams });
        const imageData = result.data as FalImageResult;
    
        const processedImages = await downloadAndProcessImages(imageData.images, model.id);
    
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify({
                model: model.name,
                id: model.id,
                endpoint: model.endpoint,
                prompt,
                images: processedImages,
                metadata: inputParams,
                download_path: DOWNLOAD_PATH,
                data_url_settings: {
                  enabled: ENABLE_DATA_URLS,
                  max_size_mb: Math.round(MAX_DATA_URL_SIZE / 1024 / 1024),
                },
                autoopen_settings: {
                  enabled: AUTOOPEN,
                  note: AUTOOPEN ? "Files automatically opened with default application" : "Auto-open disabled"
                },
              }, null, 2),
            },
          ],
        };
      } catch (error) {
        throw new Error(`${model.name} generation failed: ${error}`);
      }
    }
  • src/index.ts:101-101 (registration)
    Registration entry for the 'imagen4' tool in MODEL_REGISTRY.imageGeneration array, defining its ID, endpoint, name, and description.
    { id: 'imagen4', endpoint: 'fal-ai/imagen4/preview', name: 'Imagen 4', description: 'Google\'s latest text-to-image model' },
  • Dynamic input schema definition for imageGeneration tools like 'imagen4', including prompt, image_size, num_images, and model-specific parameters.
    if (category === 'imageGeneration') {
      baseSchema.inputSchema.properties = {
        prompt: { type: 'string', description: 'Text prompt for image generation' },
        image_size: { type: 'string', enum: ['square_hd', 'square', 'portrait_4_3', 'portrait_16_9', 'landscape_4_3', 'landscape_16_9'], default: 'landscape_4_3' },
        num_images: { type: 'number', default: 1, minimum: 1, maximum: 4 },
      };
      baseSchema.inputSchema.required = ['prompt'];
      
      // Add model-specific parameters
      if (model.id.includes('flux') || model.id.includes('stable_diffusion')) {
        baseSchema.inputSchema.properties.num_inference_steps = { type: 'number', default: 25, minimum: 1, maximum: 50 };
        baseSchema.inputSchema.properties.guidance_scale = { type: 'number', default: 3.5, minimum: 1, maximum: 20 };
      }
      if (model.id.includes('stable_diffusion') || model.id === 'ideogram_v3') {
        baseSchema.inputSchema.properties.negative_prompt = { type: 'string', description: 'Negative prompt' };
      }
    } else if (category === 'textToVideo') {
  • Helper function used to look up the model configuration (endpoint, etc.) by tool name 'imagen4' during tool call dispatch.
    function getModelById(id: string) {
      const allModels = getAllModels();
      return allModels.find(model => model.id === id);
    }
  • src/index.ts:400-402 (registration)
    Registration of 'imagen4' tool schema in the listTools response by iterating over imageGeneration models.
    for (const model of MODEL_REGISTRY.imageGeneration) {
      tools.push(this.generateToolSchema(model, 'imageGeneration'));
    }
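
The parameter gating in the handler can be isolated into a pure function to see what is actually forwarded for 'imagen4'. The sketch below is a simplified restatement under assumptions: `buildInputParams` and `HandlerArgs` are hypothetical names, the type of `safety_tolerance` is a guess, and the FAL call and image download are left out. Because the id 'imagen4' matches none of the flux, stable_diffusion, or ideogram checks, only `prompt`, `image_size`, and (when greater than 1) `num_images` reach the endpoint.

```typescript
// Hypothetical argument shape; the original handler takes `args: any`.
interface HandlerArgs {
  prompt: string;
  image_size?: string;
  num_images?: number;
  num_inference_steps?: number;
  guidance_scale?: number;
  negative_prompt?: string;
  safety_tolerance?: string; // assumed type, not confirmed by the source
  raw?: boolean;
}

// Mirrors the handler's branching, minus the network call.
function buildInputParams(modelId: string, args: HandlerArgs): Partial<HandlerArgs> {
  const {
    prompt,
    image_size = 'landscape_4_3',
    num_inference_steps = 25,
    guidance_scale = 3.5,
    num_images = 1,
    negative_prompt,
    safety_tolerance,
    raw,
  } = args;

  const params: Partial<HandlerArgs> = { prompt };

  // Common parameters
  if (image_size) params.image_size = image_size;
  if (num_images > 1) params.num_images = num_images;

  // Model-specific parameters
  if (modelId.includes('flux') || modelId.includes('stable_diffusion')) {
    if (num_inference_steps) params.num_inference_steps = num_inference_steps;
    if (guidance_scale) params.guidance_scale = guidance_scale;
  }
  if ((modelId.includes('stable_diffusion') || modelId === 'ideogram_v3') && negative_prompt) {
    params.negative_prompt = negative_prompt;
  }
  if (modelId.includes('flux_pro') && safety_tolerance) {
    params.safety_tolerance = safety_tolerance;
  }
  if (modelId === 'flux_pro_ultra' && raw !== undefined) {
    params.raw = raw;
  }
  return params;
}

// Extra fields are silently dropped for 'imagen4'.
const imagen4Params = buildInputParams('imagen4', {
  prompt: 'a red fox',
  guidance_scale: 7,
  negative_prompt: 'blurry',
});
console.log(imagen4Params); // { prompt: 'a red fox', image_size: 'landscape_4_3' }
```

One consequence worth noting: a caller who passes `num_images: 1` explicitly gets the same request as one who omits it, since the field is only forwarded when it exceeds 1.
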

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions 'latest text-to-image model' but doesn't cover key traits like rate limits, authentication needs, output format, or potential costs. This is inadequate for a tool with mutation-like behavior (image generation).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise (one short phrase) and front-loaded with the core information. However, it's arguably too brief, bordering on under-specified rather than efficiently informative, which slightly reduces its effectiveness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of an image generation tool with 3 parameters, low schema coverage (33%), no annotations, and no output schema, the description is incomplete. It lacks details on behavior, parameters, and output, making it insufficient for an agent to use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low at 33% (only the 'prompt' parameter has a description). The description adds no information about parameters beyond what's implied by 'text-to-image model' for 'prompt'. It doesn't explain 'image_size' enum values or 'num_images' constraints, failing to compensate for the schema gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
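
As an illustration of the constraints the description leaves unstated, the following sketch checks arguments against the values in the dynamic schema (the `image_size` enum, `num_images` between 1 and 4) and fills in the defaults. `normalizeImagen4Args` is a hypothetical helper, not something the server exposes; rejecting empty prompts and non-integer counts is an added assumption, since the schema only declares `type: 'string'` and `type: 'number'`.

```typescript
// Allowed values copied from the schema's `image_size` enum.
const IMAGE_SIZES = [
  'square_hd', 'square',
  'portrait_4_3', 'portrait_16_9',
  'landscape_4_3', 'landscape_16_9',
] as const;
type AllowedSize = (typeof IMAGE_SIZES)[number];

interface NormalizedArgs {
  prompt: string;
  image_size: AllowedSize;
  num_images: number;
}

// Hypothetical client-side check: validate, then apply schema defaults.
function normalizeImagen4Args(raw: {
  prompt?: unknown;
  image_size?: unknown;
  num_images?: unknown;
}): NormalizedArgs {
  const prompt = raw.prompt;
  if (typeof prompt !== 'string' || prompt.length === 0) {
    throw new Error('prompt is required and must be a non-empty string');
  }

  const size = raw.image_size ?? 'landscape_4_3';
  if (!IMAGE_SIZES.includes(size as AllowedSize)) {
    throw new Error(`image_size must be one of: ${IMAGE_SIZES.join(', ')}`);
  }

  const count = raw.num_images ?? 1;
  if (typeof count !== 'number' || !Number.isInteger(count) || count < 1 || count > 4) {
    throw new Error('num_images must be an integer between 1 and 4');
  }

  return { prompt, image_size: size as AllowedSize, num_images: count };
}

console.log(normalizeImagen4Args({ prompt: 'a paper boat' }));
```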

Purpose 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states this is 'Google's latest text-to-image model', which identifies the resource (text-to-image model) and implies the verb (generate). However, it doesn't specify the exact action (e.g., 'generate images from text prompts') or differentiate from sibling text-to-image tools like stable_diffusion_35 or hunyuan_image, making it somewhat vague.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The description doesn't mention any specific contexts, prerequisites, or exclusions, and it doesn't reference sibling tools for comparison, leaving the agent with no usage direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/RamboRogers/fal-image-video-mcp'
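
The same request can be made from TypeScript. This is a minimal sketch assuming a runtime with a global `fetch` (for example Node 18 or later) and assuming the endpoint returns JSON; `serverUrl` and `fetchServer` are hypothetical helper names, and the response is typed as `unknown` because its shape is not described here.

```typescript
const API_BASE = 'https://glama.ai/api/mcp/v1/servers';

// Build the directory URL for a given owner and server name.
function serverUrl(owner: string, name: string): string {
  return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Fetch the server record; the response shape is not assumed.
async function fetchServer(owner: string, name: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, name));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Usage (performs a network request):
// fetchServer('RamboRogers', 'fal-image-video-mcp').then(console.log);
```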

If you have feedback or need assistance with the MCP directory API, please join our Discord server.