Gemini MCP Server for Claude Desktop

gemini-advanced-image

Generate images with advanced AI capabilities: blend multiple images, maintain character consistency, apply precise edits, or follow templates using Google's Gemini models.

Instructions

Generate advanced images with Gemini 2.5 Flash Image: multi-image fusion, character consistency, targeted editing, and template adherence

Input Schema

TableJSON Schema

Name	Required	Description
`prompt`	Yes	Text description of the desired image or editing instruction
`mode`	No	Generation mode: fusion (blend multiple images), consistency (maintain character/style), targeted_edit (precise edits), template (follow layout), standard (basic generation)
`reference_images`	No	Optional array of file paths to reference images for fusion, consistency, or template modes
`context`	No	Optional context for intelligent enhancement (e.g., "fusion", "consistency", "artistic")

Implementation Reference

src/tools/advanced-image.js:63-284 (handler)

The `execute` method implements the core logic of the `gemini-advanced-image` tool. It validates inputs, processes reference images, enhances prompts using intelligence system, generates images via OpenRouter or Gemini API in various modes (fusion, consistency, etc.), saves the output, and returns formatted response.

async execute(args) {
  const prompt = validateNonEmptyString(args.prompt, 'prompt');
  const mode = args.mode || 'standard';
  const referenceImagePaths = args.reference_images || [];
  const context = args.context ? validateString(args.context, 'context') : null;

  log(`Generating advanced image with mode: "${mode}", prompt: "${prompt}", context: ${context || 'general'}`, this.name);

  try {
    // Validate reference images if provided
    const referenceImages = [];
    if (referenceImagePaths.length > 0) {
      log(`Processing ${referenceImagePaths.length} reference images for fusion mode`, this.name);
      
      for (let i = 0; i < referenceImagePaths.length; i++) {
        const imagePath = referenceImagePaths[i];
        log(`Processing reference image ${i + 1}/${referenceImagePaths.length}: ${imagePath}`, this.name);
        
        try {
          // Check if file path is absolute
          if (!path.isAbsolute(imagePath)) {
            throw new Error(`File path must be absolute, got relative path: ${imagePath}`);
          }
          
          // Check file existence before attempting operations
          if (!fs.existsSync(imagePath)) {
            throw new Error(`Reference image file not found: ${imagePath}`);
          }
          
          // Validate file size
          log(`Validating file size for: ${imagePath}`, this.name);
          validateFileSize(imagePath, config.MAX_IMAGE_SIZE_MB);
          
          // Read file as buffer
          log(`Reading file as buffer: ${imagePath}`, this.name);
          const imageBuffer = readFileAsBuffer(imagePath);
          
          // Get MIME type
          log(`Determining MIME type for: ${imagePath}`, this.name);
          const mimeType = getMimeType(imagePath, config.SUPPORTED_IMAGE_MIMES);
          
          referenceImages.push({
            data: imageBuffer.toString('base64'),
            mimeType,
          });
          
          log(`✓ Successfully loaded reference image ${i + 1}: ${imagePath} (${(imageBuffer.length / 1024).toFixed(2)}KB, ${mimeType})`, this.name);
        } catch (fileError) {
          const detailedError = `Failed to process reference image ${i + 1} (${imagePath}): ${fileError.message}`;
          log(detailedError, this.name);
          throw new Error(`Reference Image Error: ${detailedError}`);
        }
      }
      
      log(`✓ Successfully processed all ${referenceImages.length} reference images`, this.name);
    }

    // Validate mode requirements
    if (['fusion', 'consistency', 'template'].includes(mode) && referenceImages.length === 0) {
      throw new Error(`Mode "${mode}" requires at least one reference image`);
    }

    if (mode === 'fusion' && referenceImages.length < 2) {
      throw new Error('Fusion mode requires at least 2 reference images');
    }

    let enhancedPrompt = prompt;
    if (this.intelligenceSystem.initialized) {
      try {
        const contextForEnhancement = context || mode;
        enhancedPrompt = await this.intelligenceSystem.enhancePrompt(prompt, contextForEnhancement, this.name);
        log('Applied Tool Intelligence enhancement', this.name);
      } catch (err) {
        log(`Tool Intelligence enhancement failed: ${err.message}`, this.name);
      }
    }

    // Try OpenRouter first if available, then fall back to Gemini API
    let imageData = null;
    let providerUsed = 'Gemini API';
    let openRouterError = null;

    if (openRouterService.isServiceAvailable()) {
      try {
        log('Attempting image generation with OpenRouter (free tier)', this.name);
        imageData = await openRouterService.generateAdvancedImage(
          'ADVANCED_IMAGE_GENERATION',
          enhancedPrompt,
          referenceImages,
          { mode }
        );
        providerUsed = 'OpenRouter (free)';
        log('Successfully generated image using OpenRouter', this.name);
      } catch (error) {
        openRouterError = error;
        log(`OpenRouter failed: ${error.message}. Falling back to Gemini API`, this.name);
      }
    }

    // Fall back to Gemini API if OpenRouter failed or is unavailable
    if (!imageData) {
      try {
        log('Using Gemini API for image generation', this.name);
        imageData = await this.geminiService.generateAdvancedImage(
          'ADVANCED_IMAGE_GENERATION',
          enhancedPrompt,
          referenceImages,
          { mode }
        );
        providerUsed = 'Gemini API';
      } catch (geminiError) {
        log(`Gemini API also failed: ${geminiError.message}`, this.name);
        
        // Create comprehensive error message
        let errorMessage = 'Advanced image generation failed on all available providers.';
        if (openRouterError) {
          errorMessage += ` OpenRouter: ${openRouterError.message}.`;
        }
        errorMessage += ` Gemini API: ${geminiError.message}`;
        
        throw new Error(errorMessage);
      }
    }

    if (imageData) {
      log('Successfully extracted advanced image data', this.name);

      ensureDirectoryExists(config.OUTPUT_DIR, this.name);

      const timestamp = Date.now();
      const hash = crypto.createHash('md5').update(prompt + mode).digest('hex');
      const providerPrefix = providerUsed.includes('OpenRouter') ? 'openrouter' : 'gemini';
      const imageName = `${providerPrefix}-advanced-${mode}-${hash}-${timestamp}.png`;
      const imagePath = path.join(config.OUTPUT_DIR, imageName);

      fs.writeFileSync(imagePath, Buffer.from(imageData, 'base64'));
      log(`Advanced image saved to: ${imagePath}`, this.name);

      if (this.intelligenceSystem.initialized) {
        try {
          const resultDescription = `Advanced image generated (${mode} mode): ${imagePath}. Reference images: ${referenceImagePaths.length}`;
          await this.intelligenceSystem.learnFromInteraction(
            prompt,
            enhancedPrompt,
            resultDescription,
            context || mode,
            this.name
          );
          log('Tool Intelligence learned from interaction', this.name);
        } catch (err) {
          log(`Tool Intelligence learning failed: ${err.message}`, this.name);
        }
      }

      let finalResponse = `✓ Advanced image successfully generated using ${providerUsed}\n\n**Mode:** ${mode}\n**Prompt:** "${prompt}"\n**Output:** ${imagePath}`;
      
      if (referenceImages.length > 0) {
        finalResponse += `\n**Reference Images:** ${referenceImagePaths.length} image(s)`;
      }

      if (context && this.intelligenceSystem.initialized) {
        finalResponse += `\n\n---\n_Enhancement applied based on context: ${context}_`;
      }

      // Add provider information
      if (providerUsed.includes('OpenRouter')) {
        finalResponse += `\n\n💰 **Cost**: Free (via OpenRouter free tier)`;
      } else {
        finalResponse += `\n\n💰 **Cost**: ~$0.039 (via Gemini API)`;
      }

      // Add mode-specific information
      switch (mode) {
        case 'fusion':
          finalResponse += `\n\n**Fusion Details:** Blended ${referenceImages.length} images into a cohesive result`;
          break;
        case 'consistency':
          finalResponse += `\n\n**Consistency Details:** Maintained character/style from reference images`;
          break;
        case 'targeted_edit':
          finalResponse += `\n\n**Edit Details:** Applied precise, targeted modifications`;
          break;
        case 'template':
          finalResponse += `\n\n**Template Details:** Followed layout and structure from reference`;
          break;
      }

      return {
        content: [
          {
            type: 'text',
            text: finalResponse,
          },
        ],
      };
    }
    
    log('No image data found in advanced image response', this.name);
    return {
      content: [
        {
          type: 'text',
          text: `Could not generate advanced image with mode "${mode}" for prompt: "${prompt}". No image data was returned by Gemini 2.5 Flash Image API.`,
        },
      ],
    };
  } catch (error) {
    log(`Error generating advanced image: ${error.message}`, this.name);
    
    // Categorize errors for better user feedback
    if (error.message.includes('Reference Image Error:')) {
      // File processing error - provide helpful guidance
      throw new Error(`${error.message}\n\nTroubleshooting:\n• Ensure all file paths are absolute (start with /)\n• Verify files exist and are readable\n• Check file formats are supported: ${Object.keys(config.SUPPORTED_IMAGE_MIMES).join(', ')}`);
    } else if (error.message.includes('Mode') && error.message.includes('requires')) {
      // Mode validation error
      throw new Error(`${error.message}\n\nNote: Fusion mode requires at least 2 reference images, consistency/template modes require at least 1.`);
    } else {
      // API or other errors - preserve original message but add context
      throw new Error(`Advanced image generation failed: ${error.message}\n\nIf this is a repeated error, try using standard mode without reference images.`);
    }
  }
}

src/tools/advanced-image.js:23-48 (schema)

JSON schema defining the input parameters for the `gemini-advanced-image` tool, including prompt (required), mode, reference_images, and context.

{
  type: 'object',
  properties: {
    prompt: {
      type: 'string',
      description: 'Text description of the desired image or editing instruction',
    },
    mode: {
      type: 'string',
      enum: ['fusion', 'consistency', 'targeted_edit', 'template', 'standard'],
      description: 'Generation mode: fusion (blend multiple images), consistency (maintain character/style), targeted_edit (precise edits), template (follow layout), standard (basic generation)',
    },
    reference_images: {
      type: 'array',
      items: {
        type: 'string',
      },
      description: 'Optional array of file paths to reference images for fusion, consistency, or template modes',
    },
    context: {
      type: 'string',
      description: 'Optional context for intelligent enhancement (e.g., "fusion", "consistency", "artistic")',
    },
  },
  required: ['prompt'],
},

src/tools/index.js:80-80 (registration)
Registration of the `AdvancedImageTool` instance (named 'gemini-advanced-image') in the central tool registry.
```
registerTool(new AdvancedImageTool(intelligenceSystem, geminiService));
```

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It mentions advanced features like fusion and consistency but doesn't explain operational details: whether it requires authentication, has rate limits, what happens with invalid inputs, or the format/quality of outputs. For a complex image generation tool with multiple modes, this leaves significant gaps in understanding how it behaves.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose ('Generate advanced images') and lists key capabilities without unnecessary words. Every phrase earns its place by highlighting distinct features, making it easy to scan and understand quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 4 parameters, no annotations, and no output schema, the description is incomplete. It doesn't address critical context like output format (e.g., image file, URL), error handling, or usage constraints. The lack of behavioral details and guidelines leaves the agent under-informed about how to effectively invoke this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, providing good documentation for all parameters. The description adds marginal value by hinting at parameter usage through mode names (e.g., 'multi-image fusion' relates to 'fusion' mode and 'reference_images'), but doesn't explain semantics beyond what the schema already covers. Baseline 3 is appropriate since the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Generate advanced images with Gemini 2.5 Flash Image' followed by specific capabilities like multi-image fusion, character consistency, targeted editing, and template adherence. It distinguishes itself from basic image generation tools by emphasizing 'advanced' features, though it doesn't explicitly differentiate from sibling tools like 'generate_image' or 'gemini-edit-image'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists capabilities but provides no guidance on when to use this tool versus alternatives. It doesn't mention when to choose this over 'generate_image' for basic needs or 'gemini-edit-image' for simpler edits, nor does it specify prerequisites like needing reference images for certain modes. Usage is implied through mode descriptions but not explicitly stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

Lightport: Open-Sourcing Glama's AI Gateway
By punkpeye on April 27, 2026.
open source
OpenAI
Tool Definition Quality Score (TDQS)
By punkpeye on April 3, 2026.
mcp
The Hackers Who Tracked My Sleep Cycle
By punkpeye on March 26, 2026.
security

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Garblesnarff/gemini-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server