
Gemini MCP Server for Claude Desktop

generate_image

Create images from text descriptions using Google's Gemini AI model, with optional context for style enhancement.

Instructions

Generate an image using Google's Gemini 2.0 Flash Experimental model (with learned user preferences)

Input Schema

Name      Required  Description                                                                                      Default
prompt    Yes       Text description of the desired image                                                            -
context   No        Optional context for intelligent enhancement (e.g., "artistic", "photorealistic", "technical")   -

Implementation Reference

  • The main handler function (execute) for the generate_image tool. It validates inputs, optionally enhances the prompt using the intelligence system, calls the Gemini service to generate the image, saves the base64 image data to a file, learns from the interaction, and returns a text response with the file path.
    async execute(args) {
      const prompt = validateNonEmptyString(args.prompt, 'prompt');
      const context = args.context ? validateString(args.context, 'context') : null;
      log(`Generating image: "${prompt}" with context: ${context || 'general'}`, this.name);
    
      try {
        let enhancedPrompt = prompt;
        if (this.intelligenceSystem.initialized) {
          try {
            enhancedPrompt = await this.intelligenceSystem.enhancePrompt(prompt, context, this.name);
            log('Applied Tool Intelligence enhancement', this.name);
          } catch (err) {
            log(`Tool Intelligence enhancement failed: ${err.message}`, this.name);
          }
        }
    
        const formattedPrompt = `Create a detailed and high-quality image of: ${enhancedPrompt}`;
        const imageData = await this.geminiService.generateImage('IMAGE_GENERATION', formattedPrompt);
    
        if (imageData) {
          log('Successfully extracted image data', this.name);
    
          ensureDirectoryExists(config.OUTPUT_DIR, this.name);
    
          const timestamp = Date.now();
          const hash = crypto.createHash('md5').update(prompt).digest('hex');
          const imageName = `gemini-${hash}-${timestamp}.png`;
          const imagePath = path.join(config.OUTPUT_DIR, imageName);
    
          fs.writeFileSync(imagePath, Buffer.from(imageData, 'base64'));
          log(`Image saved to: ${imagePath}`, this.name);
    
          if (this.intelligenceSystem.initialized) {
            try {
              await this.intelligenceSystem.learnFromInteraction(prompt, enhancedPrompt, `Image generated successfully: ${imagePath}`, context, this.name);
              log('Tool Intelligence learned from interaction', this.name);
            } catch (err) {
              log(`Tool Intelligence learning failed: ${err.message}`, this.name);
            }
          }
    
          let finalResponse = `✓ Image successfully generated from prompt: "${prompt}"\n\nYou can find the image at: ${imagePath}`; // eslint-disable-line max-len
          if (context && this.intelligenceSystem.initialized) {
            finalResponse += `\n\n---\n_Enhancement applied based on context: ${context}_`;
          }
    
          return {
            content: [
              {
                type: 'text',
                text: finalResponse,
              },
            ],
          };
        }
        log('No image data found in response', this.name);
        return {
          content: [
            {
              type: 'text',
              text: `Could not generate image for: "${prompt}". No image data was returned by Gemini API.`,
            },
          ],
        };
      } catch (error) {
        log(`Error generating image: ${error.message}`, this.name);
        throw new Error(`Error generating image: ${error.message}`);
      }
    }
  • JSON schema defining the input parameters for the generate_image tool: required 'prompt' and optional 'context'.
    {
      type: 'object',
      properties: {
        prompt: {
          type: 'string',
          description: 'Text description of the desired image',
        },
        context: {
          type: 'string',
          description: 'Optional context for intelligent enhancement (e.g., "artistic", "photorealistic", "technical")',
        },
      },
      required: ['prompt'],
    },
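Assuming this schema, a minimal hand-rolled argument check (a sketch, not the server's actual `validateNonEmptyString`/`validateString` helpers, whose implementations are not shown) might look like:

```javascript
// Validates arguments against the schema above: 'prompt' is a required
// non-empty string, 'context' is an optional string. Returns a normalized
// object with context defaulted to null when absent.
function validateArgs(args) {
  if (typeof args?.prompt !== 'string' || args.prompt.trim() === '') {
    throw new Error("'prompt' is required and must be a non-empty string");
  }
  if (args.context !== undefined && typeof args.context !== 'string') {
    throw new Error("'context' must be a string when provided");
  }
  return { prompt: args.prompt, context: args.context ?? null };
}

console.log(validateArgs({ prompt: 'a castle at dusk', context: 'artistic' }));
```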
  • Registers the ImageGenerationTool instance in the central tool registry using the registerTool function.
    registerTool(new ImageGenerationTool(intelligenceSystem, geminiService));
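The `registerTool` implementation itself is not shown in this reference; a plausible minimal registry, keyed by tool name and rejecting duplicates, could be sketched as:

```javascript
// Hypothetical central tool registry; the server's real registerTool
// may differ. Tools are keyed by their 'name' property.
const toolRegistry = new Map();

function registerTool(tool) {
  if (toolRegistry.has(tool.name)) {
    throw new Error(`Tool already registered: ${tool.name}`);
  }
  toolRegistry.set(tool.name, tool);
}

registerTool({ name: 'generate_image', execute: async () => ({ content: [] }) });
console.log(toolRegistry.has('generate_image')); // true
```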
  • Helper method in GeminiService that performs the actual image generation by calling the Gemini API with the provided model and prompt, and extracts base64 image data from the response.
    async generateImage(modelType, prompt) {
      try {
        const modelConfig = getGeminiModelConfig(modelType);
        // Pass only the model name to getGenerativeModel
        const model = this.genAI.getGenerativeModel({ model: modelConfig.model });
        const content = formatTextPrompt(prompt); // Image generation also uses text prompt
    
        // Pass the generationConfig to the generateContent method
        const result = await model.generateContent({
          contents: [{ parts: content }],
          generationConfig: modelConfig.generationConfig,
        });
        log(`Image generation response received from Gemini API for model type: ${modelType}`, 'gemini-service');
        return extractImageData(result.response?.candidates?.[0]);
      } catch (error) {
        log(`Error generating image with Gemini API for model type ${modelType}: ${error.message}`, 'gemini-service');
        throw new Error(`Gemini image generation failed: ${error.message}`);
      }
    }
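The `extractImageData` helper called above is not reproduced in this reference. A hedged sketch, assuming the response shape used by Google's generative AI SDK (inline base64 bytes under `parts[i].inlineData`), might be:

```javascript
// Hypothetical sketch of extractImageData: walks a candidate's content parts
// and returns the first inline base64 image payload, or null if none exists.
// Assumes parts carry images as inlineData = { mimeType, data }.
function extractImageData(candidate) {
  if (!candidate || !Array.isArray(candidate.content?.parts)) return null;
  for (const part of candidate.content.parts) {
    if (part.inlineData?.mimeType?.startsWith('image/')) {
      return part.inlineData.data; // base64-encoded image bytes
    }
  }
  return null;
}
```

Returning null (rather than throwing) when no image part is present is what lets the handler's `if (imageData)` branch fall through to its "No image data found" response.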
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the model and 'learned user preferences', but fails to detail critical aspects such as rate limits, authentication requirements, output format (e.g., image type, size), or potential costs/limitations. This leaves significant gaps for an AI agent to understand the tool's behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary details. It is front-loaded with the core action and model specification, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of image generation (involving models, preferences, and output handling), the description is insufficient. With no annotations and no output schema, it lacks details on behavioral traits, return values, or error handling. This makes it incomplete for effective tool invocation by an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the input schema already documents both parameters ('prompt' and 'context') adequately. The description adds no additional parameter semantics beyond what the schema provides, such as examples or constraints, resulting in the baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('generate an image') and specifies the model used ('Google's Gemini 2.0 Flash Experimental model'), which distinguishes it from siblings like 'gemini-edit-image' or 'gemini-analyze-image'. However, it doesn't explicitly contrast with all siblings (e.g., 'gemini-advanced-image'), keeping it from a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives is provided. The description mentions 'learned user preferences' but doesn't clarify how this affects tool selection or when to choose it over other image-related tools like 'gemini-advanced-image' or 'gemini-edit-image'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Garblesnarff/gemini-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server