generate_image

Create images from text descriptions using Google Gemini AI, with options for aspect ratios, visual references, styles, and watermarks to generate custom visual content.

Instructions

Create a new image using Google Gemini AI from a text description, optionally providing reference images to guide the result. Use the edit_image tool when you need to modify an existing asset.

Input Schema

description (required)
  Detailed description of the image to generate. For better social media results, include details about colors, style, and composition.

images (optional)
  Array of image file paths to use as visual context (absolute or relative).

watermarkPosition (optional; default: bottom-right)
  Watermark position when using `watermarkPath`.

aspectRatio (optional; default: square)
  Aspect ratio preset (square/landscape/portrait).

style (optional)
  Additional style for the image. Examples: "minimalist", "colorful", "professional", "artistic".

outputPath (optional)
  Path where the image is saved. If not specified, the image is saved in the current directory. Can be a folder or a complete path with a filename.

watermarkPath (optional)
  Path to a watermark image file to overlay in a corner.
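An arguments object exercising most of these parameters might look like the following sketch; every value and path here is made up purely for illustration.

```typescript
// Illustrative only: a generate_image arguments object. The description,
// style, and file paths are invented examples, not values from the server.
const exampleArgs = {
  description:
    'A minimalist flat-style illustration of a mountain sunrise in warm orange and teal tones',
  aspectRatio: 'landscape',
  style: 'minimalist',
  outputPath: './output/',
  watermarkPath: './assets/logo.png',
  watermarkPosition: 'bottom-right',
};
```

Only `description` is required; omitting `aspectRatio` and `watermarkPosition` falls back to the schema defaults of `square` and `bottom-right`.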

Implementation Reference

  • The main execution handler for the 'generate_image' tool. Validates arguments, generates image data using GeminiService, saves the image using ImageService, and returns the file path.
    export async function handleGenerateImage(
      args: GenerateImageArgs,
      geminiService: GeminiService,
      imageService: ImageService
    ) {
      if (!args.description || !args.description.trim()) {
        throw invalidParams('Description is required to generate an image');
      }
    
      try {
        const imageData = await geminiService.generateImage(args);
    
        const filePath = await imageService.saveImage(imageData, {
          outputPath: args.outputPath,
          description: args.description,
          watermarkPath: args.watermarkPath,
          watermarkPosition: args.watermarkPosition
        });
    
        return {
          content: [
            {
              type: 'text',
              text: filePath,
            },
          ],
        };
      } catch (error) {
        throw ensureMcpError(error, ErrorCode.InternalError, 'Failed to generate image', {
          stage: 'generate_image.tool',
        });
      }
    }
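The handler leans on two error helpers, `invalidParams` and `ensureMcpError`, whose definitions are not shown on this page. A plausible sketch follows, assuming a JSON-RPC-style error class; the `McpError` class and the `ErrorCode` values below are illustrative, not the SDK's actual definitions.

```typescript
// Illustrative error class and codes; the real ones come from the MCP SDK.
class McpError extends Error {
  constructor(
    public code: number,
    message: string,
    public data?: Record<string, unknown>
  ) {
    super(message);
    this.name = 'McpError';
  }
}

// JSON-RPC-style error codes (values assumed).
const ErrorCode = { InvalidParams: -32602, InternalError: -32603 } as const;

function invalidParams(message: string): McpError {
  return new McpError(ErrorCode.InvalidParams, message);
}

// Pass an existing McpError through untouched; otherwise wrap the unknown
// error, keeping its original message for diagnostics.
function ensureMcpError(
  error: unknown,
  code: number,
  fallbackMessage: string,
  data?: Record<string, unknown>
): McpError {
  if (error instanceof McpError) return error;
  const detail =
    error instanceof Error ? `${fallbackMessage}: ${error.message}` : fallbackMessage;
  return new McpError(code, detail, data);
}
```

The pass-through branch matters: without it, a precise `invalidParams` error thrown inside the `try` block would be re-wrapped as a generic internal error.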
  • The input schema definition for the 'generate_image' tool, defining parameters like description, images, aspectRatio, style, etc.
    inputSchema: {
      type: 'object',
      properties: {
        description: {
          type: 'string',
          description: 'Detailed description of the image to generate. For better social media results, include details about colors, style and composition.',
        },
        images: {
          type: 'array',
          items: { type: 'string' },
          description: 'Optional array of image file paths to use as visual context (absolute or relative).',
        },
        watermarkPosition: {
          type: 'string',
          enum: ['top-left', 'top-right', 'bottom-left', 'bottom-right'],
          description: 'Optional watermark position when using `watermarkPath`.',
          default: 'bottom-right',
        },
        aspectRatio: {
          type: 'string',
          enum: ['square', 'landscape', 'portrait'],
          description: 'Aspect ratio preset (square/landscape/portrait).',
          default: 'square',
        },
        style: {
          type: 'string',
          description: 'Additional style for the image (optional). Examples: "minimalist", "colorful", "professional", "artistic"',
        },
        outputPath: {
          type: 'string',
          description: 'Path where to save the image (optional). If not specified, saves in current directory. Can be a folder or complete path with filename.',
        },
        watermarkPath: {
          type: 'string',
          description: 'Path to watermark image file to overlay in a corner (optional)',
        },
      },
      required: ['description'],
    },
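The `GenerateImageArgs` type consumed by the handler is not shown on this page; assuming it mirrors the schema above, it would look roughly like this. The `withDefaults` helper is hypothetical, added only to illustrate how the schema's declared defaults apply.

```typescript
type WatermarkPosition = 'top-left' | 'top-right' | 'bottom-left' | 'bottom-right';
type AspectRatio = 'square' | 'landscape' | 'portrait';

// Hypothetical shape mirroring the JSON Schema above; only `description`
// is required.
interface GenerateImageArgs {
  description: string;
  images?: string[];
  watermarkPosition?: WatermarkPosition;
  aspectRatio?: AspectRatio;
  style?: string;
  outputPath?: string;
  watermarkPath?: string;
}

// Illustrative helper applying the schema's declared defaults; caller-supplied
// values win because the spread comes last.
function withDefaults(args: GenerateImageArgs): GenerateImageArgs {
  return { watermarkPosition: 'bottom-right', aspectRatio: 'square', ...args };
}
```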
  • src/index.ts:58-62 (registration)
    Registration of the 'generate_image' tool in the MCP server's listTools request handler, where the tools array includes generateImageTool.
    this.server.setRequestHandler(ListToolsRequestSchema, async () => {
      return {
        tools: [generateImageTool, editImageTool],
      };
    });
  • src/index.ts:66-69 (registration)
    Dispatch/handling registration in the MCP server's callTool request handler, routing 'generate_image' calls to the handleGenerateImage function.
    if (request.params.name === 'generate_image') {
      const args = request.params.arguments as unknown as GenerateImageArgs;
      return await handleGenerateImage(args, this.geminiService, this.imageService);
    }
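Presumably the surrounding handler also routes `edit_image` and rejects unrecognized tool names. A minimal, generic sketch of that dispatch pattern, with illustrative names and error shape:

```typescript
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

// Generic version of the routing above: look up a handler by tool name and
// reject unknown names. The registry and error message are illustrative.
async function dispatchTool(
  handlers: Record<string, ToolHandler>,
  name: string,
  args: Record<string, unknown>
): Promise<unknown> {
  const handler = handlers[name];
  if (!handler) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return handler(args);
}
```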
  • Core helper function in GeminiService that constructs the prompt, assembles image parts, calls Google Gemini API for image generation, and extracts the base64 image data.
    private async _generateImageInternal(args: GenerateImageArgs, helperPath: string | null): Promise<ImageData> {
        // Build optimized prompt for image generation
        let fullPrompt = `${args.description}`;
    
        // Add style if specified
        if (args.style) {
            fullPrompt += ` The style should be ${args.style}.`;
        }
    
        if (helperPath) {
            // Trim any trailing period first so the appended sentence does not
            // produce a doubled "." when a style clause was added above.
            fullPrompt = fullPrompt.replace(/\.\s*$/, '');
            fullPrompt += '. Use the white image only as a guide for the aspect ratio.';
        }
    
        const model = this.genAI.getGenerativeModel({
            model: 'gemini-3-pro-image-preview',
            safetySettings: this.getSafetySettings()
        });
    
        // If images are provided as context, attach them as inline parts
        const parts: any[] = [{ text: fullPrompt }];
    
        if (args.images) {
            for (const userImage of args.images) {
                parts.push(await toInlinePart(userImage));
            }
        }
    
        if (helperPath) {
            parts.push(await toInlinePart(helperPath));
        }
    
        let response;
        try {
            response = await model.generateContent(parts);
        } catch (error) {
            throw ensureMcpError(error, ErrorCode.InternalError, 'Gemini image generation request failed', {
                stage: 'GeminiService.generateContent',
            });
        }
    
        // Extract image from response
        const candidate = response.response.candidates?.[0];
        if (!candidate?.content?.parts) {
            const finishReason = candidate?.finishReason ?? 'unknown';
    
            throw internalError(`Gemini finish reason: ${String(finishReason)}`, {
                reason: 'emptyCandidate',
                finishReason,
            });
        }
    
        for (const part of candidate.content.parts) {
            if (part.inlineData?.data && part.inlineData?.mimeType) {
                return {
                    base64: part.inlineData.data,
                    mimeType: part.inlineData.mimeType,
                };
            }
        }
    
        throw internalError('Gemini response did not contain image data', {
            reason: 'missingInlineData',
        });
    }
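The `toInlinePart` helper is referenced above but not shown. Assuming it reads a file from disk and wraps it as a Gemini `inlineData` part, it might look like this sketch; the MIME lookup table is illustrative and deliberately incomplete.

```typescript
import { readFile } from 'node:fs/promises';
import { extname } from 'node:path';

// Plausible sketch of toInlinePart: read an image file and wrap it as an
// inlineData part for the Gemini API. MIME mapping is an assumption.
const MIME_BY_EXT: Record<string, string> = {
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.webp': 'image/webp',
};

async function toInlinePart(filePath: string) {
  const mimeType =
    MIME_BY_EXT[extname(filePath).toLowerCase()] ?? 'application/octet-stream';
  const data = (await readFile(filePath)).toString('base64');
  return { inlineData: { data, mimeType } };
}
```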
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. While it mentions the tool creates images and can use reference images, it lacks critical behavioral details like whether this is a read/write operation, potential rate limits, authentication requirements, error handling, or what the output looks like (e.g., file path, image data). For a generative AI tool with no annotation coverage, this is a significant gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise with only two sentences that each earn their place. The first sentence states the core purpose and key optional feature, while the second provides crucial sibling tool differentiation. There's zero wasted text and it's front-loaded with the most important information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (generative AI with 7 parameters) and lack of both annotations and output schema, the description is incomplete. While it covers purpose and sibling differentiation well, it doesn't address behavioral aspects, output format, or error conditions that would be important for an AI agent to use this tool effectively. The 100% schema coverage helps but doesn't compensate for missing behavioral context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 7 parameters thoroughly. The description adds minimal value beyond the schema by mentioning 'optionally providing reference images' (implied by the images parameter) and the sibling tool reference. It doesn't provide additional parameter semantics beyond what's already in the structured schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Create a new image') and resources ('using Google Gemini AI from a text description'), and distinguishes it from its sibling tool ('Use the `edit_image` tool when you need to modify an existing asset'). This provides immediate clarity about what this tool does versus alternatives.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly provides usage guidance by stating when to use this tool ('Create a new image') versus when to use the alternative ('Use the `edit_image` tool when you need to modify an existing asset'). This gives clear context for tool selection without ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
