gemini-image-mcp

generate_image

Create custom images from text prompts using AI, with options to save in various formats, resize, and apply compression settings for optimal file size.

Instructions

Generates an image based on a prompt and saves it to the specified path.

Input Schema


Name	Required	Description	Default
`file_name`	No	The name of the image file to be saved (without extension). Defaults to 'generated_image'.	generated_image
`force_conversion_type`	No	Optionally force conversion to a specific format ('jpeg', 'webp', 'png'). If not specified, the original format will be processed, defaulting to PNG for non-JPEG images.
`input_image_paths`	No	Optional. A list of file paths for input images to be used as a reference for generation.
`jpeg_quality`	No	JPEG quality (0-100). Lower values result in higher compression. Defaults to 80.	80
`optipng_optimization_level`	No	OptiPNG optimization level (0-7). Higher values result in higher compression. Defaults to 2.	2
`output_directory`	No	The directory path to save the image. Defaults to 'output/images'.	output/images
`png_compression_level`	No	PNG compression level (0-9). Higher values result in higher compression. Defaults to 9.	9
`prompt`	Yes	Text prompt for image generation. If input images are provided, include instructions on how to use them to create the new image. English is recommended.
`skip_compression_and_resizing`	No	Whether to skip compression and resizing of the generated image. If true, `force_conversion_type` and `target_image_max_size` are ignored. Defaults to false.	false
`target_image_max_size`	No	The maximum length (in pixels) of the longest side of the resized image. The original aspect ratio is maintained. Defaults to 512.	512
`use_enhanced_prompt`	No	Whether to use an enhanced prompt to assist the AI's instructions. Defaults to true.	true
`webp_quality`	No	WebP quality (0-100). Lower values result in higher compression. Defaults to 80.	80
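
As a rough illustration of how a client might invoke this tool, the sketch below builds an MCP `tools/call` request that overrides a few of the parameters listed in the table. The helper name `buildGenerateImageCall`, the request id, and the prompt text are placeholders chosen for this example and are not part of the server.

```typescript
// Arguments accepted by generate_image, mirroring the parameter table.
// Only `prompt` is required; omitted fields fall back to the server defaults.
interface GenerateImageArgs {
  prompt: string;
  file_name?: string;
  output_directory?: string;
  input_image_paths?: string[];
  force_conversion_type?: "jpeg" | "webp" | "png";
  skip_compression_and_resizing?: boolean;
  target_image_max_size?: number;
  use_enhanced_prompt?: boolean;
  jpeg_quality?: number;
  webp_quality?: number;
  png_compression_level?: number;
  optipng_optimization_level?: number;
}

// Hypothetical helper: wraps the arguments in a JSON-RPC 2.0 "tools/call" request.
function buildGenerateImageCall(requestId: number, args: GenerateImageArgs) {
  return {
    jsonrpc: "2.0",
    id: requestId,
    method: "tools/call",
    params: { name: "generate_image", arguments: args },
  };
}

// Example: a WebP capped at 1024px on the longest side, quality 70;
// every other parameter is left to its default.
const exampleCall = buildGenerateImageCall(1, {
  prompt: "A watercolor painting of a lighthouse at dawn",
  file_name: "lighthouse",
  force_conversion_type: "webp",
  target_image_max_size: 1024,
  webp_quality: 70,
});

console.log(JSON.stringify(exampleCall, null, 2));
```

Because `skip_compression_and_resizing` is left unset here, the format and size options take effect; setting it to true would cause them to be ignored.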

Implementation Reference

src/tools/imageGenerationTool.ts:197-383 (handler)

The generateImageTool object definition, where the 'execute' function implements the core handler logic: loads input images if provided, constructs prompt, generates image via Gemini API, processes/compresses/resizes the output, and saves to file.

export const generateImageTool = {
  name: 'generate_image',
  description: 'Generates an image based on a prompt and saves it to the specified path.',
  input_schema: generateImageInputSchema,
  execute: async (args: z.infer<typeof generateImageInputSchema>) => { // args now also carries skip_compression_and_resizing
    try {
      const {
        prompt,
        output_directory,
        file_name,
        input_image_paths,
        use_enhanced_prompt,
        target_image_max_size,
        force_conversion_type,
        webp_quality,
        jpeg_quality,
        png_compression_level,
        optipng_optimization_level
      } = args;

      let imageParts: InlineDataPart[] = [];
      if (input_image_paths && input_image_paths.length > 0) {
        console.log(`Input image paths received: ${input_image_paths.join(', ')}`);
        imageParts = await Promise.all(
          input_image_paths.map(async (filePath) => {
            try {
              console.log(`Processing image file: ${filePath}`);
              const fileBuffer = await readFile(filePath);
              const base64Data = fileBuffer.toString('base64');
              const extension = path.extname(filePath).toLowerCase().substring(1);

              const mimeTypeMap: { [key: string]: string } = {
                'png': MIME_TYPES.PNG,
                'jpg': MIME_TYPES.JPEG,
                'jpeg': MIME_TYPES.JPEG,
                'webp': MIME_TYPES.WEBP,
              };

              const resolvedMimeType = mimeTypeMap[extension] || MIME_TYPES.OCTET_STREAM;
              
              console.log(`File ${filePath} processed. MimeType: ${resolvedMimeType}`);
              return {
                inlineData: {
                  data: base64Data,
                  mimeType: resolvedMimeType,
                },
              };
            } catch (error) {
              console.error(`Error processing image file ${filePath}:`, error);
              throw new Error(`Failed to read or process input image file: ${filePath}. ${error instanceof Error ? error.message : String(error)}`);
            }
          })
        );
        console.log(`Successfully prepared ${imageParts.length} image parts.`);
      }

      // Create the output directory if it does not exist
      await mkdir(output_directory, { recursive: true });

      let processedPrompt: string;
      if (use_enhanced_prompt) {
        let selectedPromptTemplate: string;
        if (imageParts.length > 0) {
          selectedPromptTemplate = ASSISTANT_PROMPT_TEMPLATE_WITH_IMAGES;
        } else {
          selectedPromptTemplate = ASSISTANT_PROMPT_TEMPLATE_NO_IMAGES;
        }
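        // Use a replacer function so `$` sequences in the user's prompt (e.g. `$&`) are inserted literally
        processedPrompt = selectedPromptTemplate.replace('{{USER_PROMPT}}', () => prompt);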
      } else {
        processedPrompt = prompt;
      }

      const textPart: Part = { text: processedPrompt };
      let allParts: Part[];

      if (imageParts.length > 0) {
        // When input images are present, place the image parts after the text prompt
        allParts = [textPart, ...imageParts];
      } else {
        allParts = [textPart];
      }

      const response: GenerateContentResponse = await ai.models.generateContent({
        model: IMAGE_GENERATION_MODEL,
        contents: [{ role: "user", parts: allParts }],
        config: {
          // temperature: 0.8, // adjust as needed
          responseModalities: [Modality.IMAGE, Modality.TEXT], // expect both image and text in the response
          // numberOfImages: 1, // generateContent may not support this directly, so it depends on the model's defaults and behavior; check the API spec if needed
        } as GenerationConfig // explicit type assertion (as needed)
      });

      if (response.candidates && response.candidates[0]?.content?.parts) {
        let imageData: string | undefined;
        let imageMimeType: string | undefined;
        for (const part of response.candidates[0].content.parts) {
          if (part.inlineData && part.inlineData.mimeType?.startsWith('image/') && part.inlineData.data) {
            imageData = part.inlineData.data;
            imageMimeType = part.inlineData.mimeType;
            break; // stop at the first image part found
          }
        }

        if (imageData && imageMimeType) { // imageMimeType must be present as well
          const imageBuffer = Buffer.from(imageData, 'base64');
          const meta = await sharp(imageBuffer).metadata();
          
          const originalSizeKB = (imageBuffer.length / 1024).toFixed(2);

          let finalImageBuffer: Buffer;
          let finalExtension: typeof EXTENSIONS.JPG | typeof EXTENSIONS.PNG | typeof EXTENSIONS.WEBP;
          let processedSizeKB: string;
          let baseMessage: string; // message text without the output path

          if (args.skip_compression_and_resizing) {
            // Skip compression and resizing
            if (imageMimeType === MIME_TYPES.PNG) {
              finalExtension = EXTENSIONS.PNG;
            } else if (imageMimeType === MIME_TYPES.JPEG) {
              finalExtension = EXTENSIONS.JPG;
            } else if (imageMimeType === MIME_TYPES.WEBP) {
              finalExtension = EXTENSIONS.WEBP;
            } else {
              // Fallback for an unexpected MIME type
              console.warn(`Unexpected image MIME type received: ${imageMimeType}. Defaulting to JPG extension.`);
              finalExtension = EXTENSIONS.JPG;
            }
            finalImageBuffer = imageBuffer; // use the original buffer as-is
            processedSizeKB = originalSizeKB; // size is unchanged
            baseMessage = `generated (uncompressed)`;
          } else {
            // Existing compression and resizing path
            const compressionOptions: CompressionOptions = {
              jpegQuality: jpeg_quality,
              webpQuality: webp_quality,
              pngCompressionLevel: png_compression_level,
              optipngOptimizationLevel: optipng_optimization_level,
            };
            const { processedBuffer, extension } = await processAndCompressImage(imageBuffer, meta.format, force_conversion_type, target_image_max_size, compressionOptions);
            finalImageBuffer = processedBuffer;
            finalExtension = extension;
            processedSizeKB = (finalImageBuffer.length / 1024).toFixed(2);

            baseMessage = `generated and compressed`;
            const originalFormat = meta.format === 'jpeg' ? 'jpeg' : 'png'; // normalize the format returned by Gemini
            // Change the message only when the forced format differs from the original format
            if (force_conversion_type && force_conversion_type !== originalFormat) {
              baseMessage = `converted to ${extension.toUpperCase()} and compressed`;
            }
          }
          const outputPath = await getUniqueFilePath(output_directory, file_name, finalExtension);
          await writeFile(outputPath, finalImageBuffer);
          return {
            content: [
              {
                type: "text",
                text: `Image successfully ${baseMessage} at ${outputPath}.\nOriginal size: ${originalSizeKB}KB, Final size: ${processedSizeKB}KB`
              }
            ]
          };
        } else {
          const detail = response.candidates[0]?.content?.parts?.map(p => p.text || p.inlineData?.mimeType || 'unknown_part').join(', ');
          throw new Error(`No valid image data found in the response. Received parts: [${detail || 'none'}]`);
        }
      } else {
        let errorMessage = 'Image generation failed. The response may be empty or in an unexpected format.';
        if (response.promptFeedback) { // promptFeedback is at the top level of the response
          errorMessage += `\n[Feedback] ${JSON.stringify(response.promptFeedback, null, 2)}`;
        }
        const candidate = response.candidates?.[0];
        if (candidate?.finishReason) {
          errorMessage += `\n[Finish Reason] ${candidate.finishReason}`;
        }
        if (candidate?.safetyRatings) {
          errorMessage += `\n[Safety Ratings] ${JSON.stringify(candidate.safetyRatings, null, 2)}`;
        }
        throw new Error(errorMessage);
      }
    } catch (error: any) {
      console.error('Image generation error:', error);
      return {
        content: [{ type: "text", text: `An error occurred during image generation: ${error.message}` }]
      };
    }
  }
};
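
The response-handling step of the handler can be sketched in isolation as follows. This is a simplified illustration, not the server's code: `ResponsePart` and `InlineData` are cut-down stand-ins for the SDK types, the helper names are invented for this example, and the MIME strings and extensions are assumed values for the `MIME_TYPES` and `EXTENSIONS` constants, which the listing does not show.

```typescript
// Simplified stand-ins for the SDK's response part types (assumption:
// only the fields the handler reads are modelled here).
interface InlineData { data?: string; mimeType?: string }
interface ResponsePart { text?: string; inlineData?: InlineData }

type ImageExtension = "jpg" | "png" | "webp";

// Return the first part that carries base64 image data, or undefined if none does.
function findFirstImagePart(parts: ResponsePart[]): { data: string; mimeType: string } | undefined {
  for (const part of parts) {
    const data = part.inlineData?.data;
    const mimeType = part.inlineData?.mimeType;
    if (data && mimeType && mimeType.startsWith("image/")) {
      return { data, mimeType };
    }
  }
  return undefined;
}

// Map a MIME type to a file extension for the "skip compression" path.
// Unknown types fall back to "jpg", mirroring the handler's fallback.
function extensionForMimeType(mimeType: string): ImageExtension {
  switch (mimeType) {
    case "image/png":
      return "png";
    case "image/jpeg":
      return "jpg";
    case "image/webp":
      return "webp";
    default:
      return "jpg";
  }
}

// A response that contains a text part followed by an image part.
const sampleParts: ResponsePart[] = [
  { text: "Here is your image." },
  { inlineData: { data: "aGVsbG8=", mimeType: "image/png" } },
];

const firstImage = findFirstImagePart(sampleParts);
console.log(firstImage ? extensionForMimeType(firstImage.mimeType) : "no image"); // prints "png"
```

Text-only responses yield `undefined` here, which corresponds to the handler's "No valid image data found in the response" error.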

src/tools/imageGenerationTool.ts:87-100 (schema)

Zod input schema defining parameters for the generate_image tool, including prompt, file paths, compression options, etc.

export const generateImageInputSchema = z.object({
  prompt: z.string().describe('Text prompt for image generation. If input images are provided, include instructions on how to use them to create the new image. English is recommended.'),
  output_directory: z.string().default(DEFAULT_OUTPUT_DIRECTORY).describe(`The directory path to save the image. Defaults to '${DEFAULT_OUTPUT_DIRECTORY}'.`),
  file_name: z.string().default(DEFAULT_FILE_NAME).describe(`The name of the image file to be saved (without extension). Defaults to '${DEFAULT_FILE_NAME}'.`),
  input_image_paths: z.array(z.string().describe("Absolute path of the image file.")).optional().describe("Optional. A list of file paths for input images to be used as a reference for generation."),
  use_enhanced_prompt: z.boolean().default(true).describe("Whether to use an enhanced prompt to assist the AI's instructions. Defaults to true."),
  target_image_max_size: z.number().int().positive().optional().default(DEFAULT_TARGET_IMAGE_MAX_SIZE).describe(`The maximum length (in pixels) of the longest side of the resized image. The original aspect ratio is maintained. Defaults to ${DEFAULT_TARGET_IMAGE_MAX_SIZE}.`),
  force_conversion_type: z.enum(['jpeg', 'webp', 'png']).optional().describe("Optionally force conversion to a specific format ('jpeg', 'webp', 'png'). If not specified, the original format will be processed, defaulting to PNG for non-JPEG images."),
  skip_compression_and_resizing: z.boolean().optional().default(false).describe("Whether to skip compression and resizing of the generated image. If true, `force_conversion_type` and `target_image_max_size` are ignored. Defaults to false."),
  jpeg_quality: z.number().int().min(0).max(100).optional().default(DEFAULT_JPEG_QUALITY).describe(`JPEG quality (0-100). Lower values result in higher compression. Defaults to ${DEFAULT_JPEG_QUALITY}.`),
  webp_quality: z.number().int().min(0).max(100).optional().default(DEFAULT_WEBP_QUALITY).describe(`WebP quality (0-100). Lower values result in higher compression. Defaults to ${DEFAULT_WEBP_QUALITY}.`),
  png_compression_level: z.number().int().min(0).max(9).optional().default(DEFAULT_PNG_COMPRESSION_LEVEL).describe(`PNG compression level (0-9). Higher values result in higher compression. Defaults to ${DEFAULT_PNG_COMPRESSION_LEVEL}.`),
  optipng_optimization_level: z.number().int().min(0).max(7).optional().default(DEFAULT_OPTIPNG_OPTIMIZATION_LEVEL).describe(`OptiPNG optimization level (0-7). Higher values result in higher compression. Defaults to ${DEFAULT_OPTIPNG_OPTIMIZATION_LEVEL}.`),
});
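
To make the schema's defaults and ranges concrete, here is a dependency-free approximation of what parsing does to an input object. It is a hand-rolled sketch rather than Zod itself: `TOOL_DEFAULTS`, `assertIntInRange`, and `resolveToolInput` are names invented for this example, and the default values are taken from the parameter descriptions because the `DEFAULT_*` constants are not shown in the listing.

```typescript
// Defaults as stated in the parameter descriptions (assumed to match the
// DEFAULT_* constants, which are not part of the listing).
const TOOL_DEFAULTS = {
  output_directory: "output/images",
  file_name: "generated_image",
  use_enhanced_prompt: true,
  target_image_max_size: 512,
  skip_compression_and_resizing: false,
  jpeg_quality: 80,
  webp_quality: 80,
  png_compression_level: 9,
  optipng_optimization_level: 2,
};

type ToolInput = Partial<typeof TOOL_DEFAULTS> & {
  prompt: string;
  input_image_paths?: string[];
  force_conversion_type?: "jpeg" | "webp" | "png";
};

// Throws unless `value` is an integer within the inclusive range [min, max].
function assertIntInRange(label: string, value: number, min: number, max: number): void {
  if (!Number.isInteger(value) || value < min || value > max) {
    throw new RangeError(`${label} must be an integer in [${min}, ${max}], got ${value}`);
  }
}

// Hypothetical stand-in for schema parsing: fill in defaults, then check ranges.
function resolveToolInput(input: ToolInput) {
  const resolved = { ...TOOL_DEFAULTS, ...input };
  assertIntInRange("jpeg_quality", resolved.jpeg_quality, 0, 100);
  assertIntInRange("webp_quality", resolved.webp_quality, 0, 100);
  assertIntInRange("png_compression_level", resolved.png_compression_level, 0, 9);
  assertIntInRange("optipng_optimization_level", resolved.optipng_optimization_level, 0, 7);
  // The schema requires a positive integer here.
  assertIntInRange("target_image_max_size", resolved.target_image_max_size, 1, Number.MAX_SAFE_INTEGER);
  return resolved;
}

const resolvedInput = resolveToolInput({ prompt: "A red bicycle", jpeg_quality: 60 });
console.log(resolvedInput.jpeg_quality, resolvedInput.target_image_max_size); // prints "60 512"
```

An out-of-range value such as `optipng_optimization_level: 8` is rejected before any image is generated, which is the same outcome the `.min()` and `.max()` constraints in the schema are meant to produce.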

src/index.ts:18-31 (registration)

Registration of the generate_image tool on the MCP server, wrapping the execute function to format the response correctly.

server.tool(generateImageTool.name, generateImageTool.description, generateImageTool.input_schema.shape, async (args) => {
  // Passing the args object straight through to execute means this file does not need
  // to change when new arguments are added to the tool later.
  const res = await generateImageTool.execute(args);

  // Access the text property defensively, since execute may also return an error message.
  if (res && res.content && res.content.length > 0 && res.content[0].text) {
    return {
      content: [{ type: "text", text: res.content[0].text }]
    };
  }
  // Fallback for unexpected responses
  return { content: [{ type: "text", text: "Processing completed, but an unexpected response was received." }] };
});
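
The wrapper's response handling reduces to a small normalization step, sketched here without the MCP SDK. The `TextContent` and `ToolResult` interfaces and the `normalizeToolResult` name are illustrative assumptions; only the fallback message is taken from the registration code.

```typescript
interface TextContent { type: "text"; text: string }
interface ToolResult { content: TextContent[] }

const FALLBACK_TEXT = "Processing completed, but an unexpected response was received.";

// Forward the first text item if there is one; otherwise return the generic fallback.
function normalizeToolResult(res: { content?: { text?: string }[] } | null | undefined): ToolResult {
  const firstText = res?.content?.[0]?.text;
  if (firstText) {
    return { content: [{ type: "text", text: firstText }] };
  }
  return { content: [{ type: "text", text: FALLBACK_TEXT }] };
}

const forwarded = normalizeToolResult({ content: [{ text: "Image successfully generated." }] });
const fallback = normalizeToolResult({ content: [] });

console.log(forwarded.content[0].text); // prints "Image successfully generated."
console.log(fallback.content[0].text); // prints the fallback message
```

Since `execute` catches its own errors and returns them as text, a failure passes through this step looking like any other text result. A client that needs to tell success from failure would have to inspect the message itself.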

src/tools/imageGenerationTool.ts:134-195 (helper)

Helper function to process, resize, convert format, and compress the generated image using sharp and imagemin plugins.

async function processAndCompressImage(
  imageBuffer: Buffer,
  originalFormat: string | undefined, // format name reported by sharp (e.g., 'jpeg', 'png')
  conversionType: 'jpeg' | 'webp' | 'png' | undefined,
  targetImageMaxSize: number,
  options: CompressionOptions
): Promise<{ processedBuffer: Buffer; extension: typeof EXTENSIONS.JPG | typeof EXTENSIONS.PNG | typeof EXTENSIONS.WEBP }> {
  const sharpInstance = sharp(imageBuffer).resize(targetImageMaxSize, targetImageMaxSize, { fit: 'inside' });

  // Decide the final output format: use the requested one if given, otherwise keep the original (anything other than JPEG is treated as PNG)
  const targetFormat = conversionType || (originalFormat === 'jpeg' ? 'jpeg' : 'png');

  if (targetFormat === 'jpeg') {
    const resizedJpegBuffer = await sharpInstance
      .jpeg({
        quality: options.jpegQuality,
        progressive: true,
        mozjpeg: true
      })
      .toBuffer();

    const compressedData = await imagemin.buffer(resizedJpegBuffer, {
      plugins: [
        imageminMozjpeg({
          quality: options.jpegQuality,
          progressive: true
        })
      ]
    });
    const processedBuffer = Buffer.from(compressedData);
    return { processedBuffer, extension: EXTENSIONS.JPG };
  } else if (targetFormat === 'webp') {
    // Resize and do the initial WebP conversion with sharp
    const resizedWebpBuffer = await sharpInstance
      .webp({ quality: options.webpQuality }) // set the quality in sharp too for the initial conversion
      .toBuffer();

    // Optimize further with imagemin-webp
    const compressedData = await imagemin.buffer(resizedWebpBuffer, {
      plugins: [
        imageminWebp({
          quality: options.webpQuality,
        })
      ]
    });
    const processedBuffer = Buffer.from(compressedData);
    return { processedBuffer, extension: EXTENSIONS.WEBP };
  } else { // 'png'
    // Process as PNG
    const resizedPngBuffer = await sharpInstance
      .png({ compressionLevel: options.pngCompressionLevel })
      .toBuffer();

    const compressedData = await imagemin.buffer(resizedPngBuffer, {
      plugins: [
        imageminOptipng({ optimizationLevel: options.optipngOptimizationLevel })
      ]
    });
    const processedBuffer = Buffer.from(compressedData);
    return { processedBuffer, extension: EXTENSIONS.PNG };
  }
}
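
Two decisions in this helper are easy to check by hand: which output format is chosen, and what size a `fit: 'inside'` resize produces. The sketch below models both as pure functions. The names `chooseTargetFormat` and `fitInside` are invented here, and the dimension arithmetic is an approximation of sharp's behavior (its exact rounding may differ by a pixel), not a call into sharp.

```typescript
type TargetFormat = "jpeg" | "webp" | "png";

// Same rule as the helper: an explicit conversion type wins; otherwise JPEG
// stays JPEG and every other source format is written as PNG.
function chooseTargetFormat(originalFormat: string | undefined, conversionType?: TargetFormat): TargetFormat {
  return conversionType || (originalFormat === "jpeg" ? "jpeg" : "png");
}

// Approximate output size for a square bounding box with fit "inside":
// scale uniformly so that both sides fit within maxSide.
function fitInside(width: number, height: number, maxSide: number): { width: number; height: number } {
  const scale = Math.min(maxSide / width, maxSide / height);
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}

console.log(chooseTargetFormat("webp")); // prints "png"
console.log(chooseTargetFormat("png", "webp")); // prints "webp"

const landscape = fitInside(1024, 768, 512);
console.log(`${landscape.width}x${landscape.height}`); // prints "512x384"
```

Two consequences follow. A WebP source is saved as PNG unless `force_conversion_type` is set to `webp`. And because the scale factor can exceed 1, a source smaller than `target_image_max_size` would be enlarged; sharp's resize has a `withoutEnlargement` option to prevent that, which the helper above does not pass.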

src/tools/imageGenerationTool.ts:103-123 (helper)

Helper function to generate a unique output file path to avoid overwriting existing files.

async function getUniqueFilePath(directory: string, baseName: string, extension: string): Promise<string> {
  let counter = 0;
  let outputPath = '';
  while (true) {
    const currentFileName = counter === 0 ? baseName : `${baseName} (${counter})`;
    outputPath = path.join(directory, `${currentFileName}.${extension}`);
    try {
      await stat(outputPath); // check whether the file exists
      counter++; // it exists, so try the next counter value
    } catch (e: any) {
      if (e.code === 'ENOENT') {
        // The file does not exist, so use this path
        break;
      } else {
        // Any other error
        throw new Error(`Failed to check file path ${outputPath}: ${e instanceof Error ? e.message : String(e)}`);
      }
    }
  }
  return outputPath;
}
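
The naming sequence this helper produces can be seen with a synchronous variant in which the filesystem check is replaced by a predicate. `nextFreeFileName` and the `takenNames` set are hypothetical names used only for this illustration.

```typescript
// Pure variant of the naming loop: `exists` stands in for the stat() call.
function nextFreeFileName(
  baseName: string,
  extension: string,
  exists: (fileName: string) => boolean
): string {
  let counter = 0;
  while (true) {
    // The first candidate has no suffix; later ones get " (1)", " (2)", ...
    const candidate =
      counter === 0 ? `${baseName}.${extension}` : `${baseName} (${counter}).${extension}`;
    if (!exists(candidate)) {
      return candidate;
    }
    counter++;
  }
}

// Pretend the first two names are already present in the output directory.
const takenNames = new Set(["generated_image.png", "generated_image (1).png"]);
const freeName = nextFreeFileName("generated_image", "png", (f) => takenNames.has(f));

console.log(freeName); // prints "generated_image (2).png"
```

Since the existence check and the later write are separate steps, two calls running at the same time could in principle settle on the same path. The sketch does not address that, and neither does the helper as listed.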

Tool Definition Quality

Grade: C (2.9/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions saving the image but omits critical details like potential side effects (e.g., file overwriting), performance considerations (e.g., generation time, resource usage), error handling, or output specifics. For a complex tool with 12 parameters, this lack of behavioral context is a significant gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's core function without unnecessary words. It is front-loaded with the essential action, making it highly concise and well-structured for quick understanding.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (12 parameters, no output schema, and no annotations), the description is inadequate. It fails to explain behavioral traits, output details, or usage context, leaving the agent with insufficient information to effectively invoke the tool beyond basic parameter filling.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, meaning all parameters are well-documented in the schema itself. The description adds no additional parameter semantics beyond implying a 'prompt' and 'path' are involved, which is already covered. Thus, it meets the baseline of 3 where the schema does the heavy lifting without adding extra value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Generates an image based on a prompt and saves it to the specified path.' It specifies the verb ('generates'), resource ('image'), and destination ('saves it to the specified path'), making the action explicit. However, it doesn't differentiate from siblings since there are none, so it cannot achieve a perfect score of 5.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, prerequisites, or contextual constraints. It merely states what the tool does without indicating scenarios for its application, leaving the agent without usage direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/creating-cat/gemini-image-mcp-server'
