
Generate Image

generate_image

Generate images from text prompts or edit existing images using Gemini models. Supports multi-turn editing, custom aspect ratios, and resolutions up to 4K.

Instructions

Generate or edit images using Google Gemini. Provide just a prompt for text-to-image generation. Add image file paths to edit or use reference images (up to 14 on gemini-3-pro). Returns the saved file path, model used, token counts, and estimated cost.
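As a rough illustration of the two modes described above, the sketch below builds the JSON-RPC 2.0 `tools/call` request that an MCP client would send for this tool. The helper name `tool_call`, the example prompts, and the file path are hypothetical; only the tool name and parameter names come from this page, and the response shape is not shown because the page documents no output schema.

```python
import json
from itertools import count

# JSON-RPC ids only need to be unique per connection; a counter is enough here.
_request_ids = count(1)


def tool_call(prompt, images=None, **options):
    """Build a JSON-RPC 2.0 `tools/call` request for the generate_image tool."""
    arguments = {"prompt": prompt}
    if images:
        # Supplying image paths switches the call from generation to editing.
        arguments["images"] = list(images)
    # Drop options left as None so the server can apply its own defaults.
    arguments.update({k: v for k, v in options.items() if v is not None})
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": "generate_image", "arguments": arguments},
    }


# Text-to-image: a prompt and nothing else.
text_to_image = tool_call("A lighthouse at dusk, watercolor style")

# Editing: the same call plus input image paths (the path is a placeholder).
edit = tool_call(
    "Make the sky stormy",
    images=["/path/to/lighthouse.png"],
    model="gemini-3-pro-image-preview",
)

print(json.dumps(edit, indent=2))
```

In practice an MCP client library would usually construct and send this envelope for you; the sketch only makes the argument layout explicit.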

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| prompt | Yes | Text description of the image to generate, or editing instruction when images are provided | |
| images | No | File paths to input/reference images for editing (max 14). Omit for text-to-image generation | |
| model | No | Gemini model ID. Defaults to gemini-2.5-flash-image. Options: gemini-2.5-flash-image, gemini-3-pro-image-preview, gemini-3.1-flash-image-preview | |
| aspectRatio | No | Image aspect ratio. Defaults to config value or 1:1 | |
| resolution | No | Image resolution. Defaults to config value or 1K. 2K/4K only on gemini-3-pro and gemini-3.1-flash. gemini-2.5-flash is 1K only. | |
| outputDir | No | Directory to save the image. Defaults to config file outputDir, OUTPUT_DIR env var, or ~/gemini-images | |
| filename | No | Base name for the saved file (e.g. 'hero-banner'). Extension added automatically. Duplicates get a version suffix (hero-banner-v2). Omit for auto-generated name. | |
| subfolder | No | Subfolder within the output directory (e.g. 'landing-page'). Created automatically. | |
| sessionId | No | Continue a multi-turn editing session. Pass the sessionId from a previous response to refine the image iteratively. The server preserves conversation history. | |
| seed | No | Seed for reproducible generation. Same seed + prompt + model = same image. | |
| useSearchGrounding | No | Enable Google Search grounding for real-world accuracy. Available on gemini-3.1-flash-image-preview. | |
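The model-specific constraints in the schema can be checked on the client before a call is made. The following is a minimal sketch of such a pre-flight check, not the server's own validation: the function name `check_arguments` and the wording of the messages are assumptions, and the rules encode only what the parameter descriptions state.

```python
MODELS = (
    "gemini-2.5-flash-image",
    "gemini-3-pro-image-preview",
    "gemini-3.1-flash-image-preview",
)
DEFAULT_MODEL = MODELS[0]  # documented default model
MAX_IMAGES = 14            # documented maximum number of input images


def check_arguments(args):
    """Return a list of problems found in a generate_image argument dict."""
    problems = []
    if not args.get("prompt"):
        problems.append("prompt is required")

    model = args.get("model", DEFAULT_MODEL)
    if model not in MODELS:
        problems.append(f"unknown model: {model}")

    if len(args.get("images", [])) > MAX_IMAGES:
        problems.append(f"at most {MAX_IMAGES} input images are allowed")

    # Only an explicit resolution is checked: when it is omitted, the server
    # falls back to its config value, which this client cannot see.
    if args.get("resolution") in ("2K", "4K") and model.startswith("gemini-2.5-flash"):
        problems.append(f"{model} supports 1K only")

    # The schema lists grounding as available on one model; treating every
    # other model as unsupported is an assumption of this sketch.
    if args.get("useSearchGrounding") and model != "gemini-3.1-flash-image-preview":
        problems.append(
            "useSearchGrounding is documented only for gemini-3.1-flash-image-preview"
        )
    return problems


print(check_arguments({"prompt": "a cat", "resolution": "4K"}))
```

An empty list means the arguments are consistent with the documented constraints; the server may still reject a call for reasons this page does not describe.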
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses that the tool generates or edits images (creating files), saves them to disk, and returns file path, model, token counts, and cost. It also mentions the non-destructive naming behavior for duplicates. Missing details include whether editing overwrites original files or creates new ones, but overall it covers key behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
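The non-destructive naming behavior for duplicates could work along the following lines. This is a sketch under stated assumptions, not the server's actual code: the function name `next_free_path` is hypothetical, the `.png` extension is assumed (the schema only says the extension is added automatically), and the only documented detail is that a duplicate of `hero-banner` becomes `hero-banner-v2`.

```python
import tempfile
from pathlib import Path


def next_free_path(directory, base, ext=".png"):
    """Return a path for `base` that does not collide with an existing file."""
    candidate = Path(directory) / f"{base}{ext}"
    version = 2  # the first duplicate is documented as '-v2'
    while candidate.exists():
        candidate = Path(directory) / f"{base}-v{version}{ext}"
        version += 1
    return candidate


# Demonstrate in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    first = next_free_path(tmp, "hero-banner")
    first.touch()  # simulate the first saved image
    second = next_free_path(tmp, "hero-banner")
    print(first.name, second.name)  # prints "hero-banner.png hero-banner-v2.png"
```

Under this scheme an existing file is never overwritten, which matches the behavior the assessment credits the description for disclosing.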

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (two sentences) and front-loaded with the core purpose. It efficiently conveys key information without extraneous details. Every sentence serves a purpose, making it easy for an AI to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 11 parameters and absence of output schema, the description covers the main workflow, return values, and model-specific constraints. It addresses multi-turn editing via sessionId and grounding options. However, it lacks information on error handling, rate limits, authentication, or when to prefer the sibling tool. The description is mostly complete for typical use cases.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds value beyond the schema by clarifying that the 'images' parameter is for editing/reference and that model-specific limits apply (e.g., max 14 images on gemini-3-pro, resolution constraints). It also explains the function of 'sessionId' for multi-turn editing. This additional context justifies a score above baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Generate or edit images using Google Gemini' and distinguishes between text-to-image generation and editing with reference images. It specifies the return values (file path, model, tokens, cost), making the tool's purpose very clear. The sibling tool 'process_image' likely handles different image operations, further clarifying scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage guidance: 'Provide just a prompt for text-to-image generation. Add image file paths to edit or use reference images (up to 14 on gemini-3-pro).' It also mentions the return values, helping the agent understand when to use this tool. However, it does not explicitly contrast with 'process_image' or state when not to use this tool, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/JimothySnicket/gemini-image-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.