by zhongweili

Generate or edit images (Multi-Model: Flash & Pro)

generate_image
Read-only

Generate or edit images using natural language descriptions. Supports creation from scratch, multi-image conditioning, and editing via file IDs or file paths. Controls aspect ratio, resolution, and model tier for optimized results.

Instructions

Generate new images or edit existing images using natural language instructions.

Supports multiple input modes:

  1. Pure generation: Just provide a prompt to create new images

  2. Multi-image conditioning: Provide up to 3 input images using input_image_path_1/2/3 parameters

  3. File ID editing: Edit previously uploaded images using a Files API ID

  4. File path editing: Edit local images by providing a single input image path

The mode is detected automatically from the parameters, or it can be set explicitly. Input images are read from the local filesystem to avoid massive token usage. Returns both MCP image content blocks and structured JSON with metadata.
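The auto-detection described above could look roughly like the following sketch. It is an illustration only: `detect_mode` is a hypothetical helper, and the rule that several input images mean "generate" (multi-image conditioning) while a single input image means "edit" is an assumption inferred from the four modes listed, not confirmed server behavior.

```python
from typing import Optional, Sequence


def detect_mode(
    mode: str = "auto",
    file_id: Optional[str] = None,
    input_image_paths: Sequence[Optional[str]] = (),
) -> str:
    """Hypothetical helper: guess the operation mode from the supplied inputs."""
    # An explicit mode always wins over auto-detection.
    if mode != "auto":
        return mode
    # A Files API ID is treated as the primary input, so it implies an edit.
    if file_id:
        return "edit"
    # Ignore unset path parameters (None or empty string).
    paths = [p for p in input_image_paths if p]
    # Assumption: exactly one local image means "edit this image".
    if len(paths) == 1:
        return "edit"
    # Assumption: zero images is pure generation, and two or three images
    # are used as conditioning for a newly generated image.
    return "generate"
```

Under these assumptions, a call with only a prompt resolves to "generate", while a call with `file_id` or one input path resolves to "edit".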

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| prompt | Yes | Clear, detailed image prompt. Include subject, composition, action, location, style, and any text to render. Use the aspect_ratio parameter to pin a specific canvas shape when needed. | |
| n | No | Requested image count (model may return fewer). | |
| negative_prompt | No | Things to avoid (style, objects, text). | |
| system_instruction | No | Optional system tone/style guidance. | |
| input_image_path_1 | No | Path to first input image for composition/conditioning | |
| input_image_path_2 | No | Path to second input image for composition/conditioning | |
| input_image_path_3 | No | Path to third input image for composition/conditioning | |
| file_id | No | Files API file ID to use as input/edit source (e.g., 'files/abc123'). If provided, this takes precedence over input_image_path_* parameters for the primary input. | |
| mode | No | Operation mode: 'generate' for new image creation, 'edit' for modifying existing images. Auto-detected based on input parameters if not specified. | auto |
| model_tier | No | Model tier: 'flash' (legacy, 1024px), 'nb2' (4K at Flash speed, default), 'pro' (max quality, 4K), or 'auto' (smart selection). Default: 'auto' - automatically selects nb2 or pro based on prompt. | auto |
| resolution | No | Output resolution: 'high', '4k', '2k', '1k'. 4K and 2K available with 'nb2' and 'pro' models. Default: 'high'. | high |
| thinking_level | No | Reasoning depth hint: 'low' (faster), 'high' (better quality). Applied to the 'nb2' model; 'high' also biases auto-selection toward Pro. Default: None (auto). | |
| enable_grounding | No | Enable Google Search grounding for factual accuracy (NB2 and Pro models). Useful for real-world subjects. Default: true. | |
| aspect_ratio | No | Optional output aspect ratio (e.g., '16:9'). Standard: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9. Extreme (nb2 only): 4:1, 1:4, 8:1, 1:8. | |
| output_path | No | Output path for generated image(s). If a file path with extension (e.g., '/path/image.png'), saves directly to that path. If a directory path (e.g., '/path/to/dir/'), uses default filename in that directory. If None, uses IMAGE_OUTPUT_DIR environment variable or ~/nanobanana-images. | |
| return_full_image | No | Return full-resolution images in MCP response instead of thumbnails. Warning: full images can be large (3-7MB each for 4K). Default: uses RETURN_FULL_IMAGE env var, or false if not set. | |
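The output_path rules in the schema can be read as a small decision procedure. The sketch below is one possible interpretation, not the server's implementation: `resolve_output_path` and `DEFAULT_FILENAME` are hypothetical names, and the actual default filename is not documented here.

```python
import os
from pathlib import Path
from typing import Optional

# Hypothetical placeholder: the real default filename is not documented.
DEFAULT_FILENAME = "image.png"


def resolve_output_path(output_path: Optional[str] = None) -> Path:
    """Hypothetical helper mirroring the documented output_path rules."""
    if output_path is None:
        # Fall back to IMAGE_OUTPUT_DIR, then to ~/nanobanana-images.
        base = os.environ.get("IMAGE_OUTPUT_DIR") or "~/nanobanana-images"
        return Path(base).expanduser() / DEFAULT_FILENAME
    path = Path(output_path).expanduser()
    # A path with an extension is treated as the target file itself.
    # Limitation of this sketch: a directory whose name contains a dot
    # (e.g. '/path/v1.2/') would be mistaken for a file.
    if path.suffix:
        return path
    # Otherwise treat it as a directory and use the default filename.
    return path / DEFAULT_FILENAME
```

For example, `resolve_output_path('/path/image.png')` keeps the path unchanged, while `resolve_output_path('/path/to/dir/')` appends the placeholder filename.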
Behavior: 1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description describes write operations ('generate new images or edit existing images'), but annotations set readOnlyHint to true, which indicates the tool should not modify data. This is a direct contradiction. While the description adds useful behavioral details (auto-detection, local filesystem reading, return format), the contradiction severely undermines transparency.
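A contradiction of this kind can be spotted with a simple check. The sketch below is a rough keyword heuristic, not the method used to produce this score: `contradicts_read_only` and `WRITE_VERBS` are hypothetical names, and only the `readOnlyHint` annotation key is taken from the text above.

```python
# Hypothetical list of verbs that suggest a tool changes state.
WRITE_VERBS = ("generate", "edit", "create", "save", "delete", "update")


def contradicts_read_only(description: str, annotations: dict) -> bool:
    """Return True when a tool marked read-only describes write-like actions."""
    # No contradiction is possible unless the tool claims to be read-only.
    if not annotations.get("readOnlyHint", False):
        return False
    text = description.lower()
    # Naive substring match; a real check would need proper tokenization.
    return any(verb in text for verb in WRITE_VERBS)


description = "Generate new images or edit existing images using natural language instructions."
flagged = contradicts_read_only(description, {"readOnlyHint": True})
```

With the description quoted above and `readOnlyHint` set to true, `flagged` is `True`; setting the hint to false clears the flag.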

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively long but well-structured with bullet points and clear numbered modes. It front-loads the core purpose. Some redundancy exists (e.g., mode auto-detection mentioned twice), but overall it is efficiently organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the high complexity (16 params, multiple modes) and no output schema, the description covers input modes and general behavior but lacks details on output structure, error handling, or performance implications. It mentions returning 'MCP image content blocks and structured JSON' without specifying the JSON format. Completeness is adequate but not thorough.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds context about input modes and mode auto-detection but does not significantly enhance understanding of individual parameters beyond what the schema provides. The description's added value is moderate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Generate new images or edit existing images using natural language instructions' and lists four specific input modes. The title reinforces the purpose with 'Multi-Model: Flash & Pro'. It is distinct from sibling tools (maintenance, show_output_stats, upload_file).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists four input modes and explains automatic mode detection based on parameters. It also details the 'mode' parameter with its 'generate' and 'edit' options. It does not explicitly state when to avoid this tool or compare it to alternatives, but because the sibling tools are unrelated, the context is still clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/zhongweili/nanobanana-mcp-server'
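The same request can be made from Python with the standard library. This is a minimal sketch: `server_url` and `fetch_server` are hypothetical helper names, and it assumes the endpoint returns a JSON body, which is not stated above.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one MCP server."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> dict:
    """Send the GET request and decode the body (assumed to be JSON)."""
    request = urllib.request.Request(server_url(owner, name), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server("zhongweili", "nanobanana-mcp-server")` issues the same GET request as the curl command.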

If you have feedback or need assistance with the MCP directory API, please join our Discord server.