Nano Banana 2 Polza MCP Server

Generate or edit images (Multi-Model: Flash & Pro)

generate_image

Read-only

Generate or edit images using natural language instructions. Supports conditioning with up to three reference images and file-based editing.

Instructions

Generate new images or edit existing images using natural language instructions.

Supports four input modes:

Pure generation: Provide only a prompt to create new images.
Multi-image conditioning: Provide up to three input images using the input_image_path_1, input_image_path_2, and input_image_path_3 parameters.
File ID editing: Edit a previously uploaded image by passing its Files API ID.
File path editing: Edit a local image by providing a single input image path.

The mode is detected automatically from the parameters provided, or can be set explicitly with the mode parameter. Input images are read from the local filesystem rather than passed inline, which avoids massive token usage. The tool returns both MCP image content blocks and structured JSON with metadata.
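
As a rough illustration of how the four input modes relate to the parameters, here is a minimal Python sketch. The function name classify_input_mode, the returned labels, and the exact rule ordering are assumptions made for this example; the server's real detection logic is not shown on this page. The only rule taken from the schema is that file_id takes precedence over the input_image_path_* parameters.

```python
from typing import Any, Mapping

# Parameter names taken from the input schema.
IMAGE_PATH_KEYS = ("input_image_path_1", "input_image_path_2", "input_image_path_3")


def classify_input_mode(args: Mapping[str, Any]) -> str:
    """Guess which documented input mode a set of arguments would select.

    Hypothetical helper: the labels and rule order are illustrative only.
    """
    # Collect the input image paths that are actually set.
    paths = [args[key] for key in IMAGE_PATH_KEYS if args.get(key)]

    # The schema states that file_id takes precedence over image paths.
    if args.get("file_id"):
        return "file_id_editing"
    # A single local path is described as file path editing.
    if len(paths) == 1:
        return "file_path_editing"
    # Two or three paths are described as multi-image conditioning.
    if len(paths) > 1:
        return "multi_image_conditioning"
    # With no image inputs, only the prompt drives the result.
    return "pure_generation"
```

Passing the mode parameter explicitly would presumably bypass any such guesswork, which may be the safer choice when the intent is ambiguous.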

Input Schema


Name	Required	Description	Default
`prompt`	Yes	Clear, detailed image prompt. Include subject, composition, action, location, style, and any text to render. Use the aspect_ratio parameter to pin a specific canvas shape when needed.
`n`	No	Requested image count (model may return fewer).
`negative_prompt`	No	Things to avoid (style, objects, text).
`system_instruction`	No	Optional system tone/style guidance.
`input_image_path_1`	No	Path to first input image for composition/conditioning
`input_image_path_2`	No	Path to second input image for composition/conditioning
`input_image_path_3`	No	Path to third input image for composition/conditioning
`file_id`	No	Files API file ID to use as input/edit source (e.g., 'files/abc123'). If provided, this takes precedence over input_image_path_* parameters for the primary input.
`mode`	No	Operation mode: 'generate' for new image creation, 'edit' for modifying existing images. Auto-detected based on input parameters if not specified.	auto
`model_tier`	No	Model tier: 'flash' (legacy, 1024px), 'nb2' (4K at Flash speed), 'pro' (max quality, 4K), or 'auto' (smart selection). Default: 'auto', which automatically selects nb2 or pro based on the prompt.	auto
`resolution`	No	Output resolution: 'high', '4k', '2k', '1k'. 4K and 2K available with 'nb2' and 'pro' models. Default: '1k'.	1k
`thinking_level`	No	Reasoning depth hint: 'low' (faster), 'high' (better quality). Applied to the 'nb2' model; 'high' also biases auto-selection toward Pro. Default: None (auto).
`enable_grounding`	No	Enable Google Search grounding for factual accuracy (NB2 and Pro models). Useful for real-world subjects. Default: true.
`aspect_ratio`	No	Optional output aspect ratio (e.g., '16:9'). Polza-supported values: auto, 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9.
`output_path`	No	Output path for generated image(s). If a file path with extension (e.g., '/path/image.png'), saves directly to that path. If a directory path (e.g., '/path/to/dir/'), uses default filename in that directory. If None, uses IMAGE_OUTPUT_DIR environment variable or ~/nanobanana-images.
`return_full_image`	No	Return full-resolution images in MCP response instead of thumbnails. Warning: full images can be large (3-7MB each for 4K). Default: uses RETURN_FULL_IMAGE env var, or false if not set.
`force_new_generation`	No	Start a brand-new upstream generation even if the same request is already pending or recently completed. Use only after the user explicitly confirmed they want a rerun.
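
To make the schema concrete, the following Python sketch builds an MCP tools/call request for generate_image. The helper build_generate_image_call and its client-side checks are assumptions for illustration, not part of the server; the allowed aspect_ratio and model_tier values are copied from the table above, and the prompt and option values are made up. How the request is transported (stdio, HTTP) depends on the MCP client and is left out.

```python
import json
from typing import Any

# Allowed values copied from the parameter table.
SUPPORTED_ASPECT_RATIOS = {
    "auto", "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
}
MODEL_TIERS = {"flash", "nb2", "pro", "auto"}


def build_generate_image_call(prompt: str, request_id: int = 1, **options: Any) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 tools/call payload (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt is required")

    ratio = options.get("aspect_ratio")
    if ratio is not None and ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {ratio!r}")

    tier = options.get("model_tier")
    if tier is not None and tier not in MODEL_TIERS:
        raise ValueError(f"unknown model_tier: {tier!r}")

    # Omit unset options so the server applies its own defaults.
    arguments = {"prompt": prompt}
    arguments.update({key: value for key, value in options.items() if value is not None})

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "generate_image", "arguments": arguments},
    }


request = build_generate_image_call(
    "A lighthouse at dusk, watercolor style",
    aspect_ratio="16:9",
    model_tier="nb2",
    resolution="2k",
)
print(json.dumps(request, indent=2))
```

Validating on the client side is optional; it simply surfaces an unsupported value before a generation is requested upstream.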

Tool Definition Quality

A (3.8/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotations declare readOnlyHint=true, but the description indicates mutation ('generate new images or edit existing images'), which is a contradiction. Beyond that, the description discloses input file handling and the return format. The contradiction significantly undermines transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is structured as a bulleted list of modes but is somewhat verbose. Key information is front-loaded (the first sentence captures the core), but some redundancy exists (e.g., 'Supports multiple input modes:' followed by the list). Adequate but not terse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 17 parameters and no output schema, the description covers input modes, auto-detection, return type (MCP image blocks + metadata), and parameter interactions. Lacks details about metadata structure but adequate for selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 17 parameters have schema descriptions, so baseline is 3. Description adds value by explaining inter-parameter relationships (e.g., file_id takes precedence over input_image_path_*) and operational context (mode auto-detection).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states verb+resource: 'Generate new images or edit existing images using natural language instructions.' It lists four specific modes and distinguishes from sibling tools (fetch_generation, maintenance, show_output_stats, upload_file) by focusing on creation/editing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly describes when to use each mode (pure generation, multi-image conditioning, file ID editing, file path editing), including auto-detection logic. It gives no explicit guidance on when not to use the tool or on alternatives, but the context is clear enough for an agent to decide.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ivanantigravity-lgtm/nanobanana-2-polzaia-mcp-server'
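
The same request can be made from Python with the standard library. This is a minimal sketch: the helper names build_request and fetch_server_info are made up for this example, and the code assumes the endpoint returns a JSON object, which this page does not confirm.

```python
import json
import urllib.request

# URL copied from the curl command above.
SERVER_URL = (
    "https://glama.ai/api/mcp/v1/servers/"
    "ivanantigravity-lgtm/nanobanana-2-polzaia-mcp-server"
)


def build_request(url: str = SERVER_URL) -> urllib.request.Request:
    """Build the GET request without sending it."""
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Accept": "application/json"},
    )


def fetch_server_info(url: str = SERVER_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the body.

    Assumption: the response body is a JSON object.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_server_info() performs the network request; build_request() alone is useful for inspecting the URL and headers first.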

If you have feedback or need assistance with the MCP directory API, please join our Discord server.