Glama
ion-aluminium

Nano Banana MCP Server (CLIProxyAPI Edition)

Generate or edit images (Multi-Model: Flash & Pro)

generate_image
Read-only

Create new images or modify existing ones using natural language prompts. Supports multiple input modes including generation, multi-image composition, and editing via file paths or IDs.

Instructions

Generate new images or edit existing images using natural language instructions.

Supports multiple input modes:

  1. Pure generation: Just provide a prompt to create new images

  2. Multi-image conditioning: Provide up to 3 input images using input_image_path_1/2/3 parameters

  3. File ID editing: Edit previously uploaded images using Files API ID

  4. File path editing: Edit local images by providing single input image path

The mode is auto-detected from the parameters, or can be set explicitly. Input images are read from the local filesystem to avoid excessive token usage. Returns both MCP image content blocks and structured JSON with metadata.
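The auto-detection described above can be sketched as a small decision function. This is a hypothetical helper, not the server's actual implementation: it assumes that any provided input image (Files API ID or local path) implies editing, and that an explicit `mode` always wins.

```python
def detect_mode(prompt, file_id=None, input_image_path_1=None,
                input_image_path_2=None, input_image_path_3=None,
                mode="auto"):
    """Sketch of mode auto-detection (hypothetical, for illustration only)."""
    # An explicitly requested mode takes precedence over detection.
    if mode != "auto":
        return mode
    # Any input image, whether a Files API ID or a local path, implies editing.
    if file_id or input_image_path_1 or input_image_path_2 or input_image_path_3:
        return "edit"
    # Prompt only: pure generation.
    return "generate"
```

For example, a call with only a prompt resolves to `generate`, while the same call with `file_id="files/abc123"` resolves to `edit`.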

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| prompt | Yes | Clear, detailed image prompt. Include subject, composition, action, location, style, and any text to render. Use the aspect_ratio parameter to pin a specific canvas shape when needed. | |
| n | No | Requested image count (the model may return fewer). | |
| negative_prompt | No | Things to avoid (style, objects, text). | |
| system_instruction | No | Optional system tone/style guidance. | |
| input_image_path_1 | No | Path to the first input image for composition/conditioning. | |
| input_image_path_2 | No | Path to the second input image for composition/conditioning. | |
| input_image_path_3 | No | Path to the third input image for composition/conditioning. | |
| file_id | No | Files API file ID to use as the input/edit source (e.g., 'files/abc123'). If provided, it takes precedence over the input_image_path_* parameters for the primary input. | |
| mode | No | Operation mode: 'generate' for new image creation, 'edit' for modifying existing images. Auto-detected from the input parameters if not specified. | auto |
| model_tier | No | Model tier: 'flash' (speed, 1024px), 'pro' (quality, up to 4K), or 'auto' (smart selection based on quality/speed indicators in the prompt). | auto |
| resolution | No | Output resolution: 'high', '4k', '2k', '1k'. 4K and 2K are only available with the 'pro' model. | high |
| thinking_level | No | Reasoning depth for the Pro model: 'low' (faster), 'high' (better quality). Only applies to the Pro model. | high |
| enable_grounding | No | Enable Google Search grounding for factual accuracy (Pro model only). Useful for real-world subjects. | true |
| aspect_ratio | No | Optional output aspect ratio. Supported values: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9. | |
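The parameter interactions above (file_id precedence, pro-only resolutions, mode auto-detection) can be illustrated with hypothetical argument payloads for this tool. Parameter names follow the schema; all values are invented for illustration.

```python
# 1. Pure generation: prompt only, so mode auto-detects to 'generate'.
generate_args = {
    "prompt": "A lighthouse at dusk, oil painting style",
    "n": 2,
    "aspect_ratio": "16:9",
}

# 2. Multi-image conditioning with explicit quality settings.
#    Note: '4k' and '2k' resolutions require the 'pro' tier.
compose_args = {
    "prompt": "Blend these two scenes into one seamless panorama",
    "input_image_path_1": "/tmp/left.png",
    "input_image_path_2": "/tmp/right.png",
    "model_tier": "pro",
    "resolution": "4k",
    "thinking_level": "high",
}

# 3. File ID editing: file_id takes precedence over any
#    input_image_path_* values for the primary input.
edit_args = {
    "prompt": "Remove the background, keep the subject",
    "file_id": "files/abc123",
    "mode": "edit",
}
```

A sketch only; the server's validation rules (e.g. rejecting '4k' with the 'flash' tier) are inferred from the schema descriptions, not confirmed behavior.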
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond annotations. While annotations indicate readOnlyHint=true and openWorldHint=true, the description explains practical behaviors: automatic mode detection, local filesystem reading to avoid token usage, and return formats (MCP image content blocks and structured JSON with metadata). It doesn't contradict annotations and provides operational insights not covered by structured fields.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and appropriately sized. It starts with a clear purpose statement, then lists usage modes in a bullet-like format, and ends with operational details. Every sentence adds value, though the final sentence about return formats could be slightly more concise. Overall, it's efficient and front-loaded with essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (14 parameters, multiple modes) and rich schema coverage (100%), the description provides good contextual completeness. It explains usage modes, behavioral traits, and output formats, compensating for the lack of an output schema. It could briefly mention limitations or error cases to earn a perfect score, but it is largely sufficient for an agent to use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema already documents all 14 parameters thoroughly. The description adds minimal parameter semantics beyond the schema, mainly by grouping parameters into usage modes (e.g., input_image_path_* for multi-image conditioning). This meets the baseline of 3 since the schema does the heavy lifting, but the description doesn't significantly enhance parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Generate new images or edit existing images using natural language instructions.' It specifies the verb ('generate or edit'), resource ('images'), and distinguishes from siblings like 'upload_file' and 'maintenance' by focusing on image creation/modification rather than file management or system operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance by detailing four distinct input modes (pure generation, multi-image conditioning, File ID editing, File path editing) and explaining how the tool automatically detects mode based on parameters. It also distinguishes from sibling tools by not overlapping with their functions (e.g., 'upload_file' is for uploading, while this tool uses uploaded or local files).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
