Awesome—let’s switch this to a Python FastMCP server and keep everything “nano banana” (Gemini 2.5 Flash Image) friendly. Below you’ll get:
1. a production-ready FastMCP **server skeleton** (Python) you can run today
2. an **English design spec** for tools, resources, and prompts (names, params, returns, behavior) mapped to Gemini’s API and FastMCP conventions
I’ve grounded key choices with official docs—citations follow each relevant section.
---
# FastMCP server skeleton (Python)
> Files you can copy/paste:
>
> * `server.py` (the MCP server)
> * optional `.env` (to hold `GEMINI_API_KEY`)
> * `pyproject.toml` (dependencies and project config)
### `.env`
```
GEMINI_API_KEY=YOUR_API_KEY
```
### `server.py`
```python
import base64
import os
from typing import Annotated, Literal, Optional
from dotenv import load_dotenv
from pydantic import Field
from fastmcp import FastMCP, Context, Image as MCPImage
from fastmcp.tools.tool import ToolResult
# --- Gemini SDK (official) ---
from google import genai
from google.genai import types as gx
# Load local .env if present
load_dotenv()
# ----- Server -----
mcp = FastMCP(
name="nano-banana-mcp",
instructions=(
"This server exposes image generation & editing powered by Gemini 2.5 Flash Image "
"(aka 'nano banana'). It returns images as real MCP image content blocks, and also "
"provides structured JSON with metadata and reproducibility hints."
),
)
# ----- Helpers -----
def _client() -> genai.Client:
# google-genai picks up GEMINI_API_KEY or GOOGLE_API_KEY from env
# We error early if no key is available.
if not (os.getenv("GEMINI_API_KEY") or os.getenv("GOOGLE_API_KEY")):
raise RuntimeError("Missing GEMINI_API_KEY (or GOOGLE_API_KEY) in environment.")
return genai.Client()
def _parts_from_inline_b64(images_b64: list[str], mime_types: list[str]) -> list[gx.Part]:
    """Decode base64 image strings into inline-bytes Parts for the Gemini SDK."""
    parts: list[gx.Part] = []
    for b64, mt in zip(images_b64, mime_types):
        # google-genai accepts inline bytes via Part.from_bytes(data=..., mime_type=...);
        # the SDK expects raw bytes, not base64, so decode first.
        parts.append(gx.Part.from_bytes(data=base64.b64decode(b64), mime_type=mt))
    return parts
def _extract_image_bytes_list(response) -> list[bytes]:
"""Extract all image bytes returned in the response."""
out: list[bytes] = []
cand = getattr(response, "candidates", None)
if not cand:
return out
for part in cand[0].content.parts:
# parts may contain text or inline_data (for image bytes)
if getattr(part, "inline_data", None) and getattr(part.inline_data, "data", None):
out.append(part.inline_data.data)
return out
# ----- Tools -----
@mcp.tool(
annotations={
"title": "Generate image (Gemini 2.5 Flash Image)",
"readOnlyHint": True,
"openWorldHint": True,
}
)
def generate_image(
prompt: Annotated[str, Field(description="Clear, detailed image prompt. "
"Include subject, composition, action, location, style, "
"and any text to render. "
"Add 'Square image' or '16:9' in the text to influence aspect.")],
n: Annotated[int, Field(ge=1, le=4, description="Requested image count (model may return fewer).")] = 1,
negative_prompt: Annotated[Optional[str], Field(description="Things to avoid (style, objects, text).")] = None,
system_instruction: Annotated[Optional[str], Field(description="Optional system tone/style.")] = None,
images_b64: Annotated[Optional[list[str]], Field(description="Inline base64 input images for composition/editing.")] = None,
mime_types: Annotated[Optional[list[str]], Field(description="MIME types matching images_b64.")] = None,
    ctx: Optional[Context] = None,
) -> ToolResult:
"""
Generate one or more images from a text prompt, optionally conditioned on input image(s).
Returns both MCP image content blocks and structured JSON with metadata.
"""
client = _client()
contents: list = []
if system_instruction:
contents.append(system_instruction)
# Negative prompt is best handled as explicit constraints in the text.
full_prompt = prompt
if negative_prompt:
full_prompt += f"\n\nConstraints (avoid): {negative_prompt}"
contents.append(full_prompt)
# Optional: add inline image parts (for edits/compose/style transfer)
if images_b64 and mime_types:
contents = _parts_from_inline_b64(images_b64, mime_types) + contents
    # Call Gemini 2.5 Flash Image (model name per the image-generation docs).
    # Tip: the prompt governs how many images a single response carries, so we
    # issue one request per requested image and collect everything returned.
responses = []
for _ in range(n):
resp = client.models.generate_content(
model="gemini-2.5-flash-image",
contents=contents,
)
responses.append(resp)
# Collect images from all responses
all_imgs: list[MCPImage] = []
meta: list[dict] = []
for idx, resp in enumerate(responses, start=1):
imgs = _extract_image_bytes_list(resp)
# Wrap image bytes into MCP Image blocks (FastMCP base64-encodes automatically)
for j, b in enumerate(imgs, start=1):
all_imgs.append(MCPImage(data=b, format="png")) # format is advisory; PNG is safe default
meta.append({
"response_index": idx,
"image_index": j,
"mime_type": "image/png",
"synthid_watermark": True, # per Gemini docs, images include SynthID
})
# Compose human-readable summary + structured JSON
summary = (
f"Generated {len(all_imgs)} image(s) with Gemini 2.5 Flash Image from your prompt."
+ (" Included edits/conditioning from provided image(s)." if images_b64 else "")
)
return ToolResult(
# content blocks (first a short text, then the images)
content=[summary] + all_imgs,
# structured JSON for clients that parse data
structured_content={
"requested": n,
"returned": len(all_imgs),
"negative_prompt_applied": bool(negative_prompt),
"used_inline_images": bool(images_b64),
"images": meta,
},
)
@mcp.tool(
annotations={"title": "Edit image (conversational)", "readOnlyHint": True, "openWorldHint": True}
)
def edit_image(
instruction: Annotated[str, Field(description="Conversational edit instruction. "
"e.g., 'Add a knitted wizard hat to the cat.'")],
    base_image_b64: Annotated[str, Field(description="Base64 image to edit.")],
mime_type: Annotated[str, Field(description="MIME type, e.g., image/png or image/jpeg")] = "image/png",
    ctx: Optional[Context] = None,
) -> ToolResult:
"""
Perform a precise, style-preserving edit on a single input image using a natural-language instruction.
"""
client = _client()
raw = base64.b64decode(base_image_b64)
parts = [gx.Part.from_bytes(data=raw, mime_type=mime_type), instruction]
resp = client.models.generate_content(
model="gemini-2.5-flash-image",
contents=parts,
)
imgs = _extract_image_bytes_list(resp)
blocks = [MCPImage(data=b, format="png") for b in imgs]
return ToolResult(
content=[f"Applied edit: {instruction}"] + blocks,
structured_content={
"returned": len(blocks),
"synthid_watermark": True
},
)
@mcp.tool(
annotations={"title": "Upload file to Gemini Files API", "readOnlyHint": False, "openWorldHint": True}
)
def upload_file(
path: Annotated[str, Field(description="Server-accessible file path to upload to Gemini Files API.")],
display_name: Annotated[Optional[str], Field(description="Optional display name.")] = None,
) -> dict:
"""
Upload a local file through the Gemini Files API and return its URI & metadata.
Useful when the image is larger than 20MB or reused across prompts.
"""
client = _client()
    # Route the optional display name through UploadFileConfig; a bare
    # upload(file=...) call would silently ignore it.
    config = gx.UploadFileConfig(display_name=display_name) if display_name else None
    file_obj = client.files.upload(file=path, config=config)
return {
"uri": file_obj.uri,
"name": file_obj.name,
"mime_type": getattr(file_obj, "mime_type", None),
"size_bytes": getattr(file_obj, "size_bytes", None),
}
# ----- Resources -----
@mcp.resource("gemini://files/{name}")
def file_metadata(name: str) -> dict:
"""
Fetch Files API metadata by file 'name' (like 'files/abc123').
"""
client = _client()
f = client.files.get(name=name)
return {
"name": f.name,
"uri": f.uri,
"mime_type": getattr(f, "mime_type", None),
"size_bytes": getattr(f, "size_bytes", None),
}
@mcp.resource("nano-banana://prompt-templates")
def prompt_templates_catalog() -> dict:
"""
A compact catalog of prompt templates (same schemas as the @mcp.prompt items below).
"""
return {
"photorealistic_shot": {
"description": "High-fidelity photography template.",
"parameters": ["subject", "composition", "lighting", "camera", "aspect_hint"],
},
"logo_text": {
"description": "Accurate text rendering in a clean logo.",
"parameters": ["brand", "text", "font_style", "style_desc", "color_scheme"],
},
"product_shot": {
"description": "Studio product mockup for e-commerce.",
"parameters": ["product", "background", "lighting_setup", "angle", "aspect_hint"],
},
"sticker_flat": {
"description": "Kawaii/flat sticker with bold lines and white background.",
"parameters": ["character", "accessory", "palette"],
},
"iterative_edit_instruction": {
"description": "Concise edit instruction phrasing",
"parameters": ["what_to_change", "how_it_should_blend"],
},
"composition_and_style_transfer": {
"description": "Blend multiple images and transfer style.",
"parameters": ["target_subject", "style_reference", "style_desc"],
},
}
# ----- Prompts (reusable message templates) -----
@mcp.prompt
def photorealistic_shot(
subject: str,
composition: str,
lighting: str,
camera: str,
aspect_hint: Literal["Square image", "Portrait", "Landscape", "16:9", "4:3"] = "Square image",
) -> str:
return (
f"A photorealistic {subject}. Composition: {composition}. Lighting: {lighting}. "
f"Camera: {camera}. {aspect_hint}."
)
@mcp.prompt
def logo_text(
brand: str,
text: str,
font_style: str,
style_desc: str,
color_scheme: str,
) -> str:
return (
f"Create a modern, minimalist logo for {brand}. The text should read '{text}' "
f"in a {font_style} font. The design should be {style_desc}. Color scheme: {color_scheme}."
)
@mcp.prompt
def product_shot(
product: str,
background: str,
lighting_setup: str,
angle: str,
aspect_hint: str = "Square image",
) -> str:
return (
f"A high-resolution, studio-lit product photograph of {product} on {background}. "
f"Lighting: {lighting_setup}. Camera angle: {angle}. Ultra-realistic. {aspect_hint}."
)
@mcp.prompt
def sticker_flat(character: str, accessory: str, palette: str) -> str:
return (
f"A kawaii-style sticker of {character} with {accessory}. "
f"Bold, clean outlines, simple cel-shading, vibrant palette ({palette}). "
f"Background must be white."
)
@mcp.prompt
def iterative_edit_instruction(what_to_change: str, how_it_should_blend: str) -> str:
return (
f"Using the provided image, {what_to_change}. "
f"Ensure the change {how_it_should_blend} and matches the original style, lighting, and perspective."
)
@mcp.prompt
def composition_and_style_transfer(target_subject: str, style_reference: str, style_desc: str) -> str:
return (
f"Transform the provided photograph of {target_subject} into the style of {style_reference}. "
f"Preserve composition; render with {style_desc}."
)
# ----- Entrypoint -----
if __name__ == "__main__":
# Default to STDIO transport (best for local MCP clients)
mcp.run()
```
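### `pyproject.toml`
A minimal sketch; the version pins below are assumptions, so adjust them to whatever you actually test against:
```toml
[project]
name = "nano-banana-mcp"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "fastmcp>=2.0",    # MCP server framework (FastMCP v2)
    "google-genai",    # official Gemini SDK
    "python-dotenv",   # loads the local .env
    "pydantic>=2",     # Field() metadata on tool parameters
]
```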
**Why this skeleton works well with FastMCP & Gemini**
* `mcp.run()` starts the FastMCP server (STDIO by default), matching the docs’ “Running the server” guidance. ([FastMCP][1])
* Tools return **mixed content**: a short text summary + **real image blocks** (`fastmcp.Image`). FastMCP auto-converts these to MCP ImageContent; returning `bytes` or `Image` objects is the recommended way to hand binary content to clients. ([FastMCP][2])
* We optionally add **structured JSON** via `ToolResult.structured_content`, aligned with FastMCP’s structured output support. ([FastMCP][2])
* Gemini SDK usage mirrors the official “image generation” & “image understanding” recipes (inline bytes via `types.Part.from_bytes`, Files API for large/reusable assets, and reading generated image bytes from `response.candidates[0].content.parts`). ([Google AI for Developers][3])
* Prompts follow Google’s **prompting tips** (subject, composition, action/location, style, and explicit text rendering). ([blog.google][4])
* Note: All generated images include a **SynthID watermark**, which we surface in metadata. ([Google AI for Developers][5])
## Run & connect
* Start locally (STDIO default):
```bash
uv sync
export GEMINI_API_KEY=YOUR_KEY
uv run python server.py
```
FastMCP defaults to STDIO transport; MCP clients can connect immediately. For HTTP transport, use `mcp.run(transport="http", host="127.0.0.1", port=9000)`. ([FastMCP][1])
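To smoke-test the server without wiring up a full MCP client, here is a minimal sketch using FastMCP's own Python client (the tool name and arguments match the skeleton above; `test_client.py` is just our name for the script):
```python
# test_client.py -- minimal smoke test; assumes server.py is in the same directory
import asyncio
from fastmcp import Client

async def main():
    # Pointing Client at a .py path infers STDIO transport and spawns the server.
    async with Client("server.py") as client:
        result = await client.call_tool(
            "generate_image",
            {"prompt": "A photorealistic red fox in morning fog. Square image."},
        )
        print(result.structured_content)  # the machine-readable metadata block

asyncio.run(main())
```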
---
## Design notes & rationale (doc-backed)
* **Gemini API usage**
  * **Inline images** with `Part.from_bytes` for small files; use the **Files API** for large or reusable assets (the docs call out the 20MB inline limit; see the helper sketch after this list). ([Google AI for Developers][3])
* **Extracting generated images** from `response.candidates[0].content.parts` (look for `inline_data`). ([Google AI for Developers][6])
* **SynthID watermark** included on generated images (we surface this in metadata). ([Google AI for Developers][5])
* **Prompting best practices**
* Include **subject, composition, action/location, style, editing instructions** for higher fidelity and consistency; this mirrors Google’s official guidance. ([blog.google][4])
* **FastMCP specifics**
* Returning `Image` (or `bytes`) creates proper MCP **image content blocks**; if you also want machine-readable output, use `ToolResult.structured_content`. ([FastMCP][2])
* `mcp.run()` is the canonical entry point; STDIO is the default transport recommended for local use. ([FastMCP][1])
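Here is the inline-vs-Files routing decision as a small sketch; the 20MB threshold comes from the docs, while the helper name and the PNG default are our assumptions:
```python
import os
from google import genai
from google.genai import types as gx

INLINE_LIMIT_BYTES = 20 * 1024 * 1024  # the docs cite a 20MB cap on inline request data

def image_part_or_file(client: genai.Client, path: str, mime_type: str = "image/png"):
    """Inline small images as Parts; route large or reusable ones through the Files API."""
    if os.path.getsize(path) <= INLINE_LIMIT_BYTES:
        with open(path, "rb") as f:
            return gx.Part.from_bytes(data=f.read(), mime_type=mime_type)
    # File objects returned by the Files API can be passed directly in `contents`.
    return client.files.upload(file=path)
```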
---
### What you can add next
* **Auth & rate-limit guards** (wrap Gemini calls and shape exceptions; consider `mask_error_details=True` during server creation for safer errors).
* **HTTP transport** + `/health` route for remote deployments.
* **Caching** of Files API uploads (e.g., by checksum) to avoid duplicates; a minimal sketch follows this list.
* **Resource templates** that expose previously generated outputs by `job_id` if you want a browsable history.
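For the caching idea, a minimal in-memory sketch (the helper and cache names are ours; note that Files API uploads expire after roughly 48 hours, so a real cache should track expiry as well):
```python
import hashlib
from google import genai

_upload_cache: dict[str, str] = {}  # sha256 digest -> Files API resource name

def cached_upload(client: genai.Client, path: str):
    """Upload a file once per content digest; reuse the Files API entry afterwards."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest in _upload_cache:
        # May raise if the cached file has expired; a production version
        # should catch that and re-upload.
        return client.files.get(name=_upload_cache[digest])
    file_obj = client.files.upload(file=path)
    _upload_cache[digest] = file_obj.name
    return file_obj
```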
If you’d like, I can tweak the tools/prompt catalog for your exact workflows (e.g., banners, ads, thumbnails, character packs), or wire in Vertex AI endpoints instead of the public Gemini API.
[1]: https://gofastmcp.com/servers/server "The FastMCP Server - FastMCP"
[2]: https://gofastmcp.com/servers/tools "Tools - FastMCP"
[3]: https://ai.google.dev/gemini-api/docs/image-understanding "Image understanding | Gemini API | Google AI for Developers"
[4]: https://blog.google/products/gemini/image-generation-prompting-tips/ "Gemini image generation: How to write an effective prompt"
[5]: https://ai.google.dev/gemini-api/docs/image-generation "Image generation with Gemini | Gemini API | Google AI for Developers"
[6]: https://ai.google.dev/gemini-api/docs/image-generation "Image generation with Gemini (aka Nano Banana) | Gemini API | Google AI for Developers"
[7]: https://gofastmcp.com/servers/resources "Resources & Templates - FastMCP"
[8]: https://gofastmcp.com/servers/prompts "Prompts - FastMCP"