Glama

edit_image

Modify existing images using text prompts and optional reference images. Upload an image file and describe changes to apply visual edits.

Instructions

Edit an existing image file with a text prompt, optionally using additional reference images. Use this when you have the exact file path of an image to modify.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| imagePath | Yes | Full file path to the image to edit | |
| prompt | Yes | Text describing the modifications to make (max 10,000 chars) | |
| referenceImages | No | Optional array of file paths to reference images (for style transfer, adding elements, etc.) | |
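
Based on the schema above, a typical set of arguments for an edit_image call might look like the following. The file paths are illustrative placeholders, not values from the server:

```typescript
// Hypothetical arguments for an edit_image call; paths are examples only.
const editImageArgs = {
  imagePath: "/home/user/images/photo.png",          // required
  prompt: "Replace the sky with a dramatic sunset",  // required, max 10,000 chars
  referenceImages: [                                 // optional
    "/home/user/images/sunset-reference.jpg",
  ],
};
```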

Implementation Reference

  • The handleEditImage method is the primary handler for the edit_image tool. It parses and validates the input arguments using EditImageArgsSchema, then delegates to editImageInternal for the actual image processing logic.
    private async handleEditImage(
      request: CallToolRequest
    ): Promise<CallToolResult> {
      const parsed = EditImageArgsSchema.safeParse(request.params.arguments);
      if (!parsed.success) {
        throw new McpError(
          ErrorCode.InvalidParams,
          parsed.error.errors.map((e) => e.message).join("; ")
        );
      }
    
      const { imagePath, prompt, referenceImages } = parsed.data;
      return await this.editImageInternal(imagePath, prompt, referenceImages);
    }
  • The editImage method in GeminiClient handles the core image editing logic. It constructs the API request parts (main image, optional reference images, and prompt text), calls the Gemini API with generateContent, extracts the edited image from the response, saves it to disk, and returns the result.
    async editImage(
      imageBase64: string,
      imageMimeType: string,
      prompt: string,
      referenceImagesData?: Array<{ base64: string; mimeType: string }>
    ): Promise<ImageResult> {
      // Build parts: main image + reference images + prompt text
      const parts: Array<Record<string, unknown>> = [
        {
          inlineData: {
            data: imageBase64,
            mimeType: imageMimeType,
          },
        },
      ];
    
      if (referenceImagesData) {
        for (const ref of referenceImagesData) {
          parts.push({
            inlineData: {
              data: ref.base64,
              mimeType: ref.mimeType,
            },
          });
        }
      }
    
      parts.push({ text: prompt });
    
      const response = (await this.client.models.generateContent({
        model: this.model,
        contents: [{ parts }],
        config: {
          responseModalities: ["Text", "Image"],
        },
      })) as GeminiResponse;
    
      const { images, text } = extractImagesFromResponse(response);
    
      if (images.length === 0) {
        return {
          filePath: "",
          base64Data: "",
          mimeType: "",
          textContent: text || "No edited image was generated.",
        };
      }
    
      const firstImage = images[0];
      const filePath = await saveImage(firstImage.base64, "edited");
    
      return {
        filePath,
        base64Data: firstImage.base64,
        mimeType: firstImage.mimeType,
        textContent: text,
      };
    }
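  The extractImagesFromResponse helper is referenced above but not shown. Assuming the standard Gemini response shape (candidates → content → parts, where each part carries either inlineData or text), a minimal sketch could look like this; the interface names are illustrative:

```typescript
interface ResponsePart {
  inlineData?: { data: string; mimeType: string };
  text?: string;
}
interface GeminiResponse {
  candidates?: Array<{ content?: { parts?: ResponsePart[] } }>;
}

// Sketch of a helper like extractImagesFromResponse: collects every
// inlineData part as an image and joins any text parts into one string.
function extractImagesFromResponse(response: GeminiResponse): {
  images: Array<{ base64: string; mimeType: string }>;
  text: string;
} {
  const images: Array<{ base64: string; mimeType: string }> = [];
  const textParts: string[] = [];
  for (const candidate of response.candidates ?? []) {
    for (const part of candidate.content?.parts ?? []) {
      if (part.inlineData) {
        images.push({
          base64: part.inlineData.data,
          mimeType: part.inlineData.mimeType,
        });
      } else if (part.text) {
        textParts.push(part.text);
      }
    }
  }
  return { images, text: textParts.join("\n") };
}
```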
  • EditImageArgsSchema defines the input validation schema for the edit_image tool using Zod. It validates: imagePath (required string), prompt (required string, max 10,000 chars), and referenceImages (optional array of strings).
    export const EditImageArgsSchema = z.object({
      imagePath: z.string().min(1, "Image path is required"),
      prompt: z.string().min(1, "Prompt is required").max(10_000, "Prompt too long (max 10,000 chars)"),
      referenceImages: z.array(z.string()).optional(),
    });
  • src/index.ts:47-71 (registration)
    The edit_image tool is registered with its name, description, and JSON Schema inputSchema defining the expected parameters (imagePath, prompt, referenceImages). This is part of the TOOLS array exposed via the ListToolsRequestSchema handler.
    {
      name: "edit_image",
      description:
        "Edit an existing image file with a text prompt, optionally using additional reference images. Use this when you have the exact file path of an image to modify.",
      inputSchema: {
        type: "object",
        properties: {
          imagePath: {
            type: "string",
            description: "Full file path to the image to edit",
          },
          prompt: {
            type: "string",
            description: "Text describing the modifications to make (max 10,000 chars)",
          },
          referenceImages: {
            type: "array",
            items: { type: "string" },
            description:
              "Optional array of file paths to reference images (for style transfer, adding elements, etc.)",
          },
        },
        required: ["imagePath", "prompt"],
      },
    },
  • The editImageInternal method is a shared helper used by both edit_image and continue_editing tools. It validates file paths against allowed directories, reads and encodes images to base64, processes reference images, calls the Gemini client, and formats the response with the edited image.
    private async editImageInternal(
      imagePath: string,
      prompt: string,
      referenceImages?: string[]
    ): Promise<CallToolResult> {
      const allowedDirs = getAllowedDirs();
    
      // Validate main image path
      const validatedPath = validatePath(imagePath, allowedDirs);
      const imageBuffer = await readImageFile(validatedPath);
      const mimeType = getMimeType(validatedPath);
      const imageBase64 = imageBuffer.toString("base64");
    
      // Validate and read reference images
      const refData: Array<{ base64: string; mimeType: string }> = [];
      if (referenceImages && referenceImages.length > 0) {
        for (const refPath of referenceImages) {
          const validatedRef = validatePath(refPath, allowedDirs);
          const refBuffer = await readImageFile(validatedRef);
          const refMime = getMimeType(validatedRef);
          refData.push({
            base64: refBuffer.toString("base64"),
            mimeType: refMime,
          });
        }
      }
    
      const result = await this.gemini.editImage(
        imageBase64,
        mimeType,
        prompt,
        refData.length > 0 ? refData : undefined
      );
    
      if (!result.filePath) {
        return {
          content: [{ type: "text", text: result.textContent }],
        };
      }
    
      this.lastImagePath = result.filePath;
    
      const statusText = [
        `Image edited with nanobanana (${this.gemini.getModelName()})`,
        `Original: ${imagePath}`,
        `Edit: "${prompt.length > 100 ? prompt.slice(0, 100) + "..." : prompt}"`,
        referenceImages?.length
          ? `Reference images: ${referenceImages.length}`
          : null,
        result.textContent ? `Description: ${result.textContent}` : null,
        `Saved to: ${result.filePath}`,
        `Use continue_editing to make further changes.`,
      ]
        .filter(Boolean)
        .join("\n\n");
    
      return {
        content: [
          { type: "text", text: statusText },
          {
            type: "image",
            data: result.base64Data,
            mimeType: result.mimeType,
          },
        ],
      };
    }
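  The validatePath helper used above is not shown. A common pattern for this kind of check is directory containment built on node:path: resolve the candidate path and confirm it sits under one of the allowed directories. A hedged sketch, assuming that behavior:

```typescript
import * as path from "node:path";

// Sketch of a path-containment check: resolves the candidate path and
// verifies it falls inside one of the allowed directories, throwing otherwise.
function validatePath(candidate: string, allowedDirs: string[]): string {
  const resolved = path.resolve(candidate);
  const allowed = allowedDirs.some((dir) => {
    const base = path.resolve(dir);
    return resolved === base || resolved.startsWith(base + path.sep);
  });
  if (!allowed) {
    throw new Error(`Path not in allowed directories: ${candidate}`);
  }
  return resolved;
}
```

A real implementation would likely also resolve symlinks before comparing; this sketch checks only the lexical path.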
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. While it mentions the tool edits images with a prompt and optional references, it doesn't disclose critical behavioral traits like whether this is a destructive operation (overwrites the original file?), what permissions are needed, rate limits, output format, or error conditions. For a mutation tool with zero annotation coverage, this is a significant gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise and well-structured: two sentences that efficiently convey purpose and usage guidelines. Every word earns its place with no redundancy or fluff. It's front-loaded with the core functionality followed by the key usage condition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with 3 parameters, no annotations, and no output schema, the description is incomplete. It adequately covers purpose and basic usage but lacks crucial behavioral context (destructive nature, permissions, output format) and doesn't compensate for the absence of structured metadata. The agent would be left guessing about important operational aspects.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description adds minimal value beyond what's in the schema: it mentions 'text prompt' and 'reference images' but doesn't provide additional semantic context like examples, constraints beyond the schema's max length, or how reference images are used. Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Edit an existing image file with a text prompt, optionally using additional reference images.' It specifies the verb ('edit'), resource ('existing image file'), and key mechanisms (text prompt, reference images). However, it doesn't explicitly distinguish this from sibling tools like 'continue_editing' or 'generate_image' beyond mentioning file path requirements.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage guidance: 'Use this when you have the exact file path of an image to modify.' This gives a specific when-to-use condition. However, it doesn't explicitly state when NOT to use it or name alternatives among sibling tools, though the context implies 'generate_image' might be for creating new images rather than editing existing ones.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/DojoCodingLabs/nanobanana-mcp'