Glama

generate_image

Create custom images from text descriptions using AI generation. Enter a prompt to produce original visual content for projects and designs.

Instructions

Generate a NEW image from text prompt. Use this ONLY when creating a completely new image, not when modifying an existing one.

Input Schema

Name | Required | Description | Default
prompt | Yes | Text prompt describing the NEW image to create from scratch | none
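A client invokes the tool with an MCP `tools/call` request whose `arguments` object carries the single `prompt` field. The following is a minimal sketch of that JSON-RPC 2.0 message. The `ToolCallRequest` interface and the `buildGenerateImageCall` helper are hypothetical local names, and the prompt text is only illustrative.

```typescript
// Hypothetical local type describing an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// Build a request for generate_image; the schema accepts only `prompt`.
function buildGenerateImageCall(id: number, prompt: string): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "generate_image",
      arguments: { prompt },
    },
  };
}

const request = buildGenerateImageCall(1, "A watercolor lighthouse at dusk");
console.log(JSON.stringify(request));
```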

Implementation Reference

  • The primary handler function for the generate_image tool. It validates configuration, calls the Gemini API to generate an image from the prompt, processes the response (saving the image file and extracting inline data), builds a formatted text response with file paths and instructions, and returns the MCP tool result.
    private async generateImage(request: CallToolRequest): Promise<CallToolResult> {
      if (!this.ensureConfigured()) {
        throw new McpError(ErrorCode.InvalidRequest, "Gemini API token not configured. Use configure_gemini_token first.");
      }
    
      const { prompt } = request.params.arguments as { prompt: string };
      
      try {
        const response = await this.genAI!.models.generateContent({
          model: "gemini-2.5-flash-image-preview",
          contents: prompt,
        });
        
        // Process response to extract image data
        const content: any[] = [];
        const savedFiles: string[] = [];
        let textContent = "";
        
        // Get appropriate save directory based on OS
        const imagesDir = this.getImagesDirectory();
        
        // Create directory
        await fs.mkdir(imagesDir, { recursive: true, mode: 0o755 });
        
        if (response.candidates && response.candidates[0]?.content?.parts) {
          for (const part of response.candidates[0].content.parts) {
            // Process text content
            if (part.text) {
              textContent += part.text;
            }
            
            // Process image data
            if (part.inlineData?.data) {
              const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
              const randomId = Math.random().toString(36).substring(2, 8);
              const fileName = `generated-${timestamp}-${randomId}.png`;
              const filePath = path.join(imagesDir, fileName);
              
              const imageBuffer = Buffer.from(part.inlineData.data, 'base64');
              await fs.writeFile(filePath, imageBuffer);
              savedFiles.push(filePath);
              this.lastImagePath = filePath;
              
              // Add image to MCP response
              content.push({
                type: "image",
                data: part.inlineData.data,
                mimeType: part.inlineData.mimeType || "image/png",
              });
            }
          }
        }
        
        // Build response content
        let statusText = `🎨 Image generated with nano-banana (Gemini 2.5 Flash Image)!\n\nPrompt: "${prompt}"`;
        
        if (textContent) {
          statusText += `\n\nDescription: ${textContent}`;
        }
        
        if (savedFiles.length > 0) {
          statusText += `\n\n📁 Image saved to:\n${savedFiles.map(f => `- ${f}`).join('\n')}`;
          statusText += `\n\n💡 View the image by:`;
          statusText += `\n1. Opening the file at the path above`;
          statusText += `\n2. Clicking on "Called generate_image" in Cursor to expand the MCP call details`;
          statusText += `\n\n🔄 To modify this image, use: continue_editing`;
          statusText += `\n📋 To check current image info, use: get_last_image_info`;
        } else {
          statusText += `\n\nNote: No image was generated. The model may have returned only text.`;
          statusText += `\n\n💡 Tip: Try running the command again - sometimes the first call needs to warm up the model.`;
        }
        
        // Add text content first
        content.unshift({
          type: "text",
          text: statusText,
        });
        
        return { content };
        
      } catch (error) {
        console.error("Error generating image:", error);
        throw new McpError(
          ErrorCode.InternalError,
          `Failed to generate image: ${error instanceof Error ? error.message : String(error)}`
        );
      }
    }
  • src/index.ts:71-84 (registration)
    Registration of the generate_image tool in the ListToolsRequestSchema handler, including name, description, and input schema.
    {
      name: "generate_image",
      description: "Generate a NEW image from text prompt. Use this ONLY when creating a completely new image, not when modifying an existing one.",
      inputSchema: {
        type: "object",
        properties: {
          prompt: {
            type: "string",
            description: "Text prompt describing the NEW image to create from scratch",
          },
        },
        required: ["prompt"],
      },
    },
  • Input schema definition for the generate_image tool, specifying a required 'prompt' string parameter.
    inputSchema: {
      type: "object",
      properties: {
        prompt: {
          type: "string",
          description: "Text prompt describing the NEW image to create from scratch",
        },
      },
      required: ["prompt"],
    },
  • Dispatcher case in the CallToolRequestSchema handler that routes generate_image calls to the generateImage method.
    case "generate_image":
      return await this.generateImage(request);
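The part-processing loop in generateImage can be isolated as a pure function, which makes it testable without calling Gemini or touching the filesystem. This is a minimal sketch, assuming the response parts carry an optional `text` and an optional `inlineData` (base64 `data`, optional `mimeType`), exactly as the handler reads them. `Part`, `collectParts`, and `makeFileName` are hypothetical names, and the file-writing step is omitted.

```typescript
// Local stand-ins for the response-part fields the handler reads.
interface InlineData {
  data?: string;
  mimeType?: string;
}
interface Part {
  text?: string;
  inlineData?: InlineData;
}

// MCP content blocks as produced by the handler.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

// Same naming scheme as the handler: ISO timestamp with ':' and '.'
// replaced by '-', followed by a short random id and a .png extension.
function makeFileName(now: Date, randomId: string): string {
  const timestamp = now.toISOString().replace(/[:.]/g, "-");
  return `generated-${timestamp}-${randomId}.png`;
}

// Concatenate all text parts and turn every inline image into an image block,
// using the same "image/png" fallback as the handler.
function collectParts(parts: Part[]): { text: string; images: ContentBlock[] } {
  let text = "";
  const images: ContentBlock[] = [];
  for (const part of parts) {
    if (part.text) {
      text += part.text;
    }
    if (part.inlineData?.data) {
      images.push({
        type: "image",
        data: part.inlineData.data,
        mimeType: part.inlineData.mimeType || "image/png",
      });
    }
  }
  return { text, images };
}

const result = collectParts([
  { text: "A lighthouse. " },
  { inlineData: { data: "iVBORw0KGgo=", mimeType: "image/png" } },
]);
console.log(result.text, result.images.length);
console.log(makeFileName(new Date(Date.UTC(2025, 0, 2, 3, 4, 5, 6)), "abc123"));
```

Isolating the loop also makes one detail easier to see: the handler always saves the file with a `.png` extension, even when `inlineData.mimeType` reports a different image type.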
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

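No annotations are provided, so the description carries the full burden. It discloses that the tool creates new content ('generate a NEW image') but does not mention behavioral traits such as rate limits, quality expectations, generation time, or output format. Nor does it mention the side effects visible in the implementation: the tool requires a Gemini API token configured via configure_gemini_token, and it writes every generated image to a PNG file in a local directory. The description adds some context about the creation scope but lacks these operational details.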

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
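One way to close this gap is to declare MCP tool annotations and to extend the description with behavior the handler already exhibits (token requirement, file write, returned content). This sketch is a suggestion, not the server's code. The `ToolAnnotations` interface is a local stand-in for the hint fields in the MCP specification (`readOnlyHint`, `destructiveHint`, `idempotentHint`, `openWorldHint`). The hint values and the added description sentences reflect one reading of the handler and are assumptions.

```typescript
// Local stand-in for the MCP tool-annotation hint fields.
interface ToolAnnotations {
  title?: string;
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

// Original description, followed by sentences describing what the handler does.
const generateImageDescription =
  "Generate a NEW image from text prompt. Use this ONLY when creating a completely new image, " +
  "not when modifying an existing one. Requires a Gemini API token (set it with configure_gemini_token). " +
  "Saves the image as a PNG file in a local images directory and returns a status message " +
  "with the file path plus the image data.";

const generateImageAnnotations: ToolAnnotations = {
  title: "Generate image",
  readOnlyHint: false,    // writes a new file to disk
  destructiveHint: false, // only adds uniquely named files; nothing is deleted
  idempotentHint: false,  // each call produces a different image and file
  openWorldHint: true,    // calls the external Gemini API
};

console.log(generateImageDescription);
console.log(JSON.stringify(generateImageAnnotations));
```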

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with zero waste. The first sentence states the core purpose, and the second provides critical usage guidance. Every word earns its place, and the structure is front-loaded with essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter tool with no annotations and no output schema, the description is adequate but has gaps. It covers purpose and usage boundaries well, but doesn't address what the tool returns (image data, URL, metadata) or any operational constraints. The context is partially complete but lacks output information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents the single 'prompt' parameter. The description adds minimal value by reinforcing that the prompt should describe 'the NEW image to create from scratch', which slightly expands on the schema's description. Baseline 3 is appropriate when schema does most of the work.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
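The schema could also state constraints that the handler currently takes for granted: the handler casts `request.params.arguments` to `{ prompt: string }`, and nothing in the code shown above checks that assumption at runtime. This is a minimal sketch, assuming that rejecting empty prompts and unknown keys is desirable. `minLength` and `additionalProperties` are standard JSON Schema keywords. `generateImageInputSchema` and the `isGenerateImageArgs` guard are hypothetical, not part of the server.

```typescript
// Hypothetical tightened schema: the same `prompt` field plus two
// standard JSON Schema constraints.
const generateImageInputSchema = {
  type: "object",
  properties: {
    prompt: {
      type: "string",
      minLength: 1,
      description: "Text prompt describing the NEW image to create from scratch",
    },
  },
  required: ["prompt"],
  additionalProperties: false,
} as const;

// Hypothetical runtime guard enforcing the same rules before the Gemini call.
function isGenerateImageArgs(args: unknown): args is { prompt: string } {
  if (typeof args !== "object" || args === null) return false;
  const keys = Object.keys(args);
  // Exactly one key, and it must be `prompt` (mirrors additionalProperties: false).
  if (keys.length !== 1 || keys[0] !== "prompt") return false;
  const prompt = (args as { prompt?: unknown }).prompt;
  return (
    typeof prompt === "string" &&
    prompt.length >= generateImageInputSchema.properties.prompt.minLength
  );
}

console.log(isGenerateImageArgs({ prompt: "a lighthouse" }));
```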

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('generate') and resource ('image') with specific scope ('from text prompt'). It explicitly distinguishes itself from sibling tools such as 'edit_image' by stating 'not when modifying an existing one'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance with 'Use this ONLY when creating a completely new image, not when modifying an existing one.' This clearly defines when to use this tool versus alternatives like 'edit_image' and establishes clear boundaries.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ConechoAI/Nano-Banana-MCP'
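The same endpoint can also be queried from TypeScript with the built-in `fetch` (available in Node.js 18 and later). This is a minimal sketch: `serverUrl` and `fetchServer` are hypothetical helpers, and because the response format is not described here, the body is returned as untyped JSON.

```typescript
// Hypothetical helper: build the directory API URL for an owner/repo pair.
function serverUrl(owner: string, repo: string): string {
  return `https://glama.ai/api/mcp/v1/servers/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// GET the server record; the response shape is not documented on this page,
// so it is returned as unknown JSON.
async function fetchServer(owner: string, repo: string): Promise<unknown> {
  const res = await fetch(serverUrl(owner, repo));
  if (!res.ok) {
    throw new Error(`Request failed: ${res.status} ${res.statusText}`);
  }
  return res.json();
}
```

For example, `await fetchServer("ConechoAI", "Nano-Banana-MCP")` issues the same request as the curl command above.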

If you have feedback or need assistance with the MCP directory API, please join our Discord server.