DALL-E MCP Server

generate_image

Create custom images from text descriptions using OpenAI's DALL-E 3 model with configurable size, quality, and style options for visual content generation.

Instructions

Generate an image using OpenAI's DALL-E 3 model based on a text prompt

Input Schema


Name	Required	Description	Default
`prompt`	Yes	The text prompt describing the image to generate
`size`	No	Image size (1024x1024, 1024x1792, or 1792x1024)	1792x1024
`quality`	No	Image quality (standard or hd)	hd
`style`	No	Image style (vivid or natural)	vivid
`filename`	No	Optional custom filename (without extension)
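
As a rough illustration of how the defaults in the table apply, the sketch below fills in any optional parameter the caller omits. The `GenerateImageArgs` and `ResolvedArgs` types and the `applyDefaults` helper are hypothetical names, not part of the server; only the default values are taken from the table above.

```typescript
// Hypothetical types mirroring the input schema table above.
type ImageSize = "1024x1024" | "1024x1792" | "1792x1024";
type ImageQuality = "standard" | "hd";
type ImageStyle = "vivid" | "natural";

interface GenerateImageArgs {
  prompt: string;
  size?: ImageSize;
  quality?: ImageQuality;
  style?: ImageStyle;
  filename?: string;
}

interface ResolvedArgs {
  prompt: string;
  size: ImageSize;
  quality: ImageQuality;
  style: ImageStyle;
  filename?: string;
}

// Fill in the documented default for every omitted optional parameter.
// `filename` has no default, so it stays undefined when not supplied.
function applyDefaults(args: GenerateImageArgs): ResolvedArgs {
  return {
    prompt: args.prompt,
    size: args.size ?? "1792x1024",
    quality: args.quality ?? "hd",
    style: args.style ?? "vivid",
    filename: args.filename,
  };
}

// Example: only `prompt` and `quality` are given; size and style use defaults.
const resolvedExample = applyDefaults({
  prompt: "A lighthouse at dusk",
  quality: "standard",
});
```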

Implementation Reference

src/index.ts:106-182 (handler)

The primary handler function that executes the generate_image tool. It calls the OpenAI DALL-E API, downloads the generated image, saves it locally, and returns a JSON result.

private async handleImageGeneration(args: any) {
  try {
    const {
      prompt,
      size = DEFAULT_SIZE,
      quality = DEFAULT_QUALITY,
      style = "vivid",
      filename,
    } = args;

    if (!process.env.OPENAI_API_KEY) {
      throw new Error("OPENAI_API_KEY environment variable is required");
    }

    console.log(`[DALL-E] Generating image: "${prompt.slice(0, 50)}..."`);

    const response = await openai.images.generate({
      model: "dall-e-3",
      prompt: prompt,
      n: 1,
      size: size as "1024x1024" | "1024x1792" | "1792x1024",
      quality: quality as "standard" | "hd",
      style: style as "vivid" | "natural",
      response_format: "url",
    });

    if (!response.data || !response.data[0]?.url) {
      throw new Error("No image URL received from DALL-E API");
    }
    
    const imageUrl = response.data[0].url;

    const imageResponse = await fetch(imageUrl);
    if (!imageResponse.ok) {
      throw new Error(`Failed to download image: ${imageResponse.statusText}`);
    }

    const imageBuffer = Buffer.from(await imageResponse.arrayBuffer());

    await fs.ensureDir(OUTPUT_DIR);

    const timestamp = new Date().toISOString().replace(/[:.]/g, "-");
    const baseFilename = filename || `dalle_${timestamp}`;
    const imagePath = path.join(OUTPUT_DIR, `${baseFilename}.png`);

    await fs.writeFile(imagePath, imageBuffer);

    const result = {
      success: true,
      message: "Image generated successfully",
      details: {
        prompt: prompt,
        size: size,
        quality: quality,
        style: style,
        file_path: path.resolve(imagePath),
        file_size: imageBuffer.length,
        timestamp: new Date().toISOString(),
      },
    };

    console.log(`[DALL-E] Image saved: ${imagePath}`);
    return { content: [{ type: "text", text: JSON.stringify(result, null, 2) }] };

  } catch (error) {
    const errorMessage = error instanceof Error ? error.message : "Unknown error occurred";
    console.error(`[DALL-E Error] ${errorMessage}`);

    const errorResult = {
      success: false,
      error: errorMessage,
      timestamp: new Date().toISOString(),
    };

    return { content: [{ type: "text", text: JSON.stringify(errorResult, null, 2) }] };
  }
}
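
The handler above passes `filename` straight into `path.join`, so a value containing path separators could point outside `OUTPUT_DIR`. The sketch below shows one possible way to derive the output name: it keeps the handler's timestamp fallback but restricts a caller-supplied name to a conservative character set. `buildImageFilename` is a hypothetical helper, and the allowed character set is an assumption, not something the server defines.

```typescript
// Hypothetical helper: build the PNG filename the way the handler does,
// but with the caller-supplied name reduced to a safe character set.
function buildImageFilename(filename: string | undefined, now: Date): string {
  // Same timestamp transformation as the handler: ":" and "." become "-".
  const timestamp = now.toISOString().replace(/[:.]/g, "-");

  // Assumption: only letters, digits, "_" and "-" are allowed. Everything
  // else, including "/" and ".", is replaced with "_" so the result cannot
  // contain a path separator or a ".." segment.
  const cleaned = (filename ?? "").replace(/[^A-Za-z0-9_-]/g, "_");

  // An empty or missing name falls back to the timestamped default.
  const base = cleaned.length > 0 ? cleaned : `dalle_${timestamp}`;
  return `${base}.png`;
}
```

The returned name would then be joined with `OUTPUT_DIR` exactly as in the handler. Note that `fs.writeFile` overwrites an existing file, so reusing a custom filename replaces the earlier image.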

src/index.ts:60-91 (schema)

Defines the input schema and parameters for the generate_image tool, including the required prompt and the optional size, quality, style, and filename.

inputSchema: {
  type: "object",
  properties: {
    prompt: {
      type: "string",
      description: "The text prompt describing the image to generate",
    },
    size: {
      type: "string",
      description: "Image size (1024x1024, 1024x1792, or 1792x1024)",
      enum: ["1024x1024", "1024x1792", "1792x1024"],
      default: DEFAULT_SIZE,
    },
    quality: {
      type: "string",
      description: "Image quality (standard or hd)",
      enum: ["standard", "hd"],
      default: DEFAULT_QUALITY,
    },
    style: {
      type: "string",
      description: "Image style (vivid or natural)",
      enum: ["vivid", "natural"],
      default: "vivid",
    },
    filename: {
      type: "string",
      description: "Optional custom filename (without extension)",
    },
  },
  required: ["prompt"],
},
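
The `as` casts in the handler are compile-time assertions only, so the enum constraints in this schema are not enforced at runtime unless something else checks them. A minimal sketch of such a check follows; `validateArgs` is a hypothetical helper, and the allowed values are copied from the schema above.

```typescript
// Allowed values, copied from the enum lists in the schema above.
const SIZES: readonly string[] = ["1024x1024", "1024x1792", "1792x1024"];
const QUALITIES: readonly string[] = ["standard", "hd"];
const STYLES: readonly string[] = ["vivid", "natural"];

// Hypothetical helper: returns a list of human-readable problems.
// An empty list means the arguments satisfy the schema.
function validateArgs(args: Record<string, unknown>): string[] {
  const problems: string[] = [];

  // `prompt` is the only required property and must be a string.
  if (typeof args["prompt"] !== "string") {
    problems.push("prompt is required and must be a string");
  }

  // Optional enum properties: absent is fine, anything else must match.
  const checkEnum = (name: string, allowed: readonly string[]): void => {
    const value = args[name];
    if (value === undefined) return;
    if (typeof value !== "string" || allowed.indexOf(value) === -1) {
      problems.push(`${name} must be one of: ${allowed.join(", ")}`);
    }
  };
  checkEnum("size", SIZES);
  checkEnum("quality", QUALITIES);
  checkEnum("style", STYLES);

  if (args["filename"] !== undefined && typeof args["filename"] !== "string") {
    problems.push("filename must be a string");
  }
  return problems;
}
```

A handler could call this before contacting the API and return the problems in its error result, which avoids spending a request on arguments the API would reject anyway.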

src/index.ts:53-95 (registration)

Registers the generate_image tool in the ListTools response, providing its name, description, and schema.

this.server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [
      {
        name: "generate_image",
        description:
          "Generate an image using OpenAI's DALL-E 3 model based on a text prompt",
        inputSchema: {
          type: "object",
          properties: {
            prompt: {
              type: "string",
              description: "The text prompt describing the image to generate",
            },
            size: {
              type: "string",
              description: "Image size (1024x1024, 1024x1792, or 1792x1024)",
              enum: ["1024x1024", "1024x1792", "1792x1024"],
              default: DEFAULT_SIZE,
            },
            quality: {
              type: "string",
              description: "Image quality (standard or hd)",
              enum: ["standard", "hd"],
              default: DEFAULT_QUALITY,
            },
            style: {
              type: "string",
              description: "Image style (vivid or natural)",
              enum: ["vivid", "natural"],
              default: "vivid",
            },
            filename: {
              type: "string",
              description: "Optional custom filename (without extension)",
            },
          },
          required: ["prompt"],
        },
      },
    ],
  };
});

src/index.ts:97-103 (registration)

Sets up the CallTool request handler, which dispatches to the generate_image handler based on the tool name.

this.server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === "generate_image") {
    return await this.handleImageGeneration(request.params.arguments);
  }

  throw new Error(`Unknown tool: ${request.params.name}`);
});

Tool Definition Quality

Grade: C (2.9/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the model (DALL-E 3) but doesn't cover critical aspects like rate limits, authentication needs, cost implications, or what happens on failure (e.g., if the prompt violates content policies). For a tool that likely involves API calls and potential restrictions, this is a significant gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
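
To make the gap concrete, here is a sketch of what a more behavior-aware description might look like. It is a hypothetical rewording, not the server's actual metadata; every statement in it is derived from the handler code shown earlier on this page. Rate limits and cost are left out because nothing on this page documents them.

```typescript
// Hypothetical rewording of the tool description. Each sentence reflects
// behavior visible in the handler excerpt, nothing more.
const proposedDescription: string = [
  "Generate one image with OpenAI's DALL-E 3 from a text prompt.",
  "Requires the OPENAI_API_KEY environment variable.",
  "Side effects: calls the OpenAI API, downloads the result, and writes a PNG " +
    "to the server's output directory; reusing a filename overwrites the existing file.",
  "Returns JSON text with success, file_path, and file_size, " +
    "or success: false with an error message on failure.",
].join(" ");
```

A longer description trades some of the conciseness score for behavioral disclosure, so which sentences to keep is a judgment call.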

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose without any wasted words. It directly states what the tool does and the technology used, making it easy to understand at a glance. Every part of the sentence earns its place by providing essential context.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of an image generation tool with no annotations and no output schema, the description is insufficient. It doesn't explain the return value (e.g., image URL or data), error handling, or usage limits. For a tool that interacts with an external API and has multiple parameters, more context is needed to ensure proper agent usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
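
For reference, the return value can be read off the handler shown earlier: a single text content item whose `text` is a JSON document with a `success` flag. The sketch below types both shapes and parses them on the client side. The interface names and `parseGenerateImageResult` are hypothetical; the field names come from the handler's `result` and `errorResult` objects.

```typescript
// Field names taken from the handler's `result` object.
interface GenerateImageSuccess {
  success: true;
  message: string;
  details: {
    prompt: string;
    size: string;
    quality: string;
    style: string;
    file_path: string;
    file_size: number;
    timestamp: string;
  };
}

// Field names taken from the handler's `errorResult` object.
interface GenerateImageFailure {
  success: false;
  error: string;
  timestamp: string;
}

type GenerateImageResult = GenerateImageSuccess | GenerateImageFailure;

// Minimal view of the tool response: a list of typed content items.
interface ToolResponse {
  content: { type: string; text: string }[];
}

// Hypothetical client-side helper: extract and parse the JSON payload.
// Failures arrive as ordinary content with success: false, so callers
// must check the flag rather than rely on an exception.
function parseGenerateImageResult(response: ToolResponse): GenerateImageResult {
  const first = response.content[0];
  if (!first || first.type !== "text") {
    throw new Error("Expected a text content item");
  }
  return JSON.parse(first.text) as GenerateImageResult;
}
```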

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with all parameters well-documented, including enums and defaults. The description adds no additional parameter semantics beyond what the schema provides, such as explaining prompt best practices or style/quality trade-offs. However, with high schema coverage, the baseline is 3, as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('generate') and resource ('image'), and identifies the underlying model (DALL-E 3). It distinguishes the action from potential alternatives by specifying that it generates images from text prompts. Because the server exposes no sibling tools, there is nothing to differentiate the tool from, so the description stops short of the highest score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention any prerequisites, constraints, or scenarios where this tool is preferred over other image generation methods. The only implied usage is for generating images from text prompts, but this is basic and lacks explicit context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/szabadkai/imagegen-mcp'
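
The same request can be sketched in TypeScript. The URL pattern is taken from the curl command above; `mcpServerUrl` and `fetchServerInfo` are hypothetical helpers, and the response is assumed to be JSON of an unspecified shape, so it is returned as `unknown`. The fetch function is passed in rather than referenced globally, which keeps the sketch independent of the runtime.

```typescript
// Minimal view of the parts of a fetch response this sketch uses.
interface MinimalResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (url: string) => Promise<MinimalResponse>;

// Build the server URL following the pattern in the curl command above.
function mcpServerUrl(owner: string, name: string): string {
  return (
    "https://glama.ai/api/mcp/v1/servers/" +
    encodeURIComponent(owner) + "/" + encodeURIComponent(name)
  );
}

// Hypothetical helper: GET the server record and return the parsed body.
// Assumption: the endpoint responds with JSON; its shape is not documented here.
async function fetchServerInfo(
  owner: string,
  name: string,
  fetchImpl: FetchLike,
): Promise<unknown> {
  const response = await fetchImpl(mcpServerUrl(owner, name));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Usage, in a runtime with a global fetch:
// fetchServerInfo("szabadkai", "imagegen-mcp", fetch).then((info) => console.log(info));
```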
