text_to_image

Convert text prompts into images using specified models and aspect ratios. Save generated visuals to designated directories for easy access and integration.

Instructions

Generate images based on text prompts.

Note: This tool calls MiniMax API and may incur costs. Use only when explicitly requested by the user.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| aspectRatio | No | Image aspect ratio, values: ["1:1", "16:9", "4:3", "3:2", "2:3", "3:4", "9:16", "21:9"] | 1:1 |
| model | No | Model to use | image-01 |
| n | No | Number of images to generate (1-9) | 1 |
| outputDirectory | No | The directory to save the output file. `outputDirectory` is relative to `MINIMAX_MCP_BASE_PATH` (or `basePath` in config). The final save path is `${basePath}/${outputDirectory}`. For example, if `MINIMAX_MCP_BASE_PATH=~/Desktop` and `outputDirectory=workspace`, the output will be saved to `~/Desktop/workspace/` | |
| outputFile | No | Path to save the generated image file; automatically generated if not provided | |
| prompt | Yes | Text prompt for image generation | |
| promptOptimizer | No | Whether to optimize the prompt | true |
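
A typical call supplies only the required `prompt` plus any optional overrides. The argument names below match the schema above; the values are illustrative:

```json
{
  "prompt": "A watercolor painting of a lighthouse at dusk",
  "aspectRatio": "16:9",
  "n": 2,
  "outputDirectory": "workspace"
}
```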

Implementation Reference

  • Registers the text_to_image MCP tool, defines its input schema with Zod, and provides the execution handler that calls ImageAPI.generateImage and formats the response according to the resource mode.
    private registerTextToImageTool(): void {
      this.server.tool(
        'text_to_image',
        'Generate images based on text prompts.\n\nNote: This tool calls MiniMax API and may incur costs. Use only when explicitly requested by the user.',
        {
          model: z.string().optional().default(DEFAULT_T2I_MODEL).describe('Model to use'),
          prompt: z.string().describe('Text prompt for image generation'),
          aspectRatio: z
            .string()
            .optional()
            .default('1:1')
            .describe('Image aspect ratio, values: ["1:1", "16:9","4:3", "3:2", "2:3", "3:4", "9:16", "21:9"]'),
          n: z.number().min(1).max(9).optional().default(1).describe('Number of images to generate'),
          promptOptimizer: z.boolean().optional().default(true).describe('Whether to optimize the prompt'),
          outputDirectory: COMMON_PARAMETERS_SCHEMA.outputDirectory,
          outputFile: z
            .string()
            .optional()
            .describe('Path to save the generated image file, automatically generated if not provided'),
        },
        async (params) => {
          try {
            // No need to update configuration from request parameters in stdio mode
    
            // If no output filename is provided, generate one automatically
            if (!params.outputFile) {
              const promptPrefix = params.prompt.substring(0, 20).replace(/[^\w]/g, '_');
              params.outputFile = `image_${promptPrefix}_${Date.now()}`;
            }
    
            const outputFiles = await this.imageApi.generateImage(params);
    
            // Handle different output formats
            if (this.config.resourceMode === RESOURCE_MODE_URL) {
              return {
                content: [
                  {
                    type: 'text',
                    text: `Success. Image URL(s): ${outputFiles.join(', ')}`,
                  },
                ],
              };
            } else {
              return {
                content: [
                  {
                    type: 'text',
                    text: `Image(s) saved: ${outputFiles.join(', ')}`,
                  },
                ],
              };
            }
          } catch (error) {
            return {
              content: [
                {
                  type: 'text',
                  text: `Failed to generate image: ${error instanceof Error ? error.message : String(error)}`,
                },
              ],
            };
          }
        },
      );
    }
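
The auto-naming fallback in the handler above can be isolated as a small pure function. `autoOutputFile` is a hypothetical name used here for illustration; the handler inlines this logic:

```typescript
// Mirrors the handler's fallback: keep the first 20 characters of the
// prompt, replace every non-word character with an underscore, and
// append a timestamp to make the name unique.
function autoOutputFile(prompt: string, timestamp: number): string {
  const promptPrefix = prompt.substring(0, 20).replace(/[^\w]/g, '_');
  return `image_${promptPrefix}_${timestamp}`;
}
```

For example, a prompt of "a red fox, watercolor style" yields `image_a_red_fox__watercolo_<timestamp>`: the prefix is truncated mid-word at 20 characters, and both the spaces and the comma become underscores.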
  • Core handler in ImageAPI that validates input, builds the request to the MiniMax /v1/image_generation endpoint, and either returns the image URLs directly (URL mode) or downloads the images, saves them to local files, and returns the file paths.
    async generateImage(request: ImageGenerationRequest): Promise<string[]> {
      // Validate required parameters
      if (!request.prompt || request.prompt.trim() === '') {
        throw new MinimaxRequestError(ERROR_PROMPT_REQUIRED);
      }
    
      // Validate model
      const model = this.ensureValidModel(request.model);
    
      // Prepare request data
      const requestData: Record<string, any> = {
        model: model,
        prompt: request.prompt,
        aspect_ratio: request.aspectRatio || '1:1',
        n: request.n || 1,
        prompt_optimizer: request.promptOptimizer !== undefined ? request.promptOptimizer : true
      };
    
      // Only add subject reference if provided
      if (request.subjectReference) {
        // Check if it's a URL
        if (!request.subjectReference.startsWith('http://') &&
            !request.subjectReference.startsWith('https://') &&
            !request.subjectReference.startsWith('data:')) {
          // If it's a local file, process it as a data URL
          if (!fs.existsSync(request.subjectReference)) {
            throw new MinimaxRequestError(`Reference image file does not exist: ${request.subjectReference}`);
          }
    
          const imageData = fs.readFileSync(request.subjectReference);
          const base64Image = imageData.toString('base64');
          requestData.subject_reference = `data:image/jpeg;base64,${base64Image}`;
        } else {
          requestData.subject_reference = request.subjectReference;
        }
      }
    
      // Send request
      const response = await this.api.post<any>('/v1/image_generation', requestData);
    
      // Check response structure
      const imageUrls = response?.data?.image_urls;
      if (!imageUrls || !Array.isArray(imageUrls) || imageUrls.length === 0) {
        throw new MinimaxRequestError('Unable to get image URLs from response');
      }
    
      // If URL mode, return URLs directly
      const resourceMode = this.api.getResourceMode();
      if (resourceMode === RESOURCE_MODE_URL) {
        return imageUrls;
      }
    
      // Process output files
      const outputFiles: string[] = [];
      const outputDir = request.outputDirectory;
    
      for (let i = 0; i < imageUrls.length; i++) {
        // Generate output filename - similar to Python version
        const outputFileName = buildOutputFile(`image_${i}_${request.prompt.substring(0, 20)}`, outputDir, 'jpg', true);
    
        try {
          // Download image
          const imageResponse = await requests.default.get(imageUrls[i], { responseType: 'arraybuffer' });
    
          // Ensure directory exists
          const dirPath = path.dirname(outputFileName);
          if (!fs.existsSync(dirPath)) {
            fs.mkdirSync(dirPath, { recursive: true });
          }
    
          // Save file
          fs.writeFileSync(outputFileName, Buffer.from(imageResponse.data));
          outputFiles.push(outputFileName);
        } catch (error) {
          throw new MinimaxRequestError(`Failed to download or save image: ${String(error)}`);
        }
      }
    
      return outputFiles;
    }
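
`buildOutputFile` is referenced above but not shown in the source. A minimal sketch of what such a helper might do, assuming it sanitizes the base name, optionally appends a timestamp for uniqueness, and joins the directory, name, and extension (the real implementation may differ):

```typescript
import * as path from 'path';

// Hypothetical sketch of a buildOutputFile-style helper.
function buildOutputFile(
  baseName: string,
  outputDir: string | undefined,
  extension: string,
  unique: boolean,
): string {
  // Replace characters that are unsafe in filenames.
  const safeName = baseName.replace(/[^\w-]/g, '_');
  // Optionally append a timestamp so repeated calls don't collide.
  const suffix = unique ? `_${Date.now()}` : '';
  const fileName = `${safeName}${suffix}.${extension}`;
  return outputDir ? path.join(outputDir, fileName) : fileName;
}
```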
  • Zod schema defining input parameters for text_to_image tool including prompt, model, aspect ratio, number of images, etc.
    model: z.string().optional().default(DEFAULT_T2I_MODEL).describe('Model to use'),
    prompt: z.string().describe('Text prompt for image generation'),
    aspectRatio: z
      .string()
      .optional()
      .default('1:1')
      .describe('Image aspect ratio, values: ["1:1", "16:9","4:3", "3:2", "2:3", "3:4", "9:16", "21:9"]'),
    n: z.number().min(1).max(9).optional().default(1).describe('Number of images to generate'),
    promptOptimizer: z.boolean().optional().default(true).describe('Whether to optimize the prompt'),
    outputDirectory: COMMON_PARAMETERS_SCHEMA.outputDirectory,
    outputFile: z
      .string()
      .optional()
      .describe('Path to save the generated image file, automatically generated if not provided'),
  • Static registration of text_to_image tool schema in REST server's listTools handler.
    name: 'text_to_image',
    description: 'Generate image based on text prompt',
    arguments: [
      { name: 'prompt', description: 'Text prompt for image generation', required: true },
      { name: 'model', description: 'Model to use', required: false },
      { name: 'aspectRatio', description: 'Image aspect ratio, values: ["1:1", "16:9","4:3", "3:2", "2:3", "3:4", "9:16", "21:9"]', required: false },
      { name: 'n', description: 'Number of images to generate (1-9)', required: false },
      { name: 'promptOptimizer', description: 'Whether to optimize prompt', required: false },
      { name: 'outputDirectory', description: OUTPUT_DIRECTORY_DESCRIPTION, required: false },
      { name: 'outputFile', description: 'Output file path, auto-generated if not provided', required: false }
    ],
    inputSchema: {
      type: 'object',
      properties: {
        prompt: { type: 'string' },
        model: { type: 'string' },
        aspectRatio: { type: 'string' },
        n: { type: 'number' },
        promptOptimizer: { type: 'boolean' },
        outputDirectory: { type: 'string' },
        outputFile: { type: 'string' }
      },
      required: ['prompt']
    }
  • Handler in REST server that delegates text_to_image execution to MediaService with retry logic.
    private async handleTextToImage(args: any, api: MiniMaxAPI, mediaService: MediaService, attempt = 1): Promise<any> {
      try {
        // Call media service to handle request
        const result = await mediaService.generateImage(args);
        return result;
      } catch (error) {
        if (attempt < MAX_RETRY_ATTEMPTS) {
          // console.warn(`[${new Date().toISOString()}] Failed to generate image, attempting retry (${attempt}/${MAX_RETRY_ATTEMPTS})`, error);
          // Delay retry
          await new Promise(resolve => setTimeout(resolve, RETRY_DELAY * Math.pow(2, attempt - 1)));
          return this.handleTextToImage(args, api, mediaService, attempt + 1);
        }
        throw this.wrapError('Failed to generate image', error);
      }
    }
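
The retry delay above grows exponentially with the attempt number. A minimal sketch of that schedule, assuming a base RETRY_DELAY of 1000 ms (the actual constant values are not shown in the source):

```typescript
// Illustrative backoff schedule; RETRY_DELAY's real value is assumed.
const RETRY_DELAY = 1000; // ms (assumed)

// Delay before retry `attempt` (1-based), doubling each time,
// mirroring RETRY_DELAY * Math.pow(2, attempt - 1) in the handler.
function backoffDelay(attempt: number): number {
  return RETRY_DELAY * Math.pow(2, attempt - 1);
}
// attempts 1, 2, 3 → delays of 1000 ms, 2000 ms, 4000 ms
```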
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It adds valuable context beyond basic functionality: it discloses that the tool 'calls MiniMax API and may incur costs,' which is critical operational information not inferable from the schema. However, it doesn't mention rate limits, error handling, or output format details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise and well-structured: a clear purpose statement followed by a critical note about API costs and usage restriction. Both sentences earn their place, with zero wasted words, and the most important information (cost warning) is front-loaded in the note.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, API integration, cost implications) and no annotations or output schema, the description does well by covering the core purpose and critical behavioral context (API costs). However, it lacks details about the generated output (e.g., file format, resolution) and doesn't explain error cases or authentication requirements.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description provides no parameter-specific information beyond the general 'text prompts' reference. However, with 100% schema description coverage, all 7 parameters are well-documented in the schema itself (e.g., 'prompt' for text input, 'aspectRatio' with enumerated values). The baseline score of 3 is appropriate since the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Generate images based on text prompts.' This specifies the verb ('generate') and resource ('images') with the mechanism ('based on text prompts'). However, it doesn't explicitly differentiate from sibling tools like 'generate_video' or 'image_to_video' beyond the obvious difference in output type.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use the tool: 'Use only when explicitly requested by the user.' This gives explicit guidance on user-driven invocation. However, it doesn't mention alternatives or when-not-to-use scenarios relative to siblings like 'generate_video' for video generation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
