text_to_image

Convert text prompts into images using specified models and aspect ratios. Save generated visuals to designated directories for easy access and integration.

Instructions

Generate images based on text prompts.

Note: This tool calls MiniMax API and may incur costs. Use only when explicitly requested by the user.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| aspectRatio | No | Image aspect ratio, values: ["1:1", "16:9", "4:3", "3:2", "2:3", "3:4", "9:16", "21:9"] | 1:1 |
| model | No | Model to use | image-01 |
| n | No | Number of images to generate (1-9) | 1 |
| outputDirectory | No | The directory to save the output file. `outputDirectory` is relative to `MINIMAX_MCP_BASE_PATH` (or `basePath` in config). The final save path is `${basePath}/${outputDirectory}`. For example, if `MINIMAX_MCP_BASE_PATH=~/Desktop` and `outputDirectory=workspace`, the output will be saved to `~/Desktop/workspace/` | |
| outputFile | No | Path to save the generated image file; automatically generated if not provided | |
| prompt | Yes | Text prompt for image generation | |
| promptOptimizer | No | Whether to optimize the prompt | true |
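
A typical call supplies only the required `prompt` plus any optional overrides. The argument names below match the schema above; the values are illustrative:

```json
{
  "prompt": "A watercolor painting of a lighthouse at dusk",
  "aspectRatio": "16:9",
  "n": 2,
  "outputDirectory": "workspace"
}
```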

Implementation Reference

  • Registers the text_to_image MCP tool, defines its input schema with Zod, and provides the execution handler that calls ImageAPI.generateImage and formats the response according to the resource mode.
    private registerTextToImageTool(): void {
      this.server.tool(
        'text_to_image',
        'Generate images based on text prompts.\n\nNote: This tool calls MiniMax API and may incur costs. Use only when explicitly requested by the user.',
        {
          model: z.string().optional().default(DEFAULT_T2I_MODEL).describe('Model to use'),
          prompt: z.string().describe('Text prompt for image generation'),
          aspectRatio: z
            .string()
            .optional()
            .default('1:1')
            .describe('Image aspect ratio, values: ["1:1", "16:9","4:3", "3:2", "2:3", "3:4", "9:16", "21:9"]'),
          n: z.number().min(1).max(9).optional().default(1).describe('Number of images to generate'),
          promptOptimizer: z.boolean().optional().default(true).describe('Whether to optimize the prompt'),
          outputDirectory: COMMON_PARAMETERS_SCHEMA.outputDirectory,
          outputFile: z
            .string()
            .optional()
            .describe('Path to save the generated image file, automatically generated if not provided'),
        },
        async (params) => {
          try {
            // No need to update configuration from request parameters in stdio mode
    
            // If no output filename is provided, generate one automatically
            if (!params.outputFile) {
              const promptPrefix = params.prompt.substring(0, 20).replace(/[^\w]/g, '_');
              params.outputFile = `image_${promptPrefix}_${Date.now()}`;
            }
    
            const outputFiles = await this.imageApi.generateImage(params);
    
            // Handle different output formats
            if (this.config.resourceMode === RESOURCE_MODE_URL) {
              return {
                content: [
                  {
                    type: 'text',
                    text: `Success. Image URL(s): ${outputFiles.join(', ')}`,
                  },
                ],
              };
            } else {
              return {
                content: [
                  {
                    type: 'text',
                    text: `Image(s) saved: ${outputFiles.join(', ')}`,
                  },
                ],
              };
            }
          } catch (error) {
            return {
              content: [
                {
                  type: 'text',
                  text: `Failed to generate image: ${error instanceof Error ? error.message : String(error)}`,
                },
              ],
            };
          }
        },
      );
    }
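
The auto-naming fallback in the handler above can be isolated as a small pure function. `autoOutputFile` is a hypothetical name used here for illustration; the handler inlines this logic:

```typescript
// Mirrors the handler's fallback: keep the first 20 characters of the
// prompt, replace every non-word character with an underscore, and
// append a timestamp to make the name unique.
function autoOutputFile(prompt: string, timestamp: number): string {
  const promptPrefix = prompt.substring(0, 20).replace(/[^\w]/g, '_');
  return `image_${promptPrefix}_${timestamp}`;
}
```

For example, a prompt of "a red fox, watercolor style" yields `image_a_red_fox__watercolo_<timestamp>`: the prefix is truncated mid-word at 20 characters, and both the spaces and the comma become underscores.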
  • Core handler in ImageAPI that validates input, builds the request to the MiniMax /v1/image_generation endpoint, and either returns the image URLs directly (URL mode) or downloads the images, saves them to local files, and returns the file paths.
    async generateImage(request: ImageGenerationRequest): Promise<string[]> {
      // Validate required parameters
      if (!request.prompt || request.prompt.trim() === '') {
        throw new MinimaxRequestError(ERROR_PROMPT_REQUIRED);
      }
    
      // Validate model
      const model = this.ensureValidModel(request.model);
    
      // Prepare request data
      const requestData: Record<string, any> = {
        model: model,
        prompt: request.prompt,
        aspect_ratio: request.aspectRatio || '1:1',
        n: request.n || 1,
        prompt_optimizer: request.promptOptimizer !== undefined ? request.promptOptimizer : true
      };
    
      // Only add subject reference if provided
      if (request.subjectReference) {
        // Check if it's a URL
        if (!request.subjectReference.startsWith('http://') &&
            !request.subjectReference.startsWith('https://') &&
            !request.subjectReference.startsWith('data:')) {
          // If it's a local file, process it as a data URL
          if (!fs.existsSync(request.subjectReference)) {
            throw new MinimaxRequestError(`Reference image file does not exist: ${request.subjectReference}`);
          }
    
          const imageData = fs.readFileSync(request.subjectReference);
          const base64Image = imageData.toString('base64');
          requestData.subject_reference = `data:image/jpeg;base64,${base64Image}`;
        } else {
          requestData.subject_reference = request.subjectReference;
        }
      }
    
      // Send request
      const response = await this.api.post<any>('/v1/image_generation', requestData);
    
      // Check response structure
      const imageUrls = response?.data?.image_urls;
      if (!imageUrls || !Array.isArray(imageUrls) || imageUrls.length === 0) {
        throw new MinimaxRequestError('Unable to get image URLs from response');
      }
    
      // If URL mode, return URLs directly
      const resourceMode = this.api.getResourceMode();
      if (resourceMode === RESOURCE_MODE_URL) {
        return imageUrls;
      }
    
      // Process output files
      const outputFiles: string[] = [];
      const outputDir = request.outputDirectory;
    
      for (let i = 0; i < imageUrls.length; i++) {
        // Generate output filename - similar to Python version
        const outputFileName = buildOutputFile(`image_${i}_${request.prompt.substring(0, 20)}`, outputDir, 'jpg', true);
    
        try {
          // Download image
          const imageResponse = await requests.default.get(imageUrls[i], { responseType: 'arraybuffer' });
    
          // Ensure directory exists
          const dirPath = path.dirname(outputFileName);
          if (!fs.existsSync(dirPath)) {
            fs.mkdirSync(dirPath, { recursive: true });
          }
    
          // Save file
          fs.writeFileSync(outputFileName, Buffer.from(imageResponse.data));
          outputFiles.push(outputFileName);
        } catch (error) {
          throw new MinimaxRequestError(`Failed to download or save image: ${String(error)}`);
        }
      }
    
      return outputFiles;
    }
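
`buildOutputFile` is referenced above but not shown in the source. A minimal sketch of what such a helper might do, assuming it sanitizes the base name, optionally appends a timestamp for uniqueness, and joins the directory, name, and extension (the real implementation may differ):

```typescript
import * as path from 'path';

// Hypothetical sketch of a buildOutputFile-style helper.
function buildOutputFile(
  baseName: string,
  outputDir: string | undefined,
  extension: string,
  unique: boolean,
): string {
  // Replace characters that are unsafe in filenames.
  const safeName = baseName.replace(/[^\w-]/g, '_');
  // Optionally append a timestamp so repeated calls don't collide.
  const suffix = unique ? `_${Date.now()}` : '';
  const fileName = `${safeName}${suffix}.${extension}`;
  return outputDir ? path.join(outputDir, fileName) : fileName;
}
```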
  • Zod schema defining input parameters for text_to_image tool including prompt, model, aspect ratio, number of images, etc.
    model: z.string().optional().default(DEFAULT_T2I_MODEL).describe('Model to use'),
    prompt: z.string().describe('Text prompt for image generation'),
    aspectRatio: z
      .string()
      .optional()
      .default('1:1')
      .describe('Image aspect ratio, values: ["1:1", "16:9","4:3", "3:2", "2:3", "3:4", "9:16", "21:9"]'),
    n: z.number().min(1).max(9).optional().default(1).describe('Number of images to generate'),
    promptOptimizer: z.boolean().optional().default(true).describe('Whether to optimize the prompt'),
    outputDirectory: COMMON_PARAMETERS_SCHEMA.outputDirectory,
    outputFile: z
      .string()
      .optional()
      .describe('Path to save the generated image file, automatically generated if not provided'),
  • Static registration of text_to_image tool schema in REST server's listTools handler.
    name: 'text_to_image',
    description: 'Generate image based on text prompt',
    arguments: [
      { name: 'prompt', description: 'Text prompt for image generation', required: true },
      { name: 'model', description: 'Model to use', required: false },
      { name: 'aspectRatio', description: 'Image aspect ratio, values: ["1:1", "16:9","4:3", "3:2", "2:3", "3:4", "9:16", "21:9"]', required: false },
      { name: 'n', description: 'Number of images to generate (1-9)', required: false },
      { name: 'promptOptimizer', description: 'Whether to optimize prompt', required: false },
      { name: 'outputDirectory', description: OUTPUT_DIRECTORY_DESCRIPTION, required: false },
      { name: 'outputFile', description: 'Output file path, auto-generated if not provided', required: false }
    ],
    inputSchema: {
      type: 'object',
      properties: {
        prompt: { type: 'string' },
        model: { type: 'string' },
        aspectRatio: { type: 'string' },
        n: { type: 'number' },
        promptOptimizer: { type: 'boolean' },
        outputDirectory: { type: 'string' },
        outputFile: { type: 'string' }
      },
      required: ['prompt']
    }
  • Handler in REST server that delegates text_to_image execution to MediaService with retry logic.
    private async handleTextToImage(args: any, api: MiniMaxAPI, mediaService: MediaService, attempt = 1): Promise<any> {
      try {
        // Call media service to handle request
        const result = await mediaService.generateImage(args);
        return result;
      } catch (error) {
        if (attempt < MAX_RETRY_ATTEMPTS) {
          // console.warn(`[${new Date().toISOString()}] Failed to generate image, attempting retry (${attempt}/${MAX_RETRY_ATTEMPTS})`, error);
          // Delay retry
          await new Promise(resolve => setTimeout(resolve, RETRY_DELAY * Math.pow(2, attempt - 1)));
          return this.handleTextToImage(args, api, mediaService, attempt + 1);
        }
        throw this.wrapError('Failed to generate image', error);
      }
    }
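
The retry delay above grows exponentially with the attempt number. A minimal sketch of that schedule, assuming a base RETRY_DELAY of 1000 ms (the actual constant values are not shown in the source):

```typescript
// Illustrative backoff schedule; RETRY_DELAY's real value is assumed.
const RETRY_DELAY = 1000; // ms (assumed)

// Delay before retry `attempt` (1-based), doubling each time,
// mirroring RETRY_DELAY * Math.pow(2, attempt - 1) in the handler.
function backoffDelay(attempt: number): number {
  return RETRY_DELAY * Math.pow(2, attempt - 1);
}
// attempts 1, 2, 3 → delays of 1000 ms, 2000 ms, 4000 ms
```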
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It adds valuable context beyond basic functionality: it discloses that the tool 'calls MiniMax API and may incur costs,' which is critical operational information not inferable from the schema. However, it doesn't mention rate limits, error handling, or output format details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise and well-structured: a clear purpose statement followed by a critical note about API costs and usage restriction. Both sentences earn their place, with zero wasted words, and the most important information (cost warning) is front-loaded in the note.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, API integration, cost implications) and no annotations or output schema, the description does well by covering the core purpose and critical behavioral context (API costs). However, it lacks details about the generated output (e.g., file format, resolution) and doesn't explain error cases or authentication requirements.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description provides no parameter-specific information beyond the general 'text prompts' reference. However, with 100% schema description coverage, all 7 parameters are well-documented in the schema itself (e.g., 'prompt' for text input, 'aspectRatio' with enumerated values). The baseline score of 3 is appropriate since the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Generate images based on text prompts.' This specifies the verb ('generate') and resource ('images') with the mechanism ('based on text prompts'). However, it doesn't explicitly differentiate from sibling tools like 'generate_video' or 'image_to_video' beyond the obvious difference in output type.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use the tool: 'Use only when explicitly requested by the user.' This gives explicit guidance on user-driven invocation. However, it doesn't mention alternatives or when-not-to-use scenarios relative to siblings like 'generate_video' for video generation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
