
Tongyi Wanxiang MCP Server

by Suixinlei

wanx-t2i-image-generation

Generate detailed images from text prompts using Alibaba Cloud's Tongyi Wanxiang API. Input prompts and negative prompts to create custom visuals, with results retrieved via a separate tool.

Instructions

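Uses the text-to-image capability of Alibaba Cloud's Wanxiang text-to-image model. Because image generation takes a while, call the wanx-t2i-image-generation-result tool to retrieve the result.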
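
Because this tool only starts the task, the caller has to pull a task handle out of the response before asking the companion tool for the result. The sketch below parses the JSON text that the handler returns. It assumes the output carries `task_id` and `task_status` fields, following DashScope's async task convention; `TaskOutput`, `parseTaskOutput`, and the sample string are hypothetical names and values used only for illustration.

```typescript
// Assumed shape of the JSON text returned by wanx-t2i-image-generation.
// Other fields may be present and are ignored here.
interface TaskOutput {
  task_id: string;
  task_status: string;
}

// Parse the text content of the tool response and extract the task handle.
function parseTaskOutput(text: string): TaskOutput {
  const data: unknown = JSON.parse(text);
  if (typeof data !== "object" || data === null) {
    throw new Error("Tool output is not a JSON object");
  }
  const { task_id, task_status } = data as Record<string, unknown>;
  if (typeof task_id !== "string" || typeof task_status !== "string") {
    throw new Error("Tool output is missing task_id or task_status");
  }
  return { task_id, task_status };
}

// Hypothetical response text, for illustration only.
const sample = '{"task_id":"example-task-id","task_status":"PENDING"}';
const task = parseTaskOutput(sample);
console.log(task.task_id, task.task_status);
```

The extracted `task_id` is what a caller would then hand to wanx-t2i-image-generation-result; that tool's exact argument name is not shown on this page.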

Input Schema

Name             Required  Description  Default
negative_prompt  Yes       -            -
prompt           Yes       -            -
seed             No        -            -
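
Since the schema carries no descriptions, a concrete argument object may help. The sketch below shows the argument shape implied by the schema and a small runtime check that approximates it. `GenerationArgs`, `isGenerationArgs`, and the prompt text are hypothetical and are not part of the server; the server itself validates with Zod.

```typescript
// Argument shape implied by the schema: two required strings, one optional number.
interface GenerationArgs {
  prompt: string;
  negative_prompt: string;
  seed?: number;
}

// Hypothetical example values. negative_prompt is required by the schema,
// so an empty string is passed when there is nothing to exclude.
const args: GenerationArgs = {
  prompt: "A watercolor painting of a lighthouse at dawn",
  negative_prompt: "",
};

// Minimal runtime check approximating the schema (not the server's own validation).
function isGenerationArgs(value: unknown): value is GenerationArgs {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.prompt === "string" &&
    typeof v.negative_prompt === "string" &&
    (v.seed === undefined || typeof v.seed === "number")
  );
}

console.log(isGenerationArgs(args)); // true
```

When `seed` is omitted, the handler shown in the implementation reference picks a random one, so passing an explicit seed is only needed for reproducible requests.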

Implementation Reference

  • src/index.ts:24-38 (registration)
    Registration of the wanx-t2i-image-generation MCP tool, including input schema and handler function.
    server.tool(
      "wanx-t2i-image-generation",
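      // Description, in English: "Uses the text-to-image capability of Alibaba Cloud's Wanxiang
      // text-to-image model. Because image generation takes a while, call the
      // wanx-t2i-image-generation-result tool to retrieve the result."
      "使用阿里云万相文生图大模型的文生图能力,由于图片生成耗时比较久,需要调用 wanx-t2i-image-generation-result 工具获取结果",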
      { prompt: z.string(), negative_prompt: z.string(), seed: z.number().optional() },
      async ({ prompt, negative_prompt, seed }) => {
        const result = await createImageTask({
          prompt,
          negative_prompt,
          seed: seed ?? Math.floor(Math.random() * (4294967291 - 0)),
        });
        return {
          content: [{ type: "text", text: JSON.stringify(result.output) }],
        };
      }
    );
  • Zod schema defining tool inputs: prompt, negative_prompt (required strings), seed (optional number).
    { prompt: z.string(), negative_prompt: z.string(), seed: z.number().optional() },
  • Tool handler: wraps createImageTask call, generates random seed if none provided, returns task output as JSON text.
    async ({ prompt, negative_prompt, seed }) => {
      const result = await createImageTask({
        prompt,
        negative_prompt,
        seed: seed ?? Math.floor(Math.random() * (4294967291 - 0)),
      });
      return {
        content: [{ type: "text", text: JSON.stringify(result.output) }],
      };
    }
  • Core helper function that constructs and POSTs async image generation request to Aliyun DashScope Wanxiang text-to-image API.
    export const createImageTask = async ({
      prompt = "",
      negative_prompt = "",
      model = config.api.defaultModel,
      size = "1024*1024",
      n = 1,
      seed = 0,
      prompt_extend = true,
      watermark = false,
    }) => {
      try {
        const apiKey = config.api.apiKey;
    
        if (!apiKey) {
          throw new Error("API key is not configured");
        }
    
        // Build the request body
        const requestBody = {
          model,
          input: {
            prompt,
            negative_prompt,
          },
          parameters: {
            size,
            n,
            seed,
            prompt_extend,
            watermark,
          },
        };
    
        // Add optional parameters
        if (negative_prompt) {
          requestBody.input.negative_prompt = negative_prompt;
        }
    
        if (watermark !== null) {
          requestBody.parameters.watermark = watermark;
        }
    
        // Send the request
        const response = await axios.post(
          `${config.api.baseUrl}/services/aigc/text2image/image-synthesis`,
          requestBody,
          {
            headers: {
              "Content-Type": "application/json",
              Authorization: `Bearer ${apiKey}`,
              "X-DashScope-Async": "enable",
            },
          }
        );
    
        return response.data;
      } catch (error: any) {
        if (error.response) {
          throw new Error(error.response.data.message || "Failed to create task");
        }
        throw error;
      }
    };
  • Helper utility to poll task status using getTaskStatus until SUCCEEDED/FAILED or timeout, used by companion result tool.
    export const pollTaskUntilDone = async (taskId: string) => {
      let retries = 0;
    
      while (retries < config.maxRetries) {
        const taskData = await getTaskStatus(taskId);
        const status = taskData.output.task_status;
    
        if (status === "SUCCEEDED" || status === "FAILED") {
          return taskData;
        }
    
        // Wait a while before querying again
        await new Promise((resolve) => setTimeout(resolve, config.pollingInterval));
        retries++;
      }
    
      throw new Error("Task polling timeout");
    };
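
The polling helper listed above can be tried without network access by injecting the status lookup. The following is a minimal sketch of the same loop under that assumption: `pollUntilDone`, `PollOptions`, and `fakeFetch` are hypothetical names, and the two options stand in for `config.maxRetries` and `config.pollingInterval`.

```typescript
interface TaskData {
  output: { task_status: string };
}

interface PollOptions {
  maxRetries: number; // stand-in for config.maxRetries
  pollingIntervalMs: number; // stand-in for config.pollingInterval
}

// Poll fetchStatus until the task reaches a terminal state or retries run out.
async function pollUntilDone(
  fetchStatus: () => Promise<TaskData>,
  { maxRetries, pollingIntervalMs }: PollOptions
): Promise<TaskData> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const taskData = await fetchStatus();
    const status = taskData.output.task_status;
    if (status === "SUCCEEDED" || status === "FAILED") {
      return taskData;
    }
    // Wait before the next status query.
    await new Promise<void>((resolve) => setTimeout(resolve, pollingIntervalMs));
  }
  throw new Error("Task polling timeout");
}

// Fake status source for illustration: reports RUNNING twice, then SUCCEEDED.
let calls = 0;
const fakeFetch = async (): Promise<TaskData> => ({
  output: { task_status: calls++ < 2 ? "RUNNING" : "SUCCEEDED" },
});

pollUntilDone(fakeFetch, { maxRetries: 5, pollingIntervalMs: 10 }).then((done) =>
  console.log(done.output.task_status)
);
```

Note that a FAILED task is returned rather than thrown, as in the original helper, so callers need to inspect `task_status` themselves.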
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses the asynchronous behavior (needs result tool) and performance characteristic (takes time), which is valuable. However, it doesn't mention permissions, rate limits, or what happens if generation fails, leaving gaps in behavioral understanding.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences that efficiently convey the core functionality and usage requirement. It's front-loaded with the main purpose and follows with critical behavioral information, with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 3 parameters, 0% schema coverage, no annotations, and no output schema, the description is incomplete. It explains the asynchronous workflow but doesn't cover parameter meanings, error handling, or output format, leaving significant gaps for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It provides no information about the three parameters (prompt, negative_prompt, seed), their formats, or examples. This leaves parameters completely undocumented beyond the schema structure.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool uses Alibaba Cloud's Wanxiang text-to-image model for image generation, which provides a basic purpose. However, it doesn't specify what kind of images it generates or differentiate from the video generation sibling tools, making it somewhat vague.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states that image generation takes time and requires calling 'wanx-t2i-image-generation-result' to get results, providing clear context for usage. It doesn't mention when to use this versus the video generation tools, but the time constraint guidance is helpful.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Suixinlei/tongyi-wanx-mcp-server'
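
The same request can be made from TypeScript. This is a rough sketch that assumes a runtime with a global `fetch` (for example Node 18 or later) and a JSON response body; `serverUrl` and `fetchServerInfo` are hypothetical helper names, and the response is left as `unknown` because its shape is not documented on this page.

```typescript
const API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Build the directory URL for a server from its owner and name.
function serverUrl(owner: string, name: string): string {
  return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Fetch the server record; the response shape is not assumed here.
async function fetchServerInfo(owner: string, name: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, name));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

console.log(serverUrl("Suixinlei", "tongyi-wanx-mcp-server"));
```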

If you have feedback or need assistance with the MCP directory API, please join our Discord server.