ccw33
by ccw33

vision_query

Analyze images to extract text via OCR, answer visual questions, detect objects, or generate descriptions using the GLM-4.5V model.

Instructions

Invoke GLM-4.5V to run OCR, visual question answering, or object detection on an image.

Input Schema

Name        Required  Description                           Default
path        Yes       Image path or URL
prompt      Yes       Query prompt
mode        No        Query mode                            describe
returnJson  No        Whether to return the result as JSON
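A typical set of arguments for this tool, matching the schema above (the file path and prompt are illustrative, not from the source):

```typescript
// Illustrative vision_query arguments; values are hypothetical examples.
const args = {
  path: "./screenshots/receipt.jpg", // local file, data: URL, or http(s) URL
  prompt: "Extract all visible text from this receipt",
  mode: "ocr",      // one of: describe | ocr | qa | detect
  returnJson: true, // ask the model to answer in JSON
};
```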

Implementation Reference

  • Core handler function implementing the vision_query tool logic: loads and compresses image, builds GLM-4V API payload, makes API call, normalizes response.
    async function visionQuery(imagePath: string, prompt: string, mode: string, returnJson: boolean): Promise<VisionResult> {
      try {
        let imageBase64: string;
    
        let buffer: Buffer;
        
        if (imagePath.startsWith("data:")) {
          // Data URL
          const base64Data = imagePath.split(',')[1];
          if (!base64Data) {
            throw new Error("Invalid data URL format");
          }
          buffer = Buffer.from(base64Data, 'base64');
        } else if (imagePath.startsWith("http://") || imagePath.startsWith("https://")) {
          // Image fetched over HTTP/HTTPS
          const response = await fetch(imagePath);
          if (!response.ok) {
            throw new Error(`Failed to fetch image: ${response.statusText}`);
          }
          buffer = Buffer.from(await response.arrayBuffer());
        } else {
          // Local file
          const resolvedPath = path.resolve(imagePath);
          buffer = await fs.readFile(resolvedPath);
        }
        
        // Compress the image to reduce token usage
        const compressedBuffer = await compressImage(buffer, 800, 75); // smaller dimensions, lower quality
        imageBase64 = compressedBuffer.toString("base64");
    
        const payload = buildGlmPayload({
          prompt,
          imageBase64,
          mode,
          returnJson
        });
    
        const glmBaseUrl = process.env.GLM_BASE_URL || "https://open.bigmodel.cn/api/paas/v4/chat/completions";
        const glmApiKey = process.env.GLM_API_KEY;
    
        if (!glmApiKey) {
          throw new Error("GLM_API_KEY environment variable is required");
        }
    
        const response = await fetch(glmBaseUrl, {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
            "Authorization": `Bearer ${glmApiKey}`
          },
          body: JSON.stringify(payload)
        });
    
        if (!response.ok) {
          throw new Error(`GLM API request failed: ${response.statusText}`);
        }
    
        const data = await response.json();
        const result = normalizeGlmResult(data, { mode, returnJson });
    
        return {
          ok: true,
          result,
          metadata: {
            mode,
            returnJson,
            timestamp: Date.now()
          }
        };
      } catch (error) {
        return {
          ok: false,
          error: error instanceof Error ? error.message : "Unknown error"
        };
      }
    }
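  • The handler accepts three image-source forms (data URL, remote URL, local path). That dispatch can be isolated as a small helper — hypothetical, mirroring the branching in visionQuery:

```typescript
type ImageSource = "data-url" | "remote-url" | "local-file";

// Hypothetical helper reproducing the source-detection order used above:
// data: URLs first, then http(s) URLs, then everything else as a local path.
function classifyImageSource(imagePath: string): ImageSource {
  if (imagePath.startsWith("data:")) return "data-url";
  if (imagePath.startsWith("http://") || imagePath.startsWith("https://")) {
    return "remote-url";
  }
  return "local-file";
}
```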
  • src/index.ts:81-110 (registration)
    MCP tool registration for 'vision_query', including description, Zod input schema, and thin wrapper handler calling the core visionQuery function.
    mcpServer.registerTool("vision_query", {
      description: "Invoke GLM-4.5V to run OCR, visual question answering, or object detection on an image",
      inputSchema: {
        path: z.string().describe("Image path or URL"),
        prompt: z.string().describe("Query prompt"),
        mode: z.enum(["describe", "ocr", "qa", "detect"]).default("describe").describe("Query mode"),
        returnJson: z.boolean().default(false).describe("Whether to return the result as JSON"),
      },
    }, async ({ path: imagePath, prompt, mode, returnJson }) => {
      try {
        const result = await visionQuery(imagePath, prompt, mode, returnJson);
        return {
          content: [{
            type: "text" as const,
            text: JSON.stringify(result, null, 2)
          }]
        };
      } catch (error) {
        return {
          content: [{
            type: "text" as const,
            text: JSON.stringify({
              ok: false,
              error: error instanceof Error ? error.message : "Unknown error"
            }, null, 2)
          }],
          isError: true
        };
      }
    });
  • TypeScript interface defining the output structure for visionQuery results.
    interface VisionResult {
      ok: boolean;
      result?: string | object;
      error?: string;
      metadata?: {
        mode: string;
        returnJson: boolean;
        timestamp: number;
      };
    }
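  • One way a caller might consume this shape — branch on ok before touching result; the summarize helper is illustrative, not part of the source:

```typescript
interface VisionResult {
  ok: boolean;
  result?: string | object;
  error?: string;
  metadata?: { mode: string; returnJson: boolean; timestamp: number };
}

// Hypothetical consumer: on failure only `error` is meaningful;
// on success `result` is either raw text or a parsed JSON object.
function summarize(r: VisionResult): string {
  if (!r.ok) return `vision_query failed: ${r.error ?? "unknown error"}`;
  return typeof r.result === "string" ? r.result : JSON.stringify(r.result);
}
```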
  • Helper function to build the payload for GLM vision API call, customizing system prompt based on mode.
    function buildGlmPayload(opts: {
      prompt: string;
      imageBase64: string;
      mode: string;
      returnJson: boolean;
    }) {
      const { prompt, imageBase64, mode, returnJson } = opts;
      
      // Truncate overly long prompts
      const truncatedPrompt = truncatePrompt(prompt, 300);
      
      let systemPrompt = "";
      switch (mode) {
        case "ocr":
          systemPrompt = "Recognize the text in the image.";
          break;
        case "qa":
          systemPrompt = "Answer the question based on the image.";
          break;
        case "detect":
          systemPrompt = "Identify the objects in the image.";
          break;
        default:
          systemPrompt = "Describe the content of the image.";
      }
    
      if (returnJson) {
        systemPrompt += " Respond in JSON format.";
      }
    
      return {
        model: "glm-4v-plus",
        messages: [
          {
            role: "system",
            content: systemPrompt
          },
          {
            role: "user",
            content: [
              {
                type: "text",
                text: truncatedPrompt
              },
              {
                type: "image_url",
                image_url: {
                  url: `data:image/jpeg;base64,${imageBase64}`
                }
              }
            ]
          }
        ],
        temperature: 0.1,
        max_tokens: 1000
      };
    }
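  • buildGlmPayload calls a truncatePrompt helper whose definition is not shown above. A minimal sketch consistent with its call site (truncatePrompt(prompt, 300)) might be — the real helper may differ, e.g. word-boundary or token-aware cuts:

```typescript
// Sketch only: cap the prompt at maxLength characters, marking the cut
// with an ellipsis so the result never exceeds maxLength.
function truncatePrompt(prompt: string, maxLength: number): string {
  if (prompt.length <= maxLength) return prompt;
  return prompt.slice(0, maxLength - 1) + "…";
}
```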
  • Helper function to normalize GLM API response, parsing JSON if requested.
    function normalizeGlmResult(data: any, opts: { mode: string; returnJson: boolean }) {
      if (data.error) {
        throw new Error(data.error.message || "GLM API error");
      }
    
      const content = data.choices?.[0]?.message?.content || "";
      
      if (opts.returnJson) {
        try {
          return JSON.parse(content);
        } catch {
          return { text: content, parsed: false };
        }
      }
    
      return content;
    }
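  • The content extraction and JSON fallback above can be exercised against a mock response (the mock below follows the OpenAI-style choices[].message.content layout the code reads; its values are invented for illustration):

```typescript
// Mock of the response shape normalizeGlmResult expects.
const mockResponse = {
  choices: [{ message: { content: '{"text":"mock ocr output"}' } }],
};

const content: string = mockResponse.choices?.[0]?.message?.content || "";

// Mirrors the returnJson branch: parse if possible, otherwise wrap the raw text.
let parsed: unknown;
try {
  parsed = JSON.parse(content);
} catch {
  parsed = { text: content, parsed: false };
}
```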