vision_query
Analyze images to extract text via OCR, answer visual questions, detect objects, or generate descriptions using the GLM-4.5V model.
Instructions
Invoke GLM-4.5V to perform OCR, question answering, or object detection on an image.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| path | Yes | Image path or URL | |
| prompt | Yes | Query prompt | |
| mode | No | Query mode (`describe`, `ocr`, `qa`, or `detect`) | describe |
| returnJson | No | Whether to return the result as JSON | false |
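A call to this tool might pass arguments like the following. This is an illustrative sketch only; the image path is hypothetical, and the object mirrors the Zod schema above:

```typescript
// Hypothetical arguments for a vision_query tool call.
const args = {
  path: "./screenshots/receipt.png",      // hypothetical local file; a URL or data URL also works
  prompt: "Extract all text from this receipt",
  mode: "ocr" as const,                    // one of: describe | ocr | qa | detect
  returnJson: true,                        // ask the model to answer in JSON
};
```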
Implementation Reference
- src/index.ts:229-304 (handler): Core handler function implementing the vision_query tool logic. It loads and compresses the image, builds the GLM-4V API payload, makes the API call, and normalizes the response.

```typescript
async function visionQuery(imagePath: string, prompt: string, mode: string, returnJson: boolean): Promise<VisionResult> {
  try {
    let imageBase64: string;
    let buffer: Buffer;

    if (imagePath.startsWith("data:")) {
      // Data URL
      const base64Data = imagePath.split(',')[1];
      if (!base64Data) {
        throw new Error("Invalid data URL format");
      }
      buffer = Buffer.from(base64Data, 'base64');
    } else if (imagePath.startsWith("http://") || imagePath.startsWith("https://")) {
      // Remote image over HTTP/HTTPS
      const response = await fetch(imagePath);
      if (!response.ok) {
        throw new Error(`Failed to fetch image: ${response.statusText}`);
      }
      buffer = Buffer.from(await response.arrayBuffer());
    } else {
      // Local file
      const resolvedPath = path.resolve(imagePath);
      buffer = await fs.readFile(resolvedPath);
    }

    // Compress the image to reduce token usage (smaller dimensions and quality)
    const compressedBuffer = await compressImage(buffer, 800, 75);
    imageBase64 = compressedBuffer.toString("base64");

    const payload = buildGlmPayload({ prompt, imageBase64, mode, returnJson });

    const glmBaseUrl = process.env.GLM_BASE_URL || "https://open.bigmodel.cn/api/paas/v4/chat/completions";
    const glmApiKey = process.env.GLM_API_KEY;
    if (!glmApiKey) {
      throw new Error("GLM_API_KEY environment variable is required");
    }

    const response = await fetch(glmBaseUrl, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${glmApiKey}`
      },
      body: JSON.stringify(payload)
    });

    if (!response.ok) {
      throw new Error(`GLM API request failed: ${response.statusText}`);
    }

    const data = await response.json();
    const result = normalizeGlmResult(data, { mode, returnJson });

    return {
      ok: true,
      result,
      metadata: { mode, returnJson, timestamp: Date.now() }
    };
  } catch (error) {
    return {
      ok: false,
      error: error instanceof Error ? error.message : "Unknown error"
    };
  }
}
```
- src/index.ts:81-110 (registration): MCP tool registration for `vision_query`, including the description, Zod input schema, and a thin wrapper handler calling the core visionQuery function.

```typescript
mcpServer.registerTool("vision_query", {
  description: "Invoke GLM-4.5V to perform OCR, question answering, or object detection on an image",
  inputSchema: {
    path: z.string().describe("Image path or URL"),
    prompt: z.string().describe("Query prompt"),
    mode: z.enum(["describe", "ocr", "qa", "detect"]).default("describe").describe("Query mode"),
    returnJson: z.boolean().default(false).describe("Whether to return the result as JSON"),
  },
}, async ({ path: imagePath, prompt, mode, returnJson }) => {
  try {
    const result = await visionQuery(imagePath, prompt, mode, returnJson);
    return { content: [{ type: "text" as const, text: JSON.stringify(result, null, 2) }] };
  } catch (error) {
    return {
      content: [{
        type: "text" as const,
        text: JSON.stringify({
          ok: false,
          error: error instanceof Error ? error.message : "Unknown error"
        }, null, 2)
      }],
      isError: true
    };
  }
});
```
- src/index.ts:25-34 (schema): TypeScript interface defining the output structure for visionQuery results.

```typescript
interface VisionResult {
  ok: boolean;
  result?: string | object;
  error?: string;
  metadata?: {
    mode: string;
    returnJson: boolean;
    timestamp: number;
  };
}
```
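For illustration, a successful call and a failed call yield `VisionResult` values shaped as follows. The values are made-up examples, not real API output; the error message is the one the handler throws when the API key is missing:

```typescript
interface VisionResult {
  ok: boolean;
  result?: string | object;
  error?: string;
  metadata?: { mode: string; returnJson: boolean; timestamp: number };
}

// Success: result holds the model output, metadata echoes the query options.
const success: VisionResult = {
  ok: true,
  result: "A printed receipt lying on a wooden desk.",
  metadata: { mode: "describe", returnJson: false, timestamp: Date.now() },
};

// Failure: only ok and error are populated.
const failure: VisionResult = {
  ok: false,
  error: "GLM_API_KEY environment variable is required",
};
```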
- src/index.ts:374-430 (helper): Helper function that builds the payload for the GLM vision API call, customizing the system prompt based on mode.

```typescript
function buildGlmPayload(opts: { prompt: string; imageBase64: string; mode: string; returnJson: boolean; }) {
  const { prompt, imageBase64, mode, returnJson } = opts;

  // Truncate overly long prompts
  const truncatedPrompt = truncatePrompt(prompt, 300);

  let systemPrompt = "";
  switch (mode) {
    case "ocr":
      systemPrompt = "Recognize the text in the image.";
      break;
    case "qa":
      systemPrompt = "Answer the question based on the image.";
      break;
    case "detect":
      systemPrompt = "Identify the objects in the image.";
      break;
    default:
      systemPrompt = "Describe the content of the image.";
  }
  if (returnJson) {
    systemPrompt += " Answer in JSON format.";
  }

  return {
    model: "glm-4v-plus",
    messages: [
      { role: "system", content: systemPrompt },
      {
        role: "user",
        content: [
          { type: "text", text: truncatedPrompt },
          { type: "image_url", image_url: { url: `data:image/jpeg;base64,${imageBase64}` } }
        ]
      }
    ],
    temperature: 0.1,
    max_tokens: 1000
  };
}
```
- src/index.ts:432-448 (helper): Helper function that normalizes the GLM API response, parsing JSON if requested.

```typescript
function normalizeGlmResult(data: any, opts: { mode: string; returnJson: boolean }) {
  if (data.error) {
    throw new Error(data.error.message || "GLM API error");
  }
  const content = data.choices?.[0]?.message?.content || "";
  if (opts.returnJson) {
    try {
      return JSON.parse(content);
    } catch {
      return { text: content, parsed: false };
    }
  }
  return content;
}
```
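The JSON-parse fallback can be exercised in isolation. The sketch below copies `normalizeGlmResult` into a standalone file and feeds it hand-made response objects (not real API output) to show all three branches: parsed JSON, non-JSON fallback, and plain text:

```typescript
// Standalone copy of normalizeGlmResult for demonstration purposes.
function normalizeGlmResult(data: any, opts: { mode: string; returnJson: boolean }) {
  if (data.error) {
    throw new Error(data.error.message || "GLM API error");
  }
  const content = data.choices?.[0]?.message?.content || "";
  if (opts.returnJson) {
    try {
      return JSON.parse(content);
    } catch {
      // The model ignored the JSON instruction; fall back gracefully.
      return { text: content, parsed: false };
    }
  }
  return content;
}

// Hand-made mock responses in the OpenAI-style shape the code expects.
const jsonReply = { choices: [{ message: { content: '{"label":"cat"}' } }] };
const plainReply = { choices: [{ message: { content: "a cat on a sofa" } }] };

const parsed = normalizeGlmResult(jsonReply, { mode: "detect", returnJson: true });
const fallback = normalizeGlmResult(plainReply, { mode: "detect", returnJson: true });
const text = normalizeGlmResult(plainReply, { mode: "describe", returnJson: false });
// parsed is { label: "cat" }; fallback is { text: "a cat on a sofa", parsed: false }; text is the raw string
```

Note that a parse failure does not raise: the caller always gets a result, with `parsed: false` marking that the JSON instruction was not honored.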