image_understand

Analyze images from local files, URLs, or chat attachments. Provide detailed understanding based on your question or instruction about the image.

Instructions

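Image understanding tool:

  • When to call: whenever the user asks to "look at an image, look at a screenshot, or take a look at this picture/interface/page/error/architecture/layout/component structure/page structure", or an image attachment appears in the conversation together with a question about its content (including UI/front-end interface structure, code screenshots, log or error screenshots, document screenshots, forms, tables, and so on), call this tool first rather than relying on text-only reasoning.

  • Image sources: 1) when the user pastes an image, call the tool directly, with no need to specify a path manually; 2) a local image path, such as ./screenshot.png; 3) an image URL, such as https://example.com/image.png.

  • Prompt conventions:

    • Do not build a long, complex analysis prompt yourself before calling this tool;

    • Pass the user's original question or instruction about the image directly as the prompt, for example:

      • "What interface is this? What does its overall structure look like?"

      • "Break down the layout and component structure of this page from a front-end implementation perspective";

    • Luma automatically assembles the system-level vision instructions and analysis template on the server, then calls the underlying vision model to complete the full analysis;

    • You only need to make sure the prompt accurately expresses what the user wants to know about the image; there is no need to restate image details or write a lengthy prompt.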
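
The prompt convention above can be illustrated with a minimal sketch of the request a client might send. It assumes the standard MCP `tools/call` JSON-RPC envelope; the helper name `build_image_understand_call` and the sample question are hypothetical and are not part of the Luma server.

```python
import json


def build_image_understand_call(user_question: str, image_source: str, request_id: int = 1) -> dict:
    """Build an MCP tools/call request that forwards the user's question unchanged."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "image_understand",
            "arguments": {
                # Pass the raw user question; the server adds its own vision prompt.
                "prompt": user_question,
                "image_source": image_source,
            },
        },
    }


# Hypothetical user question and local path, used only for illustration.
question = "What interface is this? What does its overall structure look like?"
request = build_image_understand_call(question, "./screenshot.png")
print(json.dumps(request, indent=2, ensure_ascii=False))
```

The point of the sketch is that the client does no prompt engineering of its own: the `prompt` argument is exactly what the user typed.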

Input Schema

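| Name | Required | Description | Default |
| --- | --- | --- | --- |
| prompt | Yes | The user's original question or short instruction about the image, for example "What interface is this?" or "Help me analyze the structure and layout of this page". The server adds system-level vision prompts internally and builds the full analysis instruction. | |
| image_source | Yes | The source of the image to analyze. Three options are supported: 1) when the user pastes an image, Claude Desktop supplies the path automatically; 2) a local file path, such as ./screenshot.png; 3) an HTTP(S) image URL, such as https://example.com/image.png (PNG, JPG, JPEG, WebP, and GIF are supported, up to 10MB). | |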
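
A client could run a rough pre-check on `image_source` before calling the tool. The sketch below is an assumption-laden illustration, not the server's actual validation: it assumes the format list and the 10MB limit apply to local files as well as URLs, reads "10MB" as 10 × 1024 × 1024 bytes, and judges format by file extension only. The function name `check_image_source` is hypothetical.

```python
import os
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".gif"}
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" interpreted as 10 MiB


def check_image_source(image_source: str) -> list:
    """Return a list of detected problems; an empty list means none were found."""
    problems = []
    parsed = urlparse(image_source)
    is_url = parsed.scheme in ("http", "https")

    # For URLs, look at the path component only (ignores query strings).
    path = parsed.path if is_url else image_source
    extension = os.path.splitext(path)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported or missing extension: {extension!r}")

    # Size can only be checked locally without downloading the image.
    if not is_url:
        if not os.path.isfile(image_source):
            problems.append("local file not found")
        elif os.path.getsize(image_source) > MAX_BYTES:
            problems.append("file larger than 10MB")
    return problems


print(check_image_source("https://example.com/image.png"))
```

An extension check is deliberately conservative: a URL that serves an image without a file extension would be flagged here even though the server might accept it.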

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description explains that the server internally adds system vision prompts and calls a vision model, which discloses the processing flow. It also specifies the image source methods and file constraints. The absence of details on rate limits or authorization is acceptable for a read-like tool. Overall, the description is transparent about how the tool operates.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with sections and bullet points, making it easy to scan. While it is somewhat verbose, each sentence serves a purpose (e.g., calling scenarios, source types, prompt rules). Minor redundancy could be trimmed, but overall it balances detail and clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema, yet the description does not specify what the agent can expect as a result (e.g., text analysis, structured data). This omission leaves the agent unsure about the return value, which is critical for invocation decisions. The description covers input well but neglects output behavior.
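
Because the return value is undocumented, a caller may want to handle the result defensively. The sketch below assumes only the generic MCP tool-result shape (a `content` list of typed items plus an optional `isError` flag); the sample result and the helper name `extract_text` are hypothetical, and the actual output of `image_understand` is not confirmed by the description.

```python
def extract_text(result: dict) -> str:
    """Join the text items of an MCP tool result; raise if the result is flagged as an error."""
    text_parts = [
        item.get("text", "")
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]
    joined = "\n".join(text_parts)
    if result.get("isError"):
        raise RuntimeError(joined or "tool call failed")
    return joined


# Hypothetical result, shaped like a generic MCP text response.
sample_result = {
    "content": [{"type": "text", "text": "The screenshot shows a login form."}],
    "isError": False,
}
print(extract_text(sample_result))
```

Non-text items are skipped rather than rejected, so the helper keeps working if the server ever returns mixed content.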

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, which sets the baseline at 3. The description adds value by elaborating on `image_source` (three methods, supported formats, size limit) and `prompt` (keep it simple and pass the raw user query). These details go beyond the schema's minimal descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as an image understanding tool with specific use cases (UI, code screenshots, etc.). It provides explicit criteria for when to invoke it ('when the user mentions "look at an image, look at a screenshot..."') and emphasizes preferring it over text-only reasoning, which makes the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a dedicated 'When to call' section with concrete examples of user requests. It also advises against constructing complex prompts and directs the agent to pass the user's raw question. Although no exclusion cases are listed, the tool's scope is well defined and sufficient for the agent to decide.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/JochenYang/luma-mcp'
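
The same request can be sketched with Python's standard library. The URL is taken from the curl command above; the helper names `build_server_request` and `fetch_server`, the owner/name split, and the UTF-8 decoding of the response body are assumptions made for illustration, since the response format is not described here.

```python
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_server_request(owner: str, name: str) -> urllib.request.Request:
    """Build a GET request for one server entry in the MCP directory API."""
    return urllib.request.Request(f"{API_BASE}/{owner}/{name}", method="GET")


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text (assumed UTF-8)."""
    with urllib.request.urlopen(build_server_request(owner, name), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network; fetch_server() does.
request = build_server_request("JochenYang", "luma-mcp")
print(request.get_method(), request.full_url)
```

Calling `fetch_server("JochenYang", "luma-mcp")` would perform the actual HTTP request; it is left uncalled here so the sketch runs offline.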

If you have feedback or need assistance with the MCP directory API, please join our Discord server.