
analyze_image

Analyze image content to extract text via OCR, describe visual elements, recognize code from screenshots, and interpret data charts. Provide image file paths or Base64 inputs with specific analysis prompts.

Instructions

Analyze image content

This is the core tool: it analyzes an image and returns a detailed description.


Use cases

  • Image content recognition and description

  • Text extraction (OCR)

  • Code screenshot recognition

  • Data chart analysis

  • Technical diagram understanding

Parameters

  • image: accepts a local file path (e.g. C:/path/to/image.png) or a Base64-encoded string

  • prompt: the analysis instruction, telling the model what you want to know about the image

Examples

# Basic image description
analyze_image(image="C:/screenshots/desktop.png", prompt="Describe the contents of this screenshot")

# OCR text extraction
analyze_image(image="C:/docs/scan.png", prompt="Extract all text from the image")

# Code recognition
analyze_image(image="C:/code/snippet.png", prompt="Recognize and transcribe the code in the image")

Return values

  • status: execution status ("success" or "error")

  • result: the analysis result

  • image_info: image metadata (type, size, etc.)
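A minimal caller-side sketch of consuming this return dict. The helper name is hypothetical; the field names (status, result, error, error_type) come from the return values documented above:

```python
# Hypothetical helper for consuming analyze_image's return dict.
# Field names (status, result, error, error_type) follow the tool's
# documented return values; the function name is illustrative only.
def handle_result(resp: dict) -> str:
    if resp.get("status") == "success":
        return resp["result"]
    # On error, surface the error_type alongside the message
    return f"[{resp.get('error_type', 'unknown')}] {resp.get('error', '')}"
```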

Input Schema

Name     Required  Description                                            Default
image    Yes       Image input: local file path or Base64-encoded string  —
prompt   No        Analysis instruction                                   "详细描述这张图片" ("describe this image in detail")
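Assuming standard JSON Schema generation from the Pydantic Field definitions shown in the implementation reference, the input schema for the two parameters would look roughly like this (a sketch; the actual output depends on the MCP framework's schema generation):

```python
# Sketch of the JSON Schema implied by the parameter table above.
# The exact shape is an assumption, not the server's verbatim output.
input_schema = {
    "type": "object",
    "properties": {
        "image": {
            "type": "string",
            "description": "Image input: local file path or Base64 encoding",
        },
        "prompt": {
            "type": "string",
            "default": "详细描述这张图片",
            "description": "Analysis instruction",
        },
    },
    "required": ["image"],
}
```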

Output Schema

No output schema is defined; the tool returns a plain dict (see Return values above).

Implementation Reference

  • Main MCP tool handler for analyze_image. Decorated with @mcp.tool() for registration. Takes image (path or Base64) and prompt parameters, processes the image input, calls the vision API client, and returns structured results with status, analysis result, and image metadata. Includes error handling for file not found, invalid input, and analysis failures.
    @mcp.tool()
    async def analyze_image(
        image: str = Field(description="Image input: local file path or Base64 encoding"),
        prompt: str = Field(default="详细描述这张图片", description="Analysis instruction"),
    ) -> dict[str, Any]:
        """
        Analyze image content.

        This is the core tool: it analyzes an image and returns a detailed description.

        ---
        **Use cases**:
        - Image content recognition and description
        - Text extraction (OCR)
        - Code screenshot recognition
        - Data chart analysis
        - Technical diagram understanding

        **Parameters**:
        - `image`: accepts a local file path (e.g. `C:/path/to/image.png`) or a Base64-encoded string
        - `prompt`: the analysis instruction, telling the model what you want to know about the image

        **Examples**:
        ```python
        # Basic image description
        analyze_image(image="C:/screenshots/desktop.png", prompt="Describe the contents of this screenshot")

        # OCR text extraction
        analyze_image(image="C:/docs/scan.png", prompt="Extract all text from the image")

        # Code recognition
        analyze_image(image="C:/code/snippet.png", prompt="Recognize and transcribe the code in the image")
        ```

        **Return values**:
        - `status`: execution status ("success" or "error")
        - `result`: the analysis result
        - `image_info`: image metadata (type, size, etc.)
        """
        logger.info(f"Received analyze_image request, prompt: {prompt[:50]}...")

        try:
            # Get the image processor and vision client
            processor = get_image_processor()
            client = get_vision_client()

            # Process the image input (path or Base64)
            image_info = processor.process_image_input(image)

            # Call the vision API
            result = await client.analyze_image(
                image_url=image_info["url"],
                prompt=prompt,
            )

            logger.info(f"analyze_image finished, result length: {len(result)}")

            return {
                "status": "success",
                "result": result,
                "image_info": {
                    "source_type": image_info["source_type"],
                    "mime_type": image_info["mime_type"],
                    "size": image_info["size"],
                }
            }

        except FileNotFoundError as e:
            logger.error(f"File not found: {e}")
            return {
                "status": "error",
                "error": f"File not found: {e}",
                "error_type": "file_not_found",
            }

        except ValueError as e:
            logger.error(f"Invalid input: {e}")
            return {
                "status": "error",
                "error": str(e),
                "error_type": "invalid_input",
            }

        except Exception as e:
            logger.error(f"Analysis failed: {e}")
            return {
                "status": "error",
                "error": f"Analysis failed: {e}",
                "error_type": "analysis_failed",
            }
  • Schema definition for analyze_image tool parameters using Pydantic Field. Defines 'image' as string (file path or Base64) and 'prompt' as string with default value for the analysis instruction.
    async def analyze_image(
        image: str = Field(description="Image input: local file path or Base64 encoding"),
        prompt: str = Field(default="详细描述这张图片", description="Analysis instruction"),
    ) -> dict[str, Any]:
  • VisionClient.analyze_image() - The actual API client method that calls the OpenAI-compatible vision API. Builds the message payload with system prompt, user prompt, and image URL, then sends the request to the configured model and returns the analysis result.
    async def analyze_image(
        self,
        image_url: str,
        prompt: str,
        system_prompt: str | None = None,
    ) -> str:
        """
        分析图像
    
        Args:
            image_url: 图像URL(data:格式或http(s)://格式)
            prompt: 分析指令
            system_prompt: 自定义系统提示词(可选)
    
        Returns:
            str: 分析结果
        """
        system_prompt = system_prompt or self.SYSTEM_PROMPT
    
        # 构建消息内容(OpenAI Vision格式)
        messages = [
            {"role": "system", "content": system_prompt},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": image_url}
                    }
                ]
            }
        ]
    
        logger.info(f"发送视觉分析请求,提示词长度: {len(prompt)}")
    
        try:
            response = self._client.chat.completions.create(
                model=self.config.model,
                messages=messages,
                temperature=self.config.temperature,
                max_tokens=self.config.max_tokens,
            )
    
            content = response.choices[0].message.content
            logger.info(f"收到视觉分析响应,长度: {len(content)}")
            return content
    
        except Exception as e:
            logger.error(f"视觉分析请求失败: {e}")
            raise
  • ImageProcessor.process_image_input() - Core image processing logic that handles both file paths and Base64 inputs. Validates file existence, checks size limits, determines MIME type, converts to Base64 if needed, and returns a standardized dictionary with URL, mime_type, size, and source_type.
    def process_image_input(self, image_input: str) -> dict[str, Any]:
        """
        处理图像输入,自动识别路径或Base64编码
    
        Args:
            image_input: 图像输入,可以是本地文件路径或Base64编码
    
        Returns:
            dict: 包含处理结果的字典
                - url: OpenAI格式的图像URL(file://或data:)
                - mime_type: 图像MIME类型
                - size: 图像大小(字节)
                - source_type: 输入类型('file'或'base64')
    
        Raises:
            ValueError: 输入格式无效或图像过大
        """
        # 判断输入类型 - 优先检测Base64(避免与路径混淆)
        if is_base64(image_input):
            return self._process_base64_input(image_input)
        elif is_file_path(image_input):
            return self._process_file_input(image_input)
        else:
            raise ValueError(
                f"无法识别的图像输入格式。请提供有效的文件路径或Base64编码。"
            )
    
    def _process_file_input(self, file_path: str) -> dict[str, Any]:
        """
        处理文件路径输入
    
        Args:
            file_path: 图像文件路径
    
        Returns:
            dict: 处理结果
        """
        path = Path(file_path)
    
        # 检查文件存在
        if not path.exists():
            raise FileNotFoundError(f"图像文件不存在: {file_path}")
    
        # 检查文件大小
        file_size = path.stat().st_size
        if file_size > self.max_image_size:
            max_mb = self.max_image_size / (1024 * 1024)
            actual_mb = file_size / (1024 * 1024)
            raise ValueError(
                f"图像文件过大: {actual_mb:.2f}MB,最大允许: {max_mb:.2f}MB"
            )
    
        # 获取MIME类型
        mime_type = get_image_mime_type(file_path)
    
        # 转换为Base64
        base64_data = file_to_base64(file_path)
    
        # 构建data URL
        data_url = base64_to_data_url(base64_data, mime_type)
    
        logger.info(f"[图像处理] 文件输入: {file_path}, 大小: {file_size}字节, 类型: {mime_type}")
    
        return {
            "url": data_url,
            "mime_type": mime_type,
            "size": file_size,
            "source_type": "file",
            "file_path": str(path.absolute()),
        }
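The processor above calls helpers (get_image_mime_type, file_to_base64, base64_to_data_url) whose bodies are not shown in the reference. Minimal stand-in implementations might look like this; these are illustrative sketches matching the call sites above, not the project's actual code:

```python
import base64
import mimetypes


def get_image_mime_type(path: str) -> str:
    # Guess the MIME type from the file extension; fall back to PNG
    # (the fallback is an assumption, not confirmed by the source).
    return mimetypes.guess_type(path)[0] or "image/png"


def file_to_base64(path: str) -> str:
    # Read the file bytes and return their Base64 encoding as ASCII text
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def base64_to_data_url(b64: str, mime_type: str) -> str:
    # Wrap Base64 data in an RFC 2397 data: URL, the format the
    # OpenAI Vision image_url field accepts
    return f"data:{mime_type};base64,{b64}"
```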

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the tool returns detailed descriptions and lists specific use cases, which helps understand its behavior. However, it doesn't disclose important traits like rate limits, authentication requirements, error handling, or whether it's read-only vs. destructive. The description adds some context but leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, usage scenarios, parameter explanation, examples, return content) and every sentence adds value. It's appropriately sized for a tool with 2 parameters and comprehensive examples, with no wasted text or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 2 parameters with 100% schema coverage and an output schema (implied by the '返回内容' section), the description provides good contextual completeness. It covers purpose, usage scenarios, parameter semantics with examples, and return values. The main gap is lack of behavioral transparency details that would be important for a tool performing image analysis.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, so the schema already documents both parameters well. The description adds value by providing concrete examples of parameter usage in different scenarios (basic description, OCR, code recognition) and clarifies that 'image' accepts both local file paths and Base64 encoding, which enhances understanding beyond the schema's basic descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as '分析图像内容' (analyze image content) and specifies it returns detailed descriptions, which is a specific verb+resource combination. However, it doesn't explicitly distinguish this from sibling tools like 'chat_vision' or 'get_status', leaving some ambiguity about when to choose one over another.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The '使用场景' (usage scenarios) section provides clear contexts for when to use this tool: image content recognition, OCR, code screenshot recognition, data chart analysis, and technical chart understanding. This gives good guidance, but it doesn't explicitly state when NOT to use it or mention alternatives among sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
