analyze_image
Analyze image content to extract text via OCR, describe visual elements, recognize code from screenshots, and interpret data charts. Provide image file paths or Base64 inputs with specific analysis prompts.
Instructions
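Analyze image content
This is the core tool: it analyzes an image and returns a detailed description.
Use cases:
- Image content recognition and description
- Text extraction (OCR)
- Code screenshot recognition
- Data chart analysis
- Technical diagram interpretation
Parameters:
- image: a local file path (e.g. C:/path/to/image.png) or a Base64-encoded image
- prompt: the analysis instruction, telling the AI what you want to learn about the image
Examples: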
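# Basic image description
analyze_image(image="C:/screenshots/desktop.png", prompt="Describe the contents of this screenshot")
# OCR text extraction
analyze_image(image="C:/docs/scan.png", prompt="Extract all text from the image")
# Code recognition
analyze_image(image="C:/code/snippet.png", prompt="Recognize and transcribe the code in the image")
Returns:
- status: execution status ("success" or "error")
- result: the analysis result
- image_info: image information (type, size, etc.)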
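Judging by the handler listed under Implementation Reference, a successful response carries `result` and `image_info`, while an error response carries `error` and `error_type` (`file_not_found`, `invalid_input`, or `analysis_failed`) instead. A minimal sketch of how a caller might branch on that shape follows; `summarize_response` is a hypothetical helper, and the sample dictionaries are hand-written illustrations, not real tool output.

```python
from typing import Any


def summarize_response(response: dict[str, Any]) -> str:
    """Turn an analyze_image response dict into a one-line summary."""
    if response.get("status") == "success":
        info = response.get("image_info", {})
        return (
            f"ok ({info.get('mime_type', 'unknown')}, "
            f"{info.get('size', 0)} bytes): {response.get('result', '')}"
        )
    # Error responses carry "error" and "error_type" instead of "result".
    error_type = response.get("error_type", "unknown")
    return f"failed [{error_type}]: {response.get('error', '')}"


# Hand-written sample responses (illustrative values only)
sample_ok = {
    "status": "success",
    "result": "A terminal window showing a Python traceback.",
    "image_info": {"source_type": "file", "mime_type": "image/png", "size": 20480},
}
sample_err = {
    "status": "error",
    "error": "file not found: C:/missing.png",
    "error_type": "file_not_found",
}

print(summarize_response(sample_ok))
print(summarize_response(sample_err))
```

Branching on `error_type` rather than parsing the `error` message keeps the caller independent of the message wording.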
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| image | Yes | Image input: a local file path or Base64-encoded data | |
| prompt | No | Analysis instruction | 详细描述这张图片 ("Describe this image in detail") |
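When the caller and the server do not share a filesystem, passing Base64 instead of a path is the natural choice. The sketch below shows one way to produce such a string with the standard library. `encode_image_to_base64` is a hypothetical helper name, and it assumes the tool accepts a bare Base64 string; the source does not say whether a `data:` prefix is also accepted.

```python
import base64
import tempfile
from pathlib import Path


def encode_image_to_base64(path: str) -> str:
    """Read a file and return its contents as an ASCII Base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


# Demo with a throwaway file holding only the 8-byte PNG signature,
# so the sketch runs without a real image on disk.
png_signature = b"\x89PNG\r\n\x1a\n"
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.png"
    demo_path.write_bytes(png_signature)
    encoded = encode_image_to_base64(str(demo_path))

print(encoded)  # iVBORw0KGgo=
```

The resulting string would then be passed as the `image` argument in place of a file path.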
Implementation Reference
- src/mcp_vision/server.py:29-118 (handler) Main MCP tool handler for analyze_image. Decorated with @mcp.tool() for registration. Takes image (path or Base64) and prompt parameters, processes the image input, calls the vision API client, and returns structured results with status, analysis result, and image metadata. Includes error handling for file not found, invalid input, and analysis failures.
@mcp.tool()
async def analyze_image(
    image: str = Field(description="图像输入:本地文件路径或Base64编码"),
    prompt: str = Field(default="详细描述这张图片", description="分析指令"),
) -> dict[str, Any]:
    """
    Analyze image content

    This is the core tool: it analyzes an image and returns a detailed description.

    ---

    **Use cases**:
    - Image content recognition and description
    - Text extraction (OCR)
    - Code screenshot recognition
    - Data chart analysis
    - Technical diagram interpretation

    **Parameters**:
    - `image`: a local file path (e.g. `C:/path/to/image.png`) or a Base64-encoded image
    - `prompt`: the analysis instruction, telling the AI what you want to learn about the image

    **Examples**:
    ```python
    # Basic image description
    analyze_image(image="C:/screenshots/desktop.png", prompt="Describe the contents of this screenshot")

    # OCR text extraction
    analyze_image(image="C:/docs/scan.png", prompt="Extract all text from the image")

    # Code recognition
    analyze_image(image="C:/code/snippet.png", prompt="Recognize and transcribe the code in the image")
    ```

    **Returns**:
    - `status`: execution status ("success" or "error")
    - `result`: the analysis result
    - `image_info`: image information (type, size, etc.)
    """
    logger.info(f"收到analyze_image请求,提示词: {prompt[:50]}...")

    try:
        # Get the image processor and the vision client
        processor = get_image_processor()
        client = get_vision_client()

        # Normalize the image input (path or Base64) into a URL plus metadata
        image_info = processor.process_image_input(image)

        # Call the vision API
        result = await client.analyze_image(
            image_url=image_info["url"],
            prompt=prompt,
        )

        logger.info(f"analyze_image完成,结果长度: {len(result)}")

        return {
            "status": "success",
            "result": result,
            "image_info": {
                "source_type": image_info["source_type"],
                "mime_type": image_info["mime_type"],
                "size": image_info["size"],
            }
        }

    except FileNotFoundError as e:
        # The image path does not exist
        logger.error(f"文件未找到: {e}")
        return {
            "status": "error",
            "error": f"文件未找到: {str(e)}",
            "error_type": "file_not_found",
        }
    except ValueError as e:
        # Unrecognized input format or oversized image
        logger.error(f"参数错误: {e}")
        return {
            "status": "error",
            "error": str(e),
            "error_type": "invalid_input",
        }
    except Exception as e:
        # Anything else, including failures of the vision API call
        logger.error(f"分析失败: {e}")
        return {
            "status": "error",
            "error": f"分析失败: {str(e)}",
            "error_type": "analysis_failed",
        }

- src/mcp_vision/server.py:30-33 (schema) Schema definition for the analyze_image tool parameters using Pydantic Field. Defines 'image' as a string (file path or Base64) and 'prompt' as a string with a default value for the analysis instruction.
async def analyze_image(
    image: str = Field(description="图像输入:本地文件路径或Base64编码"),
    prompt: str = Field(default="详细描述这张图片", description="分析指令"),
) -> dict[str, Any]:

- VisionClient.analyze_image() - The actual API client method that calls the OpenAI-compatible vision API. Builds the message payload with system prompt, user prompt, and image URL, then sends the request to the configured model and returns the analysis result.
async def analyze_image(
    self,
    image_url: str,
    prompt: str,
    system_prompt: str | None = None,
) -> str:
    """
    Analyze an image

    Args:
        image_url: image URL (data: format or http(s):// format)
        prompt: analysis instruction
        system_prompt: custom system prompt (optional)

    Returns:
        str: the analysis result
    """
    system_prompt = system_prompt or self.SYSTEM_PROMPT

    # Build the message content (OpenAI Vision format)
    messages = [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": image_url}
                }
            ]
        }
    ]

    logger.info(f"发送视觉分析请求,提示词长度: {len(prompt)}")

    try:
        # Note: there is no `await` here, so this assumes self._client is a
        # synchronous client; the call then blocks the event loop until the
        # request completes.
        response = self._client.chat.completions.create(
            model=self.config.model,
            messages=messages,
            temperature=self.config.temperature,
            max_tokens=self.config.max_tokens,
        )

        content = response.choices[0].message.content
        logger.info(f"收到视觉分析响应,长度: {len(content)}")

        return content

    except Exception as e:
        # Log and re-raise so the tool handler can report analysis_failed
        logger.error(f"视觉分析请求失败: {e}")
        raise

- ImageProcessor.process_image_input() - Core image processing logic that handles both file paths and Base64 inputs. Validates file existence, checks size limits, determines MIME type, converts to Base64 if needed, and returns a standardized dictionary with URL, mime_type, size, and source_type.
def process_image_input(self, image_input: str) -> dict[str, Any]:
    """
    Process image input, automatically detecting a file path or Base64 encoding

    Args:
        image_input: image input, either a local file path or Base64-encoded data

    Returns:
        dict: dictionary holding the processing result
            - url: OpenAI-format image URL (file:// or data:)
            - mime_type: image MIME type
            - size: image size in bytes
            - source_type: input type ('file' or 'base64')

    Raises:
        ValueError: the input format is invalid or the image is too large
    """
    # Determine the input type; check Base64 first to avoid confusing it with a path
    if is_base64(image_input):
        return self._process_base64_input(image_input)
    elif is_file_path(image_input):
        return self._process_file_input(image_input)
    else:
        raise ValueError(
            "无法识别的图像输入格式。请提供有效的文件路径或Base64编码。"
        )

def _process_file_input(self, file_path: str) -> dict[str, Any]:
    """
    Process a file path input

    Args:
        file_path: path to the image file

    Returns:
        dict: the processing result
    """
    path = Path(file_path)

    # Check that the file exists
    if not path.exists():
        raise FileNotFoundError(f"图像文件不存在: {file_path}")

    # Check the file size against the configured limit
    file_size = path.stat().st_size
    if file_size > self.max_image_size:
        max_mb = self.max_image_size / (1024 * 1024)
        actual_mb = file_size / (1024 * 1024)
        raise ValueError(
            f"图像文件过大: {actual_mb:.2f}MB,最大允许: {max_mb:.2f}MB"
        )

    # Determine the MIME type
    mime_type = get_image_mime_type(file_path)

    # Convert the file contents to Base64
    base64_data = file_to_base64(file_path)

    # Build the data URL
    data_url = base64_to_data_url(base64_data, mime_type)

    logger.info(f"[图像处理] 文件输入: {file_path}, 大小: {file_size}字节, 类型: {mime_type}")

    return {
        "url": data_url,
        "mime_type": mime_type,
        "size": file_size,
        "source_type": "file",
        "file_path": str(path.absolute()),
    }
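
`_process_file_input` relies on three helpers whose bodies are not shown here: `get_image_mime_type`, `file_to_base64`, and `base64_to_data_url`. The following is only a guess at what they might look like, written with the standard library. The names match the calls above, but the bodies are assumptions; the real implementations may differ, for example by inspecting file contents rather than the extension.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def get_image_mime_type(file_path: str) -> str:
    """Guess the MIME type from the file extension; reject non-images (assumed behavior)."""
    mime_type, _ = mimetypes.guess_type(file_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Unsupported image type: {file_path}")
    return mime_type


def file_to_base64(file_path: str) -> str:
    """Read the file and return its bytes as an ASCII Base64 string."""
    return base64.b64encode(Path(file_path).read_bytes()).decode("ascii")


def base64_to_data_url(base64_data: str, mime_type: str) -> str:
    """Wrap Base64 data in a data: URL (RFC 2397 form)."""
    return f"data:{mime_type};base64,{base64_data}"


# Demo: a temporary file containing only the 8-byte PNG signature
with tempfile.TemporaryDirectory() as tmp:
    demo_path = str(Path(tmp) / "demo.png")
    Path(demo_path).write_bytes(b"\x89PNG\r\n\x1a\n")
    data_url = base64_to_data_url(
        file_to_base64(demo_path),
        get_image_mime_type(demo_path),
    )

print(data_url)  # data:image/png;base64,iVBORw0KGgo=
```

Because the image is embedded directly in the request this way, the vision API never needs access to the local filesystem, which would explain why the file is converted rather than passed as a path.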