chat_vision
Analyze images through two-turn conversations: first answer questions about image content, then respond to follow-up questions about visual details.
Instructions
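Two-turn conversational image Q&A
Supports a two-turn conversation about an image:
- Turn 1: answer based on the image and the question asked by the local AI
- Turn 2: if the local AI asks a follow-up about details in the image, answer it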
Use cases:
- In-depth image analysis
- Iterative question exploration
- Complex image understanding
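Parameters:
- image: image input (file path or Base64)
- question: the question to ask
- session_id: session ID (used for the second turn; may be omitted on the first call)
- is_new_conversation: whether to start a new conversation (true creates a new session)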
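Two-turn flow:
1. Turn 1: call without session_id; the AI analyzes the image, replies, and returns a session ID
2. Turn 2: pass session_id to ask a follow-up about image details; the conversation ends after the AI answers
3. A session cannot continue past two turns; start a new conversation instead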
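The turn limit described above can be sketched as a small counter object. This is a minimal illustration, not the server's session class: the method names `can_continue` and `get_remaining_turns` and the attribute `current_turn` mirror calls visible in the handler, while `TwoTurnSession`, `MAX_TURNS`, and `record_answer` are hypothetical names introduced here.

```python
import uuid
from dataclasses import dataclass, field

MAX_TURNS = 2  # assumed constant; the page only states that a session ends after two turns


@dataclass
class TwoTurnSession:
    """Hypothetical stand-in for the server's session object."""

    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    current_turn: int = 0  # number of completed question/answer exchanges

    def can_continue(self) -> bool:
        # A session stays open until two exchanges have been completed.
        return self.current_turn < MAX_TURNS

    def get_remaining_turns(self) -> int:
        return max(MAX_TURNS - self.current_turn, 0)

    def record_answer(self) -> None:
        """Count one completed exchange; refuse once the limit is reached."""
        if not self.can_continue():
            raise RuntimeError("session finished; start a new conversation")
        self.current_turn += 1
```

After two calls to `record_answer`, `can_continue()` returns `False`, which corresponds to the point where the tool asks the caller to start a new conversation.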
Example:

    # Turn 1
    result1 = chat_vision(
        image="C:/chart.png",
        question="What data does this chart show?"
    )
    session_id = result1["session_id"]

    # Turn 2 (follow-up on details; the conversation cannot continue afterwards)
    if result1["remaining_turns"] > 0:
        result2 = chat_vision(
            image="C:/chart.png",
            question="What trend does the data show?",
            session_id=session_id
        )

Returns:
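- status: execution status
- answer: the answer
- session_id: session ID
- conversation_turn: current conversation turn (1 or 2)
- remaining_turns: number of turns remaining
- can_continue: whether the conversation can continue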
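A caller typically has to branch on these fields before deciding what to do next. The sketch below is one hedged way to do that; the status values `success`, `completed`, and `error` and the `error_type` field are taken from the handler source listed under Implementation Reference, while the function name `next_step` and the wording of its return strings are illustrative assumptions.

```python
from typing import Any


def next_step(result: dict[str, Any]) -> str:
    """Describe what a caller can do after receiving a chat_vision result."""
    status = result.get("status")
    if status == "error":
        # Error results carry error and error_type instead of an answer.
        return f"failed ({result.get('error_type', 'unknown')}): {result.get('error', '')}"
    if status == "completed" or not result.get("can_continue", False):
        # Either the session was already closed, or this was the final turn.
        return "start a new conversation with is_new_conversation=true"
    remaining = result.get("remaining_turns", 0)
    return f"may ask {remaining} more question(s) with session_id={result['session_id']}"
```

Checking `can_continue` rather than comparing `conversation_turn` to a literal keeps the caller independent of the exact turn limit.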
Input Schema
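| Name | Required | Description | Default |
|---|---|---|---|
| image | Yes | Image input: local file path or Base64-encoded data | |
| question | Yes | Question about the image | |
| session_id | No | Session ID (for multi-turn conversations) | None |
| is_new_conversation | No | Whether to start a new conversation | false |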
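The `image` parameter accepts either a local path or Base64 data, but this page does not show how the server tells the two apart. The following is a minimal sketch of one possible normalization to a data URL. It is not the project's `process_image_input`: the function name `to_image_url`, the order of the checks, and the PNG fallback MIME type are all assumptions made for illustration.

```python
import base64
import binascii
import mimetypes
from pathlib import Path


def to_image_url(image: str) -> str:
    """Return a data URL for a local file path or a bare Base64 string (hypothetical helper)."""
    if not image:
        raise ValueError("image must not be empty")
    if image.startswith("data:"):
        return image  # already a data URL, pass it through

    try:
        is_file = Path(image).is_file()
    except (OSError, ValueError):
        # Very long Base64 strings can be rejected by the OS as file names.
        is_file = False

    if is_file:
        path = Path(image)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"data:{mime};base64,{payload}"

    try:
        base64.b64decode(image, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("image is neither an existing file nor valid Base64") from exc
    # Assumption: bare Base64 without a declared type is treated as PNG.
    return f"data:image/png;base64,{image}"
```

One known weakness of this ordering is that a missing file whose name happens to be valid Base64 is treated as image data; the handler in the source reports a missing file separately with `error_type` set to `file_not_found`.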
Implementation Reference
- src/mcp_vision/server.py:121-269 (handler) Main chat_vision tool handler. Implements a two-turn conversational image Q&A tool that manages sessions, processes images, and calls the vision API with conversation history.

    @mcp.tool()
    async def chat_vision(
        # description: "Image input: local file path or Base64-encoded data"
        image: str = Field(description="图像输入:本地文件路径或Base64编码"),
        # description: "Question about the image"
        question: str = Field(description="关于图像的问题"),
        # description: "Session ID (for multi-turn conversations)"
        session_id: str | None = Field(default=None, description="会话ID(多轮对话用)"),
        # description: "Whether to start a new conversation"
        is_new_conversation: bool = Field(default=False, description="是否开始新对话"),
    ) -> dict[str, Any]:
        """
        Two-turn conversational image Q&A

        Supports a two-turn conversation about an image:
        - Turn 1: answer based on the image and the question asked by the local AI
        - Turn 2: if the local AI asks a follow-up about details in the image, answer it

        ---

        **Use cases**:
        - In-depth image analysis
        - Iterative question exploration
        - Complex image understanding

        **Parameters**:
        - `image`: image input (file path or Base64)
        - `question`: the question to ask
        - `session_id`: session ID (used for the second turn; may be omitted on the first call)
        - `is_new_conversation`: whether to start a new conversation (true creates a new session)

        **Two-turn flow**:
        1. Turn 1: call without session_id; the AI analyzes the image, replies, and returns a session ID
        2. Turn 2: pass session_id to ask a follow-up about image details; the conversation ends after the AI answers
        3. A session cannot continue past two turns; start a new conversation instead

        **Example**:
        ```python
        # Turn 1
        result1 = chat_vision(
            image="C:/chart.png",
            question="What data does this chart show?"
        )
        session_id = result1["session_id"]

        # Turn 2 (follow-up on details; the conversation cannot continue afterwards)
        if result1["remaining_turns"] > 0:
            result2 = chat_vision(
                image="C:/chart.png",
                question="What trend does the data show?",
                session_id=session_id
            )
        ```

        **Returns**:
        - `status`: execution status
        - `answer`: the answer
        - `session_id`: session ID
        - `conversation_turn`: current conversation turn (1 or 2)
        - `remaining_turns`: number of turns remaining
        - `can_continue`: whether the conversation can continue
        """
        # Log: "chat_vision request received, question: ..."
        logger.info(f"收到chat_vision请求,问题: {question[:50]}...")

        try:
            # Get the manager, processor, and client
            manager = get_chat_manager()
            processor = get_image_processor()
            client = get_vision_client()

            # Resolve the session
            if is_new_conversation or session_id is None:
                session = manager.create_new_session()
                logger.info(f"创建新会话: {session.session_id[:8]}")  # "created new session"
            else:
                session = manager.get_or_create_session(session_id)

            # Check whether the conversation can continue
            if not session.can_continue():
                # Log: "session ... has reached the maximum number of turns"
                logger.info(f"会话 {session.session_id[:8]} 已达到最大轮次限制")
                return {
                    "status": "completed",
                    # message: "This session has completed its two turns and is closed. To keep
                    # analyzing images, upload a new image and set is_new_conversation=true."
                    "message": "该会话已完成两轮对话,已结束。如需继续分析图像,请上传新的图片并设置 is_new_conversation=true 开始新会话。",
                    # hint: "The next call needs a new image argument and is_new_conversation=true"
                    "hint": "下次调用时需要提供新的 image 参数和 is_new_conversation=true",
                    "session_id": session.session_id,
                    "conversation_turn": session.current_turn,
                    "remaining_turns": 0,
                    "can_continue": False,
                }

            # Process the image input
            image_info = processor.process_image_input(image)

            # Set the image context
            session.set_image_context(image_info["url"], image_info)

            # Add the user's question to the history
            session.add_message("user", question)

            # Get the conversation history
            history = session.get_openai_history()

            # Call the vision API (multi-turn mode)
            answer = await client.chat_with_image(
                image_url=image_info["url"],
                question=question,
                # Exclude the question that was just added
                conversation_history=history[:-1] if len(history) > 1 else None,
            )

            # Add the assistant's answer to the history (this also advances the turn counter)
            session.add_message("assistant", answer)

            # Save the session
            manager.save_session(session)

            # Log: "chat_vision finished, session: ..., turn: ..."
            logger.info(f"chat_vision完成,会话: {session.session_id[:8]},轮次: {session.current_turn}")

            return {
                "status": "success",
                "answer": answer,
                "session_id": session.session_id,
                "conversation_turn": session.current_turn,
                "remaining_turns": session.get_remaining_turns(),
                "can_continue": session.can_continue(),
                "image_info": {
                    "source_type": image_info["source_type"],
                    "mime_type": image_info["mime_type"],
                    "size": image_info["size"],
                }
            }

        except FileNotFoundError as e:
            logger.error(f"文件未找到: {e}")  # "file not found"
            return {
                "status": "error",
                "error": f"文件未找到: {str(e)}",
                "error_type": "file_not_found",
            }
        except ValueError as e:
            logger.error(f"参数错误: {e}")  # "invalid argument"
            return {
                "status": "error",
                "error": str(e),
                "error_type": "invalid_input",
            }
        except Exception as e:
            logger.error(f"对话失败: {e}")  # "conversation failed"
            return {
                "status": "error",
                "error": f"对话失败: {str(e)}",
                "error_type": "chat_failed",
            }

- src/mcp_vision/server.py:121-127 (registration) Tool registration with the @mcp.tool() decorator, plus the schema definition using Pydantic Fields for parameter validation.

    @mcp.tool()
    async def chat_vision(
        # description: "Image input: local file path or Base64-encoded data"
        image: str = Field(description="图像输入:本地文件路径或Base64编码"),
        # description: "Question about the image"
        question: str = Field(description="关于图像的问题"),
        # description: "Session ID (for multi-turn conversations)"
        session_id: str | None = Field(default=None, description="会话ID(多轮对话用)"),
        # description: "Whether to start a new conversation"
        is_new_conversation: bool = Field(default=False, description="是否开始新对话"),
    ) -> dict[str, Any]:

- src/mcp_vision/server.py:121-127 (schema) Parameter schema definition using Pydantic Fields. Defines input validation for the image, question, session_id, and is_new_conversation parameters.
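
    @mcp.tool()
    async def chat_vision(
        # description: "Image input: local file path or Base64-encoded data"
        image: str = Field(description="图像输入:本地文件路径或Base64编码"),
        # description: "Question about the image"
        question: str = Field(description="关于图像的问题"),
        # description: "Session ID (for multi-turn conversations)"
        session_id: str | None = Field(default=None, description="会话ID(多轮对话用)"),
        # description: "Whether to start a new conversation"
        is_new_conversation: bool = Field(default=False, description="是否开始新对话"),
    ) -> dict[str, Any]:

- ChatManager class. Manages conversation sessions with support for creating, retrieving, and saving sessions. Enforces the 2-turn conversation limit.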

    class ChatManager:
        """Conversation manager: manages multiple sessions."""

        def __init__(self):
            """Initialize the conversation manager."""
            self._sessions: dict[str, ChatSession] = {}
            self._persistence_enabled = False
            self._history_file: Path | None = None

            # Initialize persistence
            self._init_persistence()

        def _init_persistence(self):
            """Initialize persistent storage."""
            server_config = get_server_config()

            if server_config.enable_persistence:
                self._persistence_enabled = True
                self._history_file = Path(server_config.history_path).expanduser()

                # Make sure the directory exists
                self._history_file.parent.mkdir(parents=True, exist_ok=True)

                # Load existing sessions
                self._load_from_file()

                # Log: "persistence enabled, history file: ..."
                logger.info(f"持久化已启用,历史文件: {self._history_file}")
            else:
                # Log: "persistence disabled, using in-memory mode"
                logger.info("持久化未启用,使用内存模式")

        def _load_from_file(self):
            """Load session history from the file."""
            if not self._history_file or not self._history_file.exists():
                # Log: "history file does not exist; a new one will be created"
                logger.info("历史文件不存在,将创建新文件")
                return

            try:
                content = self._history_file.read_text(encoding="utf-8")
                data = json.loads(content)

                if not isinstance(data, dict):
                    # Log: "history file has the wrong format, ignoring it"
                    logger.warning("历史文件格式错误,忽略")
                    return

                # Load all sessions
                for session_id, session_data in data.items():
                    if isinstance(session_data, dict):
                        self._sessions[session_id] = ChatSession.from_dict(session_data)

                # Log: "loaded N sessions from file"
                logger.info(f"从文件加载了 {len(self._sessions)} 个会话")

            except json.JSONDecodeError as e:
                logger.error(f"历史文件JSON解析失败: {e}")  # "failed to parse history file JSON"
            except Exception as e:
                logger.error(f"加载历史文件失败: {e}")  # "failed to load history file"

        def _save_to_file(self):
            """Save session history to the file."""
            if not self._persistence_enabled or not self._history_file:
                return

            try:
                # Make sure the directory exists
                self._history_file.parent.mkdir(parents=True, exist_ok=True)

                # Convert all sessions to dictionaries
                data = {
                    session_id: session.to_dict()
                    for session_id, session in self._sessions.items()
                }

                # Save as pretty-printed JSON
                self._history_file.write_text(
                    json.dumps(data, ensure_ascii=False, indent=2),
                    encoding="utf-8"
                )

                # Log: "saved N sessions to file"
                logger.debug(f"已保存 {len(self._sessions)} 个会话到文件")

            except Exception as e:
                logger.warning(f"保存历史文件失败: {e}")  # "failed to save history file"

        def get_or_create_session(self, session_id: str | None = None) -> ChatSession:
            """
            Get an existing session or create one.

            Args:
                session_id: session ID (optional)

            Returns:
                ChatSession: the session instance
            """
            if session_id and session_id in self._sessions:
                logger.debug(f"使用现有会话: {session_id[:8]}")  # "using existing session"
                return self._sessions[session_id]

            # Create a new session
            session = ChatSession(session_id)
            self._sessions[session.session_id] = session
            logger.info(f"创建新会话: {session.session_id[:8]}")  # "created new session"

            # Save to file
            self._save_to_file()

            return session

        def create_new_session(self) -> ChatSession:
            """
            Create a new session.

            Returns:
                ChatSession: the new session instance
            """
            session = ChatSession()
            self._sessions[session.session_id] = session
            logger.info(f"创建新会话: {session.session_id[:8]}")  # "created new session"

            # Save to file
            self._save_to_file()

            return session

- chat_with_image method. Handles multi-turn conversational image Q&A by calling an OpenAI-compatible vision API with conversation history.

    async def chat_with_image(
        self,
        image_url: str,
        question: str,
        conversation_history: list[dict[str, Any]] | None = None,
        system_prompt: str | None = None,
    ) -> str:
        """
        Multi-turn conversational image Q&A

        Args:
            image_url: image URL
            question: the question
            conversation_history: conversation history
            system_prompt: custom system prompt

        Returns:
            str: the answer
        """
        system_prompt = system_prompt or self.SYSTEM_PROMPT

        # Build the message list
        messages = [{"role": "system", "content": system_prompt}]

        # Add the conversation history
        if conversation_history:
            for turn in conversation_history:
                role = turn.get("role", "user")
                content = turn.get("content", "")
                if role in ("user", "assistant"):
                    messages.append({"role": role, "content": content})

        # Add the current question (including the image)
        messages.append({
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {
                    "type": "image_url",
                    "image_url": {"url": image_url}
                }
            ]
        })

        # Log: "sending multi-turn request, history length: N"
        logger.info(f"发送多轮对话请求,历史轮数: {len(conversation_history or [])}")

        try:
            response = self._client.chat.completions.create(
                model=self.config.model,
                messages=messages,
                temperature=self.config.temperature,
                max_tokens=self.config.max_tokens,
            )

            content = response.choices[0].message.content
            # Log: "received multi-turn response, length: N"
            logger.info(f"收到多轮对话响应,长度: {len(content)}")
            return content

        except Exception as e:
            logger.error(f"多轮对话请求失败: {e}")  # "multi-turn request failed"
            raise
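
The message-building part of `chat_with_image` can be exercised without a network call by pulling it into a pure function. The sketch below restates the logic shown above in that form; `build_messages` is a hypothetical name introduced here and is not part of the project.

```python
from typing import Any, Optional


def build_messages(
    system_prompt: str,
    question: str,
    image_url: str,
    history: Optional[list[dict[str, Any]]] = None,
) -> list[dict[str, Any]]:
    """Assemble an OpenAI-style message list: system prompt, prior turns, then question plus image."""
    messages: list[dict[str, Any]] = [{"role": "system", "content": system_prompt}]

    # Keep only user/assistant turns from the history, as the method above does.
    for turn in history or []:
        role = turn.get("role", "user")
        if role in ("user", "assistant"):
            messages.append({"role": role, "content": turn.get("content", "")})

    # The current question carries the image as a second content part.
    messages.append({
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    })
    return messages
```

Note that only the final user message carries the image; earlier turns are replayed as plain text, so the image is sent once per request regardless of history length.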