
chat_vision

Analyze images through two-turn conversations: first answer questions about image content, then respond to follow-up questions about visual details.

Instructions

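Two-turn conversational image Q&A

Supports a two-turn conversation about an image:

  • Turn 1: answers the local AI's question about the image

  • Turn 2: answers a follow-up question from the local AI about visual details in the image


Use cases

  • In-depth image analysis

  • Iterative question exploration

  • Complex image understanding

Parameters

  • image: Image input (file path or Base64)

  • question: The question about the image

  • session_id: Session ID (used for the second turn; may be omitted on the first call)

  • is_new_conversation: Whether to start a new conversation (setting it to true creates a new session)

Two-turn conversation flow

  1. Turn 1: call without session_id; the AI analyzes the image, replies, and returns a session ID

  2. Turn 2: pass session_id to ask a follow-up about image details; the conversation ends once the AI answers

  3. A session cannot continue past two turns; start a new conversation instead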
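
The turn accounting in the flow above can be sketched as a small session object. This is a minimal illustration, not the server's actual ChatSession (its source is not shown on this page): the class name TwoTurnSession and the MAX_TURNS constant are assumptions, while the method names mirror the ones the tool handler calls. It assumes a turn is counted when the assistant's answer is recorded.

```python
from dataclasses import dataclass, field

MAX_TURNS = 2  # assumed constant for the two-turn limit described above


@dataclass
class TwoTurnSession:
    """Hypothetical stand-in for the server's session object."""

    current_turn: int = 0
    messages: list = field(default_factory=list)

    def can_continue(self) -> bool:
        # A session stays open until both turns have been answered.
        return self.current_turn < MAX_TURNS

    def get_remaining_turns(self) -> int:
        return max(0, MAX_TURNS - self.current_turn)

    def add_message(self, role: str, content: str) -> None:
        # Reject new questions once the limit is reached.
        if role == "user" and not self.can_continue():
            raise ValueError("turn limit reached; start a new conversation")
        self.messages.append({"role": role, "content": content})
        # A turn is complete once the assistant has answered.
        if role == "assistant":
            self.current_turn += 1
```

With this sketch, a session reports two remaining turns when created, one after the first answer, and zero after the second, at which point a third question raises ValueError.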

Example

# Turn 1
result1 = chat_vision(
    image="C:/chart.png",
    question="What data does this chart show?"
)
session_id = result1["session_id"]

# Turn 2 (follow-up on details; the conversation cannot continue after this)
if result1["remaining_turns"] > 0:
    result2 = chat_vision(
        image="C:/chart.png",
        question="What trends does the data show?",
        session_id=session_id
    )

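Return value

  • status: Execution status

  • answer: The answer

  • session_id: Session ID

  • conversation_turn: Current conversation turn (1 or 2)

  • remaining_turns: Number of turns remaining

  • can_continue: Whether the conversation can continue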
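
A caller might branch on these fields as sketched below. The helper name next_step and its return labels are illustrative assumptions; the status values "success", "completed", and "error" are the ones returned by the handler shown under Implementation Reference.

```python
from typing import Any, Dict


def next_step(result: Dict[str, Any]) -> str:
    """Decide what to do after a chat_vision call (hypothetical helper)."""
    status = result.get("status")
    if status == "error":
        # Error results carry "error" and "error_type" instead of "answer".
        return "handle_error"
    if (
        status == "success"
        and result.get("can_continue")
        and result.get("remaining_turns", 0) > 0
    ):
        # One more follow-up is allowed; reuse result["session_id"].
        return "follow_up"
    # Either the second turn was just answered or the session was already
    # finished ("completed"): a new conversation is required.
    return "start_new_conversation"
```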

Input Schema

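| Name | Required | Description | Default |
| --- | --- | --- | --- |
| image | Yes | Image input: local file path or Base64-encoded data | |
| question | Yes | Question about the image | |
| session_id | No | Session ID (for multi-turn conversations) | |
| is_new_conversation | No | Whether to start a new conversation | |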
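
As a rough illustration of the schema, the sketch below assembles the argument dictionary for a call. The helper build_arguments is hypothetical and not part of the server; it only encodes the required/optional split from the table and omits optional fields that are left unset.

```python
from typing import Any, Dict, Optional


def build_arguments(
    image: str,
    question: str,
    session_id: Optional[str] = None,
    is_new_conversation: bool = False,
) -> Dict[str, Any]:
    """Assemble a chat_vision argument dict (hypothetical client-side helper)."""
    if not image:
        raise ValueError("image is required")
    if not question:
        raise ValueError("question is required")

    arguments: Dict[str, Any] = {"image": image, "question": question}
    # Optional fields are only sent when the caller sets them.
    if session_id is not None:
        arguments["session_id"] = session_id
    if is_new_conversation:
        arguments["is_new_conversation"] = True
    return arguments
```

A first-turn call would pass only image and question; a second-turn call adds the session_id returned by the first.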

Implementation Reference

  • Main chat_vision tool handler - Implements a two-turn conversational image Q&A tool that manages sessions, processes images, and calls the vision API with conversation history.
    @mcp.tool()
    async def chat_vision(
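        image: str = Field(description="Image input: local file path or Base64-encoded data"),
        question: str = Field(description="Question about the image"),
        session_id: str | None = Field(default=None, description="Session ID (for multi-turn conversations)"),
        is_new_conversation: bool = Field(default=False, description="Whether to start a new conversation"),
    ) -> dict[str, Any]:
        """
        Two-turn conversational image Q&A

        Supports a two-turn conversation about an image:
        - Turn 1: answers the local AI's question about the image
        - Turn 2: answers a follow-up question from the local AI about visual details in the image

        ---
        **Use cases**:
        - In-depth image analysis
        - Iterative question exploration
        - Complex image understanding

        **Parameters**:
        - `image`: Image input (file path or Base64)
        - `question`: The question about the image
        - `session_id`: Session ID (used for the second turn; may be omitted on the first call)
        - `is_new_conversation`: Whether to start a new conversation (setting it to true creates a new session)

        **Two-turn conversation flow**:
        1. Turn 1: call without session_id; the AI analyzes the image, replies, and returns a session ID
        2. Turn 2: pass session_id to ask a follow-up about image details; the conversation ends once the AI answers
        3. A session cannot continue past two turns; start a new conversation instead

        **Example**:
        ```python
        # Turn 1
        result1 = chat_vision(
            image="C:/chart.png",
            question="What data does this chart show?"
        )
        session_id = result1["session_id"]

        # Turn 2 (follow-up on details; the conversation cannot continue after this)
        if result1["remaining_turns"] > 0:
            result2 = chat_vision(
                image="C:/chart.png",
                question="What trends does the data show?",
                session_id=session_id
            )
        ```

        **Return value**:
        - `status`: Execution status
        - `answer`: The answer
        - `session_id`: Session ID
        - `conversation_turn`: Current conversation turn (1 or 2)
        - `remaining_turns`: Number of turns remaining
        - `can_continue`: Whether the conversation can continue
        """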
        logger.info(f"Received chat_vision request, question: {question[:50]}...")

        try:
            # Get the session manager, image processor, and vision client
            manager = get_chat_manager()
            processor = get_image_processor()
            client = get_vision_client()

            # Resolve the session
            if is_new_conversation or session_id is None:
                session = manager.create_new_session()
                logger.info(f"Created new session: {session.session_id[:8]}")
            else:
                session = manager.get_or_create_session(session_id)

                # Check whether the conversation can continue
                if not session.can_continue():
                    logger.info(f"Session {session.session_id[:8]} has reached the maximum number of turns")
                    return {
                        "status": "completed",
                        "message": "This session has completed its two turns and has ended. To keep analyzing images, upload a new image and set is_new_conversation=true to start a new session.",
                        "hint": "The next call must provide a new image argument and is_new_conversation=true",
                        "session_id": session.session_id,
                        "conversation_turn": session.current_turn,
                        "remaining_turns": 0,
                        "can_continue": False,
                    }

            # Process the image input
            image_info = processor.process_image_input(image)

            # Set the image context
            session.set_image_context(image_info["url"], image_info)

            # Add the user's question to the history
            session.add_message("user", question)

            # Get the conversation history
            history = session.get_openai_history()

            # Call the vision API (multi-turn mode)
            answer = await client.chat_with_image(
                image_url=image_info["url"],
                question=question,
                conversation_history=history[:-1] if len(history) > 1 else None,  # exclude the question just added
            )

            # Add the assistant's answer to the history (this increments the turn counter)
            session.add_message("assistant", answer)

            # Save the session
            manager.save_session(session)

            logger.info(f"chat_vision finished, session: {session.session_id[:8]}, turn: {session.current_turn}")

            return {
                "status": "success",
                "answer": answer,
                "session_id": session.session_id,
                "conversation_turn": session.current_turn,
                "remaining_turns": session.get_remaining_turns(),
                "can_continue": session.can_continue(),
                "image_info": {
                    "source_type": image_info["source_type"],
                    "mime_type": image_info["mime_type"],
                    "size": image_info["size"],
                }
            }

        except FileNotFoundError as e:
            logger.error(f"File not found: {e}")
            return {
                "status": "error",
                "error": f"File not found: {str(e)}",
                "error_type": "file_not_found",
            }

        except ValueError as e:
            logger.error(f"Invalid argument: {e}")
            return {
                "status": "error",
                "error": str(e),
                "error_type": "invalid_input",
            }

        except Exception as e:
            logger.error(f"Chat failed: {e}")
            return {
                "status": "error",
                "error": f"Chat failed: {str(e)}",
                "error_type": "chat_failed",
            }
  • Tool registration and parameter schema - The @mcp.tool() decorator registers the tool, and Pydantic Fields define input validation for the image, question, session_id, and is_new_conversation parameters.
    @mcp.tool()
    async def chat_vision(
        image: str = Field(description="Image input: local file path or Base64-encoded data"),
        question: str = Field(description="Question about the image"),
        session_id: str | None = Field(default=None, description="Session ID (for multi-turn conversations)"),
        is_new_conversation: bool = Field(default=False, description="Whether to start a new conversation"),
    ) -> dict[str, Any]:
  • ChatManager class - Manages conversation sessions with support for creating, retrieving, and saving sessions. Enforces 2-turn conversation limit.
    # Standard-library imports used by this excerpt
    import json
    from pathlib import Path

    class ChatManager:
        """Chat manager - manages multiple sessions"""

        def __init__(self):
            """Initialize the chat manager"""
            self._sessions: dict[str, ChatSession] = {}
            self._persistence_enabled = False
            self._history_file: Path | None = None

            # Initialize persistence
            self._init_persistence()

        def _init_persistence(self):
            """Initialize persistent storage"""
            server_config = get_server_config()

            if server_config.enable_persistence:
                self._persistence_enabled = True
                self._history_file = Path(server_config.history_path).expanduser()

                # Make sure the directory exists
                self._history_file.parent.mkdir(parents=True, exist_ok=True)

                # Load existing sessions
                self._load_from_file()

                logger.info(f"Persistence enabled, history file: {self._history_file}")
            else:
                logger.info("Persistence disabled, using in-memory mode")

        def _load_from_file(self):
            """Load session history from the file"""
            if not self._history_file or not self._history_file.exists():
                logger.info("History file does not exist; a new one will be created")
                return

            try:
                content = self._history_file.read_text(encoding="utf-8")
                data = json.loads(content)

                if not isinstance(data, dict):
                    logger.warning("History file has an invalid format; ignoring it")
                    return

                # Load all sessions
                for session_id, session_data in data.items():
                    if isinstance(session_data, dict):
                        self._sessions[session_id] = ChatSession.from_dict(session_data)

                logger.info(f"Loaded {len(self._sessions)} sessions from file")

            except json.JSONDecodeError as e:
                logger.error(f"Failed to parse history file as JSON: {e}")
            except Exception as e:
                logger.error(f"Failed to load history file: {e}")

        def _save_to_file(self):
            """Save session history to the file"""
            if not self._persistence_enabled or not self._history_file:
                return

            try:
                # Make sure the directory exists
                self._history_file.parent.mkdir(parents=True, exist_ok=True)

                # Convert all sessions to dictionaries
                data = {
                    session_id: session.to_dict()
                    for session_id, session in self._sessions.items()
                }

                # Save as pretty-printed JSON
                self._history_file.write_text(
                    json.dumps(data, ensure_ascii=False, indent=2),
                    encoding="utf-8"
                )

                logger.debug(f"Saved {len(self._sessions)} sessions to file")

            except Exception as e:
                logger.warning(f"Failed to save history file: {e}")

        def get_or_create_session(self, session_id: str | None = None) -> ChatSession:
            """
            Get or create a session

            Args:
                session_id: Session ID (optional)

            Returns:
                ChatSession: The session instance
            """
            if session_id and session_id in self._sessions:
                logger.debug(f"Using existing session: {session_id[:8]}")
                return self._sessions[session_id]

            # Create a new session
            session = ChatSession(session_id)
            self._sessions[session.session_id] = session

            logger.info(f"Created new session: {session.session_id[:8]}")

            # Save to file
            self._save_to_file()

            return session

        def create_new_session(self) -> ChatSession:
            """
            Create a new session

            Returns:
                ChatSession: The new session instance
            """
            session = ChatSession()
            self._sessions[session.session_id] = session

            logger.info(f"Created new session: {session.session_id[:8]}")

            # Save to file
            self._save_to_file()

            return session
  • chat_with_image method - Handles multi-turn conversational image Q&A by calling OpenAI-compatible vision API with conversation history.
    async def chat_with_image(
        self,
        image_url: str,
        question: str,
        conversation_history: list[dict[str, Any]] | None = None,
        system_prompt: str | None = None,
    ) -> str:
        """
        Multi-turn conversational image Q&A

        Args:
            image_url: Image URL
            question: The question
            conversation_history: Conversation history
            system_prompt: Custom system prompt

        Returns:
            str: The answer
        """
        system_prompt = system_prompt or self.SYSTEM_PROMPT

        # Build the message list
        messages = [{"role": "system", "content": system_prompt}]

        # Append the conversation history
        if conversation_history:
            for turn in conversation_history:
                role = turn.get("role", "user")
                content = turn.get("content", "")
                if role in ("user", "assistant"):
                    messages.append({"role": role, "content": content})

        # Append the current question (with the image attached)
        messages.append({
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {
                    "type": "image_url",
                    "image_url": {"url": image_url}
                }
            ]
        })

        logger.info(f"Sending multi-turn chat request, history messages: {len(conversation_history or [])}")

        try:
            response = self._client.chat.completions.create(
                model=self.config.model,
                messages=messages,
                temperature=self.config.temperature,
                max_tokens=self.config.max_tokens,
            )

            # message.content may be None; fall back to an empty string so len() is safe
            content = response.choices[0].message.content or ""
            logger.info(f"Received multi-turn chat response, length: {len(content)}")
            return content

        except Exception as e:
            logger.error(f"Multi-turn chat request failed: {e}")
            raise
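
The handler above relies on processor.process_image_input, whose source is not shown on this page. The sketch below is one way such a step could turn a file path or Base64 string into a data URL with the url, source_type, mime_type, and size keys the handler reads. The function name to_image_info, the default MIME type, and the "file"/"base64" labels are assumptions. Unlike the real handler, which reports a missing file separately, this sketch raises a plain ValueError for any unusable input.

```python
import base64
import binascii
import mimetypes
from pathlib import Path
from typing import Any, Dict


def to_image_info(image: str, default_mime: str = "image/png") -> Dict[str, Any]:
    """Hypothetical stand-in for process_image_input: path or Base64 -> data URL."""
    try:
        is_file = Path(image).is_file()
    except (OSError, ValueError):
        # Very long or malformed strings cannot be paths; try Base64 instead.
        is_file = False

    if is_file:
        raw = Path(image).read_bytes()
        mime_type = mimetypes.guess_type(image)[0] or default_mime
        source_type = "file"
    else:
        try:
            raw = base64.b64decode(image, validate=True)
        except binascii.Error as exc:
            raise ValueError("image is neither an existing file nor valid Base64") from exc
        if not raw:
            raise ValueError("image input is empty")
        # Raw Base64 carries no type information, so the default is assumed.
        mime_type = default_mime
        source_type = "base64"

    encoded = base64.b64encode(raw).decode("ascii")
    return {
        "url": f"data:{mime_type};base64,{encoded}",
        "source_type": source_type,
        "mime_type": mime_type,
        "size": len(raw),  # size in bytes of the decoded image
    }
```

The resulting url value has the shape that chat_with_image places in the image_url part of the final user message.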

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/LZMW/mcp-vision-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.