# MedVision MCP v2 Architecture Specification

> Version: 0.6.1
> Date: 2026-02-02
> Status: Draft

## Changelog

| Version | Date | Changes |
|:-----|:-----|:-----|
| 0.6.1 | 2026-02-02 | Fixed the Section 2.3 numbering conflict; rewrote the ROADMAP around Visual RAG Mode B; annotated model download counts with their retrieval date; updated the list of decided items |
| 0.6.0 | 2026-02-02 | Added Visual RAG hybrid mode (Mode B): RAD-DINO + FAISS + DenseNet, with tools such as `search_similar_cases` and `analyze_with_rag` |
| 0.5.0 | 2026-02-02 | Added the interactive diagnosis flow design, Canvas annotation type definitions, A2A vs. pure-MCP dual mode, and a status table of verified models |
| 0.4.0 | 2026-02-02 | Architecture refactor: MCP Server + Multi-Model Tools + built-in Medical Agent (A2A); added the Canvas drawing workspace specification |
| 0.3.1 | 2026-01-28 | Added the full AI model list: Radiology VLM, Medical Encoders, CT Segmentation, Pathology Foundation Models |
| 0.3.0 | 2026-01-28 | Confirmed Medical-SAM3 and the vLLM lazy-loading strategy; updated model sources |
| 0.2.0 | 2026-01-28 | Technical decisions: SQLite, vLLM/Ollama, SAM3, React Canvas |
| 0.1.0 | 2026-01-28 | Initial version |

---

## 1. Overview

### 1.1 Background

MedVision MCP v1 is a medical image analysis agent built on the LangGraph ReAct architecture that integrates multiple AI models as tools. As the AI agent ecosystem evolves (MCP Protocol, A2A, long-thinking agents), the architecture needs to be redesigned to improve interoperability and user experience.

### 1.2 Core Architecture Concept

```
┌────────────────────────────────────────────────────────────────┐
│ MedVision MCP = MCP Server + Agent                             │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────────┐     ┌─────────────────────────────────┐  │
│  │ MCP Tools        │     │ MedVision MCP Agent (Optional)  │  │
│  │ ─────────────    │     │ ─────────────────────────       │  │
│  │ • VQA            │◄────│ • Uses MCP Tools                │  │
│  │ • Segmentation   │     │ • Understands image semantics   │  │
│  │ • Classification │     │ • Orchestrates analysis flows   │  │
│  │ • Report Gen     │     │ • Converses with the user       │  │
│  │ • Grounding      │     │ • A2A collaboration             │  │
│  │ • ...            │     │                                 │  │
│  └──────────────────┘     └─────────────────────────────────┘  │
│                                                                │
└────────────────────────────────────────────────────────────────┘
          │ MCP Protocol                     │ MCP Protocol
          ▼                                  ▼
┌─────────────────────┐          ┌─────────────────────────┐
│ External Agents     │          │ Canvas UI (User)        │
│ (Claude, GPT, etc)  │          │ Drawing workspace       │
└─────────────────────┘          └─────────────────────────┘
```

**Design philosophy**:

- The MCP Server wraps multiple AI models as standardized Tools
- The built-in Medical Agent uses these Tools (an A2A-like architecture)
- External agents (such as Claude) can also call the MCP Tools directly
- The Canvas UI interacts with the Agent/Tools over the MCP Protocol

### 1.3 Design Goals

| Goal | Description |
|:-----|:-----|
| **MCP Server** | All AI models are wrapped as MCP Tools behind a standardized interface |
| **Multi-Model Tools** | Integrates VQA, segmentation, classification, report generation, and other models |
| **Built-in Agent (A2A)** | Provides a Medical Agent that uses these Tools and supports automated analysis flows |
| **Canvas drawing workspace** | Users interact with the Agent through a drawing board and select regions for analysis |
| **Bidirectional MCP communication** | UI ↔ Agent ↔ Tools all run over the MCP Protocol |
| **Session-oriented** | Supports complete analysis sessions with multiple images and multiple rounds of interaction |
| **Multi-modality support** | CXR, KUB, EKG, CT, MRI, DICOM |

### 1.4 Non-Goals

- Does not provide full PACS functionality
- Does not handle PHI/HIPAA compliance (the deployer's responsibility)

---

## 2. Architecture

### 2.1 Overall Architecture

```
┌──────────────────────────────────────────────────────────────────────┐
│ External Agent Layer                                                 │
│ (Claude / GPT / Custom Agent / CLI)                                  │
└───────────────────────────────────┬──────────────────────────────────┘
                                    │ MCP Protocol
                                    ▼
┌──────────────────────────────────────────────────────────────────────┐
│ MedVision MCP Server                                                 │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │ MedVision MCP Medical Agent (Optional)                         │  │
│  │ ────────────────────────────────────────────────────────────── │  │
│  │ • ReAct / Chain-of-Thought reasoning                           │  │
│  │ • Semantic understanding of medical images                     │  │
│  │ • Automatic analysis orchestration (VQA → Segment → Report)    │  │
│  │ • Interaction with the Canvas UI                               │  │
│  │ • A2A (Agent-to-Agent) collaboration interface                 │  │
│  └────────────────────────────────┬───────────────────────────────┘  │
│                                   │ Internal MCP Calls               │
│                                   ▼                                  │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐     │
│  │   Session   │ │  Analysis   │ │ Interactive │ │   Canvas    │     │
│  │ Management  │ │    Tools    │ │    Tools    │ │    Tools    │     │
│  └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘     │
│                                                                      │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐                     │
│  │   Export    │ │    Query    │ │    Agent    │                     │
│  │    Tools    │ │    Tools    │ │    Tools    │                     │
│  └─────────────┘ └─────────────┘ └─────────────┘                     │
│                                                                      │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │ AI Model Registry                                              │  │
│  │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌──────────┐   │  │
│  │ │CheXagent│ │LLaVA-Med│ │ MAIRA-2 │ │Med-SAM3 │ │BiomedCLIP│   │  │
│  │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └──────────┘   │  │
│  │ (vLLM / Ollama / PyTorch backends, managed uniformly)          │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                                                      │
├──────────────────────────────────────────────────────────────────────┤
│ Session Store (SQLite)                                               │
│ (Images, Analyses, Annotations, Interactions, Canvas State)          │
└───────────────────────────────────┬──────────────────────────────────┘
                                    │ MCP Resources / WebSocket
                                    ▼
┌──────────────────────────────────────────────────────────────────────┐
│ Canvas Drawing Workspace (Human UI)                                  │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │ Image Viewer + Annotation Canvas                               │  │
│  │ ────────────────────────────────────────────────────────────── │  │
│  │ • Image display and zoom                                       │  │
│  │ • Drawing tools (BBox, Polygon, Point, Freehand)               │  │
│  │ • AI segmentation overlay                                      │  │
│  │ • Agent chat area                                              │  │
│  │ • Sends user-selected regions to the Agent over MCP Protocol   │  │
│  └────────────────────────────────────────────────────────────────┘  │
│ (React + Fabric.js / VS Code Extension / Web)                        │
└──────────────────────────────────────────────────────────────────────┘
```

### 2.2 Components

| Component | Responsibility | Technology |
|:-----|:-----|:---------|
| **MCP Server** | Exposes all tools over the MCP Protocol | FastMCP / mcp-python |
| **MedVision MCP Agent** | Built-in medical image analysis agent that uses the MCP Tools | LangGraph / ReAct |
| **Multi-Model Tools** | AI models wrapped as independent MCP Tools | VQA, Segment, Classify, etc. |
| **Session Store** | Manages analysis session state | **SQLite** (persistent) |
| **Model Registry** | Manages AI model instances | **vLLM + Ollama** |
| **Canvas UI** | Drawing workspace where the user interacts with the Agent | **React + Fabric.js** |
| **Event Bus** | Pushes tool results to the UI | WebSocket / MCP Resources |

#### 2.2.1 Usage Modes

```
┌──────────────────────────────────────────────────────────────────────┐
│ Three usage modes                                                    │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│ Mode 1: External agent uses the Tools directly                       │
│ ┌──────────────┐  MCP Protocol  ┌─────────────────────┐              │
│ │ Claude/GPT   │ ──────────────▶│ MCP Tools           │              │
│ │ (External)   │◀────────────── │ (VQA, Segment..)    │              │
│ └──────────────┘                └─────────────────────┘              │
│                                                                      │
│ Mode 2: External agent delegates to MedVision MCP Agent (A2A)        │
│ ┌──────────────┐  MCP Protocol  ┌─────────────────────┐  Internal    │
│ │ Claude/GPT   │ ──────────────▶│ MedVision MCP Agent │─────────────▶│
│ │ (External)   │◀────────────── │ (Medical Expert)    │  Tools       │
│ └──────────────┘                └─────────────────────┘              │
│                                                                      │
│ Mode 3: Canvas UI interacts directly (standalone use)                │
│ ┌──────────────┐  MCP Protocol  ┌─────────────────────┐  Internal    │
│ │ User +       │ ──────────────▶│ MedVision MCP Agent │─────────────▶│
│ │ Canvas UI    │◀────────────── │ (Medical Expert)    │  Tools       │
│ └──────────────┘                └─────────────────────┘              │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
```

### 2.3 Technical Decisions

#### 2.3.1 UI: React + Canvas (VS Code integration first)

```
Rationale:
├── VS Code WebView supports React natively
├── Can be packaged as a VS Code Extension (.vsix)
├── Fabric.js / Konva.js provide full drawing capabilities
├── The same UI can be deployed as Web / VS Code Extension / Electron
└── Can integrate with GitHub Copilot Chat in the future

Architecture:
┌─────────────────────────────────────────────────────────┐
│ VS Code Extension Host                                  │
│ ├── Extension (TypeScript)                              │
│ │   ├── MCP Client                                      │
│ │   └── WebView Provider                                │
│ └── WebView (React + Canvas)                            │
│     ├── Image Viewer                                    │
│     ├── Fabric.js Canvas (annotation/drawing)           │
│     └── WebSocket ↔ MCP Server                          │
└─────────────────────────────────────────────────────────┘
```

#### 2.3.2 Session: SQLite

```
Rationale:
├── Zero dependencies; no extra service required
├── Lightweight persistence; state survives restarts
├── Supports complex queries (analysis history, statistics)
└── Can migrate to PostgreSQL in the future

Tables:
├── sessions      # main session table
├── images        # image records
├── analyses      # analysis results (JSON)
├── annotations   # annotation records
└── interactions  # interaction history
```

#### 2.3.3 GPU Inference: vLLM + Ollama

```
Rationale:
├── vLLM: efficient LLM inference with continuous batching
├── Ollama: general-purpose local model runner, easy to deploy
├── Unified inference backends simplify model management
└── Supports model quantization (4-bit/8-bit)

Lazy loading strategy:
├── vLLM limitation: no dynamic model switching; each instance is bound to one model
├── Ollama advantage: native on-demand loading; unused models are unloaded automatically
├── Recommended strategy:
│   ├── Small deployments: manage all models with Ollama (simple)
│   ├── High throughput: vLLM with the main models pinned + Ollama as fallback
│   └── Flexible deployments: Model Registry manages multiple vLLM processes
└── Memory management: set max_loaded_models to cap how many models are loaded at once

Model allocation:
┌─────────────────────────────────────────────────────────┐
│ vLLM (efficient inference, pinned)                      │
│ ├── CheXagent-2-3b (VQA)                                │
│ ├── LLaVA-Med (general VQA)                             │
│ ├── MAIRA-2 (Grounding)                                 │
│ └── CheXOne (next-generation CXR)                       │
├─────────────────────────────────────────────────────────┤
│ Ollama (general-purpose / on-demand loading)            │
│ ├── LLaVA (vision model)                                │
│ ├── Llama 3 (text reasoning)                            │
│ └── Other community models                              │
├─────────────────────────────────────────────────────────┤
│ PyTorch direct loading (non-LLM)                        │
│ ├── DenseNet (classification)                           │
│ ├── PSPNet (organ segmentation)                         │
│ ├── Medical-SAM3 (interactive segmentation)             │
│ ├── Roentgen (image generation)                         │
│ ├── BiomedCLIP (Zero-shot)                              │
│ └── RAD-DINO (Feature Extraction)                       │
├─────────────────────────────────────────────────────────┤
│ timm/OpenCLIP (Pathology)                               │
│ ├── Prov-GigaPath (Tile+Slide Encoder)                  │
│ ├── UNI2-h (Tile Encoder)                               │
│ └── CONCH (Vision-Language)                             │
├─────────────────────────────────────────────────────────┤
│ nnUNet (CT/MRI Segmentation)                            │
│ └── TotalSegmentator                                    │
└─────────────────────────────────────────────────────────┘
```

#### 2.3.4 Interactive Segmentation: SAM 3 / Medical-SAM3

```
Model Sources:
├── Primary: ChongCong/Medical-SAM3 (medical fine-tuned version)
│   ├── GitHub: https://github.com/AIM-Research-Lab/Medical-SAM3
│   ├── HuggingFace: ChongCong/Medical-SAM3
│   ├── Fine-tuned on 33 medical datasets across 10 imaging modalities
│   └── Paper: arXiv:2601.10880
└── Alternative: facebook/sam3 (general version)
    ├── GitHub: https://github.com/facebookresearch/sam3 (7.4k stars)
    ├── HuggingFace: facebook/sam3 (848M params, 1.65M downloads)
    └── Requires Python 3.12+, PyTorch 2.7+, CUDA 12.6+

Rationale:
├── Medical-SAM3 is optimized for medical images
├── Supports text prompts (open-vocabulary concept segmentation)
├── Supports video segmentation (future extension)
├── Improved edge accuracy
└── Keeps prompt-driven flexibility

Features:
├── Text prompt: segment from a text description ("lung nodule")
├── Point prompt: click to segment
├── Box prompt: box to segment
├── Polygon prompt: polygon to segment
├── Mask refinement: refine an existing mask
└── Multi-object tracking: track multiple objects
```

---

## 3. Data Model

### 3.1 Core Entities

```python
from dataclasses import dataclass, field
from typing import Optional, List, Dict, Any, Literal
from datetime import datetime
import uuid


@dataclass
class MedicalImage:
    """A medical image."""
    id: str = field(default_factory=lambda: str(uuid.uuid4())[:8])
    path: str = ""
    type: Literal["CXR", "KUB", "EKG", "CT", "MRI", "DICOM", "Other"] = "Other"
    modality: Optional[str] = None  # DICOM modality code
    metadata: Dict[str, Any] = field(default_factory=dict)
    created_at: datetime = field(default_factory=datetime.now)


@dataclass
class Annotation:
    """An annotation (produced by the AI or by the user)."""
    id: str = field(default_factory=lambda: str(uuid.uuid4())[:8])
    image_id: str = ""
    type: Literal["bbox", "polygon", "point", "freehand", "mask", "measurement", "text"] = "bbox"
    coordinates: Any = None  # format depends on type
    label: str = ""
    confidence: Optional[float] = None
    source: Literal["ai", "user", "system"] = "ai"
    notes: Optional[str] = None
    created_at: datetime = field(default_factory=datetime.now)


@dataclass
class ImageAnalysis:
    """Analysis results for a single image."""
    image_id: str = ""
    classification: Optional[Dict[str, float]] = None
    detections: List[Dict[str, Any]] = field(default_factory=list)
    segmentation: Optional[Dict[str, Any]] = None
    report: Optional[str] = None
    vqa_history: List[Dict[str, str]] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)


@dataclass
class Interaction:
    """A user interaction record."""
    id: str = field(default_factory=lambda: str(uuid.uuid4())[:8])
    type: Literal["upload", "region_select", "annotation", "query", "export"] = "query"
    data: Dict[str, Any] = field(default_factory=dict)
    timestamp: datetime = field(default_factory=datetime.now)


@dataclass
class StudySession:
    """Analysis session (core entity)."""
    id: str = field(default_factory=lambda: str(uuid.uuid4())[:8])

    # Image collection
    images: List[MedicalImage] = field(default_factory=list)
    current_image_id: Optional[str] = None

    # Analysis results
    analyses: Dict[str, ImageAnalysis] = field(default_factory=dict)

    # Annotations
    annotations: List[Annotation] = field(default_factory=list)

    # Interaction history
    interactions: List[Interaction] = field(default_factory=list)

    # Currently selected region
    current_roi: Optional[Dict[str, Any]] = None

    # Metadata
    created_at: datetime = field(default_factory=datetime.now)
    updated_at: datetime = field(default_factory=datetime.now)

    def get_current_image(self) -> Optional[MedicalImage]:
        # With no explicit focus, fall back to the first image (if any).
        if not self.current_image_id:
            return self.images[0] if self.images else None
        return next((img for img in self.images if img.id == self.current_image_id), None)
```

### 3.2 Coordinate Formats

```python
# BBox: [x1, y1, x2, y2], top-left and bottom-right corners;
# values are either 0-1 relative coordinates or pixel coordinates
bbox = {"type": "bbox", "coordinates": [0.1, 0.2, 0.3, 0.4], "format": "relative"}

# Polygon: [[x1,y1], [x2,y2], ...] polygon vertices
polygon = {"type": "polygon", "coordinates": [[100, 200], [150, 200], [150, 250], [100, 250]]}

# Point: [x, y] a single point
point = {"type": "point", "coordinates": [200, 300]}

# Mask: path to a binary mask, or base64
mask = {"type": "mask", "path": "temp/mask_001.png"}
```

---

## 4. MCP Tools Specification

### 4.1 Session Management

#### `create_study_session`

```python
@mcp.tool
def create_study_session(
    name: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None
) -> Dict:
    """
    Create a new image analysis session.

    Returns:
        session_id: Session ID
        ui_url: Interactive UI URL (if enabled)
    """
```

#### `add_image_to_session`

```python
@mcp.tool
def add_image_to_session(
    session_id: str,
    image_path: str,
    image_type: Optional[Literal["CXR", "KUB", "EKG", "CT", "MRI", "DICOM", "Other"]] = None,
    auto_analyze: bool = True,
    analysis_config: Optional[Dict] = None
) -> Dict:
    """
    Add an image to a session.

    Args:
        session_id: Session ID
        image_path: Image path (local or URL)
        image_type: Image type; auto-detected when None
        auto_analyze: Whether to run analysis automatically
        analysis_config: Analysis configuration (see analyze_image)

    Returns:
        image_id: Image ID
        type: Detected image type
        auto_analysis: Automatic analysis result (if enabled)
    """
```

#### `get_session_status`

```python
@mcp.tool
def get_session_status(session_id: str) -> Dict:
    """
    Get a summary of the session state.

    Returns:
        session_id: Session ID
        images: Summary list of images
        current_image: Current image ID
        total_annotations: Number of annotations
        ui_url: Interactive UI URL
    """
```

#### `switch_image`

```python
@mcp.tool
def switch_image(session_id: str, image_id: str) -> Dict:
    """Switch the current image focus."""
```

---

### 4.2 Analysis Tools

#### `analyze_image`

```python
@mcp.tool
def analyze_image(
    session_id: str,
    image_id: Optional[str] = None,  # None = current image
    # Analysis switches
    classify: bool = True,
    detect: bool = True,
    segment: bool = False,
    generate_report: bool = True,
    # Classification settings
    classification_threshold: float = 0.5,
    # Detection settings
    detection_mode: Literal["auto", "manual", "both"] = "auto",
    detection_phrases: Optional[List[str]] = None,
    # Segmentation settings
    segment_organs: Optional[List[str]] = None,  # None = all
    # Preset
    preset: Optional[Literal["quick", "full", "segment_only", "report_only"]] = None
) -> Dict:
    """
    Analyze an image.

    Presets:
        - quick: classify only
        - full: classify + detect + segment + report
        - segment_only: segment all organs
        - report_only: generate report only

    Returns:
        classification: Classification result
        detections: List of detection results
        segmentation: Segmentation result
        report: Report text
        visualization_url: URL of the combined visualization
    """
```

#### `ask_about_image`

```python
@mcp.tool
def ask_about_image(
    session_id: str,
    question: str,
    image_id: Optional[str] = None,
    include_context: bool = True  # include earlier analysis results as context
) -> Dict:
    """
    Ask a question about an image (VQA).

    Returns:
        answer: Answer text
        confidence: Confidence score
    """
```

---

### 4.3 Interactive Tools (core)

#### `analyze_selected_region`

```python
@mcp.tool
def analyze_selected_region(
    session_id: str,
    region: Dict,  # {"type": "bbox|polygon|point", "coordinates": [...]}
    question: Optional[str] = None,
    # None is treated as ["describe"]; avoids a mutable default argument
    actions: Optional[List[Literal["describe", "segment", "measure", "compare"]]] = None
) -> Dict:
    """
    Analyze a user-selected region.

    This is the core tool for canvas interaction. When the user draws a
    region in the UI, the UI triggers this tool call.

    Args:
        region: The selected region
            - bbox: {"type": "bbox", "coordinates": [x1, y1, x2, y2]}
            - polygon: {"type": "polygon", "coordinates": [[x1,y1], ...]}
            - point: {"type": "point", "coordinates": [x, y]}
        question: A question about this region
        actions: Actions to perform (default: ["describe"])
            - describe: describe the region's content
            - segment: SAM segmentation
            - measure: measure dimensions
            - compare: compare with other regions

    Returns:
        description: Region description
        segmentation: Segmentation mask
        measurements: Measurement data
        comparison: Comparison result
        annotation_id: ID of the newly created annotation
    """
```

#### `add_annotation`

```python
@mcp.tool
def add_annotation(
    session_id: str,
    image_id: Optional[str] = None,
    annotation_type: Literal["bbox", "polygon", "point", "text", "measurement"] = "bbox",
    coordinates: Any = None,
    label: str = "",
    notes: Optional[str] = None
) -> Dict:
    """
    Add an annotation manually.

    Returns:
        annotation_id: Annotation ID
    """
```

#### `update_annotation`

```python
@mcp.tool
def update_annotation(
    session_id: str,
    annotation_id: str,
    label: Optional[str] = None,
    notes: Optional[str] = None,
    coordinates: Optional[Any] = None
) -> Dict:
    """Update an annotation."""
```

#### `delete_annotation`

```python
@mcp.tool
def delete_annotation(session_id: str, annotation_id: str) -> Dict:
    """Delete an annotation."""
```

---

### 4.4 Query Tools

#### `get_annotations`

```python
@mcp.tool
def get_annotations(
    session_id: str,
    image_id: Optional[str] = None,
    source: Optional[Literal["ai", "user", "all"]] = "all"
) -> Dict:
    """Get the list of annotations."""
```

#### `get_analysis_summary`

```python
@mcp.tool
def get_analysis_summary(
    session_id: str,
    image_id: Optional[str] = None
) -> Dict:
    """Get a summary of the analysis results."""
```

---

### 4.5 Export Tools

#### `export_study`

```python
@mcp.tool
def export_study(
    session_id: str,
    format: Literal["pdf", "dicom_sr", "json", "png_annotated", "html"] = "pdf",
    # None or empty = all images; avoids a mutable default argument
    include_images: Optional[List[str]] = None,
    include_report: bool = True,
    include_annotations: bool = True,
    include_measurements: bool = True,
    template: Optional[str] = None  # report template
) -> Dict:
    """
    Export the analysis results.

    Formats:
        - pdf: PDF report
        - dicom_sr: DICOM Structured Report
        - json: structured JSON
        - png_annotated: annotated image
        - html: interactive HTML

    Returns:
        export_path: Path of the exported file
        format: Export format
    """
```

#### `get_visualization`

```python
@mcp.tool
def get_visualization(
    session_id: str,
    image_id: Optional[str] = None,
    include_annotations: bool = True,
    include_segmentation: bool = True,
    include_measurements: bool = True
) -> Dict:
    """
    Get the combined visualization.

    Returns:
        image_url: Visualization image URL
        layers: Information for each layer
    """
```

---

### 4.6 Agent Tools (A2A)

These tools let an external agent delegate tasks to the MedVision MCP Agent.

#### `invoke_medical_agent`

```python
@mcp.tool
def invoke_medical_agent(
    session_id: str,
    task: str,
    context: Optional[Dict[str, Any]] = None,
    mode: Literal["auto", "interactive", "step_by_step"] = "auto"
) -> Dict:
    """
    Delegate a medical image analysis task to the MedVision MCP Medical Agent.

    This is the core A2A (Agent-to-Agent) interface. An external agent
    (such as Claude) can use this tool to delegate complex medical image
    analysis to the MedVision MCP Agent.

    Args:
        session_id: Session ID
        task: Task description (natural language), for example:
            "Analyze this CXR and find all abnormalities"
            "Compare the changes between these two images"
            "The user marked a region in the right lung; analyze it in detail"
        context: Additional context
            - user_selection: region selected by the user on the Canvas
            - previous_findings: earlier analysis results
            - clinical_history: clinical history
        mode: Execution mode
            - auto: run the full flow automatically
            - interactive: ask the user when needed
            - step_by_step: run step by step, reporting after each step

    Returns:
        status: "completed" | "need_input" | "in_progress"
        result: Analysis result
        actions_taken: List of tools that were run
        follow_up_suggestions: Suggested follow-up actions
    """
```

#### `get_agent_capabilities`

```python
@mcp.tool
def get_agent_capabilities() -> Dict:
    """
    Get the capability list of the MedVision MCP Agent.

    Returns:
        capabilities: Task types the Agent can perform
        supported_modalities: Supported image types
        available_tools: Available MCP Tools
        agent_model: Underlying LLM configuration
    """
```

---

### 4.7 Canvas Tools

These tools handle interaction with the Canvas drawing workspace.
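
One way the Agent Tools and the Canvas Tools might fit together: when `invoke_medical_agent` returns `need_input`, a client can relay the question to the user with `request_user_input` and then retry with the answer in `context`. The following is a minimal sketch under stated assumptions, not the project's implementation. `run_agent_task`, the two fake tool functions, and the `result["prompt"]` field are hypothetical names introduced for illustration; a real client would call the MCP tools instead of these stand-ins.

```python
from typing import Any, Callable, Dict

ToolFn = Callable[..., Dict[str, Any]]


def run_agent_task(invoke_agent: ToolFn, request_input: ToolFn,
                   session_id: str, task: str, max_rounds: int = 3) -> Dict[str, Any]:
    """Call the agent until it reports "completed", relaying Canvas input."""
    context: Dict[str, Any] = {}
    for _ in range(max_rounds):
        reply = invoke_agent(session_id=session_id, task=task,
                             context=context, mode="interactive")
        if reply["status"] == "completed":
            return reply
        if reply["status"] == "need_input":
            # Assumption: the agent's question for the user is in result["prompt"].
            answer = request_input(session_id=session_id,
                                   input_type="select_region",
                                   prompt=reply["result"]["prompt"])
            context["user_selection"] = answer["user_input"]
        # "in_progress": nothing to relay, just ask again on the next round.
    raise RuntimeError(f"agent did not finish within {max_rounds} rounds")


# --- Fake tools (hypothetical) so the sketch runs without a server ---
def fake_invoke_agent(session_id, task, context, mode):
    if "user_selection" not in context:
        return {"status": "need_input",
                "result": {"prompt": "Please box the area of concern."},
                "actions_taken": [], "follow_up_suggestions": []}
    return {"status": "completed",
            "result": {"region": context["user_selection"]},
            "actions_taken": ["analyze_selected_region"],
            "follow_up_suggestions": []}


def fake_request_input(session_id, input_type, prompt):
    return {"user_input": {"type": "bbox", "coordinates": [0.1, 0.2, 0.3, 0.4]}}


final = run_agent_task(fake_invoke_agent, fake_request_input,
                       "demo", "Analyze the marked region")
print(final["status"], final["actions_taken"])
```

Passing the tool callables in as parameters keeps the loop independent of any particular MCP client library, and `max_rounds` bounds the exchange so a misbehaving agent cannot keep the user in an endless prompt cycle.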
#### `sync_canvas_state`

```python
@mcp.tool
def sync_canvas_state(
    session_id: str,
    canvas_state: Dict[str, Any]
) -> Dict:
    """
    Sync the Canvas state to the Server.

    Args:
        canvas_state: Canvas state
            - viewport: current viewport position/zoom
            - active_tool: current tool
            - user_drawings: shapes drawn by the user
            - pending_selection: selection region awaiting processing
    """
```

#### `push_to_canvas`

```python
@mcp.tool
def push_to_canvas(
    session_id: str,
    action: Literal["add_layer", "update_layer", "remove_layer", "highlight", "animate"],
    payload: Dict[str, Any]
) -> Dict:
    """
    Push a visualization to the Canvas.

    This lets the Agent proactively display analysis results on the Canvas.

    Args:
        action: Action type
        payload: Action data
            - add_layer: {"type": "segmentation|bbox|annotation", "data": ...}
            - highlight: {"region": {...}, "style": "pulse|glow|outline"}
            - animate: {"from": {...}, "to": {...}, "duration": 500}
    """
```

#### `request_user_input`

```python
@mcp.tool
def request_user_input(
    session_id: str,
    input_type: Literal["select_region", "confirm", "choose", "draw"],
    prompt: str,
    options: Optional[List[Dict]] = None
) -> Dict:
    """
    Request user input through the Canvas.

    Args:
        input_type: Input type
            - select_region: ask the user to box a region
            - confirm: yes/no confirmation
            - choose: pick one of several options
            - draw: free drawing
        prompt: Prompt shown to the user
        options: Options (used with choose)

    Returns:
        user_input: The user's input
    """
```

---

### 4.8 Visual RAG Tools (Hybrid Mode B)

Visual RAG uses a **hybrid mode**: fast DenseNet classification plus RAG reference retrieval, leaving the final judgment to the external Agent.

#### Architecture Overview

```
┌──────────────────────────────────────────────────────────────────────┐
│ Visual RAG Hybrid Mode (Mode B)                                      │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│     Input image                                                      │
│         │                                                            │
│         ├────────────────────┬────────────────────┐                  │
│         ▼                    ▼                    ▼                  │
│ ┌────────────────┐   ┌────────────────┐   ┌────────────────┐         │
│ │ DenseNet       │   │ RAD-DINO       │   │ PSPNet         │         │
│ │ Fast classify  │   │ Embedding      │   │ Organ segment. │         │
│ │ (18 classes)   │   │ (768-dim)      │   │ (14 organs)    │         │
│ └───────┬────────┘   └───────┬────────┘   └───────┬────────┘         │
│         │                    │                    │                  │
│         │            ┌───────▼────────┐           │                  │
│         │            │ FAISS          │           │                  │
│         │            │ Vector search  │           │                  │
│         │            └───────┬────────┘           │                  │
│         │                    │                    │                  │
│         ▼                    ▼                    ▼                  │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │ Aggregated result returned to the Agent                        │  │
│  │ {                                                              │  │
│  │   "quick_classification": {...},  // DenseNet result           │  │
│  │   "similar_cases": [...],         // RAG cases + reports       │  │
│  │   "segmentation": {...},          // PSPNet organ segmentation │  │
│  │   "aggregated_labels": [...]      // weighted-vote labels      │  │
│  │ }                                                              │  │
│  └───────────────────────────────┬────────────────────────────────┘  │
│                                  │                                   │
│                                  ▼                                   │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │ External Agent (Claude/GPT) synthesizes the final report       │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
```

#### `search_similar_cases`

```python
@mcp.tool
def search_similar_cases(
    session_id: str,
    image_id: Optional[str] = None,
    top_k: int = 5,
    include_reports: bool = True,
    include_embeddings: bool = False
) -> Dict:
    """
    Search for similar historical cases with Visual RAG.

    Stack:
        - RAD-DINO: image encoding (microsoft/rad-dino, 350MB)
        - FAISS: vector search (CPU/GPU)
        - Reference DB: MIMIC-CXR embeddings + reports

    Args:
        session_id: Session ID
        image_id: Image ID (None = current image)
        top_k: Number of similar cases to return
        include_reports: Whether to include the full reference report text
        include_embeddings: Whether to return the embedding vectors

    Returns:
        similar_cases: List of similar cases
            - case_id: case ID
            - similarity: similarity (0-1)
            - report: report text (if include_reports=True)
            - findings: main findings
            - labels: list of labels
            - patient_info: de-identified patient information
        aggregated_labels: Suggested labels after weighted voting
            - label: label name
            - confidence: confidence (weighted by similarity)
            - supporting_cases: number of cases supporting this label
        search_metadata:
            - index_size: size of the vector index
            - search_time_ms: search time
    """
```

#### `analyze_with_rag`

```python
@mcp.tool
def analyze_with_rag(
    session_id: str,
    image_id: Optional[str] = None,
    mode: Literal["quick", "full", "rag_only"] = "full",
    top_k: int = 5
) -> Dict:
    """
    Hybrid-mode analysis: DenseNet classification + RAG retrieval.

    Modes:
        - quick: DenseNet classification only (fastest)
        - full: DenseNet + RAG + PSPNet (complete)
        - rag_only: RAG retrieval only (not limited by a classifier)

    Returns:
        quick_classification: DenseNet classification result (18 classes)
            - predictions: [{"label": str, "probability": float}, ...]
- top_finding: 最高機率發現 similar_cases: RAG 相似案例 (同 search_similar_cases) aggregated_labels: 綜合建議標籤 segmentation: 器官分割結果 (如 mode=full) suggested_prompt: 建議給 LLM 的報告生成 prompt """ ``` #### `build_rag_index` ```python @mcp.tool def build_rag_index( source: Literal["mimic-cxr", "chexpert", "custom"], data_path: Optional[str] = None, index_path: str = "./rag_index", batch_size: int = 32 ) -> Dict: """ 建立/更新 RAG 向量索引。 Args: source: 資料來源 - mimic-cxr: MIMIC-CXR 資料集 (需 PhysioNet 授權) - chexpert: CheXpert 資料集 - custom: 自定義資料 (需提供 data_path) data_path: 自定義資料路徑 (CSV with image_path, report columns) index_path: 索引儲存路徑 batch_size: 編碼批次大小 Returns: status: 建立狀態 index_size: 索引大小 build_time_s: 建立耗時 """ ``` #### 模型清單 (Visual RAG) | 模型 | HuggingFace ID | 大小 | 用途 | |:-----|:---------------|:-----|:-----| | RAD-DINO | `microsoft/rad-dino` | ~350MB | 影像編碼 | | DenseNet-121 | torchxrayvision 內建 | ~30MB | 快速分類 | | PSPNet | torchxrayvision 內建 | ~50MB | 器官分割 | | FAISS | faiss-cpu/faiss-gpu | - | 向量檢索 | #### 優勢 | 面向 | 傳統模型 | Visual RAG 混合 | |:-----|:---------|:----------------| | **類別數** | 固定 18 類 | **無限** (依參考庫) | | **可解釋** | ❌ 黑盒 | ✅ 有參考來源 | | **報告品質** | 模型決定 | **真實報告參考** | | **擴展性** | 換模型 | 加參考資料 | | **下載量** | ~15GB | **~400MB** | --- ## 5. 
Canvas 繪畫工作區規格 ### 5.1 設計理念 Canvas 繪畫工作區是用戶與 Agent 互動的核心介面。所有互動都透過 MCP Protocol 進行，實現： - **用戶 → Agent**：用戶在 Canvas 繪製區域、點擊、標註 → 透過 MCP 傳給 Agent - **Agent → Canvas**：Agent 分析結果、分割遮罩、標註 → 透過 MCP 推送到 Canvas ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ Canvas 繪畫工作區 │ ├─────────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌───────────────────────────────────────────────────────────────────────┐ │ │ │ Image + Annotation Layers │ │ │ │ │ │ │ │ ┌─────────────────────────────────┐ │ │ │ │ │ Layer 0: Original Image │ │ │ │ │ │ Layer 1: AI Segmentation Masks │ │ │ │ │ │ Layer 2: AI Bounding Boxes │ │ │ │ │ │ Layer 3: User Drawings │ │ │ │ │ │ Layer 4: Measurements │ │ │ │ │ │ Layer 5: Annotations/Labels │ │ │ │ │ └─────────────────────────────────┘ │ │ │ │ │ │ │ │ [Pan] [Zoom+] [Zoom-] [Fit] │ [Select] [BBox] [Polygon] [Point] │ │ │ │ [Freehand] [Measure] [Text] │ [Undo] [Redo] [Clear] [Layers] │ │ │ │ │ │ │ └───────────────────────────────────────────────────────────────────────┘ │ │ │ │ ┌─────────────────────────────────────┐ ┌─────────────────────────────────┐│ │ │ Agent 對話區 │ │ 分析結果面板 ││ │ │ ───────────────────────────── │ │ ───────────────────────── ││ │ │ 🤖 我分析了這張 CXR，發現... │ │ Classification: ││ │ │ [顯示區域] [詳細說明] │ │ ├─ Cardiomegaly: 0.85 ││ │ │ │ │ └─ Effusion: 0.72 ││ │ │ 👤 這個區域看起來不太對... │ │ ││ │ │ [用戶標記的區域] │ │ Detections: 3 findings ││ │ │ │ │ Segments: 5 organs ││ │ │ 🤖 讓我仔細看看這個區域... │ │ ││ │ │ │ │ [Full Report] [Export] ││ │ │ [輸入訊息... 
] [送出] │ │ ││ │ └─────────────────────────────────────┘ └─────────────────────────────────┘│ │ │ │ ┌───────────────────────────────────────────────────────────────────────┐ │ │ │ Image List: [CXR_001 ✓] [KUB_001] [CT_slice_42] │ [+ Upload] │ │ │ └───────────────────────────────────────────────────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────────────┘ ``` ### 5.2 UI 技術堆疊 ``` React + TypeScript ├── Fabric.js # Canvas 繪圖/標註 ├── MCP Client # 與 MedVision MCP Server 通訊 ├── React Query # 資料同步 ├── Zustand # 狀態管理 ├── Tailwind CSS # 樣式 └── SSE / WebSocket # 即時更新 (MCP Resources 訂閱) ``` ### 5.3 MCP 通訊流程 ``` ┌────────────────┐ MCP Protocol ┌────────────────┐ │ Canvas UI │ ◄───────────────────────────────▶│ MedVision MCP Server │ └───────┬────────┘ └───────┬────────┘ │ │ │ 1. User draws region on canvas │ │ ─────────────────────────────────────────────────▶│ │ call_tool("analyze_selected_region", {...}) │ │ │ │ 2. Agent processes with internal tools │ │ │ │ │ ▼ │ │ 3. Server pushes result via MCP Resources │ │ ◀───────────────────────────────────────────────── │ resource_update("session/{id}/annotations") │ │ │ │ 4. Canvas receives and renders mask │ │ │ ▼ ▼ ``` ### 5.4 UI 事件 | 事件 | 觸發條件 | MCP Tool 調用 | |:-----|:---------|:-------------| | `image_uploaded` | 用戶上傳影像 | `add_image_to_session` | | `region_selected` | 用戶在 Canvas 繪製區域 | `analyze_selected_region` | | `annotation_added` | 用戶手動加入標註 | `add_annotation` | | `question_asked` | 用戶輸入問題 | `invoke_medical_agent` 或 `ask_about_image` | | `export_requested` | 用戶點擊匯出 | `export_study` | ### 5.5 MCP Resources 訂閱 (Server → UI) | Resource | 更新時機 | UI 動作 | |:---------|:---------|:--------| | `session/{id}/image` | 當前影像變更 | 重新載入影像 | | `session/{id}/annotations` | 標註新增/更新 | 更新標註列表 + Canvas | | `session/{id}/segmentation` | 分割完成 | 疊加分割遮罩 | | `session/{id}/analysis` | 分析進度更新 | 更新分析面板 | | `session/{id}/agent_message` | Agent 回覆 | 顯示在對話區 | --- ## 6. 
業務流程 ### 6.1 標準流程 ``` ┌─────────────────────────────────────────────────────────────────┐ │ Phase 1: 初始分析 │ └─────────────────────────────────────────────────────────────────┘ User: 上傳 CXR ↓ Agent: create_study_session() add_image_to_session(auto_analyze=True, preset="full") ↓ System: 執行 classify → detect → segment → report 每步驟推送到 UI ↓ Agent: "分析完成！發現以下異常..." UI: 顯示標註後的影像 + 報告 ┌─────────────────────────────────────────────────────────────────┐ │ Phase 2: 互動討論 │ └─────────────────────────────────────────────────────────────────┘ User: 在 Canvas 上框選區域 "這個區域是什麼？" ↓ UI: 發送 region_selected 事件 ↓ Agent: analyze_selected_region( region={"type": "bbox", ...}, question="這個區域是什麼？", actions=["describe", "segment"] ) ↓ System: VQA 描述 + SAM 分割推送到 UI ↓ Agent: "這個區域顯示..." UI: 疊加分割遮罩 ┌─────────────────────────────────────────────────────────────────┐ │ Phase 3: 加入標註 │ └─────────────────────────────────────────────────────────────────┘ User: 繪製 Polygon 並標記為 "疑似病灶" ↓ UI: 發送 annotation_added 事件 ↓ Agent: add_annotation( annotation_type="polygon", coordinates=[...], label="疑似病灶" ) ↓ UI: 更新標註列表 ┌─────────────────────────────────────────────────────────────────┐ │ Phase 4: 切換影像 │ └─────────────────────────────────────────────────────────────────┘ User: 上傳 KUB "這張也幫我分析" ↓ Agent: add_image_to_session( image_path="kub.png", auto_analyze=True ) ↓ System: 分析 KUB UI: 加入影像列表，切換顯示 ┌─────────────────────────────────────────────────────────────────┐ │ Phase 5: 匯出 │ └─────────────────────────────────────────────────────────────────┘ User: "匯出完整報告" ↓ Agent: export_study( format="pdf", include_report=True, include_annotations=True ) ↓ System: 生成 PDF Agent: "報告已匯出: [下載連結]" ``` ### 6.2 Headless 模式流程 ``` # 不需要 UI，純 CLI/Agent 使用 Agent: create_study_session() Agent: add_image_to_session("cxr.png", auto_analyze=True) → 取得分析結果 JSON Agent: analyze_selected_region(region={...}) → 取得區域分析結果 Agent: export_study(format="json") → 取得匯出檔案路徑 ``` --- ## 7. 
AI 模型整合 ### 7.1 模型清單總覽 | 類別 | 模型數量 | 主要模態 | |:-----|:---------|:---------| | Radiology VLM | 6 | CXR, Multi | | Medical Image Encoders | 4 | CXR, Multi | | CT/MRI Segmentation | 1 | CT, MRI | | Pathology Foundation | 3 | WSI, H&E | | Interactive Segmentation | 2 | All | | Legacy (v1) | 4 | CXR | > **📊 資料抓取日期**：下列模型下載量統計來自 HuggingFace，抓取日期為 **2026-02**。實際數據可能有變動。 ### 7.2 Radiology VLM (視覺語言模型) | 模型 | 來源 | 參數量 | 功能 | 下載量 | HuggingFace | |:-----|:-----|:-------|:-----|:-------|:------------| | **CheXagent-2-3b** | StanfordAIMI | 3B | VQA, Report Gen, Structured Findings | 1.99k/月 | `StanfordAIMI/CheXagent-2-3b` | | **CheXOne** | StanfordAIMI | 4B | Image-Text-to-Text (最新) | - | `StanfordAIMI/CheXOne` | | **CheXagent-8b** | StanfordAIMI | 8B | VQA, Report Gen | 551/月 | `StanfordAIMI/CheXagent-8b` | | **MAIRA-2** | Microsoft | 7B | Grounded Report Gen, Phrase Grounding | 3.5k/月 | `microsoft/maira-2` | | **LLaVA-Med-v1.5** | Microsoft | 8B | 通用醫療 VQA | 10.2k/月 | `microsoft/llava-med-v1.5-mistral-7b` | | **RadFM** | chaoyi-wu | - | Multi-modal Radiology FM | - | `chaoyi-wu/RadFM` | ``` 功能對照： ┌──────────────────────┬─────────────┬──────────────┬──────────────┐ │ 模型 │ VQA │ Report Gen │ Grounding │ ├──────────────────────┼─────────────┼──────────────┼──────────────┤ │ CheXagent-2-3b │ ✓ │ ✓ │ ✗ │ │ MAIRA-2 │ ✓ │ ✓ (grounded) │ ✓ (bbox) │ │ LLaVA-Med │ ✓ │ ✓ │ ✗ │ └──────────────────────┴─────────────┴──────────────┴──────────────┘ ``` ### 7.3 Medical Image Encoders (影像編碼器) | 模型 | 來源 | 用途 | 訓練資料 | 下載量 | HuggingFace | |:-----|:-----|:-----|:---------|:-------|:------------| | **BiomedCLIP** | Microsoft | Zero-shot Classification | PMC-15M (15M image-caption pairs) | 713k/月 | `microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224` | | **RAD-DINO** | Microsoft | CXR Feature Extraction | 882k CXR images | 60k/月 | `microsoft/rad-dino` | | **MedCLIP** | Community | Medical Image Classification | ROCO Dataset | 43/月 | `kaushalya/medclip` | | **RadFM Encoder** | chaoyi-wu | 
Multi-modal Encoding | Multi-modality | - | `chaoyi-wu/RadFM` | ``` BiomedCLIP 支援類別範例： ├── adenocarcinoma histopathology ├── brain MRI ├── chest X-ray ├── bone X-ray ├── squamous cell carcinoma histopathology ├── immunohistochemistry histopathology └── hematoxylin and eosin histopathology ``` ### 7.4 CT/MRI Segmentation (全身分割) | 模型 | 來源 | 類別數 | 支援模態 | GitHub Stars | 連結 | |:-----|:-----|:-------|:---------|:-------------|:-----| | **TotalSegmentator** | wasserth | 117+ CT, 50+ MRI | CT, MRI | 2.4k | [GitHub](https://github.com/wasserth/TotalSegmentator) | ``` TotalSegmentator 主要任務： ├── total: 117 主類別 (CT) ├── total_mr: 50 主類別 (MRI) ├── lung_vessels: 肺血管、氣管 ├── body: 身體、軀幹、四肢、皮膚 ├── vertebrae_mr: 脊椎 (MRI) ├── cerebral_bleed: 腦出血 ├── pleural_pericard_effusion: 胸膜/心包積液 ├── head_glands_cavities: 頭部腺體/腔室 ├── liver_vessels: 肝血管、肝腫瘤 ├── lung_nodules: 肺結節 ├── brain_structures: 腦結構 └── coronary_arteries: 冠狀動脈 ``` ### 7.5 Pathology Foundation Models (病理基礎模型) | 模型 | 來源 | 參數量 | 訓練資料 | 下載量 | License | HuggingFace | |:-----|:-----|:-------|:---------|:-------|:--------|:------------| | **Prov-GigaPath** | Microsoft | - | Real-world WSI | 328k/月 | Apache-2.0 | `prov-gigapath/prov-gigapath` | | **UNI2-h** | MahmoodLab | 681M | 200M+ tiles, 350k slides | 44k/月 | CC-BY-NC-ND-4.0 | `MahmoodLab/UNI2-h` | | **CONCH** | MahmoodLab | 200M (ViT-B/16 + Text) | 1.17M image-caption pairs | 17k/月 | CC-BY-NC-ND-4.0 | `MahmoodLab/CONCH` | ``` 病理模型功能對照： ┌──────────────────────┬─────────────┬──────────────┬──────────────┬───────────┐ │ 模型 │ Tile Enc │ Slide Enc │ Text Enc │ Zero-shot │ ├──────────────────────┼─────────────┼──────────────┼──────────────┼───────────┤ │ Prov-GigaPath │ ✓ │ ✓ │ ✗ │ ✗ │ │ UNI2-h │ ✓ │ (via MIL) │ ✗ │ ✗ │ │ CONCH │ ✓ │ (via MIL) │ ✓ │ ✓ │ └──────────────────────┴─────────────┴──────────────┴──────────────┴───────────┘ ``` ### 7.6 Interactive Segmentation (互動分割) | 模型 | 來源 | 訓練資料 | 特色 | 連結 | |:-----|:-----|:---------|:-----|:-----| | **Medical-SAM3** | ChongCong | 33 
datasets, 10 modalities | Medical-specific fine-tune | [GitHub](https://github.com/AIM-Research-Lab/Medical-SAM3) |
| **SAM 3** | Meta | 11M images, 1.1B masks | General-purpose version, text prompts | [GitHub](https://github.com/facebookresearch/sam3) |

### 7.7 Legacy Models (inherited from v1)

| Model | Purpose | Supported image types | Inference backend |
|:-----|:-----|:-------------|:---------|
| **DenseNet (torchxrayvision)** | Classifies 18 pathologies | CXR | PyTorch |
| **PSPNet (torchxrayvision)** | Organ segmentation | CXR | PyTorch |
| **ViT-BERT** | Report generation | CXR | PyTorch |
| **Roentgen** | Image generation | CXR | PyTorch |

> **Note**: MedSAM has been replaced by SAM 3 / Medical-SAM3, which provide better segmentation accuracy and performance.

### 7.8 Model Routing

```python
MODEL_ROUTING = {
    "CXR": {
        "classify": ["densenet", "biomedclip"],
        "encode": "rad-dino",
        "segment": "pspnet",
        "vqa": "chexagent-2",
        "ground": "maira2",
        "report": ["chexagent-2", "vit-bert"],
        "interactive_segment": "medical-sam3",
    },
    "KUB": {
        "classify": "biomedclip",
        "vqa": "llava-med",
        "interactive_segment": "medical-sam3",
    },
    "CT": {
        "segment_anatomy": "totalsegmentator",
        "vqa": "llava-med",
        "interactive_segment": "medical-sam3",
    },
    "MRI": {
        "segment_anatomy": "totalsegmentator",  # total_mr task
        "vqa": "llava-med",
        "interactive_segment": "medical-sam3",
    },
    "Pathology": {
        "encode_tile": ["prov-gigapath", "uni2-h", "conch"],
        "encode_slide": "prov-gigapath",
        "zero_shot": "conch",
        "interactive_segment": "sam3",
    },
    "EKG": {
        "vqa": "llava-med",
    },
    "Other": {
        "classify": "biomedclip",
        "vqa": "llava-med",
        "interactive_segment": "sam3",
    },
}
```

### 7.9 Model Loading Strategy

```python
from typing import Any, Dict, List, Optional

import ollama
import torch
from vllm import LLM


class ModelRegistry:
    """Unified model management for the vLLM, Ollama, and PyTorch backends."""

    def __init__(self, model_dir: str, device: str = "cuda"):
        self.model_dir = model_dir
        self.device = device
        self._pytorch_models: Dict[str, Any] = {}
        self._vllm_models: Dict[str, LLM] = {}
        self._ollama_client = ollama.Client()

    def get_pytorch(self, model_name: str) -> Any:
        """PyTorch models (classification, segmentation, SAM), loaded on first use."""
        if model_name not in self._pytorch_models:
            # _load_pytorch is the per-model loader; it is not defined in this spec.
            self._pytorch_models[model_name] = self._load_pytorch(model_name)
        return self._pytorch_models[model_name]

    def get_vllm(self, model_name: str) -> LLM:
        """vLLM models (efficient VQA/LLM inference), loaded on first use."""
        if model_name not in self._vllm_models:
            # _get_model_path resolves a model name to a path; not defined in this spec.
            self._vllm_models[model_name] = LLM(
                model=self._get_model_path(model_name),
                tensor_parallel_size=1,
                gpu_memory_utilization=0.8,
                quantization="awq",  # or "gptq"
            )
        return self._vllm_models[model_name]

    def call_ollama(
        self, model_name: str, prompt: str, images: Optional[List[str]] = None
    ) -> str:
        """Ollama models (general-purpose / fallback)."""
        response = self._ollama_client.chat(
            model=model_name,
            messages=[{
                "role": "user",
                "content": prompt,
                "images": images or [],
            }],
        )
        return response["message"]["content"]

    def unload(self, model_name: str) -> None:
        """Drop cached references to a model and release cached GPU memory."""
        self._pytorch_models.pop(model_name, None)
        self._vllm_models.pop(model_name, None)
        torch.cuda.empty_cache()
```

### 7.10 vLLM Configuration

```yaml
# vllm_config.yaml
models:
  chexagent-2:
    model_path: "StanfordAIMI/CheXagent-2-3b"
    tensor_parallel_size: 1
    gpu_memory_utilization: 0.4
    max_model_len: 4096
    quantization: "awq"
  llava-med:
    model_path: "microsoft/llava-med-v1.5-mistral-7b"
    tensor_parallel_size: 1
    gpu_memory_utilization: 0.4
    max_model_len: 4096
    quantization: "gptq"
  maira2:
    model_path: "microsoft/maira-2"
    tensor_parallel_size: 1
    gpu_memory_utilization: 0.3
    trust_remote_code: true
  chexone:
    model_path: "StanfordAIMI/CheXOne"
    tensor_parallel_size: 1
    gpu_memory_utilization: 0.5
    max_model_len: 4096
```

### 7.11 Ollama Configuration

```yaml
# ollama_config.yaml
models:
  # Preloaded models
  preload:
    - llava:13b
    - llama3:8b
  # Model aliases
  aliases:
    vision: llava:13b
    text: llama3:8b
    medical: llava-med:latest  # custom model
```

---

## 8. 
Technical Specifications

### 8.1 Dependencies

```toml
[project]
requires-python = ">=3.11"
dependencies = [
    # MCP
    "mcp>=1.0.0",
    "fastmcp>=0.1.0",
    # AI Inference
    "torch>=2.2.0",
    "transformers>=4.40.0",
    "vllm>=0.4.0",
    "ollama>=0.2.0",
    # Vision Models
    "torchxrayvision>=1.0.0",
    "segment-anything-3>=1.0.0",  # SAM 3 / Medical-SAM3
    "timm>=0.9.8",                # UNI2-h, Prov-GigaPath
    "open_clip_torch>=2.23.0",    # BiomedCLIP, CONCH
    # TotalSegmentator (CT/MRI)
    "TotalSegmentator>=2.0.0",
    # Image Processing
    "pillow>=10.0.0",
    "opencv-python>=4.8.0",
    "scikit-image>=0.21.0",
    "nibabel>=5.0.0",             # NIfTI for CT/MRI
    # Database
    "sqlalchemy>=2.0.0",
    "aiosqlite>=0.19.0",
    # Web/UI
    "fastapi>=0.110.0",
    "websockets>=12.0",
    "uvicorn>=0.27.0",
    # Utilities
    "pydantic>=2.0.0",
    "python-dotenv>=1.0.0",
    "huggingface-hub>=0.20.0",    # Model downloads
]

[project.optional-dependencies]
ui = [
    "gradio>=5.0.0",  # fallback UI
]
vscode = [
    # The VS Code Extension lives in a separate TypeScript project
]
pathology = [
    "openslide-python>=1.3.0",  # WSI reading
    "cucim>=24.0.0",            # GPU-accelerated WSI
]
```

### 8.2 Directory Structure

```
medvision-mcp/
├── mcp_server/
│   ├── __init__.py
│   ├── server.py              # MCP Server entry point
│   ├── tools/
│   │   ├── __init__.py
│   │   ├── session.py         # Session management tools
│   │   ├── analysis.py        # Analysis tools
│   │   ├── interactive.py     # Interactive tools
│   │   ├── query.py           # Query tools
│   │   └── export.py          # Export tools
│   └── schemas/
│       ├── __init__.py
│       └── models.py          # Pydantic models
│
├── core/
│   ├── __init__.py
│   ├── session.py             # Session management
│   ├── database.py            # SQLite connection
│   ├── models.py              # SQLAlchemy models
│   ├── registry.py            # Model registry (vLLM/Ollama/PyTorch)
│   └── router.py              # Model routing
│
├── inference/                 # Inference backends
│   ├── __init__.py
│   ├── vllm_backend.py        # vLLM inference
│   ├── ollama_backend.py      # Ollama inference
│   └── pytorch_backend.py     # Direct PyTorch inference
│
├── models/                    # AI model wrappers
│   ├── __init__.py
│   ├── classification.py      # DenseNet, BiomedCLIP
│   ├── segmentation.py        # PSPNet
│   ├── grounding.py           # MAIRA-2
│   ├── vqa.py                 # CheXagent, LLaVA-Med
│   ├── report.py              # Report Generation
│   ├── sam3.py                # SAM 3 / Medical-SAM3 interactive segmentation
│   ├── 
encoders.py # RAD-DINO, BiomedCLIP │ ├── totalsegmentator.py # CT/MRI 解剖分割 │ └── pathology/ # 病理模型 │ ├── __init__.py │ ├── gigapath.py # Prov-GigaPath │ ├── uni.py # UNI2-h │ └── conch.py # CONCH │ ├── api/ # REST API (給 UI 用) │ ├── __init__.py │ ├── server.py # FastAPI 入口 │ ├── routes/ │ │ ├── sessions.py │ │ ├── images.py │ │ └── websocket.py # WebSocket 事件 │ └── deps.py # 依賴注入 │ ├── ui/ # 備用 Gradio UI │ ├── __init__.py │ └── app.py │ ├── utils/ │ ├── __init__.py │ ├── image.py # 影像處理 │ ├── dicom.py # DICOM 處理 │ └── export.py # 匯出功能 │ ├── db/ │ └── medvision-mcp.db # SQLite 資料庫 │ └── legacy/ # 保留舊版相容 ├── agent.py └── tools.py # VS Code Extension (獨立專案) medvision-mcp-vscode/ ├── package.json ├── src/ │ ├── extension.ts # Extension 入口 │ ├── mcpClient.ts # MCP 客戶端 │ └── webview/ │ ├── App.tsx # React App │ ├── components/ │ │ ├── ImageViewer.tsx │ │ ├── Canvas.tsx # Fabric.js 畫板 │ │ ├── AnnotationList.tsx │ │ └── AnalysisPanel.tsx │ └── hooks/ │ └── useMCP.ts # MCP 調用 hook └── tsconfig.json ``` ### 8.3 配置 ```yaml # config.yaml server: host: "0.0.0.0" mcp_port: 8000 api_port: 8001 ui_port: 7860 database: url: "sqlite:///db/medvision-mcp.db" echo: false # SQL logging inference: # vLLM 配置 vllm: enabled: true gpu_memory_utilization: 0.8 tensor_parallel_size: 1 models: - chexagent - llava-med - maira2 # Ollama 配置 ollama: enabled: true host: "http://localhost:11434" models: - llava:13b - llama3:8b # PyTorch 直接載入 pytorch: device: "cuda" models: - densenet - pspnet - sam3 - roentgen models: dir: "/model-weights" cache_dir: "/model-cache" session: ttl: 86400 # 24 hours cleanup_interval: 3600 # 1 hour ui: enabled: true canvas: max_width: 2048 max_height: 2048 default_tools: ["pan", "zoom", "bbox", "polygon", "point"] export: dir: "exports" templates: - "default" - "clinical" - "research" ``` ### 8.4 SQLite Schema ```sql -- sessions 表 CREATE TABLE sessions ( id TEXT PRIMARY KEY, name TEXT, current_image_id TEXT, metadata JSON, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, 
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); -- images 表 CREATE TABLE images ( id TEXT PRIMARY KEY, session_id TEXT NOT NULL, path TEXT NOT NULL, type TEXT NOT NULL, -- CXR, KUB, EKG, CT, MRI, DICOM, Other modality TEXT, metadata JSON, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, FOREIGN KEY (session_id) REFERENCES sessions(id) ON DELETE CASCADE ); -- analyses 表 CREATE TABLE analyses ( id TEXT PRIMARY KEY, image_id TEXT NOT NULL, classification JSON, detections JSON, segmentation JSON, report TEXT, vqa_history JSON, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, FOREIGN KEY (image_id) REFERENCES images(id) ON DELETE CASCADE ); -- annotations 表 CREATE TABLE annotations ( id TEXT PRIMARY KEY, image_id TEXT NOT NULL, type TEXT NOT NULL, -- bbox, polygon, point, mask, measurement, text coordinates JSON NOT NULL, label TEXT, confidence REAL, source TEXT NOT NULL, -- ai, user, system notes TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, FOREIGN KEY (image_id) REFERENCES images(id) ON DELETE CASCADE ); -- interactions 表 CREATE TABLE interactions ( id TEXT PRIMARY KEY, session_id TEXT NOT NULL, type TEXT NOT NULL, -- upload, region_select, annotation, query, export data JSON, timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP, FOREIGN KEY (session_id) REFERENCES sessions(id) ON DELETE CASCADE ); -- 索引 CREATE INDEX idx_images_session ON images(session_id); CREATE INDEX idx_analyses_image ON analyses(image_id); CREATE INDEX idx_annotations_image ON annotations(image_id); CREATE INDEX idx_interactions_session ON interactions(session_id); ``` --- ## 9. 
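API Examples

### 9.1 MCP Call Example

```python
# Using an MCP client. This sketch assumes the FastMCP 2.x client; the official
# `mcp` SDK exposes ClientSession rather than a top-level Client class.
import asyncio

from fastmcp import Client


async def main():
    # Adjust the URL if the server does not serve its MCP endpoint at the root path.
    async with Client("http://localhost:8000") as client:
        # Create a session
        result = await client.call_tool("create_study_session", {})
        session_id = result.data["session_id"]  # .data holds the tool's structured result

        # Add an image and analyze it
        result = await client.call_tool("add_image_to_session", {
            "session_id": session_id,
            "image_path": "/path/to/cxr.png",
            "auto_analyze": True,
            "analysis_config": {"preset": "full"},
        })

        # Analyze a selected region
        result = await client.call_tool("analyze_selected_region", {
            "session_id": session_id,
            "region": {"type": "bbox", "coordinates": [100, 200, 200, 300]},
            "question": "What is this?",
            "actions": ["describe", "segment"],
        })

        # Export
        result = await client.call_tool("export_study", {
            "session_id": session_id,
            "format": "pdf",
        })


asyncio.run(main())
```

### 9.2 Using Claude Desktop

Add the server to `claude_desktop_config.json`. The module path uses an underscore because Python module names cannot contain hyphens:

```json
{
  "mcpServers": {
    "medvision-mcp": {
      "command": "python",
      "args": ["-m", "medvision_mcp.mcp_server"],
      "env": {
        "MODEL_DIR": "/model-weights"
      }
    }
  }
}
```

---

## 10. Development Phases (ROADMAP)

> **🔄 Strategy change (2026-02)**: The original plan relied on the CheXagent VLM. Because of download speed problems (~315 hours), the project switched to the **Visual RAG Mode B** strategy: lightweight models plus retrieval.

### 10.1 Current Status (2026-02-02)

| Module | Status | Notes |
|:-----|:-----|:-----|
| **RAD-DINO** | ✅ Ready | 346MB, 768-dim embedding, ~2s/image |
| **FAISS** | ✅ Ready | L2 vector search, <1ms |
| **DenseNet-121** | ✅ Ready | 18-class CXR classification |
| **PSPNet** | ✅ Ready | 14-organ segmentation |
| **Ollama + LLaVA** | ✅ Ready | Text mode works; vision mode still needs a fix |
| **DICOM Processor** | ✅ Ready | pydicom + Window/Level |
| **Report Generator** | ✅ Ready | ViT-BERT weights downloaded |
| **MCP Server** | ❌ Not implemented | FastMCP scaffold still to be built |
| **Canvas UI** | ❌ Not implemented | React + Fabric.js |

### 10.2 New MVP Definition (Visual RAG Mode B)

> **MVP goal**: a runnable MCP Server plus the core Visual RAG flow. In the list below, ✅ marks features in MVP scope and ❌ marks deferred features.

```
MVP core features (Phase 1):
✅ MCP Server startup (FastMCP + stdio)
✅ Visual RAG Pipeline:
   ├─ RAD-DINO image encoding
   ├─ FAISS similar-case retrieval
   └─ DenseNet fast classification
✅ Core Tools:
   ├─ analyze_image (classification)
   ├─ search_similar_cases (RAG)
   └─ analyze_with_rag (hybrid)
✅ SQLite session management

Deferred from MVP:
❌ Canvas UI (Phase 2)
❌ Interactive segmentation with SAM3 (Phase 3)
❌ VS Code Extension 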
(Phase 4) ❌ 內建 Agent A2A (Phase 3) ``` ### 10.3 Phase 定義 #### Phase 1: Visual RAG Core (Week 1-2) ``` 優先任務： ├── [x] RAD-DINO + FAISS 驗證 (已完成) ├── [x] DenseNet + PSPNet 驗證 (已完成) ├── [ ] FastMCP Server 框架 │ ├── 專案結構 (src/medvision_mcp/) │ ├── MCP stdio transport │ └── Tool 註冊機制 ├── [ ] 核心 Tools 實作 │ ├── create_study_session │ ├── add_image_to_session │ ├── analyze_image (DenseNet) │ ├── search_similar_cases (RAD-DINO + FAISS) │ └── analyze_with_rag (混合) ├── [ ] Reference Database │ ├── SQLite schema │ ├── EURORAD 案例匯入 │ └── Embedding 預計算 └── [ ] 基本測試 + Claude Desktop 整合 ``` #### Phase 2: Canvas UI (Week 3-4) ``` 任務： ├── [ ] React + Vite 專案 ├── [ ] Fabric.js Canvas 組件 ├── [ ] MCP Client (stdio) ├── [ ] 基本繪圖工具 (bbox, polygon) ├── [ ] analyze_selected_region Tool └── [ ] 區域 → RAG 查詢完整流程 ``` #### Phase 3: Agent + SAM3 (Week 5-6) ``` 任務： ├── [ ] Medical-SAM3 整合 ├── [ ] invoke_medical_agent Tool ├── [ ] LangGraph Agent 架構 ├── [ ] push_to_canvas 雙向互動 └── [ ] 多輪對話支援 ``` #### Phase 4: VS Code Extension (Week 7-8) ``` 任務： ├── [ ] Extension 專案結構 ├── [ ] WebView 整合 Canvas ├── [ ] 命令註冊 └── [ ] 發布準備 ``` #### Phase 5: Polish (Week 9-10) ``` 任務： ├── [ ] Export 功能 (PDF, JSON) ├── [ ] 效能優化 ├── [ ] 文檔完善 └── [ ] 公開發布 ``` ### 10.4 技術依賴圖 ``` ┌─────────────────────────────────────────────────────┐ │ Phase 1 (Core) │ │ │ │ ┌─────────┐ ┌─────────┐ ┌─────────────────┐ │ │ │RAD-DINO │ │ FAISS │ │ DenseNet/PSPNet │ │ │ │ (✅) │ │ (✅) │ │ (✅) │ │ │ └────┬────┘ └────┬────┘ └────────┬────────┘ │ │ │ │ │ │ │ └──────┬──────┴──────────┬───────┘ │ │ │ │ │ │ ┌─────▼─────┐ ┌─────▼─────┐ │ │ │ MCP Tools │ │ SQLite │ │ │ │ (待實作) │ │ (待實作) │ │ │ └─────┬─────┘ └─────┬─────┘ │ │ │ │ │ │ ┌─────▼─────────────────▼─────┐ │ │ │ FastMCP Server │ │ │ │ (待實作) │ │ │ └─────────────┬───────────────┘ │ └───────────────────────┼─────────────────────────────┘ │ stdio ┌───────────────▼───────────────┐ │ Claude Desktop │ └───────────────────────────────┘ ``` --- ## 11. 
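Decisions Made

| Item | Decision | Notes |
|:-----|:-----|:-----|
| **Overall architecture** | MCP Server + Multi-Model Tools + built-in Agent | A2A-like design |
| **Core strategy** | **Visual RAG Mode B** | RAD-DINO + FAISS + DenseNet (decided 2026-02) |
| **Agent design** | Built-in MedVision MCP Medical Agent | Uses MCP Tools; supports delegation from external Agents |
| **Canvas UI** | React + Fabric.js drawing workspace | Interacts with the Agent over MCP Protocol |
| Session persistence | **SQLite** | Lightweight, no external dependencies, persistent |
| GPU inference | **Ollama (primary) + direct PyTorch loading** | vLLM as an alternative (when large models are needed) |
| Interactive segmentation | **Medical-SAM3** | Medical-specific fine-tune |
| UI framework | **React + Fabric.js** | Compatible with VS Code WebView |
| MCP Transport | **stdio first** | HTTP/SSE can be added later |
| MVP models | **DenseNet + RAG + Ollama** | Avoids the CheXagent download bottleneck |
| Image encoder | **RAD-DINO** | 768-dim, 346MB, verified |
| Vector search | **FAISS (CPU)** | L2 distance, <1ms |

## 12. Open Questions

### Post-MVP

1. **Built-in Agent implementation**:
   - LangGraph ReAct, or a simple Chain?
   - Agent base model: Ollama LLaVA, or the Claude/GPT API?
2. **Canvas interaction details**:
   - Analyze multiple selected regions at once?
   - Support a continuous drawing mode?
3. **A2A protocol**:
   - Is a standardized Agent delegation format needed?
   - Is state sharing between Agents needed?
4. **Multi-user support**: Is user authentication needed? Token-based or session-based?
5. **Models for new image types**:
   - KUB: use LLaVA-Med, or is dedicated training needed?
   - EKG: is a dedicated ECG analysis model needed (e.g. ECG-FM)?
6. **DICOM SR standard**: Which templates should be supported? TID 1500 (Measurement Report)?
7. **VS Code Extension publishing**:
   - Public Marketplace or private?
   - Is signing required?
8. **Image storage**:
   - Local or cloud storage?
   - Are compression or thumbnails needed?
9. 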
**Ollama 視覺模式**： - llava:7b runner 錯誤待修復 - 是否改用 vLLM 或直接載入模型？ --- ## 附錄 A: 現有工具對照 | v1 工具 | v2 MCP 工具 | 推理後端 | 新增模型選項 | |:--------|:-----------|:---------|:-------------| | ChestXRayClassifierTool | analyze_image(classify=True) | PyTorch | BiomedCLIP, RAD-DINO | | ChestXRaySegmentationTool | analyze_image(segment=True) | PyTorch | TotalSegmentator (CT/MRI) | | XRayVQATool | ask_about_image | vLLM | CheXagent-2, CheXOne | | XRayPhraseGroundingTool | analyze_image(detect=True) | vLLM | MAIRA-2 | | ChestXRayReportGeneratorTool | analyze_image(generate_report=True) | PyTorch/vLLM | CheXagent-2 | | LlavaMedTool | ask_about_image | vLLM/Ollama | - | | ChestXRayGeneratorTool | generate_image (新增) | PyTorch | - | | DicomProcessorTool | add_image_to_session | - | - | | ImageVisualizerTool | get_visualization | - | - | | (新增) | analyze_selected_region | SAM 3 + vLLM | Medical-SAM3 | | (新增) | analyze_pathology | PyTorch | Prov-GigaPath, UNI2-h, CONCH | | (新增) | segment_anatomy | PyTorch | TotalSegmentator | | (新增 - Agent) | invoke_medical_agent | Agent + Tools | A2A 介面 | | (新增 - Canvas) | sync_canvas_state | - | UI 同步 | | (新增 - Canvas) | push_to_canvas | - | 推送視覺化 | | (新增 - Canvas) | request_user_input | - | 請求用戶互動 | --- ## 附錄 B: VS Code Extension 規格 ### B.1 Extension 架構 ``` medvision-mcp-vscode/ ├── package.json ├── src/ │ ├── extension.ts # Extension 入口 │ ├── mcpClient.ts # MCP 客戶端 │ ├── commands/ │ │ ├── openViewer.ts # 開啟影像檢視器 │ │ ├── analyzeImage.ts # 分析影像命令 │ │ └── exportReport.ts # 匯出報告 │ └── webview/ │ ├── App.tsx # React App │ ├── components/ │ │ ├── ImageViewer.tsx │ │ ├── Canvas.tsx # Fabric.js 畫板 │ │ ├── Toolbar.tsx # 工具列 │ │ ├── AnnotationList.tsx │ │ └── AnalysisPanel.tsx │ ├── hooks/ │ │ ├── useMCP.ts # MCP 調用 hook │ │ ├── useCanvas.ts # Canvas 操作 hook │ │ └── useSession.ts # Session 狀態 hook │ └── utils/ │ └── websocket.ts # WebSocket 連接 ├── webview-ui/ # 打包後的 WebView UI ├── tsconfig.json └── webpack.config.js ``` ### B.2 package.json 配置 ```json { "name": 
"medvision-mcp", "displayName": "MedVision MCP - Medical Image Analysis", "description": "AI-powered medical image analysis with interactive annotation", "version": "0.1.0", "publisher": "medvision-mcp", "engines": { "vscode": "^1.85.0" }, "categories": ["Machine Learning", "Visualization"], "activationEvents": [ "onCommand:medvision-mcp.openViewer", "onWebviewPanel:medvision-mcp.imageViewer" ], "main": "./out/extension.js", "contributes": { "commands": [ { "command": "medvision-mcp.openViewer", "title": "Open Medical Image Viewer", "category": "MedVision MCP" }, { "command": "medvision-mcp.analyzeImage", "title": "Analyze Medical Image", "category": "MedVision MCP" } ], "menus": { "explorer/context": [ { "when": "resourceExtname =~ /\\.(png|jpg|jpeg|dcm|dicom)$/i", "command": "medvision-mcp.openViewer", "group": "medvision-mcp" } ] }, "configuration": { "title": "MedVision MCP", "properties": { "medvision-mcp.serverUrl": { "type": "string", "default": "http://localhost:8000", "description": "MedVision MCP MCP Server URL" }, "medvision-mcp.autoAnalyze": { "type": "boolean", "default": true, "description": "Automatically analyze images on open" } } } } } ``` ### B.3 WebView 通訊協議 ```typescript // extension.ts → webview interface ExtensionToWebviewMessage { type: 'init' | 'update' | 'analysis_result' | 'error'; payload: any; } // webview → extension.ts interface WebviewToExtensionMessage { type: 'ready' | 'analyze' | 'region_select' | 'annotate' | 'export'; payload: any; } // 範例：區域選擇 const message: WebviewToExtensionMessage = { type: 'region_select', payload: { session_id: 'abc123', region: { type: 'bbox', coordinates: [100, 200, 300, 400] }, question: '這是什麼？', actions: ['describe', 'segment'] } }; ``` --- ## 附錄 C: SAM 3 整合 ### C.1 SAM 3 功能 | 功能 | 說明 | 用途 | |:-----|:-----|:-----| | Point Prompt | 點擊分割 | 快速選取病灶 | | Box Prompt | 框選分割 | 精確框選區域 | | Polygon Prompt | 多邊形分割 | 不規則區域 | | Mask Refinement | 遮罩精修 | 邊緣修正 | | Multi-Mask Output | 多遮罩輸出 | 不確定區域提供選項 | ### C.2 API 設計 
```python
from typing import Dict, List, Optional, Tuple

import numpy as np


class SAM3Segmentor:
    def __init__(self, model_path: str, device: str = "cuda"):
        # load_sam3 stands for the project's SAM 3 loader; it is not defined in this spec.
        self.model = load_sam3(model_path)
        self.device = device

    def segment_point(
        self,
        image: np.ndarray,
        points: List[Tuple[int, int]],
        labels: Optional[List[int]] = None,  # 1=foreground, 0=background
        multimask: bool = True,
    ) -> Dict:
        """Point-click segmentation."""
        return {
            "masks": [...],   # List of masks (if multimask)
            "scores": [...],  # Confidence scores
            "best_mask_idx": 0,
        }

    def segment_box(
        self,
        image: np.ndarray,
        box: Tuple[int, int, int, int],  # x1, y1, x2, y2
    ) -> Dict:
        """Box-prompt segmentation."""
        ...

    def segment_polygon(
        self,
        image: np.ndarray,
        points: List[Tuple[int, int]],
    ) -> Dict:
        """Polygon segmentation."""
        ...

    def refine_mask(
        self,
        image: np.ndarray,
        mask: np.ndarray,
        points: List[Tuple[int, int]],
        labels: List[int],
    ) -> Dict:
        """Mask refinement."""
        ...
```

---

## Appendix D: Error Codes

| Code | HTTP | Description |
|:-----|:-----|:-----|
| `SESSION_NOT_FOUND` | 404 | Session does not exist |
| `SESSION_EXPIRED` | 410 | Session has expired |
| `IMAGE_NOT_FOUND` | 404 | Image does not exist |
| `INVALID_REGION` | 400 | Invalid region coordinates |
| `INVALID_IMAGE_TYPE` | 400 | Unsupported image type |
| `MODEL_LOAD_ERROR` | 500 | Model failed to load |
| `INFERENCE_ERROR` | 500 | Inference failed |
| `VLLM_UNAVAILABLE` | 503 | vLLM service unavailable |
| `OLLAMA_UNAVAILABLE` | 503 | Ollama service unavailable |
| `GPU_OOM` | 507 | GPU out of memory |
| `EXPORT_FAILED` | 500 | Export failed |
| `DATABASE_ERROR` | 500 | Database error |

---

## Appendix E: Performance Benchmarks

### E.1 Target Latency

| Operation | Target latency | Notes |
|:-----|:---------|:-----|
| Image upload | < 500ms | Excluding analysis |
| Classification (DenseNet) | < 200ms | GPU |
| Detection (MAIRA-2) | < 1s | vLLM |
| Segmentation (PSPNet) | < 500ms | GPU |
| Interactive segmentation (SAM3) | < 300ms | GPU |
| VQA (CheXagent) | < 2s | vLLM |
| Report generation | < 2s | PyTorch |
| Full analysis (preset=full) | < 5s | Parallel |

### E.2 Memory Requirements

| Model | VRAM | Notes |
|:-----|:-----|:-----|
| DenseNet | ~500MB | |
| PSPNet | ~500MB | |
| SAM 3 (ViT-H) | ~2GB | |
| CheXagent-2-3B (AWQ) | ~2GB | Quantized |
| LLaVA-Med-7B (GPTQ) | ~4GB | Quantized |
| MAIRA-2 (8bit) | ~3GB | Quantized |
| **Total (all loaded concurrently)** | **~12GB** | RTX 4090 / A100 recommended |

---

## Appendix F: Security Considerations

### F.1 Data Security

| Item | Handling |
|:-----|:---------|
| Image storage 
| 本地儲存，不上傳雲端 | | Session 資料 | SQLite 加密 (optional) | | API 通訊 | HTTPS (生產環境) | | 日誌記錄 | 不記錄 PHI | ### F.2 存取控制 ```yaml # 建議的存取控制配置 security: # API Key 認證 (簡單場景) api_key: enabled: false keys: [] # JWT 認證 (多用戶場景) jwt: enabled: false secret: "" expiry: 3600 # CORS cors: allowed_origins: - "http://localhost:*" - "vscode-webview://*" ``` ### F.3 醫療合規注意事項 > ⚠️ **重要提醒** > > MedVision MCP 為研究/教育用途設計，**不是** FDA/CE 認證的醫療器材。 > > - 不應用於臨床診斷決策 > - 所有分析結果僅供參考 > - 部署方須自行確保 HIPAA/GDPR 合規 > - 建議在隔離網路環境中部署 --- ## 附錄 G: 完整模型參考 ### G.1 Radiology VLM 詳細資訊 #### CheXagent-2-3b (StanfordAIMI) ```yaml name: CheXagent-2-3b source: StanfordAIMI huggingface: StanfordAIMI/CheXagent-2-3b params: 3B license: Apache-2.0 capabilities: - Visual Question Answering - Report Generation - Structured Report Findings - Structured Report Impressions variants: - CheXagent-2-3b-srrg-findings # 專用 findings - CheXagent-2-3b-srrg-impression # 專用 impression stats: downloads_monthly: 1,990 likes: 7 updated: 2025-01 ``` #### MAIRA-2 (Microsoft) ```yaml name: MAIRA-2 source: Microsoft Research huggingface: microsoft/maira-2 params: 7B license: MIT capabilities: - Grounded Report Generation (with bounding boxes) - Findings Generation - Phrase Grounding components: image_encoder: RAD-DINO-MAIRA-2 language_model: vicuna-7b-v1.5 input: - Frontal CXR (required) - Lateral CXR (optional) - Prior Study (optional) - Clinical Indication (optional) output: - Findings text - Bounding box annotations (optional) stats: downloads_monthly: 3,566 likes: 68 spaces_using: 19 ``` #### LLaVA-Med-v1.5 (Microsoft) ```yaml name: LLaVA-Med v1.5 Mistral 7B source: Microsoft huggingface: microsoft/llava-med-v1.5-mistral-7b params: 8B license: Apache-2.0 capabilities: - General Medical VQA - Multi-turn Conversation - Multi-modality Support stats: downloads_monthly: 10,200 likes: 117 total_variants: 129 ``` ### G.2 Medical Image Encoders 詳細資訊 #### BiomedCLIP (Microsoft) ```yaml name: BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 source: 
Microsoft Health Futures huggingface: microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 license: MIT training_data: PMC-15M (15M figure-caption pairs from PubMed Central) capabilities: - Zero-shot Image Classification - Cross-modal Retrieval - Image-Text Matching architecture: text_encoder: PubMedBERT (context_length=256) image_encoder: ViT-B/16 supported_modalities: - Chest X-ray - Brain MRI - Bone X-ray - Histopathology (adenocarcinoma, SCC, H&E, IHC) - Medical Charts stats: downloads_monthly: 713,607 likes: 382 spaces_using: 47 paper: "A Multimodal Biomedical Foundation Model Trained from Fifteen Million Image–Text Pairs" (NEJM AI 2024) ``` #### RAD-DINO (Microsoft) ```yaml name: RAD-DINO source: Microsoft Health Futures huggingface: microsoft/rad-dino license: MIT params: 86.6M training_data: total_images: 882,775 sources: - MIMIC-CXR: 368,960 - CheXpert: 223,648 - NIH-CXR: 112,120 - PadChest: 136,787 - BRAX: 41,260 architecture: base_model: dinov2-base (facebook/dinov2-base) method: DINOv2 self-supervised learning capabilities: - CXR Feature Extraction - Image Classification (with linear probe) - Image Segmentation (with decoder) - Image Retrieval - Foundation for Report Generation stats: downloads_monthly: 60,345 likes: 69 paper: "Exploring Scalable Medical Image Encoders Beyond Text Supervision" (Nature Machine Intelligence 2025) ``` ### G.3 CT/MRI Segmentation 詳細資訊 #### TotalSegmentator ```yaml name: TotalSegmentator source: University Hospital Basel github: wasserth/TotalSegmentator license: Apache-2.0 version: 2.5.0 classes: ct_total: 117 anatomical structures mr_total: 50 anatomical structures tasks: open_source: - total (117 classes CT) - total_mr (50 classes MR) - lung_vessels - body / body_mr - vertebrae_mr - cerebral_bleed - pleural_pericard_effusion - head_glands_cavities - liver_vessels - lung_nodules - kidney_cysts - liver_segments / liver_segments_mr - trunk_cavities licensed: - heartchambers_highres - appendicular_bones - tissue_types - 
brain_structures - coronary_arteries installation: pip install TotalSegmentator usage: TotalSegmentator -i ct.nii.gz -o segmentations requirements: - Python >= 3.9 - PyTorch >= 2.0.0 - GPU recommended (CPU supported with --fast) stats: github_stars: 2,400 forks: 395 used_by: 109 projects paper: "TotalSegmentator: Robust Segmentation of 104 Anatomic Structures in CT Images" (Radiology AI 2023) ``` ### G.4 Pathology Foundation Models 詳細資訊 #### Prov-GigaPath (Microsoft) ```yaml name: Prov-GigaPath source: Microsoft / Providence huggingface: prov-gigapath/prov-gigapath license: Apache-2.0 architecture: tile_encoder: DINOv2-based slide_encoder: 12-layer, 768-dim capabilities: - Tile-level Feature Extraction - Slide-level Representation (with coordinates) - Linear Probing (PCam, PANDA) training_data: Real-world WSI from Providence Health input: tile_size: 224x224 image_format: PNG, JPEG output: tile_embedding: 1536-dim slide_embedding: 768-dim stats: downloads_monthly: 328,518 likes: 157 paper: "A whole-slide foundation model for digital pathology from real-world data" (Nature 2024) ``` #### UNI2-h (MahmoodLab) ```yaml name: UNI 2 (UNI2-h) source: Mahmood Lab @ Harvard/BWH huggingface: MahmoodLab/UNI2-h license: CC-BY-NC-ND-4.0 (Academic Only) params: 681M architecture: type: Custom ViT-H img_size: 224 patch_size: 14 embed_dim: 1536 num_heads: 24 depth: 24 mlp: SwiGLU reg_tokens: 8 training_data: images: 200M+ tiles slides: 350k+ H&E and IHC slides source: Mass General Brigham method: DINOv2 SSL (DINO + iBOT + KoLeo) capabilities: - ROI Classification (linear probe, kNN, SimpleShot) - ROI Retrieval - Slide Classification (MIL) - Dense Prediction (with fine-tuning) pre_extracted_features: available: TCGA, CPTAC, PANDA stats: downloads_monthly: 44,393 likes: 88 paper: "Towards a General-Purpose Foundation Model for Computational Pathology" (Nature Medicine 2024) ``` #### CONCH (MahmoodLab) ```yaml name: CONCH source: Mahmood Lab @ Harvard/BWH huggingface: MahmoodLab/CONCH 
license: CC-BY-NC-ND-4.0 (Academic Only) architecture: vision_encoder: ViT-B/16 (90M params) text_encoder: L12-E768-H12 (110M params) # Decoder removed from public release training_data: pairs: 1.17M image-caption pairs sources: PMC-OA + Internal stains: H&E, IHC, Special stains method: CoCa (Contrastive + Captioning) capabilities: - Zero-shot ROI Classification - Zero-shot WSI Classification (MI-Zero) - Image-Text Retrieval - ROI Classification (linear probe, fine-tune) - WSI Classification (MIL) advantages: - Works on IHC and special stains (not just H&E) - No contamination from TCGA/PAIP/GTEX stats: downloads_monthly: 17,240 likes: 147 paper: "A visual-language foundation model for computational pathology" (Nature Medicine 2024) ``` ### G.5 Interactive Segmentation 詳細資訊 #### Medical-SAM3 ```yaml name: Medical-SAM3 source: AIM Research Lab / ChongCong github: AIM-Research-Lab/Medical-SAM3 huggingface: ChongCong/Medical-SAM3 training_data: datasets: 33 medical imaging datasets modalities: 10 (CT, MRI, X-ray, Ultrasound, etc.) 
capabilities: - Point-prompt Segmentation - Box-prompt Segmentation - Text-prompt Segmentation (open-vocabulary) - Mask Refinement - Multi-object Tracking advantages: - Optimized for medical imaging - Better edge precision - Supports text descriptions paper: arXiv:2601.10880 ``` #### SAM 3 (Meta) ```yaml name: Segment Anything 3 (SAM 3) source: Meta AI / Facebook Research github: facebookresearch/sam3 huggingface: facebook/sam3 params: 848M license: Apache-2.0 requirements: python: ">=3.12" pytorch: ">=2.7" cuda: ">=12.6" training_data: images: 11M masks: 1.1B capabilities: - Point-prompt Segmentation - Box-prompt Segmentation - Text-prompt Segmentation - Video Segmentation (SAM-V) - Multi-mask Output stats: github_stars: 7,400 downloads_monthly: 1,650,000 ``` --- ## 附錄 H: 模型需求摘要 ### H.1 VRAM 需求 | 模型 | VRAM | 量化後 | 備註 | |:-----|:-----|:-------|:-----| | DenseNet | ~500MB | - | PyTorch | | PSPNet | ~500MB | - | PyTorch | | SAM 3 (ViT-H) | ~2GB | - | PyTorch | | Medical-SAM3 | ~2GB | - | PyTorch | | CheXagent-2-3B | ~6GB | ~2GB (AWQ) | vLLM | | LLaVA-Med-8B | ~16GB | ~4GB (GPTQ) | vLLM | | MAIRA-2 (7B) | ~14GB | ~3GB (8bit) | vLLM | | BiomedCLIP | ~1GB | - | PyTorch/OpenCLIP | | RAD-DINO | ~500MB | - | PyTorch | | TotalSegmentator | ~4GB | - | nnUNet | | Prov-GigaPath | ~2GB | - | timm | | UNI2-h | ~3GB | - | timm | | CONCH | ~1GB | - | OpenCLIP | ### H.2 建議配置 | 配置 | GPU | 可同時載入 | |:-----|:----|:-----------| | **最低** | RTX 3060 (12GB) | 1 VLM + SAM + Encoder | | **推薦** | RTX 4090 (24GB) | 2 VLM + SAM + 多 Encoder | | **生產** | A100 (80GB) | 全模型 + 高併發 | ### H.3 模型優先順序 ``` Phase 1 (MVP): ├── CheXagent-2-3b # CXR VQA & Report ├── Medical-SAM3 # Interactive Segmentation ├── RAD-DINO # Feature Extraction └── BiomedCLIP # Zero-shot Classification Phase 2 (擴展): ├── MAIRA-2 # Grounded Reports ├── LLaVA-Med # Multi-modality VQA ├── TotalSegmentator # CT/MRI Anatomy └── DenseNet/PSPNet # Legacy compatibility Phase 3 (病理): ├── Prov-GigaPath # WSI Encoding ├── CONCH # Zero-shot 
Pathology └── UNI2-h # High-accuracy Tiles ``` ### H.4 已驗證模型狀態 (2026-02 實測) > **測試環境**：Tesla V100-SXM2-32GB, Python 3.12, PyTorch 2.9, CUDA 12.8 | 模型 | 狀態 | 工具類別 | 說明 | |:-----|:-----|:---------|:-----| | **RAD-DINO** | ✅ Ready | Visual RAG | microsoft/rad-dino, 346MB, 768-dim embedding | | **FAISS** | ✅ Ready | Visual RAG | faiss-cpu, 向量檢索，L2 距離 | | **DenseNet-121** | ✅ Ready | 分類 | torchxrayvision，18 種 CXR 病理 | | **PSPNet** | ✅ Ready | 分割 | torchxrayvision，14 種器官分割 | | **DICOM Processor** | ✅ Ready | 影像處理 | pydicom，Window/Level 調整 | | **ViT-BERT Report Generator** | ✅ Ready | 報告生成 | IAMJB/radiology_report_generation，已下載權重 | | **Ollama** | ✅ Ready | LLM 服務 | v0.15.4，llava:7b 已下載，GPU 模式 | | **CheXagent-2-3b VQA** | ⚠️ 下載慢 | VQA | HuggingFace 下載速度不足，建議用 Ollama 替代 | | **LLaVA-Med-1.5-7B** | ❓ 未測試 | VQA | 需 vLLM 服務或直接載入 | | **MAIRA-2** | ❓ 未測試 | Grounding | Microsoft Phrase Grounding | | **Roentgen** | ❓ 未測試 | 生成 | 需本地 Roentgen 權重 | **Visual RAG 混合模式 (Mode B) 已驗證可用**： - RAD-DINO 編碼：~2秒/張 (GPU) - FAISS 檢索：<1ms (4 張 demo) - 建議用於快速原型開發 --- ## 附錄 I: MedVision MCP Agent 規格 ### I.1 Agent 概述 MedVision MCP Agent 是一個內建的醫療影像分析 Agent，設計為： - **對內**：使用 MCP Tools 執行分析任務 - **對外**：作為 MCP Tool 供外部 Agent 委託（A2A） ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ MedVision MCP Agent 架構 │ ├─────────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌─────────────────────────────────────────────────────────────────────┐ │ │ │ Agent Core (LangGraph ReAct) │ │ │ │ ┌───────────────┐ ┌───────────────┐ ┌───────────────────────┐ │ │ │ │ │ Planner │ │ Executor │ │ Memory │ │ │ │ │ │ ─────── │ │ ──────── │ │ ────── │ │ │ │ │ │ 任務分解 │ │ 工具調用 │ │ 對話歷史 │ │ │ │ │ │ 流程編排 │ │ 結果處理 │ │ 分析上下文 │ │ │ │ │ └───────────────┘ └───────────────┘ └───────────────────────┘ │ │ │ └────────────────────────────────┬────────────────────────────────────┘ │ │ │ Internal MCP Calls │ │ ▼ │ │ ┌─────────────────────────────────────────────────────────────────────┐ 
│ │ │ MCP Tools Invocation │ │ │ │ analyze_image | analyze_selected_region | ask_about_image | ... │ │ │ └─────────────────────────────────────────────────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────────────┘ ``` ### I.2 Agent 能力 | 能力 | 說明 | 對應 MCP Tools | |:-----|:-----|:---------------| | **影像理解** | 理解醫療影像內容和語義 | `analyze_image`, `ask_about_image` | | **異常偵測** | 自動識別並標註異常區域 | `analyze_image(detect=True)` | | **區域分析** | 深入分析用戶選定區域 | `analyze_selected_region` | | **報告生成** | 生成結構化分析報告 | `analyze_image(generate_report=True)` | | **互動分割** | 根據提示進行精確分割 | `analyze_selected_region(actions=["segment"])` | | **多輪對話** | 維持上下文的連續對話 | Agent Memory | | **流程編排** | 根據任務自動編排分析流程 | Planner | ### I.3 A2A (Agent-to-Agent) 介面外部 Agent（如 Claude、GPT）可以透過 `invoke_medical_agent` 工具委託 MedVision MCP Agent： ```python # 外部 Agent 的調用範例 result = await mcp_client.call_tool("invoke_medical_agent", { "session_id": "abc123", "task": "分析這張 CXR，特別注意用戶標記的右肺區域", "context": { "user_selection": { "type": "bbox", "coordinates": [100, 200, 300, 400] }, "clinical_history": "患者有咳嗽症狀" }, "mode": "auto" }) # 返回結果 { "status": "completed", "result": { "summary": "在右肺發現一個 2.3cm 的結節...", "findings": [...], "annotations": [...], "report": "..." 
}, "actions_taken": [ "analyze_image(classify=True)", "analyze_selected_region(actions=['describe', 'segment'])", "ask_about_image('這個結節的特徵是什麼？')" ], "follow_up_suggestions": [ "建議進行 CT 掃描進一步確認", "可以比較先前的影像觀察變化" ] } ``` ### I.4 與 Canvas UI 互動 Agent 可以主動與 Canvas UI 互動： ```python # Agent 在 Canvas 上高亮區域 await mcp_client.call_tool("push_to_canvas", { "session_id": "abc123", "action": "highlight", "payload": { "region": {"type": "bbox", "coordinates": [100, 200, 300, 400]}, "style": "pulse", "label": "可疑區域" } }) # Agent 請求用戶確認 result = await mcp_client.call_tool("request_user_input", { "session_id": "abc123", "input_type": "confirm", "prompt": "這個區域看起來像結節，是否需要詳細分析？" }) ``` ### I.5 Agent 配置 ```yaml # agent_config.yaml agent: # Agent 底層模型 llm: provider: "ollama" # or "vllm", "openai", "anthropic" model: "llama3:70b" # 主推理模型 fallback: "llama3:8b" # 備援模型 # 推理設定 inference: temperature: 0.1 max_tokens: 2048 retry_attempts: 3 # 記憶設定 memory: type: "buffer" # buffer, summary, vector max_turns: 20 persist: true # 可用工具 tools: enabled: - "analyze_image" - "analyze_selected_region" - "ask_about_image" - "add_annotation" - "push_to_canvas" - "request_user_input" # 行為設定 behavior: auto_explain: true # 自動解釋分析結果 suggest_next_steps: true # 建議後續動作 interactive_mode: true # 需要時詢問用戶 ``` ### I.6 互動診斷流程設計 #### I.6.1 雙向互動診斷循環 MedVision MCP 的核心互動模式是一個**雙向互動診斷循環 (Diagnosis Loop)**： ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ 互動診斷循環 (Diagnosis Loop) │ ├─────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌─────────────┐ ┌─────────────┐ │ │ │ Canvas │◄────── push_to_canvas ───────│ Agent │ │ │ │ (畫板) │ │ (診斷) │ │ │ │ │ │ │ │ │ │ ┌───────┐ │ user_region_select │ ┌───────┐ │ │ │ │ │影像 │ │────────────────────────────►│ │VLM │ │ │ │ │ └───────┘ │ │ │Agent │ │ │ │ │ ┌───────┐ │ agent_highlight │ └───────┘ │ │ │ │ │標記 │◄─┼──────────────────────────────┤ │ │ │ │ └───────┘ │ │ ┌───────┐ │ │ │ │ ┌───────┐ │ │ │Tools │ │ │ │ │ │用戶畫 │ │ 
ask_about_region │ └───────┘ │ │ │ │ └───────┘ │────────────────────────────►│ │ │ │ └─────────────┘ └─────────────┘ │ │ ▲ │ │ │ │ ┌─────────────┐ │ │ │ └──────────────│ 對話框/Chat │◄──────────────┘ │ │ └─────────────┘ │ └─────────────────────────────────────────────────────────────────────────┘ ``` #### I.6.2 互動步驟詳解 | 步驟 | 發起者 | 動作 | 技術實現 | |:-----|:-------|:-----|:---------| | 1 | User | 載入影像到 Canvas | `add_image_to_session` | | 2 | User | 在對話框問診斷 | `invoke_medical_agent` / handoff | | 3 | Agent | 回答診斷結果 | VLM inference | | 4 | User | 要求標記出發現位置 | `request: "highlight findings"` | | 5 | Agent | **主動**在 Canvas 標記 | `push_to_canvas` | | 6 | User | **在 Canvas 圈選區域** 問 "這裡呢？" | `user_region_select` → MCP call | | 7 | Agent | 分析用戶選區 + **主動標出相關區域** | `analyze_selected_region` + `push_to_canvas` | #### I.6.3 完整流程範例 ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ 互動診斷流程 (完整範例) │ ├─────────────────────────────────────────────────────────────────────────┤ │ │ │ Step 1: 載入影像 │ │ ┌─────────────┐ │ │ │ User 拉入 │ ──► add_image_to_session ──► Canvas 顯示影像 │ │ │ CXR 影像 │ │ │ └─────────────┘ │ │ │ │ Step 2: 問診斷 │ │ ┌─────────────┐ │ │ │ User 輸入 │ ──► invoke_medical_agent({ │ │ │ "幫我診斷" │ task: "診斷這張影像", │ │ └─────────────┘ config: {auto_suggest: true} │ │ }) │ │ │ │ Step 3: Agent 回答 + 自動標記 │ │ ┌─────────────┐ ┌─────────────────────────────────────────────┐ │ │ │ Agent: │ │ push_to_canvas({ │ │ │ │ "右下肺有 │ ──► │ annotations: [ │ │ │ │ 結節" │ │ {type: "bbox", coords: [...], label: ..}│ │ │ └─────────────┘ │ ], │ │ │ │ │ related_regions: [ │ │ │ ▼ │ {coords: [...], note: "建議也檢查這裡"} │ │ │ ┌─────────────┐ │ ] │ │ │ │ Canvas 顯示 │ │ }) │ │ │ │ 標記+建議 │ └─────────────────────────────────────────────┘ │ │ └─────────────┘ │ │ │ │ Step 4: User 在 Canvas 畫區域問 "這裡呢？" │ │ ┌─────────────┐ │ │ │ User Draws │ ──► analyze_selected_region({ │ │ │ Freehand │ region: {type: "polygon", points: [...]}, │ │ │ "這裡呢？" │ question: "這裡呢？", │ │ └─────────────┘ actions: ["describe", 
"segment", "suggest"] │ │ }) │ │ │ │ Step 5: Agent 回答 + 標出相關區域 │ │ ┌─────────────┐ ┌─────────────────────────────────────────────┐ │ │ │ Agent: │ │ Response includes: │ │ │ │ "這區域是 │ ──► │ answer: "這是肺紋理增加區域..." │ │ │ │ 肺紋理... │ │ annotations: [ │ │ │ │ 另外這區 │ │ {type:"mask", data:[...], label:"用戶"}│ │ │ │ 域也有..."│ │ ], │ │ │ └─────────────┘ │ related_suggestions: [ │ │ │ │ {coords:[...], note:"這裡可能也有浸潤"} │ │ │ │ ] │ │ │ └─────────────────────────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────────┘ ``` #### I.6.4 主動建議配置 Agent 主動標記「這裡可能也有問題」的觸發時機可配置： ```yaml # agent_config.yaml behavior: auto_suggest: enabled: true # 可關閉 triggers: - on_analysis_complete # 分析完成自動建議 - on_user_region_question # 用戶問區域時順便建議 - on_high_confidence_finding # 高信心發現時建議 min_confidence: 0.7 # 信心度門檻 max_suggestions: 3 # 最多建議數量 ``` --- ### I.7 UI 架構 (共用組件策略) Canvas UI 採用**共用 React 組件庫**策略，支援多平台部署： ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ UI 架構 (共用組件庫) │ ├─────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ │ @medvision-mcp/canvas (npm package) │ │ │ │ ┌───────────┐ ┌──────────────┐ ┌────────────┐ ┌───────────┐ │ │ │ │ │ ImageView │ │ AnnotateTool │ │ ZoomPanCtl │ │ LayerMgr │ │ │ │ │ └───────────┘ └──────────────┘ └────────────┘ └───────────┘ │ │ │ │ React + Fabric.js + TypeScript │ │ │ └──────────────────────────────┬──────────────────────────────────┘ │ │ │ │ │ ┌───────────────────────┼───────────────────────┐ │ │ ▼ ▼ ▼ │ │ ┌──────────────┐ ┌──────────────────┐ ┌────────────────┐ │ │ │ VS Code Ext │ │ Standalone Web │ │ Electron (opt) │ │ │ │ (WebView) │ │ (Vite/Next.js) │ │ (future) │ │ │ │ │ │ │ │ │ │ │ │ 開發/測試用 │ │ 最終產品部署 │ │ 離線部署選項 │ │ │ │ Copilot整合 │ │ │ │ │ │ │ └──────────────┘ └──────────────────┘ └────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────────┘ ``` 
#### I.7.1 組件庫結構 ``` @medvision-mcp/canvas/ ├── package.json ├── src/ │ ├── index.ts # 公開 API │ ├── components/ │ │ ├── MedicalCanvas.tsx # 主畫布組件 │ │ ├── ImageViewer.tsx # DICOM/PNG 顯示 │ │ ├── AnnotationLayer.tsx # 標記圖層 │ │ ├── ToolPalette.tsx # 工具選板 │ │ ├── LayerPanel.tsx # 圖層管理 │ │ └── MiniMap.tsx # 縮圖導航 │ ├── hooks/ │ │ ├── useMCP.ts # MCP 通訊 hook │ │ ├── useCanvas.ts # Canvas 操作 hook │ │ ├── useAnnotations.ts # 標記管理 hook │ │ └── useSession.ts # Session 狀態 hook │ ├── types/ │ │ └── annotations.ts # TypeScript 定義 │ └── utils/ │ ├── fabricHelpers.ts # Fabric.js 工具 │ └── dicomLoader.ts # DICOM 載入 ├── dist/ # 打包輸出 └── README.md ``` --- ### I.8 Canvas 標記類型定義 #### I.8.1 支援的標記類型 | 類型 | 用途 | 用戶可畫 | Agent 可生成 | |:-----|:-----|:---------|:-------------| | `bbox` | 矩形框選 | ✅ | ✅ | | `polygon` | 多邊形輪廓 | ✅ | ✅ | | `freehand` | 手繪線條 | ✅ | ❌ | | `mask` | 分割遮罩 (SAM3) | ❌ | ✅ | | `point` | 點標記 | ✅ | ✅ | | `text` | 文字標註 | ✅ | ✅ | #### I.8.2 TypeScript 定義 ```typescript // @medvision-mcp/canvas/src/types/annotations.ts type AnnotationType = | 'bbox' // 矩形框 | 'polygon' // 多邊形輪廓 | 'freehand' // 手繪線條 | 'mask' // 分割遮罩 (from SAM3) | 'point' // 點標記 | 'text'; // 文字標註 type AnnotationSource = 'user' | 'agent'; interface AnnotationStyle { color: string; // e.g., "#FF0000" opacity: number; // 0-1 strokeWidth: number; // pixels fill: boolean; // 是否填充 lineDash?: number[]; // 虛線樣式 } interface Annotation { id: string; type: AnnotationType; source: AnnotationSource; timestamp: string; // 幾何資料 (根據 type) coordinates?: [number, number, number, number]; // bbox: [x1, y1, x2, y2] points?: [number, number][]; // polygon/freehand/point mask?: { data: string; // base64 encoded binary mask width: number; height: number; }; textContent?: string; // text 類型的文字內容 textPosition?: [number, number]; // text 位置 // 標註資訊 label?: string; description?: string; confidence?: number; // Agent 標記的信心度 (0-1) finding?: string; // 對應的臨床發現 // 顯示樣式 style: AnnotationStyle; // 互動狀態 visible: boolean; locked: boolean; interactive: 
boolean; // 點擊可展開詳情 // 關聯 relatedAnnotationIds?: string[]; // 關聯的其他標記 suggestedBy?: string; // 由誰建議 (agent/user annotation id) } interface CanvasState { sessionId: string; imageId: string; annotations: Annotation[]; selectedAnnotationId?: string; zoom: number; pan: { x: number; y: number }; activeTool: 'select' | 'bbox' | 'polygon' | 'freehand' | 'point' | 'text'; } ``` #### I.8.3 Agent 推送標記範例 ```python # Agent 分析完成後推送標記到 Canvas await mcp_client.call_tool("push_to_canvas", { "session_id": "abc123", "action": "add_annotations", "payload": { "annotations": [ { "id": "agent-finding-001", "type": "bbox", "source": "agent", "coordinates": [120, 80, 200, 160], "label": "右肺結節", "description": "約 1.2cm 的實質性結節，邊緣清楚", "confidence": 0.92, "finding": "nodule", "style": { "color": "#FF6B6B", "opacity": 0.8, "strokeWidth": 2, "fill": False }, "interactive": True }, { "id": "agent-mask-001", "type": "mask", "source": "agent", "mask": { "data": "base64...", # SAM3 輸出 "width": 512, "height": 512 }, "label": "結節精確輪廓", "style": { "color": "#FF6B6B", "opacity": 0.4, "strokeWidth": 0, "fill": True } } ], "related_suggestions": [ { "region": {"type": "bbox", "coordinates": [300, 200, 380, 280]}, "note": "建議也檢查這個區域，可能有相似病變", "style": { "color": "#FFE66D", "opacity": 0.6, "strokeWidth": 2, "lineDash": [5, 5] } } ] } }) ``` --- ### I.9 A2A vs 純 MCP 雙模式 MedVision MCP 同時支援兩種調用模式： ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ 雙模式架構 │ ├─────────────────────────────────────────────────────────────────────────┤ │ │ │ 外部 Agent (Claude Opus / GPT) │ │ │ │ │ ├─────────── 模式 A: 純 MCP ──────────────────────┐ │ │ │ 直接調用工具 ▼ │ │ │ ┌─────────────────────────────────────────────┐ │ │ │ │ MCP Server (medvision-mcp) │ │ │ │ │ ┌─────────┬─────────┬─────────────────┐ │ │ │ │ │ │analyze_ │ask_ │analyze_selected_│ │ │ │ │ │ │image │about_ │region │ │ │ │ ▼ │ │ │image │ │ │ │ │ ┌─────────────►│◄───┴─────────┴─────────┴─────────────────┘ │ │ │ │ 
└─────────────────────────────────────────────┘ │ │ │ │ │ │ 模式 B: A2A (委託) │ │ │ ┌─────────────────────────────────────────────┐ │ │ │ │ invoke_medical_agent │ │ │ └──────────────► ┌────────────────────────────────────┐ │ │ │ │ │ Internal Agent (LangGraph) │ │ │ │ │ │ - 任務分解 │ │ │ │ │ │ - 呼叫多 tools │ │ │ │ │ │ - 對話記憶 │ │ │ │ │ │ - 主動建議 │ │ │ │ │ └────────────────────────────────────┘ │ │ │ └─────────────────────────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────────┘ ``` #### I.9.1 模式選擇指引 | 場景 | 推薦模式 | 原因 | |:-----|:---------|:-----| | 單一影像快速分類 | 純 MCP | 外部 Agent 直接調用 `analyze_image` | | 單一問答 | 純 MCP | 直接調用 `ask_about_image` | | 完整診斷流程 | A2A | 需要多工具編排和上下文記憶 | | 互動式區域探索 | A2A | 需要記住先前的發現和建議 | | 批量處理多張影像 | 純 MCP | 外部 Agent 自行編排迴圈 | | 與 Canvas 深度互動 | A2A | 需要主動推送標記和建議 | #### I.9.2 handoff 機制外部 Agent（如 Claude）可透過 handoff 將控制權轉移給內建 Agent： ```python # Claude 發起 handoff result = await mcp_client.call_tool("invoke_medical_agent", { "session_id": "abc123", "task": "完整分析這張影像，並與用戶互動確認發現", "mode": "interactive", # 啟用互動模式 "handoff": { "return_on": ["user_confirmation", "analysis_complete"], "max_turns": 10, "preserve_context": True } }) # Internal Agent 接管後： # 1. 執行分析 # 2. 推送標記到 Canvas # 3. 等待用戶回應 # 4. 繼續對話 # 5. 
完成後返回控制權給 Claude ``` --- ### I.10 Visual RAG × Canvas 整合 > **核心機制**：用戶在 Canvas 圈選區域 → 觸發 Visual RAG → 返回相似案例 → 外部 Agent 綜合判斷 → 推送標記回 Canvas #### I.10.1 完整資料流 ``` ┌───────────────────────────────────────────────────────────────────────────────────┐ │ Visual RAG × Canvas 完整資料流 │ ├───────────────────────────────────────────────────────────────────────────────────┤ │ │ │ ① User 圈選區域 + 問問題 │ │ ┌────────────────────────┐ │ │ │ Canvas UI │ 用戶用 freehand/bbox 圈一塊區域 │ │ │ ┌────┬──────┐ │ 輸入: "這裡是什麼？" │ │ │ │ │▓▓▓▓ │ │ │ │ │ │ │▓▓▓▓ │ │ │ │ │ └────┴──────┘ │ │ │ └────────────┬───────────┘ │ │ │ user_region_select │ │ ▼ │ │ ┌────────────────────────────────────────────────────────────────────────────┐ │ │ │ ② MCP Tool Call: analyze_selected_region │ │ │ │ { │ │ │ │ session_id: "abc123", │ │ │ │ region: {type: "freehand", points: [[100,150], [120,180], ...]}, │ │ │ │ question: "這裡是什麼？", │ │ │ │ actions: ["describe", "rag_search", "classify"] ← 觸發三種分析 │ │ │ │ } │ │ │ └────────────────────────────────────┬───────────────────────────────────────┘ │ │ │ │ │ ┌────────────────────────────────────▼───────────────────────────────────────┐ │ │ │ ③ MCP Server 內部處理流程 │ │ │ │ │ │ │ │ ┌─────────────────┐ │ │ │ │ │ Region Cropper │ 依據 region 座標從原影像擷取子區域 │ │ │ │ │ │ 輸出: cropped_image (224x224 標準化) │ │ │ │ └────────┬────────┘ │ │ │ │ │ │ │ │ │ ┌────────▼────────┐ ┌──────────────────┐ ┌────────────────────────────┐│ │ │ │ │ RAD-DINO │ │ FAISS Index │ │ Reference Database ││ │ │ │ │ ──────── │ │ ─────────── │ │ ────────────────── ││ │ │ │ │ Encode Image │─►│ Search Top-K │─►│ case_id → Report ││ │ │ │ │ → [768-dim] │ │ (L2 distance) │ │ case_id → Diagnosis ││ │ │ │ │ ~2秒/張 (GPU) │ │ <1ms │ │ case_id → Annotations ││ │ │ │ └─────────────────┘ └──────────────────┘ └────────────────────────────┘│ │ │ │ │ │ │ │ │ ┌────────▼────────┐ │ │ │ │ │ DenseNet-121 │ 快速分類 18 種 CXR 病理 │ │ │ │ │ ───────────── │ 輸出: {infiltration: 0.85, nodule: 0.32, ...} │ │ │ │ │ ~0.1秒 (GPU) │ │ │ │ │ └────────┬────────┘ │ │ │ 
│ │ │ │ │ │ ┌────────▼─────────────────────────────────────────────────────────────┐ │ │ │ │ │ RAG Context Builder │ │ │ │ │ │ 合併: classification + similar_cases + region_metadata │ │ │ │ │ └──────────────────────────────────────────────────────────────────────┘ │ │ │ │ │ │ │ └────────────────────────────────────┬───────────────────────────────────────┘ │ │ │ │ │ ▼ │ │ ┌────────────────────────────────────────────────────────────────────────────┐ │ │ │ ④ MCP Response (返回給外部 Agent) │ │ │ │ { │ │ │ │ region_info: { │ │ │ │ bounding_box: [100, 150, 220, 280], │ │ │ │ pixel_stats: {mean: 128.5, std: 42.3}, │ │ │ │ anatomical_region: "right_lower_lung" │ │ │ │ }, │ │ │ │ classification: { │ │ │ │ infiltration: 0.85, │ │ │ │ nodule: 0.32, │ │ │ │ consolidation: 0.28, │ │ │ │ ... │ │ │ │ }, │ │ │ │ similar_cases: [ │ │ │ │ { │ │ │ │ case_id: "mimic-p10032546", │ │ │ │ similarity: 0.95, │ │ │ │ report: "Findings: Right lower lobe infiltrate...", │ │ │ │ diagnosis: "Community-acquired pneumonia" │ │ │ │ }, │ │ │ │ { │ │ │ │ case_id: "eurorad-case-4521", │ │ │ │ similarity: 0.89, │ │ │ │ report: "Bilateral ground glass opacities...", │ │ │ │ diagnosis: "Viral pneumonia" │ │ │ │ } │ │ │ │ ], │ │ │ │ confidence_summary: "高機率肺部浸潤 (DenseNet: 85%, RAG Top-1: 95%)" │ │ │ │ } │ │ │ └────────────────────────────────────┬───────────────────────────────────────┘ │ │ │ │ │ ▼ │ │ ┌────────────────────────────────────────────────────────────────────────────┐ │ │ │ ⑤ External Agent (Claude/GPT) 綜合判斷 │ │ │ │ │ │ │ │ 輸入: MCP Response + 用戶問題 "這裡是什麼？" │ │ │ │ │ │ │ │ Agent 推理: │ │ │ │ - DenseNet 分類 infiltration = 0.85 (高) │ │ │ │ - RAG 相似案例 #1: pneumonia (similarity 0.95) │ │ │ │ - RAG 相似案例 #2: viral pneumonia (similarity 0.89) │ │ │ │ - 區域位置: right_lower_lung │ │ │ │ │ │ │ │ 生成回答: "這個區域位於右下肺野，呈現明顯的浸潤陰影。 │ │ │ │ 根據影像特徵和相似案例分析，高度懷疑為肺炎。 │ │ │ │ 建議結合臨床症狀進一步確認..." 
│ │ │ │ │ │ │ │ 決定標記: 需要在 Canvas 上標示發現區域 │ │ │ │ │ │ │ └────────────────────────────────────┬───────────────────────────────────────┘ │ │ │ │ │ ▼ │ │ ┌────────────────────────────────────────────────────────────────────────────┐ │ │ │ ⑥ push_to_canvas (Agent 推送標記) │ │ │ │ { │ │ │ │ session_id: "abc123", │ │ │ │ action: "add_annotations", │ │ │ │ payload: { │ │ │ │ annotations: [ │ │ │ │ { │ │ │ │ id: "agent-finding-001", │ │ │ │ type: "bbox", │ │ │ │ source: "agent", │ │ │ │ coordinates: [100, 150, 220, 280], │ │ │ │ label: "肺部浸潤", │ │ │ │ description: "右下肺野浸潤性陰影，疑似肺炎", │ │ │ │ confidence: 0.85, │ │ │ │ style: {color: "#FF6B6B", opacity: 0.8, strokeWidth: 2} │ │ │ │ } │ │ │ │ ], │ │ │ │ message: "這個區域位於右下肺野，呈現明顯的浸潤陰影...", │ │ │ │ related_suggestions: [ │ │ │ │ { │ │ │ │ region: {type: "bbox", coordinates: [50, 100, 150, 200]}, │ │ │ │ note: "相似案例中此區域也常有病變，建議一併檢查", │ │ │ │ style: {color: "#FFE66D", lineDash: [5, 5]} │ │ │ │ } │ │ │ │ ] │ │ │ │ } │ │ │ │ } │ │ │ └────────────────────────────────────┬───────────────────────────────────────┘ │ │ │ │ │ ▼ │ │ ┌────────────────────────────────────────────────────────────────────────────┐ │ │ │ ⑦ Canvas UI 更新 │ │ │ │ │ │ │ │ ┌────────────────────────────────┐ ┌────────────────────┐ │ │ │ │ │ Canvas 顯示 │ │ 對話框 │ │ │ │ │ │ ┌────┬──────┐ │ │ │ │ │ │ │ │ │ │▓▓▓▓ │← Agent 標記 │ │ Agent: 這個區域 │ │ │ │ │ │ │ │▓▓▓▓ │ (紅色實線) │ │ 位於右下肺野... │ │ │ │ │ │ │ ┌──┐ │← 建議區域 │ │ │ │ │ │ │ │ │ │ │ │ (黃色虛線) │ │ User: OK, 那這邊？ │ │ │ │ │ │ │ └──┘ │ │ │ │ │ │ │ │ │ └────┴──────┘ │ └────────────────────┘ │ │ │ │ └────────────────────────────────┘ │ │ │ │ │ │ │ │ 循環繼續: User 可以繼續圈選 → 觸發新的 RAG 查詢... 
│ │ │ │ │ │ │ └────────────────────────────────────────────────────────────────────────────┘ │ │ │ └───────────────────────────────────────────────────────────────────────────────────┘
```

#### I.10.2 Module Responsibilities

| Module | Responsibility | Input | Output |
|:-----|:-----|:-----|:-----|
| **Canvas UI** | User interaction surface | User drawing / question | region + question |
| **MCP Server** | Wraps the AI models | region + question | RAG context |
| **RAD-DINO** | Image encoding | cropped_image | 768-dim embedding |
| **FAISS** | Vector retrieval | query_embedding | top-k similar |
| **DenseNet** | Fast classification | image | Probabilities for 18 pathologies |
| **Reference DB** | Case storage | case_id | report + diagnosis |
| **External Agent** | Synthesis and answer generation | RAG context | Answer + annotation commands |

#### I.10.3 MCP Server Internal Implementation

```python
# src/medvision_mcp/tools/visual_rag.py
from dataclasses import dataclass
from typing import List, Optional

import faiss
import numpy as np
import torch
from transformers import AutoImageProcessor, AutoModel

# ReferenceDatabase, mcp_tool, get_session, describe_region, crop_region,
# densenet_classify, build_confidence_summary and the module-level rag_engine
# instance are assumed to be provided elsewhere in the project (not shown here).


@dataclass
class RAGResult:
    case_id: str
    similarity: float
    report: str
    diagnosis: Optional[str]
    annotations: Optional[List[dict]]


class VisualRAGEngine:
    """Core Visual RAG engine."""

    def __init__(
        self,
        encoder_model: str = "microsoft/rad-dino",
        index_path: str = "data/faiss_index.bin",
        db_path: str = "data/reference_cases.db",
    ):
        # Load the RAD-DINO encoder
        self.encoder = AutoModel.from_pretrained(encoder_model)
        self.processor = AutoImageProcessor.from_pretrained(encoder_model)
        # Load the FAISS index
        self.index = faiss.read_index(index_path)
        # Connect to the reference database
        self.db = ReferenceDatabase(db_path)

    def encode_region(self, image: np.ndarray, region: dict) -> np.ndarray:
        """Crop and encode a region; returns a float32 array of shape (1, 768)."""
        # 1. Crop region from image (the _crop_region method is omitted here)
        cropped = self._crop_region(image, region)
        # 2. Preprocess
        inputs = self.processor(images=cropped, return_tensors="pt")
        # 3. Encode
        with torch.no_grad():
            outputs = self.encoder(**inputs)
        embedding = outputs.last_hidden_state[:, 0]  # CLS token
        # FAISS expects a C-contiguous float32 array
        return np.ascontiguousarray(embedding.cpu().numpy(), dtype=np.float32)

    def search_similar(
        self,
        embedding: np.ndarray,
        top_k: int = 5,
    ) -> List[RAGResult]:
        """Search for similar cases."""
        # FAISS search; an L2 index returns squared L2 distances
        distances, indices = self.index.search(embedding, top_k)
        # Fetch case details from the database
        results = []
        for dist, idx in zip(distances[0], indices[0]):
            if idx < 0:  # FAISS pads with -1 when fewer than top_k vectors exist
                continue
            case = self.db.get_case(int(idx))
            results.append(RAGResult(
                case_id=case.id,
                similarity=float(1.0 / (1.0 + dist)),
                report=case.report,
                diagnosis=case.diagnosis,
                annotations=case.annotations,
            ))
        return results


# MCP tool implementation
@mcp_tool()
async def analyze_selected_region(
    session_id: str,
    region: dict,
    question: str,
    actions: Optional[List[str]] = None,
) -> dict:
    """
    Analyze the region the user selected on the Canvas.

    Args:
        session_id: Session ID
        region: Region definition (type, coordinates/points)
        question: The user's question
        actions: Actions to run (defaults to all three)
            - "describe": describe basic region information
            - "rag_search": run the Visual RAG search
            - "classify": run DenseNet classification

    Returns:
        A dict containing region_info, classification, similar_cases
        and confidence_summary
    """
    # Avoid a mutable default argument
    if actions is None:
        actions = ["describe", "rag_search", "classify"]

    session = await get_session(session_id)
    image = session.current_image
    result = {}

    # 1. Basic region information
    if "describe" in actions:
        result["region_info"] = describe_region(image, region)

    # 2. Visual RAG search
    if "rag_search" in actions:
        embedding = rag_engine.encode_region(image, region)
        similar = rag_engine.search_similar(embedding, top_k=5)
        result["similar_cases"] = [
            {
                "case_id": s.case_id,
                "similarity": s.similarity,
                "report": s.report,
                "diagnosis": s.diagnosis,
            }
            for s in similar
        ]

    # 3. DenseNet classification
    if "classify" in actions:
        cropped = crop_region(image, region)
        result["classification"] = densenet_classify(cropped)

    # 4. Build the confidence summary
    result["confidence_summary"] = build_confidence_summary(result)
    return result
```

#### I.10.4 Canvas UI Integration (React Hook)

```typescript
// @medvision-mcp/canvas/src/hooks/useVisualRAG.ts
import { useMCP } from './useMCP';
import { useCanvas } from './useCanvas';
import { Annotation, Region } from '../types';

interface RAGSearchResult {
  region_info: {
    bounding_box: [number, number, number, number];
    pixel_stats: { mean: number; std: number };
    anatomical_region: string;
  };
  classification: Record<string, number>;
  similar_cases: Array<{
    case_id: string;
    similarity: number;
    report: string;
    diagnosis?: string;
  }>;
  confidence_summary: string;
}

export function useVisualRAG() {
  const { callTool } = useMCP();
  const { sessionId, addAnnotations, showMessage } = useCanvas();

  /**
   * Triggers Visual RAG analysis after the user selects a region.
   */
  const analyzeRegion = async (
    region: Region,
    question: string
  ): Promise<void> => {
    // 1. Show the processing state before the tool call starts
    showMessage({
      type: 'processing',
      text: 'Analyzing the selected region...',
    });

    // 2. Call the MCP tool
    await callTool<RAGSearchResult>('analyze_selected_region', {
      session_id: sessionId,
      region,
      question,
      actions: ['describe', 'rag_search', 'classify'],
    });

    // 3. The result is delivered by the External Agent through push_to_canvas
  };

  /**
   * Handles annotations pushed by the Agent.
   */
  const handleAgentPush = (payload: {
    annotations: Annotation[];
    message: string;
    related_suggestions?: Array<{
      region: Region;
      note: string;
    }>;
  }) => {
    // 1. Add the Agent's annotations
    addAnnotations(payload.annotations);

    // 2. Show the Agent's message
    showMessage({
      type: 'agent',
      text: payload.message,
    });

    // 3. Show suggested regions (dashed outline)
    if (payload.related_suggestions) {
      const suggestions = payload.related_suggestions.map((s, idx) => ({
        id: `suggestion-${idx}`,
        type: 'bbox' as const,
        source: 'agent' as const,
        coordinates: s.region.coordinates,
        label: s.note,
        style: {
          color: '#FFE66D',
          lineDash: [5, 5],
          opacity: 0.6,
        },
      }));
      addAnnotations(suggestions);
    }
  };

  return {
    analyzeRegion,
    handleAgentPush,
  };
}
```

#### I.10.5 Sequence Diagram

```
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│   User   │     │  Canvas  │     │   MCP    │     │ External │     │  Canvas  │
│          │     │    UI    │     │  Server  │     │  Agent   │     │    UI    │
└────┬─────┘     └────┬─────┘     └────┬─────┘     └────┬─────┘     └────┬─────┘
     │                │                │                │                │
     │ Draw region    │                │                │                │
     │ + "And here?"  │                │                │                │
     │───────────────►│                │                │                │
     │                │                │                │                │
     │                │ analyze_selected_region         │                │
     │                │───────────────►│                │                │
     │                │                │                │                │
     │                │                │───┐ Crop       │                │
     │                │                │   │ Encode     │                │
     │                │                │   │ Search     │                │
     │                │                │   │ Classify   │                │
     │                │                │◄──┘            │                │
     │                │                │                │                │
     │                │ RAG Context    │                │                │
     │                │◄───────────────│                │                │
     │                │                │                │                │
     │                │ Forward to External Agent       │                │
     │                │────────────────────────────────►│                │
     │                │                │                │                │
     │                │                │                │───┐ LLM        │
     │                │                │                │   │ Reasoning  │
     │                │                │                │◄──┘            │
     │                │                │                │                │
     │                │                │ push_to_canvas │                │
     │                │                │◄───────────────│                │
     │                │                │                │                │
     │                │ Add annotations + message       │                │
     │                │◄───────────────│                │                │
     │                │                │                │                │
     │ See result     │                │                │                │
     │◄───────────────│                │                │                │
     │                │                │                │                │
```

#### I.10.6 Configuration Options

```yaml
# config/visual_rag.yaml
visual_rag:
  # Encoder settings
  encoder:
    model: "microsoft/rad-dino"
    device: "cuda"            # or "cpu"
    batch_size: 4

  # FAISS index settings
  index:
    path: "data/faiss_index.bin"
    dimension: 768
    metric: "L2"              # or "IP" (inner product)

  # Retrieval settings
  retrieval:
    top_k: 5
    min_similarity: 0.5       # drop low-similarity results
    rerank: false             # whether to use a reranker

  # Classifier settings
  classifier:
    model: "densenet121-res224-chex"
    threshold: 0.5            # threshold for a positive pathology

  # Reference database
  reference_db:
    type: "sqlite"
    path: "data/reference_cases.db"
    sources:
      - name: "MIMIC-CXR"
        path: "/data/mimic-cxr/"
        reports: true
        labels: true
      - name: "CheXpert"
        path: "/data/chexpert/"
        reports: false
        labels: true
      - name: "EURORAD"
        path: "data/eurorad_metadata.json"
        reports: true
        labels: true

# Canvas integration settings
canvas_integration:
  region_analysis:
    actions:
      - "describe"
      - "rag_search"
      - "classify"
    auto_suggest: true
    max_suggestions: 3
  annotation_styles:
    agent_finding:
      color: "#FF6B6B"
      opacity: 0.8
      strokeWidth: 2
    agent_suggestion:
      color: "#FFE66D"
      opacity: 0.6
      lineDash: [5, 5]
    user_selection:
      color: "#4ECDC4"
      opacity: 0.7
```

#### I.10.7 Reference Database Schema

```sql
-- data/reference_cases.db
CREATE TABLE cases (
    id          INTEGER PRIMARY KEY,
    case_id     TEXT UNIQUE NOT NULL,
    source      TEXT NOT NULL,           -- 'MIMIC-CXR', 'CheXpert', 'EURORAD'
    image_path  TEXT,
    embedding   BLOB,                    -- 768-dim float32 (3072 bytes)
    created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE reports (
    id       INTEGER PRIMARY KEY,
    case_id  TEXT REFERENCES cases(case_id),
    section  TEXT,                       -- 'findings', 'impression', 'full'
    text     TEXT NOT NULL
);

CREATE TABLE diagnoses (
    id          INTEGER PRIMARY KEY,
    case_id     TEXT REFERENCES cases(case_id),
    diagnosis   TEXT NOT NULL,
    confidence  REAL
);

CREATE TABLE labels (
    id         INTEGER PRIMARY KEY,
    case_id    TEXT REFERENCES cases(case_id),
    pathology  TEXT NOT NULL,            -- 'infiltration', 'nodule', etc.
    value      INTEGER                   -- -1: uncertain, 0: negative, 1: positive
);

CREATE INDEX idx_cases_source ON cases(source);
CREATE INDEX idx_labels_pathology ON labels(pathology);
```

---
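The schema above stores each embedding as a 768-dim float32 BLOB (3072 bytes) and encodes labels as -1/0/1. The following is a minimal sketch of how a case might be written to and read back from those tables, using only the Python standard library. The helper names (`pack_embedding`, `unpack_embedding`, `open_demo_db`, `add_case`, `load_case`) and the case ID `demo-case-001` are hypothetical and not part of the spec; only the `cases` and `labels` tables are recreated, in an in-memory database.

```python
import sqlite3
from array import array

EMBEDDING_DIM = 768  # embedding size taken from the schema comment


def pack_embedding(values):
    """Serialize a sequence of floats as a float32 BLOB (4 bytes per value)."""
    if len(values) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(values)}")
    return array("f", values).tobytes()


def unpack_embedding(blob):
    """Deserialize a float32 BLOB back into a list of Python floats."""
    if len(blob) != EMBEDDING_DIM * 4:
        raise ValueError(f"expected {EMBEDDING_DIM * 4} bytes, got {len(blob)}")
    vec = array("f")
    vec.frombytes(blob)
    return vec.tolist()


def open_demo_db():
    """Create an in-memory database with the cases and labels tables (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cases (
            id INTEGER PRIMARY KEY,
            case_id TEXT UNIQUE NOT NULL,
            source TEXT NOT NULL,
            image_path TEXT,
            embedding BLOB,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE labels (
            id INTEGER PRIMARY KEY,
            case_id TEXT REFERENCES cases(case_id),
            pathology TEXT NOT NULL,
            value INTEGER
        );
        """
    )
    return conn


def add_case(conn, case_id, source, embedding, labels):
    """Insert one case plus its label rows; labels maps pathology -> -1/0/1."""
    conn.execute(
        "INSERT INTO cases (case_id, source, embedding) VALUES (?, ?, ?)",
        (case_id, source, pack_embedding(embedding)),
    )
    conn.executemany(
        "INSERT INTO labels (case_id, pathology, value) VALUES (?, ?, ?)",
        [(case_id, pathology, value) for pathology, value in labels.items()],
    )
    conn.commit()


def load_case(conn, case_id):
    """Return source, embedding and positive labels for a case, or None if absent."""
    row = conn.execute(
        "SELECT source, embedding FROM cases WHERE case_id = ?", (case_id,)
    ).fetchone()
    if row is None:
        return None
    positives = [
        pathology
        for (pathology,) in conn.execute(
            "SELECT pathology FROM labels WHERE case_id = ? AND value = 1 "
            "ORDER BY pathology",
            (case_id,),
        )
    ]
    return {
        "source": row[0],
        "embedding": unpack_embedding(row[1]),
        "positive_labels": positives,
    }
```

A caller would pass the encoder output as a flat list of 768 floats to `add_case`. The length check in `pack_embedding` is there so that a BLOB of the wrong size is rejected at write time rather than surfacing later as a dimension mismatch against the FAISS index.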
