get_pdf_layout_text
Extract text and bounding rectangles from a specified PDF page, returning each line with coordinates for direct use in annotation tools.
Instructions
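Extracts the text of a specified PDF page together with its physical coordinates.
Returns JSON in which each text line carries text and rect [x0, y0, x1, y1] (Zotero PDF coordinate system). The rect can be passed directly to create_pdf_annotation.
Args: item_id: the itemID (numeric) of a Zotero PDF attachment, or the absolute path of a PDF file. page_number: the page number (zero-based).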
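As a rough sketch of how a client might consume the result, the snippet below parses a hand-written sample payload and collects the rect of every line that contains a keyword. The sample values are invented for illustration (they are not real tool output), the helper name find_rects is hypothetical, and the top-level blocks key follows the implementation reference further down this page.

```python
import json

# Hand-written sample payload (illustrative values, not real tool output).
sample_result = json.dumps({
    "page_number": 0,
    "page_width": 612.0,
    "page_height": 792.0,
    "blocks": [
        {"text": "1 Introduction", "rect": [72.0, 700.0, 180.0, 714.0]},
        {"text": "Attention is a mechanism that ...", "rect": [72.0, 680.0, 540.0, 692.0]},
    ],
})


def find_rects(result_json: str, keyword: str) -> list[list[float]]:
    """Return the rect of every text line containing keyword (case-insensitive)."""
    payload = json.loads(result_json)
    needle = keyword.lower()
    return [b["rect"] for b in payload["blocks"] if needle in b["text"].lower()]


rects = find_rects(sample_result, "introduction")
print(rects)
```

Each returned rect is already in the Zotero coordinate system, so it could be handed to create_pdf_annotation. That tool's parameters are not shown on this page, so the call itself is left out here.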
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
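| item_id | Yes | itemID (numeric) of a Zotero PDF attachment, or the absolute path of a PDF file | |
| page_number | Yes | Page number, starting from 0 | |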
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
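| result | Yes | JSON string containing page_number, page_width, page_height, and blocks (one entry per text line, each with text and rect) | |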
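Because result arrives as a JSON string rather than a structured object, a caller may want a quick sanity check before using the rects. The following is a minimal sketch under that assumption. The function name validate_layout and the specific checks are illustrative choices, not part of the tool, and text that legitimately runs past the page edge would be flagged by the bounds check.

```python
import json


def validate_layout(result_json: str) -> list[str]:
    """Return a list of problems found; an empty list means the payload looks sane."""
    payload = json.loads(result_json)
    problems = []
    width, height = payload["page_width"], payload["page_height"]
    for i, block in enumerate(payload["blocks"]):
        rect = block["rect"]
        if len(rect) != 4:
            problems.append(f"block {i}: rect has {len(rect)} values, expected 4")
            continue
        x0, y0, x1, y1 = rect
        # In the Zotero system y grows upward, so y0 (bottom) should not exceed y1 (top).
        if not (x0 <= x1 and y0 <= y1):
            problems.append(f"block {i}: rect is not ordered as x0 <= x1, y0 <= y1")
        if x0 < 0 or y0 < 0 or x1 > width or y1 > height:
            problems.append(f"block {i}: rect extends beyond the page")
    return problems


# Illustrative payloads (invented values).
good = json.dumps({
    "page_number": 0, "page_width": 612.0, "page_height": 792.0,
    "blocks": [{"text": "ok", "rect": [72.0, 680.0, 540.0, 692.0]}],
})
flipped = json.dumps({
    "page_number": 0, "page_width": 612.0, "page_height": 792.0,
    "blocks": [{"text": "y values swapped", "rect": [72.0, 692.0, 540.0, 680.0]}],
})

print(validate_layout(good))
print(validate_layout(flipped))
```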
Implementation Reference
- annota/pdf_tools.py:44-112 (handler) Core implementation: uses PyMuPDF (fitz) to extract text lines and their bounding boxes from a PDF page, converts coordinates from PyMuPDF space (top-left origin) to Zotero PDF user space (bottom-left origin), and returns a structured dict with the page dimensions and one block per text line.
    def extract_page_text(
        pdf_path: str | Path,
        page_number: int,
    ) -> dict:
        """Extract the text blocks of a PDF page with their Zotero-space coordinates.

        Returns:
            {
                "page_number": int,
                "page_width": float,
                "page_height": float,
                "blocks": [
                    {"text": str, "rect": [x0, y0, x1, y1]},
                    ...
                ]
            }
        """
        doc = fitz.open(str(pdf_path))
        try:
            if page_number < 0 or page_number >= len(doc):
                raise ValueError(
                    f"page_number {page_number} is out of range; "
                    f"this PDF has {len(doc)} pages (0-indexed)"
                )

            page = doc[page_number]
            page_height = page.rect.height
            page_width = page.rect.width

            # Use dict mode to get structured text (blocks → lines → spans)
            text_dict = page.get_text("dict", flags=fitz.TEXT_PRESERVE_WHITESPACE)

            blocks = []
            for block in text_dict["blocks"]:
                if block["type"] != 0:  # Only handle text blocks; skip image blocks
                    continue
                for line in block["lines"]:
                    # Merge the text of all spans on the same line
                    line_text = ""
                    for span in line["spans"]:
                        line_text += span["text"]
                    line_text = line_text.strip()
                    if not line_text:
                        continue

                    # Use the line's overall bbox as its coordinates
                    line_rect = line["bbox"]  # (x0, y0, x1, y1) in PyMuPDF coordinates
                    zotero_rect = pymupdf_rect_to_zotero(line_rect, page_height)

                    blocks.append({
                        "text": line_text,
                        "rect": zotero_rect,
                    })

            logger.info(
                "Extracted page %d: %d text lines, page size %.1f x %.1f",
                page_number, len(blocks), page_width, page_height,
            )

            return {
                "page_number": page_number,
                "page_width": round(page_width, 3),
                "page_height": round(page_height, 3),
                "blocks": blocks,
            }
        finally:
            doc.close()
- annota/pdf_tools.py:20-33 (helper) Coordinate conversion helper: transforms PyMuPDF rects (top-left origin, y-down) to Zotero rects (bottom-left origin, y-up) by flipping the y-axis using page_height.
    def pymupdf_rect_to_zotero(rect: tuple[float, ...], page_height: float) -> list[float]:
        """PyMuPDF rect (top-left origin, y↓) → Zotero rect (bottom-left origin, y↑).

        PyMuPDF: (x0, y0_top, x1, y1_top)   y0 < y1, y increases downward
        Zotero:  [x0, y0_bot, x1, y1_bot]   y0 < y1, y increases upward
        Conversion: zotero_y0 = H - pymupdf_y1, zotero_y1 = H - pymupdf_y0
        """
        x0, y0, x1, y1 = rect[:4]
        return [
            round(x0, 3),
            round(page_height - y1, 3),
            round(x1, 3),
            round(page_height - y0, 3),
        ]
- annota/server.py:63-77 (handler) MCP tool handler: resolves the item_id to a PDF path, delegates to pdf_tools.extract_page_text(), and returns the result as a JSON string.
    @mcp.tool()
    def get_pdf_layout_text(item_id: str, page_number: int) -> str:
        """Extract the text of a specified PDF page together with its physical coordinates.

        Returns JSON in which each text line carries text and rect [x0, y0, x1, y1] (Zotero PDF coordinate system).
        The rect can be passed directly to create_pdf_annotation.

        Args:
            item_id: itemID (numeric) of a Zotero PDF attachment, or the absolute path of a PDF file
            page_number: page number (zero-based)
        """
        pdf_path = _resolve_pdf_path(item_id)
        result = pdf_tools.extract_page_text(pdf_path, page_number)
        return json.dumps(result, ensure_ascii=False, indent=2)
- annota/server.py:63-64 (registration) Registration via FastMCP's @mcp.tool() decorator, which registers get_pdf_layout_text as an MCP tool in the 'annota' server.
    @mcp.tool()
    def get_pdf_layout_text(item_id: str, page_number: int) -> str:
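
The y-axis flip performed by the coordinate conversion helper is easier to follow with concrete numbers. The sketch below restates the helper in compact form and applies it to one made-up line box. The 792 pt page height (US Letter) and the bbox values are assumptions for illustration, and zotero_rect_to_pymupdf is a hypothetical inverse that does not appear in the quoted source.

```python
def pymupdf_rect_to_zotero(rect, page_height):
    """Flip the y-axis: top-left origin (y down) -> bottom-left origin (y up)."""
    x0, y0, x1, y1 = rect[:4]
    return [round(x0, 3), round(page_height - y1, 3),
            round(x1, 3), round(page_height - y0, 3)]


def zotero_rect_to_pymupdf(rect, page_height):
    """Hypothetical inverse (not in the quoted source): the same flip applied again."""
    x0, y0, x1, y1 = rect[:4]
    return [round(x0, 3), round(page_height - y1, 3),
            round(x1, 3), round(page_height - y0, 3)]


page_height = 792.0                      # assumed: US Letter page, in points
line_bbox = (72.0, 100.0, 300.0, 112.0)  # a line whose top is 100 pt below the top edge

zotero_rect = pymupdf_rect_to_zotero(line_bbox, page_height)
print(zotero_rect)  # [72.0, 680.0, 300.0, 692.0]

# Flipping twice returns the original box, since H - (H - y) == y.
print(zotero_rect_to_pymupdf(zotero_rect, page_height))  # [72.0, 100.0, 300.0, 112.0]
```

The two y values swap roles during the flip: the PyMuPDF bottom edge (y1 = 112) becomes the Zotero y0 (792 - 112 = 680), which keeps y0 < y1 in both systems.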