get_pdf_layout_text

Extract text and bounding rectangles from a specified PDF page, returning each line with coordinates for direct use in annotation tools.

Instructions

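Extract the text and physical coordinates of a specified PDF page.

Returns JSON in which each text line has text and rect [x0, y0, x1, y1] (Zotero PDF coordinate system). The rect can be passed directly to create_pdf_annotation.

Args:
  item_id: the itemID (numeric) of a Zotero PDF attachment, or the absolute path to a PDF file
  page_number: page number (0-based)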

Input Schema

Name         Required  Description  Default
item_id      Yes       -            -
page_number  Yes       -            -

Output Schema

Name    Required  Description  Default
result  Yes       -            -
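
Since the handler returns its result as a JSON string, a caller has to decode it before using the rectangles. The following is a minimal sketch, assuming a payload shaped like the return structure documented in extract_page_text. The sample values and the parse_layout name are invented for illustration and are not part of the server.

```python
import json

# Hypothetical sample of the JSON string returned by the tool. The values are
# invented; only the key names follow the documented return structure.
SAMPLE_RESULT = json.dumps({
    "page_number": 0,
    "page_width": 612.0,
    "page_height": 792.0,
    "blocks": [
        {"text": "Introduction", "rect": [72.0, 700.0, 200.0, 712.0]},
        {"text": "First paragraph line.", "rect": [72.0, 680.0, 540.0, 692.0]},
    ],
})


def parse_layout(result: str) -> list[tuple[str, list[float]]]:
    """Decode the tool's JSON string into (text, rect) pairs."""
    data = json.loads(result)
    return [(block["text"], block["rect"]) for block in data["blocks"]]


for text, rect in parse_layout(SAMPLE_RESULT):
    print(text, rect)
```

Each rect is in Zotero PDF coordinates, so y0 is the bottom edge of the line and y1 the top edge.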

Implementation Reference

  • Core implementation: uses PyMuPDF (fitz) to extract text lines and their bounding boxes from a PDF page, converts coordinates from PyMuPDF space (top-left origin) to Zotero PDF user space (bottom-left origin), and returns structured JSON with page dimensions and text blocks.
    # Module-level setup this excerpt relies on (assumed; not shown in the listing)
    import logging
    from pathlib import Path
    
    import fitz  # PyMuPDF
    
    logger = logging.getLogger(__name__)
    
    def extract_page_text(
        pdf_path: str | Path,
        page_number: int,
    ) -> dict:
        """Extract the text blocks of the given PDF page with their Zotero-space coordinates.
    
        Returns:
            {
                "page_number": int,
                "page_width": float,
                "page_height": float,
                "blocks": [
                    {"text": str, "rect": [x0, y0, x1, y1]},
                    ...
                ]
            }
        """
        doc = fitz.open(str(pdf_path))
        try:
            if page_number < 0 or page_number >= len(doc):
                raise ValueError(
                    f"page_number {page_number} is out of range; "
                    f"this PDF has {len(doc)} pages (0-indexed)"
                )
    
            page = doc[page_number]
            page_height = page.rect.height
            page_width = page.rect.width
    
            # Use "dict" mode to get structured text (blocks → lines → spans)
            text_dict = page.get_text("dict", flags=fitz.TEXT_PRESERVE_WHITESPACE)
    
            blocks = []
            for block in text_dict["blocks"]:
                if block["type"] != 0:  # only handle text blocks; skip image blocks
                    continue
    
                for line in block["lines"]:
                    # Concatenate the text of all spans on the same line
                    line_text = "".join(span["text"] for span in line["spans"])
    
                    line_text = line_text.strip()
                    if not line_text:
                        continue
    
                    # Use the line's overall bbox as its coordinates
                    line_rect = line["bbox"]  # (x0, y0, x1, y1) in PyMuPDF coordinates
                    zotero_rect = pymupdf_rect_to_zotero(line_rect, page_height)
    
                    blocks.append({
                        "text": line_text,
                        "rect": zotero_rect,
                    })
    
            logger.info(
                "Extracted page %d: %d text lines, page size %.1f x %.1f",
                page_number, len(blocks), page_width, page_height,
            )
    
            return {
                "page_number": page_number,
                "page_width": round(page_width, 3),
                "page_height": round(page_height, 3),
                "blocks": blocks,
            }
        finally:
            doc.close()
  • Coordinate conversion helper: transforms PyMuPDF rects (top-left origin, y-down) to Zotero rects (bottom-left origin, y-up) by flipping the y-axis using page_height.
    def pymupdf_rect_to_zotero(rect: tuple[float, ...], page_height: float) -> list[float]:
        """PyMuPDF rect (top-left origin, y↓) → Zotero rect (bottom-left origin, y↑).
    
        PyMuPDF:  (x0, y0_top, x1, y1_top)  y0 < y1, y grows downward
        Zotero:   [x0, y0_bot, x1, y1_bot]  y0 < y1, y grows upward
        Conversion: zotero_y0 = H - pymupdf_y1,  zotero_y1 = H - pymupdf_y0
        """
        x0, y0, x1, y1 = rect[:4]
        return [
            round(x0, 3),
            round(page_height - y1, 3),
            round(x1, 3),
            round(page_height - y0, 3),
        ]
  • MCP tool handler: resolves the item_id to a PDF path, delegates to pdf_tools.extract_page_text(), and returns the result as a JSON string.
    @mcp.tool()
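    def get_pdf_layout_text(item_id: str, page_number: int) -> str:
        """Extract the text and physical coordinates of a specified PDF page.
    
        Returns JSON in which each text line has text and rect [x0, y0, x1, y1]
        (Zotero PDF coordinate system). The rect can be passed directly to
        create_pdf_annotation.
    
        Args:
            item_id: itemID (numeric) of a Zotero PDF attachment, or the absolute path of a PDF file
            page_number: page number (0-based)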
        """
        pdf_path = _resolve_pdf_path(item_id)
    
        result = pdf_tools.extract_page_text(pdf_path, page_number)
        return json.dumps(result, ensure_ascii=False, indent=2)
  • annota/server.py:63-64 (registration)
    Registration via FastMCP's @mcp.tool() decorator, which registers get_pdf_layout_text as an MCP tool in the 'annota' server.
    @mcp.tool()
    def get_pdf_layout_text(item_id: str, page_number: int) -> str:
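
The y-axis flip in the conversion helper is easier to check with concrete numbers. The sketch below restates the documented formula under an illustrative name (to_zotero is not the project's function) and uses an invented 792-point page height. Because the flip is its own inverse, applying it a second time returns the original rectangle, which is one way to map a Zotero rect back to PyMuPDF space.

```python
def to_zotero(rect, page_height):
    """Flip a top-left-origin rect to bottom-left origin.

    Same formula as the documented helper: the new bottom edge comes from
    the old y1, and the new top edge comes from the old y0.
    """
    x0, y0, x1, y1 = rect[:4]
    return [
        round(x0, 3),
        round(page_height - y1, 3),
        round(x1, 3),
        round(page_height - y0, 3),
    ]


PAGE_HEIGHT = 792.0  # invented example value (US Letter height in points)

# A line whose top edge is 100 pt and bottom edge 112 pt below the page top.
pymupdf_rect = (72.0, 100.0, 300.0, 112.0)

zotero_rect = to_zotero(pymupdf_rect, PAGE_HEIGHT)
print(zotero_rect)  # [72.0, 680.0, 300.0, 692.0]

# Applying the same flip again recovers the PyMuPDF rect.
print(to_zotero(zotero_rect, PAGE_HEIGHT))  # [72.0, 100.0, 300.0, 112.0]
```

Note that y0 < y1 holds in both systems, because the flip also swaps which original edge feeds each output slot.
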
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description bears full responsibility. It discloses the JSON output format with text and rect in Zotero PDF coordinates, and explains parameter types (item_id can be number or path, page_number 0-indexed). It does not mention that it is read-only, but the description is sufficiently transparent about its behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with no wasted words. It front-loads the purpose, then explains output format and arguments in a structured manner. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the moderate complexity (2 params, coordinate system), the description is fairly complete. It explains output format and how to use rect with a sibling tool. An output schema exists to cover return value details, so the description is adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It explains both parameters well: item_id as Zotero itemID number or absolute path, page_number as 0-indexed page. This adds significant meaning beyond the schema types of string and integer.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
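
The two accepted forms of item_id can be told apart before the tool is called. The sketch below is a hypothetical client-side check; it is not the server's _resolve_pdf_path, whose implementation is not shown here. The names classify_item_id and check_page_number are invented for illustration.

```python
from pathlib import Path


def classify_item_id(item_id: str) -> tuple[str, object]:
    """Hypothetical helper: tell a numeric itemID apart from an absolute path."""
    value = item_id.strip()
    if value.isascii() and value.isdigit():
        return ("zotero_item", int(value))
    path = Path(value)
    if path.is_absolute():
        return ("pdf_path", path)
    raise ValueError(f"item_id must be a numeric itemID or an absolute path: {item_id!r}")


def check_page_number(page_number: int, page_count: int) -> int:
    """Mirror the documented 0-based range check: valid pages are 0..page_count-1."""
    if page_number < 0 or page_number >= page_count:
        raise ValueError(f"page_number {page_number} is out of range for {page_count} pages")
    return page_number


print(classify_item_id("1234"))      # ('zotero_item', 1234)
print(check_page_number(0, 10))      # 0, the first page
```

A page labelled "1" in a PDF viewer corresponds to page_number 0 here.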

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool extracts text and physical coordinates from a specified PDF page, using specific verbs like 'extract' and specifying the resource. It distinguishes itself from sibling tools like get_pdf_text_bulk (likely bulk without coordinates) and create_pdf_annotation (which uses the coordinates).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions that the rect can be directly passed to create_pdf_annotation, providing a clear use case. However, it does not explicitly state when not to use this tool or contrast it with alternatives like get_pdf_text_bulk, though the context from sibling names implies differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
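
A typical use of this tool is to locate a phrase on a page and hand the matching rectangles to create_pdf_annotation. The sketch below covers only the lookup step and assumes the result has already been decoded into a dict with the documented blocks list. The find_rects name and the sample layout are invented, and the parameters of create_pdf_annotation are not shown in this listing, so the call itself is left out.

```python
def find_rects(layout: dict, phrase: str) -> list[list[float]]:
    """Return the rect of every text line that contains the phrase (case-insensitive)."""
    needle = phrase.casefold()
    return [
        block["rect"]
        for block in layout["blocks"]
        if needle in block["text"].casefold()
    ]


# Invented sample in the documented output shape.
layout = {
    "page_number": 0,
    "page_width": 612.0,
    "page_height": 792.0,
    "blocks": [
        {"text": "Attention is all you need", "rect": [72.0, 700.0, 300.0, 712.0]},
        {"text": "We propose a new architecture", "rect": [72.0, 680.0, 340.0, 692.0]},
    ],
}

rects = find_rects(layout, "attention")
print(rects)  # [[72.0, 700.0, 300.0, 712.0]]
```

Matching is per line, so a phrase that wraps across two lines will not be found by this sketch.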

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/dengls24/annota'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.