annota


get_pdf_layout_text

Extract text and bounding rectangles from a specified PDF page, returning each line with coordinates for direct use in annotation tools.

Instructions

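Extracts the text and physical coordinates of a specified PDF page.

Returns JSON in which each text line carries text and rect [x0, y0, x1, y1] (Zotero PDF coordinate system). The rect can be passed directly to create_pdf_annotation.

Args:
item_id: itemID (numeric) of a Zotero PDF attachment, or the absolute path to a PDF file
page_number: page number (0-based)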
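A minimal sketch of consuming the tool's JSON result on the client side, assuming the response has the shape shown in the implementation reference (a `blocks` list whose entries carry `text` and `rect`). The `find_rects` helper and the sample payload are hypothetical and exist only for illustration; the values are made up.

```python
import json


def find_rects(result_json: str, phrase: str) -> list[list[float]]:
    """Return the rect of every extracted line whose text contains `phrase`."""
    result = json.loads(result_json)
    needle = phrase.casefold()
    return [
        block["rect"]
        for block in result["blocks"]
        if needle in block["text"].casefold()
    ]


# Hypothetical payload in the documented shape (values are invented).
sample = json.dumps({
    "page_number": 0,
    "page_width": 612.0,
    "page_height": 792.0,
    "blocks": [
        {"text": "1 Introduction", "rect": [72.0, 700.0, 300.0, 720.0]},
        {"text": "Abstract", "rect": [72.0, 650.0, 120.0, 662.0]},
    ],
})

# Case-insensitive match on the line text.
rects = find_rects(sample, "abstract")
```

Each returned rect is already in Zotero PDF coordinates, so by the description above it could be handed to create_pdf_annotation unchanged; that tool's exact parameters are not shown here.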

Input Schema


Name	Required	Description	Default
`item_id`	Yes
`page_number`	Yes

Output Schema


Name	Required	Description	Default
`result`	Yes

Implementation Reference

annota/pdf_tools.py:44-112 (handler)

Core implementation: uses PyMuPDF (fitz) to extract text lines and their bounding boxes from a PDF page, converts the coordinates from PyMuPDF space (top-left origin, y down) to Zotero PDF user space (bottom-left origin, y up), and returns a dict with the page dimensions and a `blocks` list that holds one entry per text line (not per PyMuPDF block).

def extract_page_text(
    pdf_path: str | Path,
    page_number: int,
) -> dict:
    """Extract the text blocks of a given PDF page with their coordinates in Zotero space.

    Returns:
        {
            "page_number": int,
            "page_width": float,
            "page_height": float,
            "blocks": [
                {"text": str, "rect": [x0, y0, x1, y1]},
                ...
            ]
        }
    """
    doc = fitz.open(str(pdf_path))
    try:
        if page_number < 0 or page_number >= len(doc):
            raise ValueError(
                f"page_number {page_number} is out of range; "
                f"this PDF has {len(doc)} pages (0-indexed)"
            )

        page = doc[page_number]
        page_height = page.rect.height
        page_width = page.rect.width

        # Use "dict" mode to get structured text (blocks → lines → spans)
        text_dict = page.get_text("dict", flags=fitz.TEXT_PRESERVE_WHITESPACE)

        blocks = []
        for block in text_dict["blocks"]:
            if block["type"] != 0:  # text blocks only; skip image blocks
                continue

            for line in block["lines"]:
                # Concatenate the text of every span on this line
                line_text = ""
                for span in line["spans"]:
                    line_text += span["text"]

                line_text = line_text.strip()
                if not line_text:
                    continue

                # Use the line's overall bbox as its coordinates
                line_rect = line["bbox"]  # (x0, y0, x1, y1) in PyMuPDF coordinates
                zotero_rect = pymupdf_rect_to_zotero(line_rect, page_height)

                blocks.append({
                    "text": line_text,
                    "rect": zotero_rect,
                })

        logger.info(
            "Extracted page %d: %d text lines, page size %.1f x %.1f",
            page_number, len(blocks), page_width, page_height,
        )

        return {
            "page_number": page_number,
            "page_width": round(page_width, 3),
            "page_height": round(page_height, 3),
            "blocks": blocks,
        }
    finally:
        doc.close()

annota/pdf_tools.py:20-33 (helper)

Coordinate conversion helper: transforms PyMuPDF rects (top-left origin, y-down) to Zotero rects (bottom-left origin, y-up) by flipping the y-axis using page_height.

def pymupdf_rect_to_zotero(rect: tuple[float, ...], page_height: float) -> list[float]:
    """PyMuPDF rect (top-left origin, y↓) → Zotero rect (bottom-left origin, y↑).

    PyMuPDF:  (x0, y0_top, x1, y1_top)  y0 < y1, y increases downward
    Zotero:   [x0, y0_bot, x1, y1_bot]  y0 < y1, y increases upward
    Conversion: zotero_y0 = H - pymupdf_y1,  zotero_y1 = H - pymupdf_y0
    """
    x0, y0, x1, y1 = rect[:4]
    return [
        round(x0, 3),
        round(page_height - y1, 3),
        round(x1, 3),
        round(page_height - y0, 3),
    ]
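As a worked example of the y-axis flip described above, the sketch below converts one rect and then converts it back. The forward function repeats the helper's arithmetic so the block is self-contained; `zotero_rect_to_pymupdf` is a hypothetical inverse that is not part of the shown source, and the 792 pt page height is an assumed value. Because the flip is its own inverse, both directions use the same formula.

```python
def pymupdf_rect_to_zotero(rect, page_height):
    """Same flip as the helper above: top-left origin -> bottom-left origin."""
    x0, y0, x1, y1 = rect[:4]
    return [
        round(x0, 3),
        round(page_height - y1, 3),
        round(x1, 3),
        round(page_height - y0, 3),
    ]


def zotero_rect_to_pymupdf(rect, page_height):
    """Hypothetical inverse: bottom-left origin -> top-left origin."""
    x0, y0, x1, y1 = rect[:4]
    return [
        round(x0, 3),
        round(page_height - y1, 3),
        round(x1, 3),
        round(page_height - y0, 3),
    ]


page_height = 792.0                          # assumed page height in points
pymupdf_rect = (72.0, 100.0, 300.0, 112.0)   # a line 100 pt below the top edge

zotero_rect = pymupdf_rect_to_zotero(pymupdf_rect, page_height)
# zotero_y0 = 792 - 112 = 680, zotero_y1 = 792 - 100 = 692

round_trip = zotero_rect_to_pymupdf(zotero_rect, page_height)
```

Swapping y0 and y1 during the flip is what keeps y0 < y1 in both coordinate systems.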

annota/server.py:63-77 (handler)

MCP tool handler: resolves the item_id to a PDF path, delegates to pdf_tools.extract_page_text(), and returns the result as a JSON string.

@mcp.tool()
def get_pdf_layout_text(item_id: str, page_number: int) -> str:
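    """Extract the text and physical coordinates of a specified PDF page.

    Returns JSON in which each text line carries text and rect [x0, y0, x1, y1] (Zotero PDF coordinate system).
    The rect can be passed directly to create_pdf_annotation.

    Args:
        item_id: itemID (numeric) of a Zotero PDF attachment, or the absolute path to a PDF file
        page_number: page number (0-based)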
    """
    pdf_path = _resolve_pdf_path(item_id)

    result = pdf_tools.extract_page_text(pdf_path, page_number)
    return json.dumps(result, ensure_ascii=False, indent=2)
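The body of `_resolve_pdf_path` is not shown in this reference. Purely as an illustration of the documented contract (a numeric itemID or an absolute path), the sketch below shows one way such a dispatch could look. The function name `resolve_pdf_path_sketch` and the injected `lookup_attachment` callable are hypothetical and do not reflect the server's actual implementation or any Zotero API.

```python
from pathlib import Path
from typing import Callable


def resolve_pdf_path_sketch(
    item_id: str,
    lookup_attachment: Callable[[int], Path],
) -> Path:
    """Map an item_id string to a PDF path (illustrative only)."""
    candidate = item_id.strip()

    # All-digit input is treated as a Zotero itemID and handed to the
    # caller-supplied lookup (hypothetical; stands in for a library query).
    if candidate.isascii() and candidate.isdigit():
        return lookup_attachment(int(candidate))

    # Anything else must be an absolute filesystem path.
    path = Path(candidate)
    if not path.is_absolute():
        raise ValueError(
            f"item_id must be a numeric itemID or an absolute path: {item_id!r}"
        )
    return path


# Usage with a stand-in lookup; the storage layout here is invented.
fake_lookup = lambda item: Path(f"/zotero/storage/{item}.pdf")
by_id = resolve_pdf_path_sketch("42", fake_lookup)
by_path = resolve_pdf_path_sketch("/tmp/paper.pdf", fake_lookup)
```

Injecting the lookup keeps the sketch runnable without a Zotero library; a real resolver would presumably also check that the file exists before passing it to extract_page_text.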

annota/server.py:63-64 (registration)

Registration via FastMCP's @mcp.tool() decorator, which registers get_pdf_layout_text as an MCP tool in the 'annota' server.

```
@mcp.tool()
def get_pdf_layout_text(item_id: str, page_number: int) -> str:
```

Tool Definition Quality

Grade A (4.5/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description bears full responsibility. It discloses the JSON output format with text and rect in Zotero PDF coordinates, and explains parameter types (item_id can be number or path, page_number 0-indexed). It does not mention that it is read-only, but the description is sufficiently transparent about its behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with no wasted words. It front-loads the purpose, then explains output format and arguments in a structured manner. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the moderate complexity (2 params, coordinate system), the description is fairly complete. It explains output format and how to use rect with a sibling tool. An output schema exists to cover return value details, so the description is adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It explains both parameters well: item_id as Zotero itemID number or absolute path, page_number as 0-indexed page. This adds significant meaning beyond the schema types of string and integer.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool extracts text and physical coordinates from a specified PDF page, using specific verbs like 'extract' and specifying the resource. It distinguishes itself from sibling tools like get_pdf_text_bulk (likely bulk without coordinates) and create_pdf_annotation (which uses the coordinates).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions that the rect can be directly passed to create_pdf_annotation, providing a clear use case. However, it does not explicitly state when not to use this tool or contrast it with alternatives like get_pdf_text_bulk, though the context from sibling names implies differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/dengls24/annota'
