Glama
aigo666

MCP Development Framework

parse_pdf

Parses PDF files to extract text and images, offering quick preview or full extraction modes.

Instructions

Parses PDF file content, supporting two modes: quick preview and full parsing.

Input Schema

Name | Required | Description | Default
file_path | Yes | Local path to the PDF file, e.g. '/path/to/document.pdf' | (none)
mode | No | Parsing mode: 'quick' (text only) or 'full' (text and images); defaults to 'full' | full
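
As a rough illustration of how the schema above constrains a call, the following sketch checks an arguments dictionary by hand. The helper name `validate_parse_pdf_args` is hypothetical and is not part of the framework; it only mirrors the required, enum, and default rules listed in the table.

```python
from typing import Any, Dict, List

# Allowed values for the optional 'mode' parameter, taken from the schema's enum.
ALLOWED_MODES = ("quick", "full")


def validate_parse_pdf_args(arguments: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments look valid.

    Hypothetical helper for illustration only.
    """
    problems: List[str] = []

    file_path = arguments.get("file_path")
    if file_path is None:
        problems.append("missing required parameter 'file_path'")
    elif not isinstance(file_path, str):
        problems.append("'file_path' must be a string")

    # 'mode' is optional and falls back to 'full' when omitted.
    mode = arguments.get("mode", "full")
    if mode not in ALLOWED_MODES:
        problems.append(f"'mode' must be one of {ALLOWED_MODES}, got {mode!r}")

    return problems


print(validate_parse_pdf_args({"file_path": "/path/to/document.pdf"}))
print(validate_parse_pdf_args({"mode": "fast"}))
```

Note that the handler listed under Implementation Reference does not reject unknown modes itself: any value other than 'quick' falls through to the full parse, so a check like this would have to run on the caller's side.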

Implementation Reference

  • The main handler (execute method) for the parse_pdf tool. It validates inputs (file_path, mode), processes the file path, and dispatches to either _quick_preview_pdf or _full_parse_pdf.
    async def execute(self, arguments: Dict[str, Any]) -> List[types.TextContent | types.ImageContent | types.EmbeddedResource]:
        """
        Parse a PDF file.
        
        Args:
            arguments: Parameter dictionary; must contain the 'file_path' key and may contain the optional 'mode' key.
        
        Returns:
            A list of parse results.
        """
        if "file_path" not in arguments:
            return [types.TextContent(
                type="text",
                text="Error: missing required parameter 'file_path'"
            )]
        
        file_path = arguments["file_path"]
        # Process the file path; supports translating mounted-directory paths
        file_path = self.process_file_path(file_path)
        
        if not os.path.exists(file_path):
            return [types.TextContent(
                type="text",
                text=f"Error: file does not exist: {file_path}"
            )]
        
        if not file_path.lower().endswith('.pdf'):
            return [types.TextContent(
                type="text",
                text=f"Error: file is not a PDF: {file_path}"
            )]
        
        mode = arguments.get("mode", "full")
        
        if mode == "quick":
            return await self._quick_preview_pdf(file_path)
        else:
            return await self._full_parse_pdf(file_path)
  • Quick preview mode handler: extracts only text content from the PDF using PyMuPDF (fitz).
    async def _quick_preview_pdf(self, file_path: str) -> List[types.TextContent | types.ImageContent | types.EmbeddedResource]:
        """
        Quick preview of a PDF file: extracts text content only.
        """
        try:
            text_content = []
            
            # Extract text with PyMuPDF; the context manager closes the document even if extraction fails
            with fitz.open(file_path) as doc:
                # Add file information
                text_content.append(f"File name: {os.path.basename(file_path)}")
                text_content.append(f"Page count: {doc.page_count}")
                text_content.append("---")
                
                # Extract the text of each page
                for page_num in range(doc.page_count):
                    page = doc[page_num]
                    text = page.get_text()
                    if text.strip():
                        text_content.append(f"Page {page_num + 1}:")
                        text_content.append(text)
                        text_content.append("---")
            
            return [types.TextContent(
                type="text",
                text="\n".join(text_content)
            )]
            
        except Exception as e:
            error_details = traceback.format_exc()
            return [types.TextContent(
                type="text",
                text=f"Error: quick preview of the PDF failed: {str(e)}\n{error_details}"
            )]
  • Full parse mode handler: extracts both text and images from the PDF using PyMuPDF, runs OCR on images via pytesseract, and returns the results including base64-encoded images.
    async def _full_parse_pdf(self, file_path: str) -> List[types.TextContent | types.ImageContent | types.EmbeddedResource]:
        """
        Full parse of a PDF file: extracts text and image content.
        """
        results = []
        
        try:
            # Extract text and images with PyMuPDF; the context manager closes the document even on errors
            with fitz.open(file_path) as doc:
                # Add file information
                results.append(types.TextContent(
                    type="text",
                    text=f"File name: {os.path.basename(file_path)}\nPage count: {doc.page_count}\n---"
                ))
                
                # Process each page
                for page_num in range(doc.page_count):
                    page = doc[page_num]
                    
                    # Extract text
                    text = page.get_text()
                    if text.strip():
                        results.append(types.TextContent(
                            type="text",
                            text=f"Page {page_num + 1}:\n{text}\n---"
                        ))
                    
                    # Extract images
                    image_list = page.get_images()
                    if image_list:
                        results.append(types.TextContent(
                            type="text",
                            text=f"Page {page_num + 1} contains {len(image_list)} image(s)"
                        ))
                        
                        # Process the images on this page
                        skipped_images = 0
                        successful_images = 0
                        
                        for img_idx, img_info in enumerate(image_list):
                            try:
                                xref = img_info[0]
                                base_image = doc.extract_image(xref)
                                image_bytes = base_image["image"]
                                
                                # Get the image MIME type and check whether it is supported
                                mime_type = self._get_image_mime_type(image_bytes)
                                supported_mime_types = ["image/jpeg", "image/png", "image/gif", "image/webp"]
                                
                                # Skip the image if its format is unsupported
                                if mime_type not in supported_mime_types:
                                    skipped_images += 1
                                    continue
                                
                                # Add the OCR result for the image
                                image_analysis = await self._analyze_image(image_bytes)
                                results.append(types.TextContent(
                                    type="text",
                                    text=f"Page {page_num + 1}, image {successful_images + 1} analysis result:\n{image_analysis}\n---"
                                ))
                                
                                # Add the image itself, returning the image rather than only the OCR text
                                image_base64 = self._encode_image_base64(image_bytes)
                                results.append(types.ImageContent(
                                    type="image",
                                    data=image_base64,
                                    mimeType=mime_type
                                ))
                                
                                successful_images += 1
                            except Exception:
                                # Catch all exceptions without interrupting processing
                                skipped_images += 1
                        
                        # If any images were skipped, add a brief note
                        if skipped_images > 0:
                            results.append(types.TextContent(
                                type="text",
                                text=f"Note: {skipped_images} image(s) on page {page_num + 1} were skipped due to format issues."
                            ))
            
            return results
            
        except Exception as e:
            error_details = traceback.format_exc()
            return [types.TextContent(
                type="text",
                text=f"Error: full parse of the PDF failed: {str(e)}\n{error_details}"
            )]
  • Tool name, description, and input schema definition for parse_pdf. Defines required 'file_path' parameter and optional 'mode' parameter (quick/full).
    name = "parse_pdf"
    description = "Parses PDF file content, supporting two modes: quick preview and full parsing"
    input_schema = {
        "type": "object",
        "required": ["file_path"],
        "properties": {
            "file_path": {
                "type": "string",
                "description": "Local path to the PDF file, e.g. '/path/to/document.pdf'",
            },
            "mode": {
                "type": "string",
                "description": "Parsing mode: 'quick' (text only) or 'full' (text and images); defaults to 'full'",
                "enum": ["quick", "full"],
                "default": "full"
            }
        },
    }
  • Registration decorator that registers the PdfTool class with the ToolRegistry so it's discovered and available as the 'parse_pdf' tool.
    @ToolRegistry.register
    class PdfTool(BaseTool):
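
The full parse handler above calls `_get_image_mime_type` and `_encode_image_base64`, whose bodies are not shown on this page. The following is a minimal sketch of what such helpers might look like, assuming the MIME type is detected from the leading magic bytes of the image data. The function names `sniff_image_mime_type` and `encode_image_base64` are hypothetical stand-ins, not the framework's actual code.

```python
import base64
from typing import Optional

# The formats the handler accepts, copied from its supported_mime_types list.
SUPPORTED_MIME_TYPES = ["image/jpeg", "image/png", "image/gif", "image/webp"]


def sniff_image_mime_type(image_bytes: bytes) -> Optional[str]:
    """Guess the MIME type from well-known file signatures (assumed approach)."""
    if image_bytes.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if image_bytes.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if image_bytes.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if image_bytes[:4] == b"RIFF" and image_bytes[8:12] == b"WEBP":
        return "image/webp"
    return None


def encode_image_base64(image_bytes: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(image_bytes).decode("ascii")


png_header = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
print(sniff_image_mime_type(png_header))
print(sniff_image_mime_type(b"BM") in SUPPORTED_MIME_TYPES)
```

Under this sketch, an image whose signature is not recognized returns `None`, which is not in the supported list and would therefore be counted as skipped by the handler's format check.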
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must cover behavioral traits. It only mentions two parsing modes but omits details like read-only nature, file access permissions, error handling, or side effects. The agent receives insufficient behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that conveys the core purpose and key feature (two modes). No extraneous information, every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite the tool having only two simple parameters and no output schema, the description fails to mention what the output looks like or how results are returned. For a parsing tool, this is a significant gap that limits an agent's ability to process results.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
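
To make the output shape concrete, here is a small sketch of how a caller might separate the returned items. It assumes each item arrives as a plain dictionary carrying the `type`, `text`, `data`, and `mimeType` fields that the handler code sets; the actual wire format depends on the MCP client in use, and `split_results` is a hypothetical helper.

```python
import base64
from typing import Any, Dict, List, Tuple


def split_results(items: List[Dict[str, Any]]) -> Tuple[str, List[Tuple[str, bytes]]]:
    """Separate text items from image items.

    Returns the concatenated text and a list of (mime_type, raw_bytes) pairs.
    """
    texts: List[str] = []
    images: List[Tuple[str, bytes]] = []
    for item in items:
        if item.get("type") == "text":
            texts.append(item["text"])
        elif item.get("type") == "image":
            # Image data is base64-encoded by the handler, so decode it back to bytes.
            images.append((item["mimeType"], base64.b64decode(item["data"])))
    return "\n".join(texts), images


# Made-up sample data for illustration, not real tool output.
sample = [
    {"type": "text", "text": "Page 1:\nHello\n---"},
    {
        "type": "image",
        "data": base64.b64encode(b"\x89PNG").decode("ascii"),
        "mimeType": "image/png",
    },
]
text, images = split_results(sample)
print(text)
print([(mime, len(raw)) for mime, raw in images])
```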

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%: both 'file_path' and 'mode' are described in the input schema. The description adds minimal value by paraphrasing the mode parameter's purpose, but it does not clarify file_path beyond 'local path'. With high coverage, a score of 3 is baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: parsing PDF files, with two modes (quick preview and full parsing). It unambiguously identifies the target resource (PDF) and differentiates from sibling tools that parse other formats like CSV, Excel, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for PDF parsing but does not explicitly specify when to use or not use this tool versus alternatives. No exclusions or context are provided, leaving the agent to infer from the tool name and siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/aigo666/mcp-framework'
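
The same request can be prepared from Python with the standard library. This is a sketch only: the helper `build_server_request` is hypothetical, and because the response format is not documented on this page, the commented-out call simply prints the raw body.

```python
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_server_request(owner: str, name: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one server entry."""
    url = f"{API_BASE}/{owner}/{name}"
    return urllib.request.Request(url, method="GET")


request = build_server_request("aigo666", "mcp-framework")
print(request.full_url)

# Sending the request needs network access, so it is left commented out:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.read().decode("utf-8"))
```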

If you have feedback or need assistance with the MCP directory API, please join our Discord server.