Glama
aigo666

MCP Development Framework

parse_pdf

Extract text and images from PDF files using quick text-only or full content parsing modes to access document information.

Instructions

解析PDF文件内容,支持快速预览和完整解析两种模式 (English: Parses PDF file content; supports two modes, quick preview and full parsing.)

Input Schema

Name | Required | Description | Default
file_path | Yes | Local path to the PDF file, e.g. '/path/to/document.pdf' |
mode | No | Parsing mode: 'quick' (text only) or 'full' (text and images); defaults to 'full' | full

Implementation Reference

  • Registers the parse_pdf tool via @ToolRegistry.register decorator on PdfTool class, setting name and description.
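    @ToolRegistry.register
    class PdfTool(BaseTool):
        """
        PDF parsing tool with two modes:
        1. Quick preview mode: extracts text only; suited to large PDF files
        2. Full parsing mode: extracts text and images for a more detailed document analysis
        """

        name = "parse_pdf"
        # English: "Parse PDF file content; supports quick preview and full parsing modes"
        description = "解析PDF文件内容,支持快速预览和完整解析两种模式"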
  • Input schema defining parameters for file_path (required) and mode (optional, quick/full).
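    input_schema = {
        "type": "object",
        "required": ["file_path"],
        "properties": {
            "file_path": {
                "type": "string",
                # English: "Local path to the PDF file, e.g. '/path/to/document.pdf'"
                "description": "PDF文件的本地路径,例如'/path/to/document.pdf'",
            },
            "mode": {
                "type": "string",
                # English: "Parsing mode: 'quick' (text only) or 'full' (text and images); defaults to 'full'"
                "description": "解析模式:'quick'(仅文本)或'full'(文本和图片),默认为'full'",
                "enum": ["quick", "full"],
                "default": "full"
            }
        },
    }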
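The schema above can be checked by hand before a call is dispatched. The following is a minimal sketch using only the standard library; `normalize_arguments` is a hypothetical helper, not part of the framework, and it covers only the keywords this schema uses (`required`, `type: string`, `enum`, `default`).

```python
from typing import Any, Dict, List, Tuple

# Copied from the input schema shown above (descriptions omitted for brevity).
INPUT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "required": ["file_path"],
    "properties": {
        "file_path": {"type": "string"},
        "mode": {"type": "string", "enum": ["quick", "full"], "default": "full"},
    },
}


def normalize_arguments(arguments: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Hypothetical helper: apply schema defaults and collect validation errors."""
    errors: List[str] = []
    normalized = dict(arguments)

    # Every key listed under "required" must be present.
    for key in INPUT_SCHEMA["required"]:
        if key not in normalized:
            errors.append(f"missing required parameter: {key}")

    for key, spec in INPUT_SCHEMA["properties"].items():
        if key not in normalized:
            # Fill in the schema default when one is declared.
            if "default" in spec:
                normalized[key] = spec["default"]
            continue
        value = normalized[key]
        if spec["type"] == "string" and not isinstance(value, str):
            errors.append(f"{key} must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{key} must be one of {spec['enum']}")

    return normalized, errors
```

Calling `normalize_arguments({"file_path": "/path/to/document.pdf"})` fills in `mode` with its default, while an unknown mode such as `"fast"` is reported as an error instead of being passed through.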
  • Main execute handler: checks that file_path is present, processes the file path, verifies that the file exists and has a .pdf extension, then dispatches to quick or full parse mode.
    async def execute(self, arguments: Dict[str, Any]) -> List[types.TextContent | types.ImageContent | types.EmbeddedResource]:
        """
        Parse a PDF file.

        Args:
            arguments: Argument dict; must contain the 'file_path' key and may contain 'mode'

        Returns:
            List of parse results
        """
        if "file_path" not in arguments:
            return [types.TextContent(
                type="text",
                text="错误: 缺少必要参数 'file_path'"  # "Error: missing required parameter 'file_path'"
            )]

        file_path = arguments["file_path"]
        # Process the file path, including conversion of mounted directories
        file_path = self.process_file_path(file_path)

        if not os.path.exists(file_path):
            return [types.TextContent(
                type="text",
                text=f"错误: 文件不存在: {file_path}"  # "Error: file does not exist: ..."
            )]

        if not file_path.lower().endswith('.pdf'):
            return [types.TextContent(
                type="text",
                text=f"错误: 文件不是PDF格式: {file_path}"  # "Error: file is not in PDF format: ..."
            )]

        mode = arguments.get("mode", "full")

        # Any value other than "quick" falls through to the full parse
        if mode == "quick":
            return await self._quick_preview_pdf(file_path)
        else:
            return await self._full_parse_pdf(file_path)
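The order of checks in the handler can be reproduced in isolation, which makes it easy to see which error a given call would hit first. This is a sketch under assumptions: `precheck` and `select_mode` are illustrative names, the English messages are paraphrases, and the `process_file_path` step is left out because its implementation is not shown here.

```python
import os
from typing import Any, Dict, Optional


def precheck(arguments: Dict[str, Any]) -> Optional[str]:
    """Return an error message, or None when the arguments pass every check.

    Mirrors the order of checks in execute(): presence of 'file_path',
    then existence on disk, then the '.pdf' extension (case-insensitive).
    """
    if "file_path" not in arguments:
        return "missing required parameter 'file_path'"
    file_path = arguments["file_path"]
    if not os.path.exists(file_path):
        return f"file does not exist: {file_path}"
    if not file_path.lower().endswith(".pdf"):
        return f"file is not a PDF: {file_path}"
    return None


def select_mode(arguments: Dict[str, Any]) -> str:
    """Any value other than 'quick' falls through to 'full', as in execute()."""
    return "quick" if arguments.get("mode", "full") == "quick" else "full"
```

Two consequences follow from the handler as written: the format check looks only at the file name, not at the file contents, and a mode outside the schema's enum (for example `"fast"`) runs the full parse rather than producing an error.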
  • Helper function implementing full PDF parsing: extracts text and images per page using PyMuPDF, performs OCR on images, encodes images as base64.
    async def _full_parse_pdf(self, file_path: str) -> List[types.TextContent | types.ImageContent | types.EmbeddedResource]:
        """
        Fully parse a PDF file, extracting text and image content.
        """
        results = []

        try:
            # Use PyMuPDF to extract text and images
            doc = fitz.open(file_path)

            # Add file information
            results.append(types.TextContent(
                type="text",
                text=f"文件名: {os.path.basename(file_path)}\n页数: {doc.page_count}\n---"  # "File name: ... / Page count: ..."
            ))

            # Process each page
            for page_num in range(doc.page_count):
                page = doc[page_num]

                # Extract text
                text = page.get_text()
                if text.strip():
                    results.append(types.TextContent(
                        type="text",
                        text=f"第{page_num + 1}页:\n{text}\n---"  # "Page N: ..."
                    ))

                # Extract images
                image_list = page.get_images()
                if image_list:
                    results.append(types.TextContent(
                        type="text",
                        text=f"第{page_num + 1}页包含{len(image_list)}张图片"  # "Page N contains K image(s)"
                    ))

                    # Process the images on this page
                    skipped_images = 0
                    successful_images = 0

                    for img_idx, img_info in enumerate(image_list):
                        try:
                            xref = img_info[0]
                            base_image = doc.extract_image(xref)
                            image_bytes = base_image["image"]

                            # Get the image MIME type and check whether it is supported
                            mime_type = self._get_image_mime_type(image_bytes)
                            supported_mime_types = ["image/jpeg", "image/png", "image/gif", "image/webp"]

                            # Skip the image if its format is unsupported
                            if mime_type not in supported_mime_types:
                                skipped_images += 1
                                continue

                            # Add the image OCR result
                            image_analysis = await self._analyze_image(image_bytes)
                            results.append(types.TextContent(
                                type="text",
                                text=f"第{page_num + 1}页 图片{successful_images + 1}分析结果:\n{image_analysis}\n---"  # "Page N, image M analysis result: ..."
                            ))

                            # Add the image content: return the image itself rather than only the OCR text
                            image_base64 = self._encode_image_base64(image_bytes)
                            results.append(types.ImageContent(
                                type="image",
                                data=image_base64,
                                mimeType=mime_type
                            ))

                            successful_images += 1
                        except Exception:
                            # Catch all exceptions without interrupting processing
                            skipped_images += 1

                    # If any images were skipped, add a brief note
                    if skipped_images > 0:
                        results.append(types.TextContent(
                            type="text",
                            text=f"注意: 第{page_num + 1}页有 {skipped_images} 张图片因格式问题已跳过处理。"  # "Note: K image(s) on page N were skipped due to format issues."
                        ))

            doc.close()
            return results

        except Exception as e:
            error_details = traceback.format_exc()
            return [types.TextContent(
                type="text",
                text=f"错误: 完整解析PDF时发生错误: {str(e)}\n{error_details}"  # "Error: an error occurred during full PDF parsing: ..."
            )]
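The full-parse helper relies on `_get_image_mime_type` and `_encode_image_base64`, whose bodies are not shown on this page. One plausible way to write such helpers, assuming detection by leading magic bytes, is sketched below; the function names and the detection strategy are assumptions, not the framework's actual code.

```python
import base64
from typing import Optional


def sniff_image_mime_type(image_bytes: bytes) -> Optional[str]:
    """Guess a MIME type from leading magic bytes; None if unrecognized."""
    if image_bytes.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if image_bytes.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if image_bytes[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    # WebP is a RIFF container: "RIFF" + 4-byte size + "WEBP".
    if image_bytes[:4] == b"RIFF" and image_bytes[8:12] == b"WEBP":
        return "image/webp"
    return None


def encode_image_base64(image_bytes: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(image_bytes).decode("ascii")
```

With helpers of this shape, an image in any other format (BMP, TIFF, JPEG 2000 and so on) yields `None`, which is not in `supported_mime_types`, so the image would be counted as skipped.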
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions two parsing modes but doesn't describe what '快速预览' (quick preview) or '完整解析' (full parsing) entail in practice (e.g., speed differences, output format, error handling, or resource usage). For a tool that processes files, this lack of detail on behavior, permissions, or potential side effects is a significant gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence in Chinese that directly states the tool's function and key feature (two parsing modes). It's front-loaded with the core purpose and avoids any redundant or unnecessary information, making it highly concise and well-structured for quick understanding.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of annotations and output schema, the description is incomplete for a file-processing tool. It doesn't explain what the tool returns (e.g., text, structured data, images), error conditions, or operational constraints. While the schema covers inputs well, the overall context for an agent to use this tool effectively is insufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
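Although the description does not say what the tool returns, the implementation reference shows a flat list of text items and image items, with images carried as base64 data plus a MIME type. A caller could separate the two kinds roughly as follows. This sketch uses plain dicts as stand-ins for the content objects; the field names `type`, `text`, `data` and `mimeType` are taken from the constructor calls in the code above, while `split_results` is a hypothetical helper.

```python
import base64
from typing import Any, Dict, List, Tuple


def split_results(items: List[Dict[str, Any]]) -> Tuple[List[str], List[Tuple[str, bytes]]]:
    """Split a result list into text blocks and (mime_type, raw_bytes) images."""
    texts: List[str] = []
    images: List[Tuple[str, bytes]] = []
    for item in items:
        if item.get("type") == "text":
            texts.append(item["text"])
        elif item.get("type") == "image":
            # Image payloads are base64-encoded strings; decode to raw bytes.
            images.append((item["mimeType"], base64.b64decode(item["data"])))
    return texts, images
```

Error conditions arrive in the same channel: the handler reports failures as ordinary text items rather than raising, so a caller that needs to detect errors has to inspect the text.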

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with both parameters well-documented in the schema (file_path and mode with enum values). The description adds minimal value beyond the schema by mentioning the two modes but doesn't provide additional context like performance implications or use cases for each mode. This meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as '解析PDF文件内容' (parse PDF file content), which is a specific verb+resource combination. It distinguishes itself from siblings like parse_csv or parse_excel by specifying PDF format, though it doesn't explicitly differentiate from parse_file which might handle multiple formats. The description is not tautological and accurately reflects the tool's function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions two modes ('快速预览' - quick preview and '完整解析' - complete parsing), which implies usage contexts, but doesn't provide explicit guidance on when to use this tool versus alternatives like parse_file or other parsing tools. There's no mention of prerequisites, limitations, or comparative scenarios with sibling tools, leaving usage decisions largely to inference.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/aigo666/mcp-framework'
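
The same request can be made from Python with the standard library. This is a sketch only: the URL pattern is taken from the curl command above, while the `Accept` header and the assumption that the endpoint returns a JSON object are not confirmed by this page.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_server_request(owner: str, name: str) -> urllib.request.Request:
    """Build the GET request for one server entry in the MCP directory."""
    url = f"{API_BASE}/{owner}/{name}"
    return urllib.request.Request(url, method="GET", headers={"Accept": "application/json"})


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> dict:
    """Send the request; assumes the endpoint responds with a JSON object."""
    with urllib.request.urlopen(build_server_request(owner, name), timeout=timeout) as resp:
        return json.load(resp)
```

For the server on this page, the call would be `fetch_server("aigo666", "mcp-framework")`.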

If you have feedback or need assistance with the MCP directory API, please join our Discord server.