parse_file
Extract text and data from PDF, Word, Excel, CSV, and Markdown files to process document content within the MCP Development Framework.
Instructions
Parses file contents; supports PDF, Word, Excel, CSV, and Markdown formats.
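Here is a minimal sketch of what invoking this tool looks like on the wire. It assumes the JSON-RPC 2.0 `tools/call` request shape defined by the Model Context Protocol specification. The helper `build_parse_file_call` is hypothetical, and the path is the placeholder used in the input schema. An MCP client SDK would normally build this message for you.

```python
import json
from typing import Any, Dict


def build_parse_file_call(file_path: str, request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request for parse_file (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "parse_file",
            # file_path is the tool's only argument, and it is required.
            "arguments": {"file_path": file_path},
        },
    }


print(json.dumps(build_parse_file_call("/path/to/document.pdf"), indent=2))
```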
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | Local path to the file, e.g. `/path/to/document.pdf` | |
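The handler checks only that `file_path` is present. It does not check that the value is a string. The following minimal sketch validates arguments against the declared schema. It enforces only the two keywords this schema uses, `required` and `type: "string"`. `validate_arguments` is a hypothetical helper, not part of the tool. A full JSON Schema validator, such as the `jsonschema` package, would cover the rest of the specification.

```python
from typing import Any, Dict, List

# Copy of the tool's input schema (description translated to English).
input_schema: Dict[str, Any] = {
    "type": "object",
    "required": ["file_path"],
    "properties": {
        "file_path": {
            "type": "string",
            "description": "Local path to the file, e.g. '/path/to/document.pdf'",
        }
    },
}


def validate_arguments(arguments: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return error messages; an empty list means the arguments pass.

    Checks only `required` and `type: "string"`; this is not a full validator.
    """
    errors: List[str] = []
    for key in schema.get("required", []):
        if key not in arguments:
            errors.append(f"missing required parameter '{key}'")
    for key, spec in schema.get("properties", {}).items():
        if key in arguments and spec.get("type") == "string" and not isinstance(arguments[key], str):
            errors.append(f"parameter '{key}' must be a string")
    return errors


print(validate_arguments({"file_path": 42}, input_schema))
```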
Implementation Reference
- mcp_tool/tools/file_tool.py:24-36 (registration): Registers the `parse_file` tool by applying the `@ToolRegistry.register` decorator to the `FileTool` class and assigning the tool's name.

  ```python
  @ToolRegistry.register
  class FileTool(BaseTool):
      """
      General-purpose file tool that picks the appropriate handler based on the file extension.

      Supported file types:
      - PDF files (.pdf)
      - Word documents (.doc, .docx)
      - Excel files (.xls, .xlsx, .xlsm)
      - CSV files (.csv)
      - Markdown files (.md)
      """
      name = "parse_file"
  ```
- mcp_tool/tools/file_tool.py:38-47 (schema): Input schema defining the required `file_path` parameter.

  ```python
  input_schema = {
      "type": "object",
      "required": ["file_path"],
      "properties": {
          "file_path": {
              "type": "string",
              # "Local path to the file, e.g. '/path/to/document.pdf'"
              "description": "文件的本地路径,例如'/path/to/document.pdf'",
          }
      },
  }
  ```
- mcp_tool/tools/file_tool.py:58-109 (handler): The handler that executes the tool. It validates the input, processes the file path, determines the file type from its extension, delegates to a specialized sub-tool (PdfTool, WordTool, etc.), and catches errors.

  ```python
  # Relies on module-level imports not shown in this excerpt:
  # os, traceback, typing (Any, Dict, List), and the MCP SDK's `types` module.
  async def execute(self, arguments: Dict[str, Any]) -> List[types.TextContent | types.ImageContent | types.EmbeddedResource]:
      """
      Parse the file's contents.

      Args:
          arguments: Argument dictionary; must contain the 'file_path' key.

      Returns:
          List of parse results.
      """
      if "file_path" not in arguments:
          # "Error: missing required parameter 'file_path'"
          return [types.TextContent(
              type="text",
              text="错误: 缺少必要参数 'file_path'"
          )]

      file_path = arguments["file_path"]
      # Process the file path, including translation of mounted-directory paths
      file_path = self.process_file_path(file_path)

      if not os.path.exists(file_path):
          # "Error: file does not exist: ..."
          return [types.TextContent(
              type="text",
              text=f"错误: 文件不存在: {file_path}"
          )]

      # Get the file extension, lower-cased
      file_ext = os.path.splitext(file_path)[1].lower()

      try:
          # Pick the handler tool based on the file extension
          if file_ext == '.pdf':
              return await self.pdf_tool.execute(arguments)
          elif file_ext in ['.doc', '.docx']:
              return await self.word_tool.execute(arguments)
          elif file_ext in ['.xls', '.xlsx', '.xlsm']:
              return await self.excel_tool.execute(arguments)
          elif file_ext == '.csv':
              return await self.csv_tool.execute(arguments)
          elif file_ext == '.md':
              return await self.markdown_tool.execute(arguments)
          else:
              # "Error: unsupported file type: ..."
              return [types.TextContent(
                  type="text",
                  text=f"错误: 不支持的文件类型: {file_ext}"
              )]
      except Exception as e:
          error_details = traceback.format_exc()
          # "Error: an error occurred while processing the file: ..." followed by the traceback
          return [types.TextContent(
              type="text",
              text=f"错误: 处理文件时发生错误: {str(e)}\n{error_details}"
          )]
  ```
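The following self-contained sketch shows how the registration decorator and the extension dispatch fit together. This reference does not include the internals of `BaseTool` and `ToolRegistry`, so the versions here are hypothetical minimal stand-ins. `FileToolSketch` and its `_run_sub_tool` stub, which only reports which sub-tool would run, are also illustrative. Unlike the real handler, the sketch skips path processing and the file-existence check so that it runs on any path. It also swaps the if/elif chain for a dictionary lookup.

```python
import asyncio
import os
from typing import Any, Dict, List, Type


class BaseTool:
    """Hypothetical minimal base class; the real BaseTool is not shown in this reference."""

    name: str = ""

    async def execute(self, arguments: Dict[str, Any]) -> List[str]:
        raise NotImplementedError


class ToolRegistry:
    """Hypothetical registry mapping each tool's `name` to its class."""

    _tools: Dict[str, Type[BaseTool]] = {}

    @classmethod
    def register(cls, tool_cls: Type[BaseTool]) -> Type[BaseTool]:
        # Runs once, when the decorated class is defined; returns the class unchanged.
        if not tool_cls.name:
            raise ValueError(f"{tool_cls.__name__} must set a non-empty 'name'")
        cls._tools[tool_cls.name] = tool_cls
        return tool_cls

    @classmethod
    def create(cls, name: str) -> BaseTool:
        # Look up a registered class by tool name and instantiate it.
        return cls._tools[name]()


# Extension -> sub-tool label, mirroring the if/elif chain in the handler.
EXTENSION_TO_TOOL: Dict[str, str] = {
    ".pdf": "pdf_tool",
    ".doc": "word_tool",
    ".docx": "word_tool",
    ".xls": "excel_tool",
    ".xlsx": "excel_tool",
    ".xlsm": "excel_tool",
    ".csv": "csv_tool",
    ".md": "markdown_tool",
}


@ToolRegistry.register
class FileToolSketch(BaseTool):
    name = "parse_file"

    async def _run_sub_tool(self, label: str, arguments: Dict[str, Any]) -> List[str]:
        # Stub standing in for PdfTool.execute, WordTool.execute, etc.
        return [f"{label}: {arguments['file_path']}"]

    async def execute(self, arguments: Dict[str, Any]) -> List[str]:
        if "file_path" not in arguments:
            return ["Error: missing required parameter 'file_path'"]
        # Lower-case the extension so "REPORT.XLSX" dispatches like "report.xlsx".
        file_ext = os.path.splitext(arguments["file_path"])[1].lower()
        label = EXTENSION_TO_TOOL.get(file_ext)
        if label is None:
            return [f"Error: unsupported file type: {file_ext}"]
        return await self._run_sub_tool(label, arguments)


tool = ToolRegistry.create("parse_file")
print(asyncio.run(tool.execute({"file_path": "/data/REPORT.XLSX"})))
# → ['excel_tool: /data/REPORT.XLSX']
```

Keeping the extensions in one dictionary means supporting a new format takes a single new entry rather than another branch.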