parse_file

Parses the contents of a PDF, Word, Excel, CSV, or Markdown file, given its local file path, and extracts the text.

Instructions

Parses file contents; supports PDF, Word, Excel, CSV, and Markdown formats.

Input Schema

Name	Required	Description	Default
`file_path`	Yes	Local path to the file, e.g. '/path/to/document.pdf'

Implementation Reference

mcp_tool/tools/file_tool.py:58-109 (handler)

The execute method of the FileTool class is the main handler for the 'parse_file' tool. It checks that file_path is present, resolves the path, verifies that the file exists, determines the file extension, and delegates to the matching specialized tool (pdf, word, excel, csv, markdown).

async def execute(self, arguments: Dict[str, Any]) -> List[types.TextContent | types.ImageContent | types.EmbeddedResource]:
    """
    Parse the contents of a file.
    
    Args:
        arguments: Argument dict; must contain the 'file_path' key
    
    Returns:
        List of parse results
    """
    if "file_path" not in arguments:
        return [types.TextContent(
            type="text",
            text="错误: 缺少必要参数 'file_path'"  # "Error: missing required parameter 'file_path'"
        )]
    
    file_path = arguments["file_path"]
    # Normalize the file path; supports translating mounted-directory paths
    file_path = self.process_file_path(file_path)
    
    if not os.path.exists(file_path):
        return [types.TextContent(
            type="text",
            text=f"错误: 文件不存在: {file_path}"  # "Error: file does not exist: ..."
        )]
    
    # Get the file extension (lowercased)
    file_ext = os.path.splitext(file_path)[1].lower()
    
    try:
        # Pick the handling tool based on the file extension
        if file_ext == '.pdf':
            return await self.pdf_tool.execute(arguments)
        elif file_ext in ['.doc', '.docx']:
            return await self.word_tool.execute(arguments)
        elif file_ext in ['.xls', '.xlsx', '.xlsm']:
            return await self.excel_tool.execute(arguments)
        elif file_ext == '.csv':
            return await self.csv_tool.execute(arguments)
        elif file_ext == '.md':
            return await self.markdown_tool.execute(arguments)
        else:
            return [types.TextContent(
                type="text",
                text=f"错误: 不支持的文件类型: {file_ext}"  # "Error: unsupported file type: ..."
            )]
    except Exception as e:
        error_details = traceback.format_exc()
        return [types.TextContent(
            type="text",
            text=f"错误: 处理文件时发生错误: {str(e)}\n{error_details}"  # "Error: an error occurred while processing the file: ..."
        )]
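
The extension check in the handler above can also be read as a lookup table. The following is a minimal sketch of that idea, not the project's code: `EXTENSION_TO_HANDLER` and `select_handler` are hypothetical names, and the string labels stand in for the `pdf_tool`, `word_tool`, `excel_tool`, `csv_tool`, and `markdown_tool` attributes that the real handler calls.

```python
import os
from typing import Dict, Optional

# Hypothetical table mirroring the if/elif chain in FileTool.execute.
# The labels stand in for the specialized tools the real handler delegates to.
EXTENSION_TO_HANDLER: Dict[str, str] = {
    ".pdf": "pdf",
    ".doc": "word",
    ".docx": "word",
    ".xls": "excel",
    ".xlsx": "excel",
    ".xlsm": "excel",
    ".csv": "csv",
    ".md": "markdown",
}


def select_handler(file_path: str) -> Optional[str]:
    """Return the handler label for file_path, or None if the type is unsupported."""
    # Same normalization as the handler: take the last extension, lowercased.
    file_ext = os.path.splitext(file_path)[1].lower()
    return EXTENSION_TO_HANDLER.get(file_ext)
```

Because the extension is lowercased before the lookup, `select_handler("/path/to/Report.PDF")` resolves to the PDF handler, while a path such as `notes.txt` yields `None`, which corresponds to the "unsupported file type" branch.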

mcp_tool/tools/file_tool.py:38-47 (schema)

The input_schema defines the required 'file_path' parameter (a string describing the local file path) for the parse_file tool.

input_schema = {
    "type": "object",
    "required": ["file_path"],
    "properties": {
        "file_path": {
            "type": "string",
            "description": "文件的本地路径，例如'/path/to/document.pdf'",  # "Local path to the file, e.g. '/path/to/document.pdf'"
        }
    },
}
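
To illustrate what the schema asks of a caller, here is a minimal sketch of a check against its `required` and `type` entries using only the standard library. `check_arguments` is a hypothetical helper, not part of the project; the handler shown earlier only tests whether the `file_path` key is present, and a full JSON Schema validator would cover far more than this.

```python
from typing import Any, Dict, List

# Trimmed copy of the schema above (description omitted for brevity).
INPUT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "required": ["file_path"],
    "properties": {"file_path": {"type": "string"}},
}


def check_arguments(arguments: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments pass."""
    problems: List[str] = []
    # Every key listed under "required" must be present.
    for key in schema.get("required", []):
        if key not in arguments:
            problems.append(f"missing required parameter '{key}'")
    # Only the "string" type is handled in this sketch.
    for key, spec in schema.get("properties", {}).items():
        if key in arguments and spec.get("type") == "string":
            if not isinstance(arguments[key], str):
                problems.append(f"parameter '{key}' must be a string")
    return problems
```

A call such as `check_arguments({"file_path": "/path/to/document.pdf"}, INPUT_SCHEMA)` returns an empty list, while an empty argument dict produces one "missing required parameter" entry.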

mcp_tool/tools/file_tool.py:24-50 (registration)

The FileTool class is decorated with @ToolRegistry.register, which registers it under the name 'parse_file' (line 36) and makes the tool available to the MCP server.

@ToolRegistry.register
class FileTool(BaseTool):
    """
    General-purpose file tool that picks a parser based on the file extension.
    Supported file types:
    - PDF files (.pdf)
    - Word documents (.doc, .docx)
    - Excel files (.xls, .xlsx, .xlsm)
    - CSV files (.csv)
    - Markdown files (.md)
    """
    
    name = "parse_file"
    description = "解析文件内容，支持PDF、Word、Excel、CSV和Markdown格式"  # "Parses file contents; supports PDF, Word, Excel, CSV, and Markdown formats"
    input_schema = {
        "type": "object",
        "required": ["file_path"],
        "properties": {
            "file_path": {
                "type": "string",
                "description": "文件的本地路径，例如'/path/to/document.pdf'",  # "Local path to the file, e.g. '/path/to/document.pdf'"
            }
        },
    }
    
    def __init__(self):
        """Initialize the individual file-processing tools"""

mcp_tool/tools/__init__.py:38-42 (registration)

The ToolRegistry class provides the @register decorator. When @ToolRegistry.register is applied to a class, it stores the class in the _tools dict, keyed by its name (e.g., 'parse_file'). The excerpt shows only the start of the class.
```
        return file_path


# Tool registry
class ToolRegistry:
```
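
Since the excerpt stops at the class header, the body of the decorator is not visible here. The following is a minimal sketch of how a registry of this kind might store classes by name, assuming only what the description states (a `_tools` dict keyed by the class's `name`). `SketchRegistry`, `get_tool`, and `DemoFileTool` are hypothetical names and do not come from the project.

```python
from typing import Dict


class SketchRegistry:
    """Hypothetical stand-in for ToolRegistry; not the project's code."""

    _tools: Dict[str, type] = {}

    @classmethod
    def register(cls, tool_class: type) -> type:
        # Key the class by its `name` attribute and return it unchanged,
        # so the decorated class can still be used normally.
        cls._tools[tool_class.name] = tool_class
        return tool_class

    @classmethod
    def get_tool(cls, name: str) -> type:
        # Look up a previously registered class by its tool name.
        return cls._tools[name]


@SketchRegistry.register
class DemoFileTool:
    name = "parse_file"
```

Registration happens as a side effect of defining the class, which is why importing the module that contains a decorated class is enough to make the tool discoverable.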

mcp_tool/tools/loader.py:12-23 (helper)

The load_tools() function auto-discovers and imports all tool modules (including file_tool.py), which triggers the @ToolRegistry.register decorators, registering 'parse_file' among other tools.

def load_tools() -> List[Type[BaseTool]]:
    """
    Automatically load every tool module in the tools directory.
    
    Returns:
        List[Type[BaseTool]]: The loaded tool classes
    """
    # Path of the current package
    package_path = os.path.dirname(__file__)
    
    # Iterate over all submodules
    for _, name, is_pkg in pkgutil.iter_modules([package_path]):
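
The excerpt above is cut off inside the loop, so the rest of load_tools() is not shown. As a rough sketch of the general discovery pattern it describes, the function below imports every module found directly inside a package, which is what would trigger any registration decorators in those modules. `import_submodules` is a hypothetical helper, and skipping sub-packages is an assumption rather than confirmed behavior of the project.

```python
import importlib
import pkgutil
from typing import List


def import_submodules(package_name: str) -> List[str]:
    """Import every plain module directly inside package_name; return their names."""
    package = importlib.import_module(package_name)
    imported: List[str] = []
    for _, name, is_pkg in pkgutil.iter_modules(package.__path__):
        if is_pkg:
            continue  # assumption: nested packages are not loaded
        # Importing the module runs its top-level code, including decorators.
        importlib.import_module(f"{package_name}.{name}")
        imported.append(name)
    return sorted(imported)
```

This sketch resolves the search path from the package's `__path__`, whereas the original derives it from `os.path.dirname(__file__)`; both feed a list of directories to `pkgutil.iter_modules`.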
