get_document_metadata

Extract metadata from Office documents without full conversion. Retrieve properties like title, author, and creation date from Word, Excel, and PowerPoint files.

Instructions

Get metadata from an Office document without full conversion.

Extracts document properties like title, author, creation date, etc. Faster than full conversion when you only need metadata.

Supported formats: .docx, .doc, .xlsx, .xls, .pptx, .ppt
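To illustrate why a metadata lookup can be cheaper than a full conversion, here is a minimal standard-library sketch. It is not the server's implementation (which uses python-docx, openpyxl, and python-pptx). It assumes an OOXML file (.docx, .xlsx, or .pptx) whose core properties sit at the conventional `docProps/core.xml` location; legacy .doc, .xls, and .ppt files are not zip packages and would not work here. The names `read_core_properties` and `make_sample_package` are hypothetical helpers for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in OOXML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def read_core_properties(source):
    """Read a few core properties from an OOXML file (path or file-like object).

    Only the small core.xml part is parsed; the document body is never loaded.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))

    def text(tag):
        node = root.find(tag, NS)
        return node.text if node is not None and node.text else ""

    return {
        "title": text("dc:title"),
        "author": text("dc:creator"),
        "created": text("dcterms:created"),
        "modified": text("dcterms:modified"),
    }


def make_sample_package():
    """Build an in-memory zip containing only a core.xml part (demo data)."""
    core_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<cp:coreProperties'
        ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
        ' xmlns:dc="http://purl.org/dc/elements/1.1/"'
        ' xmlns:dcterms="http://purl.org/dc/terms/">'
        "<dc:title>Quarterly Report</dc:title>"
        "<dc:creator>Example Author</dc:creator>"
        "<dcterms:created>2024-01-15T09:30:00Z</dcterms:created>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer
```

Calling `read_core_properties(make_sample_package())` returns a dictionary with the title, author, and creation timestamp from the sample, and an empty string for the missing `modified` field.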

Input Schema


Name	Required	Description	Default
`file_path`	Yes	Absolute path to the Office document. Supported: .docx, .doc, .xlsx, .xls, .pptx, .ppt	(none)
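As a rough sketch of how a client might invoke the tool, the snippet below builds an MCP `tools/call` request carrying the single `file_path` argument. The helper name `build_metadata_request` and the example path are hypothetical, and the client-side absolute-path check is an assumption added for illustration; the schema itself only declares the argument as a string.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath


def build_metadata_request(file_path, request_id=1):
    """Build a JSON-RPC request dict for the get_document_metadata tool."""
    # The schema description asks for an absolute path; accept either
    # POSIX-style (/home/...) or Windows-style (C:\...) absolute paths.
    is_absolute = (
        PurePosixPath(file_path).is_absolute()
        or PureWindowsPath(file_path).is_absolute()
    )
    if not is_absolute:
        raise ValueError(f"file_path must be absolute: {file_path!r}")

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_document_metadata",
            "arguments": {"file_path": file_path},
        },
    }


# Example: serialize a request for a hypothetical document path.
request_json = json.dumps(build_metadata_request("/home/user/report.docx"))
```

In practice an MCP client library would normally construct and send this message; the sketch only shows the shape of the arguments.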

Implementation Reference

src/officereader_mcp/server.py:305-377 (handler)

Handler for the get_document_metadata tool. It reads the file_path argument, determines the Office document type, loads the file with the matching library (python-docx, openpyxl, or python-pptx), extracts the core properties, and returns them as JSON. A missing file_path or an unsupported format produces a JSON object with an "error" key instead.

elif name == "get_document_metadata":
    file_path = arguments.get("file_path")
    if not file_path:
        return [TextContent(
            type="text",
            text=f"{cache_notice}\n\n" + json.dumps({"error": "file_path is required"}, ensure_ascii=False)
        )]

    from .converter import get_file_type

    file_path_obj = Path(file_path)
    file_type = get_file_type(file_path_obj)

    metadata = {
        "file": file_path,
        "file_type": file_type,
        "cache_location": str(converter.cache_dir),
    }

    if file_type == "word":
        from docx import Document
        doc = Document(file_path)
        core_props = doc.core_properties
        metadata.update({
            "title": core_props.title or "",
            "author": core_props.author or "",
            "created": str(core_props.created) if core_props.created else "",
            "modified": str(core_props.modified) if core_props.modified else "",
            "last_modified_by": core_props.last_modified_by or "",
            "subject": core_props.subject or "",
            "keywords": core_props.keywords or "",
            "category": core_props.category or "",
            "comments": core_props.comments or "",
            "revision": core_props.revision,
        })
    elif file_type == "excel":
        from openpyxl import load_workbook
        wb = load_workbook(file_path, data_only=True)
        props = wb.properties
        metadata.update({
            "title": props.title or "",
            "creator": props.creator or "",
            "created": str(props.created) if props.created else "",
            "modified": str(props.modified) if props.modified else "",
            "sheet_count": len(wb.sheetnames),
            "sheet_names": wb.sheetnames,
        })
    elif file_type == "powerpoint":
        from pptx import Presentation
        prs = Presentation(file_path)
        core_props = prs.core_properties
        metadata.update({
            "title": core_props.title or "",
            "author": core_props.author or "",
            "created": str(core_props.created) if core_props.created else "",
            "modified": str(core_props.modified) if core_props.modified else "",
            "subject": core_props.subject or "",
            "slide_count": len(prs.slides),
        })
    else:
        return [TextContent(
            type="text",
            text=f"{cache_notice}\n\n" + json.dumps({
                "error": f"Unsupported file format: {file_path_obj.suffix}",
                "supported": get_supported_extensions()
            }, ensure_ascii=False)
        )]

    return [TextContent(
        type="text",
        text=f"{cache_notice}\n\n" + json.dumps(metadata, ensure_ascii=False, indent=2)
    )]
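Because the handler above prefixes its JSON with a cache notice and a blank line, a caller has to separate the two before decoding. The following is a minimal client-side sketch of one way to do that; `parse_tool_text` is a hypothetical helper, and since the content of `cache_notice` is not shown in the excerpt, the function scans for the first position where a JSON object decodes cleanly through to the end of the text rather than assuming a fixed separator.

```python
import json


def parse_tool_text(text):
    """Split a tool response into (notice, payload).

    The notice is whatever precedes the JSON object; the payload is the
    decoded dictionary (metadata on success, or an object with "error").
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            payload, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # This brace was part of the notice, not the JSON body.
            start = text.find("{", start + 1)
            continue
        if not text[end:].strip():
            return text[:start].strip(), payload
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found in tool response")


def is_error(payload):
    """Both error responses in the handler use an "error" key."""
    return "error" in payload
```

A caller would typically check `is_error(payload)` first, then read fields such as `title` or `sheet_names` depending on `file_type`.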

src/officereader_mcp/server.py:139-157 (registration)

Registration of the get_document_metadata tool via the server.list_tools() decorator, defining the tool name, description, and an input schema that requires 'file_path'.

        Tool(
            name="get_document_metadata",
            description=f"""Get metadata from an Office document without full conversion.

Extracts document properties like title, author, creation date, etc.
Faster than full conversion when you only need metadata.

Supported formats: {supported_exts}""",
            inputSchema={
                "type": "object",
                "properties": {
                    "file_path": {
                        "type": "string",
                        "description": f"Absolute path to the Office document. Supported: {supported_exts}",
                    },
                },
                "required": ["file_path"],
            },
        ),

src/officereader_mcp/server.py:147-156 (schema)

Input schema for get_document_metadata tool, defining the required 'file_path' parameter.

inputSchema={
    "type": "object",
    "properties": {
        "file_path": {
            "type": "string",
            "description": f"Absolute path to the Office document. Supported: {supported_exts}",
        },
    },
    "required": ["file_path"],
},
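To make the schema's effect concrete, here is a small sketch of checking an arguments dictionary against it using only the standard library. This is an illustration, not how the server validates input (the handler shown earlier only tests whether `file_path` is present and non-empty). `INPUT_SCHEMA` copies the schema with a placeholder description, and `validate_arguments` is a hypothetical helper that covers just the `required` list and the `string` type.

```python
# Copy of the tool's input schema; the description text is abbreviated here.
INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "file_path": {
            "type": "string",
            "description": "Absolute path to the Office document.",
        },
    },
    "required": ["file_path"],
}


def validate_arguments(arguments, schema=INPUT_SCHEMA):
    """Return a list of problems; an empty list means the arguments pass."""
    errors = []

    # Every name listed under "required" must be present.
    for name in schema.get("required", []):
        if name not in arguments:
            errors.append(f"missing required argument: {name}")

    # Check declared string properties; unknown keys are ignored because
    # the schema does not forbid additional properties.
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{name} must be a string")

    return errors
```

For full JSON Schema support a dedicated validator library would be the usual choice; this sketch only mirrors the two constraints the schema actually states.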

OfficeReader-MCP