get_document_outline
Extract the hierarchical structure of a Microsoft Word document to quickly understand its organization, sections, and headings.
Instructions
Get the structure of a Word document.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| filename | Yes | Path to the Word document | |
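The handler signature types `filename` as `str`, so the input schema plausibly resembles the dictionary below. This is only a sketch: the schema the server actually publishes is not reproduced on this page and may carry extra fields, and `INPUT_SCHEMA` and `check_arguments` are hypothetical names introduced for illustration.

```python
from typing import Any, Dict, List

# Assumed shape of the input schema (one required string argument).
# The real published schema may include titles or descriptions as well.
INPUT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {"filename": {"type": "string"}},
    "required": ["filename"],
}


def check_arguments(arguments: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a tool-call argument object."""
    problems: List[str] = []
    # Every required argument must be present.
    for name in INPUT_SCHEMA["required"]:
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    # Every supplied argument must be known and of the declared type.
    for name, value in arguments.items():
        spec = INPUT_SCHEMA["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
        elif spec["type"] == "string" and not isinstance(value, str):
            problems.append(f"argument {name} must be a string")
    return problems


print(check_arguments({"filename": "report.docx"}))
print(check_arguments({}))
```

A client could run a check like this before sending a call, so that a missing `filename` is caught locally rather than by the server.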
Implementation Reference
- Main handler function for the get_document_outline tool. Ensures a .docx extension, calls the get_document_structure helper, and returns the outline as a JSON string.

      import json

      # ensure_docx_extension and get_document_structure are defined
      # elsewhere in the package.
      async def get_document_outline(filename: str) -> str:
          """Get the structure of a Word document.

          Args:
              filename: Path to the Word document
          """
          filename = ensure_docx_extension(filename)
          structure = get_document_structure(filename)
          return json.dumps(structure, indent=2)

- Core helper function that parses the Word document and extracts structured information about paragraphs (the first 100 characters of text plus the style name) and tables (row and column counts plus a preview of up to 3 x 3 cells, each truncated to 20 characters). On failure it returns a dictionary with a single "error" key instead of raising.

      import os
      from typing import Any, Dict

      from docx import Document  # provided by the python-docx package


      def get_document_structure(doc_path: str) -> Dict[str, Any]:
          """Get the structure of a Word document."""
          if not os.path.exists(doc_path):
              return {"error": f"Document {doc_path} does not exist"}

          try:
              doc = Document(doc_path)
              structure = {
                  "paragraphs": [],
                  "tables": []
              }

              # Get paragraphs: index, truncated text preview, style name
              for i, para in enumerate(doc.paragraphs):
                  structure["paragraphs"].append({
                      "index": i,
                      "text": para.text[:100] + ("..." if len(para.text) > 100 else ""),
                      "style": para.style.name if para.style else "Normal"
                  })

              # Get tables: index, dimensions, and a small content preview
              for i, table in enumerate(doc.tables):
                  table_data = {
                      "index": i,
                      "rows": len(table.rows),
                      "columns": len(table.columns),
                      "preview": []
                  }

                  # Get sample of table data (at most 3 rows by 3 columns)
                  max_rows = min(3, len(table.rows))
                  for row_idx in range(max_rows):
                      row_data = []
                      max_cols = min(3, len(table.columns))
                      for col_idx in range(max_cols):
                          try:
                              cell_text = table.cell(row_idx, col_idx).text
                              row_data.append(cell_text[:20] + ("..." if len(cell_text) > 20 else ""))
                          except IndexError:
                              row_data.append("N/A")
                      table_data["preview"].append(row_data)

                  structure["tables"].append(table_data)

              return structure
          except Exception as e:
              return {"error": f"Failed to get document structure: {str(e)}"}

- word_document_server/main.py:114-117 (registration). MCP tool registration using the FastMCP @mcp.tool() decorator. Thin synchronous wrapper that delegates to the async implementation in document_tools.

      @mcp.tool()
      def get_document_outline(filename: str):
          """Get the structure of a Word document."""
          return document_tools.get_document_outline(filename)

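The helper returns every paragraph with its style name rather than a nested heading tree, so a caller who wants only the headings has to filter the result. The sketch below shows one way to do that. It assumes heading paragraphs use styles named "Heading 1", "Heading 2", and so on, which may not hold for documents with custom styles. `headings_from_outline` is a hypothetical helper, and `sample` is hand-written data in the shape the helper produces, not output from a real document.

```python
import json
from typing import Any, Dict, List


def headings_from_outline(outline_json: str) -> List[Dict[str, Any]]:
    """Pick heading paragraphs out of the JSON string the tool returns."""
    outline = json.loads(outline_json)
    # The helper reports failures as {"error": "..."} instead of raising.
    if "error" in outline:
        raise ValueError(outline["error"])

    headings: List[Dict[str, Any]] = []
    for para in outline.get("paragraphs", []):
        # Assumption: heading styles are named "Heading <level>".
        prefix, _, level = para.get("style", "").partition(" ")
        if prefix == "Heading" and level.isdigit():
            headings.append({
                "index": para["index"],
                "level": int(level),
                "text": para["text"],
            })
    return headings


# Hand-written sample in the shape produced by get_document_structure.
sample = json.dumps({
    "paragraphs": [
        {"index": 0, "text": "Introduction", "style": "Heading 1"},
        {"index": 1, "text": "Some body text.", "style": "Normal"},
        {"index": 2, "text": "Scope", "style": "Heading 2"},
    ],
    "tables": [],
})

# Print the headings indented by level.
for h in headings_from_outline(sample):
    print("  " * (h["level"] - 1) + h["text"])
```

Checking for the "error" key first matters because the helper signals a missing or unreadable file through its return value, so a caller that skips the check would silently see an empty outline.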