get_document_outline
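Extract the structure of a Word document, listing each paragraph with its style name (so headings can be identified by styles such as "Heading 1") and summarizing each table, to analyze or navigate its content efficiently.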
Instructions
Get the structure of a Word document.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| filename | Yes | Path to the Word document | |
Input Schema (JSON Schema)
```json
{
  "properties": {
    "filename": {
      "title": "Filename",
      "type": "string"
    }
  },
  "required": [
    "filename"
  ],
  "type": "object"
}
```
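An MCP client invokes the tool with a JSON-RPC `tools/call` request whose `arguments` object must satisfy the schema above: a single required string, `filename`. The sketch below builds such a request by hand to show its shape. The helper name `build_tools_call` and the file name `report.docx` are illustrative assumptions; a real client would normally let an MCP SDK construct and send the message.

```python
import json


def build_tools_call(request_id: int, filename: str) -> str:
    """Build a JSON-RPC tools/call request for get_document_outline (hypothetical helper).

    The "arguments" object follows the input schema: one required
    string property named "filename".
    """
    if not isinstance(filename, str):
        raise TypeError("filename must be a string")
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_document_outline",
            "arguments": {"filename": filename},
        },
    }
    return json.dumps(request)


# Hypothetical file name, used only for illustration.
print(build_tools_call(1, "report.docx"))
```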
Implementation Reference
- Core handler function for the `get_document_outline` MCP tool. It ensures the filename has a `.docx` extension, calls the `get_document_structure` helper to extract the outline data, and serializes the result to JSON.

```python
import json

# ensure_docx_extension and get_document_structure are helpers defined elsewhere in the project.


async def get_document_outline(filename: str) -> str:
    """Get the structure of a Word document.

    Args:
        filename: Path to the Word document
    """
    filename = ensure_docx_extension(filename)
    structure = get_document_structure(filename)
    return json.dumps(structure, indent=2)
```
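The extension check is delegated to `ensure_docx_extension`, whose source is not shown here. The following is a minimal sketch of the behavior the description implies, assuming a case-insensitive check that appends `.docx` only when it is missing; the project's real helper may differ (for example, by comparing case-sensitively).

```python
def ensure_docx_extension(filename: str) -> str:
    """Hypothetical sketch: return filename with a ".docx" suffix guaranteed.

    Assumption: the comparison ignores case, so "REPORT.DOCX" is left unchanged.
    """
    if not filename.lower().endswith(".docx"):
        return filename + ".docx"
    return filename


print(ensure_docx_extension("report"))       # report.docx
print(ensure_docx_extension("report.docx"))  # report.docx
```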
- `word_document_server/main.py:114-117` (registration): FastMCP registration of the `get_document_outline` tool using the `@mcp.tool()` decorator. This sync wrapper delegates execution to the async handler in `document_tools.py`.

```python
@mcp.tool()
def get_document_outline(filename: str):
    """Get the structure of a Word document."""
    return document_tools.get_document_outline(filename)
```
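Because the registered wrapper is a plain `def` that returns whatever the `async def` handler returns, calling it yields a coroutine object rather than a JSON string; the handler body only runs once something awaits that coroutine. Whether this works therefore depends on the framework awaiting awaitable results. The stdlib-only sketch below illustrates the pattern; `async_handler`, `sync_wrapper`, and the `run_tool` dispatcher are hypothetical stand-ins, not FastMCP's actual code.

```python
import asyncio
import inspect
import json


async def async_handler(filename: str) -> str:
    """Stand-in for the async handler in document_tools.py (hypothetical)."""
    return json.dumps({"filename": filename})


def sync_wrapper(filename: str):
    """Plain def that delegates, like the registered tool function.

    Calling it does not run the handler body; it returns a coroutine object.
    """
    return async_handler(filename)


async def run_tool(fn, **arguments):
    """Hypothetical dispatcher: call the tool, then await the result if it is awaitable."""
    result = fn(**arguments)
    if inspect.isawaitable(result):
        result = await result
    return result


print(asyncio.run(run_tool(sync_wrapper, filename="report.docx")))
```

A dispatcher that skipped the `isawaitable` check would pass the raw coroutine object on to the client instead of the JSON string.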
- Key helper utility that analyzes the Word document structure. For each paragraph it records the index, a text preview truncated to 100 characters, and the style name; for each table it records the row and column counts and a preview of at most the first 3 rows by 3 columns, with each cell truncated to 20 characters. The handler serializes the returned dictionary into the JSON outline. If the file does not exist or cannot be parsed, the helper returns a dictionary with a single `"error"` key instead of raising.

```python
import os
from typing import Any, Dict

from docx import Document  # python-docx


def get_document_structure(doc_path: str) -> Dict[str, Any]:
    """Get the structure of a Word document."""
    if not os.path.exists(doc_path):
        return {"error": f"Document {doc_path} does not exist"}

    try:
        doc = Document(doc_path)
        structure = {
            "paragraphs": [],
            "tables": []
        }

        # Get paragraphs
        for i, para in enumerate(doc.paragraphs):
            structure["paragraphs"].append({
                "index": i,
                "text": para.text[:100] + ("..." if len(para.text) > 100 else ""),
                "style": para.style.name if para.style else "Normal"
            })

        # Get tables
        for i, table in enumerate(doc.tables):
            table_data = {
                "index": i,
                "rows": len(table.rows),
                "columns": len(table.columns),
                "preview": []
            }

            # Get sample of table data (at most 3 rows x 3 columns)
            max_rows = min(3, len(table.rows))
            for row_idx in range(max_rows):
                row_data = []
                max_cols = min(3, len(table.columns))
                for col_idx in range(max_cols):
                    try:
                        cell_text = table.cell(row_idx, col_idx).text
                        row_data.append(cell_text[:20] + ("..." if len(cell_text) > 20 else ""))
                    except IndexError:
                        row_data.append("N/A")
                table_data["preview"].append(row_data)

            structure["tables"].append(table_data)

        return structure
    except Exception as e:
        return {"error": f"Failed to get document structure: {str(e)}"}
```
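Since the outline reports each paragraph's style name rather than a dedicated heading list, a client that wants to navigate by headings has to filter on the style. The sketch below does that on the JSON text the tool returns, assuming headings use built-in style names beginning with "Heading" (custom heading styles would be missed). The function `list_headings` and the sample outline are hypothetical, written only to illustrate the output shape.

```python
import json
from typing import Dict, List


def list_headings(outline_json: str) -> List[Dict[str, object]]:
    """Return paragraph entries whose style name starts with "Heading".

    Assumes the shape produced by get_document_structure:
    {"paragraphs": [{"index", "text", "style"}, ...], "tables": [...]},
    or {"error": "..."} on failure.
    """
    outline = json.loads(outline_json)
    if "error" in outline:
        raise ValueError(outline["error"])
    return [
        p for p in outline.get("paragraphs", [])
        if str(p.get("style", "")).startswith("Heading")
    ]


# Hand-written sample outline, for illustration only.
sample = json.dumps({
    "paragraphs": [
        {"index": 0, "text": "Annual Report", "style": "Title"},
        {"index": 1, "text": "Introduction", "style": "Heading 1"},
        {"index": 2, "text": "This report covers...", "style": "Normal"},
        {"index": 3, "text": "Scope", "style": "Heading 2"},
    ],
    "tables": [],
})

for p in list_headings(sample):
    print(p["index"], p["style"], p["text"])
```

The paragraph `index` values returned here can then be used to locate each heading's position in the document.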