get_document_outline
Extract the structure of a Word document to analyze its headings, sections, and content organization. Given a filename (a local path or a URL), the tool returns the document's paragraphs and tables as structured JSON.
Instructions
Get the structure of a Word document.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| filename | Yes | Path or URL to the Word document | |
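
As a rough illustration of the schema above, the sketch below builds the JSON-RPC 2.0 `tools/call` request that an MCP client would send to invoke this tool. The helper name `build_tool_call` and the file name `report.docx` are hypothetical and only serve the example; in practice an MCP client library would normally assemble this message for you.

```python
import json


def build_tool_call(filename: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for get_document_outline.

    `build_tool_call` is a hypothetical helper for illustration only.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_document_outline",
            # The only input the schema defines: a local path or URL.
            "arguments": {"filename": filename},
        },
    }


if __name__ == "__main__":
    # "report.docx" is a placeholder file name.
    print(json.dumps(build_tool_call("report.docx"), indent=2))
```

The `filename` argument is the only field the tool reads; everything else in the message is protocol framing.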
Implementation Reference
- The primary handler function for the `get_document_outline` MCP tool. It ensures a `.docx` extension for local files, calls the `get_document_structure` utility, and returns the structure as JSON.

  ```python
  async def get_document_outline(filename: str) -> str:
      """Get the structure of a Word document from local path or URL.

      Args:
          filename: Path or URL to the Word document
      """
      # Only add .docx extension for local paths, not URLs
      if not is_url(filename):
          filename = ensure_docx_extension(filename)

      structure = get_document_structure(filename)
      return json.dumps(structure, indent=2)
  ```
- word_document_server/main.py:542-545 (registration). MCP tool registration in the main server file. It wraps the `document_tools.get_document_outline` function and registers it with FastMCP using the `@mcp.tool()` decorator.

  ```python
  @mcp.tool()
  async def get_document_outline(filename: str):
      """Get the structure of a Word document."""
      return await document_tools.get_document_outline(filename)
  ```
- Core utility function that loads the document (handling local paths and URLs) and returns a structured dictionary used by the tool handler. It records each paragraph's index, preview text, and style, and each table's dimensions and preview data.

  ```python
  def get_document_structure(doc_path: str) -> Dict[str, Any]:
      """Get the structure of a Word document from local path or URL."""
      doc, error, is_temp, temp_path = load_document_from_path_or_url(doc_path)
      if error:
          return {"error": error}

      try:
          structure = {
              "paragraphs": [],
              "tables": []
          }

          # Get paragraphs
          for i, para in enumerate(doc.paragraphs):
              structure["paragraphs"].append({
                  "index": i,
                  "text": para.text[:100] + ("..." if len(para.text) > 100 else ""),
                  "style": para.style.name if para.style else "Normal"
              })

          # Get tables
          for i, table in enumerate(doc.tables):
              table_data = {
                  "index": i,
                  "rows": len(table.rows),
                  "columns": len(table.columns),
                  "preview": []
              }

              # Get sample of table data
              max_rows = min(3, len(table.rows))
              for row_idx in range(max_rows):
                  row_data = []
                  max_cols = min(3, len(table.columns))
                  for col_idx in range(max_cols):
                      try:
                          cell_text = table.cell(row_idx, col_idx).text
                          row_data.append(cell_text[:20] + ("..." if len(cell_text) > 20 else ""))
                      except IndexError:
                          row_data.append("N/A")
                  table_data["preview"].append(row_data)

              structure["tables"].append(table_data)

          return structure
      except Exception as e:
          return {"error": f"Failed to get document structure: {str(e)}"}
      finally:
          # Clean up temp file if needed
          if is_temp and temp_path:
              cleanup_temp_file(temp_path)
  ```
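
Because the tool returns a flat list of paragraphs rather than a nested heading tree, a caller that wants an outline has to derive it from the `style` field. The following is a minimal sketch of one way to do that, assuming the document uses the default English built-in style names ("Heading 1", "Heading 2", and so on). `SAMPLE_OUTPUT` is invented data shaped like the dictionary built by `get_document_structure`, and `extract_headings` is a hypothetical helper, not part of the server.

```python
import json

# Hypothetical tool output, shaped like the result of get_document_structure.
SAMPLE_OUTPUT = json.dumps({
    "paragraphs": [
        {"index": 0, "text": "Quarterly Report", "style": "Title"},
        {"index": 1, "text": "Introduction", "style": "Heading 1"},
        {"index": 2, "text": "This report covers...", "style": "Normal"},
        {"index": 3, "text": "Scope", "style": "Heading 2"},
    ],
    "tables": [
        {"index": 0, "rows": 4, "columns": 2, "preview": [["Metric", "Value"]]},
    ],
})


def extract_headings(tool_output: str) -> list[tuple[int, int, str]]:
    """Return (level, paragraph index, text) for each 'Heading N' paragraph."""
    data = json.loads(tool_output)
    # The utility reports failures as {"error": "..."} instead of raising.
    if "error" in data:
        raise ValueError(data["error"])

    headings = []
    for para in data["paragraphs"]:
        # Assumes style names of the form "Heading <number>".
        prefix, _, level = para["style"].partition(" ")
        if prefix == "Heading" and level.isdigit():
            headings.append((int(level), para["index"], para["text"]))
    return headings


if __name__ == "__main__":
    # Indent each heading by its level to print a simple outline.
    for level, index, text in extract_headings(SAMPLE_OUTPUT):
        print("  " * (level - 1) + f"{text} (paragraph {index})")
```

Documents that use custom or localized style names would need a different matching rule. Heading text longer than 100 characters arrives truncated with a trailing "...", since the preview is cut at that length.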