Glama

get_document_outline

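Extract the structure of a Word document to analyze its headings, sections, and content organization. Input a filename (local path or URL) to retrieve the document outline as JSON. The outline lists each paragraph with its style name and each table with its dimensions and a short preview.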
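The outline this tool returns is flat. Per the implementation reference below, each paragraph entry carries only `index`, `text` (a preview truncated to 100 characters), and `style`. A caller that wants a heading hierarchy can rebuild it from the style names. The following is a minimal sketch under stated assumptions:

- `build_outline` is a hypothetical helper, not part of the server.
- Headings are assumed to use Word's built-in "Heading 1" to "Heading 9" style names. Documents with custom heading styles would need a different match.

```python
import json
import re
from typing import Any, Dict, List, Tuple

# Assumption: headings use Word's built-in "Heading N" style names.
HEADING_RE = re.compile(r"^Heading (\d)$")


def build_outline(outline_json: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: nest heading paragraphs from the tool's JSON by level."""
    structure = json.loads(outline_json)
    root: List[Dict[str, Any]] = []
    # Stack of (level, node) pairs for the chain of currently open headings.
    stack: List[Tuple[int, Dict[str, Any]]] = []
    for para in structure.get("paragraphs", []):  # an {"error": ...} result yields []
        match = HEADING_RE.match(para.get("style", ""))
        if not match:
            continue  # body paragraphs are skipped; only headings form the tree
        level = int(match.group(1))
        node = {"title": para["text"], "index": para["index"], "children": []}
        # Close any open headings at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        (stack[-1][1]["children"] if stack else root).append(node)
        stack.append((level, node))
    return root


sample = json.dumps({"paragraphs": [
    {"index": 0, "text": "Intro", "style": "Heading 1"},
    {"index": 1, "text": "Some body text", "style": "Normal"},
    {"index": 2, "text": "Background", "style": "Heading 2"},
    {"index": 3, "text": "Results", "style": "Heading 1"},
], "tables": []})
outline = build_outline(sample)
print([h["title"] for h in outline])  # ['Intro', 'Results']
print(outline[0]["children"][0]["title"])  # Background
```

A stack keeps the pass linear: each heading closes every open heading at its own level or deeper, then attaches to whatever remains open.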

Instructions

Get the structure of a Word document.

Input Schema

Name      Required  Description                        Default
filename  Yes       Path or URL to the Word document   (none)
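Under the Model Context Protocol, a client invokes a tool with a JSON-RPC 2.0 `tools/call` request. The request's `params` carry the tool `name` and an `arguments` object matching the input schema above.

The sketch below builds that request with only the standard library. `make_tools_call` and the filename are hypothetical. In practice an MCP client library sends this message over the transport for you.

```python
import json
from typing import Any, Dict


def make_tools_call(request_id: int, filename: str) -> str:
    """Hypothetical helper: build a JSON-RPC 2.0 tools/call request for this tool."""
    request: Dict[str, Any] = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_document_outline",
            # "filename" is the only (required) argument; a local path or a URL.
            "arguments": {"filename": filename},
        },
    }
    return json.dumps(request)


# "reports/q3.docx" is a hypothetical path.
print(make_tools_call(1, "reports/q3.docx"))
```
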

Implementation Reference

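  • The primary handler function for the get_document_outline MCP tool. For local paths it ensures the filename has a .docx extension (URLs are passed through unchanged). It then calls the get_document_structure utility and returns the resulting structure as a JSON string.
    import json
    
    # is_url, ensure_docx_extension and get_document_structure are helpers from
    # the server's own utility modules (their import paths are not shown here).
    
    async def get_document_outline(filename: str) -> str: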
        """Get the structure of a Word document from local path or URL.
    
        Args:
            filename: Path or URL to the Word document
        """
        # Only add .docx extension for local paths, not URLs
        if not is_url(filename):
            filename = ensure_docx_extension(filename)
    
        structure = get_document_structure(filename)
        return json.dumps(structure, indent=2)
  • MCP tool registration in the main server file. Wraps the document_tools.get_document_outline function and registers it with FastMCP using the @mcp.tool() decorator.
    @mcp.tool()
    async def get_document_outline(filename: str):
        """Get the structure of a Word document."""
        return await document_tools.get_document_outline(filename)
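  • Core utility function used by the tool handler. It loads the document from a local path or a URL and records three things for each paragraph: its index, a text preview (the first 100 characters), and its style name. For each table it records the row and column counts plus a preview of up to 3×3 cells (20 characters per cell). It returns the result as a dictionary.
    from typing import Any, Dict
    
    # load_document_from_path_or_url and cleanup_temp_file are server helpers
    # (their import paths are not shown here).
    
    def get_document_structure(doc_path: str) -> Dict[str, Any]: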
        """Get the structure of a Word document from local path or URL."""
        doc, error, is_temp, temp_path = load_document_from_path_or_url(doc_path)
    
        if error:
            return {"error": error}
    
        try:
            structure = {
                "paragraphs": [],
                "tables": []
            }
    
            # Get paragraphs
            for i, para in enumerate(doc.paragraphs):
                structure["paragraphs"].append({
                    "index": i,
                    "text": para.text[:100] + ("..." if len(para.text) > 100 else ""),
                    "style": para.style.name if para.style else "Normal"
                })
    
            # Get tables
            for i, table in enumerate(doc.tables):
                table_data = {
                    "index": i,
                    "rows": len(table.rows),
                    "columns": len(table.columns),
                    "preview": []
                }
    
                # Get sample of table data
                max_rows = min(3, len(table.rows))
                for row_idx in range(max_rows):
                    row_data = []
                    max_cols = min(3, len(table.columns))
                    for col_idx in range(max_cols):
                        try:
                            cell_text = table.cell(row_idx, col_idx).text
                            row_data.append(cell_text[:20] + ("..." if len(cell_text) > 20 else ""))
                        except IndexError:
                            row_data.append("N/A")
                    table_data["preview"].append(row_data)
    
                structure["tables"].append(table_data)
    
            return structure
        except Exception as e:
            return {"error": f"Failed to get document structure: {str(e)}"}
        finally:
            # Clean up temp file if needed
            if is_temp and temp_path:
                cleanup_temp_file(temp_path)
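The handler above calls `is_url` and `ensure_docx_extension`, but their bodies are not shown on this page. The sketch below uses hypothetical stand-ins to illustrate the branching, under two assumptions:

- A URL is an `http`/`https` string with a host.
- The extension check is case-insensitive and only appends `.docx` when it is missing.

The real helpers may differ. URLs are skipped because a download link such as one ending in a query string need not end in `.docx`, and appending an extension would break it.

```python
from urllib.parse import urlparse


def is_url(path: str) -> bool:
    """Hypothetical stand-in: treat http(s) strings with a host as URLs."""
    parsed = urlparse(path)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def ensure_docx_extension(filename: str) -> str:
    """Hypothetical stand-in: append '.docx' unless it is already present."""
    return filename if filename.lower().endswith(".docx") else filename + ".docx"


def normalize_filename(filename: str) -> str:
    """Mirror the handler's logic: only local paths get the extension check."""
    if not is_url(filename):
        filename = ensure_docx_extension(filename)
    return filename


print(normalize_filename("notes"))  # notes.docx
print(normalize_filename("https://example.com/download?id=7"))  # unchanged
```
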
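The preview rules in `get_document_structure` are easy to check in isolation:

- Text is cut at a fixed length, with "..." appended only when something was cut.
- Tables are sampled from the top-left corner, at most 3 rows by 3 columns.
- A missing cell becomes "N/A".

The sketch below reproduces those rules on plain lists of strings rather than python-docx `Table` objects. `preview`, `table_preview`, and the `n_cols` parameter (standing in for `len(table.columns)`) are hypothetical names for illustration.

```python
from typing import List


def preview(text: str, limit: int) -> str:
    """Truncate to `limit` characters, appending '...' only when text was cut."""
    return text[:limit] + ("..." if len(text) > limit else "")


def table_preview(rows: List[List[str]], n_cols: int, max_rows: int = 3,
                  max_cols: int = 3, cell_limit: int = 20) -> List[List[str]]:
    """Sample the top-left cells, mirroring the utility's loop and 'N/A' fallback."""
    sample = []
    for row in rows[:max_rows]:
        row_data = []
        for col_idx in range(min(max_cols, n_cols)):
            # A ragged row shorter than the column count gets "N/A".
            cell = preview(row[col_idx], cell_limit) if col_idx < len(row) else "N/A"
            row_data.append(cell)
        sample.append(row_data)
    return sample


rows = [["Name", "Quarterly revenue total (USD)"], ["A"]]
print(table_preview(rows, n_cols=2))  # [['Name', 'Quarterly revenue to...'], ['A', 'N/A']]
```
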


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/franlealp1/mcp-word'
