DOCX MCP Server

README.md•5.29 kB

# DOCX MCP Server A comprehensive Model Context Protocol (MCP) server for processing Microsoft Word (.docx) documents with full formatting support. ## Features This MCP server provides advanced DOCX document processing capabilities using the powerful `mammoth` library: - **Text Extraction**: Extract plain text with word count - **HTML Conversion**: Convert to HTML with preserved formatting - **Structure Analysis**: Analyze document structure, headings, and formatting elements - **Image Extraction**: Extract embedded images (as base64 or save to files) - **Markdown Conversion**: Convert to Markdown format - **Rich Formatting Support**: Handles bold, italic, lists, headings, and more ## Available Tools ### 1. `extract_text` Extract plain text content from a DOCX file. **Parameters:** - `file_path` (string): Path to the .docx file **Returns:** - Plain text content - Processing messages - Word count ### 2. `convert_to_html` Convert DOCX file to HTML with formatting preserved. **Parameters:** - `file_path` (string): Path to the .docx file - `include_styles` (boolean, optional): Include inline styles (default: true) **Returns:** - HTML content with formatting - Processing messages - Warnings and errors ### 3. `analyze_structure` Analyze document structure, headings, and formatting elements. **Parameters:** - `file_path` (string): Path to the .docx file **Returns:** - Document statistics (characters, words, paragraphs, headings) - Structure analysis (headings with levels) - Formatting analysis (bold, italic, lists count) - Processing messages ### 4. `extract_images` Extract and list images from a DOCX file. **Parameters:** - `file_path` (string): Path to the .docx file - `output_dir` (string, optional): Directory to save extracted images **Returns:** - Total image count - Image details (src, alt text, base64 status) - Output directory information - Processing messages ### 5. `convert_to_markdown` Convert DOCX file to Markdown format. **Parameters:** - `file_path` (string): Path to the .docx file **Returns:** - Markdown content - Word count - Processing messages ## Installation ```bash npm install npm run build ``` ## Usage The server runs on stdio and communicates via JSON-RPC 2.0 protocol. ### Example Usage with MCP Client ```json { "jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": { "name": "analyze_structure", "arguments": { "file_path": "/path/to/document.docx" } } } ``` ### Example Usage with Roo ```json { "file_path": "/path/to/document.docx" } ``` ## Supported Features - ✅ **Text Extraction**: Plain text with word counting - ✅ **Rich Formatting**: Bold, italic, underline, strikethrough - ✅ **Document Structure**: Headings (H1-H6), paragraphs - ✅ **Lists**: Ordered and unordered lists with items - ✅ **Images**: Extraction as base64 or file export - ✅ **Tables**: Basic table structure (via HTML conversion) - ✅ **Links**: Hyperlinks preservation - ✅ **Styles**: Custom style mapping support - ✅ **Error Handling**: Comprehensive error reporting - ✅ **Multiple Formats**: HTML, Markdown, plain text output ## Advanced Features ### Custom Style Mapping The `convert_to_html` tool supports custom style mapping for better semantic HTML output: ```javascript // Example style mappings "p[style-name='Heading 1'] => h1:fresh" "r[style-name='Strong'] => strong" "r[style-name='Emphasis'] => em" ``` ### Image Handling - **Base64 Embedding**: Images can be embedded as base64 data URLs - **File Export**: Images can be extracted to a specified directory - **Metadata**: Alt text and content type preservation ### Document Analysis Provides comprehensive document analysis including: - Character and word counts - Paragraph and heading counts - Formatting element statistics - Document structure hierarchy ## Development Install dependencies: ```bash npm install ``` Build the server: ```bash npm run build ``` For development with auto-rebuild: ```bash npm run watch ``` ## Installation for Claude Desktop To use with Claude Desktop, add the server config: On MacOS: `~/Library/Application Support/Claude/claude_desktop_config.json` On Windows: `%APPDATA%/Claude/claude_desktop_config.json` ```json { "mcpServers": { "docx-format-server": { "command": "/path/to/docx-format-server/build/index.js" } } } ``` ## Dependencies - `@modelcontextprotocol/sdk`: MCP protocol implementation - `mammoth`: Advanced DOCX processing library - `zod`: Schema validation - `typescript`: TypeScript support ## Error Handling All tools include comprehensive error handling with detailed error messages for: - File not found errors - Invalid file format - Processing errors - Permission issues ## Debugging Since MCP servers communicate over stdio, debugging can be challenging. We recommend using the [MCP Inspector](https://github.com/modelcontextprotocol/inspector), which is available as a package script: ```bash npm run inspector ``` The Inspector will provide a URL to access debugging tools in your browser. ## Version History - **v0.2.0**: Complete rewrite with mammoth library, added 5 comprehensive tools - **v0.1.0**: Basic text extraction with docx-parser (deprecated) ## License ISC License

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/zeph-gh/Docx-Mcp-Server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server