OfficeReader-MCP
A Model Context Protocol (MCP) server that converts Microsoft Office documents (Word, Excel, PowerPoint) to Markdown format with intelligent image extraction and optimization.
Features
Multi-Format Support: Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt)
Intelligent Image Processing: Automatic extraction and optimization with WebP compression
Format Preservation: Maintains document structure including headings, tables, lists, and formatting
Metadata Extraction: Access document properties (author, title, creation date, etc.)
Efficient Caching: Smart caching system for quick reuse of converted documents
Cross-Platform: Works on Windows, macOS, and Linux
Supported Formats
Format | Extensions | Features |
Word | .docx, .doc | Text formatting, headings, lists, tables, images |
Excel | .xlsx, .xls | Multi-sheet support, tables, charts, embedded images |
PowerPoint | .pptx, .ppt | Slides, text boxes, images, speaker notes, tables |
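The test suite covers file type detection, but this README does not describe how it works. The following is a minimal sketch, assuming the format is chosen from the extension in the table above and then cross-checked against the file's first bytes. The OOXML formats (.docx, .xlsx, .pptx) are ZIP archives, while the legacy binary formats (.doc, .xls, .ppt) are OLE2 compound files. The function and constant names are hypothetical.

```python
from pathlib import Path

# Extension -> format family, taken from the Supported Formats table.
FORMAT_BY_EXTENSION = {
    ".docx": "word", ".doc": "word",
    ".xlsx": "excel", ".xls": "excel",
    ".pptx": "powerpoint", ".ppt": "powerpoint",
}

ZIP_MAGIC = b"PK\x03\x04"  # OOXML containers (.docx, .xlsx, .pptx)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .doc, .xls, .ppt


def detect_format(path: str, header: bytes) -> str:
    """Return the format family, rejecting files whose header does not match."""
    ext = Path(path).suffix.lower()
    family = FORMAT_BY_EXTENSION.get(ext)
    if family is None:
        raise ValueError(f"Unsupported extension: {ext!r}")
    # OOXML extensions end in "x"; the legacy binary extensions do not.
    expected = ZIP_MAGIC if ext.endswith("x") else OLE2_MAGIC
    if not header.startswith(expected):
        raise ValueError(f"{path} does not look like a valid {ext} file")
    return family
```

Passing the first 8 bytes of the file as `header` is enough. The signature check catches renamed or corrupt files before a parser fails on them.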
Installation
Prerequisites
Python 3.10 or higher
Claude Desktop or Claude Code
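Step 1: Install the Package
From the root of the cloned repository, run pip install -e . to install the server together with all of its dependencies.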
Step 2: Configure Claude
For Claude Desktop
Add to your Claude Desktop config file:
Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
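Claude Desktop lists MCP servers under a top-level "mcpServers" object in this file, with one entry per server naming the command that launches it. This README does not show the launch command for OfficeReader-MCP, so the helper below takes the command as input. It is a minimal sketch that only merges that one entry and keeps any existing settings. The helper name is hypothetical.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str, args: list[str]) -> dict:
    """Add or replace one "mcpServers" entry, keeping every other setting."""
    config: dict = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)
    # Create the "mcpServers" object if the file does not have one yet.
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": args}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

For example, call add_mcp_server(config_path, "officereader", "python", ["-m", "officereader_mcp"]) with config_path set to the file for your platform. The server name and module path in that call are hypothetical placeholders. Replace them with the entry point your installation provides.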
For Claude Code
Add to your Claude Code settings:
Windows: %LOCALAPPDATA%\claude-code\settings.json
macOS/Linux: ~/.config/claude-code/settings.json
Step 3: Restart Claude
Restart Claude Desktop or Claude Code to load the MCP server.
Quick Start
After installation, you can use OfficeReader-MCP directly in your conversations with Claude. Ask it to convert, read, or inspect an Office document, and give the document's absolute path.
Available Tools
1. convert_document
Convert any supported Office document to Markdown format.
Parameters:
file_path (required): Absolute path to the document
extract_images (optional, default: true): Extract embedded images
image_format (optional, default: "file"): How to handle images:
  "file": Save images to disk (recommended)
  "base64": Embed images as base64 in the Markdown
  "both": Save to disk and embed
output_name (optional): Custom name for output files
Example:
2. read_converted_markdown
Read the full content of a previously converted markdown file.
Parameters:
markdown_path (required): Path to the Markdown file
3. list_conversions
List all cached document conversions with details.
4. clear_cache
Clear all cached conversions to free up disk space.
5. get_document_metadata
Extract metadata from a document without full conversion (faster).
Parameters:
file_path (required): Path to the document
Example:
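Metadata extraction can skip full conversion because OOXML files (.docx, .xlsx, .pptx) keep their core properties in a small docProps/core.xml part inside the ZIP container. Below is a minimal standard-library sketch of reading that part. It is not necessarily how this server does it. The legacy .doc, .xls, and .ppt formats store properties differently and are not handled.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the OOXML core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
FIELDS = {
    "title": "dc:title",
    "author": "dc:creator",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "last_modified_by": "cp:lastModifiedBy",
}


def read_core_properties(source) -> dict:
    """Read core properties without touching the document body."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    result = {}
    for key, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            result[key] = element.text
    return result
```

read_core_properties accepts a path or a binary file object. Only fields present in the file appear in the result.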
6. get_supported_formats
Get list of all supported file formats and extensions.
Output Structure
Converted documents are organized in the cache directory.
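The changelog describes hash-based file identification for the cache, but this README does not reproduce the folder layout. The following is a minimal sketch under assumptions. The key is a truncated SHA-256 of the file contents, so a moved or renamed copy maps to the same entry while an edited file gets a new one. The folder name (file stem plus hash) is hypothetical.

```python
import hashlib
from pathlib import Path


def cache_key(file_path: Path, chunk_size: int = 1 << 20) -> str:
    """Content hash of the file, so moved or renamed copies share one key."""
    digest = hashlib.sha256()
    with file_path.open("rb") as handle:
        # Read in 1 MiB chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # 16 hex characters (64 bits) keeps folder names short; this truncation is an assumption.
    return digest.hexdigest()[:16]


def cache_entry_dir(cache_dir: Path, file_path: Path) -> Path:
    """Hypothetical per-document folder named stem_hash inside cache_dir."""
    return cache_dir / f"{file_path.stem}_{cache_key(file_path)}"
```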
Image Optimization
Images are automatically optimized to reduce file size while maintaining quality:
Max Dimensions: 1920×1080 pixels (configurable)
Format: WebP (preferred) or PNG/JPEG fallback
Quality: 80% for photos, 85% for JPEG, lossless PNG for graphics with transparency
Typical Compression: 50-80% size reduction
Smart Detection: Automatically distinguishes between photos and graphics
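The sizing and format rules above can be sketched as two small functions. This is an illustration under assumptions, not the project's code. It reads "80% for photos" as a WebP quality of 80, and it uses transparency as the only photo/graphic signal, whereas the server's smart detection may look at more. It returns options in the keyword form Pillow's Image.save accepts, but it does not call Pillow.

```python
def fit_within(width: int, height: int,
               max_width: int = 1920, max_height: int = 1080) -> tuple[int, int]:
    """Scale down (never up) to fit the box, keeping the aspect ratio."""
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def choose_encoding(has_alpha: bool, webp_available: bool = True) -> tuple[str, dict]:
    """Return (format, save options) following the rules listed above."""
    if has_alpha:
        # Graphics with transparency: lossless PNG.
        return "PNG", {"optimize": True}
    if webp_available:
        # Preferred format for photos: WebP at quality 80.
        return "WEBP", {"quality": 80}
    # Fallback when WebP encoding is unavailable: JPEG at quality 85.
    return "JPEG", {"quality": 85}
```

With Pillow, you would pass the size from fit_within to Image.resize, then save using the format and options from choose_encoding.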
Technical Details
Architecture
Dependencies
Package | Version | Purpose |
mcp | >=1.0.0 | Model Context Protocol SDK |
python-docx | >=1.1.0 | DOCX file parsing and manipulation |
 | >=1.6.0 | DOC/DOCX to HTML conversion (fallback) |
Pillow | >=10.0.0 | Image processing and optimization |
 | >=0.11.0 | HTML to Markdown conversion |
openpyxl | >=3.1.0 | Excel file parsing |
python-pptx | >=0.6.21 | PowerPoint file parsing |
All dependencies are automatically installed when you run pip install -e .
Testing
Run Tests
Test Coverage
The test suite verifies:
Module imports and initialization
Converter functionality for all formats
Image extraction and optimization
File type detection
Cache management
Metadata extraction
Configuration
OfficeReader-MCP supports multiple configuration methods to customize cache locations and behavior.
Quick Configuration (Recommended)
1. Copy the example config file:
   cp config.example.json config.json
2. Edit config.json to set your cache directory:
   {
     "cache_dir": "D:/MyDocuments/OfficeReaderCache",
     "image_optimization": {
       "enabled": true,
       "max_dimension": 1920,
       "quality": 80
     }
   }
The config file is loaded automatically on startup.
For detailed configuration options, see CONFIG.md.
Environment Variables
Variable | Description | Default |
| Directory for cached conversions | System temp directory |
Note: Environment variables take priority over config file settings.
Usage Examples
Converting Excel with Multiple Sheets
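This README does not show the Markdown produced for workbooks. The following is a minimal sketch, assuming each sheet becomes a level-2 heading with the sheet name followed by a pipe table whose first row is the header. The cell values would come from a reader such as openpyxl, which this self-contained sketch does not call. The function names are hypothetical.

```python
def rows_to_markdown(rows: list[list[object]]) -> str:
    """Render the first row as the header and the rest as table rows."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def cell(value: object) -> str:
        text = "" if value is None else str(value)
        # Escape pipes and flatten line breaks so a cell cannot break the table.
        return text.replace("|", "\\|").replace("\n", " ")

    def line(row: list[object]) -> str:
        padded = list(row) + [None] * (width - len(row))  # pad ragged rows
        return "| " + " | ".join(cell(v) for v in padded) + " |"

    out = [line(rows[0]), "| " + " | ".join(["---"] * width) + " |"]
    out += [line(row) for row in rows[1:]]
    return "\n".join(out)


def workbook_to_markdown(sheets: dict[str, list[list[object]]]) -> str:
    """One '## sheet name' section per sheet, in workbook order."""
    return "\n\n".join(f"## {name}\n\n{rows_to_markdown(rows)}"
                       for name, rows in sheets.items())
```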
Extracting PowerPoint Content
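For presentations, here is a minimal sketch, assuming one section per slide with text as bullets and speaker notes as a block quote. The Slide dataclass is a hypothetical stand-in for what a reader such as python-pptx would provide (slide shapes and the notes slide). The output layout is illustrative, not the server's actual format.

```python
from dataclasses import dataclass, field


@dataclass
class Slide:
    """Hypothetical stand-in for the data read from one slide."""
    title: str = ""
    texts: list[str] = field(default_factory=list)
    notes: str = ""


def slides_to_markdown(slides: list[Slide]) -> str:
    """One '## Slide N: title' section per slide, text as bullets, notes as a quote."""
    sections = []
    for number, slide in enumerate(slides, start=1):
        heading = f"## Slide {number}" + (f": {slide.title}" if slide.title else "")
        lines = [heading]
        bullets = [f"- {text.strip()}" for text in slide.texts if text.strip()]
        if bullets:
            lines += ["", *bullets]
        notes = slide.notes.strip()
        if notes:
            # Prefix every notes line so multi-line notes stay inside the quote.
            lines += ["", "> **Speaker notes:** " + notes.replace("\n", "\n> ")]
        sections.append("\n".join(lines))
    return "\n\n".join(sections)
```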
Batch Processing
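To convert a whole folder, you first need to collect the files to send to convert_document. The following is a minimal sketch, assuming you script this yourself rather than asking Claude. It resolves absolute paths because the tool requires them, and it skips the "~$" lock files Office leaves next to open documents. The function name is hypothetical.

```python
from pathlib import Path

SUPPORTED = {".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt"}


def find_documents(folder: Path, recursive: bool = True) -> list[Path]:
    """Collect supported Office files as sorted absolute paths."""
    pattern = "**/*" if recursive else "*"
    found = [
        path.resolve()
        for path in folder.glob(pattern)
        if path.is_file()
        and path.suffix.lower() in SUPPORTED
        and not path.name.startswith("~$")  # Office lock/owner files
    ]
    return sorted(found)
```

You can then pass each returned path as file_path in one convert_document call.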
Troubleshooting
"Module not found" Error
Configuration Not Loading
Verify the config file location is correct
Check JSON syntax is valid (use a JSON validator)
Restart Claude Desktop or Claude Code completely
Check logs for error messages
Images Not Extracting
Possible causes:
Document contains linked images (not embedded)
Insufficient write permissions for cache directory
Image format not supported by the document library
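Solution: Embed the images in the source document instead of linking them, make sure the cache directory is writable, and then convert again.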
Encoding Issues
The converter uses UTF-8 encoding throughout. If you see garbled text:
Check the source document encoding
Ensure your terminal/console supports UTF-8
Try converting with different system locale settings
Changelog
v2.0.0 (2024-11)
Major Features:
Added Excel (.xlsx, .xls) support with multi-sheet conversion
Added PowerPoint (.pptx, .ppt) support with slide extraction
Implemented intelligent image optimization with WebP compression
Added unified OfficeConverter interface for all document types
Enhanced metadata extraction for all formats
Improvements:
Smart caching system with hash-based file identification
Lazy-loading of format-specific converters for better performance
Better error handling and validation
Comprehensive test suite for all formats
Tools:
Added get_supported_formats tool
Enhanced get_document_metadata for all formats
Improved list_conversions with detailed cache information
v1.0.0 (2024-09)
Initial release
Word document (.docx, .doc) conversion
Basic image extraction
MCP server implementation
Contributing
Contributions are welcome! Here's how you can help:
Report Bugs: Open an issue with details and steps to reproduce
Suggest Features: Describe your idea and use case
Submit Pull Requests:
1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to your branch (git push origin feature/amazing-feature)
5. Open a Pull Request
Development Setup
License
MIT License - see LICENSE file for details.
Author
Asunainlove
GitHub: @Asunainlove
Repository: office-reader-mcp
Issues: Report a bug
Acknowledgments
This project uses the following open-source libraries:
Model Context Protocol (MCP) by Anthropic
python-docx for Word processing
openpyxl for Excel processing
python-pptx for PowerPoint processing
Pillow for image processing
Support
If you find this project helpful, please:
⭐ Star the repository
🐛 Report bugs and issues
💡 Suggest new features
🔀 Contribute code improvements
Happy converting! 🚀