Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Word Document Reader MCP Server read the quarterly report at /docs/reports/q3-2024.docx and extract all tables".
That's it! The server will respond to your query, and you can continue using it as needed.
Word Document Reader MCP Server
A powerful Word document reading MCP server with table extraction, image OCR analysis, large document optimization, and intelligent caching.
Core Features
1. Document Content Extraction
Word document (.docx/.doc) text extraction
Support for mixed Chinese-English documents
Preserve original formatting and structure
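A .docx file is a ZIP archive. Its main body is stored as WordprocessingML in word/document.xml, where text sits in <w:t> runs inside <w:p> paragraphs. The minimal sketch below illustrates that idea with regular expressions. It is not this server's parser: it assumes the XML has already been unzipped, the function names are hypothetical, and it does not cover the legacy binary .doc format.

```javascript
// Minimal, regex-based sketch (hypothetical helpers, not the server's parser).
// Assumes `xml` is the already-unzipped contents of word/document.xml.

// Decode the five predefined XML entities.
function decodeXmlEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // decode &amp; last so "&amp;lt;" becomes "&lt;", not "<"
}

// Return one string per <w:p> paragraph, joining the <w:t> text runs inside it.
function extractParagraphs(xml) {
  const paragraphs = [];
  // `[\s>]` avoids matching <w:pPr> and skips self-closing <w:p/> elements.
  const paraRe = /<w:p[\s>][\s\S]*?<\/w:p>/g;
  for (const para of xml.match(paraRe) || []) {
    // Matches <w:t> and <w:t xml:space="preserve">, but not <w:tab/> or <w:tc>.
    const runs = [...para.matchAll(/<w:t(?:\s[^>]*)?>([\s\S]*?)<\/w:t>/g)];
    paragraphs.push(runs.map((m) => decodeXmlEntities(m[1])).join(""));
  }
  return paragraphs;
}
```

A production parser would use a real XML parser. It would also handle tabs, line breaks, fields, and text inside text boxes.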
2. Table Extraction
Automatically identify and extract tables from Word documents
Convert to structured data format
Preserve table row/column structure information
Support complex table parsing
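The sketch below shows one way to convert a table to structured data while keeping its row/column information. It assumes the table has already been read into a grid of cell strings, and it treats the first row as the header. The tableToRecords name and the output shape are hypothetical, not this server's actual format.

```javascript
// Hypothetical helper: turn a table already parsed into a grid of cell
// strings (an array of rows, each an array of cells) into structured records.
function tableToRecords(grid) {
  if (grid.length === 0) {
    return { headers: [], rows: [], rowCount: 0, columnCount: 0 };
  }
  // The widest row determines the column count (rows can be ragged).
  const columnCount = Math.max(...grid.map((row) => row.length));
  // Treat the first row as the header; blank header cells get a positional name.
  const headers = Array.from({ length: columnCount }, (_, i) => {
    const name = (grid[0][i] || "").trim();
    return name || `column_${i + 1}`;
  });
  // Map each body row to an object; missing cells (e.g. after merges) become "".
  const rows = grid.slice(1).map((row) =>
    Object.fromEntries(headers.map((h, i) => [h, (row[i] ?? "").trim()]))
  );
  return { headers, rows, rowCount: grid.length, columnCount };
}
```

In this sketch, duplicate header names overwrite each other. A fuller version would de-duplicate them and record merged-cell spans.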
3. Image OCR Analysis
Extract embedded images from Word documents
High-precision OCR recognition using Tesseract.js v5
Support mixed Chinese-English text recognition (95%+ accuracy)
Intelligent image preprocessing for better recognition
Support multiple image formats (JPG, PNG, GIF, BMP, WebP)
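An embedded image's format has to be known before it is handed to OCR (this README names Tesseract.js v5 for that step). One common way to find it is to check the image's leading signature bytes. The sketch below does this for the five formats listed above. The function name is hypothetical, and this is not necessarily how the server itself detects formats.

```javascript
// Sketch: identify an image format from its leading "magic" bytes.
// Returns "jpg" | "png" | "gif" | "bmp" | "webp", or null if unrecognized.
function detectImageFormat(buf) {
  // True if `bytes` appear in `buf` starting at `offset`.
  const startsWith = (bytes, offset = 0) =>
    buf.length >= offset + bytes.length &&
    bytes.every((b, i) => buf[offset + i] === b);

  if (startsWith([0xff, 0xd8, 0xff])) return "jpg";
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  if (startsWith([0x47, 0x49, 0x46, 0x38])) return "gif"; // "GIF8" (GIF87a / GIF89a)
  if (startsWith([0x42, 0x4d])) return "bmp"; // "BM"
  // WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
  if (startsWith([0x52, 0x49, 0x46, 0x46]) && startsWith([0x57, 0x45, 0x42, 0x50], 8)) {
    return "webp";
  }
  return null;
}
```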
4. Large Document Optimization
Automatic detection of large documents (>10 MB or >100 pages)
Parallel processing with worker threads to use multi-core CPUs
Chunked processing to avoid running out of memory
60%+ speed improvement
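The size thresholds and chunking described above can be sketched as follows. The function names are hypothetical, and 10 MB is assumed to mean 10 MiB. The round-robin assignment only shows how chunks could be spread across one worker per CPU core; it does not start any worker threads.

```javascript
const os = require("node:os");

// Thresholds from this README: "large" means >10 MB or >100 pages.
const LARGE_SIZE_BYTES = 10 * 1024 * 1024; // assumption: 10 MB taken as 10 MiB
const LARGE_PAGE_COUNT = 100;

function isLargeDocument({ sizeBytes, pageCount }) {
  return sizeBytes > LARGE_SIZE_BYTES || pageCount > LARGE_PAGE_COUNT;
}

// Split items (e.g. paragraphs) into fixed-size chunks that can be processed
// one at a time or in parallel.
function chunk(items, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize));
  }
  return chunks;
}

// Distribute chunks round-robin into one bucket per worker (default: per CPU core).
function assignChunks(chunks, workerCount = Math.max(1, os.cpus().length)) {
  const buckets = Array.from({ length: workerCount }, () => []);
  chunks.forEach((c, i) => buckets[i % workerCount].push(c));
  return buckets;
}
```

The chunkSize parameter mirrors the chunkSize setting mentioned under troubleshooting. Smaller chunks lower peak memory at the cost of more scheduling overhead.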
5. Intelligent Caching System
File system persistent caching
Smart cache invalidation based on file modification time
Cache statistics and management support
90%+ speed improvement for repeated document processing
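Below is a minimal sketch of persistent caching with invalidation based on modification time. It assumes one JSON file per entry in a cache directory, and the class and method names are hypothetical. The cache key includes the file's mtime and size, so any change to the document produces a cache miss.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical persistent cache: one JSON file per entry in `cacheDir`.
class DocumentCache {
  constructor(cacheDir) {
    this.cacheDir = cacheDir;
    this.stats = { hits: 0, misses: 0 };
    fs.mkdirSync(cacheDir, { recursive: true });
  }

  // The key covers path + mtime + size, so editing the file yields a new key
  // and the stale entry is simply never read again.
  keyFor(filePath) {
    const { mtimeMs, size } = fs.statSync(filePath);
    const raw = `${path.resolve(filePath)}|${mtimeMs}|${size}`;
    return crypto.createHash("sha256").update(raw).digest("hex");
  }

  entryPath(filePath) {
    return path.join(this.cacheDir, `${this.keyFor(filePath)}.json`);
  }

  // Return the cached result for the file's current version, or null.
  get(filePath) {
    const entry = this.entryPath(filePath);
    if (fs.existsSync(entry)) {
      this.stats.hits += 1;
      return JSON.parse(fs.readFileSync(entry, "utf8"));
    }
    this.stats.misses += 1;
    return null;
  }

  set(filePath, value) {
    fs.writeFileSync(this.entryPath(filePath), JSON.stringify(value));
  }

  // Remove all entries and reset the statistics.
  clear() {
    fs.rmSync(this.cacheDir, { recursive: true, force: true });
    fs.mkdirSync(this.cacheDir, { recursive: true });
    this.stats = { hits: 0, misses: 0 };
  }
}
```

Stale entries stay on disk until clear() is called. A fuller implementation would prune them periodically.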
6. Full-text Index Search
Millisecond-level search using an inverted index
Intelligent Chinese-English word segmentation
Relevance scoring and ranking
Document type filtering
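A small sketch can illustrate the inverted index idea: each term maps to the documents that contain it, along with a count of occurrences. Several parts are simplified and all names are hypothetical:

- Tokenization makes each CJK character its own token, rather than performing the intelligent segmentation described above.
- Scoring is plain term frequency, rather than TF-IDF or BM25.

```javascript
// Simplified tokenizer (not real word segmentation): runs of Latin letters or
// digits form one token, and each CJK ideograph is its own token.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+|[\u4e00-\u9fff]/g) || [];
}

class InvertedIndex {
  constructor() {
    this.postings = new Map(); // term -> Map(docId -> term frequency)
    this.docs = new Map(); // docId -> { type }
  }

  add(docId, text, type) {
    this.docs.set(docId, { type });
    for (const term of tokenize(text)) {
      if (!this.postings.has(term)) this.postings.set(term, new Map());
      const counts = this.postings.get(term);
      counts.set(docId, (counts.get(docId) || 0) + 1);
    }
  }

  // Score = sum of query-term frequencies; optionally keep only one document type.
  search(query, { type } = {}) {
    const scores = new Map();
    for (const term of tokenize(query)) {
      for (const [docId, count] of this.postings.get(term) || []) {
        if (type && this.docs.get(docId).type !== type) continue;
        scores.set(docId, (scores.get(docId) || 0) + count);
      }
    }
    return [...scores.entries()]
      .sort((a, b) => b[1] - a[1])
      .map(([docId, score]) => ({ docId, score }));
  }
}
```

A lookup only touches the posting lists for the query's terms, not every document. This is what makes inverted-index search fast.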
Installation and Usage
1. Install Dependencies: npm install
2. Start Server: npm start
3. Run Tests
MCP Tools
read_word_document: Read and analyze Word documents
search_documents: Full-text index search
get_cache_stats: Get cache statistics
clear_cache: Clear cache
list_stored_documents: List stored documents
get_stored_document: Get specific document content
clear_memory: Clear memory content
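MCP clients invoke these tools with a JSON-RPC 2.0 tools/call request. The sketch below builds such a request. The method name and the { name, arguments } shape follow the MCP specification. The argument names for read_word_document are hypothetical, because this README does not list the tools' input schemas.

```javascript
// Sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a tool.
// "tools/call" with params { name, arguments } follows the MCP specification.
let nextId = 1;

function toolCallRequest(name, args) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Over the stdio transport, each message is one line of JSON ending in "\n".
function toStdioLine(message) {
  return JSON.stringify(message) + "\n";
}

const request = toolCallRequest("read_word_document", {
  filePath: "/docs/reports/q3-2024.docx", // hypothetical parameter name
  extractTables: true, // hypothetical parameter name
});
process.stdout.write(toStdioLine(request));
```

A real client first completes the MCP initialize handshake. In practice, chat clients build these messages for you.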
Project Structure
Configuration
Edit the config.json file to customize server behavior.
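The config.json contents are not shown here, so the following is only a sketch of loading an optional config file over defaults. The chunkSize key is mentioned under troubleshooting in this README. The other keys and all of the default values are hypothetical.

```javascript
const fs = require("node:fs");

// Hypothetical defaults: only `chunkSize` is named elsewhere in this README;
// the other keys and every value here are illustrative assumptions.
const DEFAULT_CONFIG = {
  chunkSize: 50,
  enableParallel: true,
  cacheDir: ".cache",
};

// Read config.json if it exists and overlay it on the defaults, so every
// setting stays optional.
function loadConfig(configPath = "config.json") {
  if (!fs.existsSync(configPath)) return { ...DEFAULT_CONFIG };
  const userConfig = JSON.parse(fs.readFileSync(configPath, "utf8"));
  return { ...DEFAULT_CONFIG, ...userConfig };
}
```

The merge is shallow. Nested settings would need a deep merge.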
Testing
Test Framework
The tests use the Node.js built-in test framework and are organized into three levels:
Unit Tests: Test individual components and functions
Integration Tests: Test interactions between tools
End-to-End Tests: Test complete workflows
Running Tests
Test Coverage
Functional tests for all MCP tools
Complete cache system tests
Error handling and edge cases
Performance and concurrency tests
End-to-end workflow tests
Performance Metrics
Large Document Processing: 60%+ speed improvement (parallel processing)
Repeated Document Processing: 90%+ speed improvement (caching)
OCR Recognition Accuracy: 95%+ (image preprocessing)
Memory Usage Optimization: 40% reduction (streaming processing)
Search Response Time: <100ms (full-text index)
Security Considerations
Input file size limits
File type validation
Cache data isolation
Error handling and logging
Automatic temporary file cleanup
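As an example of input size limits and file type validation, the sketch below checks three things:

- the file extension;
- the file size;
- the leading signature bytes (a ZIP header for .docx, an OLE2 compound-file header for legacy .doc).

The 100 MB limit and the function name are assumptions; this README does not state the actual limit.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical cap: the README does not state the real size limit.
const MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024;

// Expected leading bytes for each accepted extension.
const SIGNATURES = {
  ".docx": Buffer.from("504b0304", "hex"), // ZIP local file header "PK\x03\x04"
  ".doc": Buffer.from("d0cf11e0a1b11ae1", "hex"), // OLE2 compound file header
};

// Throw if the file has an unexpected type or size; otherwise return its info.
function validateInputFile(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  const signature = SIGNATURES[ext];
  if (!signature) throw new Error(`Unsupported file type: ${ext || "(none)"}`);

  const { size } = fs.statSync(filePath);
  if (size > MAX_FILE_SIZE_BYTES) throw new Error(`File too large: ${size} bytes`);

  // Compare the leading bytes with the expected signature instead of trusting the extension.
  const header = Buffer.alloc(signature.length);
  const fd = fs.openSync(filePath, "r");
  try {
    fs.readSync(fd, header, 0, signature.length, 0);
  } finally {
    fs.closeSync(fd);
  }
  if (!header.equals(signature)) throw new Error(`Content does not match ${ext} signature`);
  return { ext, size };
}
```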
Version Compatibility
Backward Compatibility
Fully compatible with the original API
Existing tool functionality is unchanged
Configuration is optional, with reasonable defaults
A basic version is provided to ensure compatibility
System Requirements
Minimum Requirements:
Node.js 16+
4GB RAM
1GB disk space
Recommended Configuration:
Node.js 18+
8GB+ RAM
Multi-core CPU
SSD storage
Troubleshooting
Common Issues
Module Installation Failure
npm cache clean --force
npm install
OCR Recognition Failure
Ensure sufficient memory (8GB+ recommended)
Check supported image formats
Review error logs
Slow Large Document Processing
Enable parallel processing
Adjust chunkSize configuration
Use SSD storage
Insufficient Memory
node --max-old-space-size=4096 server.js
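To confirm that the flag took effect, a script can print the V8 heap limit with v8.getHeapStatistics(). This is a diagnostic sketch and is not part of the server.

```javascript
const v8 = require("node:v8");

// Report the current process's V8 heap limit in MB. The limit covers the
// whole heap, so it can be somewhat larger than the --max-old-space-size value.
function heapLimitMb() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
}

console.log(`V8 heap limit: ${heapLimitMb()} MB`);
```

For example, node --max-old-space-size=4096 check-heap.js (hypothetical file name) should report roughly 4096 MB or a little more.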
Changelog
v2.0.0
Add table extraction functionality
Add image OCR analysis
Implement large document parallel processing
Add intelligent caching system
Implement full-text index search
Complete testing framework
v1.0.0
Basic Word document reading
Memory storage management
Simple search functionality
Contributing
Issues and Pull Requests are welcome!
Development Guidelines
Fork the project
Create feature branch
Write test cases
Ensure all tests pass
Submit Pull Request
License
MIT License
Quick Start: npm install && npm start