Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Word Document Reader MCP Server read the quarterly report at /docs/reports/q3-2024.docx and extract all tables".
That's it! The server will respond to your query, and you can continue using it as needed.
Word Document Reader MCP Server
A powerful Word document reading MCP server with table extraction, image OCR analysis, large document optimization, and intelligent caching.
Core Features
1. Document Content Extraction
- Word document (.docx/.doc) text extraction
- Support for mixed Chinese-English documents
- Preserve original formatting and structure
2. Table Extraction
- Automatically identify and extract tables from Word documents
- Convert to structured data format
- Preserve table row/column structure information
- Support complex table parsing
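As a rough illustration of what "structured data" could look like, the sketch below turns a table given as an array of rows into keyed records while keeping the row and column counts. The function name `tableToRecords`, the output shape, and the assumption that the first row is a header are all hypothetical; the server's actual output format is not documented here.

```javascript
// Hypothetical helper: convert a table (array of rows, each an array of
// cell strings) into records keyed by the first row, assumed to be a header.
function tableToRecords(rows) {
  if (rows.length === 0) {
    return { rowCount: 0, columnCount: 0, records: [] };
  }
  const [header, ...body] = rows;
  const records = body.map((cells) =>
    // Missing cells become empty strings so every record has every column.
    Object.fromEntries(header.map((name, i) => [name, cells[i] ?? '']))
  );
  return { rowCount: rows.length, columnCount: header.length, records };
}

const table = tableToRecords([
  ['Quarter', 'Revenue'],
  ['Q1', '120'],
  ['Q2'],
]);
console.log(JSON.stringify(table));
```

Keeping `rowCount` and `columnCount` next to the records is one way to preserve the original grid shape even when some cells are empty.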
3. Image OCR Analysis
- Extract embedded images from Word documents
- High-precision OCR recognition using Tesseract.js v5
- Support mixed Chinese-English text recognition (95%+ accuracy)
- Intelligent image preprocessing for better recognition
- Support multiple image formats (JPG, PNG, GIF, BMP, WebP)
4. Large Document Optimization
- Automatic detection of large documents (>10MB or >100 pages)
- Worker thread parallel processing, utilizing multi-core CPUs
- Chunked processing to avoid memory overflow
- 60%+ speed improvement from parallel processing
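The detection and chunking steps above can be sketched as follows. The thresholds mirror the `maxFileSize`, `maxPages`, and `chunkSize` values from the sample config.json; the function names and the byte-range chunking are assumptions for illustration only, and the server may well split documents along paragraph or section boundaries instead.

```javascript
// Thresholds taken from the sample config.json; names are hypothetical.
const LIMITS = {
  maxFileSize: 10 * 1024 * 1024, // 10 MB
  maxPages: 100,
  chunkSize: 1024 * 1024, // 1 MB
};

// A document counts as "large" if it exceeds either threshold.
function isLargeDocument({ sizeBytes, pageCount }, limits = LIMITS) {
  return sizeBytes > limits.maxFileSize || pageCount > limits.maxPages;
}

// Split a buffer into fixed-size views so each piece can be handed to a
// separate worker without loading extra copies into memory.
function splitIntoChunks(buffer, chunkSize = LIMITS.chunkSize) {
  const chunks = [];
  for (let offset = 0; offset < buffer.length; offset += chunkSize) {
    chunks.push(buffer.subarray(offset, offset + chunkSize));
  }
  return chunks;
}

console.log(isLargeDocument({ sizeBytes: 12 * 1024 * 1024, pageCount: 40 }));
console.log(splitIntoChunks(Buffer.alloc(10), 4).map((c) => c.length));
```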
5. Intelligent Caching System
- File system persistent caching
- Smart cache invalidation based on file modification time
- Cache statistics and management support
- 90%+ speed improvement for repeated document processing
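Invalidation by modification time can be sketched in a few lines: record the file's mtime when the cache entry is written, and treat the entry as stale as soon as the file's current mtime differs. The helper names `makeEntry` and `isCacheFresh` and the entry shape are hypothetical; the server's real cache layout is not described in this document.

```javascript
const fs = require('node:fs');

// Hypothetical entry shape: the parsed data plus the source file's
// modification time at the moment the entry was created.
function makeEntry(filePath, data) {
  return { mtimeMs: fs.statSync(filePath).mtimeMs, data };
}

// An entry is fresh only if the file has not been modified since the
// entry was written. A missing entry is never fresh.
function isCacheFresh(entry, filePath) {
  if (!entry) return false;
  return entry.mtimeMs === fs.statSync(filePath).mtimeMs;
}
```

A TTL check (such as the `defaultTTL` setting in config.json) could be layered on top by also storing the time the entry was written.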
6. Full-text Index Search
- Millisecond-level search with inverted index
- Intelligent Chinese-English word segmentation
- Relevance scoring and sorting
- Support document type filtering
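To make the inverted-index idea concrete, here is a minimal sketch: each term maps to the documents that contain it, and a query sums term frequencies to rank matches. The class `InvertedIndex`, the `tokenize` function, and the scoring rule are assumptions for illustration. In particular, treating every CJK character as its own token is a deliberately naive stand-in for real Chinese word segmentation.

```javascript
// Naive tokenizer (assumption): lowercase Latin words and digits, plus
// each CJK character as a separate token.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+|[\u4e00-\u9fff]/g) || [];
}

class InvertedIndex {
  constructor() {
    this.postings = new Map(); // term -> Map(docId -> term frequency)
    this.docTypes = new Map(); // docId -> documentType
  }

  add(docId, text, documentType) {
    this.docTypes.set(docId, documentType);
    for (const term of tokenize(text)) {
      if (!this.postings.has(term)) this.postings.set(term, new Map());
      const docs = this.postings.get(term);
      docs.set(docId, (docs.get(docId) || 0) + 1);
    }
  }

  // Score = sum of term frequencies for the query terms; highest first.
  search(query, { documentType, limit = 10 } = {}) {
    const scores = new Map();
    for (const term of tokenize(query)) {
      const docs = this.postings.get(term);
      if (!docs) continue;
      for (const [docId, tf] of docs) {
        if (documentType && this.docTypes.get(docId) !== documentType) continue;
        scores.set(docId, (scores.get(docId) || 0) + tf);
      }
    }
    return [...scores.entries()]
      .sort((a, b) => b[1] - a[1])
      .slice(0, limit)
      .map(([docId, score]) => ({ docId, score }));
  }
}

const index = new InvertedIndex();
index.add('a', 'cache cache search', 'api-doc');
index.add('b', 'search', 'guide');
console.log(index.search('cache search'));
```

Because a lookup only touches the posting lists of the query terms, search cost depends on how many documents contain those terms rather than on the total corpus size.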
Installation and Usage
1. Install Dependencies
npm install
2. Start Server
# Start full-featured version
npm start
# Or start basic version (without advanced features)
npm run start:basic
3. Run Tests
# Run all tests
npm test
# Run tests in watch mode
npm run test:watch
# Generate test coverage report
npm run test:coverage
read_word_document
Read and analyze Word documents
{
  "name": "read_word_document",
  "arguments": {
    "filePath": "path/to/document.docx",
    "memoryKey": "my-document",
    "documentType": "api-doc",
    "extractTables": true,
    "extractImages": true,
    "useCache": true,
    "outputDir": "./output"
  }
}
search_documents
Full-text index search
{
  "name": "search_documents",
  "arguments": {
    "query": "search keywords",
    "documentType": "api-doc",
    "limit": 10
  }
}
get_cache_stats
Get cache statistics
{
  "name": "get_cache_stats"
}
clear_cache
Clear cache
{
  "name": "clear_cache",
  "arguments": {
    "type": "all" // "all", "document", "index"
  }
}
list_stored_documents
List stored documents
{
  "name": "list_stored_documents",
  "arguments": {
    "documentType": "api-doc"
  }
}
get_stored_document
Get specific document content
{
  "name": "get_stored_document",
  "arguments": {
    "memoryKey": "document-key"
  }
}
clear_memory
Clear memory content
{
  "name": "clear_memory",
  "arguments": {
    "memoryKey": "specific-key" // Optional, clear all if not provided
  }
}
Project Structure
word-doc-mcp/
├── server.js                 # Main server file (with all features)
├── server-basic.js           # Basic server (compatibility)
├── package.json              # Project configuration and dependencies
├── config.json               # Server configuration file
├── tests/                    # Test directory
│   ├── setup.js              # Test environment setup
│   ├── unit/                 # Unit tests
│   │   └── services/         # Service layer tests
│   ├── integration/          # Integration tests
│   │   ├── tools/            # Tool tests
│   │   └── cache/            # Cache tests
│   └── fixtures/             # Test data
│       ├── documents/        # Test documents
│       └── mock-data.js      # Mock data
├── .cache/                   # Cache directory (auto-created)
├── output/                   # Output directory (auto-created)
└── node_modules/             # Dependencies
Configuration
Edit the config.json file to customize server behavior:
{
  "processing": {
    "maxFileSize": 10485760,
    "maxPages": 100,
    "chunkSize": 1048576,
    "parallelProcessing": true
  },
  "cache": {
    "enabled": true,
    "defaultTTL": 3600,
    "cacheDirectory": "./.cache"
  },
  "ocr": {
    "enabled": true,
    "languages": ["chi_sim", "eng"]
  }
}
Testing
Test Framework
Tests use the Node.js built-in test runner and are organized into three levels:
Unit Tests: Test individual components and functions
Integration Tests: Test interactions between tools
End-to-End Tests: Test complete workflows
Running Tests
# Run all tests
npm test
# Run specific test file
node --test tests/unit/services/DocumentIndexer.test.js
# Run integration tests
node --test tests/integration/
# Generate coverage report
npm run test:coverage
Test Coverage
- Functional tests for all MCP tools
- Complete cache system tests
- Error handling and edge cases
- Performance and concurrency tests
- End-to-end workflow tests
Performance Metrics
Large Document Processing: 60%+ speed improvement (parallel processing)
Repeated Document Processing: 90%+ speed improvement (caching)
OCR Recognition Accuracy: 95%+ (image preprocessing)
Memory Usage Optimization: 40% reduction (streaming processing)
Search Response Time: <100ms (full-text index)
Security Considerations
Input file size limits
File type validation
Cache data isolation
Error handling and logging
Automatic temporary file cleanup
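The first two items, size limits and file type validation, could be enforced with a check along these lines before any parsing starts. This is a sketch under stated assumptions: `validateInput` is a hypothetical name, the allowed extensions come from the .docx/.doc formats listed under Core Features, and the size cap is passed in by the caller because this document does not say which value the server enforces.

```javascript
const path = require('node:path');

// Extensions taken from the supported formats listed in this README.
const ALLOWED_EXTENSIONS = new Set(['.docx', '.doc']);

// Hypothetical pre-flight check; maxBytes is supplied by the caller.
function validateInput(filePath, sizeBytes, maxBytes) {
  const ext = path.extname(filePath).toLowerCase();
  if (!ALLOWED_EXTENSIONS.has(ext)) {
    return { ok: false, reason: `unsupported file type: ${ext || '(none)'}` };
  }
  if (sizeBytes > maxBytes) {
    return { ok: false, reason: `file exceeds ${maxBytes} bytes` };
  }
  return { ok: true };
}

console.log(validateInput('reports/q3.DOCX', 2048, 10485760));
console.log(validateInput('notes.txt', 2048, 10485760));
```

An extension check alone does not prove the file really is a Word document, so a stricter implementation would also inspect the file contents.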
Version Compatibility
Backward Compatibility
- Maintain full compatibility with original API
- Existing tool functionality unchanged
- Optional configuration with reasonable defaults
- Provide basic version to ensure compatibility
System Requirements
Minimum Requirements:
Node.js 16+
4GB RAM
1GB disk space
Recommended Configuration:
Node.js 18+
8GB+ RAM
Multi-core CPU
SSD storage
Troubleshooting
Common Issues
Module Installation Failure
npm cache clean --force
npm install
OCR Recognition Failure
Ensure sufficient memory (8GB+ recommended)
Check supported image formats
Review error logs
Slow Large Document Processing
Enable parallel processing
Adjust chunkSize configuration
Use SSD storage
Insufficient Memory
node --max-old-space-size=4096 server.js
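To confirm that the flag took effect, the heap limit V8 is actually running with can be read from inside the process. The snippet below is a small diagnostic sketch using the standard `node:v8` module; the helper name `heapLimitMiB` is hypothetical and not part of the server. The reported value is typically somewhat higher than the number passed to `--max-old-space-size`, since the limit covers more than the old space.

```javascript
const v8 = require('node:v8');

// Report the V8 heap size limit for the current process, in MiB.
function heapLimitMiB() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
}

console.log(`V8 heap limit: ${heapLimitMiB()} MiB`);
```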
Changelog
v2.0.0
- Add table extraction functionality
- Add image OCR analysis
- Implement large document parallel processing
- Add intelligent caching system
- Implement full-text index search
- Complete testing framework
v1.0.0
- Basic Word document reading
- Memory storage management
- Simple search functionality
Contributing
Issues and Pull Requests are welcome!
Development Guidelines
Fork the project
Create feature branch
Write test cases
Ensure all tests pass
Submit Pull Request
License
MIT License
Quick Start: npm install && npm start