How do I use Word Document Reader MCP Server?

1. Click on "Install Server". 2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state. 3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Word Document Reader MCP Server read the quarterly report at /docs/reports/q3-2024.docx and extract all tables" That's it! The server will respond to your query, and you can continue using it as needed. Here is a step-by-step guide with screenshots.

Word Document Reader MCP Server

A powerful Word document reading MCP server with table extraction, image OCR analysis, large document optimization, and intelligent caching.

🚀 Core Features

1. Document Content Extraction

✅ Word document (.docx/.doc) text extraction
✅ Support for mixed Chinese-English documents
✅ Preserve original formatting and structure

2. Table Extraction

✅ Automatically identify and extract tables from Word documents
✅ Convert to structured data format
✅ Preserve table row/column structure information
✅ Support complex table parsing

3. Image OCR Analysis

✅ Extract embedded images from Word documents
✅ High-precision OCR recognition using Tesseract.js v5
✅ Support mixed Chinese-English text recognition (95%+ accuracy)
✅ Intelligent image preprocessing for better recognition
✅ Support multiple image formats (JPG, PNG, GIF, BMP, WebP)

4. Large Document Optimization

✅ Automatic detection of large documents (>10MB or >100 pages)
✅ Worker thread parallel processing, utilizing multi-core CPUs
✅ Chunked processing to avoid memory overflow
✅ 60%+ speed improvement

5. Intelligent Caching System

✅ File system persistent caching
✅ Smart cache invalidation based on file modification time
✅ Cache statistics and management support
✅ 90%+ speed improvement for repeated document processing

6. Full-text Index Search

✅ Millisecond-level search with inverted index
✅ Intelligent Chinese-English word segmentation
✅ Relevance scoring and sorting
✅ Support document type filtering

📦 Installation and Usage

1. Install Dependencies

npm install

2. Start Server

# Start full-featured version npm start # Or start basic version (without advanced features) npm run start:basic

3. Run Tests

# Run all tests npm test # Run tests in watch mode npm run test:watch # Generate test coverage report npm run test:coverage

read_word_document

Read and analyze Word documents

{ "name": "read_word_document", "arguments": { "filePath": "path/to/document.docx", "memoryKey": "my-document", "documentType": "api-doc", "extractTables": true, "extractImages": true, "useCache": true, "outputDir": "./output" } }

search_documents

Full-text index search

{ "name": "search_documents", "arguments": { "query": "search keywords", "documentType": "api-doc", "limit": 10 } }

get_cache_stats

Get cache statistics

{ "name": "get_cache_stats" }

clear_cache

Clear cache

{ "name": "clear_cache", "arguments": { "type": "all" // "all", "document", "index" } }

list_stored_documents

List stored documents

{ "name": "list_stored_documents", "arguments": { "documentType": "api-doc" } }

get_stored_document

Get specific document content

{ "name": "get_stored_document", "arguments": { "memoryKey": "document-key" } }

clear_memory

Clear memory content

{ "name": "clear_memory", "arguments": { "memoryKey": "specific-key" // Optional, clear all if not provided } }

📁 Project Structure

word-doc-mcp/ ├── server.js # Main server file (with all features) ├── server-basic.js # Basic server (compatibility) ├── package.json # Project configuration and dependencies ├── config.json # Server configuration file ├── tests/ # Test directory │ ├── setup.js # Test environment setup │ ├── unit/ # Unit tests │ │ └── services/ # Service layer tests │ ├── integration/ # Integration tests │ │ ├── tools/ # Tool tests │ │ └── cache/ # Cache tests │ └── fixtures/ # Test data │ ├── documents/ # Test documents │ └── mock-data.js # Mock data ├── .cache/ # Cache directory (auto-created) ├── output/ # Output directory (auto-created) └── node_modules/ # Dependencies

⚙️ Configuration

Edit the config.json file to customize server behavior:

{ "processing": { "maxFileSize": 10485760, "maxPages": 100, "chunkSize": 1048576, "parallelProcessing": true }, "cache": { "enabled": true, "defaultTTL": 3600, "cacheDirectory": "./.cache" }, "ocr": { "enabled": true, "languages": ["chi_sim", "eng"] } }

🧪 Testing

Test Framework

Using Node.js built-in test framework, following these standards:

Unit Tests: Test individual components and functions
Integration Tests: Test interactions between tools
End-to-End Tests: Test complete workflows

Running Tests

# Run all tests npm test # Run specific test file node --test tests/unit/services/DocumentIndexer.test.js # Run integration tests node --test tests/integration/ # Generate coverage report npm run test:coverage

Test Coverage

✅ Functional tests for all MCP tools
✅ Complete cache system tests
✅ Error handling and edge cases
✅ Performance and concurrency tests
✅ End-to-end workflow tests

📊 Performance Metrics

Large Document Processing: 60%+ speed improvement (parallel processing)
Repeated Document Processing: 90%+ speed improvement (caching)
OCR Recognition Accuracy: 95%+ (image preprocessing)
Memory Usage Optimization: 40% reduction (streaming processing)
Search Response Time: <100ms (full-text index)

🛡️ Security Considerations

Input file size limits
File type validation
Cache data isolation
Error handling and logging
Automatic temporary file cleanup

🔄 Version Compatibility

Backward Compatibility

✅ Maintain full compatibility with original API
✅ Existing tool functionality unchanged
✅ Optional configuration with reasonable defaults
✅ Provide basic version to ensure compatibility

System Requirements

Minimum Requirements:

Node.js 16+
4GB RAM
1GB disk space

Recommended Configuration:

Node.js 18+
8GB+ RAM
Multi-core CPU
SSD storage

🐛 Troubleshooting

Common Issues

Module Installation Failure
npm cache clean --force npm install
OCR Recognition Failure
- Ensure sufficient memory (8GB+ recommended)
- Check supported image formats
- Review error logs
Slow Large Document Processing
- Enable parallel processing
- Adjust chunkSize configuration
- Use SSD storage
Memory Insufficient
node --max-old-space-size=4096 server.js

📝 Changelog

v2.0.0

✅ Add table extraction functionality
✅ Add image OCR analysis
✅ Implement large document parallel processing
✅ Add intelligent caching system
✅ Implement full-text index search
✅ Complete testing framework

v1.0.0

✅ Basic Word document reading
✅ Memory storage management
✅ Simple search functionality

🤝 Contributing

Issues and Pull Requests are welcome!

Development Guidelines

Fork the project
Create feature branch
Write test cases
Ensure all tests pass
Submit Pull Request

📄 License

MIT License

Quick Start: npm install && npm start