eFax to JSON MCP Server
A Model Context Protocol (MCP) server that converts eFax documents from OpenText Fax Server Software into structured JSON format. Supports PDF, TIFF, and CCD XML document formats with advanced OCR and metadata extraction capabilities.
Features
Supported Formats
PDF Documents - Text extraction and OCR for scanned PDFs
TIFF Images - Multi-page TIFF support with OCR processing
CCD XML - Continuity of Care Document parsing (HL7 Clinical Document Architecture)
Processing Capabilities
Intelligent OCR - Tesseract-based text recognition with confidence scoring
Metadata Extraction - Preserve document properties and fax information
Batch Processing - Convert multiple documents simultaneously
Format Validation - Comprehensive document structure validation
Error Recovery - Robust error handling with detailed reporting
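As an illustration of how OCR confidence scoring could be surfaced to callers, the sketch below summarizes per-page results. The OcrPageResult shape, the summarizeOcr helper, and the default threshold of 60 are assumptions made for this example, not the server's actual types; the only fact relied on is that Tesseract reports confidence on a 0-100 scale.

```typescript
// Hypothetical shape for one page of OCR output (not taken from the project).
interface OcrPageResult {
  pageNumber: number;
  text: string;
  confidence: number; // Tesseract-style confidence, 0-100
}

interface OcrSummary {
  meanConfidence: number;
  lowConfidencePages: number[];
}

// Average the page confidences and list the pages that fall below the threshold.
function summarizeOcr(pages: OcrPageResult[], threshold = 60): OcrSummary {
  if (pages.length === 0) {
    return { meanConfidence: 0, lowConfidencePages: [] };
  }
  const total = pages.reduce((sum, page) => sum + page.confidence, 0);
  return {
    meanConfidence: total / pages.length,
    lowConfidencePages: pages
      .filter((page) => page.confidence < threshold)
      .map((page) => page.pageNumber),
  };
}
```

A caller might use lowConfidencePages to decide which pages of a fax need manual review.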
Installation
Prerequisites
Node.js 18+
System-level Tesseract OCR installation:
Ubuntu/Debian:
sudo apt-get install tesseract-ocr
macOS:
brew install tesseract
Windows: Download from UB Mannheim releases
Setup Steps
Create project directory
mkdir efax-mcp-server
cd efax-mcp-server
Initialize and install dependencies
npm init -y
npm install @modelcontextprotocol/sdk pdf-parse sharp tesseract.js xml2js
npm install -D @types/node @types/pdf-parse @types/xml2js typescript ts-node
Create directory structure
mkdir -p src/{types,processors,utils}
mkdir -p tests/test-files
mkdir -p docs
Add source files (paste the provided code into respective files)
Build the project
npm run build
Usage
MCP Client Configuration
Add to your MCP client configuration (e.g., Claude Desktop):
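A minimal sketch of such an entry, assuming the mcpServers layout that Claude Desktop reads from its configuration file. The server key "efax-to-json" and the path to the built entry point are placeholders chosen for this example, so substitute the values from your own checkout.

```typescript
// Assumptions: the key "efax-to-json" and the dist/index.js path are placeholders,
// not values taken from the project.
const clientConfig = {
  mcpServers: {
    "efax-to-json": {
      command: "node",
      args: ["/path/to/efax-mcp-server/dist/index.js"],
    },
  },
};

// Print the JSON so it can be pasted into the client's configuration file.
console.log(JSON.stringify(clientConfig, null, 2));
```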
Available Tools
1. Convert Single Document
Parameters:
filePath (required) - Path to eFax document
outputPath (optional) - Custom output JSON path
extractMetadata (default: true) - Extract document metadata
performOCR (default: true) - Enable OCR processing
ocrLanguage (default: "eng") - OCR language code
includeRawData (default: false) - Include raw document data
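The parameters and defaults listed above can be expressed as a typed options object. This is a sketch only: the parameter names and default values come from the list, while the ConvertOptions and ResolvedConvertOptions types and the resolveConvertOptions helper are hypothetical names introduced for illustration.

```typescript
// Parameter names mirror the documented tool parameters; the types are illustrative.
interface ConvertOptions {
  filePath: string;
  outputPath?: string;
  extractMetadata?: boolean;
  performOCR?: boolean;
  ocrLanguage?: string;
  includeRawData?: boolean;
}

interface ResolvedConvertOptions {
  filePath: string;
  outputPath: string | undefined;
  extractMetadata: boolean;
  performOCR: boolean;
  ocrLanguage: string;
  includeRawData: boolean;
}

// Apply the documented defaults to whatever the caller left unset.
function resolveConvertOptions(options: ConvertOptions): ResolvedConvertOptions {
  if (!options.filePath) {
    throw new Error("filePath is required");
  }
  return {
    filePath: options.filePath,
    outputPath: options.outputPath,
    extractMetadata: options.extractMetadata ?? true,
    performOCR: options.performOCR ?? true,
    ocrLanguage: options.ocrLanguage ?? "eng",
    includeRawData: options.includeRawData ?? false,
  };
}
```

Using ?? rather than object spread means an explicitly passed undefined still falls back to the default.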
2. Batch Convert Documents
Parameters:
inputDirectory (required) - Source document directory
outputDirectory (required) - JSON output directory
filePattern (default: "*") - File matching pattern
continueOnError (default: true) - Continue on individual failures
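To make the interaction between filePattern and continueOnError concrete, here is a hedged sketch of a batch loop. It assumes filePattern is a simple glob supporting only "*" and "?", which the documentation does not specify; globToRegExp, batchConvert, and BatchResult are hypothetical names, and the convert callback stands in for whatever performs the real conversion.

```typescript
// Convert a simple glob such as "*.pdf" into an anchored regular expression.
// Assumption: only "*" and "?" act as wildcards; everything else is literal.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const source = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
  return new RegExp("^" + source + "$");
}

interface BatchResult {
  converted: string[];
  failed: { file: string; error: string }[];
}

// Convert matching files one at a time, recording failures instead of throwing.
async function batchConvert(
  files: string[],
  filePattern: string,
  continueOnError: boolean,
  convert: (file: string) => Promise<void>,
): Promise<BatchResult> {
  const matcher = globToRegExp(filePattern);
  const result: BatchResult = { converted: [], failed: [] };
  for (const file of files.filter((name) => matcher.test(name))) {
    try {
      await convert(file);
      result.converted.push(file);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      result.failed.push({ file, error: message });
      if (!continueOnError) break; // stop at the first failure
    }
  }
  return result;
}
```

With continueOnError set to false the loop stops at the first failure, so later files appear in neither list.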
3. Validate JSON Output
4. Get File Information
5. List Supported Formats
JSON Output Structure
Architecture
Modular Design
Processors: Format-specific conversion logic
Utilities: Shared validation and file handling
Types: Comprehensive TypeScript definitions
Processing Pipeline
File Validation - Format and size checks
Format Detection - Automatic type identification
Content Extraction - Text and metadata processing
OCR Processing - Image-to-text conversion when needed
Structure Validation - Output quality assurance
JSON Serialization - Standardized output format
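The format detection step can be sketched by inspecting a file's leading bytes. The signatures below are standard ("%PDF" for PDF; "II*\0" or "MM\0*" for TIFF), but whether the server detects formats by content or by extension is not stated here, so treat detectFormat and DetectedFormat as illustrative names under that assumption.

```typescript
type DetectedFormat = "pdf" | "tiff" | "xml" | "unknown";

const XML_WHITESPACE = [0x20, 0x09, 0x0a, 0x0d];

// Guess the format from the first bytes of a file.
function detectFormat(header: Uint8Array): DetectedFormat {
  const startsWith = (magic: number[]): boolean =>
    magic.every((byte, i) => header[i] === byte);

  if (startsWith([0x25, 0x50, 0x44, 0x46])) return "pdf"; // "%PDF"
  if (startsWith([0x49, 0x49, 0x2a, 0x00])) return "tiff"; // little-endian TIFF
  if (startsWith([0x4d, 0x4d, 0x00, 0x2a])) return "tiff"; // big-endian TIFF

  // XML: skip an optional UTF-8 byte-order mark and leading whitespace, then expect "<".
  // This only shows the file looks like XML; it does not prove it is a CCD.
  let i = startsWith([0xef, 0xbb, 0xbf]) ? 3 : 0;
  while (i < header.length && XML_WHITESPACE.indexOf(header[i] as number) !== -1) {
    i++;
  }
  if (header[i] === 0x3c) return "xml";

  return "unknown";
}
```

In practice the header would be the first few bytes read from the input file; a CCD check would additionally need to parse the XML and inspect its root element.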
Development
Build Commands
Testing
Place sample documents in tests/test-files/ and run:
Adding New Formats
Create processor in src/processors/
Add type definitions in src/types/
Register in main server
Update documentation
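One way the "create a processor, then register it" steps could fit together is a small registry keyed by file extension. The DocumentProcessor interface and ProcessorRegistry class below are hypothetical; the project's real processor contract is not shown in this README and may differ.

```typescript
// Hypothetical processor contract, invented for this sketch.
interface DocumentProcessor {
  readonly format: string;
  readonly extensions: string[];
  process(data: Uint8Array): Promise<Record<string, unknown>>;
}

class ProcessorRegistry {
  private readonly byExtension = new Map<string, DocumentProcessor>();

  // Register a processor under each of its extensions, rejecting duplicates.
  register(processor: DocumentProcessor): void {
    for (const extension of processor.extensions) {
      const key = extension.toLowerCase();
      if (this.byExtension.has(key)) {
        throw new Error("Extension already registered: " + key);
      }
      this.byExtension.set(key, processor);
    }
  }

  // Look up the processor for a file name by its final extension.
  forFile(fileName: string): DocumentProcessor | undefined {
    const dot = fileName.lastIndexOf(".");
    if (dot < 0) return undefined;
    return this.byExtension.get(fileName.slice(dot).toLowerCase());
  }
}
```

Under this design, supporting a new format means implementing DocumentProcessor and adding one register call in the server's startup code.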
Performance Considerations
OCR Processing: CPU-intensive, consider batch size limits
Memory Usage: Large TIFF files may require significant RAM
Processing Time: Varies by document complexity and OCR requirements
Concurrent Processing: Single-threaded OCR worker per instance
Error Handling
The server reports errors in four categories:
File Validation Errors - Invalid paths, unsupported formats
Processing Errors - OCR failures, corrupted documents
System Errors - Memory issues, disk space problems
Validation Errors - Output structure problems
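The four categories above could be modeled as a tagged error type so that callers receive a uniform report. This is a sketch under assumptions: the category strings, the ConversionError class, and the toErrorReport helper are names invented for this example, not the server's actual error API.

```typescript
// Category names are illustrative labels for the four groups listed above.
type ErrorCategory = "file_validation" | "processing" | "system" | "output_validation";

class ConversionError extends Error {
  readonly category: ErrorCategory;
  readonly filePath: string | null;

  constructor(category: ErrorCategory, message: string, filePath: string | null = null) {
    super(message);
    // Keep instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "ConversionError";
    this.category = category;
    this.filePath = filePath;
  }
}

interface ErrorReport {
  category: ErrorCategory;
  message: string;
  filePath: string | null;
}

// Normalize any thrown value into a report; unknown errors are treated as system errors.
function toErrorReport(err: unknown): ErrorReport {
  if (err instanceof ConversionError) {
    return { category: err.category, message: err.message, filePath: err.filePath };
  }
  const message = err instanceof Error ? err.message : String(err);
  return { category: "system", message, filePath: null };
}
```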
Troubleshooting
Common Issues
OCR Not Working
Verify Tesseract installation:
tesseract --version
Check language pack availability
Ensure sufficient system memory
Large File Processing
Monitor memory usage during conversion
Consider breaking large batches into smaller chunks
Verify available disk space for output
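Breaking a large batch into smaller chunks can be sketched as below. The chunk and processInChunks helpers are hypothetical, and the right chunk size depends on document sizes and available memory, which this README does not quantify.

```typescript
// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Process chunks strictly one after another so only one chunk is in flight at a time.
async function processInChunks<T>(
  items: T[],
  size: number,
  handle: (batch: T[]) => Promise<void>,
): Promise<void> {
  for (const batch of chunk(items, size)) {
    await handle(batch);
  }
}
```

Awaiting each chunk before starting the next trades throughput for a bounded memory footprint.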
Permission Errors
Check read permissions on input files
Verify write permissions on output directory
Ensure MCP server has appropriate file system access
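The permission checks above can be scripted with Node's built-in fs module. This is a minimal sketch: canAccess is a hypothetical helper name, and the check reflects the permissions of the user running the script, which may differ from the user the MCP server runs as.

```typescript
import { constants } from "node:fs";
import { access } from "node:fs/promises";

// Resolve to true if the current process can read (or write) the given path.
async function canAccess(path: string, mode: "read" | "write"): Promise<boolean> {
  try {
    await access(path, mode === "read" ? constants.R_OK : constants.W_OK);
    return true;
  } catch {
    // access rejects when the path is missing or the permission is denied.
    return false;
  }
}
```

For example, checking canAccess(inputFile, "read") and canAccess(outputDirectory, "write") before a conversion gives an early, specific error instead of a failure midway through processing.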
License
MIT License - see LICENSE file for details.
Support
For issues and feature requests, please use the project's issue tracker.