Built on the Node.js runtime; the MCP server and its RAG capabilities require Node.js 16 or later.
Uses the npm package manager to install dependencies, including the libraries for vector search and document processing.
Integrates Sharp for image manipulation within the OCR and document-extraction pipeline.
Calibre RAG MCP Server
Enhanced Calibre MCP server with RAG (Retrieval-Augmented Generation) capabilities for project-based vector search and contextual conversations.
Features
RAG-Enhanced Search: Vector-based semantic search using FAISS and Transformers
Project-Based Organization: Create isolated vector search projects for different contexts
Multi-Format Support: Process books in various formats (EPUB, PDF, MOBI, etc.)
OCR Capabilities: Extract text from images and scanned PDFs using Tesseract
Advanced Text Processing: Natural language processing for better content understanding
Windows Compatible: Designed specifically for Windows environments
Technologies Used
Vector Search: FAISS for efficient similarity search
Embeddings: Xenova Transformers for local embedding generation
OCR: Tesseract for optical character recognition
PDF Processing: Multiple PDF parsing libraries (pdf-parse, pdf-poppler, pdf2pic)
Image Processing: Sharp for image manipulation
NLP: Natural language processing with multiple libraries
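To illustrate what the vector-search layer does, here is a minimal brute-force sketch in plain JavaScript. It ranks stored chunk embeddings by cosine similarity to a query embedding and returns the top k matches. It is not the server's code and does not call FAISS. FAISS performs the same kind of ranking in optimized native code, and its approximate indexes scale it to much larger collections. The chunk record shape and the topK name are hypothetical, and real embeddings would come from a Transformers model rather than being written by hand.

```javascript
// Hypothetical illustration of brute-force cosine-similarity search.
// FAISS does this ranking far faster; this sketch only shows the idea.

function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// chunks: [{ id, text, vector }] (hypothetical record shape)
function topK(queryVector, chunks, k) {
  return chunks
    .map((chunk) => ({
      id: chunk.id,
      text: chunk.text,
      score: cosineSimilarity(queryVector, chunk.vector),
    }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k);
}

// Toy 3-dimensional "embeddings" for demonstration only.
const chunks = [
  { id: "book1#0", text: "Dragons and castles", vector: [0.9, 0.1, 0.0] },
  { id: "book2#0", text: "Tax law basics", vector: [0.0, 0.2, 0.95] },
  { id: "book3#0", text: "Knights and quests", vector: [0.8, 0.3, 0.1] },
];

console.log(topK([1, 0.2, 0], chunks, 2).map((r) => r.id));
```

Normalizing vectors once when they are stored reduces cosine similarity to a plain dot product, which is how inner-product indexes are commonly used for cosine search.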
Prerequisites
Node.js >= 16.0.0
Calibre installed on Windows
ImageMagick (for enhanced image processing)
Tesseract OCR (for text extraction from images)
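Before using the OCR features, it can help to confirm that Tesseract is actually reachable on PATH, which is the usual cause of the "Tesseract Not Found" issue listed under Troubleshooting. A minimal sketch using Node's built-in child_process.spawnSync follows; the helper name isCommandAvailable is hypothetical and not part of this project. It treats a tool as available if its executable can be launched at all, without interpreting the exit code.

```javascript
// Hypothetical helper: check whether a command-line tool can be launched.
const { spawnSync } = require("node:child_process");

function isCommandAvailable(command, args = ["--version"]) {
  // spawnSync does not throw when the executable is missing; it reports the
  // failure (e.g. ENOENT when the command is not found on PATH) in result.error.
  const result = spawnSync(command, args, { stdio: "ignore" });
  return !result.error;
}

console.log("Tesseract found:", isCommandAvailable("tesseract"));
```

The same check can be pointed at other optional command-line tools by passing a different command name.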
Installation
Clone this repository:
Install dependencies:
Run setup (Windows):
Configuration
The server automatically detects your Calibre library location. For custom configurations, modify the settings in server.js.
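How the detection works is not described here. As a hedged illustration only: every Calibre library folder contains a metadata.db database, so one plausible approach is to probe a list of candidate folders for that file. The candidate paths and the findCalibreLibrary name below are assumptions and are not taken from server.js.

```javascript
// Hypothetical sketch of Calibre library auto-detection; not the server's code.
// A Calibre library folder contains a metadata.db SQLite file, so we look for
// that file in each candidate folder.
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Assumed candidate locations; adjust for your own setup.
function defaultCandidates() {
  const home = os.homedir();
  return [
    path.join(home, "Calibre Library"),
    path.join(home, "Documents", "Calibre Library"),
  ];
}

function findCalibreLibrary(candidates = defaultCandidates()) {
  for (const dir of candidates) {
    if (fs.existsSync(path.join(dir, "metadata.db"))) {
      return dir; // first folder that looks like a Calibre library
    }
  }
  return null; // nothing found: fall back to manual configuration
}

console.log(findCalibreLibrary() ?? "No Calibre library found in the candidate locations");
```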
Usage
Starting the Server
Available Tools
search: Semantic search across your ebook library
fetch: Retrieve specific content from books
list_projects: List all RAG projects
create_project: Create a new RAG project
add_books_to_project: Add books to a project for vectorization
search_project_context: Search within specific projects
Example MCP Configuration
Add to your MCP client configuration:
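As a hedged illustration of the general shape such an entry usually takes, the sketch below builds a configuration object and prints it as JSON. The mcpServers / command / args layout follows the format used by Claude Desktop's claude_desktop_config.json; other clients may differ. The server name calibre-rag and the path to server.js are placeholders (it is an assumption that server.js is the entry point), so replace them with your own values.

```javascript
// Hypothetical MCP client configuration entry, printed as JSON.
// "calibre-rag" and the script path are placeholders, not project defaults.

const config = {
  mcpServers: {
    "calibre-rag": {
      command: "node", // launch the server with the local Node.js runtime
      args: ["C:\\path\\to\\calibre-rag-mcp\\server.js"],
    },
  },
};

console.log(JSON.stringify(config, null, 2));
```

JSON.stringify escapes the Windows backslashes, so the printed output can be pasted directly into a JSON configuration file.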
Project Structure
Testing
Run the test suite:
Individual test files:
test-enhanced-server.js - Enhanced server functionality
test-ocr-full.js - OCR capabilities
test-pdf-approaches.js - PDF processing
test-enhanced-auto.js - Automated testing
Documentation
Requirements
System Requirements
Windows 10/11
Node.js 16+
Calibre installed
At least 4GB RAM (8GB+ recommended for large libraries)
Optional Dependencies
ImageMagick (for enhanced image processing)
Tesseract OCR (for text extraction from scanned documents)
Troubleshooting
Common Issues
FAISS Installation: If FAISS fails to install, make sure the build tools needed to compile native Node.js modules are installed.
Tesseract Not Found: Install Tesseract and add its installation directory to your PATH.
Memory Issues: Reduce batch sizes when processing large documents.
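Reducing the batch size limits how many chunks are embedded and indexed in one step. A minimal sketch of the idea, assuming the embedding and index-update steps can be called once per batch: embedBatch and storeBatch are hypothetical stand-ins, not functions exported by this project. Smaller batchSize values typically lower peak memory at the cost of more calls.

```javascript
// Hypothetical sketch: embed and store document chunks in fixed-size batches,
// so the embedding step never handles more than batchSize chunks at once.

async function processInBatches(chunks, batchSize, embedBatch, storeBatch) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  let processed = 0;
  for (let start = 0; start < chunks.length; start += batchSize) {
    const batch = chunks.slice(start, start + batchSize);
    const vectors = await embedBatch(batch); // one batch in flight at a time
    await storeBatch(batch, vectors); // e.g. add to the vector index
    processed += batch.length;
  }
  return processed;
}

// Usage with stand-in functions (hypothetical): each "embedding" is the text length.
const batchSizesSeen = [];
processInBatches(
  ["a", "bb", "ccc", "dddd", "eeeee"],
  2,
  async (texts) => texts.map((t) => [t.length]),
  async (batch, vectors) => {
    batchSizesSeen.push(vectors.length);
  }
).then((count) => console.log(count, batchSizesSeen));
```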
Debug Mode
Enable verbose logging by setting the following environment variable:
Contributing
Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Submit a pull request
License
Licensed under the Apache License 2.0. See LICENSE file for details.
Support
For issues and questions, please open an issue on GitHub.
Changelog
v1.0.0
Initial release with RAG capabilities
Project-based vector search
Multi-format document support
OCR integration
Windows optimization
Local-Only Server
The server can only run on the client's local machine because it depends on local resources.
Enables semantic search and contextual conversations with your Calibre ebook library using vector-based RAG technology. Supports project-based organization, multi-format book processing, and OCR capabilities for enhanced content extraction and retrieval.