# Enhanced Calibre RAG MCP Server - Configuration Guide
## Directory Structure
```
calibre-rag-mcp-nodejs/
├── package.json # Dependencies and project info
├── server.js # Main server implementation
├── test.js # Test suite
├── setup.bat # Windows setup script
├── README.md # Main documentation
├── USAGE_EXAMPLES.md # Usage examples
├── CONFIG.md # This configuration guide
├── node_modules/ # Installed dependencies (after npm install)
└── projects/ # RAG projects directory (created automatically)
    ├── project_name_1/
    │   ├── project.json # Project configuration
    │   ├── vectors.bin # Binary vector storage
    │   ├── metadata.json # Chunk metadata
    │   └── chunks/ # Individual text chunks
    │       ├── chunk_0.json
    │       ├── chunk_1.json
    │       └── ...
    └── project_name_2/
        └── ...
```
## Server Configuration
### Primary Settings (edit in server.js)
```javascript
const CONFIG = {
  // REQUIRED: Update this to your Calibre library path
  CALIBRE_LIBRARY: 'D:\\e-library', // ← CHANGE THIS
  // Calibre executable paths (automatic detection)
  CALIBRE_PATHS: [
    'calibredb', // If in system PATH
    'C:\\Program Files\\Calibre2\\calibredb.exe',
    'C:\\Program Files (x86)\\Calibre2\\calibredb.exe',
    path.join(os.homedir(), 'AppData', 'Local', 'calibre-ebook', 'calibredb.exe')
  ],
  // RAG Configuration
  RAG: {
    PROJECTS_DIR: path.join(__dirname, 'projects'),
    EMBEDDINGS_MODEL: 'Xenova/all-MiniLM-L6-v2', // Lightweight, good quality
    CHUNK_SIZE: 1000, // Characters per chunk
    CHUNK_OVERLAP: 200, // Overlap between chunks
    VECTOR_DIMENSION: 384, // Embedding dimension
    MAX_CONTEXT_CHUNKS: 5 // Max chunks returned per query
  }
};
```
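The "automatic detection" over `CALIBRE_PATHS` can be pictured as trying each candidate in order and keeping the first one that runs. The following is a minimal sketch, not the server's actual code: `findCalibredb` and `canRun` are hypothetical names, and probing with `--version` is an assumption about how detection might work.

```javascript
const { execFileSync } = require('child_process');

// Default probe (hypothetical): true if "<candidate> --version" exits successfully.
function canRun(candidate) {
  try {
    execFileSync(candidate, ['--version'], { stdio: 'ignore', timeout: 5000 });
    return true;
  } catch (err) {
    return false; // Not installed at this location, or not executable
  }
}

// Returns the first candidate that passes the probe, or null if none does.
// The probe is injectable so the search order can be tested without Calibre installed.
function findCalibredb(candidates, probe = canRun) {
  for (const candidate of candidates) {
    if (probe(candidate)) return candidate;
  }
  return null;
}
```

Calling `findCalibredb(CONFIG.CALIBRE_PATHS)` would return the first working entry, so listing `'calibredb'` first prefers whatever is on the system PATH.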
### Finding Your Calibre Library Path
1. **Open Calibre**
2. **Go to:** the library button in the toolbar (labelled with your library's name) → Switch/create library; the dialog shows the current library location
3. **Copy the path** and update `CALIBRE_LIBRARY` in server.js
Common paths (backslashes are doubled because they must be escaped inside a JavaScript string):
- `C:\\Users\\YourName\\Documents\\Calibre Library`
- `D:\\Books\\Calibre Library`
- `E:\\Library\\Calibre`
### Alternative Embedding Models
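For different use cases, you can change `EMBEDDINGS_MODEL`. Pick one entry from the list below, set `VECTOR_DIMENSION` to that model's dimension, and rebuild existing projects, because vectors produced by different models are not comparable: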
```javascript
// Lightweight (384 dim) - Good for general use
EMBEDDINGS_MODEL: 'Xenova/all-MiniLM-L6-v2',
// Better quality (768 dim) - Requires more memory
EMBEDDINGS_MODEL: 'Xenova/all-mpnet-base-v2',
// Multilingual (384 dim) - For non-English content
EMBEDDINGS_MODEL: 'Xenova/multilingual-e5-small',
// Technical/Scientific (768 dim) - Better for technical content
EMBEDDINGS_MODEL: 'Xenova/scibert_scivocab_uncased',
```
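Because `VECTOR_DIMENSION` has to agree with the selected model, a small startup check can catch a mismatch early. This is a sketch under stated assumptions: `MODEL_DIMENSIONS` and `checkModelDimension` are hypothetical names, and the dimensions are copied from the comments above rather than verified against each model card.

```javascript
// Dimensions as listed in this guide; confirm them on the model card before relying on them.
const MODEL_DIMENSIONS = {
  'Xenova/all-MiniLM-L6-v2': 384,
  'Xenova/all-mpnet-base-v2': 768,
  'Xenova/multilingual-e5-small': 384,
  'Xenova/scibert_scivocab_uncased': 768
};

// Compares CONFIG.RAG.VECTOR_DIMENSION with the dimension expected for the model.
function checkModelDimension(ragConfig) {
  const expected = MODEL_DIMENSIONS[ragConfig.EMBEDDINGS_MODEL];
  if (expected === undefined) {
    return { ok: false, reason: `Unknown model: ${ragConfig.EMBEDDINGS_MODEL}` };
  }
  if (expected !== ragConfig.VECTOR_DIMENSION) {
    return {
      ok: false,
      reason: `VECTOR_DIMENSION is ${ragConfig.VECTOR_DIMENSION}, but the model produces ${expected}`
    };
  }
  return { ok: true };
}
```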
### Performance Tuning
#### For Large Libraries (10,000+ books):
```javascript
RAG: {
  CHUNK_SIZE: 800, // Smaller chunks
  CHUNK_OVERLAP: 150, // Less overlap
  MAX_CONTEXT_CHUNKS: 3 // Fewer chunks per query
}
```
#### For Technical/Engineering Content:
```javascript
RAG: {
  CHUNK_SIZE: 1200, // Larger chunks for complete formulas
  CHUNK_OVERLAP: 250, // More overlap for context
  MAX_CONTEXT_CHUNKS: 5 // Keep the default of 5 for complex topics
}
```
#### For Low-Memory Systems:
```javascript
RAG: {
  EMBEDDINGS_MODEL: 'Xenova/all-MiniLM-L6-v2', // Lightest model
  CHUNK_SIZE: 600, // Smaller chunks
  MAX_CONTEXT_CHUNKS: 3 // Fewer chunks
}
```
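To see how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, it helps to view chunking as a sliding window that advances by `CHUNK_SIZE - CHUNK_OVERLAP` characters. The sketch below is a plain fixed-width chunker for illustration only; `chunkText` is a hypothetical name, and the server's own `intelligentChunk` method may split on different boundaries.

```javascript
// Fixed-width sliding-window chunker (illustrative, not the server's implementation).
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (overlap >= chunkSize) {
    throw new RangeError('overlap must be smaller than chunkSize');
  }
  const step = chunkSize - overlap; // How far the window moves each time
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // Last window reached the end
  }
  return chunks;
}

const sample = 'x'.repeat(2600);
console.log(chunkText(sample).length);           // Default settings
console.log(chunkText(sample, 800, 150).length); // "Large library" settings
```

A smaller step means more chunks for the same text, so lowering the overlap reduces both storage and embedding time, while lowering the chunk size alone increases the chunk count.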
## Project Configuration Example
Each project has a `project.json` file:
```json
{
  "name": "steel_building_design",
  "description": "Steel building design references including AISC standards",
  "created_at": "2024-09-08T10:30:00.000Z",
  "last_updated": "2024-09-08T11:45:00.000Z",
  "books": [123, 456, 789],
  "chunk_count": 1547,
  "vector_dimension": 384,
  "settings": {
    "chunk_size": 1000,
    "chunk_overlap": 200,
    "embedding_model": "Xenova/all-MiniLM-L6-v2"
  }
}
```
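Since each project records the settings it was built with, those values can be compared with the current `CONFIG.RAG` to spot projects that need rebuilding after a configuration change. The following sketch assumes the field names shown above; `findSettingMismatches` is a hypothetical helper, and whether the server performs such a check itself is not stated in this guide.

```javascript
// Returns the names of project settings that differ from the current RAG config.
function findSettingMismatches(project, ragConfig) {
  const settings = project.settings || {};
  const mismatches = [];
  if (settings.embedding_model !== ragConfig.EMBEDDINGS_MODEL) mismatches.push('embedding_model');
  if (project.vector_dimension !== ragConfig.VECTOR_DIMENSION) mismatches.push('vector_dimension');
  if (settings.chunk_size !== ragConfig.CHUNK_SIZE) mismatches.push('chunk_size');
  if (settings.chunk_overlap !== ragConfig.CHUNK_OVERLAP) mismatches.push('chunk_overlap');
  return mismatches;
}
```

A mismatch in `embedding_model` or `vector_dimension` means stored vectors cannot be compared with new query embeddings; a mismatch in the chunk settings only means newly added books will be chunked differently from older ones.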
## Environment Variables (Optional)
Create a `.env` file for machine-specific settings that should stay out of version control:
```bash
# .env file (optional)
CALIBRE_LIBRARY_PATH=D:\e-library
EMBEDDINGS_CACHE_DIR=C:\Users\YourName\AppData\Local\calibre-rag-cache
LOG_LEVEL=info
```
Then load it in server.js (this needs the `dotenv` package; run `npm install dotenv` if it is not already a dependency):
```javascript
require('dotenv').config();
const CALIBRE_LIBRARY = process.env.CALIBRE_LIBRARY_PATH || 'D:\\e-library';
```
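All three variables from the `.env` example can be read in one place with fallbacks. This is a minimal sketch: `loadSettings` and the `LOG_LEVELS` list are hypothetical, the `info` default mirrors the example file, and the function takes the environment as a parameter so it can be exercised without touching `process.env`.

```javascript
// Accepted log levels (an assumption; adjust to whatever the log method supports).
const LOG_LEVELS = ['error', 'warn', 'info', 'debug'];

// Reads settings from an environment object, falling back to defaults.
function loadSettings(env = process.env) {
  const logLevel = (env.LOG_LEVEL || 'info').toLowerCase();
  return {
    calibreLibrary: env.CALIBRE_LIBRARY_PATH || 'D:\\e-library',
    embeddingsCacheDir: env.EMBEDDINGS_CACHE_DIR || null, // null: use the library default
    logLevel: LOG_LEVELS.includes(logLevel) ? logLevel : 'info'
  };
}
```

After `require('dotenv').config()` has run, `loadSettings()` with no arguments picks up the values from the `.env` file.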
## Claude Integration Configuration
### MCP Client Setup
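Add to your Claude configuration (for Claude Desktop, the `claude_desktop_config.json` file), adjusting both paths to where you installed the server: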
```json
{
  "mcpServers": {
    "calibre-rag": {
      "command": "node",
      "args": ["F:\\MCP servers\\calibre-rag-mcp-nodejs\\server.js"],
      "cwd": "F:\\MCP servers\\calibre-rag-mcp-nodejs"
    }
  }
}
```
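Windows paths in JSON need every backslash doubled, which is easy to get wrong by hand. One way to avoid that is to generate the entry and let `JSON.stringify` do the escaping. This is an illustrative sketch; `buildClaudeConfig` is a hypothetical helper, and the keys simply mirror the example above.

```javascript
const path = require('path');

// Builds the "mcpServers" entry for a server installed in serverDir.
// path.win32 is used so the result has Windows separators on any platform.
function buildClaudeConfig(serverDir, name = 'calibre-rag') {
  return {
    mcpServers: {
      [name]: {
        command: 'node',
        args: [path.win32.join(serverDir, 'server.js')],
        cwd: serverDir
      }
    }
  };
}

// JSON.stringify escapes each backslash in the output.
const json = JSON.stringify(
  buildClaudeConfig('F:\\MCP servers\\calibre-rag-mcp-nodejs'),
  null,
  2
);
console.log(json);
```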
### Usage Patterns
1. **Project-First Approach:**
   ```
   Human: Create a RAG project for structural steel design
   Claude: [Creates project, searches books, adds to project]
   Human: What are the AISC requirements for beam design?
   Claude: [Uses project context to provide detailed answer]
   ```
2. **Search-First Approach:**
   ```
   Human: Search for books about concrete bridge design
   Claude: [Searches library, shows results]
   Human: Add books 123, 456 to a new bridge design project
   Claude: [Creates project and processes books]
   ```
## Monitoring and Maintenance
### Log Files
- Location: `%TEMP%\calibre-rag-mcp-requests.log`
- Rotation: Manual (delete when too large)
- Level: All requests and errors
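Because rotation is manual, a small script can take over the "delete when too large" step. The sketch below renames the log once it passes a size threshold; `rotateLogIfLarge` and the 10 MB limit are hypothetical, and it is safest to run it while the server is stopped, since Windows may refuse to rename a file that is open.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Path from this guide: %TEMP%\calibre-rag-mcp-requests.log
const LOG_FILE = path.join(os.tmpdir(), 'calibre-rag-mcp-requests.log');

// Renames the log to "<name>.1" once it exceeds maxBytes.
// Returns true if the file was rotated, false otherwise.
function rotateLogIfLarge(logFile = LOG_FILE, maxBytes = 10 * 1024 * 1024) {
  let size;
  try {
    size = fs.statSync(logFile).size;
  } catch (err) {
    if (err.code === 'ENOENT') return false; // No log file yet
    throw err;
  }
  if (size <= maxBytes) return false;
  fs.renameSync(logFile, `${logFile}.1`); // Replaces any previous ".1" file
  return true;
}
```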
### Storage Management
```bat
REM Check project sizes
dir projects /s
REM Clean up old projects (manual)
rmdir /s projects\old_project_name
```
### Performance Monitoring
- Vector file sizes (projects/*/vectors.bin)
- Chunk counts per project
- Response times for context queries
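When watching vector file sizes, a rough expected size is useful as a baseline. The estimate below assumes that `vectors.bin` stores one 32-bit float per dimension per chunk with no header or index; that layout is an assumption, not something this guide specifies, and `estimateVectorBytes` is a hypothetical helper.

```javascript
// Rough size of the raw vector data: chunks x dimensions x bytes per value.
// bytesPerValue = 4 assumes Float32 storage (an assumption about vectors.bin).
function estimateVectorBytes(chunkCount, dimension, bytesPerValue = 4) {
  return chunkCount * dimension * bytesPerValue;
}

// Values from the project.json example: 1547 chunks, 384 dimensions.
const bytes = estimateVectorBytes(1547, 384);
console.log(`${bytes} bytes (~${(bytes / (1024 * 1024)).toFixed(2)} MiB)`);
// prints "2376192 bytes (~2.27 MiB)"
```

Under the same assumption, switching to a 768-dimension model doubles this figure for the same number of chunks.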
## Troubleshooting Configuration
### Common Issues
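1. **"Calibre not found"**
   - Verify that Calibre is installed and that one of the `CALIBRE_PATHS` entries points to `calibredb.exe`
   - Check that `CALIBRE_LIBRARY` points to an existing library folder (it should contain `metadata.db`)
   - Add the Calibre installation folder (the one containing `calibredb.exe`) to the system PATH
2. **"Module not found" errors**
   - Run: `npm install`
   - Check Node.js version (16+)
   - Delete `node_modules` and reinstall
3. **Embedding download fails**
   - Check internet connection
   - Clear the model cache. Its location depends on the embedding library: `~/.cache/huggingface` is the Python Hugging Face default, while `@xenova/transformers` (assumed here from the `Xenova/` model names) typically caches under `node_modules/@xenova/transformers/.cache`
   - Try a different embedding model
4. **Out of memory errors**
   - Reduce CHUNK_SIZE
   - Reduce MAX_CONTEXT_CHUNKS
   - Use a lighter embedding model
5. **Slow performance**
   - Check vector file sizes
   - Reduce chunk overlap
   - Limit books per project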
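The Node.js version requirement from item 2 can be checked at startup so that an unsupported runtime fails with a clear message instead of an obscure module error. A minimal sketch, with `nodeVersionOk` as a hypothetical helper:

```javascript
// Returns true if the given Node.js version string meets the minimum major version.
function nodeVersionOk(version = process.versions.node, minMajor = 16) {
  const major = Number.parseInt(version.split('.')[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

if (!nodeVersionOk()) {
  // stderr keeps the message out of any protocol output on stdout
  console.error(`Node.js 16+ is required, found ${process.versions.node}`);
}
```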
### Debug Mode
Enable detailed logging:
```javascript
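const DEBUG = true; // Define near the top of server.js
// In the log method. Use console.error rather than console.log: when Claude
// launches the server over stdio, stdout carries the MCP protocol messages.
if (DEBUG) {
  console.error(logMessage);
}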
```
### Validation Commands
```bash
# Test Calibre connection
calibredb list --limit 5
# Test Node.js and dependencies
node -e "console.log(process.version)"
npm list
# Test MCP server
node test.js
```
## Advanced Configuration
### Custom Chunking Strategy
Modify the `intelligentChunk` method for domain-specific content:
```javascript
// For legal documents
const legalChunk = (content) => {
// Split by sections, subsections, paragraphs
// Preserve legal citations
// Maintain clause structure
};
// For programming books
const codeChunk = (content) => {
// Keep code blocks intact
// Preserve function definitions
// Maintain import statements
};
```
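As one possible way to fill in the "keep code blocks intact" idea, the sketch below splits text into prose paragraphs and fenced code blocks, then packs them into chunks without ever cutting a code block. It assumes the extracted book text marks code with Markdown-style triple-backtick fences, which may not hold for every e-book format; `codeChunkSketch` is a hypothetical name, and a single unit longer than `maxSize` is kept whole as an oversized chunk.

```javascript
// Built with repeat() so the fence marker never appears literally in this example.
const FENCE = '`'.repeat(3);
const FENCED_BLOCK = new RegExp(`(${FENCE}[\\s\\S]*?${FENCE})`);

function codeChunkSketch(content, maxSize = 1000) {
  // Splitting on a capturing group keeps the fenced blocks in the result.
  const segments = content.split(FENCED_BLOCK).filter((s) => s.trim() !== '');
  const units = [];
  for (const segment of segments) {
    if (segment.startsWith(FENCE)) {
      units.push(segment); // Code block: never split
    } else {
      // Prose: split into paragraphs on blank lines
      units.push(...segment.split(/\n{2,}/).filter((p) => p.trim() !== ''));
    }
  }
  // Greedily pack units into chunks of at most maxSize characters.
  const chunks = [];
  let current = '';
  for (const unit of units) {
    const text = unit.trim();
    const candidate = current ? `${current}\n\n${text}` : text;
    if (current && candidate.length > maxSize) {
      chunks.push(current);
      current = text;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```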
### Custom Embedding Pipeline
```javascript
async initializeCustomEmbedder() {
  // Use domain-specific models
  // Add preprocessing steps
  // Implement custom tokenization
}
```
### Integration with External Vector Databases
```javascript
// FAISS integration
const faiss = require('faiss-node');
// Pinecone integration
const pinecone = require('@pinecone-database/pinecone');
// Weaviate integration
const weaviate = require('weaviate-ts-client');
```
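Before wiring in an external database, it can help to define the small interface the rest of the server would depend on. The sketch below is a brute-force in-memory store with `add` and `search` methods; an adapter for FAISS, Pinecone, or Weaviate could expose the same two methods. The class name, method names, and use of cosine similarity are all assumptions for illustration, not the server's existing storage code.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical minimal store: linear scan over all vectors.
class InMemoryVectorStore {
  constructor(dimension) {
    this.dimension = dimension;
    this.items = []; // Each item: { id, vector: Float32Array }
  }

  add(id, vector) {
    if (vector.length !== this.dimension) {
      throw new RangeError(`Expected ${this.dimension} values, got ${vector.length}`);
    }
    this.items.push({ id, vector: Float32Array.from(vector) });
  }

  // Returns up to k results as { id, score }, most similar first.
  search(query, k = 5) {
    return this.items
      .map(({ id, vector }) => ({ id, score: cosine(query, vector) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}
```

Here `k` plays the role of `MAX_CONTEXT_CHUNKS`. A linear scan is adequate for a few thousand chunks; an external index becomes worthwhile once query latency grows with project size.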