# QuantConnect PDF MCP Server
A Model Context Protocol (MCP) server that provides search and retrieval over QuantConnect PDF documentation. The server converts PDFs to searchable markdown and ranks results using TF-IDF scoring with proximity matching, returning excerpts with surrounding context.
## Features
- **Intelligent PDF Processing**: Automatically converts PDFs to structured markdown with proper formatting
- **Fast Search Index**: Uses inverted index with TF-IDF scoring for relevant results
- **Context-Aware Results**: Returns relevant excerpts with highlighted matches
- **Caching System**: Avoids reprocessing unchanged PDFs for better performance
- **Proximity Matching**: Boosts results where query terms appear close together
- **Three MCP Tools**: Search, list documents, and retrieve full content
## Project Structure
```
QuantConnectServer/
├── server.py                 # Main MCP server with enhanced search
├── convert_pdfs.py           # Standalone PDF conversion utility
├── requirements.txt          # Python dependencies
├── README.md                 # This documentation
├── env/                      # Python virtual environment
└── quantconnect-docs/        # PDF documents and converted markdown
    ├── Quantconnect-Local-Platform-Python-2.pdf
    ├── Quantconnect-Writing-Algorithms-Python-2.pdf
    └── markdown/             # Auto-generated markdown files
        ├── .pdf_cache.json   # Processing cache
        ├── .search_index.pkl # Search index cache
        └── *.md files        # Converted documents
```
## Installation
### Prerequisites
- Python 3.8 or higher
- pip package manager
### Step 1: Install Dependencies
Install required packages:
```bash
pip install -r requirements.txt
```
The `requirements.txt` includes:
- `mcp` - Model Context Protocol library
- `PyPDF2` - PDF text extraction
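- `asyncio` - Asynchronous processing (part of the Python standard library since 3.4, so this entry does not need to be installed separately)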
### Step 2: Prepare Your Environment
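Create and activate a virtual environment (recommended). If you use one, do this before Step 1 so the dependencies install into it; the install command is repeated below for that reason: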
```bash
python -m venv env
source env/bin/activate # On Windows: env\Scripts\activate
pip install -r requirements.txt
```
## Configuration
### Claude Desktop Setup
Find your Claude Desktop configuration file:
- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
- **Linux**: `~/.config/claude/claude_desktop_config.json`
Add this configuration (adjust paths to match your system):
```json
{
  "mcpServers": {
    "quantconnect-pdf-server": {
      "command": "/path/to/your/project/env/bin/python3",
      "args": ["/path/to/your/project/server.py"],
      "env": {
        "QUANTCONNECT_PDF_FOLDER": "/path/to/your/project/quantconnect-docs",
        "QUANTCONNECT_MARKDOWN_FOLDER": "/path/to/your/project/quantconnect-docs/markdown"
      }
    }
  }
}
```
### Environment Variables
- `QUANTCONNECT_PDF_FOLDER`: Directory containing your PDF files (required)
- `QUANTCONNECT_MARKDOWN_FOLDER`: Directory for converted markdown files (optional, defaults to the `markdown` subfolder of `QUANTCONNECT_PDF_FOLDER`)
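
The lookup described above could be sketched as follows. This is a minimal illustration, not the code in `server.py`: the function name `resolve_folders` and the choice to raise `RuntimeError` when the required variable is missing are assumptions made for the example.

```python
import os
from pathlib import Path
from typing import Mapping, Tuple


def resolve_folders(env: Mapping[str, str] = os.environ) -> Tuple[Path, Path]:
    """Return (pdf_folder, markdown_folder) from the two documented variables.

    Hypothetical helper: the real server may resolve these differently.
    """
    pdf_value = env.get("QUANTCONNECT_PDF_FOLDER")
    if not pdf_value:
        # The PDF folder is documented as required, so fail loudly.
        raise RuntimeError("QUANTCONNECT_PDF_FOLDER is required but not set")
    pdf_folder = Path(pdf_value).expanduser()

    # The markdown folder is optional; fall back to <pdf_folder>/markdown.
    md_value = env.get("QUANTCONNECT_MARKDOWN_FOLDER")
    markdown_folder = Path(md_value).expanduser() if md_value else pdf_folder / "markdown"
    return pdf_folder, markdown_folder
```

Passing the environment as a mapping keeps the helper easy to test without touching the real process environment.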
## Usage
### Starting the Server
1. **Standalone testing**:
   ```bash
   export QUANTCONNECT_PDF_FOLDER="/path/to/your/pdfs"
   python server.py
   ```
2. **With Claude Desktop**: Restart Claude Desktop after configuration to load the MCP server
3. **Manual PDF conversion** (optional):
   ```bash
   python convert_pdfs.py [pdf_folder] [markdown_folder]
   ```
### Testing the Integration
Test in Claude by asking:
- "Can you list the available QuantConnect documents?"
- "Search for information about backtesting in the QuantConnect docs"
- "What does the QuantConnect documentation say about indicators?"
- "Show me page 5 of the Local Platform documentation"
## Available MCP Tools
The server exposes three tools that Claude can call:
### 1. `search_quantconnect_docs`
**Purpose**: Intelligent search through all QuantConnect documentation
**Parameters**:
- `query` (required): Search terms or topic to find
- `max_results` (optional): Number of results to return (default: 5)
**Features**:
- TF-IDF scoring for relevance ranking
- Proximity matching for multi-word queries
- Context extraction with highlighted matches
- Returns document excerpts with page numbers
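
To illustrate the "context extraction with highlighted matches" feature, here is a rough sketch of how an excerpt could be cut around the first match. The function name `extract_excerpt`, the `radius` parameter, and the use of `**bold**` as the highlight style are assumptions for this example; the server's actual excerpt format may differ.

```python
import re


def extract_excerpt(text: str, query: str, radius: int = 40) -> str:
    """Return a snippet around the first query-term match, with matches in **bold**.

    Hypothetical helper for illustration only.
    """
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return ""
    # Whole-word, case-insensitive match on any query term.
    pattern = re.compile(r"\b(?:" + "|".join(terms) + r")\b", re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return ""
    start = max(0, match.start() - radius)
    end = min(len(text), match.end() + radius)
    snippet = text[start:end]
    # Highlight every match inside the snippet, preserving the original casing.
    highlighted = pattern.sub(lambda m: f"**{m.group(0)}**", snippet)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + highlighted + suffix
```

The ellipses signal to the reader that the excerpt was truncated on that side.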
### 2. `list_quantconnect_docs`
**Purpose**: List all available PDF documents in the collection
**Parameters**: None
**Returns**: Complete catalog of processed documents with metadata
### 3. `get_document_content`
**Purpose**: Retrieve full content from specific documents
**Parameters**:
- `filename` (required): Document name (with or without .md extension)
- `page_number` (optional): Specific page to retrieve
**Use cases**: Reading complete sections, accessing specific pages, extracting code examples
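
The two parameters could be handled along these lines. This sketch assumes that converted markdown marks page boundaries with `## Page N` headings; that layout, and the helper names `normalize_name` and `get_page`, are assumptions for illustration and are not confirmed by this README.

```python
import re
from typing import Optional


def normalize_name(filename: str) -> str:
    """Accept 'guide' or 'guide.md' and always return 'guide.md'."""
    return filename if filename.lower().endswith(".md") else filename + ".md"


def get_page(markdown: str, page_number: Optional[int] = None) -> Optional[str]:
    """Return the whole document, or a single page when page_number is given.

    ASSUMPTION: pages are delimited by '## Page N' headings.
    Returns None when the requested page does not exist.
    """
    if page_number is None:
        return markdown
    # Splitting on a pattern with one capture group yields
    # [preamble, "1", body1, "2", body2, ...].
    parts = re.split(r"^## Page (\d+)[ \t]*$", markdown, flags=re.MULTILINE)
    pages = {int(num): body.strip() for num, body in zip(parts[1::2], parts[2::2])}
    return pages.get(page_number)
```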
## Technical Architecture
### Search Engine
- **Inverted Index**: Maps words to document locations for fast lookup
- **TF-IDF Scoring**: Weights how often a term appears in a document against how rare it is across all documents
- **Proximity Boosting**: Enhances results where query terms appear together
- **Context Extraction**: Provides relevant snippets around matches
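
The first three components can be sketched in a few dozen lines. This is not the project's `SearchIndex` class; `TinyIndex`, the smoothed IDF formula, the three-token proximity window, and the 0.5 boost per adjacent term pair are all assumptions chosen to keep the example small.

```python
import math
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Inverted index mapping term -> {doc_id: [token positions]}."""

    def __init__(self) -> None:
        self.postings: Dict[str, Dict[str, List[int]]] = defaultdict(dict)
        self.doc_lengths: Dict[str, int] = {}

    def add(self, doc_id: str, text: str) -> None:
        tokens = tokenize(text)
        self.doc_lengths[doc_id] = len(tokens)
        for position, term in enumerate(tokens):
            self.postings[term].setdefault(doc_id, []).append(position)

    def search(self, query: str, max_results: int = 5) -> List[Tuple[str, float]]:
        terms = tokenize(query)
        n_docs = len(self.doc_lengths)
        scores: Dict[str, float] = defaultdict(float)
        for term in set(terms):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Smoothed inverse document frequency: rarer terms weigh more.
            idf = math.log((1 + n_docs) / (1 + len(docs))) + 1.0
            for doc_id, positions in docs.items():
                tf = len(positions) / self.doc_lengths[doc_id]
                scores[doc_id] += tf * idf
        for doc_id in scores:
            scores[doc_id] *= self._proximity_boost(doc_id, terms)
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        return ranked[:max_results]

    def _proximity_boost(self, doc_id: str, terms: List[str], window: int = 3) -> float:
        """Add 0.5 for each consecutive query-term pair found in order within `window` tokens."""
        boost = 1.0
        for first, second in zip(terms, terms[1:]):
            pos_first = self.postings.get(first, {}).get(doc_id, [])
            pos_second = self.postings.get(second, {}).get(doc_id, [])
            if any(0 < b - a <= window for a in pos_first for b in pos_second):
                boost += 0.5
        return boost
```

With this scheme, a document containing "custom indicator" as an adjacent phrase outranks one that mentions both words far apart, which is the behavior proximity boosting is meant to produce.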
### Caching System
- **PDF Processing Cache**: Avoids reprocessing unchanged files using MD5 hashes
- **Search Index Cache**: Persists search index for faster startup
- **Incremental Updates**: Only processes new or modified PDFs
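
A hash-based change check of this kind might look like the sketch below. The JSON layout of the cache (a flat mapping from file name to MD5 digest) and the helper names are assumptions; the actual contents of `.pdf_cache.json` are not documented here.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List, Tuple


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs are never loaded fully into memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_changed_pdfs(pdf_folder: Path, cache_file: Path) -> Tuple[List[Path], Dict[str, str]]:
    """Return (PDFs that are new or modified, current name -> digest mapping).

    ASSUMPTION: the cache is a JSON object of {file name: md5 hex digest}.
    """
    cached: Dict[str, str] = {}
    if cache_file.exists():
        cached = json.loads(cache_file.read_text(encoding="utf-8"))
    current: Dict[str, str] = {}
    changed: List[Path] = []
    for pdf in sorted(pdf_folder.glob("*.pdf")):
        current[pdf.name] = file_md5(pdf)
        if cached.get(pdf.name) != current[pdf.name]:
            changed.append(pdf)
    return changed, current


def save_cache(cache_file: Path, hashes: Dict[str, str]) -> None:
    """Persist the digests; call this only after conversion has succeeded."""
    cache_file.write_text(json.dumps(hashes, indent=2), encoding="utf-8")
```

Saving the cache separately from the check means a conversion that fails midway is retried on the next run rather than being recorded as done.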
### Performance Features
- **Asynchronous Processing**: Non-blocking PDF conversion and indexing
- **Background Initialization**: Server starts immediately while processing continues
- **Efficient Storage**: Markdown conversion reduces memory usage vs. raw PDF text
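
The background-initialization idea can be illustrated with plain `asyncio`. Everything named here (`DocumentStore`, `slow_convert`, the "still being built" message) is hypothetical; the sketch only shows the pattern of answering requests while conversion runs in a worker thread.

```python
import asyncio
import time
from typing import List


def slow_convert(name: str) -> str:
    """Stand-in for blocking PDF conversion (hypothetical)."""
    time.sleep(0.05)
    return f"# {name}\nconverted text"


class DocumentStore:
    def __init__(self) -> None:
        self.ready = False
        self.documents: List[str] = []

    async def initialize(self, names: List[str]) -> None:
        loop = asyncio.get_running_loop()
        for name in names:
            # Run the blocking conversion in the default thread pool
            # so the event loop stays free to serve requests.
            text = await loop.run_in_executor(None, slow_convert, name)
            self.documents.append(text)
        self.ready = True

    def search(self, query: str) -> List[str]:
        if not self.ready:
            return ["Index is still being built; try again shortly."]
        return [doc for doc in self.documents if query.lower() in doc.lower()]


async def main() -> List[List[str]]:
    store = DocumentStore()
    # Schedule initialization without waiting for it to finish.
    task = asyncio.create_task(store.initialize(["a.pdf", "b.pdf"]))
    early = store.search("converted")  # answered before indexing has run
    await task
    late = store.search("converted")   # answered from the finished index
    return [early, late]
```

`loop.run_in_executor` is used instead of `asyncio.to_thread` because the latter requires Python 3.9, while the prerequisites above list Python 3.8.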
## Troubleshooting
### Common Issues
1. **Server not connecting**
   - Verify that the paths in the Claude Desktop configuration are absolute
   - Check that `command` points to the Python interpreter inside the virtual environment
   - Ensure `server.py` is readable by your user (it is launched through the interpreter, so execute permission is not required)
2. **PDFs not loading**
   - Confirm that the `QUANTCONNECT_PDF_FOLDER` path exists
   - Check PDF file permissions and readability
   - Look for error messages in the server output
3. **Search returning no results**
   - Wait for initial PDF processing to complete
   - Check whether the markdown files were created successfully
   - Try broader search terms
4. **Performance issues**
   - Ensure adequate disk space for the markdown files
   - Check whether antivirus software is scanning the project folder
   - Consider moving the cache files to faster storage
### Debug Mode
Run the server and capture its stdout and stderr in a log file while still seeing them in the terminal:
```bash
export QUANTCONNECT_PDF_FOLDER="/path/to/pdfs"
python server.py 2>&1 | tee server.log
```
## Advanced Usage
### Bulk PDF Processing
Process all PDFs without starting the server:
```bash
python convert_pdfs.py ./quantconnect-docs ./quantconnect-docs/markdown
```
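
For orientation, a single conversion step could look roughly like the following. This is not the code in `convert_pdfs.py`: the `## Page N` heading layout and both function names are assumptions, and the PyPDF2 calls assume PyPDF2 2.0 or later, where `PdfReader` replaced the older `PdfFileReader`.

```python
from pathlib import Path
from typing import Iterable, List


def pages_to_markdown(title: str, pages: Iterable[str]) -> str:
    """Join per-page text into one markdown document.

    ASSUMPTION: each page is introduced by a '## Page N' heading.
    """
    lines: List[str] = [f"# {title}", ""]
    for number, text in enumerate(pages, start=1):
        lines.extend([f"## Page {number}", "", text.strip(), ""])
    return "\n".join(lines)


def convert_pdf(pdf_path: Path, markdown_folder: Path) -> Path:
    """Extract text with PyPDF2 and write <stem>.md into markdown_folder."""
    # Imported lazily so pages_to_markdown can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(str(pdf_path))
    # extract_text() may yield nothing for image-only pages; treat that as empty.
    pages = [(page.extract_text() or "") for page in reader.pages]
    markdown_folder.mkdir(parents=True, exist_ok=True)
    target = markdown_folder / (pdf_path.stem + ".md")
    target.write_text(pages_to_markdown(pdf_path.stem, pages), encoding="utf-8")
    return target
```

Keeping the markdown assembly separate from the PDF reading makes the formatting logic testable without sample PDFs.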
### Custom Search Queries
The search supports various query types:
- **Single terms**: `backtesting`
- **Multi-word queries**: `custom indicator development`
- **Technical terms**: `OnData event handler`
- **Code concepts**: `Algorithm.Initialize method`
### Integration Examples
Ask Claude multi-part questions that draw on several sections of the documentation, such as:
```
"Using the QuantConnect docs, show me step-by-step how to create a custom indicator with examples"
"What are all the different order types available and when should I use each one?"
"Find code examples of universe selection and explain the different approaches"
"Compare the local platform setup process with cloud deployment according to the documentation"
```
## Contributing
To extend the server:
1. **Add new document formats**: Extend the conversion system in `server.py:236`
2. **Improve search**: Enhance the `SearchIndex` class for semantic search
3. **Add specialized tools**: Create domain-specific search functions
4. **Performance optimization**: Implement parallel processing or database storage
## Version History
- **v0.3.0**: Enhanced search with TF-IDF scoring and proximity matching
- **v0.2.0**: Added caching system and background processing
- **v0.1.0**: Basic PDF to markdown conversion and simple search