# Generic PDF MCP Server - Setup Guide
This guide walks you through setting up your own PDF documentation server for use with Claude Desktop.
## What You'll Have After Setup
- A customized MCP server that can search through your PDF collection
- Intelligent search capabilities with domain-specific keywords
- Integration with Claude Desktop for natural language queries
- Automated PDF processing and caching for fast performance
## Prerequisites
- Python 3.8 or higher
- Claude Desktop installed
- PDF documents you want to make searchable
## Step-by-Step Setup
### Step 1: Install Dependencies
```bash
# Navigate to your project directory
cd /path/to/generic-pdf-server
# Install required Python packages
pip install -r requirements.txt
```
### Step 2: Choose Your Configuration
You have two options:
#### Option A: Interactive Setup (Recommended)
```bash
python manage_server.py create-config
```
This will ask you for:
- Server name (e.g., "my-company-docs")
- Display name (e.g., "My Company Documentation")
- PDF folder location
- Domain-specific keywords
#### Option B: Use an Example Configuration
```bash
# Copy an example config and modify it
cp examples/tech_docs_config.json server_config.json
# Edit the configuration file
# Update paths, keywords, and server details for your use case
```
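After editing the file by hand, it can help to confirm that nothing was dropped. The following is a minimal sketch, assuming the section and key names shown under "Configuration Options Explained"; `EXPECTED_KEYS`, `find_missing_keys`, and `check_config_text` are illustrative names, not part of `manage_server.py`, and the real loader may require a different set of fields.

```python
import json

# Section and key names mirror the snippets in this guide; the real loader
# may require more (or fewer) fields, so treat this table as an assumption.
EXPECTED_KEYS = {
    "server": ["name", "display_name"],
    "storage": ["pdf_folder", "markdown_folder", "domain_keywords"],
    "tools": ["search", "list", "content"],
    "processing": ["context_size"],
}


def find_missing_keys(config):
    """Return dotted paths such as 'storage.pdf_folder' that are absent."""
    if not isinstance(config, dict):
        return list(EXPECTED_KEYS)
    missing = []
    for section, keys in EXPECTED_KEYS.items():
        body = config.get(section)
        if not isinstance(body, dict):
            # The whole section is missing or has the wrong type.
            missing.append(section)
            continue
        missing.extend(f"{section}.{key}" for key in keys if key not in body)
    return missing


def check_config_text(text):
    """Parse JSON text and report missing keys; raises ValueError on bad JSON."""
    return find_missing_keys(json.loads(text))
```

To use it, pass the contents of `server_config.json` to `check_config_text`; an empty list means every expected key is present.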
### Step 3: Prepare Your PDF Collection
```bash
# Create your PDF folder (if it doesn't exist)
mkdir -p /path/to/your/pdfs
# Add PDFs using the management tool
python manage_server.py add-pdf /path/to/document1.pdf
python manage_server.py add-pdf /path/to/document2.pdf
# Or copy PDFs directly to your configured folder
cp /path/to/your/documents/*.pdf /path/to/your/pdfs/
```
### Step 4: Process Your PDFs
```bash
# Convert PDFs to searchable format
python manage_server.py process-pdfs
```
This will:
- Convert each PDF to markdown format
- Create a search index for fast queries
- Cache the results for quick startup
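To confirm that every PDF was converted, you could compare the two folders yourself. This is a rough sketch that assumes `document.pdf` is converted to `document.md` in the markdown folder; that naming rule and the helper name `find_unprocessed` are assumptions, so adjust them to whatever `process-pdfs` actually writes.

```python
from pathlib import Path


def find_unprocessed(pdf_folder, markdown_folder):
    """Return names of PDFs with no same-stem .md file in markdown_folder.

    Assumption: document.pdf is converted to document.md. The real tool may
    name its output differently.
    """
    markdown_stems = {path.stem for path in Path(markdown_folder).glob("*.md")}
    return sorted(
        path.name
        for path in Path(pdf_folder).iterdir()
        if path.suffix.lower() == ".pdf" and path.stem not in markdown_stems
    )
```

Calling `find_unprocessed("./docs", "./docs/markdown")` returns the PDFs that still lack a markdown counterpart; an empty list suggests processing covered the whole folder.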
### Step 5: Test Your Configuration
```bash
# Verify everything is working
python manage_server.py test
```
### Step 6: Generate MCP Configuration
```bash
# Generate configuration for Claude Desktop
python generate_mcp_config.py --merge
```
This will automatically update your Claude Desktop configuration.
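Claude Desktop reads MCP servers from an `mcpServers` object in its `claude_desktop_config.json`. The exact entry that `generate_mcp_config.py` writes is not shown in this guide, so the sketch below only illustrates what a merge conceptually does: add one named entry while leaving other servers alone. The function name, command, and script path are placeholders.

```python
import copy
import json


def merge_server_entry(existing, name, entry):
    """Return a copy of the Claude Desktop config with one server added.

    Other keys and other servers are left untouched; an entry with the same
    name is replaced.
    """
    merged = copy.deepcopy(existing)
    merged.setdefault("mcpServers", {})[name] = entry
    return merged


# Hypothetical values: the command, script path, and server names are
# placeholders, not output copied from generate_mcp_config.py.
example_entry = {
    "command": "python",
    "args": ["/path/to/generic-pdf-server/server.py"],
}
current = {"mcpServers": {"other-server": {"command": "node", "args": ["index.js"]}}}

print(json.dumps(merge_server_entry(current, "my-company-docs", example_entry), indent=2))
```

If your server does not appear after a restart, opening the Claude Desktop config file and looking for your server name under `mcpServers` is a quick way to see whether the merge took effect.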
### Step 7: Restart Claude Desktop
Close and reopen Claude Desktop to load your new server.
### Step 8: Test with Claude
Ask Claude something like:
- "Can you list the available documents in my server?"
- "Search for information about [your topic]"
- "What does the documentation say about [specific concept]?"
## Customization Examples
### Legal Firm Setup
```bash
# Create config for legal documents
python manage_server.py create-config
# When prompted, use:
# Server name: legal-docs-server
# Display name: Legal Documents Server
# Keywords: contract, liability, jurisdiction, compliance
# PDF folder: ./legal-docs
```
### Technical Team Setup
```bash
# Use the technical documentation example
cp examples/tech_docs_config.json server_config.json
# Edit server_config.json to update:
# - pdf_folder path
# - domain_keywords for your technology stack
# - tool names if desired
```
### Research Lab Setup
```bash
# Use the research papers example
cp examples/research_papers_config.json server_config.json
# Customize for your research domain:
# - Add field-specific keywords
# - Adjust context_size for longer excerpts
# - Set max_results_default higher for comprehensive searches
```
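## Configuration Options Explained
The `//` comments in the snippets in this section are annotations only. Standard JSON does not permit comments, so leave them out of your actual configuration file unless your loader explicitly supports them.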
### Server Section
```json
{
"server": {
"name": "unique-server-name", // Used in MCP config, must be unique
"display_name": "Human Readable Name",
"description": "What this server does",
"version": "1.0.0"
}
}
```
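Python's built-in `json` module rejects `//` annotations like the one in the snippet above. If you prefer to keep an annotated copy of your config, one option is to strip the comments before parsing. The helper below is a minimal sketch (the name `strip_line_comments` is illustrative and not part of this project); it skips `//` sequences inside string values so that paths and URLs survive.

```python
import json


def strip_line_comments(text):
    """Remove // comments that appear outside JSON string literals."""
    out = []
    in_string = False
    i = 0
    while i < len(text):
        char = text[i]
        if in_string:
            out.append(char)
            if char == "\\" and i + 1 < len(text):
                out.append(text[i + 1])  # keep the escaped character as-is
                i += 1
            elif char == '"':
                in_string = False
        elif char == '"':
            in_string = True
            out.append(char)
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(char)
        i += 1
    return "".join(out)


annotated = '{\n  "name": "unique-server-name", // must be unique\n  "version": "1.0.0"\n}'
config = json.loads(strip_line_comments(annotated))
print(config["name"])
```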
### Storage Section
```json
{
"storage": {
"pdf_folder": "./docs", // Where your PDFs are stored
"markdown_folder": "./docs/markdown", // Where processed files go
"domain_keywords": [ // Important terms for your domain
"keyword1", "keyword2"
]
}
}
```
### Tools Section
```json
{
"tools": {
"search": {
"name": "search_docs", // MCP tool name
"description": "Search functionality"
},
"list": {
"name": "list_docs", // MCP tool name
"description": "List functionality"
},
"content": {
"name": "get_content", // MCP tool name
"description": "Content retrieval"
},
"max_results_default": 5 // Default number of search results
}
}
```
### Processing Section
```json
{
"processing": {
"cache_enabled": true, // Enable caching for performance
"parallel_processing": true, // Process multiple PDFs at once
"max_file_size_mb": 50, // Skip files larger than this
"context_size": 500 // Characters around search matches
}
}
```
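To make `context_size` concrete, here is a small sketch of how an excerpt could be cut around a match. It assumes the value counts characters on each side of the match; the guide does not say whether the server applies it per side or in total, and `excerpt` is an illustrative name rather than the server's own function.

```python
import re


def excerpt(text, term, context_size=500):
    """Return the first match of term plus context_size characters either side."""
    if not term:
        return None
    match = re.search(re.escape(term), text, re.IGNORECASE)
    if match is None:
        return None
    # Clamp the window so it never runs past the start or end of the text.
    start = max(0, match.start() - context_size)
    end = min(len(text), match.end() + context_size)
    return text[start:end]
```

Under this assumption, a smaller `context_size` returns shorter excerpts, while a larger one shows more surrounding text per result.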
## Domain-Specific Keywords
Choose keywords that are important in your field:
**Legal**: contract, liability, jurisdiction, statute, regulation, precedent, compliance, clause, provision, warranty, indemnity, arbitration, damages, breach
**Technical**: API, function, method, class, parameter, return, algorithm, database, authentication, configuration, deployment, testing, framework, library
**Medical**: diagnosis, treatment, symptom, medication, therapy, clinical, protocol, pathology, pharmaceutical, contraindication, prognosis
**Research**: hypothesis, methodology, experiment, analysis, results, literature, statistical, correlation, sample, systematic, peer-review
**Financial**: investment, portfolio, risk, return, asset, liability, equity, dividend, yield, valuation, compliance, regulation
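This guide does not describe how the server uses domain keywords when ranking results. As an illustration only, the sketch below shows one simple way keywords could influence relevance: query terms that are also domain keywords count more. The function name, the `boost` parameter, and the scoring rule are all assumptions.

```python
import re


def score_document(text, query, domain_keywords, boost=2.0):
    """Count query-term occurrences, weighting domain keywords more heavily.

    Tokens are runs of word characters, so a hyphenated keyword such as
    "peer-review" would need a different tokenizer to match.
    """
    words = re.findall(r"\w+", text.lower())
    keywords = {keyword.lower() for keyword in domain_keywords}
    score = 0.0
    for term in re.findall(r"\w+", query.lower()):
        weight = boost if term in keywords else 1.0
        score += weight * words.count(term)
    return score
```

With a rule like this, listing "liability" as a keyword makes documents that mention it rank higher for queries containing that word, which is the kind of effect well-chosen keywords are meant to have.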
## Search Tips
Once your server is running, you can:
**Ask broad questions:**
- "What topics are covered in these documents?"
- "Search for information about risk management"
**Get specific information:**
- "Find all references to API authentication"
- "What does the documentation say about error handling?"
**Retrieve full content:**
- "Show me the complete content of the installation guide"
- "Get page 5 of the user manual"
## Troubleshooting
### Common Issues
**"Configuration file not found"**
```bash
# Make sure you're in the right directory
ls server_config.json
# Or create a new config
python manage_server.py create-config
```
**"No PDF files found"**
```bash
# Check your PDF folder path
python manage_server.py list-pdfs
# Add PDFs to the correct location
python manage_server.py add-pdf /path/to/document.pdf
```
**"Server not appearing in Claude"**
```bash
# Regenerate MCP config
python generate_mcp_config.py --merge
# Restart Claude Desktop completely
# Check Claude Desktop logs for errors
```
**"Search returns no results"**
```bash
# Make sure PDFs are processed
python manage_server.py process-pdfs
# Check if markdown files were created
ls /path/to/markdown/folder/
# Try broader search terms
```
### Debug Mode
```bash
# Run server with detailed logging
python server.py 2>&1 | tee server_debug.log
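# Check that the config module imports cleanly (this does not parse your config file)
python -c "from config import load_config_from_env_or_file; print('Config module OK')"
# Check JSON syntax (fails if the file still contains // comments)
python -m json.tool server_config.json > /dev/null && echo "JSON syntax OK"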
# Validate configuration
python manage_server.py test
```
## Advanced Configurations
### Multiple Servers
You can run multiple specialized servers:
```bash
# Legal documents
python manage_server.py --config legal_config.json create-config
# Technical docs
python manage_server.py --config tech_config.json create-config
# Each gets its own MCP entry
python generate_mcp_config.py --config legal_config.json --merge
python generate_mcp_config.py --config tech_config.json --merge
```
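Because each server's `name` must be unique in the MCP config, it is worth checking for clashes before merging several configs. A minimal sketch, assuming each config has the `server.name` field shown in the Server Section; `duplicate_server_names` is an illustrative helper and "tech-docs-server" is a made-up name.

```python
from collections import Counter


def duplicate_server_names(configs):
    """Return server names used by more than one config, sorted."""
    counts = Counter(config["server"]["name"] for config in configs)
    return sorted(name for name, count in counts.items() if count > 1)


legal = {"server": {"name": "legal-docs-server"}}
tech = {"server": {"name": "tech-docs-server"}}  # hypothetical second server
print(duplicate_server_names([legal, tech]))
```

An empty list means the names do not collide, so each config should get its own MCP entry.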
### Large Document Collections
For collections with 100+ PDFs:
```json
{
"processing": {
"parallel_processing": true, // Enable for faster processing
"max_file_size_mb": 100, // Increase if you have large files
"context_size": 300 // Reduce for faster search
},
"tools": {
"max_results_default": 10 // Show more results
}
}
```
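Before raising `max_file_size_mb`, you may want to know which files the current limit would skip. The sketch below lists them; it assumes 1 MB means 1024 × 1024 bytes and that only `*.pdf` files directly inside the folder matter, neither of which is confirmed by this guide. `oversized_pdfs` is an illustrative name.

```python
from pathlib import Path


def oversized_pdfs(pdf_folder, max_file_size_mb=50):
    """Return (name, size_in_mb) for PDFs above the limit, largest first."""
    limit_bytes = max_file_size_mb * 1024 * 1024
    found = []
    for path in Path(pdf_folder).glob("*.pdf"):
        size = path.stat().st_size
        if size > limit_bytes:
            found.append((path.name, round(size / (1024 * 1024), 1)))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Running `oversized_pdfs("./docs", 50)` on your own folder shows whether the default limit excludes anything you expected to be searchable.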
### Performance Tuning
For better performance and result quality:
1. **Use SSD storage** for PDF and markdown folders
2. **Tune context_size**: larger values give more detailed excerpts, smaller values make searches faster
3. **Add more domain keywords** for better relevance
4. **Enable parallel_processing** for faster PDF conversion
5. **Use cache_enabled: true** for faster restarts
## Getting Help
If you encounter issues:
1. Check the troubleshooting section above
2. Run `python manage_server.py test` to validate your setup
3. Look at the debug logs: `python server.py 2>&1 | tee debug.log`
4. Verify your PDF files are readable and not corrupted
5. Make sure Claude Desktop is using the correct configuration file
## Success!
Once everything is working, you should be able to:
- Ask Claude to search through your documents
- Get relevant excerpts with highlighted matches
- Retrieve full document content
- List all available documents
- Get intelligent, context-aware responses
Your PDF collection is now fully searchable through natural language queries in Claude Desktop!