by juanqui
troubleshooting.md (9.95 kB)
# Troubleshooting Guide

Common issues, solutions, and performance tuning for **pdfkb-mcp**.

## Quick Diagnostics

### Check Server Status

```bash
# Test if server is installed and working
uvx pdfkb-mcp --help

# Test basic startup
PDFKB_EMBEDDING_PROVIDER=local PDFKB_LOG_LEVEL=DEBUG uvx pdfkb-mcp

# Test health endpoint (if running with web interface)
curl http://localhost:8000/health
```

### Check MCP Tools

After connecting to your MCP client, you should see these tools:

- ✅ `add_document` - Add PDFs to knowledge base
- ✅ `search_documents` - Search across documents
- ✅ `list_documents` - List all documents with metadata
- ✅ `remove_document` - Remove documents from knowledge base
- ✅ `rescan_documents` - Rescan documents directory

## Common Issues

### 🚨 Server Not Appearing in MCP Client

**Symptoms**: pdfkb-mcp doesn't show up in Claude Desktop, VS Code, etc.

**Solutions**:

1. **Check Configuration Syntax**:

   ```bash
   # Validate JSON configuration
   python -m json.tool ~/.config/Claude/claude_desktop_config.json
   ```

2. **Restart MCP Client Completely**:
   - Claude Desktop: Quit and restart the application
   - VS Code: Reload window (`Cmd/Ctrl + Shift + P` → "Developer: Reload Window")

3. **Verify Paths are Absolute**:

   ```json
   {
     "env": {
       "PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents",
       "PDFKB_CACHE_DIR": "/Users/yourname/Documents/.cache"
     }
   }
   ```

4. **Test Command Manually**:

   ```bash
   # Run the exact command from your config
   uvx pdfkb-mcp --help
   ```

### 🚨 Memory Issues / Out of Memory Errors

**Symptoms**: Process killed, "OOM" errors, system slowdown

**Solutions**:

1. **Reduce Memory Usage**:

   ```bash
   # Smaller embedding batches
   PDFKB_EMBEDDING_BATCH_SIZE=25

   # Smaller chunks
   PDFKB_CHUNK_SIZE=500

   # Reduce parallel operations
   PDFKB_MAX_PARALLEL_PARSING=1
   PDFKB_MAX_PARALLEL_EMBEDDING=1
   ```

2. **Use Memory-Efficient Models**:

   ```bash
   # GGUF quantized models use 50-70% less memory
   PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-0.6B-GGUF"
   PDFKB_GGUF_QUANTIZATION="Q6_K"
   ```

3. **Optimize Parser Settings**:

   ```bash
   # Use fastest, most memory-efficient parser
   PDFKB_PDF_PARSER="pymupdf4llm"
   ```

### 🚨 Processing Too Slow

**Symptoms**: Documents take forever to process

**Solutions**:

1. **Optimize for Speed**:

   ```bash
   # Fastest parser
   PDFKB_PDF_PARSER="pymupdf4llm"

   # Larger chunks process faster
   PDFKB_CHUNK_SIZE=1200

   # Increase parallelism (if memory allows)
   PDFKB_MAX_PARALLEL_PARSING=2
   PDFKB_BACKGROUND_QUEUE_WORKERS=4
   ```

2. **Use Faster Embedding Models**:

   ```bash
   # Default local provider (fast, private)
   PDFKB_EMBEDDING_PROVIDER="local"
   PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-0.6B"

   # Or use an OpenAI-compatible provider (e.g., DeepInfra)
   PDFKB_EMBEDDING_PROVIDER="openai"
   PDFKB_OPENAI_API_BASE="https://api.deepinfra.com/v1"
   PDFKB_EMBEDDING_MODEL="text-embedding-3-small"
   ```

3. **Optimize Storage**:

   ```bash
   # Store cache on fast SSD
   PDFKB_CACHE_DIR="/path/to/fast/ssd/.cache"
   ```

### 🚨 API Key / Authentication Errors

**Symptoms**: "Invalid API key", "Authentication failed"

**Solutions**:

1. **Verify API Key Format**:

   ```bash
   # OpenAI keys start with 'sk-proj-' or 'sk-'
   echo $PDFKB_OPENAI_API_KEY | head -c 10

   # DeepInfra keys are typically shorter
   echo $PDFKB_OPENAI_API_KEY | wc -c
   ```

2. **Test API Key**:

   ```bash
   # Test OpenAI
   curl -H "Authorization: Bearer $PDFKB_OPENAI_API_KEY" \
     https://api.openai.com/v1/models

   # Test DeepInfra
   curl -H "Authorization: Bearer $PDFKB_OPENAI_API_KEY" \
     https://api.deepinfra.com/v1/models
   ```

3. **Check API Base URL**:

   ```bash
   # For DeepInfra, ensure correct base URL
   PDFKB_OPENAI_API_BASE="https://api.deepinfra.com/v1"
   ```

### 🚨 Poor Search Results

**Symptoms**: Irrelevant results, missing obvious matches

**Solutions**:

1. **Enable Hybrid Search**:

   ```bash
   PDFKB_ENABLE_HYBRID_SEARCH=true
   PDFKB_HYBRID_VECTOR_WEIGHT=0.6
   PDFKB_HYBRID_TEXT_WEIGHT=0.4
   ```

2. **Add Reranking**:

   ```bash
   PDFKB_ENABLE_RERANKER=true
   PDFKB_RERANKER_PROVIDER=deepinfra
   PDFKB_DEEPINFRA_API_KEY="your-key"
   ```

3. **Use Better Chunking**:

   Note: Default chunker is `langchain`. Set `PDFKB_PDF_CHUNKER="semantic"` to use semantic chunking.

   ```bash
   # Semantic chunking for better context
   PDFKB_PDF_CHUNKER="semantic"
   PDFKB_SEMANTIC_CHUNKER_THRESHOLD_TYPE="percentile"
   PDFKB_SEMANTIC_CHUNKER_THRESHOLD_AMOUNT="95.0"
   ```

### 🚨 Docker/Container Issues

**Symptoms**: Container won't start, permission errors

**Solutions**:

1. **Fix Volume Permissions**:

   ```bash
   # Fix ownership
   sudo chown -R 1001:1001 ./documents ./cache ./logs

   # Or use current user
   sudo chown -R $(id -u):$(id -g) ./documents ./cache ./logs
   ```

2. **Check Port Conflicts**:

   ```bash
   # Check if port is in use
   lsof -i :8000
   netstat -tulpn | grep :8000

   # Use different port
   PDFKB_WEB_PORT=8001 podman-compose up -d
   ```

3. **Container Logs**:

   ```bash
   # View container logs
   podman logs pdfkb-mcp --tail 50
   docker logs pdfkb-mcp --tail 50

   # Follow logs in real-time
   podman-compose logs -f
   docker compose logs -f
   ```

## Performance Tuning

### Resource Optimization by System

| System Type | RAM | Configuration |
|-------------|-----|---------------|
| **Low-end** (2-4GB) | Limited | `pymupdf4llm`, local embeddings, batch=25 |
| **Mid-range** (4-8GB) | Moderate | `marker`, hybrid search, batch=50 |
| **High-end** (8GB+) | Abundant | `docling`, reranking, larger batches |

### Low-Resource Systems

```bash
# Minimal resource usage
PDFKB_PDF_PARSER="pymupdf4llm"
PDFKB_EMBEDDING_PROVIDER="local"
PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-0.6B-GGUF"
PDFKB_GGUF_QUANTIZATION="Q4_K_M"
PDFKB_EMBEDDING_BATCH_SIZE=25
PDFKB_MAX_PARALLEL_PARSING=1
PDFKB_MAX_PARALLEL_EMBEDDING=1
PDFKB_BACKGROUND_QUEUE_WORKERS=1
PDFKB_CHUNK_SIZE=800
```

### High-Performance Systems

```bash
# Maximum performance and quality
PDFKB_PDF_PARSER="docling"
PDFKB_EMBEDDING_PROVIDER="openai"
PDFKB_OPENAI_API_BASE="https://api.deepinfra.com/v1"
PDFKB_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-8B"
PDFKB_ENABLE_HYBRID_SEARCH=true
PDFKB_ENABLE_RERANKER=true
PDFKB_RERANKER_PROVIDER="deepinfra"
PDFKB_EMBEDDING_BATCH_SIZE=100
PDFKB_MAX_PARALLEL_PARSING=4
PDFKB_MAX_PARALLEL_EMBEDDING=2
PDFKB_BACKGROUND_QUEUE_WORKERS=4
PDFKB_CHUNK_SIZE=1200
```

### GPU Optimization

**Apple Silicon (M1/M2/M3)**:

```bash
# Automatic MPS acceleration
PDFKB_EMBEDDING_DEVICE="mps"
PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-4B"
```

**NVIDIA GPU**:

```bash
# Automatic CUDA acceleration
PDFKB_EMBEDDING_DEVICE="cuda"
PDFKB_PDF_PARSER="mineru"  # GPU-accelerated parser
```

## Monitoring and Debugging

### Enable Debug Logging

```bash
# Debug all components
PDFKB_LOG_LEVEL=DEBUG uvx pdfkb-mcp

# Or in MCP client config
"env": {
  "PDFKB_LOG_LEVEL": "DEBUG"
}
```

### Monitor Resource Usage

```bash
# Container resource usage
podman stats pdfkb-mcp
docker stats pdfkb-mcp

# System resource usage
htop
# Look for pdfkb-mcp or python processes
```

### Check Cache Status

```bash
# View cache directory
ls -la .cache/
du -sh .cache/

# Clear cache if needed (will reprocess all documents)
rm -rf .cache/
```

### Test Components Individually

```bash
# Test embedding service
PDFKB_EMBEDDING_PROVIDER=local python -c "
from pdfkb.embeddings_factory import create_embedding_service
service = create_embedding_service()
result = service.embed(['test text'])
print('Embedding shape:', result[0].shape)
"
```

## Configuration Validation

### Check Environment Variables

```bash
# Print all PDFKB variables
env | grep PDFKB | sort

# Validate key settings
python -c "
import os
print('Documents:', os.environ.get('PDFKB_KNOWLEDGEBASE_PATH'))
print('Cache:', os.environ.get('PDFKB_CACHE_DIR'))
print('Provider:', os.environ.get('PDFKB_EMBEDDING_PROVIDER'))
print('Parser:', os.environ.get('PDFKB_PDF_PARSER'))
"
```

### Test Paths

```bash
# Check document directory exists and is readable
test -d "$PDFKB_KNOWLEDGEBASE_PATH" && echo "✅ Documents dir exists"
test -r "$PDFKB_KNOWLEDGEBASE_PATH" && echo "✅ Documents dir readable"

# Check cache directory is writable
test -w "$PDFKB_CACHE_DIR" && echo "✅ Cache dir writable"
```

## Getting Help

### Debug Information to Collect

When reporting issues, include:

1. **System Information**:

   ```bash
   uname -a
   python --version
   uvx --version
   ```

2. **Configuration**:

   ```bash
   env | grep PDFKB | sort
   ```

3. **Error Logs**:

   ```bash
   PDFKB_LOG_LEVEL=DEBUG uvx pdfkb-mcp 2>&1 | head -50
   ```

4. **Resource Usage**:

   ```bash
   free -h
   df -h
   ```

### Community Support

- **GitHub Issues**: [Report bugs](https://github.com/juanqui/pdfkb-mcp/issues)
- **Discussions**: [Ask questions](https://github.com/juanqui/pdfkb-mcp/discussions)
- **Documentation**: [User Guide](index.md)

## Prevention Tips

### Regular Maintenance

```bash
# Monitor cache size
du -sh .cache/

# Clean old logs
find logs/ -name "*.log" -mtime +30 -delete

# Update to latest version
uvx pdfkb-mcp --reinstall
```

### Configuration Best Practices

1. **Always use absolute paths** in configuration
2. **Set appropriate resource limits** for your system
3. **Monitor memory usage** during processing
4. **Use version pinning** for production deployments
5. **Regular backups** of document cache

### Performance Monitoring

```bash
# Simple monitoring script
while true; do
  echo "$(date): Memory $(free -h | grep Mem | awk '{print $3}')"
  sleep 60
done
```

This covers most common issues you'll encounter. For specific problems not covered here, check the [Configuration Guide](configuration.md) or [Advanced Features](advanced.md) guides.
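
### Optional: Scripted Path Check

The checks from **Test Paths** and the absolute-path advice under **Configuration Best Practices** can be combined into one small script. The following is a minimal sketch, not part of pdfkb-mcp: the function name `check_pdfkb_paths` is hypothetical, and it only assumes the two environment variables named in this guide (`PDFKB_KNOWLEDGEBASE_PATH` and `PDFKB_CACHE_DIR`).

```python
import os


def check_pdfkb_paths(env):
    """Return a list of problems found in the path settings (empty if none).

    Hypothetical helper for illustration; pdfkb-mcp does not ship it.
    """
    problems = []
    # Variable name -> access mode the directory needs.
    checks = {
        "PDFKB_KNOWLEDGEBASE_PATH": os.R_OK,  # documents must be readable
        "PDFKB_CACHE_DIR": os.W_OK,           # cache must be writable
    }
    for name, mode in checks.items():
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isabs(value):
            problems.append(f"{name} is not an absolute path: {value}")
        elif not os.path.isdir(value):
            problems.append(f"{name} is not an existing directory: {value}")
        elif not os.access(value, mode):
            problems.append(f"{name} lacks the required permission: {value}")
    return problems


if __name__ == "__main__":
    # Report every problem, or a single confirmation line if none were found.
    for line in check_pdfkb_paths(os.environ) or ["Path settings look OK"]:
        print(line)
```

Run it in the same shell, or with the same `env` block, that launches the server, so it sees the values the server will see. Each variable reports at most one problem, starting with the most basic (unset, then relative, then missing, then permissions).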
