# Troubleshooting Guide: Crawl4AI MCP Server

This guide helps resolve common issues when setting up and using the Crawl4AI MCP Server.

## Quick Diagnostics

### 1. Basic Server Test

```bash
cd /path/to/crawl4ai-mcp
source venv/bin/activate
python3 crawl4ai_mcp_server.py
```

**Expected Result**: FastMCP banner appears, server waits for input
**If Failed**: See [Server Won't Start](#server-wont-start)

### 2. Dependencies Check

```bash
source venv/bin/activate
python3 -c "import fastmcp, crawl4ai; print('Dependencies OK')"
```

**Expected Result**: "Dependencies OK"
**If Failed**: See [Import Errors](#import-errors)

### 3. MCP Inspector Test

```bash
source venv/bin/activate
fastmcp dev crawl4ai_mcp_server.py
```

**Expected Result**: Web interface opens at localhost:6274
**If Failed**: See [MCP Inspector Issues](#mcp-inspector-issues)

## Common Issues and Solutions

### Server Won't Start

#### Symptom: "No such file or directory"

```bash
python3: can't open file 'crawl4ai_mcp_server.py': [Errno 2] No such file or directory
```

**Solution**:

```bash
# Verify you're in the correct directory
pwd
ls -la crawl4ai_mcp_server.py

# If the file is missing, change to the project directory
cd /correct/path/to/crawl4ai-mcp
```

#### Symptom: "Permission denied"

```bash
python3: can't open file 'crawl4ai_mcp_server.py': [Errno 13] Permission denied
```

**Solution**:

```bash
# Fix file permissions
chmod +r crawl4ai_mcp_server.py

# Or make executable
chmod +x crawl4ai_mcp_server.py
```

#### Symptom: Server starts but crashes immediately

**Check for error messages in output.**

**Common causes**:

1. **Virtual environment not activated**

   ```bash
   source venv/bin/activate  # Must run before python3
   ```

2. **Missing dependencies**

   ```bash
   pip install -r requirements.txt
   playwright install
   ```
3. **Port already in use** (for MCP Inspector)

   ```bash
   # Kill processes using port 6274
   lsof -ti:6274 | xargs kill -9
   ```

### Import Errors

#### Symptom: "ModuleNotFoundError: No module named 'fastmcp'"

```bash
ModuleNotFoundError: No module named 'fastmcp'
```

**Solution**:

```bash
# Ensure the virtual environment is activated
source venv/bin/activate

# Reinstall dependencies
pip install -r requirements.txt

# Verify the installation
pip list | grep fastmcp
```

#### Symptom: "ModuleNotFoundError: No module named 'crawl4ai'"

**Solution**:

```bash
# Install Crawl4AI with a specific minimum version
pip install "crawl4ai>=0.3.74"

# If it still fails, upgrade pip and force a reinstall
pip install --upgrade pip
pip install -r requirements.txt --force-reinstall
```

#### Symptom: "ModuleNotFoundError: No module named 'playwright'"

**Solution**:

```bash
# Install Playwright and its browsers
pip install playwright
playwright install

# For M1/M2 Macs, you might need:
playwright install --with-deps
```

### Claude Desktop Integration Issues

#### Symptom: "MCP server 'crawl4ai' not found"

**Check the configuration file location**:

- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
- **Linux**: `~/.config/Claude/claude_desktop_config.json`

**Solution**:

```json
{
  "mcpServers": {
    "crawl4ai": {
      "command": "/absolute/path/to/crawl4ai-mcp/venv/bin/python3",
      "args": ["/absolute/path/to/crawl4ai-mcp/crawl4ai_mcp_server.py"],
      "env": {}
    }
  }
}
```

**Key Points**:

- Use **absolute paths** (starting with `/` or `C:\`)
- Use the Python interpreter from your virtual environment
- Restart Claude Desktop after changes

#### Symptom: "Server exited with code 1"

**Debug by running the server manually**:

```bash
cd /path/to/crawl4ai-mcp
source venv/bin/activate
python3 crawl4ai_mcp_server.py
# Look for error messages
```

**Common fixes**:

1. **Wrong Python path**:

   ```bash
   which python3  # Use this path in the config
   ```

2. **Missing virtual environment activation**:

   ```json
   { "command": "/path/to/venv/bin/python3" }
   ```

3. **Incorrect working directory**:

   ```json
   { "env": { "PYTHONPATH": "/path/to/crawl4ai-mcp" } }
   ```

### Tool Execution Errors

#### Symptom: Tools time out or hang

**Check network connectivity**:

```bash
# Test basic connectivity
curl -I https://httpbin.org/html

# Test with verbose output
source venv/bin/activate
python3 -c "
import asyncio
from crawl4ai import AsyncWebCrawler

async def test():
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun('https://httpbin.org/html')
        print('Success:', result.success)

asyncio.run(test())
"
```

**Solutions**:

1. **Firewall blocking requests**
   - Check corporate firewall settings
   - Test from a different network
2. **Timeout too short**
   - Modify the server code to increase the timeout
   - Test with simpler websites first

#### Symptom: "Invalid URL format" errors

**Example**: `ERROR: Invalid URL format: example.com`

**Solution**: Always use complete URLs, including the scheme.

```bash
# Wrong
example.com

# Correct
https://example.com
```

#### Symptom: Schema extraction returns empty results

**Debug the schema with a simple test**:

```bash
# Test CSS selectors manually
curl -s https://example.com | grep -E "<h1|<p"
```

**Common schema issues**:

1. **CSS selectors don't match content**

   ```json
   // Wrong - selector doesn't exist
   {"title": ".nonexistent-class"}

   // Better - use common selectors
   {"title": "h1, .title, #title"}
   ```

2. **JavaScript-rendered content**
   - Some sites need JavaScript execution
   - Try different websites for testing
   - Consider using more specific selectors
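If the selectors look reasonable but extraction still returns nothing, it can help to exercise schema extraction directly, outside the MCP layer. The sketch below is a minimal example, assuming the `JsonCssExtractionStrategy` class and the `extraction_strategy` keyword of `arun()` from Crawl4AI 0.3.x (newer releases move this into a `CrawlerRunConfig`); the schema and URL are placeholders to adapt to your case.

```python
# Minimal schema-extraction check that bypasses the MCP server entirely.
# Assumes the Crawl4AI 0.3.x API: arun(..., extraction_strategy=...).
import asyncio

from crawl4ai import AsyncWebCrawler
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

schema = {
    "name": "DebugExtraction",
    "baseSelector": "body",
    "fields": [
        # Broad selectors so at least something should match on most pages
        {"name": "title", "selector": "h1, .title, #title", "type": "text"},
        {"name": "first_paragraph", "selector": "p", "type": "text"},
    ],
}

async def debug_schema():
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url="https://example.com",
            extraction_strategy=JsonCssExtractionStrategy(schema),
        )
        # An empty result here points at the schema/selectors, not the MCP layer
        print("Success:", result.success)
        print("Extracted:", result.extracted_content)

asyncio.run(debug_schema())
```

If this script extracts content but the MCP tool does not, compare the schema you pass to the tool with the one that worked here.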
### MCP Inspector Issues

#### Symptom: "Address already in use"

```bash
Error: listen EADDRINUSE :::6274
```

**Solution**:

```bash
# Kill the process using the port
lsof -ti:6274 | xargs kill -9

# Or use a different port
fastmcp dev crawl4ai_mcp_server.py --port 6275
```

#### Symptom: Inspector loads but tools not visible

**Check the browser console for errors.**

**Solutions**:

1. **Clear the browser cache**
2. **Try incognito/private browsing**
3. **Check the authentication token**
4. **Restart the server completely**

### Performance Issues

#### Symptom: Slow crawling/timeouts

**Optimization strategies**:

1. **Test with simpler sites first**

   ```bash
   # Good for testing
   https://httpbin.org/html
   https://example.com

   # Avoid for initial testing
   https://heavy-javascript-site.com
   ```

2. **Monitor resource usage**

   ```bash
   # Check memory usage during crawling
   top -p $(pgrep -f crawl4ai_mcp_server)
   ```

3. **Adjust crawler settings** (modify the server code):

   ```python
   async with AsyncWebCrawler(
       verbose=False,
       max_timeout=60,  # Increase timeout
       headless=True    # Ensure headless mode
   ) as crawler:
   ```

#### Symptom: Memory usage keeps growing

**Solutions**:

1. **Restart the server periodically**
2. **Monitor for memory leaks**
3. **Process smaller batches of requests**

### Platform-Specific Issues

#### macOS Issues

**Symptom**: "Operation not permitted" for Playwright

```bash
# Give Terminal full disk access in System Preferences > Privacy & Security
# Or install with additional permissions
playwright install --with-deps
```

#### Windows Issues

**Symptom**: Path separator issues in the configuration

```json
{
  "mcpServers": {
    "crawl4ai": {
      "command": "C:\\path\\to\\venv\\Scripts\\python.exe",
      "args": ["C:\\path\\to\\crawl4ai_mcp_server.py"]
    }
  }
}
```

#### Linux Issues

**Symptom**: Missing system dependencies for Playwright

```bash
# Install system dependencies
sudo apt-get update
sudo apt-get install -y \
  libnss3 \
  libatk-bridge2.0-0 \
  libdrm2 \
  libxkbcommon0 \
  libxcomposite1 \
  libxdamage1 \
  libxrandr2 \
  libgbm1 \
  libgtk-3-0

# Then install Playwright browsers
playwright install --with-deps
```

## Advanced Debugging

### Enable Debug Logging

**Modify the server code**:

```python
# In crawl4ai_mcp_server.py, change:
logging.basicConfig(
    level=logging.DEBUG,  # Change from INFO
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    stream=sys.stderr,
    force=True
)
```

### Test Individual Components

**Test AsyncWebCrawler directly**:

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def test_crawler():
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun("https://httpbin.org/html")
        print(f"Success: {result.success}")
        print(f"Content length: {len(result.cleaned_html)}")

asyncio.run(test_crawler())
```

**Test the FastMCP server without MCP Inspector**:

```bash
echo '{"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}' | python3 crawl4ai_mcp_server.py
```

### Collect Diagnostic Information

**When reporting issues, include**:

1. **System information**:

   ```bash
   python3 --version
   pip list | grep -E "(fastmcp|crawl4ai|playwright)"
   uname -a  # Linux/macOS
   ```

2. **Error messages**:
   - Full error output from server startup
   - Browser console errors (for MCP Inspector)
   - Claude Desktop logs
3. **Configuration**:
   - Your `claude_desktop_config.json` (remove sensitive paths)
   - The server startup command used
   - Network environment (corporate/home/etc.)
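To gather those details in one step, a short standard-library script like the one below prints the same information; run it with the virtual environment activated so it reports the packages the server actually sees. The report format is only a suggestion.

```python
# Print the diagnostic details listed above in one report.
import platform
import sys
from importlib.metadata import PackageNotFoundError, version

print("Python:", sys.version.replace("\n", " "))
print("Platform:", platform.platform())
for pkg in ("fastmcp", "crawl4ai", "playwright"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED")
```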
## Getting Help

### Self-Help Checklist

- [ ] Virtual environment activated
- [ ] All dependencies installed (`pip install -r requirements.txt`)
- [ ] Playwright browsers installed (`playwright install`)
- [ ] Server starts manually without errors
- [ ] Correct absolute paths in the configuration
- [ ] Claude Desktop restarted after configuration changes
- [ ] Tested with simple websites (example.com, httpbin.org)

### When to Seek Help

If you've completed the checklist above and still have issues:

1. **Check the existing documentation**:
   - README.md for setup instructions
   - INTEGRATION_GUIDE.md for configuration details
   - USAGE_EXAMPLES.md for tool usage patterns
2. **Gather diagnostic information** (see the section above)
3. **Create a minimal reproduction case**:
   - Use the simplest possible configuration
   - Test with basic websites
   - Document the exact steps that fail

### Common "Won't Fix" Issues

- **JavaScript-heavy sites**: Some sites require complex JavaScript execution
- **Rate limiting**: Websites may block rapid requests
- **Authentication**: Sites requiring login are not supported
- **Captchas**: Sites with captcha protection cannot be crawled
- **Corporate firewalls**: May block web scraping entirely

## Maintenance

### Regular Updates

```bash
# Update dependencies monthly
source venv/bin/activate
pip install --upgrade fastmcp crawl4ai playwright
playwright install

# Test after updates
python3 crawl4ai_mcp_server.py
```

### Log Rotation

The server logs to stderr only. If you run it in production, consider log rotation:

```bash
# Example: rotate logs daily with rotatelogs (e.g., inside a systemd service)
python3 crawl4ai_mcp_server.py 2>&1 | rotatelogs /var/log/crawl4ai-%Y%m%d.log 86400
```

### Backup Configuration

Keep backups of:

- `claude_desktop_config.json`
- Any custom modifications to `crawl4ai_mcp_server.py`
- Your virtual environment requirements: `pip freeze > requirements-backup.txt`
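A small helper script can automate those backups. This is an illustrative sketch only: the backup directory, the macOS config path, and `/path/to/crawl4ai-mcp` are placeholders to adjust for your setup, and it should be run with the project's virtual environment active so `pip freeze` captures the right packages.

```python
# Illustrative backup helper - all paths below are placeholders.
import shutil
import subprocess
import sys
from datetime import date
from pathlib import Path

backup_dir = Path.home() / "crawl4ai-mcp-backups" / date.today().isoformat()
backup_dir.mkdir(parents=True, exist_ok=True)

# Claude Desktop configuration (macOS location shown; see the paths listed earlier)
config = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"
if config.exists():
    shutil.copy(config, backup_dir)

# Any customized server code
server = Path("/path/to/crawl4ai-mcp/crawl4ai_mcp_server.py")
if server.exists():
    shutil.copy(server, backup_dir)

# Pinned dependency versions from the interpreter running this script
frozen = subprocess.run(
    [sys.executable, "-m", "pip", "freeze"],
    capture_output=True, text=True, check=True,
).stdout
(backup_dir / "requirements-backup.txt").write_text(frozen)
print(f"Backup written to {backup_dir}")
```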
