# Troubleshooting Guide: Crawl4AI MCP Server
This guide helps resolve common issues when setting up and using the Crawl4AI MCP Server.
## Quick Diagnostics
### 1. Basic Server Test
```bash
cd /path/to/crawl4ai-mcp
source venv/bin/activate
python3 crawl4ai_mcp_server.py
```
**Expected Result**: FastMCP banner appears, server waits for input
**If Failed**: See [Server Won't Start](#server-wont-start)
### 2. Dependencies Check
```bash
source venv/bin/activate
python3 -c "import fastmcp, crawl4ai; print('Dependencies OK')"
```
**Expected Result**: "Dependencies OK"
**If Failed**: See [Import Errors](#import-errors)
### 3. MCP Inspector Test
```bash
source venv/bin/activate
fastmcp dev crawl4ai_mcp_server.py
```
**Expected Result**: Web interface opens at localhost:6274
**If Failed**: See [MCP Inspector Issues](#mcp-inspector-issues)
## Common Issues and Solutions
### Server Won't Start
#### Symptom: "No such file or directory"
```bash
python3: can't open file 'crawl4ai_mcp_server.py': [Errno 2] No such file or directory
```
**Solution**:
```bash
# Verify you're in the correct directory
pwd
ls -la crawl4ai_mcp_server.py
# If missing, check if you're in the right location
cd /correct/path/to/crawl4ai-mcp
```
#### Symptom: "Permission denied"
```bash
python3: can't open file 'crawl4ai_mcp_server.py': [Errno 13] Permission denied
```
**Solution**:
```bash
# Fix file permissions
chmod +r crawl4ai_mcp_server.py
# Or make executable
chmod +x crawl4ai_mcp_server.py
```
#### Symptom: Server starts but crashes immediately
**Check for error messages in output**
**Common causes**:
1. **Virtual environment not activated**
```bash
source venv/bin/activate # Must run before python3
```
2. **Missing dependencies**
```bash
pip install -r requirements.txt
playwright install
```
3. **Port already in use** (for MCP Inspector)
```bash
# Kill processes using port 6274
lsof -ti:6274 | xargs kill -9
```
### Import Errors
#### Symptom: "ModuleNotFoundError: No module named 'fastmcp'"
```bash
ModuleNotFoundError: No module named 'fastmcp'
```
**Solution**:
```bash
# Ensure virtual environment is activated
source venv/bin/activate
# Reinstall dependencies
pip install -r requirements.txt
# Verify installation
pip list | grep fastmcp
```
#### Symptom: "ModuleNotFoundError: No module named 'crawl4ai'"
**Solution**:
```bash
# Install Crawl4AI with specific version
pip install "crawl4ai>=0.3.74"
# If still failing, try development install
pip install --upgrade pip
pip install -r requirements.txt --force-reinstall
```
#### Symptom: "ModuleNotFoundError: No module named 'playwright'"
**Solution**:
```bash
# Install Playwright and browsers
pip install playwright
playwright install
# For M1/M2 Macs, you might need:
playwright install --with-deps
```
### Claude Desktop Integration Issues
#### Symptom: "MCP server 'crawl4ai' not found"
**Check configuration file location**:
**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
**Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
**Linux**: `~/.config/Claude/claude_desktop_config.json`
**Solution**:
```json
{
"mcpServers": {
"crawl4ai": {
"command": "/absolute/path/to/crawl4ai-mcp/venv/bin/python3",
"args": ["/absolute/path/to/crawl4ai-mcp/crawl4ai_mcp_server.py"],
"env": {}
}
}
}
```
**Key Points**:
- Use **absolute paths** (starting with `/` or `C:\`)
- Use the Python from your virtual environment
- Restart Claude Desktop after changes
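The key points above can be checked mechanically. The script below is a sketch, not part of the server: it assumes the default macOS config location (adjust for your platform, per the paths listed above) and treats every `args` entry as a file path, which holds for the config shown here.

```python
import json
import os
from pathlib import Path

# Assumed macOS location -- adjust for Windows/Linux (see paths above).
CONFIG = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"

def check_config(config_path):
    """Flag MCP server entries whose command/args are not absolute, existing paths."""
    config = json.loads(Path(config_path).read_text())
    problems = []
    for name, server in config.get("mcpServers", {}).items():
        # Assumes every args entry is a file path (true for the config above).
        for p in [server.get("command", "")] + server.get("args", []):
            if not os.path.isabs(p):
                problems.append(f"{name}: '{p}' is not an absolute path")
            elif not Path(p).exists():
                problems.append(f"{name}: '{p}' does not exist")
    return problems

if __name__ == "__main__":
    if CONFIG.exists():
        for line in check_config(CONFIG) or ["Config looks OK"]:
            print(line)
    else:
        print(f"No config found at {CONFIG}")
```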
#### Symptom: "Server exited with code 1"
**Debug by running server manually**:
```bash
cd /path/to/crawl4ai-mcp
source venv/bin/activate
python3 crawl4ai_mcp_server.py
# Look for error messages
```
**Common fixes**:
1. **Wrong Python path**:
```bash
which python3 # Use this path in config
```
2. **Missing virtual environment activation**:
```json
{
"command": "/path/to/venv/bin/python3"
}
```
3. **Incorrect working directory**:
```json
{
"env": {
"PYTHONPATH": "/path/to/crawl4ai-mcp"
}
}
```
### Tool Execution Errors
#### Symptom: Tools timeout or hang
**Check network connectivity**:
```bash
# Test basic connectivity
curl -I https://httpbin.org/html
# Test with verbose output
source venv/bin/activate
python3 -c "
import asyncio
from crawl4ai import AsyncWebCrawler
async def test():
async with AsyncWebCrawler(verbose=True) as crawler:
result = await crawler.arun('https://httpbin.org/html')
print('Success:', result.success)
asyncio.run(test())
"
```
**Solutions**:
1. **Firewall blocking requests**
- Check corporate firewall settings
- Test from different network
2. **Timeout too short**
- Modify server code to increase timeout
- Test with simpler websites first
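If the timeout is the problem, one option when modifying the server code is to wrap the crawl in `asyncio.wait_for`. The sketch below uses a stand-in coroutine, since the exact crawler call depends on your server code:

```python
import asyncio

async def crawl_stub(url: str) -> str:
    """Stand-in for the real crawler call (e.g. crawler.arun(url))."""
    await asyncio.sleep(2)  # simulate a slow site
    return f"content from {url}"

async def crawl_with_timeout(url: str, timeout: float = 30.0) -> str:
    """Fail fast with an error string instead of hanging indefinitely."""
    try:
        return await asyncio.wait_for(crawl_stub(url), timeout=timeout)
    except asyncio.TimeoutError:
        return f"ERROR: crawl of {url} timed out after {timeout}s"

print(asyncio.run(crawl_with_timeout("https://example.com", timeout=0.1)))
# -> ERROR: crawl of https://example.com timed out after 0.1s
```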
#### Symptom: "Invalid URL format" errors
**Example**: `ERROR: Invalid URL format: example.com`
**Solution**: Always use complete URLs
```bash
# Wrong
example.com
# Correct
https://example.com
```
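A complete URL needs both a scheme and a host. If you build URLs programmatically, a stdlib pre-check along these lines catches the mistake before the tool ever sees it:

```python
from urllib.parse import urlparse

def is_valid_url(url: str) -> bool:
    """Accept only complete http(s) URLs with a scheme and a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

print(is_valid_url("example.com"))          # False: no scheme, so no host is parsed
print(is_valid_url("https://example.com"))  # True
```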
#### Symptom: Schema extraction returns empty results
**Debug schema with simple test**:
```bash
# Test CSS selectors manually
curl -s https://example.com | grep -E "<h1|<p"
```
**Common schema issues**:
1. **CSS selectors don't match content**
```json
// Wrong - selector doesn't exist
{"title": ".nonexistent-class"}
// Better - use common selectors
{"title": "h1, .title, #title"}
```
2. **JavaScript-rendered content**
- Some sites need JavaScript execution
- Try different websites for testing
- Consider using more specific selectors
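Before reworking a schema, it can help to probe whether the simple selectors could match the raw HTML at all. The naive checker below (regex over tag/class/id attributes, not a real CSS engine, and limited to these three selector forms) is usually enough to spot a dead selector:

```python
import re

def probe_selectors(html: str, selectors: list[str]) -> dict[str, bool]:
    """Naively check whether each simple selector (tag, .class, #id) could match."""
    results = {}
    for sel in selectors:
        if sel.startswith("."):
            pattern = rf'class="[^"]*\b{re.escape(sel[1:])}\b'
        elif sel.startswith("#"):
            pattern = rf'id="{re.escape(sel[1:])}"'
        else:
            pattern = rf"<{re.escape(sel)}\b"
        results[sel] = re.search(pattern, html) is not None
    return results

sample = '<h1 class="title">Hi</h1><p id="intro">Text</p>'
print(probe_selectors(sample, ["h1", ".title", "#intro", ".missing"]))
```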
### MCP Inspector Issues
#### Symptom: "Address already in use"
```bash
Error: listen EADDRINUSE :::6274
```
**Solution**:
```bash
# Kill process using port
lsof -ti:6274 | xargs kill -9
# Or use different port
fastmcp dev crawl4ai_mcp_server.py --port 6275
```
#### Symptom: Inspector loads but tools not visible
**Check browser console for errors**
**Solutions**:
1. **Clear browser cache**
2. **Try incognito/private browsing**
3. **Check authentication token**
4. **Restart server completely**
### Performance Issues
#### Symptom: Slow crawling/timeouts
**Optimization strategies**:
1. **Test with simpler sites first**
```bash
# Good for testing
https://httpbin.org/html
https://example.com
# Avoid for initial testing
https://heavy-javascript-site.com
```
2. **Monitor resource usage**
```bash
# Check memory usage during crawling
top -p $(pgrep -f crawl4ai_mcp_server)          # Linux
# On macOS, top uses -pid: top -pid $(pgrep -f crawl4ai_mcp_server)
```
3. **Adjust crawler settings** (modify server code):
```python
async with AsyncWebCrawler(
    verbose=False,
    max_timeout=60,  # Increase timeout
    headless=True    # Ensure headless mode
) as crawler:
    ...  # existing crawl calls go here
```
#### Symptom: Memory usage keeps growing
**Solutions**:
1. **Restart server periodically**
2. **Monitor for memory leaks**
3. **Process smaller batches of requests**
### Platform-Specific Issues
#### macOS Issues
**Symptom**: "Operation not permitted" for Playwright
```bash
# Give Terminal full disk access in System Preferences > Privacy & Security
# Or install with additional permissions
playwright install --with-deps
```
#### Windows Issues
**Symptom**: Path separator issues in configuration
```json
{
"mcpServers": {
"crawl4ai": {
"command": "C:\\path\\to\\venv\\Scripts\\python.exe",
"args": ["C:\\path\\to\\crawl4ai_mcp_server.py"]
}
}
}
```
#### Linux Issues
**Symptom**: Missing system dependencies for Playwright
```bash
# Install system dependencies
sudo apt-get update
sudo apt-get install -y \
libnss3 \
libatk-bridge2.0-0 \
libdrm2 \
libxkbcommon0 \
libxcomposite1 \
libxdamage1 \
libxrandr2 \
libgbm1 \
libgtk-3-0
# Then install Playwright browsers
playwright install --with-deps
```
## Advanced Debugging
### Enable Debug Logging
**Modify server code**:
```python
# In crawl4ai_mcp_server.py, change:
logging.basicConfig(
level=logging.DEBUG, # Change from INFO
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
stream=sys.stderr,
force=True
)
```
### Test Individual Components
**Test AsyncWebCrawler directly**:
```python
import asyncio
from crawl4ai import AsyncWebCrawler
async def test_crawler():
async with AsyncWebCrawler(verbose=True) as crawler:
result = await crawler.arun("https://httpbin.org/html")
print(f"Success: {result.success}")
print(f"Content length: {len(result.cleaned_html)}")
asyncio.run(test_crawler())
```
**Test FastMCP server without MCP Inspector**:
```bash
# MCP clients must complete an initialize handshake before tools/list,
# so a bare tools/list request is rejected. Send the full sequence:
printf '%s\n' \
  '{"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {"protocolVersion": "2024-11-05", "capabilities": {}, "clientInfo": {"name": "debug", "version": "0.0.0"}}}' \
  '{"jsonrpc": "2.0", "method": "notifications/initialized"}' \
  '{"jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {}}' \
  | python3 crawl4ai_mcp_server.py
```
### Collect Diagnostic Information
**When reporting issues, include**:
1. **System Information**:
```bash
python3 --version
pip list | grep -E "(fastmcp|crawl4ai|playwright)"
uname -a # Linux/macOS
```
2. **Error Messages**:
- Full error output from server startup
- Browser console errors (for MCP Inspector)
- Claude Desktop logs
3. **Configuration**:
- Your `claude_desktop_config.json` (remove sensitive paths)
- Server startup command used
- Network environment (corporate/home/etc.)
## Getting Help
### Self-Help Checklist
- [ ] Virtual environment activated
- [ ] All dependencies installed (`pip install -r requirements.txt`)
- [ ] Playwright browsers installed (`playwright install`)
- [ ] Server starts manually without errors
- [ ] Correct absolute paths in configuration
- [ ] Claude Desktop restarted after configuration changes
- [ ] Tested with simple websites (example.com, httpbin.org)
### When to Seek Help
If you've completed the checklist above and still have issues:
1. **Check existing documentation**:
- README.md for setup instructions
- INTEGRATION_GUIDE.md for configuration details
- USAGE_EXAMPLES.md for tool usage patterns
2. **Gather diagnostic information** (see section above)
3. **Create minimal reproduction case**:
- Use simplest possible configuration
- Test with basic websites
- Document exact steps that fail
### Common "Won't Fix" Issues
- **JavaScript-heavy sites**: Some sites require complex JavaScript execution
- **Rate limiting**: Websites may block rapid requests
- **Authentication**: Sites requiring login are not supported
- **Captchas**: Sites with captcha protection cannot be crawled
- **Corporate firewalls**: May block web scraping entirely
## Maintenance
### Regular Updates
```bash
# Update dependencies monthly
source venv/bin/activate
pip install --upgrade fastmcp crawl4ai playwright
playwright install
# Test after updates
python3 crawl4ai_mcp_server.py
```
### Log Rotation
The server logs to stderr only. If using in production, consider log rotation:
```bash
# Example: pipe output through Apache's rotatelogs (new file daily)
python3 crawl4ai_mcp_server.py 2>&1 | rotatelogs /var/log/crawl4ai-%Y%m%d.log 86400
```
### Backup Configuration
Keep backups of:
- `claude_desktop_config.json`
- Any custom modifications to `crawl4ai_mcp_server.py`
- Your virtual environment requirements: `pip freeze > requirements-backup.txt`