# Crawl4AI MCP Wrapper
A custom Model Context Protocol (MCP) server that wraps the Crawl4AI Docker API, exposing it to Claude Code over reliable stdio transport.
## Overview
This MCP wrapper provides 7 tools for web scraping, crawling, and content extraction:
1. **scrape_markdown** - Extract clean markdown from webpages
2. **extract_html** - Get preprocessed HTML structures
3. **capture_screenshot** - Take full-page PNG screenshots
4. **generate_pdf** - Create PDF documents from webpages
5. **execute_javascript** - Run JavaScript in browser context
6. **crawl_urls** - Crawl multiple URLs with configuration options
7. **ask_crawl4ai** - Query Crawl4AI documentation and examples
## Prerequisites
- Python 3.8 or higher
- Crawl4AI Docker container running on port 11235
- Claude Code installed
## Installation
### 1. Start Crawl4AI Docker Container
```bash
docker run -d -p 11235:11235 --name crawl4ai --shm-size=2g unclecode/crawl4ai:latest
```
Verify it's running:
```bash
curl http://localhost:11235/health
```
### 2. Set Up Virtual Environment
```bash
cd /Volumes/4TB/Users/josephmcmyne/myProjects/mcp/crawl4ai-wrapper
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
### 3. Test the Wrapper
```bash
python test_wrapper.py
```
You should see all tests pass:
```
✅ All tests passed! MCP wrapper is ready to use.
```
## Configuration
### For Claude Code (Project-Level)
Add to `/Volumes/4TB/Users/josephmcmyne/general/.mcp.json`:
```json
{
"mcpServers": {
"iterm-mcp": {
"args": ["-y", "iterm-mcp"],
"command": "npx"
},
"crawl4ai-custom": {
"command": "/Volumes/4TB/Users/josephmcmyne/myProjects/mcp/crawl4ai-wrapper/venv/bin/python",
"args": ["/Volumes/4TB/Users/josephmcmyne/myProjects/mcp/crawl4ai-wrapper/crawl4ai_mcp.py"],
"env": {
"CRAWL4AI_BASE_URL": "http://localhost:11235"
}
}
}
}
```
### For Claude Code (Global)
Add to `~/.claude/claude_desktop_config.json` (if this file exists):
```json
{
"mcpServers": {
"crawl4ai-custom": {
"command": "/Volumes/4TB/Users/josephmcmyne/myProjects/mcp/crawl4ai-wrapper/venv/bin/python",
"args": ["/Volumes/4TB/Users/josephmcmyne/myProjects/mcp/crawl4ai-wrapper/crawl4ai_mcp.py"],
"env": {
"CRAWL4AI_BASE_URL": "http://localhost:11235"
}
}
}
}
```
## Usage
### Restart Claude Code
After configuration, restart Claude Code to load the new MCP server.
### Verify Connection
```bash
claude mcp list
```
You should see:
```
crawl4ai-custom: ... - ✓ Connected
```
### Use in Claude Code
Example prompts:
```
Scrape the content from https://example.com and summarize it.
```
```
Take a screenshot of https://github.com
```
```
Crawl these URLs and extract their main content: https://example.com, https://example.org
```
## Tool Reference
### scrape_markdown
Extract clean markdown from a webpage.
**Parameters:**
- `url` (required): URL to scrape
- `filter_type` (optional): 'fit', 'raw', 'bm25', or 'llm' (default: 'fit')
- `query` (optional): Query string for BM25/LLM filters
- `cache_bust` (optional): Cache-bust counter (default: '0')
**Example:**
```python
result = await scrape_markdown("https://example.com")
```
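The parameter rules above (valid filter types, and `query` being meaningful only for the BM25/LLM filters) can be sketched as a small validator. This is a hypothetical helper, not the wrapper's actual code, and the payload key names are illustrative:

```python
# Hypothetical validator for scrape_markdown's parameters; key names
# in the returned dict are illustrative, not the real API's payload shape.
VALID_FILTERS = {"fit", "raw", "bm25", "llm"}

def build_markdown_params(url, filter_type="fit", query=None, cache_bust="0"):
    if filter_type not in VALID_FILTERS:
        raise ValueError(f"filter_type must be one of {sorted(VALID_FILTERS)}")
    if filter_type in {"bm25", "llm"} and not query:
        raise ValueError("query is required for the bm25 and llm filters")
    params = {"url": url, "filter_type": filter_type, "cache_bust": cache_bust}
    if query:
        params["query"] = query
    return params
```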
### extract_html
Get preprocessed HTML from a webpage.
**Parameters:**
- `url` (required): URL to extract HTML from
### capture_screenshot
Capture a full-page PNG screenshot.
**Parameters:**
- `url` (required): URL to screenshot
- `screenshot_wait_for` (optional): Seconds to wait before capture (default: 2.0)
- `output_path` (optional): Path to save the screenshot file
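If the API returns the screenshot as base64-encoded PNG data (an assumption about the response shape, not confirmed by the wrapper's code), saving it to `output_path` might look like:

```python
import base64
from pathlib import Path

def save_screenshot(b64_png: str, output_path: str) -> int:
    """Decode base64 PNG data and write it to output_path; returns bytes written."""
    data = base64.b64decode(b64_png)
    return Path(output_path).write_bytes(data)
```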
### generate_pdf
Generate a PDF document from a webpage.
**Parameters:**
- `url` (required): URL to convert to PDF
- `output_path` (optional): Path to save the PDF file
### execute_javascript
Run JavaScript code on a webpage.
**Parameters:**
- `url` (required): URL to execute scripts on
- `scripts` (required): List of JavaScript snippets to execute
### crawl_urls
Crawl multiple URLs with configuration.
**Parameters:**
- `urls` (required): List of 1-100 URLs to crawl
- `browser_config` (optional): Browser configuration overrides
- `crawler_config` (optional): Crawler configuration overrides
- `hooks` (optional): Custom hook functions
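The 1-100 URL constraint and the optional config overrides could be assembled into a request body like this (a sketch; the field names are assumptions about the `/crawl` payload, not the wrapper's actual code):

```python
def build_crawl_payload(urls, browser_config=None, crawler_config=None):
    """Assemble an illustrative crawl request body; field names are assumed."""
    if not 1 <= len(urls) <= 100:
        raise ValueError("urls must contain between 1 and 100 entries")
    payload = {"urls": list(urls)}
    if browser_config:
        payload["browser_config"] = browser_config
    if crawler_config:
        payload["crawler_config"] = crawler_config
    return payload
```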
### ask_crawl4ai
Query Crawl4AI documentation.
**Parameters:**
- `query` (required): Search query
- `context_type` (optional): 'code', 'doc', or 'all' (default: 'all')
- `score_ratio` (optional): Minimum score threshold (default: 0.5)
- `max_results` (optional): Maximum results (default: 20)
## Troubleshooting
### Container Not Running
If tests fail with connection errors:
```bash
docker ps | grep crawl4ai
```
If not running, start it:
```bash
docker start crawl4ai
```
### MCP Server Not Connecting
Check Claude Code logs:
```bash
tail -f ~/Library/Logs/Claude/mcp*.log
```
Verify the Python path in your `.mcp.json` is correct:
```bash
source venv/bin/activate
which python
# Should match the venv interpreter path in your config
```
### Port Conflicts
If port 11235 is in use:
1. Stop the Crawl4AI container: `docker stop crawl4ai`
2. Find the conflicting process: `lsof -i :11235`
3. Either stop that process, or remap the container's port (e.g. `-p <new-port>:11235`) and point `CRAWL4AI_BASE_URL` at the new port
## Development
### Adding New Tools
To add a new tool:
1. Add an async function decorated with `@mcp.tool()` in `crawl4ai_mcp.py`
2. Make an HTTP request to the appropriate Crawl4AI endpoint
3. Add error handling
4. Add a test in `test_wrapper.py`
5. Update this README
### Running in Debug Mode
```bash
# Enable debug logging
export CRAWL4AI_MCP_LOG=DEBUG
# Run the server
python crawl4ai_mcp.py
```
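One way the wrapper could honor `CRAWL4AI_MCP_LOG` (a sketch, assuming the variable holds a standard `logging` level name such as `DEBUG` or `INFO`):

```python
import logging
import os

def configure_logging() -> int:
    """Map CRAWL4AI_MCP_LOG to a logging level (default INFO) and apply it."""
    name = os.environ.get("CRAWL4AI_MCP_LOG", "INFO").upper()
    level = getattr(logging, name, logging.INFO)
    logging.basicConfig(level=level)
    return level
```

Unknown level names fall back to INFO rather than raising, so a typo in the environment variable does not prevent the server from starting.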
## Architecture
```
┌─────────────────┐
│ Claude Code │
└────────┬────────┘
│ stdio (MCP protocol)
▼
┌─────────────────┐
│ FastMCP Server │ (this wrapper)
│ crawl4ai_mcp.py│
└────────┬────────┘
│ HTTP/REST API
▼
┌─────────────────┐
│ Crawl4AI Docker │
│ Container │
│ (port 11235) │
└─────────────────┘
```
## Benefits Over Official SSE Server
- **Reliable stdio transport**: No SSE connection issues
- **Full control**: Easy to extend and customize
- **Better error handling**: Graceful degradation
- **Simple debugging**: Standard Python stack traces
- **No protocol mismatches**: Direct HTTP to Crawl4AI API
## License
MIT License - feel free to modify and distribute.
## Credits
- Built with [FastMCP](https://github.com/jlowin/fastmcp)
- Wraps [Crawl4AI](https://github.com/unclecode/crawl4ai)
- For use with [Claude Code](https://claude.ai/code)