# Crawl4AI MCP Wrapper
A custom Model Context Protocol (MCP) server that wraps the Crawl4AI Docker API, exposing it to Claude Code over reliable stdio transport.
## Overview
This MCP wrapper provides 7 tools for web scraping, crawling, and content extraction:
1. **scrape_markdown** - Extract clean markdown from webpages
2. **extract_html** - Get preprocessed HTML structures
3. **capture_screenshot** - Take full-page PNG screenshots
4. **generate_pdf** - Create PDF documents from webpages
5. **execute_javascript** - Run JavaScript in browser context
6. **crawl_urls** - Crawl multiple URLs with configuration options
7. **ask_crawl4ai** - Query Crawl4AI documentation and examples
## Prerequisites
- Python 3.8 or higher
- Crawl4AI Docker container running on port 11235
- Claude Code installed
## Installation
### 1. Start Crawl4AI Docker Container
```bash
docker run -d -p 11235:11235 --name crawl4ai --shm-size=2g unclecode/crawl4ai:latest
```
Verify it's running:
```bash
curl http://localhost:11235/health
```
### 2. Set Up Virtual Environment
```bash
cd /Volumes/4TB/Users/josephmcmyne/myProjects/mcp/crawl4ai-wrapper
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
### 3. Test the Wrapper
```bash
python test_wrapper.py
```
You should see all tests pass:
```
✅ All tests passed! MCP wrapper is ready to use.
```
## Configuration
### For Claude Code (Project-Level)
Add to `/Volumes/4TB/Users/josephmcmyne/general/.mcp.json`:
```json
{
"mcpServers": {
"iterm-mcp": {
"args": ["-y", "iterm-mcp"],
"command": "npx"
},
"crawl4ai-custom": {
"command": "/Volumes/4TB/Users/josephmcmyne/myProjects/mcp/crawl4ai-wrapper/venv/bin/python",
"args": ["/Volumes/4TB/Users/josephmcmyne/myProjects/mcp/crawl4ai-wrapper/crawl4ai_mcp.py"],
"env": {
"CRAWL4AI_BASE_URL": "http://localhost:11235"
}
}
}
}
```
### For Claude Code (Global)
Add to `~/.claude/claude_desktop_config.json` (if this file exists):
```json
{
"mcpServers": {
"crawl4ai-custom": {
"command": "/Volumes/4TB/Users/josephmcmyne/myProjects/mcp/crawl4ai-wrapper/venv/bin/python",
"args": ["/Volumes/4TB/Users/josephmcmyne/myProjects/mcp/crawl4ai-wrapper/crawl4ai_mcp.py"],
"env": {
"CRAWL4AI_BASE_URL": "http://localhost:11235"
}
}
}
}
```
## Usage
### Restart Claude Code
After configuration, restart Claude Code to load the new MCP server.
### Verify Connection
```bash
claude mcp list
```
You should see:
```
crawl4ai-custom: ... - ✓ Connected
```
### Use in Claude Code
Example prompts:
```
Scrape the content from https://example.com and summarize it.
```
```
Take a screenshot of https://github.com
```
```
Crawl these URLs and extract their main content: https://example.com, https://example.org
```
## Tool Reference
### scrape_markdown
Extract clean markdown from a webpage.
**Parameters:**
- `url` (required): URL to scrape
- `filter_type` (optional): 'fit', 'raw', 'bm25', or 'llm' (default: 'fit')
- `query` (optional): Query string for BM25/LLM filters
- `cache_bust` (optional): Cache-bust counter (default: '0')
**Example:**
```python
result = await scrape_markdown("https://example.com")
```
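The parameter rules above (valid filter types, and `query` being meaningful only for the BM25/LLM filters) can be sketched as a small validator. This is a hypothetical helper, not the wrapper's actual code, and the payload key names are illustrative:

```python
# Hypothetical validator for scrape_markdown's parameters; key names
# in the returned dict are illustrative, not the real API's payload shape.
VALID_FILTERS = {"fit", "raw", "bm25", "llm"}

def build_markdown_params(url, filter_type="fit", query=None, cache_bust="0"):
    if filter_type not in VALID_FILTERS:
        raise ValueError(f"filter_type must be one of {sorted(VALID_FILTERS)}")
    if filter_type in {"bm25", "llm"} and not query:
        raise ValueError("query is required for the bm25 and llm filters")
    params = {"url": url, "filter_type": filter_type, "cache_bust": cache_bust}
    if query:
        params["query"] = query
    return params
```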
### extract_html
Get preprocessed HTML from a webpage.
**Parameters:**
- `url` (required): URL to extract HTML from
### capture_screenshot
Capture a full-page PNG screenshot.
**Parameters:**
- `url` (required): URL to screenshot
- `screenshot_wait_for` (optional): Seconds to wait before capture (default: 2.0)
- `output_path` (optional): Path to save the screenshot file
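If the API returns the screenshot as base64-encoded PNG data (an assumption about the response shape, not confirmed by the wrapper's code), saving it to `output_path` might look like:

```python
import base64
from pathlib import Path

def save_screenshot(b64_png: str, output_path: str) -> int:
    """Decode base64 PNG data and write it to output_path; returns bytes written."""
    data = base64.b64decode(b64_png)
    return Path(output_path).write_bytes(data)
```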
### generate_pdf
Generate a PDF document from a webpage.
**Parameters:**
- `url` (required): URL to convert to PDF
- `output_path` (optional): Path to save the PDF file
### execute_javascript
Run JavaScript code on a webpage.
**Parameters:**
- `url` (required): URL to execute scripts on
- `scripts` (required): List of JavaScript snippets to execute
### crawl_urls
Crawl multiple URLs with configuration.
**Parameters:**
- `urls` (required): List of 1-100 URLs to crawl
- `browser_config` (optional): Browser configuration overrides
- `crawler_config` (optional): Crawler configuration overrides
- `hooks` (optional): Custom hook functions
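The 1-100 URL constraint and the optional config overrides could be assembled into a request body like this (a sketch; the field names are assumptions about the `/crawl` payload, not the wrapper's actual code):

```python
def build_crawl_payload(urls, browser_config=None, crawler_config=None):
    """Assemble an illustrative crawl request body; field names are assumed."""
    if not 1 <= len(urls) <= 100:
        raise ValueError("urls must contain between 1 and 100 entries")
    payload = {"urls": list(urls)}
    if browser_config:
        payload["browser_config"] = browser_config
    if crawler_config:
        payload["crawler_config"] = crawler_config
    return payload
```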
### ask_crawl4ai
Query Crawl4AI documentation.
**Parameters:**
- `query` (required): Search query
- `context_type` (optional): 'code', 'doc', or 'all' (default: 'all')
- `score_ratio` (optional): Minimum score threshold (default: 0.5)
- `max_results` (optional): Maximum results (default: 20)
## Troubleshooting
### Container Not Running
If tests fail with connection errors:
```bash
docker ps | grep crawl4ai
```
If not running, start it:
```bash
docker start crawl4ai
```
### MCP Server Not Connecting
Check Claude Code logs:
```bash
tail -f ~/Library/Logs/Claude/mcp*.log
```
Verify the Python path in your `.mcp.json` is correct:
```bash
source venv/bin/activate
which python
# Should match the venv interpreter path in your config
```
### Port Conflicts
If port 11235 is in use:
1. Stop the Crawl4AI container: `docker stop crawl4ai`
2. Find the conflicting process: `lsof -i :11235`
3. Either stop that process, or remap the container's port (e.g. `-p <new-port>:11235`) and point `CRAWL4AI_BASE_URL` at the new port
## Development
### Adding New Tools
To add a new tool:
1. Add an async function decorated with `@mcp.tool()` in `crawl4ai_mcp.py`
2. Make an HTTP request to the appropriate Crawl4AI endpoint
3. Add error handling
4. Add a test in `test_wrapper.py`
5. Update this README
### Running in Debug Mode
```bash
# Enable debug logging
export CRAWL4AI_MCP_LOG=DEBUG
# Run the server
python crawl4ai_mcp.py
```
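One way the wrapper could honor `CRAWL4AI_MCP_LOG` (a sketch, assuming the variable holds a standard `logging` level name such as `DEBUG` or `INFO`):

```python
import logging
import os

def configure_logging() -> int:
    """Map CRAWL4AI_MCP_LOG to a logging level (default INFO) and apply it."""
    name = os.environ.get("CRAWL4AI_MCP_LOG", "INFO").upper()
    level = getattr(logging, name, logging.INFO)
    logging.basicConfig(level=level)
    return level
```

Unknown level names fall back to INFO rather than raising, so a typo in the environment variable does not prevent the server from starting.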
## Architecture
```
┌─────────────────┐
│ Claude Code │
└────────┬────────┘
│ stdio (MCP protocol)
▼
┌─────────────────┐
│ FastMCP Server │ (this wrapper)
│ crawl4ai_mcp.py│
└────────┬────────┘
│ HTTP/REST API
▼
┌─────────────────┐
│ Crawl4AI Docker │
│ Container │
│ (port 11235) │
└─────────────────┘
```
## Benefits Over Official SSE Server
- **Reliable stdio transport**: No SSE connection issues
- **Full control**: Easy to extend and customize
- **Better error handling**: Graceful degradation
- **Simple debugging**: Standard Python stack traces
- **No protocol mismatches**: Direct HTTP to Crawl4AI API
## License
MIT License - feel free to modify and distribute.
## Credits
- Built with [FastMCP](https://github.com/jlowin/fastmcp)
- Wraps [Crawl4AI](https://github.com/unclecode/crawl4ai)
- For use with [Claude Code](https://claude.ai/code)