# Integration Guide: Crawl4AI MCP Server

This guide provides detailed instructions for integrating the Crawl4AI MCP Server with various AI clients and development environments.

## Claude Desktop Integration

### Step 1: Locate Claude Desktop Configuration

The Claude Desktop configuration file is located at:

**macOS:**
```
~/Library/Application Support/Claude/claude_desktop_config.json
```

**Windows:**
```
%APPDATA%\Claude\claude_desktop_config.json
```

**Linux:**
```
~/.config/Claude/claude_desktop_config.json
```

### Step 2: Add MCP Server Configuration

Add the following configuration to your `claude_desktop_config.json` file:

```json
{
  "mcpServers": {
    "crawl4ai": {
      "command": "python3",
      "args": ["/absolute/path/to/crawl4ai-mcp/crawl4ai_mcp_server.py"],
      "env": {
        "PYTHONPATH": "/absolute/path/to/crawl4ai-mcp"
      }
    }
  }
}
```

**Important:** Replace `/absolute/path/to/crawl4ai-mcp/` with the actual full path to your installation directory.

### Step 3: Virtual Environment Setup

If you're using a virtual environment (recommended), use this configuration instead:

```json
{
  "mcpServers": {
    "crawl4ai": {
      "command": "/absolute/path/to/crawl4ai-mcp/venv/bin/python3",
      "args": ["/absolute/path/to/crawl4ai-mcp/crawl4ai_mcp_server.py"],
      "env": {}
    }
  }
}
```

### Step 4: Restart Claude Desktop

After updating the configuration:

1. Completely quit Claude Desktop
2. Restart the application
3. The Crawl4AI MCP server should be available automatically

## Verification

### Check Server Connection

Ask Claude Desktop to use the server:

```
Can you check the status of the Crawl4AI server?
```

Claude should be able to call the `server_status` tool and return server information.

### Test Basic Functionality

Try extracting content from a webpage:

```
Can you get the page structure from https://example.com?
```

### Test Schema Extraction

Try precision data extraction:

```
Can you extract the title and description from https://news.ycombinator.com using appropriate CSS selectors?
```
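The configuration steps above are easy to get wrong with relative paths, which is the most common cause of a server that never appears in Claude Desktop. As a sanity check before restarting the app, a short script along these lines can flag likely mistakes (the `find_config_problems` helper is a hypothetical sketch, not part of this repository):

```python
from pathlib import Path

def find_config_problems(config: dict) -> list:
    """Flag likely mistakes in a claude_desktop_config.json structure."""
    problems = []
    for name, server in config.get("mcpServers", {}).items():
        if not server.get("command"):
            problems.append(f"{name}: missing 'command'")
        for arg in server.get("args", []):
            # Script paths passed as args should be absolute, per Step 2.
            if arg.endswith(".py") and not Path(arg).is_absolute():
                problems.append(f"{name}: script path {arg!r} is not absolute")
    return problems

good = {"mcpServers": {"crawl4ai": {
    "command": "python3",
    "args": ["/absolute/path/to/crawl4ai-mcp/crawl4ai_mcp_server.py"]}}}
bad = {"mcpServers": {"crawl4ai": {
    "command": "python3",
    "args": ["crawl4ai_mcp_server.py"]}}}

print(find_config_problems(good))  # []
print(find_config_problems(bad))
```

Loading your real config with `json.load` and passing it through a check like this takes seconds and avoids a restart-and-guess loop.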
## Alternative Integration Methods

### Direct MCP Client Integration

For custom MCP clients, connect to the server using stdio transport:

```python
import subprocess
import json

# Start the MCP server
process = subprocess.Popen(
    ["python3", "/path/to/crawl4ai_mcp_server.py"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True
)

# Send MCP protocol messages
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
    "params": {}
}
process.stdin.write(json.dumps(request) + "\n")
process.stdin.flush()

# Read response
response = process.stdout.readline()
print(json.loads(response))
```

Note that a fully spec-compliant MCP client first sends an `initialize` request and an `initialized` notification before calling other methods; the snippet above is simplified for illustration.

### FastMCP Development Server

For development and testing:

```bash
# Start the interactive development server
cd /path/to/crawl4ai-mcp
source venv/bin/activate
fastmcp dev crawl4ai_mcp_server.py
```

This provides a web interface for testing tools interactively.

## Configuration Options

### Environment Variables

You can customize server behavior using environment variables:

```json
{
  "mcpServers": {
    "crawl4ai": {
      "command": "python3",
      "args": ["/path/to/crawl4ai_mcp_server.py"],
      "env": {
        "CRAWL4AI_VERBOSE": "false",
        "CRAWL4AI_TIMEOUT": "30",
        "PYTHONPATH": "/path/to/crawl4ai-mcp"
      }
    }
  }
}
```

### Custom Server Settings

For advanced users, you can modify server behavior by editing `crawl4ai_mcp_server.py`:

```python
# Example: increase the crawler timeout
async with AsyncWebCrawler(
    verbose=False,
    max_timeout=60  # Increase from the default 30 seconds
) as crawler:
    ...  # crawler operations
```

## Troubleshooting Integration Issues

### Common Problems

#### 1. Server Not Found

**Error:** "MCP server 'crawl4ai' not found"

**Solutions:**

- Verify the path in `claude_desktop_config.json` is correct and absolute
- Ensure the file `crawl4ai_mcp_server.py` exists at the specified path
- Check file permissions (the file must be readable and executable)
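The path checks above can be run directly from a terminal instead of eyeballing the filesystem. A minimal sketch, assuming only the standard library (the `diagnose_server_path` helper is illustrative, not part of the server):

```python
import os
import tempfile
from pathlib import Path

def diagnose_server_path(path_str: str) -> list:
    """Return a list of problems with a configured server script path."""
    issues = []
    path = Path(path_str)
    if not path.is_absolute():
        issues.append("path is not absolute")
    if not path.exists():
        issues.append("file does not exist")
        return issues
    if not os.access(path, os.R_OK):
        issues.append("file is not readable")
    return issues

# Demo against a throwaway file standing in for crawl4ai_mcp_server.py
with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as tmp:
    existing = tmp.name

print(diagnose_server_path(existing))                  # []
print(diagnose_server_path("/no/such/dir/server.py"))  # ['file does not exist']
```

An empty list means the path itself is fine, so the problem lies elsewhere (environment, dependencies, or the MCP handshake).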
#### 2. Python Environment Issues

**Error:** "ModuleNotFoundError" or other import errors

**Solutions:**

- Use the full path to the Python interpreter in your virtual environment
- Verify all dependencies are installed: `pip install -r requirements.txt`
- Install the Playwright browsers: `playwright install`

#### 3. Permission Denied

**Error:** "Permission denied" when starting the server

**Solutions:**

```bash
# Make the server file executable
chmod +x /path/to/crawl4ai_mcp_server.py
```

Or, instead of executing the file directly, invoke it through an explicit interpreter in your configuration:

```json
"command": "python3"
```

#### 4. Server Crashes on Startup

Check the server logs by running it manually:

```bash
cd /path/to/crawl4ai-mcp
source venv/bin/activate
python3 crawl4ai_mcp_server.py
```

Look for error messages in the output.

#### 5. Tools Not Available

**Symptoms:** Claude Desktop can't see the crawling tools

**Solutions:**

- Restart Claude Desktop completely
- Check server status by asking: "Can you call the server_status tool?"
- Verify MCP protocol communication is working

### Debug Mode

Enable verbose logging by modifying the server:

```python
# In crawl4ai_mcp_server.py
logging.basicConfig(
    level=logging.DEBUG,  # Change from INFO to DEBUG
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    stream=sys.stderr,
    force=True
)
```

### Testing Configuration

Test your configuration before using it with Claude Desktop:

```bash
# Test server startup
cd /path/to/crawl4ai-mcp
source venv/bin/activate
python3 crawl4ai_mcp_server.py

# The server should show the FastMCP banner and wait for input
# Press Ctrl+C to exit
```

## Integration Best Practices

### 1. Path Management

- Always use absolute paths in the configuration
- Avoid spaces in directory names
- Use forward slashes even on Windows for JSON configuration

### 2. Virtual Environment

- Always use a dedicated virtual environment
- Keep the environment activated when testing
- Document the Python version used
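The "forward slashes even on Windows" advice is easy to automate rather than edit by hand: Python's `pathlib` can convert a native Windows path into the slash style that pastes cleanly into JSON. A small sketch:

```python
from pathlib import PureWindowsPath

def to_json_path(windows_path: str) -> str:
    """Convert a Windows path to forward slashes for claude_desktop_config.json."""
    return PureWindowsPath(windows_path).as_posix()

print(to_json_path(r"C:\Users\me\crawl4ai-mcp\crawl4ai_mcp_server.py"))
# C:/Users/me/crawl4ai-mcp/crawl4ai_mcp_server.py
```

Forward slashes avoid the need to escape every backslash as `\\` inside JSON strings, which is a frequent source of silent configuration breakage.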
### 3. Error Handling

- Monitor Claude Desktop logs for integration issues
- Test tools individually before building complex workflows
- Keep server logs for troubleshooting

### 4. Performance

- Be mindful of web scraping rate limits
- Test with small pages before large ones
- Monitor memory usage during heavy crawling

## Advanced Integration

### Custom Tool Extensions

You can extend the server with additional tools by modifying `crawl4ai_mcp_server.py`:

```python
@mcp.tool()
async def custom_analysis_tool(
    url: Annotated[str, Field(description="The URL to analyze")],
    ctx: Context = None
) -> str:
    """Custom analysis tool for specific use cases."""
    # Your custom implementation
    pass
```

### Integration with Other Services

The server can be extended to integrate with other services:

```python
# Example: add database storage
@mcp.tool()
async def store_crawl_result(
    url: str,
    content: str,
    ctx: Context = None
) -> str:
    """Store crawl results in a database."""
    # Database integration code
    pass
```

## Support and Maintenance

### Regular Updates

- Keep Crawl4AI dependencies updated
- Monitor FastMCP for updates
- Test the integration after Claude Desktop updates

### Backup Configuration

- Keep a backup of your `claude_desktop_config.json`
- Document any custom modifications
- Version-control your server extensions

For additional support, refer to the main README.md file and the troubleshooting section.
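The backup recommendation above can be scripted so a copy is made before every configuration change. A minimal sketch using only the standard library; the demo paths are temporary stand-ins for the real Claude Desktop config location, and `backup_config` is a hypothetical helper:

```python
import json
import shutil
import tempfile
import time
from pathlib import Path

def backup_config(config_path: str, backup_dir: str) -> Path:
    """Copy a config file into backup_dir with a timestamp in the name."""
    src = Path(config_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}.{stamp}{src.suffix}"
    shutil.copy2(src, dest)  # copy2 preserves file metadata
    return dest

# Demo with a throwaway config instead of the real Claude Desktop file
demo_dir = Path(tempfile.mkdtemp())
cfg = demo_dir / "claude_desktop_config.json"
cfg.write_text(json.dumps({"mcpServers": {}}, indent=2))

saved = backup_config(str(cfg), str(demo_dir / "backups"))
print(saved.name)  # e.g. claude_desktop_config.20250101-120000.json
```

Running this before each edit leaves a dated trail of known-good configurations to roll back to.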
