# Crawl4AI MCP Server

A powerful Model Context Protocol (MCP) server that provides web scraping and crawling capabilities using Crawl4AI. This server acts as the "hands and eyes" for client-side AI, enabling intelligent web content analysis and extraction.

## Features

- **🔍 Page Structure Analysis**: Extract clean HTML or Markdown content from any webpage
- **🎯 Schema-Based Extraction**: Precision data extraction using CSS selectors and AI-generated schemas
- **📸 Screenshot Capture**: Visual webpage representation for analysis
- **⚡ Async Operations**: Non-blocking web crawling with progress reporting
- **🛡️ Error Handling**: Comprehensive error handling and validation
- **📊 MCP Integration**: Full Model Context Protocol compatibility with logging and progress tracking

## Architecture

```
┌─────────────────┐    ┌───────────────────┐    ┌─────────────────┐
│   Client AI     │    │   Crawl4AI MCP    │    │   Web Content   │
│   ("Brain")     │◄──►│      Server       │◄──►│   (Websites)    │
│                 │    │ ("Hands & Eyes")  │    │                 │
└─────────────────┘    └───────────────────┘    └─────────────────┘
```

- **FastMCP**: Handles MCP protocol and tool registration
- **AsyncWebCrawler**: Provides async web scraping capabilities
- **Stdio Transport**: MCP-compatible communication channel
- **Error-Safe Logging**: All logs directed to stderr to prevent protocol corruption

## Installation

### Prerequisites

- Python 3.10 or higher
- pip package manager

### Setup

1. **Clone or download this repository**:

   ```bash
   git clone <repository-url>
   cd crawl4ai-mcp
   ```

2. **Create and activate a virtual environment**:

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies**:

   ```bash
   pip install -r requirements.txt
   ```

4. **Install Playwright browsers** (required for screenshots):

   ```bash
   playwright install
   ```

## Usage

### Starting the Server

```bash
# Activate virtual environment
source venv/bin/activate

# Start the MCP server
python3 crawl4ai_mcp_server.py
```

### Testing with MCP Inspector

For interactive testing and development:

```bash
# Start MCP Inspector interface
fastmcp dev crawl4ai_mcp_server.py
```

This starts a web interface (usually at http://localhost:6274) where you can test all tools interactively.

## Available Tools

### 1. server_status

**Purpose**: Get server health and capabilities information

**Parameters**: None

**Example Response**:

```json
{
  "server_name": "Crawl4AI-MCP-Server",
  "version": "1.0.0",
  "status": "operational",
  "capabilities": ["web_crawling", "content_extraction", "screenshot_capture", "schema_based_extraction"]
}
```

### 2. get_page_structure

**Purpose**: Extract webpage content for analysis (the "eyes" function)

**Parameters**:

- `url` (string): The webpage URL to analyze
- `format` (string, optional): Output format - "html" or "markdown" (default: "html")

**Example**:

```json
{
  "url": "https://example.com",
  "format": "html"
}
```
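For orientation, a tool like `get_page_structure` can be registered with FastMCP and backed by Crawl4AI's `AsyncWebCrawler` roughly as follows. This is a minimal illustrative sketch, not the actual `crawl4ai_mcp_server.py` source; error handling, validation, and progress reporting are omitted.

```python
# Minimal sketch: a FastMCP tool wrapping Crawl4AI's AsyncWebCrawler.
# Illustrative only; the real server adds validation, logging, and
# structured error responses.
from fastmcp import FastMCP
from crawl4ai import AsyncWebCrawler

mcp = FastMCP("Crawl4AI-MCP-Server")

@mcp.tool()
async def get_page_structure(url: str, format: str = "html") -> str:
    """Fetch a page and return its cleaned HTML or Markdown."""
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
    # Crawl4AI exposes both Markdown and cleaned HTML on the crawl result.
    return str(result.markdown) if format == "markdown" else (result.cleaned_html or "")

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```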
### 3. crawl_with_schema

**Purpose**: Precision data extraction using CSS selectors (the "hands" function)

**Parameters**:

- `url` (string): The webpage URL to extract data from
- `extraction_schema` (string): JSON string defining field names and CSS selectors

**Example Schema**:

```json
{
  "title": "h1",
  "description": "p.description",
  "price": ".price-value",
  "author": ".author-name",
  "tags": ".tag"
}
```

**Example Usage**:

```json
{
  "url": "https://example.com/product",
  "extraction_schema": "{\"title\": \"h1\", \"price\": \".price\", \"description\": \"p\"}"
}
```

### 4. take_screenshot

**Purpose**: Capture visual representation of webpage

**Parameters**:

- `url` (string): The webpage URL to screenshot

**Example**:

```json
{
  "url": "https://example.com"
}
```

**Returns**: Base64-encoded PNG image data with metadata

## Integration with Claude Desktop

To use this server with Claude Desktop, add this configuration to your Claude Desktop settings:

```json
{
  "mcpServers": {
    "crawl4ai": {
      "command": "python3",
      "args": ["/path/to/crawl4ai-mcp/crawl4ai_mcp_server.py"],
      "env": {}
    }
  }
}
```

Replace `/path/to/crawl4ai-mcp/` with the actual path to your installation directory.

## Error Handling

All tools include comprehensive error handling and return structured JSON responses:

```json
{
  "error": "Error description",
  "url": "https://example.com",
  "success": false
}
```

Common error scenarios:

- Invalid URL format
- Network connectivity issues
- Invalid extraction schemas
- Screenshot capture failures

## Development

### Project Structure

```
crawl4ai-mcp/
├── crawl4ai_mcp_server.py   # Main server implementation
├── requirements.txt         # Python dependencies
├── pyproject.toml           # Project configuration
├── USAGE_EXAMPLES.md        # Detailed usage examples
└── README.md                # This file
```

### Dependencies

- **fastmcp**: FastMCP framework for MCP server development
- **crawl4ai**: Core web crawling and extraction library
- **pydantic**: Data validation and parsing
- **playwright**: Browser automation for screenshots

### Testing

Run the linter to ensure code quality:

```bash
ruff check .
```

Test server startup:

```bash
python3 crawl4ai_mcp_server.py
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly with MCP Inspector
5. Submit a pull request

## License

This project is open source. See the LICENSE file for details.

## Support

For issues and questions:

1. Check the troubleshooting section in USAGE_EXAMPLES.md
2. Test with MCP Inspector to isolate issues
3. Verify all dependencies are correctly installed
4. Ensure the virtual environment is activated

## Acknowledgments

- **Crawl4AI**: Powerful web crawling and extraction capabilities
- **FastMCP**: Streamlined MCP server development framework
- **Model Context Protocol**: Standardized AI tool integration
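Beyond the MCP Inspector, the tools can also be exercised programmatically over stdio as a quick end-to-end check. The sketch below uses the official MCP Python SDK (the `mcp` package, which is not listed in `requirements.txt`) to launch the server and call `server_status`; treat the script path and exact client setup as assumptions to adapt to your installation.

```python
# Hedged smoke test: launch the server over stdio with the MCP Python SDK
# and call the server_status tool. Adjust the script path for your setup.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    params = StdioServerParameters(
        command="python3",
        args=["crawl4ai_mcp_server.py"],  # assumed path to the server script
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            status = await session.call_tool("server_status", {})
            print(status)

asyncio.run(main())
```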
