Crawl4AI MCP Server
A powerful Model Context Protocol (MCP) server that provides web scraping and crawling capabilities using Crawl4AI. This server acts as the "hands and eyes" for client-side AI, enabling intelligent web content analysis and extraction.
Features
Page Structure Analysis: Extract clean HTML or Markdown content from any webpage
Schema-Based Extraction: Precision data extraction using CSS selectors and AI-generated schemas
Screenshot Capture: Visual webpage representation for analysis
Async Operations: Non-blocking web crawling with progress reporting
Error Handling: Comprehensive error handling and validation
MCP Integration: Full Model Context Protocol compatibility with logging and progress tracking
Architecture
FastMCP: Handles MCP protocol and tool registration
AsyncWebCrawler: Provides async web scraping capabilities
Stdio Transport: MCP-compatible communication channel
Error-Safe Logging: All logs directed to stderr to prevent protocol corruption
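The stderr-only logging rule matters because, over the stdio transport, stdout carries the protocol messages themselves, so a stray log line there would corrupt the stream. A minimal sketch using only Python's standard logging module follows; the logger name "crawl4ai-mcp" and the format string are assumptions for illustration, not taken from the server's source.

```python
import logging
import sys


def configure_stderr_logging(name: str = "crawl4ai-mcp") -> logging.Logger:
    """Return a logger whose only handler writes to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Keep records away from root handlers that might print to stdout.
    logger.propagate = False
    # Avoid attaching duplicate handlers when called more than once.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = configure_stderr_logging()
    # Written to stderr; stdout stays reserved for MCP messages.
    log.info("server starting")
```

Calling the function once at startup, before any tool is registered, keeps every later log call on the safe channel.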
Installation
Prerequisites
Python 3.10 or higher
pip package manager
Setup
Clone or download this repository:
    git clone <repository-url>
    cd crawl4ai-mcp
Create and activate virtual environment:
    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
Install dependencies:
    pip install -r requirements.txt
Install Playwright browsers (required for screenshots):
    playwright install
Usage
Starting the Server
Testing with MCP Inspector
For interactive testing and development:
This will start a web interface (usually at http://localhost:6274) where you can test all tools interactively.
Available Tools
1. server_status
Purpose: Get server health and capabilities information
Parameters: None
Example Response:
2. get_page_structure
Purpose: Extract webpage content for analysis (the "eyes" function)
Parameters:
url (string): The webpage URL to analyze
format (string, optional): Output format - "html" or "markdown" (default: "html")
Example:
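As a sketch of how a client might invoke this tool: MCP tool calls travel as JSON-RPC 2.0 requests with the method tools/call. The helper build_tool_call below is hypothetical and the target URL is a placeholder; clients such as MCP Inspector or Claude Desktop normally build this message for you.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


if __name__ == "__main__":
    # Placeholder URL; "format" is the optional parameter described above.
    print(build_tool_call(
        "get_page_structure",
        {"url": "https://example.com", "format": "markdown"},
    ))
```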
3. crawl_with_schema
Purpose: Precision data extraction using CSS selectors (the "hands" function)
Parameters:
url (string): The webpage URL to extract data from
extraction_schema (string): JSON string defining field names and CSS selectors
Example Schema:
Example Usage:
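A sketch of preparing the extraction_schema argument, assuming the schema is a flat JSON object that maps each field name to one CSS selector. The field names and selectors are hypothetical, and the exact schema shape the server accepts may differ.

```python
import json


def make_extraction_schema(fields: dict[str, str]) -> str:
    """Validate a field-name -> CSS-selector mapping and return it as a JSON string."""
    if not fields:
        raise ValueError("schema must define at least one field")
    for name, selector in fields.items():
        if not isinstance(selector, str) or not selector.strip():
            raise ValueError(f"field {name!r} needs a non-empty CSS selector")
    return json.dumps(fields)


if __name__ == "__main__":
    # Hypothetical fields for a product page.
    schema = make_extraction_schema({
        "title": "h1.product-title",
        "price": "span.price",
    })
    print(schema)
```

Validating the mapping before serializing it catches empty selectors on the client, before the request reaches the server.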
4. take_screenshot
Purpose: Capture visual representation of webpage
Parameters:
url (string): The webpage URL to screenshot
Example:
Returns: Base64-encoded PNG image data with metadata
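On the client side, the Base64 payload can be decoded back into PNG bytes. The sketch below assumes the image arrives as a plain Base64 string; the name of the response field that carries it is not specified here, so the function takes the string directly.

```python
import base64
import binascii

# First eight bytes of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png(b64_data: str) -> bytes:
    """Decode a Base64 string and check that the result starts with the PNG signature."""
    try:
        raw = base64.b64decode(b64_data, validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid Base64") from exc
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return raw


if __name__ == "__main__":
    # Stand-in payload (signature plus dummy bytes), not a real screenshot.
    sample = base64.b64encode(PNG_SIGNATURE + b"demo").decode("ascii")
    png_bytes = decode_png(sample)
    print(f"decoded {len(png_bytes)} bytes")
```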
Integration with Claude Desktop
To use this server with Claude Desktop, add this configuration to your Claude Desktop settings:
Replace /path/to/crawl4ai-mcp/ with the actual path to your installation directory.
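Claude Desktop reads MCP servers from the mcpServers object in its configuration file. The sketch below generates such an entry; the entry name "crawl4ai" and the <server-script> placeholder are assumptions, so substitute the interpreter and script paths from your own installation.

```python
import json


def claude_desktop_entry(python_path: str, script_path: str, name: str = "crawl4ai") -> str:
    """Build a Claude Desktop mcpServers entry that launches the server as a subprocess."""
    config = {
        "mcpServers": {
            name: {"command": python_path, "args": [script_path]},
        }
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(claude_desktop_entry(
        # Interpreter inside the virtual environment created during setup.
        "/path/to/crawl4ai-mcp/venv/bin/python",
        # Placeholder: the server's entry-point file in your checkout.
        "/path/to/crawl4ai-mcp/<server-script>",
    ))
```

Pointing command at the virtual environment's interpreter means Claude Desktop launches the server with the dependencies installed above, without needing the environment to be activated first.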
Error Handling
All tools include comprehensive error handling and return structured JSON responses:
Common error scenarios:
Invalid URL format
Network connectivity issues
Invalid extraction schemas
Screenshot capture failures
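A sketch of how a tool might validate its input and report failure as structured JSON instead of raising. The response keys (success, url, error) and the function names are hypothetical; the server's actual response shape may differ.

```python
import json
from urllib.parse import urlparse


def validate_url(url: str) -> None:
    """Raise ValueError unless url is an absolute http(s) URL with a host."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Invalid URL format: {url!r}")


def error_response(url: str, exc: Exception) -> str:
    """Serialize a failure as JSON so the client always gets a parseable reply."""
    return json.dumps({"success": False, "url": url, "error": str(exc)})


def safe_tool(url: str) -> str:
    """Stand-in for a tool body: validate first, then report success or failure."""
    try:
        validate_url(url)
    except ValueError as exc:
        return error_response(url, exc)
    # A real tool would crawl here; this sketch only reports success.
    return json.dumps({"success": True, "url": url})


if __name__ == "__main__":
    print(safe_tool("https://example.com"))
    print(safe_tool("not-a-url"))
```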
Development
Project Structure
Dependencies
fastmcp: FastMCP framework for MCP server development
crawl4ai: Core web crawling and extraction library
pydantic: Data validation and parsing
playwright: Browser automation for screenshots
Testing
Run the linter to ensure code quality:
Test server startup:
Contributing
Fork the repository
Create a feature branch
Make your changes
Test thoroughly with MCP Inspector
Submit a pull request
License
This project is open source. See the LICENSE file for details.
Support
For issues and questions:
Check the troubleshooting section in USAGE_EXAMPLES.md
Test with MCP Inspector to isolate issues
Verify all dependencies are correctly installed
Ensure the virtual environment is activated
Acknowledgments
Crawl4AI: Powerful web crawling and extraction capabilities
FastMCP: Streamlined MCP server development framework
Model Context Protocol: Standardized AI tool integration