Crawl4AI MCP Server
A Model Context Protocol (MCP) server that exposes Crawl4AI's web scraping and crawling capabilities as MCP tools. It acts as the "hands and eyes" of a client-side AI, fetching pages, extracting their content, and capturing screenshots so the model can analyze them.
Features
- 🔍 Page Structure Analysis: Extract a webpage's content as clean HTML or Markdown
- 🎯 Schema-Based Extraction: Precise data extraction using CSS selectors and AI-generated schemas
- 📸 Screenshot Capture: Capture a PNG image of a webpage for visual analysis
- ⚡ Async Operations: Non-blocking web crawling with progress reporting
- 🛡️ Error Handling: Input validation and structured JSON error responses
- 📊 MCP Integration: Full Model Context Protocol compatibility with logging and progress tracking
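The async model behind these features can be pictured with a small, self-contained sketch. It is illustrative only: `fake_fetch` is a hypothetical stand-in for a real crawl (the server itself uses Crawl4AI's AsyncWebCrawler), and the `(completed, total)` progress callback is an assumed shape, not the server's actual progress API.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical progress callback signature: (completed, total) -> None.
ProgressCallback = Callable[[int, int], None]


async def fake_fetch(url: str) -> str:
    """Stand-in for a real crawl; sleeps briefly instead of loading a page."""
    await asyncio.sleep(0.01)
    return f"<html><title>{url}</title></html>"


async def crawl_all(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]] = fake_fetch,
    on_progress: Optional[ProgressCallback] = None,
) -> dict[str, str]:
    """Fetch every URL concurrently and report progress as each one completes."""
    results: dict[str, str] = {}
    completed = 0

    async def run_one(url: str) -> None:
        nonlocal completed
        results[url] = await fetch(url)
        completed += 1
        if on_progress is not None:
            on_progress(completed, len(urls))

    # gather() runs all fetches concurrently; while one awaits I/O, others proceed.
    await asyncio.gather(*(run_one(u) for u in urls))
    return results


if __name__ == "__main__":
    pages = asyncio.run(
        crawl_all(
            ["https://example.com/a", "https://example.com/b"],
            on_progress=lambda done, total: print(f"{done}/{total} pages crawled"),
        )
    )
    print(sorted(pages))
```

Because the event loop is never blocked, a server built this way can keep answering protocol messages while pages are still loading.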
Architecture
- FastMCP: Handles MCP protocol and tool registration
- AsyncWebCrawler: Provides async web scraping capabilities
- Stdio Transport: MCP-compatible communication channel
- Error-Safe Logging: All logs go to stderr so they cannot corrupt the MCP messages on stdout
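With the stdio transport, stdout carries newline-delimited JSON-RPC messages, so any stray print or log line there would corrupt the protocol stream. A minimal standard-library sketch of that separation follows; the logger name `crawl4ai_mcp` and the helper names are hypothetical, not the server's actual code (FastMCP handles the message writing itself).

```python
import json
import logging
import sys
from typing import Any, TextIO


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Send every log record to stderr, leaving stdout for protocol messages."""
    logger = logging.getLogger("crawl4ai_mcp")  # hypothetical logger name
    logger.setLevel(level)
    logger.handlers.clear()
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # don't also hand records to root-logger handlers
    return logger


def write_message(message: dict[str, Any], out: TextIO = sys.stdout) -> None:
    """Write one JSON-RPC message as a single line, as the stdio transport expects."""
    # json.dumps without indent escapes newlines inside strings, so one message = one line.
    out.write(json.dumps(message, separators=(",", ":")) + "\n")
    out.flush()
```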
Installation
Prerequisites
- Python 3.10 or higher
- pip package manager
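A quick way to confirm that an interpreter meets the Python 3.10 minimum before creating the virtual environment (a convenience sketch; `python_is_supported` is a hypothetical helper, not part of the project):

```python
import sys
from typing import Tuple

MIN_PYTHON = (3, 10)


def python_is_supported(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Return True if `version` (major, minor, ...) meets the 3.10 minimum."""
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if python_is_supported():
        print("Python version OK")
    else:
        print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {sys.version.split()[0]}")
```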
Setup
- Clone or download this repository:
- Create and activate virtual environment:
- Install dependencies:
- Install Playwright browsers (required, since Crawl4AI uses a Playwright-controlled browser to load pages and take screenshots):
Usage
Starting the Server
Testing with MCP Inspector
For interactive testing and development:
This will start a web interface (usually at http://localhost:6274) where you can test all tools interactively.
Available Tools
1. server_status
Purpose: Get server health and capabilities information
Parameters: None
Example Response:
2. get_page_structure
Purpose: Extract webpage content for analysis (the "eyes" function)
Parameters:
- url (string): The webpage URL to analyze
- format (string, optional): Output format, either "html" or "markdown" (default: "html")
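These parameter rules (a URL plus an optional format that defaults to "html") can be checked before any browser is launched. A minimal sketch, assuming only absolute http and https URLs should be accepted; `validate_page_structure_args` is a hypothetical helper, not part of the server's documented API:

```python
from urllib.parse import urlparse

ALLOWED_FORMATS = {"html", "markdown"}


def validate_page_structure_args(url: str, format: str = "html") -> tuple[str, str]:
    """Validate get_page_structure arguments and return them normalized.

    `format` is named to match the tool's parameter. Raises ValueError for a
    malformed URL or an unsupported format.
    """
    parsed = urlparse(url.strip())
    # Assumption: only absolute http(s) URLs with a host are crawlable.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Invalid URL: {url!r}")
    fmt = format.strip().lower()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be 'html' or 'markdown', got {format!r}")
    return parsed.geturl(), fmt
```

Rejecting bad input this early yields a clear error message instead of a slower, less specific browser failure.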
Example:
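An MCP client invokes a tool with a JSON-RPC `tools/call` request whose `params` carry the tool name and its arguments. The sketch below builds the request a client would send for `get_page_structure`; the URL and request id are placeholders, and in practice the client (MCP Inspector, Claude Desktop) builds this message for you after the initialization handshake.

```python
import json
from typing import Any


def build_tool_call(name: str, arguments: dict[str, Any], request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as a single-line JSON-RPC message."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    print(build_tool_call(
        "get_page_structure",
        {"url": "https://example.com", "format": "markdown"},
    ))
```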
3. crawl_with_schema
Purpose: Precision data extraction using CSS selectors (the "hands" function)
Parameters:
- url (string): The webpage URL to extract data from
- extraction_schema (string): JSON string defining field names and their CSS selectors
Example Schema:
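The exact schema format is defined by the server's code. Assuming it is a flat JSON object that maps each output field name to a CSS selector, a schema can be built and sanity-checked like this (`make_schema` and the selectors are illustrative):

```python
import json


def make_schema(fields: dict[str, str]) -> str:
    """Serialize a {field_name: css_selector} mapping into a JSON string.

    Assumes the flat field-to-selector format; rejects empty names or selectors.
    """
    for name, selector in fields.items():
        if not isinstance(name, str) or not name.strip():
            raise ValueError("field names must be non-empty strings")
        if not isinstance(selector, str) or not selector.strip():
            raise ValueError(f"selector for {name!r} must be a non-empty string")
    return json.dumps(fields)


schema = make_schema({
    "title": "h1",
    "price": "span.price",
    "description": "div.product-description p",
})
```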
Example Usage:
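Because `extraction_schema` is typed as a string, the schema is JSON-encoded on its own and then embedded in the tool arguments, so it appears as an escaped string inside the request. A sketch of the arguments for a `crawl_with_schema` call (same flat-schema assumption; the URL and selectors are placeholders):

```python
import json

# Field-to-selector mapping (assumed flat format; selectors are illustrative).
fields = {"title": "h1", "links": "a.article-link"}

arguments = {
    "url": "https://example.com/blog",
    # The tool takes a JSON *string*, not a nested object.
    "extraction_schema": json.dumps(fields),
}

# Serializing the call shows the schema embedded as an escaped string.
print(json.dumps({"name": "crawl_with_schema", "arguments": arguments}, indent=2))
```

Passing a nested object instead of a string is a likely cause of the "Invalid extraction schemas" error listed under Error Handling.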
4. take_screenshot
Purpose: Capture visual representation of webpage
Parameters:
- url (string): The webpage URL to screenshot
Example:
Returns: Base64-encoded PNG image data with metadata
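Since the screenshot arrives as Base64 text, a client must decode it before saving or displaying it. A minimal sketch that decodes the data, checks the PNG signature, and writes a file; how the Base64 string is located inside the tool's response depends on the server's metadata format and is not shown, and `save_screenshot` is hypothetical:

```python
import base64
import binascii
from pathlib import Path
from typing import Union

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_screenshot(b64_data: str, path: Union[str, Path]) -> Path:
    """Decode a Base64 PNG, verify its signature, and write it to disk."""
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        raw = base64.b64decode(b64_data, validate=True)
    except binascii.Error as exc:
        raise ValueError("screenshot data is not valid Base64") from exc
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    out = Path(path)
    out.write_bytes(raw)
    return out
```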
Integration with Claude Desktop
To use this server with Claude Desktop, add an entry for it to the mcpServers section of your Claude Desktop configuration file (claude_desktop_config.json).
Replace /path/to/crawl4ai-mcp/ with the actual path to your installation directory.
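Each `mcpServers` entry tells Claude Desktop which command to launch and with which arguments; the server then communicates over stdio. The sketch below generates such an entry. The server key `crawl4ai`, the virtual-environment interpreter path, and the entry-point script name `server.py` are assumptions, so adjust them to match your checkout.

```python
import json
from pathlib import PurePosixPath

# Replace with the actual installation directory.
INSTALL_DIR = PurePosixPath("/path/to/crawl4ai-mcp")


def claude_desktop_entry(install_dir: PurePosixPath = INSTALL_DIR) -> dict:
    """Build an mcpServers entry that launches the server over stdio."""
    return {
        "mcpServers": {
            "crawl4ai": {  # hypothetical server key
                # Assumes a .venv inside the checkout and a server.py entry point.
                "command": str(install_dir / ".venv" / "bin" / "python"),
                "args": [str(install_dir / "server.py")],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(claude_desktop_entry(), indent=2))
```

Pointing `command` at the virtual environment's interpreter ensures the server runs with the dependencies installed during setup. On Windows, that interpreter lives under `.venv\Scripts\python.exe` instead.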
Error Handling
All tools validate their input and return structured JSON responses, including when an operation fails.
Common error scenarios:
- Invalid URL format
- Network connectivity issues
- Invalid extraction schemas
- Screenshot capture failures
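The exact fields of the server's JSON responses are not documented here. The sketch below shows one plausible way to turn exceptions into a structured JSON result, using hypothetical `success`/`data`/`error`/`error_type` keys: a decorator wraps an async tool function so that invalid input or network failures come back as data rather than as an unhandled exception.

```python
import asyncio
import functools
import json
from typing import Any, Awaitable, Callable


def json_errors(func: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[str]]:
    """Wrap an async tool so every outcome is returned as a JSON string."""

    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> str:
        try:
            result = await func(*args, **kwargs)
            return json.dumps({"success": True, "data": result})
        except Exception as exc:  # report any failure to the client as data
            return json.dumps({
                "success": False,
                "error": str(exc),
                "error_type": type(exc).__name__,
            })

    return wrapper


@json_errors
async def demo_tool(url: str) -> dict:
    """Hypothetical tool that rejects non-HTTP URLs."""
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"Invalid URL format: {url}")
    return {"url": url}


if __name__ == "__main__":
    print(asyncio.run(demo_tool("not-a-url")))
```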
Development
Project Structure
Dependencies
- fastmcp: FastMCP framework for MCP server development
- crawl4ai: Core web crawling and extraction library
- pydantic: Data validation and parsing
- playwright: Browser automation used by Crawl4AI to render pages and capture screenshots
Testing
Run the linter to ensure code quality:
Test server startup:
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly with MCP Inspector
- Submit a pull request
License
This project is open source. See the LICENSE file for details.
Support
For issues and questions:
- Check the troubleshooting section in USAGE_EXAMPLES.md
- Test with MCP Inspector to isolate issues
- Verify all dependencies are correctly installed
- Ensure virtual environment is activated
Acknowledgments
- Crawl4AI: Powerful web crawling and extraction capabilities
- FastMCP: Streamlined MCP server development framework
- Model Context Protocol: Standardized AI tool integration