Crawl4AI MCP Server

A powerful Model Context Protocol (MCP) server that provides web scraping and crawling capabilities using Crawl4AI. This server acts as the "hands and eyes" for client-side AI, enabling intelligent web content analysis and extraction.

Features

  • πŸ” Page Structure Analysis: Extract clean HTML or Markdown content from any webpage

  • 🎯 Schema-Based Extraction: Precision data extraction using CSS selectors and AI-generated schemas

  • πŸ“Έ Screenshot Capture: Visual webpage representation for analysis

  • ⚑ Async Operations: Non-blocking web crawling with progress reporting

  • πŸ›‘οΈ Error Handling: Comprehensive error handling and validation

  • πŸ“Š MCP Integration: Full Model Context Protocol compatibility with logging and progress tracking

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Client AI     β”‚    β”‚  Crawl4AI MCP     β”‚    β”‚   Web Content   β”‚
β”‚   ("Brain")     │◄──►│   Server          │◄──►│   (Websites)    β”‚
β”‚                 β”‚    β”‚  ("Hands & Eyes") β”‚    β”‚                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  • FastMCP: Handles MCP protocol and tool registration

  • AsyncWebCrawler: Provides async web scraping capabilities

  • Stdio Transport: MCP-compatible communication channel

  • Error-Safe Logging: All logs directed to stderr to prevent protocol corruption
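The stderr-only logging rule matters because MCP's stdio transport reserves stdout for protocol messages: anything else printed there corrupts the stream. A minimal sketch of such a logger (illustrative, not the server's actual code):

```python
import logging
import sys

# Direct all log output to stderr so stdout stays free for MCP protocol traffic
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("crawl4ai-mcp")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("server starting")  # goes to stderr, not stdout
```

The same caveat applies to stray `print()` calls in tool implementations: route them through the logger instead.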

Installation

Prerequisites

  • Python 3.10 or higher

  • pip package manager

Setup

  1. Clone or download this repository:

    git clone <repository-url>
    cd crawl4ai-mcp
  2. Create and activate virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Install Playwright browsers (required for screenshots):

    playwright install

Usage

Starting the Server

# Activate virtual environment
source venv/bin/activate

# Start the MCP server
python3 crawl4ai_mcp_server.py

Testing with MCP Inspector

For interactive testing and development:

# Start MCP Inspector interface
fastmcp dev crawl4ai_mcp_server.py

This will start a web interface (usually at http://localhost:6274) where you can test all tools interactively.

Available Tools

1. server_status

Purpose: Get server health and capabilities information
Parameters: None

Example Response:

{
  "server_name": "Crawl4AI-MCP-Server",
  "version": "1.0.0", 
  "status": "operational",
  "capabilities": ["web_crawling", "content_extraction", "screenshot_capture", "schema_based_extraction"]
}
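A client can parse this response and check for a required capability before invoking a tool. A minimal sketch, using the example response above:

```python
import json

# Example server_status response (copied from the README)
response = """
{
  "server_name": "Crawl4AI-MCP-Server",
  "version": "1.0.0",
  "status": "operational",
  "capabilities": ["web_crawling", "content_extraction", "screenshot_capture", "schema_based_extraction"]
}
"""

status = json.loads(response)
assert status["status"] == "operational"

# Confirm screenshots are supported before calling take_screenshot
can_screenshot = "screenshot_capture" in status["capabilities"]
print(can_screenshot)  # True
```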

2. get_page_structure

Purpose: Extract webpage content for analysis (the "eyes" function)
Parameters:

  • url (string): The webpage URL to analyze

  • format (string, optional): Output format - "html" or "markdown" (default: "html")

Example:

{
  "url": "https://example.com",
  "format": "html"
}

3. crawl_with_schema

Purpose: Precision data extraction using CSS selectors (the "hands" function)
Parameters:

  • url (string): The webpage URL to extract data from

  • extraction_schema (string): JSON string defining field names and CSS selectors

Example Schema:

{
  "title": "h1",
  "description": "p.description", 
  "price": ".price-value",
  "author": ".author-name",
  "tags": ".tag"
}

Example Usage:

{
  "url": "https://example.com/product",
  "extraction_schema": "{\"title\": \"h1\", \"price\": \".price\", \"description\": \"p\"}"
}
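Because `extraction_schema` is a JSON *string* rather than a nested JSON object, clients typically build it with `json.dumps` instead of hand-escaping quotes. A sketch:

```python
import json

# Field names mapped to CSS selectors, as in the example schema above
schema = {"title": "h1", "price": ".price", "description": "p"}

# The tool expects the schema serialized to a string, so it is double-encoded
# when the whole request is sent as JSON
request = {
    "url": "https://example.com/product",
    "extraction_schema": json.dumps(schema),
}

print(request["extraction_schema"])
# {"title": "h1", "price": ".price", "description": "p"}
```

Round-tripping with `json.loads` on the server side recovers the original field-to-selector mapping.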

4. take_screenshot

Purpose: Capture visual representation of webpage
Parameters:

  • url (string): The webpage URL to screenshot

Example:

{
  "url": "https://example.com"
}

Returns: Base64-encoded PNG image data with metadata
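The base64 payload can be decoded back to PNG bytes with the standard library. A sketch using a stand-in payload (the real response carries a full image; the exact response field names may differ):

```python
import base64

# PNG files always start with this 8-byte signature; we use a stand-in
# payload here rather than a real screenshot
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
payload = base64.b64encode(PNG_MAGIC + b"...image data...").decode("ascii")

image_bytes = base64.b64decode(payload)
assert image_bytes.startswith(PNG_MAGIC)  # sanity-check it is a PNG

# Save to disk for viewing
with open("screenshot.png", "wb") as f:
    f.write(image_bytes)
```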

Integration with Claude Desktop

To use this server with Claude Desktop, add this configuration to your Claude Desktop settings:

{
  "mcpServers": {
    "crawl4ai": {
      "command": "python3",
      "args": ["/path/to/crawl4ai-mcp/crawl4ai_mcp_server.py"],
      "env": {}
    }
  }
}

Replace /path/to/crawl4ai-mcp/ with the actual path to your installation directory.

Error Handling

All tools include comprehensive error handling and return structured JSON responses:

{
  "error": "Error description",
  "url": "https://example.com", 
  "success": false
}

Common error scenarios:

  • Invalid URL format

  • Network connectivity issues

  • Invalid extraction schemas

  • Screenshot capture failures
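Because failures come back as structured JSON rather than exceptions, a client can branch on the `success` flag. A minimal sketch (the helper name is illustrative):

```python
import json

def handle_result(raw: str) -> dict:
    """Parse a tool response and raise if the server reported a failure."""
    result = json.loads(raw)
    if not result.get("success", True):
        raise RuntimeError(f"{result['error']} (url: {result.get('url')})")
    return result

error_response = '{"error": "Invalid URL format", "url": "htp://bad", "success": false}'
try:
    handle_result(error_response)
except RuntimeError as exc:
    print(exc)  # Invalid URL format (url: htp://bad)
```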

Development

Project Structure

crawl4ai-mcp/
β”œβ”€β”€ crawl4ai_mcp_server.py    # Main server implementation
β”œβ”€β”€ requirements.txt          # Python dependencies
β”œβ”€β”€ pyproject.toml           # Project configuration
β”œβ”€β”€ USAGE_EXAMPLES.md        # Detailed usage examples
└── README.md               # This file

Dependencies

  • fastmcp: FastMCP framework for MCP server development

  • crawl4ai: Core web crawling and extraction library

  • pydantic: Data validation and parsing

  • playwright: Browser automation for screenshots

Testing

Run the linter to ensure code quality:

ruff check .

Test server startup:

python3 crawl4ai_mcp_server.py

Contributing

  1. Fork the repository

  2. Create a feature branch

  3. Make your changes

  4. Test thoroughly with MCP Inspector

  5. Submit a pull request

License

This project is open source. See the LICENSE file for details.

Support

For issues and questions:

  1. Check the troubleshooting section in USAGE_EXAMPLES.md

  2. Test with MCP Inspector to isolate issues

  3. Verify all dependencies are correctly installed

  4. Ensure virtual environment is activated

Acknowledgments

  • Crawl4AI: Powerful web crawling and extraction capabilities

  • FastMCP: Streamlined MCP server development framework

  • Model Context Protocol: Standardized AI tool integration
