Crawl4AI MCP Server

A powerful Model Context Protocol (MCP) server that provides web scraping and crawling capabilities using Crawl4AI. This server acts as the "hands and eyes" for client-side AI, enabling intelligent web content analysis and extraction.

Features

  • 🔍 Page Structure Analysis: Extract clean HTML or Markdown content from any webpage

  • 🎯 Schema-Based Extraction: Precision data extraction using CSS selectors and AI-generated schemas

  • 📸 Screenshot Capture: Visual webpage representation for analysis

  • ⚡ Async Operations: Non-blocking web crawling with progress reporting

  • 🛡️ Error Handling: Comprehensive error handling and validation

  • 📊 MCP Integration: Full Model Context Protocol compatibility with logging and progress tracking

Architecture

┌─────────────────┐    ┌───────────────────┐    ┌─────────────────┐
│    Client AI    │    │   Crawl4AI MCP    │    │   Web Content   │
│    ("Brain")    │◄──►│      Server       │◄──►│   (Websites)    │
│                 │    │ ("Hands & Eyes")  │    │                 │
└─────────────────┘    └───────────────────┘    └─────────────────┘

  • FastMCP: Handles MCP protocol and tool registration

  • AsyncWebCrawler: Provides async web scraping capabilities

  • Stdio Transport: MCP-compatible communication channel

  • Error-Safe Logging: All logs directed to stderr to prevent protocol corruption
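
The components above fit together with a small amount of glue code. The sketch below is a simplified illustration of the general FastMCP pattern, not the server's actual implementation: register a tool with a decorator, keep stdout reserved for MCP messages, and log to stderr.

import logging
import sys

from fastmcp import FastMCP

# Send all logging to stderr so stdout carries only MCP protocol traffic.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)

mcp = FastMCP("Crawl4AI-MCP-Server")

@mcp.tool()
def server_status() -> dict:
    """Report server health and capabilities (placeholder body)."""
    return {"server_name": "Crawl4AI-MCP-Server", "status": "operational"}

if __name__ == "__main__":
    mcp.run()  # stdio transport by default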

Installation

Prerequisites

  • Python 3.10 or higher

  • pip package manager

Setup

  1. Clone or download this repository:

    git clone <repository-url>
    cd crawl4ai-mcp
  2. Create and activate virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Install Playwright browsers (required for screenshots):

    playwright install

Usage

Starting the Server

# Activate virtual environment
source venv/bin/activate

# Start the MCP server
python3 crawl4ai_mcp_server.py

Testing with MCP Inspector

For interactive testing and development:

# Start MCP Inspector interface
fastmcp dev crawl4ai_mcp_server.py

This will start a web interface (usually at http://localhost:6274) where you can test all tools interactively.

Available Tools

1. server_status

Purpose: Get server health and capabilities information
Parameters: None

Example Response:

{ "server_name": "Crawl4AI-MCP-Server", "version": "1.0.0", "status": "operational", "capabilities": ["web_crawling", "content_extraction", "screenshot_capture", "schema_based_extraction"] }

2. get_page_structure

Purpose: Extract webpage content for analysis (the "eyes" function)
Parameters:

  • url (string): The webpage URL to analyze

  • format (string, optional): Output format - "html" or "markdown" (default: "html")

Example:

{ "url": "https://example.com", "format": "html" }

3. crawl_with_schema

Purpose: Precision data extraction using CSS selectors (the "hands" function)
Parameters:

  • url (string): The webpage URL to extract data from

  • extraction_schema (string): JSON string defining field names and CSS selectors

Example Schema:

{ "title": "h1", "description": "p.description", "price": ".price-value", "author": ".author-name", "tags": ".tag" }

Example Usage:

{ "url": "https://example.com/product", "extraction_schema": "{\"title\": \"h1\", \"price\": \".price\", \"description\": \"p\"}" }

4. take_screenshot

Purpose: Capture visual representation of webpage
Parameters:

  • url (string): The webpage URL to screenshot

Example:

{ "url": "https://example.com" }

Returns: Base64-encoded PNG image data with metadata
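
Because the image arrives as base64 text, the client needs a single decoding step to write it back out as a PNG. A minimal helper, assuming you have already pulled the base64 string out of the tool's response, might look like this:

import base64

def save_screenshot(data: str, path: str = "screenshot.png") -> None:
    # data is the base64-encoded PNG string returned by take_screenshot.
    with open(path, "wb") as fh:
        fh.write(base64.b64decode(data))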

Integration with Claude Desktop

To use this server with Claude Desktop, add this configuration to your Claude Desktop settings:

{ "mcpServers": { "crawl4ai": { "command": "python3", "args": ["/path/to/crawl4ai-mcp/crawl4ai_mcp_server.py"], "env": {} } } }

Replace /path/to/crawl4ai-mcp/ with the actual path to your installation directory.

Error Handling

All tools include comprehensive error handling and return structured JSON responses:

{ "error": "Error description", "url": "https://example.com", "success": false }

Common error scenarios:

  • Invalid URL format

  • Network connectivity issues

  • Invalid extraction schemas

  • Screenshot capture failures

Development

Project Structure

crawl4ai-mcp/
├── crawl4ai_mcp_server.py   # Main server implementation
├── requirements.txt         # Python dependencies
├── pyproject.toml           # Project configuration
├── USAGE_EXAMPLES.md        # Detailed usage examples
└── README.md                # This file

Dependencies

  • fastmcp: FastMCP framework for MCP server development

  • crawl4ai: Core web crawling and extraction library

  • pydantic: Data validation and parsing

  • playwright: Browser automation for screenshots

Testing

Run the linter to ensure code quality:

ruff check .

Test server startup:

python3 crawl4ai_mcp_server.py

Contributing

  1. Fork the repository

  2. Create a feature branch

  3. Make your changes

  4. Test thoroughly with MCP Inspector

  5. Submit a pull request

License

This project is open source. See the LICENSE file for details.

Support

For issues and questions:

  1. Check the troubleshooting section in USAGE_EXAMPLES.md

  2. Test with MCP Inspector to isolate issues

  3. Verify all dependencies are correctly installed

  4. Ensure virtual environment is activated

Acknowledgments

  • Crawl4AI: Powerful web crawling and extraction capabilities

  • FastMCP: Streamlined MCP server development framework

  • Model Context Protocol: Standardized AI tool integration
