Crawl4AI MCP Server

A Model Context Protocol (MCP) server that provides web scraping and crawling through Crawl4AI. It acts as the "hands and eyes" for a client-side AI: the client decides what to look at, and the server fetches, analyzes, and extracts the web content.

Features

  • 🔍 Page Structure Analysis: Extract clean HTML or Markdown content from any webpage
  • 🎯 Schema-Based Extraction: Precision data extraction using CSS selectors and AI-generated schemas
  • 📸 Screenshot Capture: Visual webpage representation for analysis
  • ⚡ Async Operations: Non-blocking web crawling with progress reporting
  • 🛡️ Error Handling: Input validation and structured JSON error responses from every tool
  • 📊 MCP Integration: Full Model Context Protocol compatibility with logging and progress tracking

Architecture

┌─────────────────┐    ┌───────────────────┐    ┌─────────────────┐
│    Client AI    │    │   Crawl4AI MCP    │    │   Web Content   │
│    ("Brain")    │◄──►│      Server       │◄──►│   (Websites)    │
│                 │    │ ("Hands & Eyes")  │    │                 │
└─────────────────┘    └───────────────────┘    └─────────────────┘

  • FastMCP: Handles MCP protocol and tool registration
  • AsyncWebCrawler: Provides async web scraping capabilities
  • Stdio Transport: MCP-compatible communication channel
  • Error-Safe Logging: All logs directed to stderr to prevent protocol corruption
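
The stderr-only logging rule matters because the stdio transport uses stdout for protocol messages. The following is a minimal standard-library sketch of that idea, not the server's actual code; the logger name and the `configure_logging` helper are illustrative assumptions.

```python
import logging
import sys


def configure_logging(name: str = "crawl4ai-mcp") -> logging.Logger:
    """Return a logger whose only handler writes to stderr.

    With the stdio transport, stdout carries protocol messages, so a
    stray log line written there could corrupt the stream.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records away from root-logger handlers
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = configure_logging()
    log.info("server starting")  # written to stderr; stdout stays untouched
```

The same reasoning applies to `print()` calls inside tool code: they would need `file=sys.stderr` to stay out of the protocol stream.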

Installation

Prerequisites

  • Python 3.10 or higher
  • pip package manager

Setup

  1. Clone or download this repository:
    git clone <repository-url>
    cd crawl4ai-mcp
  2. Create and activate virtual environment:
    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
    pip install -r requirements.txt
  4. Install Playwright browsers (required for screenshots):
    playwright install

Usage

Starting the Server

# Activate virtual environment
source venv/bin/activate

# Start the MCP server
python3 crawl4ai_mcp_server.py

Testing with MCP Inspector

For interactive testing and development:

# Start MCP Inspector interface
fastmcp dev crawl4ai_mcp_server.py

This starts a web interface (usually at http://localhost:6274) where you can test all tools interactively.

Available Tools

1. server_status

Purpose: Get server health and capabilities information
Parameters: None

Example Response:

{
  "server_name": "Crawl4AI-MCP-Server",
  "version": "1.0.0",
  "status": "operational",
  "capabilities": [
    "web_crawling",
    "content_extraction",
    "screenshot_capture",
    "schema_based_extraction"
  ]
}

2. get_page_structure

Purpose: Extract webpage content for analysis (the "eyes" function)
Parameters:

  • url (string): The webpage URL to analyze
  • format (string, optional): Output format - "html" or "markdown" (default: "html")

Example:

{ "url": "https://example.com", "format": "html" }
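
On the wire, an MCP client passes these parameters as the `arguments` of a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message with the standard library only; `build_tool_call` is a hypothetical helper used for illustration and is not part of this server or of FastMCP.

```python
import json
from typing import Any, Dict


def build_tool_call(name: str, arguments: Dict[str, Any], request_id: int = 1) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Ask for Markdown instead of the default HTML output.
    print(
        build_tool_call(
            "get_page_structure",
            {"url": "https://example.com", "format": "markdown"},
        )
    )
```

A real client also performs the MCP `initialize` handshake before calling tools, so in practice an MCP client library or the MCP Inspector handles this framing for you.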

3. crawl_with_schema

Purpose: Precision data extraction using CSS selectors (the "hands" function)
Parameters:

  • url (string): The webpage URL to extract data from
  • extraction_schema (string): JSON string defining field names and CSS selectors

Example Schema:

{
  "title": "h1",
  "description": "p.description",
  "price": ".price-value",
  "author": ".author-name",
  "tags": ".tag"
}

Example Usage:

{
  "url": "https://example.com/product",
  "extraction_schema": "{\"title\": \"h1\", \"price\": \".price\", \"description\": \"p\"}"
}
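
Because `extraction_schema` is a JSON string rather than a nested object, hand-escaping the quotes is error-prone. One way to avoid that, sketched here, is to build the schema as a Python dict and serialize it with `json.dumps`. The `build_crawl_arguments` helper and its client-side checks are assumptions for illustration; the server may validate schemas differently.

```python
import json
from typing import Dict


def build_crawl_arguments(url: str, fields: Dict[str, str]) -> Dict[str, str]:
    """Build crawl_with_schema arguments from a field -> CSS selector mapping."""
    if not fields:
        raise ValueError("extraction schema needs at least one field")
    for name, selector in fields.items():
        if not isinstance(selector, str) or not selector.strip():
            raise ValueError(f"selector for {name!r} must be a non-empty string")
    # The tool expects the schema as a JSON *string*, so serialize it here.
    return {"url": url, "extraction_schema": json.dumps(fields)}


if __name__ == "__main__":
    args = build_crawl_arguments(
        "https://example.com/product",
        {"title": "h1", "price": ".price", "description": "p"},
    )
    print(json.dumps(args, indent=2))
```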

4. take_screenshot

Purpose: Capture visual representation of webpage
Parameters:

  • url (string): The webpage URL to screenshot

Example:

{ "url": "https://example.com" }

Returns: Base64-encoded PNG image data with metadata
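
On the client side, the Base64 payload has to be decoded before it can be saved or displayed. The sketch below decodes a string and checks the PNG file signature. The README does not name the response field that holds the image data, so the function takes the Base64 string directly; `decode_png` is a hypothetical helper.

```python
import base64

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png(b64_data: str) -> bytes:
    """Decode Base64 image data and verify it starts with the PNG signature."""
    # validate=True rejects characters outside the Base64 alphabet.
    raw = base64.b64decode(b64_data, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return raw
```

The returned bytes can then be written to disk, for example with `pathlib.Path("page.png").write_bytes(...)`.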

Integration with Claude Desktop

To use this server with Claude Desktop, add this entry to your Claude Desktop configuration file (claude_desktop_config.json):

{
  "mcpServers": {
    "crawl4ai": {
      "command": "python3",
      "args": ["/path/to/crawl4ai-mcp/crawl4ai_mcp_server.py"],
      "env": {}
    }
  }
}

Replace /path/to/crawl4ai-mcp/ with the actual path to your installation directory.
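
If you prefer to generate the entry rather than edit paths by hand, a small script can resolve the absolute path for you. This is a sketch under the assumption that the server script keeps the name `crawl4ai_mcp_server.py`; the `claude_desktop_entry` helper is hypothetical.

```python
import json
from pathlib import Path


def claude_desktop_entry(install_dir: str, command: str = "python3") -> dict:
    """Build the mcpServers entry with an absolute path to the server script."""
    script = Path(install_dir).expanduser().resolve() / "crawl4ai_mcp_server.py"
    return {
        "mcpServers": {
            "crawl4ai": {"command": command, "args": [str(script)], "env": {}}
        }
    }


if __name__ == "__main__":
    # Print the JSON to paste into the Claude Desktop configuration.
    print(json.dumps(claude_desktop_entry("~/crawl4ai-mcp"), indent=2))
```

Claude Desktop launches the command without activating your virtual environment, so if the dependencies live in `venv`, passing the venv's own interpreter as `command` (for example `venv/bin/python` under the installation directory) is likely the safer choice.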

Error Handling

When a call fails, the tool returns a structured JSON response instead of raising, with "success" set to false and a description of the problem:

{ "error": "Error description", "url": "https://example.com", "success": false }

Common error scenarios:

  • Invalid URL format
  • Network connectivity issues
  • Invalid extraction schemas
  • Screenshot capture failures
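
As an illustration of the first scenario, the sketch below validates a URL and produces an error payload in the shape shown above. It is not the server's implementation: `validate_url`, `error_response`, and the error messages are assumptions chosen for the example.

```python
import json
from typing import Optional
from urllib.parse import urlparse


def validate_url(url: str) -> Optional[str]:
    """Return an error message for an unusable URL, or None if it looks valid."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return "URL must start with http:// or https://"
    if not parsed.netloc:
        return "URL is missing a host"
    return None


def error_response(url: str, message: str) -> str:
    """Serialize an error in the structured format used by the tools."""
    return json.dumps({"error": message, "url": url, "success": False})


if __name__ == "__main__":
    bad_url = "example.com"  # no scheme, so validation fails
    problem = validate_url(bad_url)
    if problem is not None:
        print(error_response(bad_url, problem))
```

Clients can branch on the `success` field before reading any other part of a response.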

Development

Project Structure

crawl4ai-mcp/
├── crawl4ai_mcp_server.py   # Main server implementation
├── requirements.txt         # Python dependencies
├── pyproject.toml           # Project configuration
├── USAGE_EXAMPLES.md        # Detailed usage examples
└── README.md                # This file

Dependencies

  • fastmcp: FastMCP framework for MCP server development
  • crawl4ai: Core web crawling and extraction library
  • pydantic: Data validation and parsing
  • playwright: Browser automation for screenshots

Testing

Run the linter to ensure code quality:

ruff check .

Test server startup:

python3 crawl4ai_mcp_server.py

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly with MCP Inspector
  5. Submit a pull request

License

This project is open source. See the LICENSE file for details.

Support

For issues and questions:

  1. Check the troubleshooting section in USAGE_EXAMPLES.md
  2. Test with MCP Inspector to isolate issues
  3. Verify all dependencies are correctly installed
  4. Ensure virtual environment is activated

Acknowledgments

  • Crawl4AI: Powerful web crawling and extraction capabilities
  • FastMCP: Streamlined MCP server development framework
  • Model Context Protocol: Standardized AI tool integration