⚠️ NOTICE
MCP SERVER CURRENTLY UNDER DEVELOPMENT
NOT READY FOR PRODUCTION USE
WILL UPDATE WHEN OPERATIONAL
Crawl4AI MCP Server
🚀 High-performance MCP Server for Crawl4AI: enables AI assistants to access web scraping, crawling, and deep research via the Model Context Protocol. Faster and more efficient than Firecrawl!
Overview
This project implements a custom Model Context Protocol (MCP) Server that integrates with Crawl4AI, an open-source web scraping and crawling library. The server is deployed as a remote MCP server on CloudFlare Workers, allowing AI assistants like Claude to access Crawl4AI's powerful web scraping capabilities.
Documentation
For comprehensive details about this project, please refer to the following documentation:
- Migration Plan - Detailed plan for migrating from Firecrawl to Crawl4AI
- Enhanced Architecture - Multi-tenant architecture with cloud provider flexibility
- Implementation Guide - Technical implementation details and code examples
- Codebase Simplification - Details on code simplification and best practices implemented
- Docker Setup Guide - Instructions for Docker setup for local development and production
Features
Web Data Acquisition
- 🌐 Single Webpage Scraping: Extract content from individual webpages
- 🕸️ Web Crawling: Crawl websites with configurable depth and page limits
- 🗺️ URL Discovery: Map and discover URLs from a starting point
- 🕸️ Asynchronous Crawling: Crawl entire websites efficiently
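To illustrate what "configurable depth and page limits" means in practice, here is a minimal sketch of a breadth-first crawl that stops at either limit. It is not the project's implementation: the `LinkGraph` type and the `crawlBreadthFirst` function are hypothetical names, and an in-memory link map stands in for real page fetching through Crawl4AI.

```typescript
// Hypothetical stand-in for real pages: each URL maps to the links found on it.
type LinkGraph = Record<string, string[]>;

// Visit pages level by level, stopping once maxDepth or maxPages is reached.
// Depth 0 is the starting URL itself.
function crawlBreadthFirst(
  graph: LinkGraph,
  start: string,
  maxDepth: number,
  maxPages: number,
): string[] {
  const visited = new Set<string>([start]);
  const order: string[] = [];
  let frontier: string[] = [start];

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const url of frontier) {
      // Enforce the page limit before "fetching" another page.
      if (order.length >= maxPages) return order;
      order.push(url);
      for (const link of graph[url] ?? []) {
        if (!visited.has(link)) {
          visited.add(link);
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return order;
}

// Example graph (made up for illustration).
const exampleGraph: LinkGraph = {
  a: ["b", "c"],
  b: ["d"],
  c: ["d"],
  d: ["e"],
  e: [],
};
console.log(crawlBreadthFirst(exampleGraph, "a", 1, 10));
```

Tracking visited URLs before enqueueing them keeps a page that is linked from several places from being crawled twice.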
Content Processing
- 🔍 Deep Research: Conduct comprehensive research across multiple pages
- 📊 Structured Data Extraction: Extract specific data using CSS selectors or LLM-based extraction
- 🔎 Content Search: Search through previously crawled content
Integration & Security
- 🔄 MCP Integration: Seamless integration with MCP clients (Claude Desktop, etc.)
- 🔒 OAuth Authentication: Secure access with proper authorization
- 🔒 Authentication Options: Secure access via OAuth or API key (Bearer token)
- ⚡ High Performance: Optimized for speed and efficiency
Project Structure
Getting Started
Prerequisites
Installation
- Clone the repository.
- Install dependencies.
- Set up a CloudFlare KV namespace.
- Update wrangler.toml with the KV namespace ID.
Development
Local Development
Using NPM
- Start the development server.
- The server will be available at http://localhost:8787
Using Docker
You can also use Docker for local development, which includes the Crawl4AI API and a debug UI:
- Set up environment variables.
- Start the Docker development environment.
- Access the services:
  - MCP Server: http://localhost:8787
  - Crawl4AI UI: http://localhost:3000
See the Docker Setup Guide for more details.
Testing
The project includes a comprehensive test suite using Jest. To run tests:
When running in Docker:
Deployment
- Deploy to CloudFlare Workers.
- Your server will be available at the CloudFlare Workers URL assigned to your deployed worker.
Usage with MCP Clients
This server implements the Model Context Protocol, allowing AI assistants to access its tools.
Authentication
- Implement OAuth authentication with workers-oauth-provider
- Add API key authentication using Bearer tokens
- Create login page and token management
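As a rough illustration of the API key option, the sketch below pulls a Bearer token out of an Authorization header value and checks it against a set of known keys. The function names `extractBearerToken` and `isAuthorized` are hypothetical, and the in-memory key set is an assumption made for the example; it does not show how this server stores or validates keys, and it does not cover the OAuth flow.

```typescript
// Extract the token from an "Authorization: Bearer <token>" header value.
// Returns null when the header is missing or uses a different scheme.
function extractBearerToken(header: string | null): string | null {
  if (header === null) return null;
  const match = /^Bearer\s+(\S+)\s*$/i.exec(header);
  return match?.[1] ?? null;
}

// Hypothetical check against a set of valid API keys (assumption: keys are
// available in memory; a real deployment would likely look them up in storage).
function isAuthorized(header: string | null, validKeys: ReadonlySet<string>): boolean {
  const token = extractBearerToken(header);
  return token !== null && validKeys.has(token);
}

const keys = new Set(["example-key"]);
console.log(isAuthorized("Bearer example-key", keys));
console.log(isAuthorized("Basic dXNlcjpwYXNz", keys));
```

In a Worker, the header value would typically come from `request.headers.get("Authorization")`, which returns `null` when the header is absent.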
Connecting to an MCP Client
- Use the CloudFlare Workers URL assigned to your deployed worker
- In Claude Desktop or other MCP clients, add this server as a tool source
Available Tools
- crawl: Crawl web pages from a starting URL
- getCrawl: Retrieve crawl data by ID
- listCrawls: List all crawls or filter by domain
- search: Search indexed documents by query
- extract: Extract structured content from a URL
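MCP clients invoke tools through JSON-RPC 2.0 messages with the `tools/call` method. The sketch below builds such a request for one of the tools listed above. The helper `buildToolCall` is a hypothetical name, and the argument names (`url`, `maxDepth`) are assumptions for illustration; the actual input schema of each tool is defined by the server and is not documented here.

```typescript
// Tool names taken from the list above.
type ToolName = "crawl" | "getCrawl" | "listCrawls" | "search" | "extract";

// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: ToolName; arguments: Record<string, unknown> };
}

// Hypothetical helper that assembles the request object.
function buildToolCall(
  id: number,
  name: ToolName,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Argument names here are assumed, not taken from the server's schema.
const crawlRequest = buildToolCall(1, "crawl", { url: "https://example.com", maxDepth: 2 });
console.log(JSON.stringify(crawlRequest, null, 2));
```

An MCP client such as Claude Desktop constructs these messages itself; the sketch only shows what travels over the wire.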
Configuration
The server can be configured by modifying environment variables in wrangler.toml:
- MAX_CRAWL_DEPTH: Maximum depth for web crawling (default: 3)
- MAX_CRAWL_PAGES: Maximum pages to crawl (default: 100)
- API_VERSION: API version string (default: "v1")
- OAUTH_CLIENT_ID: OAuth client ID for authentication
- OAUTH_CLIENT_SECRET: OAuth client secret for authentication
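A minimal sketch of how these variables might be read with their documented defaults is shown below. The `Env` and `CrawlConfig` interfaces and the `loadConfig` function are hypothetical names, and the sketch assumes the values arrive as strings; the project's actual configuration code may differ.

```typescript
// Assumed shape of the environment bindings; variable names come from the
// configuration list, everything else is illustrative.
interface Env {
  MAX_CRAWL_DEPTH?: string;
  MAX_CRAWL_PAGES?: string;
  API_VERSION?: string;
  OAUTH_CLIENT_ID?: string;
  OAUTH_CLIENT_SECRET?: string;
}

interface CrawlConfig {
  maxCrawlDepth: number;
  maxCrawlPages: number;
  apiVersion: string;
}

// Parse a positive integer, falling back when the value is missing or invalid.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined) return fallback;
  const n = Number.parseInt(raw, 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Apply the documented defaults: depth 3, pages 100, version "v1".
function loadConfig(env: Env): CrawlConfig {
  return {
    maxCrawlDepth: parsePositiveInt(env.MAX_CRAWL_DEPTH, 3),
    maxCrawlPages: parsePositiveInt(env.MAX_CRAWL_PAGES, 100),
    apiVersion: env.API_VERSION ?? "v1",
  };
}

console.log(loadConfig({ MAX_CRAWL_DEPTH: "5" }));
```

Falling back to the default on invalid input keeps a typo in wrangler.toml from disabling the crawl limits entirely.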
Roadmap
The project is being developed with these components in mind:
- Project Setup and Configuration: CloudFlare Worker setup, TypeScript configuration
- MCP Server and Tool Schemas: Implementation of MCP server with tool definitions
- Crawl4AI Adapter: Integration with the Crawl4AI functionality
- OAuth Authentication: Secure authentication implementation
- Performance Optimizations: Enhancing speed and reliability
- Advanced Extraction Features: Improving structured data extraction capabilities
Contributing
Contributions are welcome! Please check the open issues or create a new one before starting work on a feature or bug fix. See Contributing Guidelines for detailed guidelines.
Support
If you encounter issues or have questions:
- Open an issue on the GitHub repository
- Check the Crawl4AI documentation
- Refer to the Model Context Protocol specification
How to Cite
If you use Crawl4AI MCP Server in your research or projects, please cite it using the following BibTeX entry:
License