Crawl4AI RAG MCP Server
Generates embeddings to enable semantic similarity search across crawled content.
Provides vector storage and semantic search for crawled content, enabling retrieval-augmented generation (RAG) workflows.
An implementation of the Model Context Protocol (MCP), integrated with Crawl4AI and Supabase, that gives AI agents and AI coding assistants advanced web crawling and RAG capabilities.
With this MCP server, you can scrape anything and then use that knowledge anywhere for RAG.
Overview
This MCP server provides tools that enable AI agents to crawl websites, store content in a vector database (Supabase), and perform RAG over the crawled content.
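The retrieval step described above can be pictured with a small, dependency-free sketch. It ranks stored chunks by cosine similarity to a query embedding and optionally filters by source. The `Chunk` type, the `search` function, and the toy 3-dimensional vectors are illustrative assumptions; in the actual server this work is delegated to Supabase (pgvector) and OpenAI embeddings.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str             # domain the chunk came from (hypothetical field name)
    text: str
    embedding: list[float]  # toy vector; real embeddings have many more dimensions


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(chunks, query_embedding, source=None, top_k=2):
    """Return the top_k chunks most similar to the query, optionally from one source."""
    candidates = [c for c in chunks if source is None or c.source == source]
    ranked = sorted(
        candidates,
        key=lambda c: cosine_similarity(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:top_k]


chunks = [
    Chunk("example.com", "Install guide", [1.0, 0.0, 0.0]),
    Chunk("example.com", "API reference", [0.0, 1.0, 0.0]),
    Chunk("docs.example.org", "Install FAQ", [0.9, 0.1, 0.0]),
]
results = search(chunks, [1.0, 0.0, 0.0], source="example.com", top_k=1)
```

Filtering before ranking keeps results from one domain from crowding out another, which is the role the optional source filter plays in the RAG query tool.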
Features
Smart URL Detection: Automatically detects and handles different URL types (regular webpages, sitemaps, text files)
Recursive Crawling: Follows internal links to discover content
Parallel Processing: Efficiently crawls multiple pages simultaneously
Content Chunking: Intelligently splits content by headers and size for better processing
Vector Search: Performs RAG over crawled content, optionally filtering by data source for precision
Source Retrieval: Retrieve sources available for filtering to guide the RAG process
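As a rough illustration of the "split by headers and size" idea in the Content Chunking feature, the sketch below splits Markdown text at header lines and then caps each section at a maximum length. The function name `chunk_markdown` and the `max_chars` parameter are assumptions made for this example, not the server's actual implementation, and the header test is deliberately naive (a `#` comment inside a code block would also start a new section).

```python
def chunk_markdown(text: str, max_chars: int = 1000) -> list[str]:
    """Split on Markdown header lines, then cap each section at max_chars."""
    sections: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        # A header line closes the section collected so far and starts a new one.
        if line.startswith("#") and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())

    chunks: list[str] = []
    for section in sections:
        # Sections longer than max_chars are cut into fixed-size pieces.
        for start in range(0, len(section), max_chars):
            piece = section[start:start + max_chars].strip()
            if piece:
                chunks.append(piece)
    return chunks


doc = "# Intro\nShort text.\n# Details\n" + "x" * 25
chunks = chunk_markdown(doc, max_chars=20)
```

Splitting at headers first keeps related text together, and the size cap keeps each chunk within a length that is practical to embed.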
Tools
The server provides four essential web crawling and search tools:
crawl_single_page: Quickly crawl a single web page and store its content in the vector database
smart_crawl_url: Intelligently crawl a full website based on the type of URL provided (sitemap, llms-full.txt, or a regular webpage that needs to be crawled recursively)
get_available_sources: Get a list of all available sources (domains) in the database
perform_rag_query: Search for relevant content using semantic search with optional source filtering
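The URL-type dispatch that smart_crawl_url performs might look roughly like the sketch below. The function name `detect_url_type`, the returned labels, and the matching heuristics are assumptions for illustration; the server's real detection rules may differ.

```python
from urllib.parse import urlparse


def detect_url_type(url: str) -> str:
    """Classify a URL as 'sitemap', 'text_file', or 'webpage' (heuristic sketch)."""
    path = urlparse(url).path.lower()
    if path.endswith(".xml") and "sitemap" in path:
        return "sitemap"      # crawl every URL listed in the sitemap
    if path.endswith(".txt"):
        return "text_file"    # e.g. llms-full.txt: fetch and store directly
    return "webpage"          # regular page: crawl recursively via internal links


kinds = [
    detect_url_type("https://example.com/sitemap.xml"),
    detect_url_type("https://example.com/llms-full.txt"),
    detect_url_type("https://example.com/docs/"),
]
```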
Prerequisites
Docker/Docker Desktop if running the MCP server as a container (recommended)
Python 3.12+ if running the MCP server directly through uv
Supabase (database for RAG)
OpenAI API key (for generating embeddings)
Installation
Using Docker (Recommended)
Clone this repository:
git clone https://github.com/coleam00/mcp-crawl4ai-rag.git
cd mcp-crawl4ai-rag
Build the Docker image:
docker build -t mcp/crawl4ai-rag --build-arg PORT=8051 .
Create a .env file based on the configuration section below
Using uv directly (no Docker)
Clone this repository:
git clone https://github.com/coleam00/mcp-crawl4ai-rag.git
cd mcp-crawl4ai-rag
Install uv if you don't have it:
pip install uv
Create and activate a virtual environment:
uv venv
.venv\Scripts\activate
# on Mac/Linux: source .venv/bin/activate
Install dependencies:
uv pip install -e .
crawl4ai-setup
Create a .env file based on the configuration section below
Running Supabase Locally with Docker (optional)
To run Supabase locally using Docker, follow these steps:
Get the Supabase code:
git clone --depth 1 https://github.com/supabase/supabase
Create your new Supabase project directory:
mkdir supabase-project
Copy the compose files to your project:
cp -rf supabase/docker/* supabase-project
Copy the fake environment variables:
cp supabase/docker/.env.example supabase-project/.env
Switch to your project directory:
cd supabase-project
Pull the latest images:
docker compose pull
Start the services (in detached mode):
docker compose up -d
After starting Supabase locally, configure the .env file in this project so that SUPABASE_URL and SUPABASE_SERVICE_KEY point to your local Supabase instance.
Database Setup
Before running the server, you need to set up the database with the pgvector extension:
Go to the SQL Editor in your Supabase dashboard (create a new project first if necessary)
Create a new query and paste the contents of crawled_pages.sql
Run the query to create the necessary tables and functions
Configuration
Create a .env file in the project root with the following variables:
# MCP Server Configuration
HOST=0.0.0.0
PORT=8051
TRANSPORT=sse
# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key
# Supabase Configuration
SUPABASE_URL=your_supabase_project_url
SUPABASE_SERVICE_KEY=your_supabase_service_key
# Local Supabase configuration (use these instead of the values above when running Supabase locally)
SUPABASE_URL=your_local_supabase_url
SUPABASE_SERVICE_KEY=your_local_supabase_service_key
Running the Server
Using Docker
docker run --env-file .env -p 8051:8051 mcp/crawl4ai-rag
Using Python
uv run src/crawl4ai_mcp.py
The server will start and listen on the configured host and port.
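To see how the HOST and PORT values relate to the address a client connects to, here is a minimal sketch that parses .env-style text and derives the SSE endpoint. The helpers `parse_env` and `sse_endpoint` are hypothetical, and the fallback values simply mirror the sample .env above; this is not how the server itself loads its configuration.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def sse_endpoint(env: dict[str, str]) -> str:
    """Build the URL a local client would use for the SSE transport."""
    host = env.get("HOST", "0.0.0.0")   # assumed fallback, taken from the sample .env
    port = env.get("PORT", "8051")      # assumed fallback, taken from the sample .env
    # 0.0.0.0 means "listen on all interfaces"; a local client connects via localhost.
    client_host = "localhost" if host == "0.0.0.0" else host
    return f"http://{client_host}:{port}/sse"


sample = """
# MCP Server Configuration
HOST=0.0.0.0
PORT=8051
TRANSPORT=sse
"""
env = parse_env(sample)
endpoint = sse_endpoint(env)
```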
Integration with MCP Clients
SSE Configuration
Once you have the server running with SSE transport, you can connect to it using this configuration:
{
  "mcpServers": {
    "crawl4ai-rag": {
      "transport": "sse",
      "url": "http://localhost:8051/sse"
    }
  }
}
Note for Windsurf users: Use serverUrl instead of url in your configuration:
{
  "mcpServers": {
    "crawl4ai-rag": {
      "transport": "sse",
      "serverUrl": "http://localhost:8051/sse"
    }
  }
}
Note for Docker users: Use host.docker.internal instead of localhost if your client is running in a different container. This applies if you are using this MCP server within n8n!
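The three variants above differ only in the key name and the host, so a small helper can generate them. This is a convenience sketch; `build_sse_config` is a hypothetical function, not part of the server or of any MCP client.

```python
import json


def build_sse_config(host: str = "localhost", port: int = 8051, url_key: str = "url") -> dict:
    """Return an MCP client configuration for the SSE transport."""
    return {
        "mcpServers": {
            "crawl4ai-rag": {
                "transport": "sse",
                url_key: f"http://{host}:{port}/sse",
            }
        }
    }


default_cfg = build_sse_config()
windsurf_cfg = build_sse_config(url_key="serverUrl")        # Windsurf expects serverUrl
docker_cfg = build_sse_config(host="host.docker.internal")  # client in another container

# Print one variant as JSON, ready to paste into a client configuration file.
print(json.dumps(windsurf_cfg, indent=2))
```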
Stdio Configuration
Add this server to your MCP configuration for Claude Desktop, Windsurf, or any other MCP client:
{
  "mcpServers": {
    "crawl4ai-rag": {
      "command": "python",
      "args": ["path/to/crawl4ai-mcp/src/crawl4ai_mcp.py"],
      "env": {
        "TRANSPORT": "stdio",
        "OPENAI_API_KEY": "your_openai_api_key",
        "SUPABASE_URL": "your_supabase_url",
        "SUPABASE_SERVICE_KEY": "your_supabase_service_key"
      }
    }
  }
}
Docker with Stdio Configuration
{
  "mcpServers": {
    "crawl4ai-rag": {
      "command": "docker",
      "args": ["run", "--rm", "-i",
               "-e", "TRANSPORT",
               "-e", "OPENAI_API_KEY",
               "-e", "SUPABASE_URL",
               "-e", "SUPABASE_SERVICE_KEY",
               "mcp/crawl4ai-rag"],
      "env": {
        "TRANSPORT": "stdio",
        "OPENAI_API_KEY": "your_openai_api_key",
        "SUPABASE_URL": "your_supabase_url",
        "SUPABASE_SERVICE_KEY": "your_supabase_service_key"
      }
    }
  }
}
Building Your Own Server
This implementation provides a foundation for building more complex MCP servers with web crawling capabilities. To build your own:
Add your own tools by creating methods with the @mcp.tool() decorator
Create your own lifespan function to add your own dependencies
Modify the utils.py file for any helper functions you need
Extend the crawling capabilities by adding more specialized crawlers
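As a starting point for the first step, a custom tool is essentially an async function with typed parameters and a docstring. The sketch below shows a hypothetical `count_words` tool; to keep the example free of dependencies, the `@mcp.tool()` decorator appears only in a comment, and the assumption is that you would apply it using the server's existing `mcp` instance.

```python
import asyncio
import json


# In the server, this function would be registered by placing the
# @mcp.tool() decorator directly above it (omitted here so the sketch
# runs without the MCP SDK installed).
async def count_words(text: str) -> str:
    """Return word and character counts for a piece of text as a JSON string."""
    words = text.split()
    return json.dumps({"words": len(words), "characters": len(text)})


# Call the function directly to check its behavior before wiring it into the server.
result = asyncio.run(count_words("crawl then retrieve"))
```

Returning a JSON string keeps the output easy for an agent to parse; whether your tools should return strings or structured objects depends on the MCP SDK version you use.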