# Crawl4AI RAG API - Bidirectional MCP Communication
This implementation adds a REST API layer that enables bidirectional communication between local MCP servers and remote deployments. The system can operate in two modes:
- **Server Mode**: Hosts REST API endpoints for remote clients
- **Client Mode**: Forwards MCP tool calls to a remote REST API server
## Quick Start
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Configure Environment
Copy and edit the `.env` file:
```bash
# For Server Mode (hosting the API)
IS_SERVER=true
LOCAL_API_KEY=your-secure-api-key-here
# For Client Mode (forwarding to remote)
IS_SERVER=false
REMOTE_API_URL=https://your-server.com:8080
REMOTE_API_KEY=your-remote-api-key-here
```
### 3. Run in Server Mode
```bash
# Option 1: Using the startup script
python3 start_api_server.py
# Option 2: Using uvicorn directly (create_app is an app factory, so pass --factory)
uvicorn api.api:create_app --factory --host 0.0.0.0 --port 8080
```
### 4. Run in Client Mode
```bash
# The existing MCP server automatically detects client mode
python3 core/rag_processor.py
```
## Configuration Variables
### Core Settings
- `IS_SERVER`: `true` for server mode, `false` for client mode
- `SERVER_HOST`: Host to bind API server (default: `0.0.0.0`)
- `SERVER_PORT`: Port for API server (default: `8080`)
### Authentication
- `LOCAL_API_KEY`: API key for server mode authentication
- `REMOTE_API_KEY`: API key for client mode remote requests
- `REMOTE_API_URL`: Remote server URL for client mode
### System Configuration
- `DB_PATH`: SQLite database path (default: `crawl4ai_rag.db`)
- `CRAWL4AI_URL`: Crawl4AI service URL (default: `http://localhost:11235`)
- `RATE_LIMIT_PER_MINUTE`: API rate limit (default: `60`)
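Putting these together, a server-mode `.env` with every setting spelled out at its default:
```bash
IS_SERVER=true
SERVER_HOST=0.0.0.0
SERVER_PORT=8080
LOCAL_API_KEY=your-secure-api-key-here
DB_PATH=crawl4ai_rag.db
CRAWL4AI_URL=http://localhost:11235
RATE_LIMIT_PER_MINUTE=60
```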
## REST API Endpoints
### Crawling Endpoints
#### POST /api/v1/crawl
Crawl a URL without storing the content.
```bash
curl -X POST "http://localhost:8080/api/v1/crawl" \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
```
#### POST /api/v1/crawl/store
Crawl and store content permanently.
```bash
curl -X POST "http://localhost:8080/api/v1/crawl/store" \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"tags": "example,documentation",
"retention_policy": "permanent"
}'
```
#### POST /api/v1/crawl/temp
Crawl and store content temporarily (session only).
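A request sketch, assuming the endpoint accepts the same `url` and optional `tags` fields as the other crawl endpoints:
```bash
curl -X POST "http://localhost:8080/api/v1/crawl/temp" \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "tags": "scratch"}'
```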
#### POST /api/v1/crawl/deep
Deep crawl multiple pages without storing.
```bash
curl -X POST "http://localhost:8080/api/v1/crawl/deep" \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{
"url": "https://docs.example.com",
"max_depth": 3,
"max_pages": 50,
"include_external": false
}'
```
#### POST /api/v1/crawl/deep/store
Deep crawl and store all discovered pages.
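A request sketch, assuming the endpoint takes the same parameters as `/api/v1/crawl/deep` plus the optional `tags` field from `/api/v1/crawl/store`:
```bash
curl -X POST "http://localhost:8080/api/v1/crawl/deep/store" \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{
"url": "https://docs.example.com",
"max_depth": 2,
"max_pages": 20,
"include_external": false,
"tags": "documentation"
}'
```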
### Knowledge Base Endpoints
#### POST /api/v1/search
Search stored knowledge using semantic similarity.
```bash
curl -X POST "http://localhost:8080/api/v1/search" \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{
"query": "machine learning algorithms",
"limit": 10
}'
```
#### GET /api/v1/memory
List stored content, optionally filtered by retention policy.
```bash
curl -X GET "http://localhost:8080/api/v1/memory?filter=permanent" \
-H "Authorization: Bearer your-api-key"
```
#### DELETE /api/v1/memory
Remove specific content by URL.
```bash
curl -X DELETE "http://localhost:8080/api/v1/memory" \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
```
#### DELETE /api/v1/memory/temp
Clear all temporary content.
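Assuming no request body is required:
```bash
curl -X DELETE "http://localhost:8080/api/v1/memory/temp" \
-H "Authorization: Bearer your-api-key"
```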
### System Endpoints
#### GET /api/v1/status
Get system status and health information.
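For example:
```bash
curl -X GET "http://localhost:8080/api/v1/status" \
-H "Authorization: Bearer your-api-key"
```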
#### GET /health
Basic health check endpoint.
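Health checks are typically left unauthenticated; include the `Authorization` header if your deployment enforces it here too:
```bash
curl http://localhost:8080/health
```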
## Architecture
### Server Mode Flow
1. Client sends HTTP request with API key
2. Authentication middleware validates API key
3. Request routed to appropriate handler
4. Handler calls existing crawler/storage functions
5. Response returned as JSON
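To see steps 1-2 in isolation, send a request with a bad key; assuming the middleware rejects it with HTTP 401, it never reaches a handler:
```bash
# Invalid key: expect a 401 from the authentication middleware
curl -i -X POST "http://localhost:8080/api/v1/crawl" \
-H "Authorization: Bearer wrong-key" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
```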
### Client Mode Flow
1. LLM sends MCP tool call to local processor
2. Processor detects client mode from environment
3. MCP request transformed to REST API format
4. HTTP request sent to remote server with API key
5. API response transformed back to MCP format
6. Response returned to LLM
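As a sketch of steps 3-4: an MCP tool call like the one below (the tool name `crawl_url` is illustrative; the real names are defined by `core/rag_processor.py`) is forwarded as the equivalent REST request:
```bash
# Hypothetical incoming MCP tool call:
# {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
#  "params": {"name": "crawl_url", "arguments": {"url": "https://example.com"}}}
# In client mode the processor issues the equivalent of:
curl -X POST "https://your-server.com:8080/api/v1/crawl" \
-H "Authorization: Bearer your-remote-api-key-here" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
```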
## Security Features
- **API Key Authentication**: Bearer token authentication for all endpoints
- **Rate Limiting**: Configurable requests per minute per API key
- **Input Validation**: Pydantic models for request validation
- **URL Validation**: Prevents access to internal/private networks
- **HTTPS Support**: Configurable CORS and security headers
- **Session Management**: Automatic cleanup of expired sessions
## Deployment
### Local Homelab Setup
This system is designed for local homelab deployment and can be run entirely on your personal computer or home server without requiring cloud infrastructure.
### Docker Compose (Local Setup)
```yaml
version: '3.8'
services:
  crawl4ai-rag-api:
    build: .
    ports:
      - "8080:8080"
    environment:
      - IS_SERVER=true
      - LOCAL_API_KEY=your-secure-key
      - DB_PATH=/app/data/crawl4ai_rag.db
      - CRAWL4AI_URL=http://crawl4ai:11235
    volumes:
      - ./data:/app/data
    depends_on:
      - crawl4ai
  crawl4ai:
    # Image name assumed; pin the Crawl4AI image/version you actually deploy
    image: unclecode/crawl4ai:latest
    ports:
      - "11235:11235"
```
### Homelab Considerations
1. **Local Network Access**: The API server runs on your local network
2. **No Cloud Costs**: Everything runs locally with no recurring fees
3. **Security**: Configure firewall rules to restrict access to trusted devices
4. **Backup Strategy**: Take regular backups of the SQLite database directory (see the sketch after this list)
5. **Resource Planning**: Ensure adequate RAM (minimum 4GB) and disk space (10GB+)
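A minimal backup sketch for item 4, assuming the database lives in `./data` as in the compose file above (paths are illustrative):
```bash
# Snapshot the database using SQLite's online-backup command
mkdir -p ./backups
sqlite3 ./data/crawl4ai_rag.db ".backup ./backups/crawl4ai_rag-$(date +%F).db"
```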
### Development Setup
For development purposes, you can run the API server directly on your machine:
```bash
# Install dependencies
pip install -r requirements.txt
# Run the server
python3 start_api_server.py
```
### Security for Homelab
1. **API Key Protection**: Store API keys securely in environment variables
2. **Network Security**: Restrict access to trusted IP addresses (see the firewall sketch after this list)
3. **Regular Updates**: Keep dependencies updated for security patches
4. **Database Access**: Ensure database files have appropriate permissions
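A minimal firewall sketch for item 2, assuming `ufw` and an illustrative trusted subnet:
```bash
# Permit only the trusted LAN subnet to reach the API port, deny everyone else
sudo ufw allow from 192.168.1.0/24 to any port 8080 proto tcp
sudo ufw deny 8080/tcp
```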
## Development
### Testing
```bash
# Install development dependencies
pip install -r requirements.txt
# Run tests
pytest tests/
# Code formatting
black api/ core/ operations/ data/
# Linting
flake8 api/ core/ operations/ data/
```
### API Documentation
When the server is running, visit:
- Swagger UI: `http://localhost:8080/docs`
- ReDoc: `http://localhost:8080/redoc`
## Troubleshooting
### Common Issues
**API Key Authentication Fails**
- Verify `LOCAL_API_KEY` is set in `.env`
- Ensure `Authorization: Bearer <key>` header is included
**Client Mode Connection Fails**
- Check `REMOTE_API_URL` and `REMOTE_API_KEY` in `.env`
- Verify remote server is accessible
- Test with curl directly against the remote API, as shown below
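For example, using the values from your `.env`:
```bash
# Basic reachability and auth check against the remote server
curl -i "https://your-server.com:8080/health"
curl -i -X POST "https://your-server.com:8080/api/v1/crawl" \
-H "Authorization: Bearer your-remote-api-key-here" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
```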
**Database Errors**
- Ensure SQLite database is writable
- Check disk space and permissions
- Verify sqlite-vec extension is properly installed
**Rate Limiting**
- Check `RATE_LIMIT_PER_MINUTE` setting
- Wait before retrying requests
- Consider increasing rate limit for your use case
### Logging
API requests are logged to:
- Console output (development)
- `crawl4ai_api.log` (if configured)
- `crawl4ai_rag_errors.log` (errors only)
## Migration from MCP-only
Existing MCP clients continue to work without changes:
1. **No Changes Required**: Existing `core/rag_processor.py` usage unchanged
2. **Automatic Detection**: Client mode detected from `.env` configuration
3. **Backward Compatible**: The MCP protocol presented to clients is unchanged
4. **Gradual Migration**: Can enable API access while maintaining MCP functionality