Crawl4AI RAG MCP Server

index.md•6.66 KiB

# API Reference This documentation covers the REST API layer that enables bidirectional communication between local MCP servers and remote deployments. ## Overview The Crawl4AI RAG MCP Server provides a comprehensive REST API with 18 endpoints for web crawling, knowledge management, and system administration. The system can operate in two modes: - **Server Mode**: Hosts REST API endpoints for remote clients - **Client Mode**: Forwards MCP tool calls to a remote REST API server All endpoints require authentication using an API key via the Authorization header with Bearer token format. ## Key Features ### High-Performance RAM Database - **10-50x faster** than traditional disk-based operations - Intelligent differential synchronization to disk - Automatic idle and periodic sync - Check RAM database status with the `using_ram_db` flag in `/api/v1/stats` - Monitor sync health via `/api/v1/db/stats` - See [RAM Database Mode](../advanced/ram-database.md) for details ### Advanced Security - **SQL injection protection** with pattern detection and keyword filtering - **Domain blocking** system with wildcard pattern support (e.g., `*.ru`, `*spam*`) - **Input sanitization** for all user-provided data - **Adult content filtering** to prevent inappropriate URL crawling - See [Security Documentation](../advanced/security.md) for details ### Batch Operations - Deep crawling with DFS strategy (up to 250 pages) - Intelligent search with tag discovery - Bulk memory management - See [Batch Operations](../advanced/batch-operations.md) for details ## API Endpoints Summary For detailed endpoint documentation with request/response examples, see [API Endpoints](endpoints.md). ### System Endpoints (3) - `GET /health` - Basic health check endpoint (no auth required) - `GET /api/v1/status` - Get comprehensive system status and health - `GET /api/v1/help` - Get help documentation for all available tools ### Crawling Endpoints (4) - `POST /api/v1/crawl` - Crawl a URL without storing content - `POST /api/v1/crawl/store` - Crawl and store content permanently - `POST /api/v1/crawl/temp` - Crawl and store content temporarily (session only) - `POST /api/v1/crawl/deep/store` - Deep crawl and store all discovered pages (DFS) ### Search Endpoints (2) - `POST /api/v1/search` - Search stored knowledge using semantic similarity - `POST /api/v1/search/target` - Intelligent search with automatic tag discovery and expansion ### Knowledge Base Management (3) - `GET /api/v1/memory` - List all stored content with filtering options - `DELETE /api/v1/memory` - Remove specific content by URL - `DELETE /api/v1/memory/temp` - Clear all temporary session content ### Database Statistics (3) - `GET /api/v1/stats` - Get comprehensive database statistics and storage breakdown - `GET /api/v1/db/stats` - Get RAM database sync metrics and health status - `GET /api/v1/domains` - List all unique domains with page counts ### Domain Blocking (3) - `GET /api/v1/blocked-domains` - List all blocked domain patterns - `POST /api/v1/blocked-domains` - Add a domain pattern to the blocklist - `DELETE /api/v1/blocked-domains` - Remove a domain pattern (requires authorization keyword) ## Authentication All endpoints require authentication using an API key via the Authorization header: ```http Authorization: Bearer your-api-key ``` The API key is configured through environment variables: - `LOCAL_API_KEY`: For server mode authentication - `REMOTE_API_KEY`: For client mode remote requests ## Rate Limiting The system implements rate limiting to prevent abuse: - **Default limit**: 60 requests per minute per API key - **Configurable**: Set `RATE_LIMIT_PER_MINUTE` environment variable - **Response headers**: `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset` ## Error Handling All error responses follow a consistent format: ```json { "success": false, "error": "Error message description", "timestamp": "2024-01-15T14:30:00Z" } ``` ### Common Error Scenarios - **400 Bad Request**: Invalid input or validation error (e.g., SQL injection pattern detected, URL blocked) - **401 Unauthorized**: Missing or invalid API key - **404 Not Found**: Resource not found (e.g., URL not in memory, blocked domain pattern not found) - **429 Too Many Requests**: Rate limit exceeded - **500 Internal Server Error**: Server-side error ## Security Features ### Input Validation All user inputs are sanitized before processing to prevent: - SQL injection attacks via keyword detection and pattern matching - XSS attacks through HTML/JavaScript filtering - Buffer overflow via strict length limits - Invalid data types through type validation ### Domain Blocking The system maintains a database of blocked domain patterns: - **Wildcard TLD blocking**: `*.ru` blocks all .ru domains - **Keyword blocking**: `*spam*` blocks any URL containing "spam" - **Exact domain blocking**: `example.com` blocks that specific domain Default blocks include common adult content patterns and high-risk TLDs. Manage blocked domains via the `/api/v1/blocked-domains` endpoints. ### Authorization Sensitive operations require additional authorization: - **Domain unblocking**: Requires `BLOCKED_DOMAIN_KEYWORD` environment variable - **API access**: All endpoints require valid Bearer token (except `/health`) ## Performance Considerations ### RAM Database Mode When `USE_MEMORY_DB=true`: - All database operations run in memory for maximum speed - Automatic differential sync to disk every 5 seconds after writes - Periodic sync every 5 minutes as backup - Check sync status via `/api/v1/db/stats` Performance benefits: - **Read operations**: 10-100x faster than disk - **Write operations**: 5-10x faster than disk - **Vector searches**: Near-instant results ### Monitoring Monitor system health and performance: - `/api/v1/status` - Component health status - `/api/v1/stats` - Database statistics, storage breakdown, retention policies - `/api/v1/db/stats` - RAM database sync metrics and health ## Advanced Features For detailed documentation on advanced features: - **[RAM Database Mode](../advanced/ram-database.md)** - High-performance in-memory database with automatic persistence - **[Security Layer](../advanced/security.md)** - Comprehensive input sanitization and domain blocking - **[Batch Operations](../advanced/batch-operations.md)** - Deep crawling, bulk operations, and advanced workflows ## Getting Started 1. **Quick Start**: See [Quick Start Guide](../guides/quick-start.md) for setup instructions 2. **Deployment**: See [Deployment Guide](../guides/deployment.md) for production deployment 3. **API Reference**: See [API Endpoints](endpoints.md) for detailed endpoint documentation 4. **Troubleshooting**: See [Troubleshooting Guide](../guides/troubleshooting.md) for common issues

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Rob-P-Smith/mcpragcrawl4ai'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

index.md•6.66 KiB