Crawl4AI MCP Server

CONFIG.md•14.4 KiB

# CONFIG.md ## Environment Variables ### MCP Server Core Configuration Core MCP (Model Context Protocol) server settings for transport and logging. ```bash # MCP Transport Protocol MCP_TRANSPORT=stdio # stdio (default) | http | websocket MCP_HOST=0.0.0.0 # HTTP transport host (if http enabled) MCP_PORT=8000 # HTTP transport port (if http enabled) # Python Environment FASTMCP_LOG_LEVEL=INFO # DEBUG | INFO | WARNING | ERROR PYTHONPATH=/app # Container application path PYTHONUNBUFFERED=1 # Force unbuffered output for containers ``` ### AI Model Configuration (Optional) LLM API keys for AI-powered extraction features. All features work without these keys. ```bash # OpenAI Configuration OPENAI_API_KEY=sk-proj-your-key-here # Get from: https://platform.openai.com/api-keys OPENAI_BASE_URL=https://api.openai.com/v1 # Custom endpoint (optional) # Anthropic Configuration ANTHROPIC_API_KEY=sk-ant-your-key-here # Get from: https://console.anthropic.com/ # Azure OpenAI Configuration AZURE_OPENAI_API_KEY=your-azure-key-here AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/ AZURE_OPENAI_API_VERSION=2024-02-15-preview # Google/Gemini Configuration GOOGLE_API_KEY=your-google-api-key-here ``` ### Performance & Resource Limits Configure resource usage and performance parameters. ```bash # File Processing Limits MAX_FILE_SIZE_MB=100 # Maximum file size for processing MAX_CONCURRENT_REQUESTS=5 # Concurrent request limit (3 for CPU-only) CACHE_TTL=900 # Cache time-to-live in seconds (15 minutes) # Browser Automation PLAYWRIGHT_TIMEOUT=60 # Browser operation timeout in seconds PLAYWRIGHT_HEADLESS=true # Run browser in headless mode # Memory Management CRAWL4AI_CACHE_SIZE=1000 # Number of cached pages CRAWL4AI_CACHE_TTL=3600 # Cache expiration in seconds ``` ### CPU-Only Optimization Settings Environment variables for disabling GPU/ML features in CPU-optimized containers. ```bash # Disable Heavy ML Features (CPU-only containers) CRAWL4AI_DISABLE_SENTENCE_TRANSFORMERS=true # Disable sentence transformers CRAWL4AI_DISABLE_TORCH=true # Disable PyTorch features CUDA_VISIBLE_DEVICES="" # Block CUDA device access # Alternative: Force CPU-only mode TORCH_DEVICE=cpu # Force CPU for PyTorch operations ``` ### Feature Toggles Enable or disable specific MCP tools and features. ```bash # Tool Availability ENABLE_FILE_PROCESSING=true # PDF, Office, ZIP file processing ENABLE_YOUTUBE_TRANSCRIPTS=true # YouTube transcript extraction ENABLE_GOOGLE_SEARCH=true # Google search integration ENABLE_BATCH_PROCESSING=true # Batch operation support # Content Processing Features ENABLE_INTELLIGENT_EXTRACTION=true # AI-powered content extraction ENABLE_ENTITY_EXTRACTION=true # Named entity recognition ENABLE_STRUCTURED_EXTRACTION=true # JSON schema extraction ``` ### Security Configuration Security-related settings for safe operation. ```bash # Content Safety SAFE_MODE=true # Enable content filtering BLOCK_PRIVATE_IPS=true # Block private IP addresses ALLOWED_DOMAINS="" # Comma-separated allowed domains (empty = all) BLOCKED_DOMAINS="" # Comma-separated blocked domains # Rate Limiting RATE_LIMIT_ENABLED=true # Enable rate limiting RATE_LIMIT_REQUESTS=100 # Requests per window RATE_LIMIT_WINDOW=3600 # Rate limit window in seconds (1 hour) # Network Security USER_AGENT="Crawl4AI-MCP/1.0" # Custom User-Agent string RESPECT_ROBOTS_TXT=true # Respect robots.txt files ``` ## Docker Environment Configuration ### Container-Specific Variables Variables specific to Docker deployment with different optimization levels. #### Standard Container (`Dockerfile`) ```bash # Full-featured container with potential CUDA dependencies MCP_TRANSPORT=stdio FASTMCP_LOG_LEVEL=INFO MAX_CONCURRENT_REQUESTS=5 MEMORY_LIMIT=2G CPU_LIMIT=1.0 ``` #### CPU-Optimized Container (`Dockerfile.cpu`) ```bash # Optimized for VPS deployment without GPU CRAWL4AI_DISABLE_SENTENCE_TRANSFORMERS=true CRAWL4AI_DISABLE_TORCH=true MAX_CONCURRENT_REQUESTS=3 MEMORY_LIMIT=1G CPU_LIMIT=0.5 ``` #### Lightweight Container (`Dockerfile.lightweight`) ```bash # Minimal footprint container CRAWL4AI_DISABLE_SENTENCE_TRANSFORMERS=true CRAWL4AI_DISABLE_TORCH=true CUDA_VISIBLE_DEVICES="" TORCH_DEVICE=cpu MAX_CONCURRENT_REQUESTS=2 MEMORY_LIMIT=512M CPU_LIMIT=0.25 ``` #### RunPod Serverless (`Dockerfile.runpod`) ```bash # RunPod serverless optimization CRAWL4AI_DISABLE_SENTENCE_TRANSFORMERS=true PLATFORM=linux/amd64 AUTO_SCALING=true MEMORY_LIMIT=1G ``` ### Docker Compose Configuration Environment precedence and inheritance in Docker Compose deployments. #### Standard Deployment (`docker-compose.yml`) ```yaml environment: - FASTMCP_LOG_LEVEL=INFO - PYTHONPATH=/app - PYTHONUNBUFFERED=1 - MAX_FILE_SIZE_MB=100 - MAX_CONCURRENT_REQUESTS=5 - CACHE_TTL=900 ``` #### CPU-Optimized Deployment (`docker-compose.cpu.yml`) ```yaml environment: - FASTMCP_LOG_LEVEL=INFO - PYTHONPATH=/app - PYTHONUNBUFFERED=1 - CRAWL4AI_DISABLE_SENTENCE_TRANSFORMERS=true - CRAWL4AI_DISABLE_TORCH=true - MAX_CONCURRENT_REQUESTS=3 - CACHE_TTL=900 ``` ### Environment File Management Configuration file hierarchy and precedence rules. #### `.env` File Priority 1. **Container environment variables** (highest priority) 2. **Docker Compose environment section** 3. **`.env` file in project root** 4. **Default values in code** (lowest priority) #### Environment File Example ```bash # Copy .env.example to .env and customize cp .env.example .env # Key variables to customize: FASTMCP_LOG_LEVEL=DEBUG # For development MCP_TRANSPORT=http # For HTTP API access OPENAI_API_KEY=sk-proj-your-key # For AI features MAX_CONCURRENT_REQUESTS=10 # For high-traffic deployment ``` ## Claude Desktop Integration ### MCP Client Configuration Configuration for Claude Desktop and other MCP clients. #### STDIO Transport (Default) ```json { "mcpServers": { "crawl4ai": { "command": "docker", "args": [ "exec", "-i", "crawl4ai-mcp-server", "python", "-m", "crawl4ai_mcp.server" ], "env": { "FASTMCP_LOG_LEVEL": "INFO" } } } } ``` #### HTTP Transport ```json { "mcpServers": { "crawl4ai-http": { "command": "node", "args": ["-e", "require('http').get('http://crawl4ai-mcp-server:8000/mcp')"], "env": { "MCP_TRANSPORT": "http" } } } } ``` ### Transport Protocol Configuration Configure different transport methods for various deployment scenarios. | Transport | Use Case | Configuration | Performance | |-----------|----------|---------------|-------------| | **STDIO** | Claude Desktop, local clients | Default, no network config | Fastest | | **HTTP** | Web integration, REST API | `MCP_TRANSPORT=http` + port | Good | | **WebSocket** | Real-time applications | `MCP_TRANSPORT=websocket` | Good | ## GitHub Actions CI/CD Configuration ### Build Environment Variables Variables used in automated builds and deployments. #### Automated Docker Builds (`.github/workflows/build-runpod-docker.yml`) ```yaml env: REGISTRY: docker.io IMAGE_NAME: gemneye/crawl4ai-runpod-serverless PLATFORMS: linux/amd64 secrets: DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }} DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }} ``` #### Build Matrix Configuration ```yaml strategy: matrix: container_variant: [standard, cpu, lightweight, runpod] include: - container_variant: cpu dockerfile: Dockerfile.cpu memory_limit: 1G - container_variant: lightweight dockerfile: Dockerfile.lightweight memory_limit: 512M ``` ### Deployment Configuration Environment-specific deployment configurations. #### Development Environment ```bash FASTMCP_LOG_LEVEL=DEBUG MCP_TRANSPORT=http SAFE_MODE=false ENABLE_FILE_PROCESSING=true CACHE_TTL=300 # 5 minutes for faster testing ``` #### Staging Environment ```bash FASTMCP_LOG_LEVEL=INFO MCP_TRANSPORT=stdio SAFE_MODE=true ENABLE_FILE_PROCESSING=true CACHE_TTL=900 # 15 minutes ``` #### Production Environment ```bash FASTMCP_LOG_LEVEL=WARNING MCP_TRANSPORT=stdio SAFE_MODE=true ENABLE_FILE_PROCESSING=true CACHE_TTL=3600 # 1 hour RATE_LIMIT_ENABLED=true ``` ## Performance Tuning ### Resource Optimization Matrix Configure resources based on deployment scenario and expected load. | Scenario | Memory | CPU | Concurrent | Cache TTL | Use Case | |----------|--------|-----|------------|-----------|----------| | **Development** | 512MB | 0.25 | 2 | 300s | Local testing | | **VPS (CPU-only)** | 1GB | 0.5 | 3 | 900s | Production deployment | | **Standard** | 2GB | 1.0 | 5 | 900s | Full features | | **High Traffic** | 4GB | 2.0 | 10 | 3600s | Heavy usage | | **RunPod** | 1GB | Auto | 5 | 900s | Serverless scaling | ### Browser Performance Configuration Playwright browser automation performance settings. ```bash # Browser Resource Limits PLAYWRIGHT_TIMEOUT=60 # Total operation timeout PLAYWRIGHT_NAVIGATION_TIMEOUT=30 # Page navigation timeout PLAYWRIGHT_WAIT_TIMEOUT=10 # Element wait timeout # Browser Pool Management BROWSER_POOL_SIZE=3 # Number of browser instances BROWSER_REUSE_PAGES=true # Reuse browser pages BROWSER_HEADLESS=true # Headless mode for performance # Memory Management BROWSER_MAX_PAGES=10 # Max pages per browser instance BROWSER_PAGE_TIMEOUT=300 # Page idle timeout in seconds ``` ### Cache Configuration Multi-layer caching strategy for optimal performance. ```bash # Memory Cache (fastest) MEMORY_CACHE_SIZE=100 # Number of items in memory cache MEMORY_CACHE_TTL=300 # Memory cache TTL in seconds # Disk Cache (persistent) DISK_CACHE_SIZE=1000 # Number of items in disk cache DISK_CACHE_TTL=3600 # Disk cache TTL in seconds DISK_CACHE_PATH=/app/cache # Cache directory path # Redis Cache (distributed) REDIS_URL=redis://localhost:6379 # Redis connection URL REDIS_CACHE_TTL=1800 # Redis cache TTL in seconds REDIS_KEY_PREFIX=crawl4ai: # Key prefix for namespacing ``` ## Configuration Validation ### Startup Validation Automated validation of configuration values during server startup. ```bash # Enable configuration validation VALIDATE_CONFIG=true # Validate config on startup STRICT_VALIDATION=false # Fail on warnings (false for warnings only) # Validation Rules MIN_MEMORY_MB=256 # Minimum required memory MIN_DISK_SPACE_MB=1000 # Minimum required disk space REQUIRED_VARS="FASTMCP_LOG_LEVEL" # Comma-separated required variables ``` ### Health Check Configuration Container health monitoring and status reporting. ```bash # Health Check Settings HEALTH_CHECK_ENABLED=true # Enable container health checks HEALTH_CHECK_INTERVAL=30 # Check interval in seconds HEALTH_CHECK_TIMEOUT=10 # Check timeout in seconds HEALTH_CHECK_RETRIES=3 # Number of retries before unhealthy # Health Check Endpoints (HTTP mode) HEALTH_ENDPOINT=/health # Health check endpoint path METRICS_ENDPOINT=/metrics # Metrics endpoint path STATUS_ENDPOINT=/status # Status endpoint path ``` ## Security Best Practices ### API Key Management Secure handling of API keys and sensitive configuration. ```bash # Development (local only) OPENAI_API_KEY=sk-proj-dev-key # Direct key for development # Production (recommended) OPENAI_API_KEY_FILE=/run/secrets/openai_key # Docker secret file ANTHROPIC_API_KEY_FILE=/run/secrets/anthropic_key # Environment-based (staging) export OPENAI_API_KEY=$(vault kv get -field=key secret/openai) ``` ### Network Security Configuration Secure network access and content filtering. ```bash # Network Access Control ALLOWED_PROTOCOLS=http,https # Allowed URL protocols BLOCKED_TLD=.onion,.bit # Blocked top-level domains MAX_REDIRECT_HOPS=5 # Maximum redirect following DNS_TIMEOUT=10 # DNS resolution timeout # Content Security Policy MAX_RESPONSE_SIZE_MB=50 # Maximum response size BLOCK_EXECUTABLE_CONTENT=true # Block executable downloads SCAN_CONTENT_FOR_MALWARE=false # Enable malware scanning (if available) ``` ## Troubleshooting Configuration ### Common Configuration Issues **Issue**: Container fails to start with "Permission denied" **Solution**: Check user permissions and file ownership ```bash # Fix ownership issues docker-compose exec crawl4ai-mcp-server chown -R 1000:1000 /app ``` **Issue**: Out of memory errors during processing **Solution**: Increase memory limits or reduce concurrent requests ```bash # Increase memory limit in docker-compose.yml deploy: resources: limits: memory: 4G # Increase from 2G ``` **Issue**: API keys not being recognized **Solution**: Verify environment variable names and values ```bash # Debug environment variables docker-compose exec crawl4ai-mcp-server printenv | grep API_KEY ``` **Issue**: Slow performance with large files **Solution**: Adjust timeout and concurrency settings ```bash MAX_FILE_SIZE_MB=50 # Reduce from 100MB MAX_CONCURRENT_REQUESTS=3 # Reduce from 5 PLAYWRIGHT_TIMEOUT=120 # Increase from 60s ``` ### Configuration Debugging Enable detailed logging and diagnostics for configuration issues. ```bash # Debug Configuration FASTMCP_LOG_LEVEL=DEBUG # Enable debug logging CONFIG_DEBUG=true # Log configuration loading VALIDATE_CONFIG=true # Enable startup validation TRACE_REQUESTS=true # Log all requests # Performance Debugging PROFILE_REQUESTS=true # Enable request profiling LOG_MEMORY_USAGE=true # Log memory usage LOG_PERFORMANCE_METRICS=true # Log performance metrics ``` ## Keywords  environment variables, configuration, Docker, MCP server, FastMCP, performance tuning, security, API keys, transport protocols, container optimization, CPU-only, cache configuration, health checks, troubleshooting, Claude Desktop, GitHub Actions, deployment, validation

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/sruckh/crawl-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

CONFIG.md•14.4 KiB