Crawl4AI MCP Server



IMPLEMENTATION_GUIDE.md•20.4 KiB

# Crawl4AI MCP Server Implementation Guide

This technical guide covers implementation considerations and best practices for developing the enhanced Crawl4AI MCP Server architecture. It is a companion to the `MIGRATION_PLAN.md` and `ENHANCED_ARCHITECTURE.md` documents.

## Technical Implementation Considerations

### 1. Adapter Implementation Details

The `Crawl4AIAdapter` class needs significant refactoring to work properly with the Crawl4AI Docker API.

#### Parameter Transformation

The Crawl4AI Docker API expects a different parameter structure from the current Firecrawl implementation:

```typescript
// Current Firecrawl format (flat structure):
const firecrawlRequest = {
  "url": "https://example.com",
  "formats": ["markdown", "html"],
  "onlyMainContent": true
};

// Required Crawl4AI format (nested structure):
const crawl4aiRequest = {
  "urls": ["https://example.com"],
  "browser_config": {
    "type": "BrowserConfig",
    "params": { "headless": true }
  },
  "crawler_config": {
    "type": "CrawlerRunConfig",
    "params": {
      "formats": ["markdown", "html"],
      "onlyMainContent": true
    }
  }
};
```

Implementation example:

```typescript
// Method of Crawl4AIAdapter: maps the flat Firecrawl-style params
// onto the nested Crawl4AI request body.
private transformScrapeParams(params: any): any {
  const { url, ...options } = params;

  return {
    urls: [url],
    browser_config: {
      type: "BrowserConfig",
      params: {
        headless: options.mobile ? false : true,
        // Transform other browser-specific options
      }
    },
    crawler_config: {
      type: "CrawlerRunConfig",
      params: {
        formats: options.formats || ["markdown"],
        onlyMainContent: options.onlyMainContent || false,
        // Transform other crawler-specific options
      }
    }
  };
}
```

### 2. Authentication Mechanisms

The Crawl4AI Docker server supports multiple authentication methods.

#### Bearer Token Authentication

```typescript
public setApiKey(apiKey: string): void {
  if (!apiKey) return;

  this.apiClient.defaults.headers.common['Authorization'] = `Bearer ${apiKey}`;
}
```

#### JWT Authentication

For JWT authentication, implement:

```typescript
public async authenticate(email: string): Promise<string> {
  try {
    const response = await this.executeRequest('POST', '/auth', { email });

    if (response.token) {
      // Reuse the token for all subsequent requests
      this.apiClient.defaults.headers.common['Authorization'] = `Bearer ${response.token}`;
      return response.token;
    }

    throw new Error('Authentication failed: No token received');
  } catch (error) {
    console.error('Authentication error:', error);
    throw error;
  }
}
```

### 3. Error Handling and Retries

Implement error handling that is specific to Crawl4AI's error patterns, and mark the transient failures as retryable:

```typescript
// Update error classification based on Crawl4AI error patterns
switch (status) {
  case 401:
  case 403:
    errorType = ErrorType.AUTHENTICATION;
    errorMessage = 'Authentication error: Invalid or expired API key';
    break;
  case 429:
    errorType = ErrorType.RATE_LIMIT;
    errorMessage = 'Rate limit exceeded: Too many requests';
    retryable = true;
    break;
  case 503:
  case 504:
    errorType = ErrorType.SERVER;
    errorMessage = 'Crawl4AI server temporarily unavailable';
    retryable = true;
    break;
  // Additional Crawl4AI-specific error codes
  case 422:
    errorType = ErrorType.VALIDATION;
    errorMessage = `Validation error: ${errorData.message || 'Invalid parameters'}`;
    break;
  // etc.
}
```

### 4. WebSocket Support

The Crawl4AI Docker server supports WebSocket connections for streaming results:

```typescript
public async streamCrawl(urls: string[], options: any = {}): Promise<WebSocket> {
  const params = this.transformCrawlParams({ urls, ...options, stream: true });

  // Create WebSocket connection (http -> ws, https -> wss)
  const wsUrl = `${this.baseUrl.replace(/^http/, 'ws')}/crawl/stream`;
  const ws = new WebSocket(wsUrl);

  // Send the crawl request as soon as the socket opens
  ws.onopen = () => {
    ws.send(JSON.stringify(params));
  };

  return ws;
}
```

## Infrastructure as Code Implementations

### AWS CloudFormation Template

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Crawl4AI Docker Server on AWS'

Parameters:
  InstanceType:
    Type: String
    Default: t3.medium
    AllowedValues:
      - t3.medium
      - t3.large
      - t3.xlarge

Resources:
  Crawl4AISecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for Crawl4AI Docker server
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 11235
          ToPort: 11235
          CidrIp: 0.0.0.0/0  # Should be restricted in production

  Crawl4AIInstance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: !Ref InstanceType
      ImageId: ami-0261755bbcb8c4a84  # Amazon Linux 2 (update as needed)
      SecurityGroupIds:
        - !Ref Crawl4AISecurityGroup
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          yum update -y
          amazon-linux-extras install docker -y
          service docker start
          systemctl enable docker
          docker run -d -p 11235:11235 --name crawl4ai --shm-size=1g unclecode/crawl4ai:latest

Outputs:
  ServerURL:
    Description: URL for the Crawl4AI server
    Value: !Sub http://${Crawl4AIInstance.PublicDnsName}:11235
```

### Terraform Module for Multi-Cloud

```hcl
variable "cloud_provider" {
  description = "Cloud provider to use (aws, gcp, azure)"
  type        = string
  default     = "aws"
}

# Terraform requires a literal string for a module's `source`, so the provider
# module is selected with `count` rather than by interpolating the variable.
module "crawl4ai_server_aws" {
  source = "./modules/aws"
  count  = var.cloud_provider == "aws" ? 1 : 0

  # Common variables
  instance_size  = var.instance_size
  docker_version = var.docker_version

  # Provider-specific variables
  region    = var.region
  vpc_id    = var.vpc_id
  subnet_id = var.subnet_id
}

# Declare matching module blocks for ./modules/gcp and ./modules/azure.
```

## User Management System

### User Model

```typescript
interface User {
  id: string;
  email: string;
  name: string;
  apiKey: string;
  createdAt: Date;
  lastLoginAt: Date;
  preferences: {
    defaultProvider: string;
    instanceSize: string;
    autoShutdownMinutes: number;
  };
  deployments: Deployment[];
}

interface Deployment {
  id: string;
  userId: string; // Owner; referenced by the monitoring and auto-shutdown code below
  provider: string;
  region: string;
  instanceType: string;
  status: 'creating' | 'running' | 'stopped' | 'error';
  createdAt: Date;
  lastActiveAt: Date;
  endpoint: string;
  cost: {
    hourlyRate: number;
    monthlyCost: number;
    currency: string;
  };
}
```

### User Registration Flow

```typescript
async function registerUser(email: string, name: string, password: string): Promise<User> {
  // Validate email format
  if (!isValidEmail(email)) {
    throw new Error('Invalid email format');
  }

  // Check if user already exists
  const existingUser = await UserStorage.findByEmail(email);
  if (existingUser) {
    throw new Error('User already exists');
  }

  // Generate API key
  const apiKey = generateSecureApiKey();

  // Create user record
  const user: User = {
    id: uuidv4(),
    email,
    name,
    apiKey,
    createdAt: new Date(),
    lastLoginAt: new Date(),
    preferences: {
      defaultProvider: 'aws',
      instanceSize: 'medium',
      autoShutdownMinutes: 60
    },
    deployments: []
  };

  // Store password securely (using bcrypt or similar)
  await UserStorage.storePassword(user.id, await hashPassword(password));

  // Store user record
  await UserStorage.save(user);

  return user;
}
```

## Dashboard Implementation

### Frontend Components

The user dashboard needs several key components:

1. **Authentication Module**:
   - Login/registration forms
   - Password reset functionality
   - Session management

2. **Provider Configuration**:
   - Cloud provider selection
   - Secure credential collection
   - Deployment options

3. **Management Interface**:
   - Instance status monitoring
   - Resource scaling controls
   - Usage statistics and billing information

### Backend API Endpoints

```typescript
// User management
app.post('/api/users/register', handleUserRegistration);
app.post('/api/users/login', handleUserLogin);
app.get('/api/users/me', requireAuth, handleGetCurrentUser);

// Cloud provider management
app.post('/api/providers/configure', requireAuth, handleProviderConfiguration);
app.get('/api/providers/status', requireAuth, handleGetProviderStatus);

// Deployment management
app.post('/api/deployments', requireAuth, handleCreateDeployment);
app.get('/api/deployments', requireAuth, handleListDeployments);
app.get('/api/deployments/:id', requireAuth, handleGetDeployment);
app.post('/api/deployments/:id/start', requireAuth, handleStartDeployment);
app.post('/api/deployments/:id/stop', requireAuth, handleStopDeployment);
app.delete('/api/deployments/:id', requireAuth, handleDeleteDeployment);

// Monitoring and utilization
app.get('/api/deployments/:id/metrics', requireAuth, handleGetDeploymentMetrics);
app.get('/api/deployments/:id/logs', requireAuth, handleGetDeploymentLogs);
app.get('/api/deployments/:id/billing', requireAuth, handleGetDeploymentBilling);
```

## Secure Credential Management

### Storing Cloud Provider Credentials

Never store cloud provider credentials in plain text. Encrypt them before they reach storage:

```typescript
async function storeProviderCredentials(
  userId: string,
  provider: string,
  credentials: any
): Promise<void> {
  // Encrypt credentials before storage
  const encryptedCredentials = await encryptData(
    JSON.stringify(credentials),
    process.env.ENCRYPTION_KEY!
  );

  // Store in secure storage (KV, database, etc.)
  await CredentialStorage.save(userId, provider, encryptedCredentials);

  // Optional: Store a key reference in the user record
  await UserStorage.updateProviderStatus(userId, provider, 'configured');
}
```

### Using Credentials for Deployments

```typescript
async function getProviderCredentials(userId: string, provider: string): Promise<any> {
  // Retrieve encrypted credentials
  const encryptedCredentials = await CredentialStorage.get(userId, provider);

  if (!encryptedCredentials) {
    throw new Error(`No ${provider} credentials found for user`);
  }

  // Decrypt credentials for use
  const credentials = JSON.parse(
    await decryptData(encryptedCredentials, process.env.ENCRYPTION_KEY!)
  );

  return credentials;
}
```

## Monitoring and Observability

### Health Checks

Implement regular health checks for Crawl4AI deployments:

```typescript
async function checkDeploymentHealth(deploymentId: string): Promise<HealthStatus> {
  try {
    const deployment = await DeploymentStorage.get(deploymentId);

    if (!deployment) {
      throw new Error('Deployment not found');
    }

    // Check if instance is responsive. fetch has no `timeout` option,
    // so abort the request after 5 seconds with an AbortSignal instead.
    const response = await fetch(`${deployment.endpoint}/health`, {
      signal: AbortSignal.timeout(5000)
    });

    if (!response.ok) {
      return {
        status: 'unhealthy',
        message: `Server returned status ${response.status}`,
        lastChecked: new Date()
      };
    }

    const healthData = await response.json();

    return {
      status: 'healthy',
      message: 'Service is running normally',
      metrics: {
        memoryUsage: healthData.memory_usage,
        cpuUsage: healthData.cpu_usage,
        uptime: healthData.uptime
      },
      lastChecked: new Date()
    };
  } catch (error) {
    // `error` is `unknown` under strict TypeScript settings
    return {
      status: 'error',
      message: error instanceof Error ? error.message : String(error),
      lastChecked: new Date()
    };
  }
}
```

### Alerting and Notifications

```typescript
async function monitorDeployments(): Promise<void> {
  // Get all active deployments
  const activeDeployments = await DeploymentStorage.findActive();

  for (const deployment of activeDeployments) {
    const health = await checkDeploymentHealth(deployment.id);

    // Update stored health status
    await DeploymentStorage.updateHealth(deployment.id, health);

    // Send alerts for unhealthy deployments
    if (health.status !== 'healthy') {
      await sendAlert(deployment.userId, {
        type: 'deployment_unhealthy',
        deploymentId: deployment.id,
        message: health.message,
        timestamp: new Date(),
        severity: health.status === 'error' ? 'high' : 'medium'
      });
    }
  }
}
```

## Cost Optimization Strategies

### Auto-Shutdown for Idle Instances

```typescript
async function checkAndShutdownIdleDeployments(): Promise<void> {
  const runningDeployments = await DeploymentStorage.findByStatus('running');

  for (const deployment of runningDeployments) {
    const user = await UserStorage.get(deployment.userId);
    const idleMinutes = calculateIdleTime(deployment.lastActiveAt);

    if (idleMinutes >= user.preferences.autoShutdownMinutes) {
      console.log(`Shutting down idle deployment ${deployment.id} after ${idleMinutes} minutes`);

      try {
        // Get provider-specific client
        const provider = getProviderClient(deployment.provider);

        // Stop the deployment
        await provider.stopDeployment(deployment.id);

        // Update deployment status
        await DeploymentStorage.updateStatus(deployment.id, 'stopped');

        // Notify user
        await sendNotification(deployment.userId, {
          type: 'deployment_auto_shutdown',
          deploymentId: deployment.id,
          message: `Your ${deployment.provider} deployment was automatically stopped after ${idleMinutes} minutes of inactivity`,
          timestamp: new Date()
        });
      } catch (error) {
        // Log and continue so one failure does not block the other deployments
        console.error(`Failed to shutdown deployment ${deployment.id}:`, error);
      }
    }
  }
}
```

### Right-Sizing Recommendations

```typescript
function generateSizingRecommendation(deployment: Deployment, usageMetrics: UsageMetrics): SizingRecommendation {
  // Analyze CPU utilization
  const avgCpuUtilization = calculateAverageCpuUtilization(usageMetrics);
  const peakCpuUtilization = calculatePeakCpuUtilization(usageMetrics);

  // Analyze memory utilization
  const avgMemoryUtilization = calculateAverageMemoryUtilization(usageMetrics);
  const peakMemoryUtilization = calculatePeakMemoryUtilization(usageMetrics);

  // Determine if over-provisioned (all four metrics must be low)
  const isOverProvisioned =
    avgCpuUtilization < 20 && peakCpuUtilization < 50 &&
    avgMemoryUtilization < 40 && peakMemoryUtilization < 70;

  // Determine if under-provisioned (any one metric being high is enough)
  const isUnderProvisioned =
    avgCpuUtilization > 70 || peakCpuUtilization > 90 ||
    avgMemoryUtilization > 80 || peakMemoryUtilization > 95;

  // Make recommendation
  if (isOverProvisioned) {
    return {
      currentSize: deployment.instanceType,
      recommendedSize: getNextSmallerSize(deployment.provider, deployment.instanceType),
      reason: 'Instance is significantly over-provisioned. Downsizing could reduce costs by approximately 30-50%.',
      potentialSavings: calculatePotentialSavings(deployment, -1)
    };
  } else if (isUnderProvisioned) {
    return {
      currentSize: deployment.instanceType,
      recommendedSize: getNextLargerSize(deployment.provider, deployment.instanceType),
      reason: 'Instance is under-provisioned, which could lead to performance issues. Upgrading would improve reliability.',
      potentialSavings: calculatePotentialSavings(deployment, 1)
    };
  }

  return {
    currentSize: deployment.instanceType,
    recommendedSize: deployment.instanceType,
    reason: 'Current instance size appears to be appropriate for your usage patterns.',
    potentialSavings: 0
  };
}
```

## Deployment Strategies

### Continuous Integration/Continuous Deployment

Set up a CI/CD pipeline for the MCP Server:

```yaml
# GitHub Actions workflow example
name: Deploy MCP Server

on:
  push:
    branches: [main]
  workflow_dispatch:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '16'
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test

  deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '16'
      - name: Install dependencies
        run: npm ci
      - name: Build
        run: npm run build
      - name: Deploy to Cloudflare
        uses: cloudflare/wrangler-action@v2
        with:
          apiToken: ${{ secrets.CLOUDFLARE_API_TOKEN }}
```

### Blue-Green Deployments

For zero-downtime updates to the Crawl4AI Docker server:

```bash
#!/bin/bash
# Blue-Green deployment script for Crawl4AI Docker server

# Pull the new image
docker pull unclecode/crawl4ai:latest

# Start the new container (green)
docker run -d --name crawl4ai-green -p 11236:11235 --shm-size=1g unclecode/crawl4ai:latest

# Wait for the new container to initialize
echo "Waiting for green deployment to initialize..."
sleep 30

# Check if the new container is healthy.
# The pattern is single-quoted so the JSON double quotes are matched literally;
# the exact response shape is an assumption, so adjust it to your server's output.
HEALTH_CHECK=$(curl -s http://localhost:11236/health)
if [ $? -ne 0 ] || [[ "$HEALTH_CHECK" != *'"status":"healthy"'* ]]; then
  echo "Health check failed for green deployment, rolling back..."
  docker stop crawl4ai-green
  docker rm crawl4ai-green
  exit 1
fi

# Update the load balancer/proxy to route to the new container
# This depends on your specific infrastructure

# Stop the old container (blue)
docker stop crawl4ai
docker rm crawl4ai

# Rename the green container to be the new blue
docker rename crawl4ai-green crawl4ai

# Update the port mapping if needed
# This might require stopping and restarting with the correct port

echo "Blue-Green deployment completed successfully"
```

## Security Best Practices

### JWT Token Configuration

```typescript
import jwt from 'jsonwebtoken'; // Assumes the `jsonwebtoken` package

// JWT configuration for secure tokens.
// Signing takes `algorithm`/`expiresIn`; verification takes an `algorithms` allow-list.
const jwtClaims = { audience: 'crawl4ai-mcp-server', issuer: 'crawl4ai-auth' };

// Generate JWT token
function generateToken(userId: string, permissions: string[]): string {
  return jwt.sign(
    { sub: userId, permissions, iat: Math.floor(Date.now() / 1000) },
    PRIVATE_KEY,
    { ...jwtClaims, algorithm: 'RS256', expiresIn: '1h' }
  );
}

// Verify JWT token
function verifyToken(token: string): JWTPayload {
  try {
    return jwt.verify(token, PUBLIC_KEY, { ...jwtClaims, algorithms: ['RS256'] }) as JWTPayload;
  } catch (error) {
    throw new AuthError('Invalid or expired token');
  }
}
```

### API Rate Limiting

```typescript
// Rate limiting middleware
function rateLimitMiddleware(options: RateLimitOptions) {
  const limiter = new RateLimiter({
    windowMs: options.windowMs || 15 * 60 * 1000, // 15 minutes by default
    max: options.max || 100, // 100 requests per windowMs by default
    standardHeaders: true,
    legacyHeaders: false,
    keyGenerator: (req) => {
      // Use user ID if authenticated, IP otherwise
      return req.user?.id || req.ip;
    }
  });

  return limiter;
}

// Apply rate limiting to sensitive routes
app.use('/api/deployments', rateLimitMiddleware({ windowMs: 5 * 60 * 1000, max: 20 }));
app.use('/api/providers/configure', rateLimitMiddleware({ windowMs: 60 * 60 * 1000, max: 10 }));
```

### Firewall Configuration

Example AWS security group configuration to secure the Crawl4AI Docker server:

```typescript
const securityGroup = new aws.ec2.SecurityGroup('crawl4ai-sg', {
  description: 'Security group for Crawl4AI Docker server',
  vpcId: vpc.id,
  ingress: [
    {
      description: 'Crawl4AI API access',
      fromPort: 11235,
      toPort: 11235,
      protocol: 'tcp',
      // Only allow access from the Cloudflare Worker IPs
      cidrBlocks: CLOUDFLARE_IP_RANGES,
    },
    {
      description: 'SSH access',
      fromPort: 22,
      toPort: 22,
      protocol: 'tcp',
      // Only allow SSH from admin IPs
      cidrBlocks: ADMIN_IP_ADDRESSES,
    }
  ],
  egress: [
    {
      fromPort: 0,
      toPort: 0,
      protocol: '-1',
      cidrBlocks: ['0.0.0.0/0'],
    }
  ],
  tags: {
    Name: 'crawl4ai-security-group',
  },
});
```

## Conclusion

This implementation guide provides technical considerations and code examples for building the enhanced Crawl4AI MCP Server architecture. By following these patterns, developers can build a robust, secure, and user-friendly solution that integrates properly with Crawl4AI while supporting multiple users and cloud providers.

The focus throughout the implementation should be on:

1. Security first - protecting user data and credentials
2. Performance optimization - ensuring efficient API calls and resource usage
3. User experience - making deployment and management straightforward
4. Cost effectiveness - helping users minimize infrastructure costs
5. Reliability - implementing proper monitoring and recovery procedures

Addressing these aspects from the start gives the system a strong foundation that can be extended with additional features over time.
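The error classification in section 3 marks 429, 503, and 504 responses as retryable but stops short of the retry loop itself. As a minimal sketch of how that flag might be used, the helpers below retry an operation with capped exponential backoff. The names `HttpStatusError`, `isRetryableStatus`, `backoffDelayMs`, and `withRetry`, along with the default delays and attempt count, are hypothetical and are not part of this guide's adapter or of Crawl4AI.

```typescript
// Hypothetical error type carrying the HTTP status of a failed request.
class HttpStatusError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Mirrors the statuses that section 3 classifies as retryable.
function isRetryableStatus(status: number): boolean {
  return status === 429 || status === 503 || status === 504;
}

// Delay before the next attempt: base * 2^(attempt - 1), capped at maxDelayMs.
function backoffDelayMs(attempt: number, baseDelayMs = 200, maxDelayMs = 5000): number {
  return Math.min(baseDelayMs * Math.pow(2, attempt - 1), maxDelayMs);
}

// Runs `operation`, retrying only on retryable HTTP statuses.
// Non-retryable errors and the final failed attempt are rethrown unchanged.
async function withRetry<T>(operation: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const retryable =
        error instanceof HttpStatusError && isRetryableStatus(error.status);
      if (!retryable || attempt >= maxAttempts) {
        throw error;
      }
      await new Promise<void>((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

A caller would wrap a single request in `withRetry`, provided the request layer throws an error that carries the status code. Authentication and validation failures (401, 403, 422) are deliberately not retried, since repeating the same request cannot succeed.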
