# CONTRIBUTING.md
## Welcome Contributors! 👋
We're excited that you're interested in contributing to the Crawl4AI MCP Server project. This guide provides comprehensive instructions for contributing to our Docker-based MCP server implementation.
## Table of Contents
1. [Code of Conduct](#code-of-conduct)
2. [Getting Started](#getting-started)
3. [Docker Development Workflow](#docker-development-workflow)
4. [Development Environment](#development-environment)
5. [Code Style Guidelines](#code-style-guidelines)
6. [Testing Guidelines](#testing-guidelines)
7. [Pull Request Process](#pull-request-process)
8. [Release Process](#release-process)
9. [Getting Help](#getting-help)
## Code of Conduct
### Our Standards
- **Be respectful and inclusive** to all contributors
- **Welcome newcomers** and help them understand our MCP server architecture
- **Accept constructive criticism** gracefully and provide helpful feedback
- **Focus on what's best** for the MCP community and web crawling ecosystem
- **Show empathy** towards other contributors learning Docker and MCP concepts
### Unacceptable Behavior
- Harassment, discriminatory language, or personal attacks
- Publishing others' private information (API keys, credentials)
- Trolling or deliberately disruptive behavior
- Other conduct deemed inappropriate for a professional development environment
## Getting Started
### Prerequisites
#### Required Tools
```bash
# Core Requirements
- Python 3.11+ (with pip and venv support)
- Docker Desktop or Docker Engine (latest version with BuildKit)
- Git (for version control and CI/CD)
- GNU Make (for automation scripts)
# Development Platform
- Linux/macOS/Windows (with WSL2 for Windows)
- AMD64/X86_64 architecture (ARM64 support experimental)
```
#### Recommended Development Tools
```bash
# Code Editors
- VS Code with Python and Docker extensions
- PyCharm Professional or Community
- Any editor with Python syntax highlighting
# Terminal Tools
- curl or httpie (for API testing)
- jq (for JSON processing)
- Docker Compose v2.x
```
#### Optional Tools for Advanced Development
```bash
# GitHub Integration
- GitHub CLI (gh) for workflow management
- GitHub Desktop (optional GUI)
# Testing Tools
- Playwright browsers (auto-installed in containers)
- Postman or Insomnia (for HTTP transport testing)
# Monitoring Tools
- Docker Desktop Dashboard
- Portainer (for container management)
```
### Environment Setup
#### 1. Fork and Clone Repository
```bash
# 1. Fork the repository on GitHub
# 2. Clone your fork
git clone https://github.com/YOUR_USERNAME/crawl-mcp.git
cd crawl-mcp
# 3. Add upstream remote
git remote add upstream https://github.com/sruckh/crawl-mcp.git
# 4. Verify remotes
git remote -v
```
#### 2. Initial Environment Setup
```bash
# Run automated setup (Linux/macOS)
./setup.sh
# OR run setup manually
python -m venv venv
source venv/bin/activate # Linux/macOS
# venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt # Development dependencies
```
#### 3. Docker Environment Setup
```bash
# Create shared Docker network
docker network create shared_net
# Copy environment configuration
cp .env.example .env
# Edit .env with your specific settings
# Verify Docker setup
docker --version
docker-compose --version
```
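The prerequisites call for BuildKit; on older Docker engines the classic builder may still be the default. A quick, optional sanity check (standard Docker commands, nothing project-specific):

```bash
# Confirm the buildx/BuildKit plugin is available (bundled with recent Docker releases)
docker buildx version

# Force BuildKit for classic `docker build` invocations if your engine predates it being the default
export DOCKER_BUILDKIT=1
```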
## Docker Development Workflow
### Container Development Strategy
Our project uses a multi-container strategy optimized for different development and deployment scenarios.
#### Development Containers Overview
| Container | Dockerfile | Use Case | Build Time | Image Size |
|-----------|------------|----------|------------|------------|
| **Standard** | `Dockerfile` | Full features, optional CUDA support | 8-12 min | ~2-3GB |
| **CPU-Optimized** | `Dockerfile.cpu` | Local development (recommended) | 6-9 min | ~1.5-2GB |
| **Lightweight** | `Dockerfile.lightweight` | Resource-constrained environments | 4-6 min | ~1-1.5GB |
| **RunPod** | `Dockerfile.runpod` | Serverless cloud deployment | 6-9 min | ~1.5-2GB |
### Local Development Workflow
#### 1. Start Development Environment
```bash
# Option A: CPU-optimized (recommended for most development)
docker-compose -f docker-compose.cpu.yml build
docker-compose -f docker-compose.cpu.yml up -d
# Option B: Standard build (if you need full features)
docker-compose build
docker-compose up -d
# Option C: Lightweight (for testing minimal configurations)
docker build -f Dockerfile.lightweight -t crawl4ai-mcp:dev .
docker run -d --name crawl4ai-dev --network shared_net crawl4ai-mcp:dev
```
#### 2. Development Testing
```bash
# Health check
docker-compose exec crawl4ai-mcp-server python -c "
from crawl4ai_mcp.server import mcp
print('✅ MCP server healthy')
"
# Test basic crawling functionality
docker-compose exec crawl4ai-mcp-server python -c "
from crawl4ai import AsyncWebCrawler
print('✅ Crawl4AI engine available')
"
# Test MCP tools
docker-compose exec crawl4ai-mcp-server python -m crawl4ai_mcp.server --help
```
#### 3. Live Development with Volume Mounting
```bash
# Create development override
cat > docker-compose.dev.yml << EOF
version: '3.8'
services:
  crawl4ai-mcp-server:
    volumes:
      - ./crawl4ai_mcp:/app/crawl4ai_mcp:ro
      - ./examples:/app/examples:ro
    environment:
      - FASTMCP_LOG_LEVEL=DEBUG
      - PYTHONPATH=/app
EOF
# Start with development overrides
docker-compose -f docker-compose.cpu.yml -f docker-compose.dev.yml up -d
```
### Branch Development Workflow
#### Branch Naming Conventions
```bash
# Feature development
feature/mcp-tool-enhancement
feature/youtube-transcript-improvement
feature/docker-optimization
# Bug fixes
fix/playwright-timeout-issue
fix/memory-leak-in-cache
fix/container-startup-failure
# Infrastructure changes
infra/github-actions-optimization
infra/runpod-deployment-update
infra/container-security-hardening
# Documentation updates
docs/api-reference-update
docs/docker-setup-guide
docs/troubleshooting-improvements
```
#### Development Branch Workflow
```bash
# 1. Update your fork
git fetch upstream
git checkout main
git merge upstream/main
git push origin main
# 2. Create feature branch
git checkout -b feature/your-amazing-feature
# 3. Set up development environment for your branch
docker-compose -f docker-compose.cpu.yml down # Stop existing containers
docker-compose -f docker-compose.cpu.yml build --no-cache # Clean build
docker-compose -f docker-compose.cpu.yml up -d
# 4. Make your changes and test
# Edit files, test in container...
# 5. Rebuild and test after changes
docker-compose -f docker-compose.cpu.yml build
docker-compose -f docker-compose.cpu.yml restart
```
### Container Testing and Validation
#### 1. Build Testing
```bash
# Test all container variants
./scripts/build-cpu-containers.sh
# Manual build testing
docker build -f Dockerfile.cpu -t crawl4ai-test:cpu .
docker build -f Dockerfile.lightweight -t crawl4ai-test:light .
docker build -f Dockerfile.runpod -t crawl4ai-test:runpod .
```
#### 2. Functionality Testing
```bash
# Test MCP server startup
docker run --rm crawl4ai-test:cpu python -c "
from crawl4ai_mcp.server import mcp
print('MCP server can initialize')
"
# Test tool availability
docker run --rm --network shared_net crawl4ai-test:cpu python -c "
import asyncio
from crawl4ai_mcp.server import *
print('All MCP tools available')
"
```
#### 3. Integration Testing
```bash
# Test with Claude Desktop configuration
# Copy appropriate config from configs/ directory
cp configs/claude_desktop_config.json ~/.config/Claude/claude_desktop_config.json
# Test HTTP transport mode
docker run -d --name test-http --network shared_net -p 8000:8000 -e MCP_TRANSPORT=http crawl4ai-test:cpu
curl http://localhost:8000/health  # use the published port; the container hostname resolves only inside shared_net
# Cleanup test containers
docker rm -f test-http
```
## Development Environment
### Configuration Management
#### Environment Variables for Development
```bash
# Copy and customize environment file
cp .env.example .env.dev
# Key development variables
FASTMCP_LOG_LEVEL=DEBUG # Detailed logging
MCP_TRANSPORT=stdio # Default for Claude Desktop
MAX_CONCURRENT_REQUESTS=3 # Conservative for development
CACHE_TTL=300 # Short cache for testing
SAFE_MODE=false # Allow broader testing
```
#### Development vs Production Configuration
```bash
# Development configuration (relaxed limits)
MAX_FILE_SIZE_MB=50 # Smaller for faster testing
PLAYWRIGHT_TIMEOUT=30 # Shorter for quicker feedback
ENABLE_FILE_PROCESSING=true # Test all features
CRAWL4AI_DISABLE_SENTENCE_TRANSFORMERS=true # CPU-only for speed
# Production configuration (secure and optimized)
MAX_FILE_SIZE_MB=100 # Full size support
PLAYWRIGHT_TIMEOUT=60 # More patience for complex sites
SAFE_MODE=true # Enhanced security
RATE_LIMIT_ENABLED=true # Protect against abuse
```
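To verify a container actually picks up the development values rather than the production ones, pass the file explicitly. `--env-file` is standard Docker/Compose CLI; the image tag below is illustrative:

```bash
# One-off container using the development overrides
docker run --rm --env-file .env.dev --network shared_net crawl4ai-mcp:dev \
  python -m crawl4ai_mcp.server --help

# Compose equivalent: point the project at the dev env file
docker-compose --env-file .env.dev -f docker-compose.cpu.yml up -d
```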
### IDE Setup and Configuration
#### VS Code Configuration
Create `.vscode/settings.json`:
```json
{
  "python.defaultInterpreterPath": "./venv/bin/python",
  "python.linting.enabled": true,
  "python.linting.pylintEnabled": true,
  "python.formatting.provider": "black",
  "python.testing.pytestEnabled": true,
  "docker.defaultPlatform": "linux/amd64",
  "files.associations": {
    "Dockerfile*": "dockerfile",
    "docker-compose*.yml": "dockercompose"
  }
}
```
#### VS Code Extensions
```json
{
  "recommendations": [
    "ms-python.python",
    "ms-azuretools.vscode-docker",
    "ms-vscode.makefile-tools",
    "yzhang.markdown-all-in-one",
    "redhat.vscode-yaml"
  ]
}
```
## Code Style Guidelines
### Python Code Style
We follow PEP 8 with specific adaptations for MCP server development.
#### Import Organization
```python
# Standard library imports
import asyncio
import json
import os
from pathlib import Path
from typing import Dict, List, Optional

# Third-party imports
import aiohttp
from fastmcp import FastMCP
from pydantic import BaseModel, Field

# Local application imports
from .config import ConfigManager
from .strategies import ExtractionStrategy
from .suppress_output import suppress_stdout_stderr
```
#### MCP Tool Development Pattern
```python
@mcp.tool(description="Clear, descriptive tool description")
async def your_mcp_tool(
    param1: str = Field(description="Clear parameter description"),
    param2: Optional[int] = Field(default=None, description="Optional parameter"),
) -> str:
    """
    Detailed docstring explaining tool functionality.

    Args:
        param1: Detailed parameter explanation
        param2: Optional parameter explanation

    Returns:
        JSON string with tool results

    Raises:
        Exception: When operation fails
    """
    try:
        # Always use output suppression for crawl4ai operations
        with suppress_stdout_stderr():
            # Your tool implementation here
            result = await some_async_operation(param1, param2)

        # Return JSON-encoded results
        return json.dumps({
            "success": True,
            "data": result,
            "metadata": {
                "timestamp": "...",
                "tool": "your_mcp_tool"
            }
        }, ensure_ascii=False)
    except Exception as e:
        # Always return JSON-encoded errors
        return json.dumps({
            "success": False,
            "error": str(e),
            "error_type": type(e).__name__
        }, ensure_ascii=False)
```
#### Error Handling Patterns
```python
# MCP-specific error handling
import json
import time

import aiohttp

from .suppress_output import suppress_stdout_stderr

async def robust_mcp_operation():
    """Example of robust error handling for MCP tools."""
    try:
        # Primary operation
        with suppress_stdout_stderr():
            result = await primary_operation()
        return success_response(result)
    except TimeoutError as e:
        # Handle timeout specifically
        return error_response(f"Operation timed out: {e}")
    except aiohttp.ClientError as e:
        # Handle network errors
        return error_response(f"Network error: {e}")
    except Exception as e:
        # Generic fallback
        return error_response(f"Unexpected error: {e}")

def success_response(data):
    """Standardized success response format."""
    return json.dumps({
        "success": True,
        "data": data,
        "timestamp": time.time()  # wall-clock epoch time; event-loop time is monotonic, not a timestamp
    }, ensure_ascii=False)

def error_response(message):
    """Standardized error response format."""
    return json.dumps({
        "success": False,
        "error": message,
        "timestamp": time.time()
    }, ensure_ascii=False)
```
### Docker and Configuration Style
#### Dockerfile Best Practices
```dockerfile
# Use specific Python version
FROM python:3.11-slim
# Set working directory early
WORKDIR /app
# Copy requirements first (for better caching)
COPY requirements*.txt ./
# Install dependencies in single layer
RUN pip install --no-cache-dir -r requirements.txt \
    && pip install --no-deps "crawl4ai>=0.3.0"
# Copy application code
COPY . .
# Use non-root user
RUN useradd -m -u 1000 appuser
USER appuser
# Health check
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
CMD python -c "from crawl4ai_mcp.server import mcp; print('OK')"
```
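Layer caching only pays off if the build context stays small, so `COPY . .` should be paired with a `.dockerignore`. A minimal illustrative version (adjust the entries to the repository's actual layout):

```
# .dockerignore (illustrative)
.git/
venv/
__pycache__/
*.pyc
tests/
.env*
```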
#### Environment Variable Naming
```bash
# Use consistent naming patterns
CRAWL4AI_* # crawl4ai-specific settings
MCP_* # MCP protocol settings
FASTMCP_* # FastMCP framework settings
PLAYWRIGHT_* # Browser automation settings
MAX_* # Resource limits
ENABLE_* # Feature toggles
```
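Following this convention keeps configuration code predictable. A minimal sketch of how such variables might be read; the helper functions below are hypothetical, not the project's actual `ConfigManager` API:

```python
import os

def env_int(name: str, default: int) -> int:
    """Read an integer setting (MAX_*-style) from the environment."""
    raw = os.getenv(name)
    return int(raw) if raw is not None else default

def env_flag(name: str, default: bool = False) -> bool:
    """Interpret ENABLE_*/SAFE_MODE-style toggles as booleans."""
    raw = os.getenv(name)
    return raw.strip().lower() in ("1", "true", "yes") if raw is not None else default

MAX_CONCURRENT_REQUESTS = env_int("MAX_CONCURRENT_REQUESTS", 3)
ENABLE_FILE_PROCESSING = env_flag("ENABLE_FILE_PROCESSING", True)
```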
### File Organization and Naming
#### Project Structure Conventions
```
crawl4ai_mcp/
├── __init__.py # Tool selection guide
├── server.py # Main MCP server (keep monolithic for now)
├── config.py # Configuration management
├── strategies.py # Extraction strategies
├── file_processor.py # File processing utilities
├── youtube_processor.py # YouTube-specific processor
├── google_search_processor.py # Google search integration
└── suppress_output.py # Output suppression utility
docker/ # Docker-related files
├── Dockerfile # Standard container
├── Dockerfile.cpu # CPU-optimized
├── Dockerfile.lightweight # Minimal container
├── Dockerfile.runpod # Serverless optimized
├── docker-compose.yml # Standard deployment
├── docker-compose.cpu.yml # CPU-optimized deployment
└── .dockerignore # Build optimization
tests/ # Test organization
├── unit/ # Unit tests
├── integration/ # Integration tests
├── containers/ # Container-specific tests
└── mcp/ # MCP protocol tests
```
#### Naming Conventions
```python
# Files and modules
file_processor.py # Snake case for modules
youtube_processor.py # Descriptive, specific names
google_search_processor.py # Full name clarity
# Classes
class YouTubeProcessor: # PascalCase for classes
class ConfigManager: # Clear, descriptive names
class ExtractionStrategy: # Interface naming
# Functions and methods
async def extract_transcript(): # Snake case, async prefix
def get_supported_formats(): # Verb + noun pattern
async def search_and_crawl(): # Action-oriented names
# Constants
MAX_FILE_SIZE_MB = 100 # UPPER_SNAKE_CASE
DEFAULT_CACHE_TTL = 900 # Clear, specific names
SUPPORTED_VIDEO_PLATFORMS = [...] # Descriptive constant names
```
## Testing Guidelines
### Testing Strategy
Our testing approach covers multiple layers: unit tests, integration tests, container tests, and MCP protocol tests.
#### Unit Testing
```python
# test_file_processor.py
import pytest
from unittest.mock import AsyncMock, patch
from crawl4ai_mcp.file_processor import FileProcessor
class TestFileProcessor:
    @pytest.fixture
    def file_processor(self):
        return FileProcessor()

    @pytest.mark.asyncio
    async def test_process_pdf_file(self, file_processor):
        # Arrange
        test_url = "https://example.com/test.pdf"

        # Act
        result = await file_processor.process_file_url(test_url)

        # Assert
        assert result["success"] is True
        assert "content" in result["data"]
        assert result["data"]["file_type"] == "pdf"

    @pytest.mark.asyncio
    async def test_process_unsupported_file(self, file_processor):
        # Arrange
        test_url = "https://example.com/test.xyz"

        # Act & Assert
        with pytest.raises(ValueError, match="Unsupported file type"):
            await file_processor.process_file_url(test_url)
```
#### Integration Testing
```python
# test_mcp_integration.py
import json

import pytest

# The tool functions are assumed importable from the server module
from crawl4ai_mcp.server import crawl_url, process_file

class TestMCPIntegration:
    @pytest.mark.asyncio
    async def test_crawl_url_tool(self):
        # Test the actual MCP tool
        result = await crawl_url("https://httpbin.org/html")

        # Parse JSON response
        response = json.loads(result)
        assert response["success"] is True
        assert "content" in response["data"]

    @pytest.mark.asyncio
    async def test_file_processing_integration(self):
        # Test file processing end-to-end
        result = await process_file("https://example.com/test.pdf")

        response = json.loads(result)
        assert response["success"] is True
        assert response["data"]["file_type"] == "pdf"
```
#### Container Testing
```bash
#!/bin/bash
# tests/test_containers.sh

# Test CPU-optimized container
test_cpu_container() {
    echo "Testing CPU-optimized container..."

    # Build container
    docker build -f Dockerfile.cpu -t test-cpu .

    # Test basic functionality
    docker run --rm test-cpu python -c "
from crawl4ai_mcp.server import mcp
print('✅ CPU container: MCP server loads')
"

    # Test crawl4ai availability
    docker run --rm test-cpu python -c "
from crawl4ai import AsyncWebCrawler
print('✅ CPU container: Crawl4AI available')
"

    # Cleanup
    docker image rm test-cpu
}

# Test lightweight container
test_lightweight_container() {
    echo "Testing lightweight container..."
    docker build -f Dockerfile.lightweight -t test-light .

    # Verify the server still loads under a tight memory cap.
    # (psutil reports host memory, not the cgroup limit, so asserting on it is unreliable.)
    docker run --rm --memory=512m test-light python -c "
from crawl4ai_mcp.server import mcp
print('✅ Lightweight container: loads under 512MB limit')
"
    docker image rm test-light
}

# Run tests
test_cpu_container
test_lightweight_container
echo "✅ All container tests passed"
```
#### MCP Protocol Testing
```python
# test_mcp_protocol.py
import json

import pytest
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

class TestMCPProtocol:
    @pytest.mark.asyncio
    async def test_mcp_server_tools(self):
        """Test MCP server exposes expected tools."""
        # Start MCP server in test mode
        server_params = StdioServerParameters(
            command="python",
            args=["-m", "crawl4ai_mcp.server"]
        )

        async with stdio_client(server_params) as (read, write):
            async with ClientSession(read, write) as session:
                # Initialize MCP session
                await session.initialize()

                # List available tools
                tools = await session.list_tools()

                # Verify expected tools are available
                tool_names = [tool.name for tool in tools.tools]
                expected_tools = [
                    "crawl_url",
                    "deep_crawl_site",
                    "intelligent_extract",
                    "extract_entities",
                    "process_file",
                    "extract_youtube_transcript",
                    "search_google"
                ]
                for tool in expected_tools:
                    assert tool in tool_names, f"Missing tool: {tool}"

    @pytest.mark.asyncio
    async def test_crawl_url_mcp_call(self):
        """Test actual MCP tool call."""
        server_params = StdioServerParameters(
            command="python",
            args=["-m", "crawl4ai_mcp.server"]
        )

        async with stdio_client(server_params) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()

                # Call crawl_url tool
                result = await session.call_tool(
                    "crawl_url",
                    {"url": "https://httpbin.org/html"}
                )

                # Verify response structure
                assert result.content is not None
                response = json.loads(result.content[0].text)
                assert response["success"] is True
                assert "data" in response
```
### Test Execution
#### Running Tests Locally
```bash
# Install test dependencies
pip install -r requirements-dev.txt
# Run unit tests
pytest tests/unit/ -v
# Run integration tests (requires containers)
pytest tests/integration/ -v
# Run container tests
pytest tests/containers/ -v
# Run all tests with coverage
pytest --cov=crawl4ai_mcp --cov-report=html
# Run specific test categories
pytest -m "unit" -v # Only unit tests
pytest -m "integration" -v # Only integration tests
pytest -m "slow" -v # Only slow/heavy tests
```
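Note that the `-m` filters only work if the custom markers are registered; recent pytest versions warn about unknown marks otherwise. One way to register them is a shared `conftest.py` (a sketch; the project may declare them in `pytest.ini` or `pyproject.toml` instead):

```python
# conftest.py
def pytest_configure(config):
    # Register custom markers so `pytest -m "unit"` and friends are recognized
    config.addinivalue_line("markers", "unit: fast, isolated unit tests")
    config.addinivalue_line("markers", "integration: tests requiring running containers")
    config.addinivalue_line("markers", "slow: long-running or resource-heavy tests")
```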
#### Docker Test Environment
```bash
# Test in clean container environment
docker run --rm -v "$(pwd)":/app -w /app python:3.11-slim bash -c "
pip install -r requirements.txt -r requirements-dev.txt
pytest tests/ -v
"
# Test specific container builds
./scripts/test-all-containers.sh
# Test MCP protocol compliance
docker-compose -f docker-compose.test.yml up --abort-on-container-exit
```
### Test Quality Standards
- **Coverage**: Aim for 80%+ code coverage (see the enforcement snippet below)
- **Speed**: Unit tests <1s each, integration tests <30s each
- **Isolation**: Each test should be independent and repeatable
- **Documentation**: Test names should clearly describe what they test
- **Realism**: Integration tests should use realistic data and scenarios
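To enforce the coverage target automatically rather than by convention, pytest-cov can fail the run when coverage drops below the threshold:

```bash
# Fail the test run outright if coverage falls below 80% (requires pytest-cov)
pytest --cov=crawl4ai_mcp --cov-fail-under=80
```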
## Pull Request Process
### Before Submitting a PR
#### Pre-submission Checklist
- [ ] **Code Quality**
- [ ] Code follows Python PEP 8 and project style guidelines
- [ ] All functions have appropriate docstrings
- [ ] Error handling follows MCP pattern (JSON responses)
- [ ] No hardcoded credentials or sensitive data
- [ ] **Testing**
- [ ] Unit tests added for new functionality
- [ ] Integration tests pass
- [ ] Container builds successfully
- [ ] MCP protocol compliance verified
- [ ] **Docker & Infrastructure**
- [ ] All container variants build successfully
- [ ] Docker Compose configurations updated if needed
- [ ] Environment variables documented in CONFIG.md
- [ ] No breaking changes to existing container interfaces
- [ ] **Documentation**
- [ ] README.md updated if needed
- [ ] API documentation updated for new tools
- [ ] CHANGELOG.md updated with changes
- [ ] Configuration examples provided
- [ ] **Security**
- [ ] No API keys or secrets committed
- [ ] Input validation implemented
- [ ] Output sanitization in place
- [ ] Container security best practices followed
#### Local Testing Before PR
```bash
# 1. Run full test suite
pytest tests/ -v --cov=crawl4ai_mcp
# 2. Build and test all container variants
./scripts/build-cpu-containers.sh
# 3. Test MCP protocol compliance
python -m crawl4ai_mcp.server --test-mode
# 4. Lint and format code
black crawl4ai_mcp/
flake8 crawl4ai_mcp/
mypy crawl4ai_mcp/
# 5. Test with Claude Desktop (if available)
# Update claude_desktop_config.json and test tools
# 6. Security scan
bandit -r crawl4ai_mcp/
```
### PR Template
When creating a pull request, use this template:
```markdown
## Description
Brief description of changes and motivation.
## Type of Change
- [ ] 🐛 Bug fix (non-breaking change that fixes an issue)
- [ ] ✨ New feature (non-breaking change that adds functionality)
- [ ] 💥 Breaking change (fix or feature that causes existing functionality to change)
- [ ] 📚 Documentation update
- [ ] 🏗️ Infrastructure change (Docker, CI/CD, deployment)
- [ ] 🧪 Test improvements
- [ ] 🔧 Maintenance/refactoring
## MCP Tools Affected
- [ ] `crawl_url` - Web page crawling
- [ ] `deep_crawl_site` - Multi-page crawling
- [ ] `intelligent_extract` - AI-powered extraction
- [ ] `extract_entities` - Named entity recognition
- [ ] `process_file` - File processing (PDF, Office, etc.)
- [ ] `extract_youtube_transcript` - YouTube transcripts
- [ ] `search_google` - Google search integration
- [ ] New tool: `tool_name` - Description
## Container Impact
- [ ] Standard container (`Dockerfile`)
- [ ] CPU-optimized container (`Dockerfile.cpu`)
- [ ] Lightweight container (`Dockerfile.lightweight`)
- [ ] RunPod serverless container (`Dockerfile.runpod`)
- [ ] Docker Compose configurations
- [ ] No container changes
## Testing Completed
- [ ] Unit tests pass (`pytest tests/unit/`)
- [ ] Integration tests pass (`pytest tests/integration/`)
- [ ] Container builds successful (`./scripts/build-cpu-containers.sh`)
- [ ] MCP protocol compliance verified
- [ ] Manual testing with Claude Desktop completed
- [ ] Performance impact assessed
## Documentation Updates
- [ ] Code docstrings updated
- [ ] API documentation updated
- [ ] README.md updated
- [ ] CONFIG.md updated with new environment variables
- [ ] CHANGELOG.md updated
- [ ] No documentation changes needed
## Security Considerations
- [ ] No sensitive data exposed
- [ ] Input validation implemented
- [ ] Output sanitization in place
- [ ] Container security best practices followed
- [ ] No new security vulnerabilities introduced
## Breaking Changes
If this PR introduces breaking changes, describe:
- What breaks
- Migration path for users
- Version bump required
## Performance Impact
- [ ] Performance improved
- [ ] No performance impact
- [ ] Performance regression (justified because...)
## Related Issues
Fixes #(issue number)
Closes #(issue number)
Relates to #(issue number)
## Additional Context
Add any other context, screenshots, or notes about the PR here.
```
### Review Process
#### What Reviewers Look For
1. **Code Quality**: Clean, readable, well-documented code
2. **MCP Compliance**: Proper tool signatures and JSON responses
3. **Container Compatibility**: Works across all container variants
4. **Security**: No vulnerabilities or data leaks
5. **Performance**: No unexpected resource usage
6. **Documentation**: Clear documentation for new features
#### Addressing Review Feedback
```bash
# Make requested changes
git add .
git commit -m "Address review feedback: improve error handling"
# Update PR branch
git push origin feature/your-feature
# If major changes requested, squash commits
git rebase -i HEAD~n # where n is number of commits
git push --force-with-lease origin feature/your-feature
```
### Post-Merge Process
```bash
# After merge, clean up
git checkout main
git pull upstream main
git branch -d feature/your-feature
git remote prune origin
# Update CHANGELOG if not done in PR
# Tag release if this was a major feature
```
## Release Process
### Versioning Strategy
We use [Semantic Versioning](https://semver.org/) with Docker-specific considerations:
- **MAJOR** (1.0.0 → 2.0.0): Breaking changes to MCP API, container interfaces, or deployment requirements
- **MINOR** (1.0.0 → 1.1.0): New MCP tools, container optimizations, new features
- **PATCH** (1.0.0 → 1.0.1): Bug fixes, security patches, documentation updates
### Release Workflow
#### 1. Pre-Release Preparation
```bash
# Update version in key files
VERSION="1.2.0"
# Update version references
echo $VERSION > VERSION
sed -i "s/version = .*/version = \"$VERSION\"/" pyproject.toml
# Update CHANGELOG.md with release notes
# Update CLAUDE.md with version reference
# Build and test all containers
./scripts/build-cpu-containers.sh
```
#### 2. Container Release Process
```bash
# Tag containers for release
docker build -f Dockerfile.cpu -t crawl4ai-mcp:$VERSION .
docker build -f Dockerfile.lightweight -t crawl4ai-mcp:$VERSION-light .
docker build -f Dockerfile.runpod -t crawl4ai-mcp:$VERSION-runpod .
# Test release containers
# $VERSION is expanded by the shell before the Python snippet runs
docker run --rm crawl4ai-mcp:$VERSION python -c "
from crawl4ai_mcp.server import mcp
print('Release ${VERSION} container test passed')
"
```
#### 3. GitHub Actions Release
Our automated release process triggers on tag creation:
```bash
# Create and push release tag
git tag v$VERSION
git push upstream v$VERSION
# This triggers .github/workflows/build-runpod-docker.yml which:
# 1. Builds all container variants
# 2. Pushes to Docker Hub with version tags
# 3. Creates GitHub release with artifacts
```
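For reference, the tag-driven trigger in that workflow likely looks something like the following; this is a sketch of standard GitHub Actions syntax, not the file's verbatim contents:

```yaml
# .github/workflows/build-runpod-docker.yml (trigger excerpt, illustrative)
on:
  push:
    tags:
      - 'v*'
```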
#### 4. Release Notes Template
```markdown
# Release v1.2.0
## 🚀 New Features
- New MCP tool: `extract_structured_data` for JSON schema extraction
- CPU-optimized container now 30% smaller
- Added support for batch file processing
## 🐛 Bug Fixes
- Fixed memory leak in YouTube transcript processor
- Resolved container startup issues on ARM64 systems
- Fixed timeout handling in deep crawl operations
## 🔧 Infrastructure
- Updated to Python 3.11.7 base image
- Improved GitHub Actions build performance
- Enhanced container security with non-root user
## 📚 Documentation
- Added comprehensive Docker deployment guide
- Updated API documentation with new examples
- Improved troubleshooting section
## 🔄 Breaking Changes
None in this release.
## 📦 Container Images
- `docker.io/gemneye/crawl4ai-runpod-serverless:v1.2.0`
- `docker.io/gemneye/crawl4ai-runpod-serverless:1.2`
- `docker.io/gemneye/crawl4ai-runpod-serverless:1`
- `docker.io/gemneye/crawl4ai-runpod-serverless:latest`
## 🔧 Migration Guide
No migration steps required for this release.
## 📊 Performance Improvements
- 25% faster container builds
- 15% reduction in memory usage
- Improved cache hit rates
```
### Docker Hub Release Management
#### Automated Builds
Our GitHub Actions workflow automatically builds and pushes to Docker Hub:
- **Repository**: `docker.io/gemneye/crawl4ai-runpod-serverless`
- **Tags**: `latest`, `v1.2.0`, `1.2`, `1`
- **Platform**: `linux/amd64` (AMD64/X86_64 only)
#### Manual Release (if needed)
```bash
# Login to Docker Hub
docker login
# Build and push specific version
docker build -f Dockerfile.runpod -t gemneye/crawl4ai-runpod-serverless:v1.2.0 .
docker push gemneye/crawl4ai-runpod-serverless:v1.2.0
# Tag and push additional tags
docker tag gemneye/crawl4ai-runpod-serverless:v1.2.0 gemneye/crawl4ai-runpod-serverless:latest
docker push gemneye/crawl4ai-runpod-serverless:latest
```
## Getting Help
### Resources and Support
#### Primary Documentation
- **[Project README](./README.md)**: Overview and quick start
- **[Architecture Guide](./ARCHITECTURE.md)**: Technical architecture and design decisions
- **[Configuration Guide](./CONFIG.md)**: Environment variables and settings
- **[Build Guide](./BUILD.md)**: Build processes and deployment strategies
- **[Docker Guide](./DOCKER.md)**: Docker-specific deployment instructions
#### Community and Support
- **[GitHub Issues](https://github.com/sruckh/crawl-mcp/issues)**: Bug reports and feature requests
- **[GitHub Discussions](https://github.com/sruckh/crawl-mcp/discussions)**: Questions and community discussions
- **[MCP Community](https://github.com/modelcontextprotocol)**: Model Context Protocol community resources
#### Getting Help Effectively
#### For Bug Reports
````markdown
**Bug Report Template**
## Environment
- Container variant: [Standard/CPU/Lightweight/RunPod]
- Docker version: [output of `docker --version`]
- OS: [Linux/macOS/Windows with WSL2]
- Python version: [if running locally]
## Expected Behavior
Clear description of what should happen.
## Actual Behavior
Clear description of what actually happens.
## Steps to Reproduce
1. Step 1
2. Step 2
3. Step 3
## Error Messages
```
Paste error messages here
```
## Additional Context
Any other context, screenshots, or files that might help.
````
#### For Feature Requests
```markdown
**Feature Request Template**
## Problem Statement
Describe the problem this feature would solve.
## Proposed Solution
Describe your proposed solution or approach.
## Alternatives Considered
What other approaches did you consider?
## Additional Context
Any other context, mockups, or examples.
## MCP Integration
How should this integrate with existing MCP tools?
## Container Impact
Which containers would be affected by this change?
```
#### For Questions
- **Search first**: Check existing issues and discussions
- **Be specific**: Include relevant context about your setup
- **Provide examples**: Show what you're trying to accomplish
- **Tag appropriately**: Use labels like `question`, `docker`, `mcp-tool`
### Beginner-Friendly Issues
We welcome first-time contributors! Look for issues labeled:
- **`good first issue`**: Perfect for newcomers
- **`help wanted`**: Community help needed
- **`documentation`**: Improve our docs
- **`container-optimization`**: Docker/containerization improvements
- **`mcp-tool`**: New MCP tool development
#### First Contribution Workflow
```bash
# 1. Find a good first issue
# Browse: https://github.com/sruckh/crawl-mcp/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22
# 2. Comment on the issue to express interest
# 3. Fork and clone the repository
# 4. Set up development environment (see above)
# 5. Create feature branch and implement changes
# 6. Test thoroughly (unit + container tests)
# 7. Submit pull request using our template
# 8. Respond to review feedback
```
### Recognition and Community
#### Contributor Recognition
All contributors are recognized in:
- **README.md contributors section**: Public recognition
- **GitHub contributors graph**: Automatic tracking
- **Release notes**: Major contributors highlighted
- **Hall of Fame**: Outstanding contributors featured
#### Community Values
- **Inclusive environment**: Everyone welcome regardless of experience level
- **Learning-focused**: Help others learn Docker, MCP, and web crawling
- **Quality-oriented**: Maintain high standards while being supportive
- **Innovation-driven**: Encourage creative solutions and improvements
## Keywords <!-- #keywords -->
contributing, development workflow, Docker containers, MCP server, Python development, testing guidelines, pull requests, code review, release process, community guidelines, containerization, FastMCP, crawl4ai, GitHub Actions, CI/CD, debugging, troubleshooting