# Integration Test Suite
Comprehensive integration tests for Crawl4AI MCP using real services and Docker containers.
## Overview
This integration test suite validates the complete system behavior using actual services:
- **End-to-End Workflows**: Complete RAG pipeline testing (6 tests)
- **Database Integration**: Real Qdrant and Supabase testing (11 tests)
- **Performance Benchmarks**: Throughput, latency, and scalability testing (8 tests)
**Total**: 25 integration tests with comprehensive coverage of system workflows.
## Test Structure
```
tests/integration/
├── conftest.py # Docker-based fixtures and test configuration
├── test_e2e_workflows_simple.py # End-to-end workflow tests (6 tests)
├── test_database_integration.py # Database integration tests (11 tests)
├── test_performance_benchmarks.py # Performance benchmarks (8 tests)
└── README.md # This file
```
## Prerequisites
### Required Services
The integration tests require the following services to be running via Docker:
```bash
# Start all required services
make dev-bg-nobuild
# Or manually with docker-compose
docker compose up -d qdrant searxng valkey
```
- **Qdrant**: Vector database (port 6333)
- **SearXNG**: Search engine (port 8080)
- **Valkey**: Cache (port 6379)
### Optional Services
- **Supabase**: Alternative vector database (requires credentials)
- **Neo4j**: Knowledge graph (port 7474)
## Running Integration Tests
### Quick Start
```bash
# Start services first
make dev
# Run all integration tests
pytest tests/integration/ -v
# Run specific test categories
pytest tests/integration/ -m e2e
pytest tests/integration/ -m performance
pytest tests/integration/ -m integration
```
### Test Categories
#### End-to-End Workflows (`-m e2e`)
- Basic crawl and search workflow
- RAG query functionality
- Code search workflow
- Smart crawling
- Error handling
- Performance validation
#### Database Integration (`-m integration`)
- Qdrant connection lifecycle
- Document storage and retrieval
- Concurrent operations
- Large document handling
- Error recovery
- Data consistency
- Factory pattern testing
#### Performance Benchmarks (`-m performance`)
- Batch crawling throughput
- Search operation throughput
- Storage operation throughput
- Single operation latencies
- Cold start performance
- Database size scaling
- Concurrent user simulation
- Memory usage patterns
### Advanced Usage
```bash
# Skip slow tests (recommended for development)
pytest tests/integration/ -m "integration and not slow"
# Run with detailed output
pytest tests/integration/ -v -s
# Run specific test file
pytest tests/integration/test_e2e_workflows_simple.py -v
# Run performance tests only
pytest tests/integration/test_performance_benchmarks.py -v
```
## Configuration
### Environment Variables
Integration tests use these environment variables (set automatically in the test environment):
```bash
# Database Configuration
DATABASE_TYPE=qdrant
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=""
# Service URLs
SEARXNG_URL=http://localhost:8080
CACHE_REDIS_URL=redis://localhost:6379/1
# Feature Flags (simplified for testing)
ENHANCED_CONTEXT=false
USE_RERANKING=false
USE_AGENTIC_RAG=false
USE_HYBRID_SEARCH=false
```
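If you need to reproduce this setup outside the test runner, a `conftest.py` fixture can pin the same defaults. The sketch below is illustrative only; the actual fixture in this suite may use different names:
```python
import pytest

@pytest.fixture(autouse=True)
def _integration_env(monkeypatch):
    """Pin the environment variables listed above (illustrative fixture)."""
    monkeypatch.setenv("DATABASE_TYPE", "qdrant")
    monkeypatch.setenv("QDRANT_URL", "http://localhost:6333")
    monkeypatch.setenv("QDRANT_API_KEY", "")
    monkeypatch.setenv("SEARXNG_URL", "http://localhost:8080")
    monkeypatch.setenv("CACHE_REDIS_URL", "redis://localhost:6379/1")
    # Feature flags disabled to keep test behavior deterministic
    for flag in ("ENHANCED_CONTEXT", "USE_RERANKING", "USE_AGENTIC_RAG", "USE_HYBRID_SEARCH"):
        monkeypatch.setenv(flag, "false")
```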
### Supabase Configuration (Optional)
For Supabase integration tests, set these environment variables:
```bash
export SUPABASE_URL="your-supabase-url"
export SUPABASE_SERVICE_ROLE_KEY="your-service-role-key"
```
Without these, Supabase tests will be skipped.
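One common way to implement this skip is a reusable `skipif` marker; the sketch below is a typical pattern, not necessarily the exact marker used in this suite:
```python
import os
import pytest

# Skip marker: applied to any test that needs real Supabase credentials
requires_supabase = pytest.mark.skipif(
    not (os.getenv("SUPABASE_URL") and os.getenv("SUPABASE_SERVICE_ROLE_KEY")),
    reason="Supabase credentials not configured",
)

@requires_supabase
def test_supabase_document_roundtrip():
    ...
```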
## Performance Targets
The integration tests validate these performance characteristics (an example assertion follows the lists):
### Latency Targets
- **Single crawl operation**: < 5 seconds
- **Search operation**: < 2 seconds
- **Document storage**: < 1 second
- **E2E workflow**: < 20 seconds
### Throughput Targets
- **Crawling throughput**: > 2 URLs/second
- **Search throughput**: > 5 searches/second
- **Storage throughput**: > 5 documents/second
### Resource Limits
- **Memory usage**: < 200MB peak for test workloads
- **Memory per document**: < 500KB average
- **Cold start time**: < 5 seconds for service initialization
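These targets translate directly into assertions inside the tests. A minimal latency check might look like this; `search_client` is a hypothetical fixture standing in for the operation under test:
```python
import time
import pytest

@pytest.mark.integration
@pytest.mark.performance
def test_search_latency(search_client):  # search_client is a hypothetical fixture
    start = time.perf_counter()
    results = search_client.search("integration testing")  # stand-in call
    elapsed = time.perf_counter() - start
    assert results, "search returned no results"
    assert elapsed < 2.0, f"search took {elapsed:.2f}s; target is < 2s"
```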
## Test Data Management
### Automatic Cleanup
- Tests use isolated test collections/tables
- Automatic cleanup after each test
- No interference between test runs (see the fixture sketch below)
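A yield fixture is the usual way to guarantee this isolation: create a uniquely named collection, hand it to the test, and drop it afterwards. A minimal sketch assuming the `qdrant-client` package; the vector size and client wiring here are assumptions:
```python
import uuid
import pytest
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

@pytest.fixture
def test_collection():
    """Uniquely named collection per test; always dropped afterwards."""
    client = QdrantClient(url="http://localhost:6333")
    name = f"test_{uuid.uuid4().hex[:8]}"  # unique name avoids cross-test interference
    client.create_collection(
        collection_name=name,
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),  # assumed embedding size
    )
    yield name  # the test body runs here
    client.delete_collection(collection_name=name)
```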
### Test Data Patterns
- Small documents: ~500 characters
- Large documents: 5KB and larger
- Mixed workloads: Various document sizes
- Realistic content patterns (see the generator sketch below)
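A small deterministic helper can generate these patterns; this is an illustrative sketch, not the suite's actual helper:
```python
import random
import string

def make_document(size: int, seed: int = 0) -> str:
    """Generate pseudo-realistic text of roughly `size` characters."""
    rng = random.Random(seed)  # fixed seed keeps test data reproducible
    words, length = [], 0
    while length < size:
        word = "".join(rng.choices(string.ascii_lowercase, k=rng.randint(3, 10)))
        words.append(word)
        length += len(word) + 1
    return " ".join(words)

small_doc = make_document(500)    # ~500 characters
large_doc = make_document(5_000)  # ~5KB
```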
## Debugging Integration Tests
### Common Issues
1. **Services Not Running**
   ```bash
   # Error: Connection refused
   # Solution: start the Docker services
   make dev
   ```
2. **Port Conflicts**
   ```bash
   # Error: Port already in use
   # Solution: stop conflicting services
   docker compose down
   ```
3. **Slow Performance**
   ```bash
   # Use the performance marker to identify bottlenecks
   pytest tests/integration/ -m performance -v
   ```
### Debugging Commands
```bash
# Check service health
docker compose ps
docker compose logs qdrant
docker compose logs searxng
# Test specific service connections
curl http://localhost:6333/collections
curl http://localhost:8080/stats
# Run single test with full output
pytest tests/integration/test_e2e_workflows_simple.py::TestEndToEndWorkflows::test_basic_crawl_and_search_workflow -v -s
```
### Log Analysis
Integration tests include detailed logging:
```bash
# Run with debug logging
LOG_LEVEL=DEBUG pytest tests/integration/ -v -s
# Check performance metrics
pytest tests/integration/ -m performance --tb=short
```
## Coverage Impact
Integration tests significantly boost test coverage by validating:
- **Real database operations**: 15-20% coverage increase
- **Complete workflows**: 10-15% coverage increase
- **Error handling paths**: 5-10% coverage increase
- **Performance critical code**: 5-10% coverage increase
**Estimated total coverage contribution**: 35-55% additional coverage beyond unit tests.
## Contributing
### Adding New Integration Tests
1. **End-to-End Tests**: Add to `test_e2e_workflows_simple.py`
2. **Database Tests**: Add to `test_database_integration.py`
3. **Performance Tests**: Add to `test_performance_benchmarks.py`
### Test Guidelines
- Use `@pytest.mark.integration` for all integration tests
- Use `@pytest.mark.e2e` for workflow tests
- Use `@pytest.mark.performance` for performance tests
- Use `@pytest.mark.slow` for tests that take >10 seconds
- Mock remote third-party services; use real local services
- Include performance assertions where appropriate (see the sketch after this list)
- Clean up test data automatically
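Putting these guidelines together, a new integration test might be shaped like this; the `crawl_client` and `test_collection` fixture names are illustrative:
```python
import time
import pytest

@pytest.mark.integration
@pytest.mark.e2e
def test_crawl_then_search(test_collection, crawl_client):  # hypothetical fixtures
    # Crawl into an isolated collection so runs cannot interfere
    crawl_client.crawl("http://localhost:8080/fixture-page", collection=test_collection)
    start = time.perf_counter()
    results = crawl_client.search("fixture", collection=test_collection)
    elapsed = time.perf_counter() - start
    assert results
    assert elapsed < 2.0  # search latency target from above
    # cleanup happens automatically via the test_collection fixture
```
Add `@pytest.mark.slow` on top if the test takes more than 10 seconds.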
### Fixture Guidelines
- Use session-scoped fixtures for Docker services
- Use function-scoped fixtures for database clients
- Include proper cleanup in fixtures
- Handle service unavailability gracefully (as shown below)
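For example, a session-scoped fixture can probe a service once and skip dependent tests when it is unreachable. A sketch assuming `httpx`; the fixture name is illustrative:
```python
import httpx
import pytest

@pytest.fixture(scope="session")
def qdrant_url() -> str:
    """Probe Qdrant once per session; skip dependent tests if it is down."""
    url = "http://localhost:6333"
    try:
        httpx.get(f"{url}/collections", timeout=2.0).raise_for_status()
    except httpx.HTTPError:
        pytest.skip("Qdrant is not reachable; start services with `make dev`")
    return url
```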
## Maintenance
### Regular Tasks
1. **Update performance targets** as system improves
2. **Review slow tests** and optimize where possible
3. **Monitor resource usage** and adjust limits
4. **Update Docker service versions** in compose file
### Monitoring
Integration tests include built-in monitoring for:
- Test execution time
- Memory usage patterns
- Service response times
- Error rates and types
Use this data to:
- Identify performance regressions
- Optimize slow operations
- Plan capacity requirements
- Improve error handling