MCP Server for Crawl4AI

by omgwtfwow
CHANGELOG.md (20.4 kB)
# Changelog

## Version 3.0.2 (2025-09-01)

### Bug Fixes

- Fixed manage_session tool schema compatibility with Claude/Anthropic tools
  - Removed oneOf/allOf/anyOf from top-level schema
  - Simplified to plain object schema with enum constraints
  - Maintains all functionality while improving MCP client compatibility

## Version 3.0.1 (2025-08-30)

### Documentation

- Updated README.md to accurately document all new parameters from v3.0.0
- Added documentation for batch_crawl configs array parameter
- Clarified proxy object format support
- Documented all new crawler parameters from Crawl4AI 0.7.3/0.7.4

## Version 3.0.0 (2025-08-30)

### Features

- Added full support for Crawl4AI 0.7.3/0.7.4 features:
  - **'undetected' browser type** - Stealth browser option for anti-bot detection
  - **New crawler parameters**:
    - `delay_before_return_html` - Delay before returning HTML content
    - `css_selector` - Filter content by CSS selector
    - `include_links` - Include extracted links in response
    - `resolve_absolute_urls` - Convert relative URLs to absolute
  - **Extraction strategies** - Support for LLM extraction, table extraction, and markdown generation options
  - **Multi-config batch crawling** - Per-URL configurations in batch_crawl
  - **Unified proxy format** - Support both string and object proxy configurations
  - **Memory metrics display** - Show server memory usage when available

### Improvements

- Enhanced error formatting for better debugging
- Better handling of object error responses from API
- Fixed batch_crawl to include required `urls` field when using configs array

### Testing

- Added comprehensive integration tests for all new features
- Fixed TypeScript errors in test files
- All 306 unit tests passing
- All 150 integration tests passing

### Backward Compatibility

- Fully backward compatible with older Crawl4AI servers (before 0.7.4)
- All new features are optional and gracefully degrade

## Version 2.9.0 (2025-08-29)

### Breaking Changes

- Consolidated session management into single `manage_session` tool
  - Replaces `create_session`, `clear_session`, and `list_sessions` tools
  - Uses discriminated union with `action` parameter: 'create', 'clear', or 'list'
  - Reduces tool count from 15 to 13

### Removed

- Removed `create_session` tool (use `manage_session` with `action: 'create'`)
- Removed `clear_session` tool (use `manage_session` with `action: 'clear'`)
- Removed `list_sessions` tool (use `manage_session` with `action: 'list'`)

### Improvements

- Simplified API surface for better LLM interaction
- Improved type safety with discriminated unions
- Reduced code duplication in session management

### Testing

- Updated all tests to use new `manage_session` tool
- Maintained 100% test coverage

## Version 2.7.1 (2025-08-30)

### Bug Fixes

- Fixed lint/formatting issues in test files
- Cleaned up trailing whitespace

## Version 2.7.0 (2025-08-30)

### Compatibility Updates

- Verified full compatibility with Crawl4AI version 0.7.4
  - All 15 MCP tools tested and working
  - 100% integration test pass rate (148 tests)
  - Supports new v0.7.3/0.7.4 features including:
    - Undetected browser support with stealth mode
    - Multi-URL configuration system
    - Enhanced table extraction
    - Memory optimization improvements

### Bug Fixes

- Fixed unit test timeout issues in NPX and CLI tests
  - Added proper process cleanup and timeouts
  - Fixed edge case where dotenv was loading during tests
  - Ensured all spawned child processes are properly terminated

### Testing

- Comprehensive testing against Crawl4AI v0.7.4 Docker image
  - All integration tests pass with LLM features enabled
  - Unit test suite: 308 tests passing
  - Integration test suite: 148 tests passing

## Version 2.6.12 (2025-08-05)

### Bug Fixes

- Fixed server startup issue when running via npx
  - Removed complex module detection logic that was preventing server startup
  - Server now always starts when the script is executed (as intended for MCP servers)
  - Simplified dotenv loading to only attempt in development when env vars aren't set

## Version 2.6.11 (2025-08-05)

### Bug Fixes

- Fixed environment variable handling when running via npx
  - Only loads .env file if CRAWL4AI_BASE_URL is not already set
  - Prevents issues when env vars are passed via CLI/MCP configuration
  - Ensures package works correctly with Claude Desktop and other MCP clients

## Version 2.6.10 (2025-08-05)

### Bug Fixes

- Fixed unit tests to use correct localhost URL from jest.setup.cjs
- Fixed network error handling tests to not specify request body in nock mocks
- Unit tests always use http://localhost:11235 as configured
- Integration tests get URL from .env file

### Code Quality

- Replaced all 'any' type warnings with proper type assertions in tests
- All tests passing with zero lint warnings

## Version 2.6.9 (2025-08-05)

### Testing Improvements

- Improved crawl4ai-service.ts test coverage from 76% to 84%
  - Added comprehensive network error handling tests
  - Added URL validation tests for all service methods
  - Added tests for optional parameter handling
  - Added JavaScript validation edge case tests

### Code Quality

- All tests pass with zero lint errors
- Maintained 100% function coverage for service layer

## Version 2.6.8 (2025-08-05)

### Code Cleanup

- Removed unused mock generation system
- Cleaned up package.json scripts
- Simplified development workflow

### Chores

- Verified alignment between unit tests, integration tests, and implementation
- Confirmed all tests properly mock API interactions

## Version 2.6.7 (2025-08-05)

### Bug Fixes

- Fixed integration tests to use production Crawl4AI server from environment variables
- Fixed child process environment variable loading in test utilities
- Added support for both string and object markdown responses from Crawl4AI API
- Fixed timeout issues in MHTML capture and HTML extraction tests
- Replaced unreliable test URLs (httpbin.org) with stable alternatives
- Added 30-second timeout to session creation to prevent socket hang-ups

### Testing Improvements

- Integration tests now run sequentially (maxWorkers: 1) to avoid rate limiting
- Added proper working directory configuration for child processes
- Fixed all integration tests to pass with production API
- Maintained test coverage at 92.25% with all tests passing

## Version 2.6.6 (2025-08-05)

### Testing

- Improved test coverage from 88.8% to 93.19%
- Added comprehensive CLI entry point tests for signal handling, environment variables, and dotenv loading
- Added network failure tests for axios timeout and HTTP error scenarios
- Added input validation edge case tests for JavaScript code validation
- Added parameter combination tests for optional parameters and edge cases
- Improved branch coverage from 80.76% to 86.12%
- Improved function coverage from 96.41% to 98.92%

## Version 2.6.5 (2025-08-05)

### Features

- Enhanced screenshot handling for better compatibility
  - Added home directory (`~`) path resolution support
  - Large screenshots (>800KB) are now saved locally without being returned inline to avoid MCP's 1MB response limit
  - Clear indication when screenshots are too large to display inline

### Bug Fixes

- Improved screenshot directory handling
  - Better parameter descriptions clarifying that only directory paths should be provided
  - Added automatic handling when file paths are mistakenly provided instead of directories
  - Warning messages when incorrect path format is detected
  - Ensures compatibility with various LLM usage patterns

## Version 2.6.4 (2025-08-04)

### Features

- Added local screenshot storage support
  - capture_screenshot: New save_to_directory parameter saves screenshots locally while returning as MCP resource
  - crawl: New screenshot_directory parameter saves screenshots when screenshot=true
  - Automatic filename generation using URL hostname and timestamp
  - Creates directories if they don't exist
  - Graceful error handling - failures don't interrupt the crawl operation
- Added comprehensive unit tests for file saving functionality

## Version 2.6.3 (2025-08-04)

### Enhancements

- Improved tool descriptions for better LLM understanding and workflow clarity
  - Added [STATELESS], [SUPPORTS SESSIONS], [SESSION MANAGEMENT] indicators
  - Enhanced get_html description to emphasize selector discovery for automation
  - Added inspect-first workflow patterns to crawl tool description
  - Emphasized element verification in js_code parameter description
  - Added typical workflow guidance to create_session
  - Improved cross-references between related tools
  - Removed problematic one-shot form pattern that assumed element existence

### Bug Fixes

- Fixed crawl_recursive max_depth behavior
  - max_depth: 0 now correctly crawls only the initial page
  - Previously, max_depth: 0 would crawl pages at depth 0 and depth 1

## Version 2.6.2 (2025-08-04)

### Refactoring

- Consolidated error handling in server.ts with validateAndExecute helper
  - Reduced ~90 lines of duplicate code
  - Preserved exact error message format for LLM compatibility
  - Improved maintainability while keeping behavior identical
- Server.ts coverage improved from ~90% to 98.66%

## Version 2.6.1 (2025-08-04)

### Testing

- Improved crawl-handlers test coverage from 87% to 97%
  - Added comprehensive unit tests for all crawl handler methods
  - Test error handling for batchCrawl, smartCrawl, crawlRecursive, parseSitemap
  - Cover edge cases including XML detection, URL validation, depth limits
- Added integration tests for real API behavior validation
  - Test all crawl parameters including word_count_threshold, image thresholds, exclude_social_media_links
  - Properly handle MCP error formatting vs direct handler throws

## Version 2.6.0 (2025-08-04)

### Testing

- Added comprehensive test coverage for error handling paths
  - Session creation with failed initial crawl
  - JavaScript execution error handling with accurate API response formats
  - Extract links manual extraction fallback when API returns empty links
- Improved coverage from 87.23% to 89.71% lines
- Added integration tests for crawl error handling
  - Invalid URL validation
  - Non-existent domain handling
- Added unit tests for utility handlers
  - Manual link extraction from markdown
  - Malformed URL handling
  - Empty results scenarios

### Improvements

- Better error resilience in session creation when initial crawl fails
- More accurate test mocks based on real API responses

## Version 2.5.0 (2025-08-04)

### Refactoring

- Removed backward compatibility exports from index.ts
- Updated test imports to use direct module paths
- Cleaned up index.ts to focus solely on CLI entry point

### Testing

- Updated jest.setup.cjs to load .env for integration tests
  - Unit tests continue using localhost:11235
  - Integration tests now use values from .env file

## Version 2.4.0 (2025-08-04)

### Features

- Replaced Codecov with GitHub Actions-based coverage badge
  - Coverage badge now uses GitHub Gist for storage
  - No external dependencies for coverage tracking
  - Badge updates automatically with each CI run
- Coverage reports published to GitHub Pages
  - Interactive HTML coverage report available at https://omgwtfwow.github.io/mcp-crawl4ai-ts/coverage/

### Bug Fixes

- Fixed smart_crawl implementation to remove unsupported 'strategy' parameter
- Fixed coverage extraction in CI to use lcov.info format
- Added proper URL encoding for Shields.io endpoint badge

### CI/CD Improvements

- Added GitHub Pages deployment for coverage reports
- Added write permissions for GitHub Actions to create gh-pages branch
- Removed Codecov integration completely

### Maintenance

- Removed .codecov.yml configuration file
- Removed CODECOV_TOKEN from repository secrets
- Updated README.md with new coverage badge

## Version 2.3.0 (2025-08-03)

### Refactoring

- Split large 2,366-line index.ts file into modular structure
  - Created handlers/ directory with operation-specific handlers
  - Created schemas/ directory for validation schemas
  - Reduced file sizes to under 1,000 lines each (most under 300)
  - Maintained backward compatibility with all exports
  - Improved code organization and maintainability

### Testing

- Updated tests to work with new modular structure
- Maintained test coverage at 87.23% (exceeds 86% requirement)
- All 165 unit tests passing

## Version 2.2.0 (2025-08-03)

### Features

- Added comprehensive test coverage infrastructure
  - Set up Jest code coverage with Istanbul
  - Added test:coverage and test:ci npm scripts
  - Configured coverage thresholds (80% for all metrics)
  - Added coverage badge to README
  - Achieved 86.51% line coverage, 82.21% statement coverage

### Testing Improvements

- Added comprehensive unit tests for all tool handlers in index.ts
  - Tests for success cases, error handling, and edge cases
  - Tests for MCP protocol request handling
  - Tests for parameter validation with Zod schemas
- Added unit tests for JavaScript validation function
- Added tests for private methods: parseSitemap and detectContentType
- Fixed integration test reliability issues:
  - Replaced example.com with httpbin.org in execute-js tests
  - Fixed test expectations for JavaScript execution results
  - Fixed MCP request handler test setup

### Bug Fixes

- Fixed parse_sitemap implementation to use axios.get directly instead of non-existent service method
- Fixed TypeScript 'any' warnings in test files (eliminated 90+ warnings)
- Fixed linting errors and formatting issues across the test suite
- Fixed test URL in batch-crawl test (httpbingo.org → httpbin.org)

### CI/CD Improvements

- Updated GitHub Actions workflow to include coverage reporting
- Added Node.js 22.x to the test matrix
- Fixed all failing CI tests

## Version 2.1.2 (2025-08-03)

### Documentation

- Updated Node.js requirement from 16+ to 18+ to reflect actual testing and support
  - Node.js 16 reached End-of-Life in September 2023
  - CI only tests on Node.js 18.x and 20.x
- Added `engines` field to package.json to enforce Node.js 18+ requirement

## Version 2.1.1 (2025-08-03)

### Bug Fixes

- Fixed GitHub homepage README display issue by renaming .github/README.md to CI.md
  - GitHub was showing the CI documentation instead of the main project README

## Version 2.1.0 (2025-08-03)

### Bug Fixes

- Fixed `smart_crawl` bug where markdown object was incorrectly printed as `[object Object]`
  - Now correctly accesses `result.markdown.raw_markdown` for content display
- Fixed integration test timeout issues:
  - Replaced example.com with httpbin.org/html in tests to avoid "domcontentloaded" timeout issues
  - Fixed httpbin.org URLs by adding proper path suffixes (e.g., /links/5/0)
  - Limited Jest parallelization for integration tests to prevent server overload
- Fixed parameter mapping in `get_markdown` tool - now correctly maps schema properties (`filter`, `query`, `cache`) to API parameters (`f`, `q`, `c`)
- Fixed `smart_crawl` schema to use `follow_links` parameter instead of `remove_images`
- Fixed `extract_links` schema mismatch - corrected schema to use `categorize` parameter as defined in tool
- Fixed `extract_links` implementation to properly handle link objects returned by API
- Fixed `crawl_recursive` schema mismatch - corrected schema to use `include_pattern` and `exclude_pattern` instead of `filter_pattern` and `bypass_cache`
- Fixed `crawl_recursive` implementation to use `/crawl` endpoint instead of `/md` for proper link extraction
- Fixed `crawl_recursive` type issues and improved link handling for recursive crawling
- Fixed `parse_sitemap` implementation to fetch sitemaps directly instead of through Crawl4AI server API
- Fixed `create_session` schema to make `session_id` optional as documented
- Enhanced `create_session` response to include all session parameters for programmatic access
- Implemented proper handling for non-functional server parameters:
  - `batch_crawl`: `remove_images` now uses `exclude_tags` in crawler_config to actually remove images
  - `smart_crawl`: `follow_links` now crawls URLs found in sitemaps/RSS feeds (max 10 URLs)
- Fixed `crawl` and `generate_pdf` tools PDF response to use proper MCP SDK embedded resource format with blob field

### Improvements

- Added comprehensive integration tests for `batch_crawl` tool (7 tests)
- Added comprehensive integration tests for `smart_crawl` tool (8 tests)
- Fixed all ESLint formatting issues across the codebase
- Enhanced error handling for empty URL arrays in batch_crawl
- Improved test reliability by replacing problematic test URLs
- Updated tool descriptions to accurately reflect actual behavior
- Added proper TypeScript types for getMarkdown function
- Enhanced test coverage for batch_crawl parameter handling
- Added comprehensive unit and integration tests for `extract_links` tool
- Improved JSON endpoint detection in `extract_links` tool
- Better error handling for `extract_links` with graceful error messages
- Added comprehensive integration tests for `crawl_recursive` tool
- Improved `crawl_recursive` output format to clearly show depth levels and internal link counts
- Enhanced error handling in `crawl_recursive` to continue crawling even if individual pages fail
- Added comprehensive integration tests for `parse_sitemap` tool with various test cases
- Added comprehensive integration tests for session management tools (`create_session`, `clear_session`, `list_sessions`)
- Enhanced integration tests for `extract_with_llm` tool to handle non-deterministic LLM responses
- Installed nock library for future HTTP mocking in unit tests
- Fixed TypeScript lint warnings by replacing `any` types with proper types:
  - Changed error handling to use proper type assertions
  - Updated `unknown[]` for JavaScript execution results
  - Used `Record<string, unknown>` for generic objects
  - Created `LinkItem` interface for better type safety
  - Fixed all production code `any` types
  - Removed unused legacy `CrawlResult` interface
- Consolidated unit tests to use nock for HTTP mocking:
  - Removed redundant Jest mock test file
  - Removed unused mocks directory
  - Renamed test file for clarity
  - Improved unit test performance from 92s to ~1s by removing timeout tests
  - Cleaned up test organization and removed test README
- Added GitHub Actions CI workflow:
  - Automatic testing on push to main and pull requests
  - Tests run on Node.js 18.x and 20.x
  - Includes linting, formatting checks, and build verification
- Added mock helper scripts:
  - `npm run generate-mocks`: Generate nock mock code from real API
  - `npm run view-mocks`: View and save API responses for reference
  - Both scripts help maintain accurate test mocks

## Version 2.0.1 (2025-08-02)

Update README

## Version 2.0.0 (2025-08-02)

### Breaking Changes

- Renamed `crawl_with_config` tool to `crawl`

### New Features

- Added comprehensive response types for all endpoints (PDF, screenshot, HTML, markdown)
- Enhanced parameter validation with clearer error messages
- Improved documentation for JavaScript execution patterns
- Added selector strategy guidance for form interaction
- Better distinction between `wait_for` and `wait_until` usage

### Bug Fixes

- Fixed server 500 errors by always including `crawler_config` in requests
- Updated media and links types to match actual server responses
- Corrected validation for `js_only` parameter usage

### Documentation

- Added troubleshooting section with common issues and solutions
- Included practical examples for form filling and multi-step navigation
- Enhanced tool descriptions with clear warnings and recommendations
- Added selector strategy guide for working with dynamic content

### Technical Improvements

- Updated all TypeScript types based on actual server responses
- Improved error handling and user-friendly messages
- Enhanced Zod validation schemas with helpful refinements
- Added comprehensive integration tests for new features

### Known Issues

- `js_only: true` causes server serialization errors - use `screenshot: true` as workaround
- Using `wait_for` with elements that already exist can cause timeouts - use `wait_until` instead

## Version 1.0.2

- Initial stable release with full MCP implementation
- Support for all Crawl4AI endpoints
- Basic session management
- Integration with MCP clients
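
For readers migrating across the Version 2.9.0 breaking change, the consolidation of `create_session`, `clear_session`, and `list_sessions` into `manage_session` might be sketched roughly as follows. This is an illustration only, not the project's actual code: the type `ManageSessionArgs`, the constant `manageSessionSchema`, and the helper `migrateLegacyCall` are hypothetical names, and treating `session_id` as required for `'clear'` is an assumption. The schema object only mirrors the shape described under Version 3.0.2 (a plain object with an enum on `action`, no top-level oneOf/allOf/anyOf).

```typescript
// Hypothetical argument shapes. Only `action` and `session_id` are named in the
// changelog; requiring `session_id` for 'clear' is an assumption.
type ManageSessionArgs =
  | { action: 'create'; session_id?: string }
  | { action: 'clear'; session_id: string }
  | { action: 'list' };

// A flat schema in the spirit of 3.0.2: plain object, enum constraint on
// `action`, and no top-level oneOf/allOf/anyOf.
const manageSessionSchema = {
  type: 'object',
  properties: {
    action: { type: 'string', enum: ['create', 'clear', 'list'] },
    session_id: { type: 'string' },
  },
  required: ['action'],
} as const;

type LegacyTool = 'create_session' | 'clear_session' | 'list_sessions';

// Translate a call to one of the removed tools into manage_session arguments.
function migrateLegacyCall(tool: LegacyTool, sessionId?: string): ManageSessionArgs {
  switch (tool) {
    case 'create_session':
      // session_id stays optional on create, as noted under Version 2.1.0.
      return sessionId ? { action: 'create', session_id: sessionId } : { action: 'create' };
    case 'clear_session':
      if (!sessionId) throw new Error('clear_session requires a session_id');
      return { action: 'clear', session_id: sessionId };
    case 'list_sessions':
      return { action: 'list' };
  }
}
```

A client that previously called `list_sessions` would, under these assumptions, call `manage_session` with the result of `migrateLegacyCall('list_sessions')`. Keeping the discriminated union in TypeScript while exposing a flat schema to the client is one way to retain type safety without the schema constructs that 3.0.2 removed.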
