# FreeCrawl MCP Server
A production-ready Model Context Protocol (MCP) server for web scraping and document processing, designed as a self-hosted replacement for Firecrawl.
## Features
- JavaScript-enabled web scraping with Playwright and anti-detection measures
- Document processing with fallback support for various formats
- Concurrent batch processing with configurable limits
- Intelligent caching with SQLite backend
- Rate limiting per domain
- Comprehensive error handling with retry logic
- Easy installation via `uvx` or local development setup
- Health monitoring and metrics collection
## Installation & Usage
### Quick Start with uvx (Recommended)

The easiest way to use FreeCrawl is with `uvx`, which automatically manages dependencies:

```bash
# Install browsers on first run
uvx freecrawl-mcp --install-browsers

# Test functionality
uvx freecrawl-mcp --test
```

### Local Development Setup
For local development or customization:

**Clone from GitHub:**

```bash
git clone https://github.com/dylan-gluck/freecrawl-mcp.git
cd freecrawl-mcp
```

**Set up the environment:**

```bash
# Sync dependencies
uv sync

# Install browser dependencies
uv run freecrawl-mcp --install-browsers

# Run tests
uv run freecrawl-mcp --test
```

**Run the server:**

```bash
uv run freecrawl-mcp
```
## Configuration

Configure FreeCrawl using environment variables.

### Basic Configuration
```bash
# Transport (stdio for MCP, http for REST API)
export FREECRAWL_TRANSPORT=stdio

# Browser pool settings
export FREECRAWL_MAX_BROWSERS=3
export FREECRAWL_HEADLESS=true

# Concurrency limits
export FREECRAWL_MAX_CONCURRENT=10
export FREECRAWL_MAX_PER_DOMAIN=3

# Cache settings
export FREECRAWL_CACHE=true
export FREECRAWL_CACHE_DIR=/tmp/freecrawl_cache
export FREECRAWL_CACHE_TTL=3600
export FREECRAWL_CACHE_SIZE=536870912  # 512MB

# Rate limiting
export FREECRAWL_RATE_LIMIT=60  # requests per minute

# Logging
export FREECRAWL_LOG_LEVEL=INFO
```

### Security Settings
```bash
# API authentication (optional)
export FREECRAWL_REQUIRE_API_KEY=false
export FREECRAWL_API_KEYS=key1,key2,key3

# Domain blocking
export FREECRAWL_BLOCKED_DOMAINS=localhost,127.0.0.1

# Anti-detection
export FREECRAWL_ANTI_DETECT=true
export FREECRAWL_ROTATE_UA=true
```

## MCP Tools
FreeCrawl provides the following MCP tools:
### `freecrawl_scrape`

Scrape content from a single URL with advanced options.

**Parameters:**

- `url` (string): URL to scrape
- `formats` (array): Output formats: `["markdown", "html", "text", "screenshot", "structured"]`
- `javascript` (boolean): Enable JavaScript execution (default: `true`)
- `wait_for` (string, optional): CSS selector or time (ms) to wait for
- `anti_bot` (boolean): Enable anti-detection measures (default: `true`)
- `headers` (object, optional): Custom HTTP headers
- `cookies` (object, optional): Custom cookies
- `cache` (boolean): Use cached results if available (default: `true`)
- `timeout` (number): Total timeout in milliseconds (default: `30000`)
**Example:**

```json
{
  "name": "freecrawl_scrape",
  "arguments": {
    "url": "https://example.com",
    "formats": ["markdown", "screenshot"],
    "javascript": true,
    "wait_for": "2000"
  }
}
```

### `freecrawl_batch_scrape`
Scrape multiple URLs concurrently.
**Parameters:**

- `urls` (array): List of URLs to scrape (max 100)
- `concurrency` (number): Maximum concurrent requests (default: `5`)
- `formats` (array): Output formats (default: `["markdown"]`)
- `common_options` (object, optional): Options applied to all URLs
- `continue_on_error` (boolean): Continue if individual URLs fail (default: `true`)
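The `concurrency` cap and `continue_on_error` semantics can be sketched with an `asyncio` semaphore; the `fetch` coroutine below is a stand-in for a real browser-backed scrape, not FreeCrawl's implementation:

```python
import asyncio


async def fetch(url: str) -> str:
    # Stand-in for a real scrape; FreeCrawl would drive a browser here.
    await asyncio.sleep(0.01)
    if "bad" in url:
        raise ValueError(f"failed to scrape {url}")
    return f"content of {url}"


async def batch_scrape(urls, concurrency=5, continue_on_error=True):
    sem = asyncio.Semaphore(concurrency)  # cap concurrent requests

    async def one(url):
        async with sem:
            try:
                return {"url": url, "ok": True, "content": await fetch(url)}
            except Exception as exc:
                if not continue_on_error:
                    raise
                return {"url": url, "ok": False, "error": str(exc)}

    # gather preserves input order, so results line up with `urls`
    return await asyncio.gather(*(one(u) for u in urls))


results = asyncio.run(batch_scrape(
    ["https://example.com/page1", "https://example.com/bad"], concurrency=3
))
print(results)
```

With `continue_on_error=True`, a failing URL yields an error entry instead of aborting the whole batch.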
**Example:**

```json
{
  "name": "freecrawl_batch_scrape",
  "arguments": {
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2"
    ],
    "concurrency": 3,
    "formats": ["markdown", "text"]
  }
}
```

### `freecrawl_extract`
Extract structured data using a schema-driven approach.
**Parameters:**

- `url` (string): URL to extract data from
- `schema` (object): JSON Schema or Pydantic model definition
- `prompt` (string, optional): Custom extraction instructions
- `validation` (boolean): Validate against schema (default: `true`)
- `multiple` (boolean): Extract multiple matching items (default: `false`)
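When `validation` is enabled, extracted data is checked against the schema. A minimal sketch of type-checking a flat object schema such as `{"title": string, "price": number}`; a real server would more likely use a full JSON Schema validator like the `jsonschema` package:

```python
TYPE_CHECKS = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_object(data: dict, schema: dict) -> list[str]:
    """Return a list of validation errors for a flat object schema."""
    errors = []
    for key, spec in schema.get("properties", {}).items():
        if key not in data:
            errors.append(f"missing property: {key}")
            continue
        # bool is a subclass of int, so reject bools where numbers are expected
        if spec["type"] == "number" and isinstance(data[key], bool):
            errors.append(f"{key}: expected number")
        elif not isinstance(data[key], TYPE_CHECKS[spec["type"]]):
            errors.append(f"{key}: expected {spec['type']}")
    return errors


schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "price": {"type": "number"}},
}
print(validate_object({"title": "Widget", "price": 9.99}, schema))  # []
print(validate_object({"title": "Widget", "price": "cheap"}, schema))
```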
**Example:**

```json
{
  "name": "freecrawl_extract",
  "arguments": {
    "url": "https://example.com/product",
    "schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"}
      }
    }
  }
}
```

### `freecrawl_process_document`
Process documents (PDF, DOCX, etc.) with OCR support.
**Parameters:**

- `file_path` (string, optional): Path to a document file
- `url` (string, optional): URL to download the document from
- `strategy` (string): Processing strategy: `"fast"`, `"hi_res"`, or `"ocr_only"` (default: `"hi_res"`)
- `formats` (array): Output formats: `["markdown", "structured", "text"]`
- `languages` (array, optional): OCR languages (e.g., `["eng", "fra"]`)
- `extract_images` (boolean): Extract embedded images (default: `false`)
- `extract_tables` (boolean): Extract and structure tables (default: `true`)
**Example:**

```json
{
  "name": "freecrawl_process_document",
  "arguments": {
    "url": "https://example.com/document.pdf",
    "strategy": "hi_res",
    "formats": ["markdown", "structured"]
  }
}
```

### `freecrawl_health_check`
Get server health status and metrics.
**Example:**

```json
{
  "name": "freecrawl_health_check",
  "arguments": {}
}
```

## Integration with Claude Code
### MCP Configuration

Add FreeCrawl to your MCP configuration.

**Using uvx (Recommended):**

```json
{
  "mcpServers": {
    "freecrawl": {
      "command": "uvx",
      "args": ["freecrawl-mcp"]
    }
  }
}
```

**Using the local development setup:**
```json
{
  "mcpServers": {
    "freecrawl": {
      "command": "uv",
      "args": ["run", "freecrawl-mcp"],
      "cwd": "/path/to/freecrawl-mcp"
    }
  }
}
```

### Usage in Prompts

> Please scrape the content from https://example.com and extract the main article text in markdown format.

Claude Code will automatically use the `freecrawl_scrape` tool to fetch and process the content.
## Performance & Scalability

### Resource Usage

- **Memory:** ~100MB base + ~50MB per browser instance
- **CPU:** Moderate usage during active scraping
- **Storage:** Cache grows up to the configured size limit

### Throughput

- **Single requests:** 2-5 seconds typical response time
- **Batch processing:** 10-50 concurrent requests depending on configuration
- **Cache hit ratio:** 30%+ for repeated content
### Optimization Tips

- Enable caching for frequently accessed content
- Adjust concurrency to match target sites' rate limits
- Use the cheapest format that works: markdown is faster than screenshots
- Configure rate limiting to avoid being blocked
## Security Considerations

### Anti-Detection

- Rotating user agents
- Realistic browser fingerprints
- Request timing randomization
- JavaScript execution in a sandboxed environment

### Input Validation

- URL format validation
- Private IP blocking
- Domain blocklist support
- Request size limits
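The private-IP and blocklist checks can be sketched with the standard library. This is an illustration of the idea, not FreeCrawl's actual validator; DNS-resolution checks are omitted:

```python
import ipaddress
from urllib.parse import urlparse

# Mirrors the FREECRAWL_BLOCKED_DOMAINS default from the configuration above
BLOCKED_DOMAINS = {"localhost", "127.0.0.1"}


def is_url_allowed(url: str) -> bool:
    """Reject URLs pointing at blocked domains or private/loopback IPs."""
    host = urlparse(url).hostname
    if not host or host in BLOCKED_DOMAINS:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a hostname, not an IP literal (resolution checks omitted)
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)


print(is_url_allowed("https://example.com/page"))  # True
print(is_url_allowed("http://192.168.1.1/admin"))  # False
```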
### Resource Protection

- Memory usage monitoring
- Browser pool size limits
- Request timeout enforcement
- Per-domain rate limiting
## Troubleshooting

### Common Issues

| Issue | Possible Cause | Solution |
|-------|----------------|----------|
| High memory usage | Too many browser instances | Reduce `FREECRAWL_MAX_BROWSERS` |
| Slow responses | JavaScript-heavy sites | Increase `timeout` or disable JavaScript |
| Bot detection | Missing anti-detection measures | Ensure `FREECRAWL_ANTI_DETECT=true` |
| Cache misses | TTL too short | Increase `FREECRAWL_CACHE_TTL` |
| Import errors | Missing dependencies | Run `uv sync` |
### Debug Mode

With uvx:

```bash
export FREECRAWL_LOG_LEVEL=DEBUG
uvx freecrawl-mcp --test
```

Local development:

```bash
export FREECRAWL_LOG_LEVEL=DEBUG
uv run freecrawl-mcp --test
```

## Monitoring & Observability
### Health Metrics

- Browser pool status
- Memory and CPU usage
- Cache hit rates
- Request success rates
- Response times

### Logging

FreeCrawl provides structured logging with configurable levels:

- `ERROR`: Critical failures
- `WARNING`: Recoverable issues
- `INFO`: General operations
- `DEBUG`: Detailed troubleshooting
## Development

### Running Tests

With uvx:

```bash
# Basic functionality test
uvx freecrawl-mcp --test
```

Local development:

```bash
# Basic functionality test
uv run freecrawl-mcp --test
```

### Code Structure

- **Core server:** `FreeCrawlServer` class
- **Browser management:** `BrowserPool` for resource pooling
- **Content extraction:** `ContentExtractor` with multiple strategies
- **Caching:** `CacheManager` with SQLite backend
- **Rate limiting:** `RateLimiter` with a token bucket algorithm
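A token bucket refills capacity at a steady rate and spends one token per request, which allows short bursts while enforcing an average rate. A minimal illustrative sketch; FreeCrawl's actual `RateLimiter` internals may differ:

```python
import time


class TokenBucket:
    """Allow `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One limiter per domain: 60 requests/minute = 1 req/s, with a burst of 5.
bucket = TokenBucket(rate=1.0, capacity=5.0)
print([bucket.allow() for _ in range(6)])  # burst of 5 allowed, 6th denied
```

Keeping one bucket per domain (e.g. in a dict keyed by hostname) gives the per-domain rate limiting listed in the features.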
## License

This project is licensed under the MIT License; see the technical specification for details.
π€ Contributing
Fork the repository at https://github.com/dylan-gluck/freecrawl-mcp
Create a feature branch
Set up local development:
uv syncRun tests:
uv run freecrawl-mcp --testSubmit a pull request
## Technical Specification

For detailed technical information, see `ai_docs/FREECRAWL_TECHNICAL_SPEC.md`.

*FreeCrawl MCP Server - Self-hosted web scraping for the modern web*