
FreeCrawl MCP Server

A production-ready Model Context Protocol (MCP) server for web scraping and document processing, designed as a self-hosted replacement for Firecrawl.

🚀 Features

  • JavaScript-enabled web scraping with Playwright and anti-detection measures

  • Document processing with fallback support for various formats

  • Concurrent batch processing with configurable limits

  • Intelligent caching with SQLite backend

  • Rate limiting per domain

  • Comprehensive error handling with retry logic

  • Easy installation via uvx or local development setup

  • Health monitoring and metrics collection


📦 Installation & Usage

The easiest way to use FreeCrawl is with uvx, which automatically manages dependencies:

# Install browsers on first run
uvx freecrawl-mcp --install-browsers

# Test functionality
uvx freecrawl-mcp --test

Local Development Setup

For local development or customization:

  1. Clone from GitHub:

    git clone https://github.com/dylan-gluck/freecrawl-mcp.git
    cd freecrawl-mcp
  2. Set up environment:

    # Sync dependencies
    uv sync
    
    # Install browser dependencies
    uv run freecrawl-mcp --install-browsers
    
    # Run tests
    uv run freecrawl-mcp --test
  3. Run the server:

    uv run freecrawl-mcp

🛠 Configuration

Configure FreeCrawl using environment variables:

Basic Configuration

# Transport (stdio for MCP, http for REST API)
export FREECRAWL_TRANSPORT=stdio

# Browser pool settings
export FREECRAWL_MAX_BROWSERS=3
export FREECRAWL_HEADLESS=true

# Concurrency limits
export FREECRAWL_MAX_CONCURRENT=10
export FREECRAWL_MAX_PER_DOMAIN=3

# Cache settings
export FREECRAWL_CACHE=true
export FREECRAWL_CACHE_DIR=/tmp/freecrawl_cache
export FREECRAWL_CACHE_TTL=3600
export FREECRAWL_CACHE_SIZE=536870912  # 512MB

# Rate limiting
export FREECRAWL_RATE_LIMIT=60  # requests per minute

# Logging
export FREECRAWL_LOG_LEVEL=INFO
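
The two concurrency limits compose: FREECRAWL_MAX_CONCURRENT caps total in-flight requests, while FREECRAWL_MAX_PER_DOMAIN caps requests to any single host. A minimal sketch of that pattern using asyncio semaphores (illustrative only; the names below are hypothetical and not FreeCrawl's actual internals):

import asyncio
from urllib.parse import urlparse

MAX_CONCURRENT = 10  # mirrors FREECRAWL_MAX_CONCURRENT
MAX_PER_DOMAIN = 3   # mirrors FREECRAWL_MAX_PER_DOMAIN

global_slots = asyncio.Semaphore(MAX_CONCURRENT)
domain_slots: dict[str, asyncio.Semaphore] = {}

def slots_for(url: str) -> asyncio.Semaphore:
    # Lazily create one semaphore per hostname.
    host = urlparse(url).netloc
    return domain_slots.setdefault(host, asyncio.Semaphore(MAX_PER_DOMAIN))

async def fetch_with_limits(url: str) -> str:
    # A request must hold both a global slot and a slot for its domain.
    async with global_slots, slots_for(url):
        await asyncio.sleep(0.1)  # stand-in for the actual scrape
        return url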

Security Settings

# API authentication (optional)
export FREECRAWL_REQUIRE_API_KEY=false
export FREECRAWL_API_KEYS=key1,key2,key3

# Domain blocking
export FREECRAWL_BLOCKED_DOMAINS=localhost,127.0.0.1

# Anti-detection
export FREECRAWL_ANTI_DETECT=true
export FREECRAWL_ROTATE_UA=true

🔧 MCP Tools

FreeCrawl provides the following MCP tools:

freecrawl_scrape

Scrape content from a single URL with advanced options.

Parameters:

  • url (string): URL to scrape

  • formats (array): Output formats - ["markdown", "html", "text", "screenshot", "structured"]

  • javascript (boolean): Enable JavaScript execution (default: true)

  • wait_for (string, optional): CSS selector to wait for, or a wait time in milliseconds passed as a string (e.g., "2000")

  • anti_bot (boolean): Enable anti-detection measures (default: true)

  • headers (object, optional): Custom HTTP headers

  • cookies (object, optional): Custom cookies

  • cache (boolean): Use cached results if available (default: true)

  • timeout (number): Total timeout in milliseconds (default: 30000)

Example:

{
  "name": "freecrawl_scrape",
  "arguments": {
    "url": "https://example.com",
    "formats": ["markdown", "screenshot"],
    "javascript": true,
    "wait_for": "2000"
  }
}

freecrawl_batch_scrape

Scrape multiple URLs concurrently.

Parameters:

  • urls (array): List of URLs to scrape (max 100)

  • concurrency (number): Maximum concurrent requests (default: 5)

  • formats (array): Output formats (default: ["markdown"])

  • common_options (object, optional): Options applied to all URLs

  • continue_on_error (boolean): Continue if individual URLs fail (default: true)

Example:

{
  "name": "freecrawl_batch_scrape",
  "arguments": {
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2"
    ],
    "concurrency": 3,
    "formats": ["markdown", "text"]
  }
}

freecrawl_extract

Extract structured data from a page using a schema-driven approach.

Parameters:

  • url (string): URL to extract data from

  • schema (object): JSON Schema or Pydantic model definition

  • prompt (string, optional): Custom extraction instructions

  • validation (boolean): Validate against schema (default: true)

  • multiple (boolean): Extract multiple matching items (default: false)

Example:

{
  "name": "freecrawl_extract",
  "arguments": {
    "url": "https://example.com/product",
    "schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"}
      }
    }
  }
}
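
When validation is enabled, the extracted object is checked against the supplied schema. A minimal client-side illustration of what that check amounts to, using the jsonschema package and the schema from the example above (this is not FreeCrawl's internal code):

import jsonschema

schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
    },
}

extracted = {"title": "Example Product", "price": 19.99}

# Raises jsonschema.ValidationError if the data does not match the schema.
jsonschema.validate(instance=extracted, schema=schema)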

freecrawl_process_document

Process documents (PDF, DOCX, etc.) with OCR support.

Parameters:

  • file_path (string, optional): Path to document file

  • url (string, optional): URL to download document from

  • strategy (string): Processing strategy - "fast", "hi_res", "ocr_only" (default: "hi_res")

  • formats (array): Output formats - ["markdown", "structured", "text"]

  • languages (array, optional): OCR languages (e.g., ["eng", "fra"])

  • extract_images (boolean): Extract embedded images (default: false)

  • extract_tables (boolean): Extract and structure tables (default: true)

Example:

{
  "name": "freecrawl_process_document",
  "arguments": {
    "url": "https://example.com/document.pdf",
    "strategy": "hi_res",
    "formats": ["markdown", "structured"]
  }
}

freecrawl_health_check

Get server health status and metrics.

Example:

{
  "name": "freecrawl_health_check",
  "arguments": {}
}

🔄 Integration with Claude Code

MCP Configuration

Add FreeCrawl to your MCP configuration:

Using uvx (Recommended):

{
  "mcpServers": {
    "freecrawl": {
      "command": "uvx",
      "args": ["freecrawl-mcp"]
    }
  }
}

Using local development setup:

{
  "mcpServers": {
    "freecrawl": {
      "command": "uv",
      "args": ["run", "freecrawl-mcp"],
      "cwd": "/path/to/freecrawl-mcp"
    }
  }
}

Usage in Prompts

Please scrape the content from https://example.com and extract the main article text in markdown format.

Claude Code will automatically use the freecrawl_scrape tool to fetch and process the content.
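
Outside Claude Code, any MCP client can invoke the same tools over stdio. A minimal sketch using the official mcp Python SDK; the tool name and arguments are taken from the scrape example above:

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(command="uvx", args=["freecrawl-mcp"])

async def main() -> None:
    # Launch the server as a subprocess and speak MCP over stdio.
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "freecrawl_scrape",
                arguments={"url": "https://example.com", "formats": ["markdown"]},
            )
            print(result.content)

asyncio.run(main())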

🚀 Performance & Scalability

Resource Usage

  • Memory: ~100MB base + ~50MB per browser instance

  • CPU: Moderate usage during active scraping

  • Storage: Cache grows based on configured limits
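
For example, with FREECRAWL_MAX_BROWSERS=3 the expected footprint is roughly 100 + 3 × 50 = 250MB.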

Throughput

  • Single requests: 2-5 seconds typical response time

  • Batch processing: 10-50 concurrent requests depending on configuration

  • Cache hit ratio: 30%+ for repeated content

Optimization Tips

  1. Enable caching for frequently accessed content

  2. Adjust concurrency based on target site rate limits

  3. Use appropriate formats - markdown is faster than screenshots

  4. Configure rate limiting to avoid being blocked
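
For a rate-limited target, for instance, lowering FREECRAWL_MAX_PER_DOMAIN and raising FREECRAWL_CACHE_TTL tends to help more than aggressive retries.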

🛑 Security Considerations

Anti-Detection

  • Rotating user agents

  • Realistic browser fingerprints

  • Request timing randomization

  • JavaScript execution in a sandboxed environment

Input Validation

  • URL format validation

  • Private IP blocking

  • Domain blocklist support

  • Request size limits

Resource Protection

  • Memory usage monitoring

  • Browser pool size limits

  • Request timeout enforcement

  • Rate limiting per domain

🔧 Troubleshooting

Common Issues

Issue             | Possible Cause             | Solution
High memory usage | Too many browser instances | Reduce FREECRAWL_MAX_BROWSERS
Slow responses    | JavaScript-heavy sites     | Increase timeout or disable JS
Bot detection     | Missing anti-detection     | Ensure FREECRAWL_ANTI_DETECT=true
Cache misses      | TTL too short              | Increase FREECRAWL_CACHE_TTL
Import errors     | Missing dependencies       | Run uvx freecrawl-mcp --test

Debug Mode

With uvx:

export FREECRAWL_LOG_LEVEL=DEBUG
uvx freecrawl-mcp --test

Local development:

export FREECRAWL_LOG_LEVEL=DEBUG
uv run freecrawl-mcp --test

📈 Monitoring & Observability

Health Metrics

  • Browser pool status

  • Memory and CPU usage

  • Cache hit rates

  • Request success rates

  • Response times

Logging

FreeCrawl provides structured logging with configurable levels:

  • ERROR: Critical failures

  • WARNING: Recoverable issues

  • INFO: General operations

  • DEBUG: Detailed troubleshooting

🔧 Development

Running Tests

With uvx:

# Basic functionality test
uvx freecrawl-mcp --test

Local development:

# Basic functionality test
uv run freecrawl-mcp --test

Code Structure

  • Core server: FreeCrawlServer class

  • Browser management: BrowserPool for resource pooling

  • Content extraction: ContentExtractor with multiple strategies

  • Caching: CacheManager with SQLite backend

  • Rate limiting: RateLimiter with token bucket algorithm
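
The token bucket named above can be pictured as follows. This is a generic sketch of the algorithm, assuming a refill rate derived from FREECRAWL_RATE_LIMIT; it is not FreeCrawl's exact implementation:

import time

class TokenBucket:
    """Generic token bucket: `rate` tokens added per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        # Refill based on elapsed time, then spend one token if available.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# FREECRAWL_RATE_LIMIT=60 requests/minute corresponds to one token per second.
bucket = TokenBucket(rate=1.0, capacity=60)

With this shape, short bursts up to the bucket capacity are allowed while the long-run rate stays at the configured requests per minute.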

📄 License

This project is licensed under the MIT License - see the technical specification for details.

🤝 Contributing

  1. Fork the repository at https://github.com/dylan-gluck/freecrawl-mcp

  2. Create a feature branch

  3. Set up local development: uv sync

  4. Run tests: uv run freecrawl-mcp --test

  5. Submit a pull request

📚 Technical Specification

For detailed technical information, see ai_docs/FREECRAWL_TECHNICAL_SPEC.md.


FreeCrawl MCP Server - Self-hosted web scraping for the modern web 🚀
