crawl-mcp-server

A comprehensive MCP (Model Context Protocol) server providing 11 powerful tools for web crawling and search. Transform web content into clean, LLM-optimized Markdown or search the web with SearXNG integration.


✨ Features

  • 🔍 SearXNG Web Search - Search the web with automatic browser management

  • 📄 4 Crawling Tools - Extract and convert web content to Markdown

  • 🚀 Auto-Browser-Launch - Search tools automatically manage browser lifecycle

  • 📦 11 Total Tools - Complete toolkit for web interaction

  • 💾 Built-in Caching - SHA-256 based caching with graceful fallbacks

  • ⚡ Concurrent Processing - Handle multiple URLs simultaneously (up to 50)

  • 🎯 LLM-Optimized Output - Clean Markdown perfect for AI consumption

  • 🛡️ Robust Error Handling - Graceful failure with detailed error messages

  • 🧪 Comprehensive Testing - Full CI/CD with performance benchmarks

📦 Installation

Method 1: npm

npm install crawl-mcp-server

Method 2: Direct from Git

# Install latest from GitHub
npm install git+https://github.com/Git-Fg/searchcrawl-mcp-server.git

# Or specific branch
npm install git+https://github.com/Git-Fg/searchcrawl-mcp-server.git#main

# Or from a fork
npm install git+https://github.com/YOUR_FORK/searchcrawl-mcp-server.git

Method 3: Clone and Build

git clone https://github.com/Git-Fg/searchcrawl-mcp-server.git
cd crawl-mcp-server
npm install
npm run build

Method 4: npx (No Installation)

# Run directly without installing
npx git+https://github.com/Git-Fg/searchcrawl-mcp-server.git

🔧 Setup for Claude Desktop

Add to your Claude Desktop configuration file:

Option 1: npx

macOS/Linux: ~/.config/claude/claude_desktop_config.json

{
  "mcpServers": {
    "crawl-server": {
      "command": "npx",
      "args": ["git+https://github.com/Git-Fg/searchcrawl-mcp-server.git"],
      "env": { "NODE_ENV": "production" }
    }
  }
}

Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "crawl-server": {
      "command": "npx",
      "args": ["git+https://github.com/Git-Fg/searchcrawl-mcp-server.git"],
      "env": { "NODE_ENV": "production" }
    }
  }
}

Option 2: Local Installation

If you've installed locally:

{
  "mcpServers": {
    "crawl-server": {
      "command": "node",
      "args": ["/path/to/crawl-mcp-server/dist/index.js"],
      "env": {}
    }
  }
}

Option 3: Custom Path

For a specific installation:

{
  "mcpServers": {
    "crawl-server": {
      "command": "node",
      "args": ["/usr/local/lib/node_modules/crawl-mcp-server/dist/index.js"],
      "env": {}
    }
  }
}

After configuration, restart Claude Desktop.

🔧 Setup for Other MCP Clients

Claude CLI

# Using npx
claude mcp add crawl-server npx git+https://github.com/Git-Fg/searchcrawl-mcp-server.git

# Using local installation
claude mcp add crawl-server node /path/to/crawl-mcp-server/dist/index.js

Zed Editor

Add to ~/.config/zed/settings.json:

{
  "assistant": {
    "mcp": {
      "servers": {
        "crawl-server": {
          "command": "node",
          "args": ["/path/to/crawl-mcp-server/dist/index.js"]
        }
      }
    }
  }
}

VSCode with Copilot Chat

{
  "mcpServers": {
    "crawl-server": {
      "command": "node",
      "args": ["/path/to/crawl-mcp-server/dist/index.js"]
    }
  }
}

🚀 Quick Start

Using MCP Inspector (Testing)

# Install MCP Inspector globally
npm install -g @modelcontextprotocol/inspector

# Run the server
node dist/index.js

# In another terminal, test tools
npx @modelcontextprotocol/inspector --cli node dist/index.js --method tools/list

Development Mode

# Watch mode (auto-rebuild on changes)
npm run dev

# Build TypeScript
npm run build

# Run tests
npm run test:run

📚 Available Tools

Search Tools (7 tools)

1. search_searx

Search the web using SearXNG with automatic browser management.

// Example call
{
  "query": "TypeScript MCP server",
  "maxResults": 10,
  "category": "general",
  "timeRange": "week",
  "language": "en"
}

Parameters:

  • query (string, required): Search query

  • maxResults (number, default: 20): Results to return (1-50)

  • category (enum, default: general): one of general, images, videos, news, map, music, it, science

  • timeRange (enum, optional): one of day, week, month, year

  • language (string, default: en): Language code

Returns: JSON with search results array, URLs, and metadata


2. launch_chrome_cdp

Launch system Chrome with remote debugging for advanced SearX usage.

{ "headless": true, "port": 9222, "userDataDir": "/path/to/profile" }

Parameters:

  • headless (boolean, default: true): Run Chrome headless

  • port (number, default: 9222): Remote debugging port

  • userDataDir (string, optional): Custom Chrome profile


3. connect_cdp

Connect to remote CDP browser (Browserbase, etc.).

{ "cdpWsUrl": "http://localhost:9222" }

Parameters:

  • cdpWsUrl (string, required): CDP WebSocket URL or HTTP endpoint


4. launch_local

Launch bundled Chromium for SearX search.

{ "headless": true, "userAgent": "custom user agent string" }

Parameters:

  • headless (boolean, default: true): Run headless

  • userAgent (string, optional): Custom user agent


5. chrome_status

Check Chrome CDP status and health.

{}

Returns: Running status, health, endpoint URL, and PID


6. close

Close browser session (keeps Chrome CDP running).

{}

7. shutdown_chrome_cdp

Shutdown Chrome CDP and cleanup resources.

{}

Crawling Tools (4 tools)

1. crawl_read ⭐ (Simple & Fast)

Quick single-page extraction to Markdown.

{ "url": "https://example.com/article", "options": { "timeout": 30000 } }

Best for:

  • ✅ News articles

  • ✅ Blog posts

  • ✅ Documentation pages

  • ✅ Simple content extraction

Returns: Clean Markdown content


2. crawl_read_batch ⭐ (Multiple URLs)

Process 1-50 URLs concurrently.

{
  "urls": [
    "https://example.com/article1",
    "https://example.com/article2",
    "https://example.com/article3"
  ],
  "options": {
    "maxConcurrency": 5,
    "timeout": 30000,
    "maxResults": 10
  }
}

Best for:

  • ✅ Processing multiple articles

  • ✅ Building content aggregates

  • ✅ Bulk content extraction

Returns: Array of Markdown results with summary statistics


3. crawl_fetch_markdown

Controlled single-page extraction with full option control.

{ "url": "https://example.com/article", "options": { "timeout": 30000 } }

Best for:

  • ✅ Advanced crawling options

  • ✅ Custom timeout control

  • ✅ Detailed extraction


4. crawl_fetch

Multi-page crawling with intelligent link extraction.

{
  "url": "https://example.com",
  "options": {
    "pages": 5,
    "maxConcurrency": 3,
    "sameOriginOnly": true,
    "timeout": 30000,
    "maxResults": 20
  }
}

Best for:

  • ✅ Crawling entire sites

  • ✅ Link-based discovery

  • ✅ Multi-page scraping

Features:

  • Extracts links from starting page

  • Crawls discovered pages

  • Concurrent processing

  • Same-origin filtering (configurable)

💡 Usage Examples

Example 1: Search + Crawl Workflow

// Step 1: Search for topics
{
  "tool": "search_searx",
  "arguments": {
    "query": "TypeScript best practices 2024",
    "maxResults": 5
  }
}

// Step 2: Extract URLs from results
// (Parse the search results to get URLs)

// Step 3: Crawl selected articles
{
  "tool": "crawl_read_batch",
  "arguments": {
    "urls": [
      "https://example.com/article1",
      "https://example.com/article2",
      "https://example.com/article3"
    ]
  }
}
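
The same workflow can be scripted from any MCP client. Below is a minimal sketch using the MCP TypeScript SDK over STDIO; the script name, the result-parsing step, and the exact shape of the search results are assumptions for illustration, and the import paths may differ between SDK versions.

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const client = new Client({ name: "research-workflow", version: "1.0.0" });
await client.connect(
  new StdioClientTransport({ command: "node", args: ["dist/index.js"] })
);

// Step 1: search the web
const search = await client.callTool({
  name: "search_searx",
  arguments: { query: "TypeScript best practices 2024", maxResults: 5 },
});

// Step 2: extract URLs from the returned JSON
// (result shape assumed here; adjust to what search_searx actually returns)
const text = (search.content as Array<{ type: string; text: string }>)[0].text;
const urls: string[] = JSON.parse(text).results.map((r: { url: string }) => r.url);

// Step 3: crawl the selected articles concurrently
const batch = await client.callTool({
  name: "crawl_read_batch",
  arguments: { urls, options: { maxConcurrency: 5, timeout: 30000 } },
});
console.log(batch);

Because the search tools manage the browser lifecycle automatically, no launch tool needs to be called before search_searx.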

Example 2: Batch Content Extraction

{
  "tool": "crawl_read_batch",
  "arguments": {
    "urls": [
      "https://news.site/article1",
      "https://news.site/article2",
      "https://news.site/article3"
    ],
    "options": {
      "maxConcurrency": 10,
      "timeout": 30000,
      "maxResults": 3
    }
  }
}

Example 3: Site Crawling

{
  "tool": "crawl_fetch",
  "arguments": {
    "url": "https://docs.example.com",
    "options": {
      "pages": 10,
      "maxConcurrency": 5,
      "sameOriginOnly": true,
      "timeout": 30000,
      "maxResults": 10
    }
  }
}

🎯 Tool Selection Guide

Use Case             Recommended Tool            Complexity
Single article       crawl_read                  Simple
Multiple articles    crawl_read_batch            Simple
Advanced options     crawl_fetch_markdown        Medium
Site crawling        crawl_fetch                 Complex
Web search           search_searx                Simple
Research workflow    search_searx → crawl_read   Medium

๐Ÿ—๏ธ Architecture

Core Components

crawl-mcp-server
│
├─ MCP Server Core
│    - 11 registered tools
│    - STDIO/HTTP transport
│
├─ @just-every/crawl
│    - HTML → Markdown
│    - Mozilla Readability
│    - Concurrent crawling
│
└─ Playwright (Browser)
     - SearXNG integration
     - Auto browser management
     - Anti-detection
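
For orientation, the "MCP Server Core" layer boils down to registering each of the 11 tools with the SDK and binding a transport. The sketch below is illustrative rather than the project's actual source: the tool name crawl_read is real, but the schema and handler body are simplified placeholders, and the tool() signature and import paths follow recent @modelcontextprotocol/sdk releases and may differ in other versions.

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "crawl-mcp-server", version: "1.0.0" });

// Placeholder handler: the real implementation delegates to @just-every/crawl
// and converts the fetched HTML to Markdown before responding.
server.tool(
  "crawl_read",
  {
    url: z.string().url(),
    options: z.object({ timeout: z.number().int().positive() }).optional(),
  },
  async ({ url }) => ({
    content: [{ type: "text" as const, text: `# Markdown extracted from ${url}` }],
  })
);

await server.connect(new StdioServerTransport());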

Technology Stack

  • Runtime: Node.js 18+

  • Language: TypeScript 5.7

  • Framework: MCP SDK (@modelcontextprotocol/sdk)

  • Crawling: @just-every/crawl

  • Browser: Playwright Core

  • Validation: Zod

  • Transport: STDIO (local) + HTTP (remote)

Data Flow

Client Request
    ↓
MCP Protocol
    ↓
Tool Handler
    ↓
Crawl/Search
  @just-every/crawl → HTML content
  or SearXNG        → Search results
    ↓
HTML → Markdown
    ↓
Result Formatting
    ↓
MCP Response
    ↓
Client

🧪 Testing

Run Test Suite

# All unit tests
npm run test:run

# Performance benchmarks
npm run test:performance

# Full CI suite
npm run test:ci

# Individual tool test
npx @modelcontextprotocol/inspector --cli node dist/index.js \
  --method tools/call \
  --tool-name crawl_read \
  --tool-arg url="https://example.com"

Test Coverage

  • ✅ All 11 tools tested

  • ✅ Error handling validated

  • ✅ Performance benchmarks

  • ✅ Integration workflows

  • ✅ Multi-Node support (Node 18, 20, 22)

CI/CD Pipeline

GitHub Actions
  1. Test (Matrix: Node 18, 20, 22)
  2. Integration Tests (PR only)
  3. Performance Tests (main)
  4. Security Scan
  5. Coverage Report

🔧 Development

Prerequisites

  • Node.js 18 or higher

  • npm or yarn

Setup

# Clone the repository
git clone https://github.com/Git-Fg/searchcrawl-mcp-server.git
cd crawl-mcp-server

# Install dependencies
npm install

# Build TypeScript
npm run build

# Run in development mode (watch)
npm run dev

Development Commands

# Build project
npm run build

# Watch mode (auto-rebuild)
npm run dev

# Run tests
npm run test:run

# Lint code
npm run lint

# Type check
npm run typecheck

# Clean build artifacts
npm run clean

Project Structure

crawl-mcp-server/
├── src/
│   ├── index.ts          # Main server (11 tools)
│   ├── types.ts          # TypeScript interfaces
│   └── cdp.ts            # Chrome CDP manager
├── test/
│   ├── run-tests.ts      # Unit test suite
│   ├── performance.ts    # Performance tests
│   └── config.ts         # Test configuration
├── dist/                 # Compiled JavaScript
├── .github/workflows/    # CI/CD pipeline
└── package.json

📊 Performance

Benchmarks

Operation                    Avg Duration    Max Memory
crawl_read                   ~1500ms         32MB
crawl_read_batch (2 URLs)    ~2500ms         64MB
search_searx                 ~4000ms         128MB
crawl_fetch                  ~2000ms         48MB
tools/list                   ~100ms          8MB

Performance Features

  • ✅ Concurrent request processing (up to 20)

  • ✅ Built-in caching (SHA-256; see the sketch below)

  • ✅ Automatic timeout management

  • ✅ Memory optimization

  • ✅ Resource cleanup
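
The SHA-256 caching mentioned above works by keying results on a content hash. The snippet below is only a hedged illustration of what such a key derivation can look like; it is not the server's actual cache code, and the cacheKey helper is hypothetical.

import { createHash } from "node:crypto";

// Hypothetical helper: derive a stable cache key from the URL plus the options
// that influence the output, so repeated identical requests can be served from cache.
function cacheKey(url: string, options: Record<string, unknown> = {}): string {
  const canonical = JSON.stringify({ url, options });
  return createHash("sha256").update(canonical).digest("hex");
}

// Prints a 64-character hex digest for this URL/options combination.
console.log(cacheKey("https://example.com/article", { timeout: 30000 }));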

🛡️ Error Handling

All tools include comprehensive error handling:

  • Network errors: Graceful degradation with error messages

  • Timeout handling: Configurable timeouts

  • Partial failures: Batch operations continue on individual failures

  • Structured errors: Clear error codes and messages

  • Recovery: Automatic retries where appropriate

Example error response:

{
  "content": [
    {
      "type": "text",
      "text": "Error: Failed to fetch https://example.com: Timeout after 30000ms"
    }
  ],
  "structuredContent": {
    "error": "Network timeout",
    "url": "https://example.com",
    "code": "TIMEOUT"
  }
}
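
On the client side this means a failed call still resolves to a well-formed result rather than throwing, so handling reduces to inspecting the response. A minimal sketch, assuming the MCP TypeScript SDK client and the structuredContent fields from the example above (the isError flag comes from the MCP tool-result format):

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const client = new Client({ name: "error-demo", version: "1.0.0" });
await client.connect(
  new StdioClientTransport({ command: "node", args: ["dist/index.js"] })
);

const result = await client.callTool({
  name: "crawl_read",
  arguments: { url: "https://example.com", options: { timeout: 30000 } },
});

if (result.isError) {
  // Field names mirror the example error response above.
  const err = result.structuredContent as { error: string; url: string; code: string };
  console.error(`Crawl failed (${err.code}) for ${err.url}: ${err.error}`);
} else {
  console.log((result.content as Array<{ type: string; text: string }>)[0].text);
}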

🔐 Security

  • No API keys required for basic crawling

  • Respect robots.txt (configurable)

  • User agent rotation

  • Rate limiting (built-in via concurrency limits)

  • Input validation (Zod schemas)

  • Dependency scanning (npm audit, Snyk)

🌐 Transport Modes

STDIO (Default)

For local MCP clients:

node dist/index.js

HTTP

For remote access:

TRANSPORT=http PORT=3000 node dist/index.js

Server runs on: http://localhost:3000/mcp
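
A remote client then connects to that endpoint rather than spawning the process itself. A minimal sketch with the MCP TypeScript SDK's streamable HTTP client transport; the class name and import path match recent SDK releases and may need adjusting for other versions:

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const client = new Client({ name: "remote-crawler", version: "1.0.0" });
await client.connect(
  new StreamableHTTPClientTransport(new URL("http://localhost:3000/mcp"))
);

// Should list all 11 tools exposed by the server.
console.log(await client.listTools());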

📝 Configuration

Environment Variables

# Transport mode (stdio or http)
TRANSPORT=stdio

# HTTP port (when TRANSPORT=http)
PORT=3000

# Node environment
NODE_ENV=production

Tool Configuration

Each tool accepts an options object:

{
  "timeout": 30000,        // Request timeout (ms)
  "maxConcurrency": 5,     // Concurrent requests (1-20)
  "maxResults": 10,        // Limit results (1-50)
  "respectRobots": false,  // Respect robots.txt
  "sameOriginOnly": true   // Only same-origin URLs
}
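
Since inputs are validated with Zod (see Technology Stack), this options object maps naturally onto a schema. The schema below is illustrative only, reconstructed from the documented names, defaults, and ranges rather than copied from the server's source:

import { z } from "zod";

// Illustrative reconstruction of the documented option names, defaults, and ranges.
const crawlOptionsSchema = z.object({
  timeout: z.number().int().positive().default(30000),         // request timeout (ms)
  maxConcurrency: z.number().int().min(1).max(20).default(5),  // concurrent requests (1-20)
  maxResults: z.number().int().min(1).max(50).default(10),     // limit results (1-50)
  respectRobots: z.boolean().default(false),                   // respect robots.txt
  sameOriginOnly: z.boolean().default(true),                   // only same-origin URLs
});

type CrawlOptions = z.infer<typeof crawlOptionsSchema>;
// Missing fields are filled with their defaults.
const parsed: CrawlOptions = crawlOptionsSchema.parse({ timeout: 30000 });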

🤝 Contributing

  1. Fork the repository

  2. Create a feature branch: git checkout -b feature/amazing-feature

  3. Make changes and add tests

  4. Run tests: npm run test:ci

  5. Commit: git commit -m 'Add amazing feature'

  6. Push: git push origin feature/amazing-feature

  7. Open a Pull Request

Development Guidelines

  • Follow TypeScript strict mode

  • Add tests for new features

  • Update documentation

  • Run linting: npm run lint

  • Ensure CI passes

📄 License

MIT License - see LICENSE file

🙏 Acknowledgments

📞 Support

🚀 What's Next?

  • Add DuckDuckGo search support

  • Implement content filtering

  • Add screenshot capabilities

  • Support for authenticated content

  • PDF extraction

  • Real-time monitoring


Built with ❤️ using TypeScript, MCP, and modern web technologies.
