Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@ followed by the MCP server name and your instructions, e.g., "@crawl-mcp-server extract the content from https://en.wikipedia.org/wiki/Large_language_model"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
crawl-mcp-server
A comprehensive MCP (Model Context Protocol) server providing 11 powerful tools for web crawling and search. Transform web content into clean, LLM-optimized Markdown or search the web with SearXNG integration.
Features
SearXNG Web Search - Search the web with automatic browser management
4 Crawling Tools - Extract and convert web content to Markdown
Auto-Browser-Launch - Search tools automatically manage browser lifecycle
11 Total Tools - Complete toolkit for web interaction
Built-in Caching - SHA-256 based caching with graceful fallbacks
Concurrent Processing - Handle multiple URLs simultaneously (up to 50)
LLM-Optimized Output - Clean Markdown perfect for AI consumption
Robust Error Handling - Graceful failure with detailed error messages
Comprehensive Testing - Full CI/CD with performance benchmarks
Installation
Method 1: npm (Recommended)
Method 2: Direct from Git
Method 3: Clone and Build
Method 4: npx (No Installation)
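The original install commands are not preserved here; the sketch below shows what each method typically looks like, assuming the package is published to npm as crawl-mcp-server (substitute the real repository URL for the Git-based methods):

```bash
# Method 1: npm (recommended) — assumes the npm package name crawl-mcp-server
npm install -g crawl-mcp-server

# Method 2: direct from Git — replace <repo-url> with the actual repository URL
npm install -g git+<repo-url>

# Method 3: clone and build — the build script name is an assumption
git clone <repo-url>
cd crawl-mcp-server
npm install
npm run build

# Method 4: npx, no installation
npx -y crawl-mcp-server
```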
Setup for Claude Code
Option 1: Claude Desktop (Recommended)
Add to your Claude Desktop configuration file:
**macOS**: ~/Library/Application Support/Claude/claude_desktop_config.json
**Linux**: ~/.config/Claude/claude_desktop_config.json
**Windows**: %APPDATA%\Claude\claude_desktop_config.json
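The original JSON snippet is not preserved here; a minimal sketch of the entry, assuming the npm package name above:

```json
{
  "mcpServers": {
    "crawl-mcp-server": {
      "command": "npx",
      "args": ["-y", "crawl-mcp-server"]
    }
  }
}
```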
Option 2: Local Installation
If you've installed locally:
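If the package is installed globally via npm, the entry can point at the installed binary directly; a sketch (binary name assumed to match the package name):

```json
{
  "mcpServers": {
    "crawl-mcp-server": {
      "command": "crawl-mcp-server"
    }
  }
}
```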
Option 3: Custom Path
For a specific installation:
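For a server built from a clone, point the entry at the built entry file; the path below is a placeholder, and dist/index.js is an assumption about the build output:

```json
{
  "mcpServers": {
    "crawl-mcp-server": {
      "command": "node",
      "args": ["/path/to/crawl-mcp-server/dist/index.js"]
    }
  }
}
```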
After configuration, restart Claude Desktop.
Setup for Other MCP Clients
Claude CLI
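The original command is not preserved here; a sketch using the Claude Code CLI's MCP registration command, assuming the npm package name above (the server name is illustrative):

```bash
# Register the server with the Claude CLI
claude mcp add crawl-mcp -- npx -y crawl-mcp-server
```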
Zed Editor
Add to ~/.config/zed/settings.json:
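The exact schema varies between Zed versions, so treat this as a rough sketch of a context_servers entry and check Zed's MCP documentation for your version:

```json
{
  "context_servers": {
    "crawl-mcp-server": {
      "command": {
        "path": "npx",
        "args": ["-y", "crawl-mcp-server"]
      }
    }
  }
}
```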
VSCode with Copilot Chat
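VS Code reads MCP servers from a workspace .vscode/mcp.json file; a sketch, again assuming the npm package name above:

```json
{
  "servers": {
    "crawl-mcp-server": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "crawl-mcp-server"]
    }
  }
}
```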
Quick Start
Using MCP Inspector (Testing)
Development Mode
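The original commands are not preserved here; a sketch, assuming the npm package name above and a conventional dev script:

```bash
# Test the server interactively with the MCP Inspector
npx @modelcontextprotocol/inspector npx -y crawl-mcp-server

# Development mode from a local clone (script name is an assumption)
npm run dev
```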
Available Tools
Search Tools (7 tools)
1. search_searx
Search the web using SearXNG with automatic browser management.
Parameters:
query (string, required): Search query
maxResults (number, default: 20): Results to return (1-50)
category (enum, default: general): one of general, images, videos, news, map, music, it, science
timeRange (enum, optional): one of day, week, month, year
language (string, default: en): Language code
Returns: JSON with search results array, URLs, and metadata
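For example, an arguments object using only the parameters documented above:

```json
{
  "query": "model context protocol",
  "maxResults": 10,
  "category": "it",
  "timeRange": "month",
  "language": "en"
}
```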
2. launch_chrome_cdp
Launch system Chrome with remote debugging for advanced SearX usage.
Parameters:
headless (boolean, default: true): Run Chrome headless
port (number, default: 9222): Remote debugging port
userDataDir (string, optional): Custom Chrome profile
3. connect_cdp
Connect to remote CDP browser (Browserbase, etc.).
Parameters:
cdpWsUrl(string, required): CDP WebSocket URL or HTTP endpoint
4. launch_local
Launch bundled Chromium for SearX search.
Parameters:
headless (boolean, default: true): Run headless
userAgent (string, optional): Custom user agent
5. chrome_status
Check Chrome CDP status and health.
Returns: Running status, health, endpoint URL, and PID
6. close
Close browser session (keeps Chrome CDP running).
7. shutdown_chrome_cdp
Shutdown Chrome CDP and cleanup resources.
Crawling Tools (4 tools)
1. crawl_read (Simple & Fast)
Quick single-page extraction to Markdown.
Best for:
News articles
Blog posts
Documentation pages
Simple content extraction
Returns: Clean Markdown content
2. crawl_read_batch (Multiple URLs)
Process 1-50 URLs concurrently.
Best for:
Processing multiple articles
Building content aggregates
Bulk content extraction
Returns: Array of Markdown results with summary statistics
3. crawl_fetch_markdown
Controlled single-page extraction with full option control.
Best for:
Advanced crawling options
Custom timeout control
Detailed extraction
4. crawl_fetch
Multi-page crawling with intelligent link extraction.
Best for:
Crawling entire sites
Link-based discovery
Multi-page scraping
Features:
Extracts links from starting page
Crawls discovered pages
Concurrent processing
Same-origin filtering (configurable)
Usage Examples
Example 1: Search + Crawl Workflow
Example 2: Batch Content Extraction
Example 3: Site Crawling
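The original example transcripts are not preserved here; the prompts below illustrate each workflow using the tools documented above:

```text
Example 1: Search + Crawl Workflow
  "Use search_searx to find recent articles about large language models,
   then use crawl_read to extract the most relevant result as Markdown."

Example 2: Batch Content Extraction
  "Use crawl_read_batch on these five URLs and summarize each article."

Example 3: Site Crawling
  "Use crawl_fetch on https://example.com/docs with same-origin filtering
   to collect the documentation pages."
```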
Tool Selection Guide
| Use Case | Recommended Tool | Complexity |
| --- | --- | --- |
| Single article | crawl_read | Simple |
| Multiple articles | crawl_read_batch | Simple |
| Advanced options | crawl_fetch_markdown | Medium |
| Site crawling | crawl_fetch | Complex |
| Web search | search_searx | Simple |
| Research workflow | search_searx + crawl_read | Medium |
Architecture
Core Components
Technology Stack
Runtime: Node.js 18+
Language: TypeScript 5.7
Framework: MCP SDK (@modelcontextprotocol/sdk)
Crawling: @just-every/crawl
Browser: Playwright Core
Validation: Zod
Transport: STDIO (local) + HTTP (remote)
Data Flow
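The original diagram is not preserved here; a high-level sketch inferred from the components listed above:

```text
MCP client (Claude Desktop / CLI / Zed / VS Code)
  → STDIO or HTTP transport
  → MCP SDK request handling + Zod input validation
  → tool handler (search_searx, crawl_read, crawl_fetch, ...)
  → Playwright browser session or @just-every/crawl fetcher
  → HTML → Markdown conversion, with SHA-256 caching
  → structured result returned to the client
```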
Testing
Run Test Suite
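A sketch of the likely commands; only test:ci and lint are named elsewhere in this README, the rest are assumptions:

```bash
npm test           # unit tests (script name assumed)
npm run test:ci    # full CI suite (referenced under Contributing)
npm run lint       # linting (referenced under Development Guidelines)
```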
Test Coverage
All 11 tools tested
Error handling validated
Performance benchmarks
Integration workflows
Multi-Node support (Node 18, 20, 22)
CI/CD Pipeline
Development
Prerequisites
Node.js 18 or higher
npm or yarn
Setup
Development Commands
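A sketch of a typical local workflow; apart from test:ci and lint, which this README references, the script names are assumptions:

```bash
git clone <repo-url> && cd crawl-mcp-server
npm install          # install dependencies
npm run build        # compile TypeScript (assumed script)
npm run dev          # watch/development mode (assumed script)
npm run lint         # lint
npm run test:ci      # full test suite
```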
Project Structure
Performance
Benchmarks
| Operation | Avg Duration | Max Memory |
| --- | --- | --- |
| crawl_read | ~1500ms | 32MB |
| crawl_read_batch (2 URLs) | ~2500ms | 64MB |
| search_searx | ~4000ms | 128MB |
| crawl_fetch | ~2000ms | 48MB |
| tools/list | ~100ms | 8MB |
Performance Features
Concurrent request processing (up to 20)
Built-in caching (SHA-256)
Automatic timeout management
Memory optimization
Resource cleanup
Error Handling
All tools include comprehensive error handling:
Network errors: Graceful degradation with error messages
Timeout handling: Configurable timeouts
Partial failures: Batch operations continue on individual failures
Structured errors: Clear error codes and messages
Recovery: Automatic retries where appropriate
Example error response:
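The exact response shape is not preserved here; a hypothetical illustration of a structured error with a code and message:

```json
{
  "success": false,
  "error": {
    "code": "FETCH_TIMEOUT",
    "message": "Request to https://example.com timed out after 30000 ms"
  }
}
```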
Security
No API keys required for basic crawling
Respect robots.txt (configurable)
User agent rotation
Rate limiting (built-in via concurrency limits)
Input validation (Zod schemas)
Dependency scanning (npm audit, Snyk)
Transport Modes
STDIO (Default)
For local MCP clients:
HTTP
For remote access:
Server runs on: http://localhost:3000/mcp
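A sketch of the two modes; how HTTP mode is enabled is not preserved here, so the environment variables below are hypothetical placeholders:

```bash
# STDIO (default): the MCP client spawns the server itself
npx -y crawl-mcp-server

# HTTP (remote): hypothetical invocation — check the actual flag or env var
MCP_TRANSPORT=http PORT=3000 npx -y crawl-mcp-server
# then point the client at http://localhost:3000/mcp
```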
Configuration
Environment Variables
Tool Configuration
Each tool accepts an options object:
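The concrete option names are not preserved here; a hypothetical example based on the features this README describes (timeouts, concurrency limits, same-origin filtering, caching):

```json
{
  "timeout": 30000,
  "maxConcurrency": 10,
  "sameOrigin": true,
  "cache": true
}
```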
Contributing
Fork the repository
Create a feature branch: git checkout -b feature/amazing-feature
Make changes and add tests
Run tests: npm run test:ci
Commit: git commit -m 'Add amazing feature'
Push: git push origin feature/amazing-feature
Open a Pull Request
Development Guidelines
Follow TypeScript strict mode
Add tests for new features
Update documentation
Run linting: npm run lint
Ensure CI passes
License
MIT License - see LICENSE file
Acknowledgments
@just-every/crawl - Web crawling
Model Context Protocol - MCP specification
SearXNG - Search aggregator
Playwright - Browser automation
Support
Issues: GitHub Issues
Discussions: GitHub Discussions
Email: your-email@example.com
What's Next?
Add DuckDuckGo search support
Implement content filtering
Add screenshot capabilities
Support for authenticated content
PDF extraction
Real-time monitoring
Built with TypeScript, MCP, and modern web technologies.