crawl-mcp-server
A comprehensive MCP (Model Context Protocol) server providing 11 powerful tools for web crawling and search. Transform web content into clean, LLM-optimized Markdown or search the web with SearXNG integration.
Features
- SearXNG Web Search - Search the web with automatic browser management
- 4 Crawling Tools - Extract and convert web content to Markdown
- Auto-Browser-Launch - Search tools automatically manage browser lifecycle
- 11 Total Tools - Complete toolkit for web interaction
- Built-in Caching - SHA-256 based caching with graceful fallbacks
- Concurrent Processing - Handle multiple URLs simultaneously (up to 50)
- LLM-Optimized Output - Clean Markdown suited to AI consumption
- Robust Error Handling - Graceful failures with detailed error messages
- Comprehensive Testing - Full CI/CD with performance benchmarks
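The built-in caching noted above keys entries by a SHA-256 hash of the URL. A minimal sketch of the idea in TypeScript, using Node's `crypto` module (names and cache shape are hypothetical, not the server's actual internals):

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of SHA-256 keyed caching; the real server's
// cache structure and fallback behavior may differ.
const cache = new Map<string, string>();

// Hash the URL so every key has a fixed length and no unsafe characters.
function cacheKey(url: string): string {
  return createHash("sha256").update(url).digest("hex");
}

function putCached(url: string, markdown: string): void {
  cache.set(cacheKey(url), markdown);
}

function getCached(url: string): string | undefined {
  return cache.get(cacheKey(url));
}
```

A real implementation would also add expiry and a size bound, but hashing the URL is what makes lookups cheap and collision-safe in practice.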
Installation
Method 1: npm (Recommended)
npm install crawl-mcp-server

Method 2: Direct from Git
# Install latest from GitHub
npm install git+https://github.com/Git-Fg/searchcrawl-mcp-server.git
# Or specific branch
npm install git+https://github.com/Git-Fg/searchcrawl-mcp-server.git#main
# Or from a fork
npm install git+https://github.com/YOUR_FORK/searchcrawl-mcp-server.git

Method 3: Clone and Build
git clone https://github.com/Git-Fg/searchcrawl-mcp-server.git
cd crawl-mcp-server
npm install
npm run build

Method 4: npx (No Installation)
# Run directly without installing
npx git+https://github.com/Git-Fg/searchcrawl-mcp-server.git

Setup for Claude Code
Option 1: Claude Desktop (Recommended)
Add to your Claude Desktop configuration file:
**macOS/Linux:** ~/.config/claude/claude_desktop_config.json
{
"mcpServers": {
"crawl-server": {
"command": "npx",
"args": [
"git+https://github.com/Git-Fg/searchcrawl-mcp-server.git"
],
"env": {
"NODE_ENV": "production"
}
}
}
}

**Windows:** %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"crawl-server": {
"command": "npx",
"args": [
"git+https://github.com/Git-Fg/searchcrawl-mcp-server.git"
],
"env": {
"NODE_ENV": "production"
}
}
}
}

Option 2: Local Installation
If you've installed locally:
{
"mcpServers": {
"crawl-server": {
"command": "node",
"args": [
"/path/to/crawl-mcp-server/dist/index.js"
],
"env": {}
}
}
}

Option 3: Custom Path
For a specific installation:
{
"mcpServers": {
"crawl-server": {
"command": "node",
"args": [
"/usr/local/lib/node_modules/crawl-mcp-server/dist/index.js"
],
"env": {}
}
}
}

After configuration, restart Claude Desktop.
Setup for Other MCP Clients
Claude CLI
# Using npx
claude mcp add crawl-server npx git+https://github.com/Git-Fg/searchcrawl-mcp-server.git
# Using local installation
claude mcp add crawl-server node /path/to/crawl-mcp-server/dist/index.js

Zed Editor
Add to ~/.config/zed/settings.json:
{
"assistant": {
"mcp": {
"servers": {
"crawl-server": {
"command": "node",
"args": ["/path/to/crawl-mcp-server/dist/index.js"]
}
}
}
}
}

VSCode with Copilot Chat
{
"mcpServers": {
"crawl-server": {
"command": "node",
"args": ["/path/to/crawl-mcp-server/dist/index.js"]
}
}
}

Quick Start
Using MCP Inspector (Testing)
# Install MCP Inspector globally
npm install -g @modelcontextprotocol/inspector
# Run the server
node dist/index.js
# In another terminal, test tools
npx @modelcontextprotocol/inspector --cli node dist/index.js --method tools/list

Development Mode
# Watch mode (auto-rebuild on changes)
npm run dev
# Build TypeScript
npm run build
# Run tests
npm run test:run

Available Tools
Search Tools (7 tools)
1. search_searx
Search the web using SearXNG with automatic browser management.
// Example call
{
"query": "TypeScript MCP server",
"maxResults": 10,
"category": "general",
"timeRange": "week",
"language": "en"
}

Parameters:
- query (string, required): Search query
- maxResults (number, default: 20): Results to return (1-50)
- category (enum, default: general): one of general, images, videos, news, map, music, it, science
- timeRange (enum, optional): one of day, week, month, year
- language (string, default: en): Language code
Returns: JSON with search results array, URLs, and metadata
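Downstream tools usually only need the URLs from that payload. A small helper could pull them out, assuming the results arrive as an array of `{ url, title }` objects (the exact shape is an assumption; check the actual search_searx response):

```typescript
// Hypothetical result shape; verify against the real search_searx payload.
interface SearchResult {
  url: string;
  title?: string;
}

// Take the first `limit` URLs from a search result set, e.g. to feed
// them into crawl_read_batch afterwards.
function extractUrls(results: SearchResult[], limit = 20): string[] {
  return results.slice(0, limit).map((r) => r.url);
}
```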
2. launch_chrome_cdp
Launch system Chrome with remote debugging for advanced SearX usage.
{
"headless": true,
"port": 9222,
"userDataDir": "/path/to/profile"
}

Parameters:
- headless (boolean, default: true): Run Chrome headless
- port (number, default: 9222): Remote debugging port
- userDataDir (string, optional): Custom Chrome profile
3. connect_cdp
Connect to remote CDP browser (Browserbase, etc.).
{
"cdpWsUrl": "http://localhost:9222"
}

Parameters:
- cdpWsUrl (string, required): CDP WebSocket URL or HTTP endpoint
4. launch_local
Launch bundled Chromium for SearX search.
{
"headless": true,
"userAgent": "custom user agent string"
}

Parameters:
- headless (boolean, default: true): Run headless
- userAgent (string, optional): Custom user agent
5. chrome_status
Check Chrome CDP status and health.
{}

Returns: Running status, health, endpoint URL, and PID
6. close
Close browser session (keeps Chrome CDP running).
{}

7. shutdown_chrome_cdp
Shutdown Chrome CDP and cleanup resources.
{}

Crawling Tools (4 tools)
1. crawl_read (Simple & Fast)
Quick single-page extraction to Markdown.
{
"url": "https://example.com/article",
"options": {
"timeout": 30000
}
}

Best for:
- News articles
- Blog posts
- Documentation pages
- Simple content extraction
Returns: Clean Markdown content
2. crawl_read_batch (Multiple URLs)
Process 1-50 URLs concurrently.
{
"urls": [
"https://example.com/article1",
"https://example.com/article2",
"https://example.com/article3"
],
"options": {
"maxConcurrency": 5,
"timeout": 30000,
"maxResults": 10
}
}

Best for:
- Processing multiple articles
- Building content aggregates
- Bulk content extraction
Returns: Array of Markdown results with summary statistics
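The maxConcurrency behavior can be sketched as a small worker pool: at most `limit` items are in flight at once, and results keep input order. This is an illustrative pattern, not the server's actual implementation:

```typescript
// Hypothetical sketch of the maxConcurrency worker-pool pattern.
// Workers pull the next index from a shared cursor until items run out.
async function mapConcurrent<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // shared cursor; safe because JS runs single-threaded

  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }

  const workers = Array.from(
    { length: Math.max(1, Math.min(limit, items.length)) },
    () => worker(),
  );
  await Promise.all(workers);
  return results;
}
```

A caller would invoke it as `mapConcurrent(urls, 5, fetchMarkdown)`; failed items can be represented as error values so one bad URL doesn't abort the batch.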
3. crawl_fetch_markdown
Controlled single-page extraction with full option control.
{
"url": "https://example.com/article",
"options": {
"timeout": 30000
}
}

Best for:
- Advanced crawling options
- Custom timeout control
- Detailed extraction
4. crawl_fetch
Multi-page crawling with intelligent link extraction.
{
"url": "https://example.com",
"options": {
"pages": 5,
"maxConcurrency": 3,
"sameOriginOnly": true,
"timeout": 30000,
"maxResults": 20
}
}

Best for:
- Crawling entire sites
- Link-based discovery
- Multi-page scraping

Features:
- Extracts links from the starting page
- Crawls discovered pages
- Concurrent processing
- Same-origin filtering (configurable)
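The same-origin filtering step could look like this sketch using the WHATWG URL API (the helper name is hypothetical):

```typescript
// Keep only links on the same origin as the starting URL when
// sameOriginOnly is set; unparsable links are dropped during filtering.
function filterLinks(
  startUrl: string,
  links: string[],
  sameOriginOnly: boolean,
): string[] {
  const origin = new URL(startUrl).origin;
  return links.filter((link) => {
    try {
      // Relative links resolve against the starting URL, so they count
      // as same-origin.
      return !sameOriginOnly || new URL(link, startUrl).origin === origin;
    } catch {
      return false;
    }
  });
}
```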
Usage Examples
Example 1: Search + Crawl Workflow
// Step 1: Search for topics
{
"tool": "search_searx",
"arguments": {
"query": "TypeScript best practices 2024",
"maxResults": 5
}
}
// Step 2: Extract URLs from results
// (Parse the search results to get URLs)
// Step 3: Crawl selected articles
{
"tool": "crawl_read_batch",
"arguments": {
"urls": [
"https://example.com/article1",
"https://example.com/article2",
"https://example.com/article3"
]
}
}

Example 2: Batch Content Extraction
{
"tool": "crawl_read_batch",
"arguments": {
"urls": [
"https://news.site/article1",
"https://news.site/article2",
"https://news.site/article3"
],
"options": {
"maxConcurrency": 10,
"timeout": 30000,
"maxResults": 3
}
}
}

Example 3: Site Crawling
{
"tool": "crawl_fetch",
"arguments": {
"url": "https://docs.example.com",
"options": {
"pages": 10,
"maxConcurrency": 5,
"sameOriginOnly": true,
"timeout": 30000,
"maxResults": 10
}
}
}

Tool Selection Guide

| Use Case | Recommended Tool | Complexity |
| --- | --- | --- |
| Single article | crawl_read | Simple |
| Multiple articles | crawl_read_batch | Simple |
| Advanced options | crawl_fetch_markdown | Medium |
| Site crawling | crawl_fetch | Complex |
| Web search | search_searx | Simple |
| Research workflow | search_searx + crawl_read_batch | Medium |
Architecture
Core Components

┌─────────────────────────────────────────┐
│            crawl-mcp-server             │
├─────────────────────────────────────────┤
│   ┌──────────────────────────────┐      │
│   │ MCP Server Core              │      │
│   │ - 11 registered tools        │      │
│   │ - STDIO/HTTP transport       │      │
│   └──────────────────────────────┘      │
│                 │                       │
│   ┌──────────────────────────────┐      │
│   │ @just-every/crawl            │      │
│   │ - HTML → Markdown            │      │
│   │ - Mozilla Readability        │      │
│   │ - Concurrent crawling        │      │
│   └──────────────────────────────┘      │
│                 │                       │
│   ┌──────────────────────────────┐      │
│   │ Playwright (Browser)         │      │
│   │ - SearXNG integration        │      │
│   │ - Auto browser management    │      │
│   │ - Anti-detection             │      │
│   └──────────────────────────────┘      │
└─────────────────────────────────────────┘

Technology Stack
Runtime: Node.js 18+
Language: TypeScript 5.7
Framework: MCP SDK (@modelcontextprotocol/sdk)
Crawling: @just-every/crawl
Browser: Playwright Core
Validation: Zod
Transport: STDIO (local) + HTTP (remote)
Data Flow
Client Request
      ↓
 MCP Protocol
      ↓
 Tool Handler
      ↓
┌─────────────────────┐
│    Crawl/Search     │
│ @just-every/crawl   │ → HTML content
│ or SearXNG          │ → Search results
└─────────────────────┘
      ↓
HTML → Markdown
      ↓
Result Formatting
      ↓
 MCP Response
      ↓
   Client

Testing
Run Test Suite
# All unit tests
npm run test:run
# Performance benchmarks
npm run test:performance
# Full CI suite
npm run test:ci
# Individual tool test
npx @modelcontextprotocol/inspector --cli node dist/index.js \
--method tools/call \
--tool-name crawl_read \
--tool-arg url="https://example.com"

Test Coverage
- All 11 tools tested
- Error handling validated
- Performance benchmarks
- Integration workflows
- Multi-Node support (Node 18, 20, 22)
CI/CD Pipeline
┌─────────────────────────────────────┐
│           GitHub Actions            │
├─────────────────────────────────────┤
│ 1. Test (Matrix: Node 18, 20, 22)   │
│ 2. Integration Tests (PR only)      │
│ 3. Performance Tests (main)         │
│ 4. Security Scan                    │
│ 5. Coverage Report                  │
└─────────────────────────────────────┘

Development
Prerequisites
Node.js 18 or higher
npm or yarn
Setup
# Clone the repository
git clone https://github.com/Git-Fg/searchcrawl-mcp-server.git
cd crawl-mcp-server
# Install dependencies
npm install
# Build TypeScript
npm run build
# Run in development mode (watch)
npm run dev

Development Commands
# Build project
npm run build
# Watch mode (auto-rebuild)
npm run dev
# Run tests
npm run test:run
# Lint code
npm run lint
# Type check
npm run typecheck
# Clean build artifacts
npm run clean

Project Structure

crawl-mcp-server/
├── src/
│   ├── index.ts           # Main server (11 tools)
│   ├── types.ts           # TypeScript interfaces
│   └── cdp.ts             # Chrome CDP manager
├── test/
│   ├── run-tests.ts       # Unit test suite
│   ├── performance.ts     # Performance tests
│   └── config.ts          # Test configuration
├── dist/                  # Compiled JavaScript
├── .github/workflows/     # CI/CD pipeline
└── package.json

Performance
Benchmarks
| Operation | Avg Duration | Max Memory |
| --- | --- | --- |
| crawl_read | ~1500ms | 32MB |
| crawl_read_batch (2 URLs) | ~2500ms | 64MB |
| search_searx | ~4000ms | 128MB |
| crawl_fetch | ~2000ms | 48MB |
| tools/list | ~100ms | 8MB |
Performance Features
- Concurrent request processing (up to 20)
- Built-in caching (SHA-256)
- Automatic timeout management
- Memory optimization
- Resource cleanup
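The documented option ranges (concurrency up to 20, results up to 50) imply the server clamps caller-supplied values. A hypothetical sketch of defaults-plus-clamping (names and defaults are illustrative, not the server's actual code):

```typescript
// Hypothetical option shape mirroring the documented ranges.
interface CrawlOptions {
  timeout: number;
  maxConcurrency: number;
  maxResults: number;
}

const DEFAULTS: CrawlOptions = { timeout: 30000, maxConcurrency: 5, maxResults: 10 };

// Constrain a value to a closed interval.
function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Merge caller options over defaults, then enforce the documented limits.
function resolveOptions(partial: Partial<CrawlOptions> = {}): CrawlOptions {
  const opts = { ...DEFAULTS, ...partial };
  return {
    timeout: opts.timeout,
    maxConcurrency: clamp(opts.maxConcurrency, 1, 20),
    maxResults: clamp(opts.maxResults, 1, 50),
  };
}
```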
Error Handling
All tools include comprehensive error handling:
- Network errors: Graceful degradation with error messages
- Timeout handling: Configurable timeouts
- Partial failures: Batch operations continue on individual failures
- Structured errors: Clear error codes and messages
- Recovery: Automatic retries where appropriate
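The retry behavior above can be sketched as exponential backoff around an async operation; the attempt count and delays here are illustrative, not the server's actual policy:

```typescript
// Hypothetical retry-with-backoff wrapper: retry `fn` up to `attempts`
// times, doubling the delay after each failure, and rethrow the last
// error if every attempt fails.
async function withRetries<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Backoff: baseDelayMs, 2x, 4x, ...
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```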
Example error response:
{
"content": [
{
"type": "text",
"text": "Error: Failed to fetch https://example.com: Timeout after 30000ms"
}
],
"structuredContent": {
"error": "Network timeout",
"url": "https://example.com",
"code": "TIMEOUT"
}
}

Security
- No API keys required for basic crawling
- Respects robots.txt (configurable)
- User agent rotation
- Rate limiting (built in via concurrency limits)
- Input validation (Zod schemas)
- Dependency scanning (npm audit, Snyk)
Transport Modes
STDIO (Default)
For local MCP clients:
node dist/index.js

HTTP
For remote access:
TRANSPORT=http PORT=3000 node dist/index.js

Server runs on: http://localhost:3000/mcp
Configuration
Environment Variables
# Transport mode (stdio or http)
TRANSPORT=stdio
# HTTP port (when TRANSPORT=http)
PORT=3000
# Node environment
NODE_ENV=production

Tool Configuration
Each tool accepts an options object:
{
"timeout": 30000, // Request timeout (ms)
"maxConcurrency": 5, // Concurrent requests (1-20)
"maxResults": 10, // Limit results (1-50)
"respectRobots": false, // Respect robots.txt
"sameOriginOnly": true // Only same-origin URLs
}

Contributing
1. Fork the repository
2. Create a feature branch: git checkout -b feature/amazing-feature
3. Make changes and add tests
4. Run tests: npm run test:ci
5. Commit: git commit -m 'Add amazing feature'
6. Push: git push origin feature/amazing-feature
7. Open a Pull Request
Development Guidelines
- Follow TypeScript strict mode
- Add tests for new features
- Update documentation
- Run linting: npm run lint
- Ensure CI passes
License
MIT License - see LICENSE file
Acknowledgments
@just-every/crawl - Web crawling
Model Context Protocol - MCP specification
SearXNG - Search aggregator
Playwright - Browser automation
Support
Issues: GitHub Issues
Discussions: GitHub Discussions
Email: your-email@example.com
What's Next?
- Add DuckDuckGo search support
- Implement content filtering
- Add screenshot capabilities
- Support for authenticated content
- PDF extraction
- Real-time monitoring
Built with ❤️ using TypeScript, MCP, and modern web technologies.