Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@crawl-mcp-server extract the content from https://en.wikipedia.org/wiki/Large_language_model".
That's it! The server will respond to your query, and you can continue using it as needed.
crawl-mcp-server
A comprehensive MCP (Model Context Protocol) server providing 11 powerful tools for web crawling and search. Transform web content into clean, LLM-optimized Markdown or search the web with SearXNG integration.
Features
SearXNG Web Search - Search the web with automatic browser management
4 Crawling Tools - Extract and convert web content to Markdown
Auto Browser Launch - Search tools automatically manage the browser lifecycle
11 Total Tools - Complete toolkit for web interaction
Built-in Caching - SHA-256 based caching with graceful fallbacks (see the sketch after this list)
Concurrent Processing - Handle multiple URLs simultaneously (up to 50)
LLM-Optimized Output - Clean Markdown suited to AI consumption
Robust Error Handling - Graceful failure with detailed error messages
Comprehensive Testing - Full CI/CD with performance benchmarks
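The Built-in Caching bullet refers to SHA-256-keyed lookups. The sketch below is a hypothetical, simplified illustration of that idea (an in-memory Map keyed by the SHA-256 hash of the URL, with a graceful fallback when caching fails); it is not the server's actual implementation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical illustration of SHA-256-keyed caching; not the server's actual code.
const cache = new Map<string, string>();

function cacheKey(url: string): string {
  // Key crawled pages by the SHA-256 of their URL so repeat requests skip the network.
  return createHash("sha256").update(url).digest("hex");
}

export function getCached(url: string): string | undefined {
  return cache.get(cacheKey(url));
}

export function putCached(url: string, markdown: string): void {
  try {
    cache.set(cacheKey(url), markdown);
  } catch {
    // Graceful fallback: a caching failure should never fail the crawl itself.
  }
}
```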
Installation
Method 1: npm (Recommended)
npm install crawl-mcp-server
Method 2: Direct from Git
# Install latest from GitHub
npm install git+https://github.com/Git-Fg/searchcrawl-mcp-server.git
# Or specific branch
npm install git+https://github.com/Git-Fg/searchcrawl-mcp-server.git#main
# Or from a fork
npm install git+https://github.com/YOUR_FORK/searchcrawl-mcp-server.git
Method 3: Clone and Build
git clone https://github.com/Git-Fg/searchcrawl-mcp-server.git
cd crawl-mcp-server
npm install
npm run build
Method 4: npx (No Installation)
# Run directly without installing
npx git+https://github.com/Git-Fg/searchcrawl-mcp-server.git
Setup for Claude Code
Option 1: Claude Desktop (Recommended)
Add to your Claude Desktop configuration file:
macOS/Linux: ~/.config/claude/claude_desktop_config.json
{
"mcpServers": {
"crawl-server": {
"command": "npx",
"args": [
"git+https://github.com/Git-Fg/searchcrawl-mcp-server.git"
],
"env": {
"NODE_ENV": "production"
}
}
}
}
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"crawl-server": {
"command": "npx",
"args": [
"git+https://github.com/Git-Fg/searchcrawl-mcp-server.git"
],
"env": {
"NODE_ENV": "production"
}
}
}
}
Option 2: Local Installation
If you've installed locally:
{
"mcpServers": {
"crawl-server": {
"command": "node",
"args": [
"/path/to/crawl-mcp-server/dist/index.js"
],
"env": {}
}
}
}
Option 3: Custom Path
For a specific installation:
{
"mcpServers": {
"crawl-server": {
"command": "node",
"args": [
"/usr/local/lib/node_modules/crawl-mcp-server/dist/index.js"
],
"env": {}
}
}
}
After configuration, restart Claude Desktop.
Setup for Other MCP Clients
Claude CLI
# Using npx
claude mcp add crawl-server npx git+https://github.com/Git-Fg/searchcrawl-mcp-server.git
# Using local installation
claude mcp add crawl-server node /path/to/crawl-mcp-server/dist/index.js
Zed Editor
Add to ~/.config/zed/settings.json:
{
"assistant": {
"mcp": {
"servers": {
"crawl-server": {
"command": "node",
"args": ["/path/to/crawl-mcp-server/dist/index.js"]
}
}
}
}
}
VSCode with Copilot Chat
{
"mcpServers": {
"crawl-server": {
"command": "node",
"args": ["/path/to/crawl-mcp-server/dist/index.js"]
}
}
}
Quick Start
Using MCP Inspector (Testing)
# Install MCP Inspector globally
npm install -g @modelcontextprotocol/inspector
# Run the server
node dist/index.js
# In another terminal, test tools
npx @modelcontextprotocol/inspector --cli node dist/index.js --method tools/list
Development Mode
# Watch mode (auto-rebuild on changes)
npm run dev
# Build TypeScript
npm run build
# Run tests
npm run test:run
Available Tools
Search Tools (7 tools)
1. search_searx
Search the web using SearXNG with automatic browser management.
// Example call
{
"query": "TypeScript MCP server",
"maxResults": 10,
"category": "general",
"timeRange": "week",
"language": "en"
}
Parameters:
query (string, required): Search query
maxResults (number, default: 20): Results to return (1-50)
category (enum, default: general): one of general, images, videos, news, map, music, it, science
timeRange (enum, optional): one of day, week, month, year
language (string, default: en): Language code
Returns: JSON with search results array, URLs, and metadata
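If you are driving the server from your own code rather than a chat client, a call to search_searx might look like the sketch below, using the MCP TypeScript SDK's standard STDIO client. The tool name and arguments are taken from this README; the import paths and client API should be checked against the SDK version you install.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function main() {
  // Spawn the server over STDIO; adjust the path to your local build.
  const transport = new StdioClientTransport({ command: "node", args: ["dist/index.js"] });
  const client = new Client({ name: "example-client", version: "1.0.0" });
  await client.connect(transport);

  // Call search_searx with the parameters documented above.
  const result = await client.callTool({
    name: "search_searx",
    arguments: { query: "TypeScript MCP server", maxResults: 10, category: "general" },
  });
  console.log(JSON.stringify(result, null, 2));

  await client.close();
}

main().catch(console.error);
```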
2. launch_chrome_cdp
Launch system Chrome with remote debugging for advanced SearX usage.
{
"headless": true,
"port": 9222,
"userDataDir": "/path/to/profile"
}
Parameters:
headless (boolean, default: true): Run Chrome headless
port (number, default: 9222): Remote debugging port
userDataDir (string, optional): Custom Chrome profile
3. connect_cdp
Connect to remote CDP browser (Browserbase, etc.).
{
"cdpWsUrl": "http://localhost:9222"
}
Parameters:
cdpWsUrl (string, required): CDP WebSocket URL or HTTP endpoint
4. launch_local
Launch bundled Chromium for SearX search.
{
"headless": true,
"userAgent": "custom user agent string"
}
Parameters:
headless (boolean, default: true): Run headless
userAgent (string, optional): Custom user agent
5. chrome_status
Check Chrome CDP status and health.
{}
Returns: Running status, health, endpoint URL, and PID
6. close
Close browser session (keeps Chrome CDP running).
{}
7. shutdown_chrome_cdp
Shutdown Chrome CDP and cleanup resources.
{}
Crawling Tools (4 tools)
1. crawl_read (Simple & Fast)
Quick single-page extraction to Markdown.
{
"url": "https://example.com/article",
"options": {
"timeout": 30000
}
}
Best for:
News articles
Blog posts
Documentation pages
Simple content extraction
Returns: Clean Markdown content
2. crawl_read_batch (Multiple URLs)
Process 1-50 URLs concurrently.
{
"urls": [
"https://example.com/article1",
"https://example.com/article2",
"https://example.com/article3"
],
"options": {
"maxConcurrency": 5,
"timeout": 30000,
"maxResults": 10
}
}
Best for:
Processing multiple articles
Building content aggregates
Bulk content extraction
Returns: Array of Markdown results with summary statistics
3. crawl_fetch_markdown
Controlled single-page extraction with full option control.
{
"url": "https://example.com/article",
"options": {
"timeout": 30000
}
}
Best for:
Advanced crawling options
Custom timeout control
Detailed extraction
4. crawl_fetch
Multi-page crawling with intelligent link extraction.
{
"url": "https://example.com",
"options": {
"pages": 5,
"maxConcurrency": 3,
"sameOriginOnly": true,
"timeout": 30000,
"maxResults": 20
}
}
Best for:
Crawling entire sites
Link-based discovery
Multi-page scraping
Features:
Extracts links from starting page
Crawls discovered pages
Concurrent processing
Same-origin filtering (configurable)
Usage Examples
Example 1: Search + Crawl Workflow
// Step 1: Search for topics
{
"tool": "search_searx",
"arguments": {
"query": "TypeScript best practices 2024",
"maxResults": 5
}
}
// Step 2: Extract URLs from results
// (Parse the search results to get URLs)
// Step 3: Crawl selected articles
{
"tool": "crawl_read_batch",
"arguments": {
"urls": [
"https://example.com/article1",
"https://example.com/article2",
"https://example.com/article3"
]
}
}
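Step 2 above ("parse the search results to get URLs") can be wired up programmatically. The sketch below chains the two tool calls with the MCP TypeScript SDK; the shape of the search payload (a results array with url fields) is an assumption, so adapt the parsing to what the server actually returns.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function researchWorkflow(topic: string) {
  const client = new Client({ name: "workflow-example", version: "1.0.0" });
  await client.connect(new StdioClientTransport({ command: "node", args: ["dist/index.js"] }));

  // Step 1: search for the topic.
  const search = await client.callTool({
    name: "search_searx",
    arguments: { query: topic, maxResults: 5 },
  });

  // Step 2: pull URLs out of the search results (field names are illustrative).
  const results = (search as any).structuredContent?.results ?? [];
  const urls: string[] = results.map((r: any) => r.url).filter(Boolean);

  // Step 3: crawl the selected articles in one batch.
  const crawled = await client.callTool({
    name: "crawl_read_batch",
    arguments: { urls, options: { maxConcurrency: 5, timeout: 30000 } },
  });

  await client.close();
  return crawled;
}

researchWorkflow("TypeScript best practices 2024").catch(console.error);
```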
Example 2: Batch Content Extraction
{
"tool": "crawl_read_batch",
"arguments": {
"urls": [
"https://news.site/article1",
"https://news.site/article2",
"https://news.site/article3"
],
"options": {
"maxConcurrency": 10,
"timeout": 30000,
"maxResults": 3
}
}
}
Example 3: Site Crawling
{
"tool": "crawl_fetch",
"arguments": {
"url": "https://docs.example.com",
"options": {
"pages": 10,
"maxConcurrency": 5,
"sameOriginOnly": true,
"timeout": 30000,
"maxResults": 10
}
}
}
Tool Selection Guide
Use Case | Recommended Tool | Complexity
Single article | crawl_read | Simple
Multiple articles | crawl_read_batch | Simple
Advanced options | crawl_fetch_markdown | Medium
Site crawling | crawl_fetch | Complex
Web search | search_searx | Simple
Research workflow | search_searx + crawl_read_batch | Medium
Architecture
Core Components

crawl-mcp-server
  MCP Server Core
    - 11 registered tools
    - STDIO/HTTP transport
        ↓
  @just-every/crawl
    - HTML → Markdown
    - Mozilla Readability
    - Concurrent crawling
        ↓
  Playwright (Browser)
    - SearXNG integration
    - Auto browser management
    - Anti-detection

Technology Stack
Runtime: Node.js 18+
Language: TypeScript 5.7
Framework: MCP SDK (@modelcontextprotocol/sdk)
Crawling: @just-every/crawl
Browser: Playwright Core
Validation: Zod
Transport: STDIO (local) + HTTP (remote)
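To make the "MCP Server Core" layer concrete, the sketch below shows how a single tool is typically registered with the MCP SDK and exposed over STDIO. It is a minimal illustration under the stack listed above, not the project's actual src/index.ts.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Hypothetical, trimmed-down registration of a single tool; the real server
// registers 11 tools and delegates crawling to @just-every/crawl.
const server = new McpServer({ name: "crawl-mcp-server", version: "1.0.0" });

server.tool(
  "crawl_read",
  {
    url: z.string().url(),
    options: z.object({ timeout: z.number().optional() }).optional(),
  },
  async ({ url, options }) => {
    // Placeholder body: fetching and HTML-to-Markdown conversion would happen here.
    const timeout = options?.timeout ?? 30000;
    return {
      content: [{ type: "text", text: `Would crawl ${url} with a ${timeout}ms timeout` }],
    };
  }
);

async function main() {
  // STDIO transport matches the default mode documented below.
  await server.connect(new StdioServerTransport());
}

main().catch(console.error);
```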
Data Flow
Client Request
  ↓
MCP Protocol
  ↓
Tool Handler
  ↓
Crawl / Search: @just-every/crawl (HTML content) or SearXNG (search results)
  ↓
HTML → Markdown
  ↓
Result Formatting
  ↓
MCP Response
  ↓
Client

Testing
Run Test Suite
# All unit tests
npm run test:run
# Performance benchmarks
npm run test:performance
# Full CI suite
npm run test:ci
# Individual tool test
npx @modelcontextprotocol/inspector --cli node dist/index.js \
--method tools/call \
--tool-name crawl_read \
--tool-arg url="https://example.com"
Test Coverage
All 11 tools tested
Error handling validated
Performance benchmarks
Integration workflows
Multi-Node support (Node 18, 20, 22)
CI/CD Pipeline
GitHub Actions
1. Test (Matrix: Node 18, 20, 22)
2. Integration Tests (PR only)
3. Performance Tests (main)
4. Security Scan
5. Coverage Report

Development
Prerequisites
Node.js 18 or higher
npm or yarn
Setup
# Clone the repository
git clone https://github.com/Git-Fg/searchcrawl-mcp-server.git
cd crawl-mcp-server
# Install dependencies
npm install
# Build TypeScript
npm run build
# Run in development mode (watch)
npm run dev
Development Commands
# Build project
npm run build
# Watch mode (auto-rebuild)
npm run dev
# Run tests
npm run test:run
# Lint code
npm run lint
# Type check
npm run typecheck
# Clean build artifacts
npm run clean
Project Structure
crawl-mcp-server/
├── src/
│   ├── index.ts          # Main server (11 tools)
│   ├── types.ts          # TypeScript interfaces
│   └── cdp.ts            # Chrome CDP manager
├── test/
│   ├── run-tests.ts      # Unit test suite
│   ├── performance.ts    # Performance tests
│   └── config.ts         # Test configuration
├── dist/                 # Compiled JavaScript
├── .github/workflows/    # CI/CD pipeline
└── package.json

Performance
Benchmarks
Operation | Avg Duration | Max Memory
crawl_read | ~1500ms | 32MB
crawl_read_batch (2 URLs) | ~2500ms | 64MB
search_searx | ~4000ms | 128MB
crawl_fetch | ~2000ms | 48MB
tools/list | ~100ms | 8MB
Performance Features
Concurrent request processing (up to 20)
Built-in caching (SHA-256)
Automatic timeout management
Memory optimization
Resource cleanup
Error Handling
All tools include comprehensive error handling:
Network errors: Graceful degradation with error messages
Timeout handling: Configurable timeouts
Partial failures: Batch operations continue on individual failures
Structured errors: Clear error codes and messages
Recovery: Automatic retries where appropriate
Example error response:
{
"content": [
{
"type": "text",
"text": "Error: Failed to fetch https://example.com: Timeout after 30000ms"
}
],
"structuredContent": {
"error": "Network timeout",
"url": "https://example.com",
"code": "TIMEOUT"
}
}
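On the client side, a response like the one above can be handled by checking structuredContent before falling back to the text blocks. The field and code names in this sketch follow the example response and are illustrative only.

```typescript
// Sketch of handling the structured error shown above on the client side.
type CrawlToolResult = {
  content: Array<{ type: string; text?: string }>;
  structuredContent?: { error?: string; url?: string; code?: string };
};

function handleCrawlResult(result: CrawlToolResult): string {
  if (result.structuredContent?.code === "TIMEOUT") {
    // Timeouts are usually retryable: back off, or raise the tool's timeout option.
    return `Timed out fetching ${result.structuredContent.url ?? "unknown URL"}; retrying may help.`;
  }
  if (result.structuredContent?.error) {
    return `Crawl failed: ${result.structuredContent.error}`;
  }
  // Success path: the Markdown lives in the text content blocks.
  return result.content
    .filter((block) => block.type === "text" && typeof block.text === "string")
    .map((block) => block.text as string)
    .join("\n");
}
```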
Security
No API keys required for basic crawling
Respect robots.txt (configurable)
User agent rotation
Rate limiting (built-in via concurrency limits)
Input validation (Zod schemas; see the sketch after this list)
Dependency scanning (npm audit, Snyk)
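As a rough illustration of the Zod-based input validation noted above, the schema below mirrors the fields documented in the Tool Configuration section; the server's real schemas in src/index.ts may differ.

```typescript
import { z } from "zod";

// Illustrative options schema; ranges follow the Tool Configuration section below,
// but the server's actual schemas may differ.
const crawlOptionsSchema = z.object({
  timeout: z.number().int().positive().default(30000),        // Request timeout (ms)
  maxConcurrency: z.number().int().min(1).max(20).default(5), // Concurrent requests
  maxResults: z.number().int().min(1).max(50).default(10),    // Limit results
  respectRobots: z.boolean().default(false),                  // Respect robots.txt
  sameOriginOnly: z.boolean().default(true),                  // Only same-origin URLs
});

// Invalid input is rejected before any network request is made.
const parsed = crawlOptionsSchema.safeParse({ timeout: 30000, maxConcurrency: 5 });
if (!parsed.success) {
  console.error(parsed.error.issues);
}
```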
Transport Modes
STDIO (Default)
For local MCP clients:
node dist/index.js
HTTP
For remote access:
TRANSPORT=http PORT=3000 node dist/index.js
Server runs on: http://localhost:3000/mcp
Configuration
Environment Variables
# Transport mode (stdio or http)
TRANSPORT=stdio
# HTTP port (when TRANSPORT=http)
PORT=3000
# Node environment
NODE_ENV=production
Tool Configuration
Each tool accepts an options object:
{
"timeout": 30000, // Request timeout (ms)
"maxConcurrency": 5, // Concurrent requests (1-20)
"maxResults": 10, // Limit results (1-50)
"respectRobots": false, // Respect robots.txt
"sameOriginOnly": true // Only same-origin URLs
}
Contributing
Fork the repository
Create a feature branch:
git checkout -b feature/amazing-feature
Make changes and add tests
Run tests:
npm run test:ci
Commit:
git commit -m 'Add amazing feature'
Push:
git push origin feature/amazing-feature
Open a Pull Request
Development Guidelines
Follow TypeScript strict mode
Add tests for new features
Update documentation
Run linting:
npm run lint
Ensure CI passes
License
MIT License - see LICENSE file
Acknowledgments
@just-every/crawl - Web crawling
Model Context Protocol - MCP specification
SearXNG - Search aggregator
Playwright - Browser automation
Support
Issues: GitHub Issues
Discussions: GitHub Discussions
Email: your-email@example.com
What's Next?
Add DuckDuckGo search support
Implement content filtering
Add screenshot capabilities
Support for authenticated content
PDF extraction
Real-time monitoring
Built with ❤️ using TypeScript, MCP, and modern web technologies.