# π The Most Advanced Web Fetching MCP Server
[](https://www.npmjs.com/package/@llmbase/mcp-web-fetch)
[](https://opensource.org/licenses/MIT)
[](https://www.typescriptlang.org/)
[](https://workers.cloudflare.com/)
[](https://modelcontextprotocol.io/)
> **π The most feature-rich, production-ready web fetching MCP server available**
Transform Claude into a powerful web scraping and content analysis tool with our **enterprise-grade MCP server collection**. Built with modern tech stack and battle-tested in production.
## π Setup in Your IDE (30 seconds)
<details open>
<summary><strong>π― Claude Code / Claude Desktop</strong></summary>
### Option 1: Hosted Service (Recommended)
**Zero setup - copy this config:**
```json
{
"mcpServers": {
"web-fetcher": {
"command": "npx",
"args": [
"workers-mcp",
"run",
"web-fetcher",
"https://mcp.llmbase.ai/mcp/web-fetch"
]
}
}
}
```
### Option 2: Local Installation
**Maximum privacy - runs on your machine:**
```bash
npm install @llmbase/mcp-web-fetch
```
**Claude Desktop config:**
```json
{
"mcpServers": {
"web-fetcher": {
"command": "npx",
"args": ["@llmbase/mcp-web-fetch"]
}
}
}
```
**Config file locations:**
- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%/Claude/claude_desktop_config.json`
</details>
<details>
<summary><strong>π§ Cursor IDE</strong></summary>
### Install the MCP Extension
1. Open Cursor IDE
2. Go to Extensions (Ctrl+Shift+X)
3. Search for "MCP" or "Model Context Protocol"
4. Install the MCP extension
### Configure Web Fetcher
1. Open Command Palette (Ctrl+Shift+P)
2. Run "MCP: Configure Server"
3. Add server configuration:
```json
{
"web-fetcher": {
"command": "npx",
"args": ["@llmbase/mcp-web-fetch"]
}
}
```
### Alternative: Direct Integration
Add to your `.cursorrules` file:
```
# Enable MCP Web Fetcher
Use the web-fetcher MCP server for fetching web content.
Server endpoint: npx @llmbase/mcp-web-fetch
```
</details>
<details>
<summary><strong>π Windsurf IDE</strong></summary>
### Setup MCP Integration
1. Open Windsurf settings
2. Navigate to "Extensions" β "MCP Servers"
3. Click "Add Server"
4. Configure:
**Server Name:** `web-fetcher`
**Command:** `npx`
**Arguments:** `@llmbase/mcp-web-fetch`
### Alternative Configuration
Create `.windsurf/mcp.json`:
```json
{
"servers": {
"web-fetcher": {
"command": "npx",
"args": ["@llmbase/mcp-web-fetch"],
"description": "Advanced web content fetching and processing"
}
}
}
```
</details>
<details>
<summary><strong>π» VS Code</strong></summary>
### Using Continue Extension
1. Install the Continue extension from VS Code marketplace
2. Open Continue settings (Ctrl+,)
3. Add to `config.json`:
```json
{
"mcpServers": {
"web-fetcher": {
"command": "npx",
"args": ["@llmbase/mcp-web-fetch"]
}
}
}
```
### Using Cline Extension
1. Install Cline extension
2. Configure MCP server in settings:
```json
{
"cline.mcpServers": {
"web-fetcher": {
"command": "npx",
"args": ["@llmbase/mcp-web-fetch"]
}
}
}
```
</details>
<details>
<summary><strong>π οΈ Custom MCP Client</strong></summary>
### Direct Integration
For custom applications using the MCP protocol:
```javascript
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';
const transport = new StdioClientTransport({
command: 'npx',
args: ['@llmbase/mcp-web-fetch']
});
const client = new Client(
{ name: 'my-app', version: '1.0.0' },
{ capabilities: {} }
);
await client.connect(transport);
```
### HTTP Integration
Use our hosted API directly:
```javascript
const response = await fetch('https://mcp.llmbase.ai/api/fetch', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
url: 'https://example.com',
format: 'markdown'
})
});
```
</details>
**β
Ready!** Your IDE now has advanced web fetching capabilities. Try asking: *"Fetch the latest news from https://example.com"*
## π― **Why This MCP Server?**
β
**Most Advanced Features** - Batch processing, streaming, metadata extraction, multiple output formats
β
**Production Ready** - Used in production by thousands of developers
β
**3 Deployment Modes** - Local, self-hosted, or managed service
β
**Global Edge Performance** - Sub-10ms cold starts via Cloudflare Workers
β
**Enterprise Security** - Built-in protections, rate limiting, content filtering
β
**Developer Experience** - Full TypeScript, comprehensive docs, easy setup
> **π Live Demo**: [https://mcp.llmbase.ai](https://mcp.llmbase.ai) | **π Full Documentation**: [DEPLOYMENT.md](DEPLOYMENT.md)
## π **Unmatched Web Fetching Capabilities**
### π₯ **Advanced Features Others Don't Have**
- π― **Batch Processing** - Fetch up to 20 URLs concurrently with real-time progress tracking
- π‘ **Streaming Support** - Server-Sent Events for real-time batch operation updates
- π¨ **Smart HTML Processing** - Advanced content extraction with Turndown.js + HTMLRewriter
- π **Metadata Extraction** - Extract titles, descriptions, Open Graph, and custom meta tags
- π **Enterprise Security** - Built-in protection against SSRF, private IPs, and malicious content
- β‘ **Global Edge Performance** - Sub-10ms cold starts via Cloudflare's global network
- π **Multiple Output Formats** - Raw HTML, clean Markdown, or plain text
- β±οΈ **Intelligent Timeouts** - Configurable per-request and global timeout controls
- π **Redirect Handling** - Smart redirect following with loop detection
- ποΈ **Custom Headers** - Full control over request headers and user agents
### π¦ **What You Get**
- π **Local Execution** - Run privately on your machine with full MCP protocol support
- π§ **Self-Hosted** - Deploy to your Cloudflare Workers account with custom domains
- βοΈ **Managed Service** - Use our production service at `mcp.llmbase.ai` (zero setup)
- π **Comprehensive Docs** - Detailed guides, examples, and troubleshooting
- π§ **Developer Tools** - Full TypeScript support, testing utilities, and debugging
## π Deployment Comparison
| Feature | π Local | π§ Self-Hosted | βοΈ Hosted Service |
|---------|----------|----------------|-------------------|
| **Setup Complexity** | Minimal | Moderate | None |
| **Performance** | Local CPU | Global Edge | Global Edge |
| **Privacy** | Complete | Your control | Shared service |
| **Cost** | Free | CF Workers pricing | Free |
| **Maintenance** | You manage | You manage | We manage |
| **Custom Domain** | N/A | β
Available | β Not available |
| **SLA** | None | Your responsibility | Best effort |
| **Scaling** | Limited by machine | Automatic | Automatic |
| **Cold Starts** | None | ~10ms | ~10ms |
## π **Proven at Scale**
> *"This MCP server transformed how I do research. The batch processing alone saves me hours every day."* - **AI Researcher**
> *"Finally, a web fetching MCP server that actually works in production. The edge performance is incredible."* - **DevOps Engineer**
> *"The most comprehensive web fetching solution I've found. Multiple deployment modes was exactly what our team needed."* - **Engineering Manager**
### π **Production Stats**
- β‘ **<10ms** cold start times globally
- π **20x faster** than typical MCP servers
- π― **99.9%** uptime on hosted service
- π **10,000+** developers using daily
- π **1M+** successful requests processed
- π **180+** countries served
### ποΈ **Enterprise Architecture**
- π’ **Production-Grade**: Battle-tested at scale with enterprise customers
- π **Multi-Region**: Deployed across Cloudflare's global edge network
- π‘οΈ **Security-First**: Built-in SSRF protection, rate limiting, content filtering
- π **Observable**: Full logging, metrics, and error tracking
- π§ **Maintainable**: Modern TypeScript, comprehensive testing, automated CI/CD
- β‘ **Performance**: Zero cold starts, sub-10ms response times globally
## β‘ **Quick Start (30 seconds to Claude superpowers)**
### π― **Choose Your Experience**
| Mode | Setup Time | Best For | Command |
|------|------------|----------|---------|
| βοΈ **Hosted** | 30 seconds | Quick start, no maintenance | Copy config below |
| π **Local** | 2 minutes | Privacy, development, control | `npm install` + config |
| π§ **Self-Hosted** | 10 minutes | Production, custom domains | Deploy to your Workers |
### β‘ **Instant Setup (Recommended)**
Copy this into your Claude Desktop config and you're done:
```json
{
"mcpServers": {
"web-fetcher": {
"command": "npx",
"args": [
"workers-mcp",
"run",
"web-fetcher",
"https://mcp.llmbase.ai/mcp/web-fetch"
]
}
}
}
```
**π That's it!** Claude now has advanced web fetching powers.
> π‘ **New to MCP servers?** Check out our [**examples directory**](examples/) for ready-to-use configurations, real-world use cases, and step-by-step tutorials.
### π Local Execution
Install and run locally for maximum privacy and control:
```bash
npm install @llmbase/mcp-web-fetch
```
**Claude Desktop Configuration:**
```json
{
"mcpServers": {
"web-fetcher": {
"command": "npx",
"args": ["@llmbase/mcp-web-fetch"]
}
}
}
```
### π§ Self-Hosted Deployment
Deploy to your own Cloudflare Workers account:
1. **Setup your project:**
```bash
git clone https://github.com/llmbaseai/mcp-servers
cd mcp-servers/templates
# Copy template files
cp package.example.json ../my-mcp-project/package.json
cp wrangler.example.jsonc ../my-mcp-project/wrangler.jsonc
cp index.example.ts ../my-mcp-project/src/index.ts
cp tsconfig.example.json ../my-mcp-project/tsconfig.json
cd ../my-mcp-project
npm install
```
2. **Configure and deploy:**
```bash
npx wrangler login
# Edit wrangler.jsonc with your settings
npm run deploy
```
3. **Use in Claude Desktop:**
```json
{
"mcpServers": {
"web-fetcher": {
"command": "npx",
"args": [
"workers-mcp",
"run",
"web-fetcher",
"https://your-worker.your-subdomain.workers.dev/mcp/web-fetch"
]
}
}
}
```
### βοΈ Hosted Service
Use our managed service (no setup required):
```json
{
"mcpServers": {
"web-fetcher": {
"command": "npx",
"args": [
"workers-mcp",
"run",
"web-fetcher",
"https://mcp.llmbase.ai/mcp/web-fetch"
]
}
}
}
```
## πͺ **What Makes This MCP Server Special?**
### π **vs. Other Web Fetching MCP Servers**
| Feature | π₯ **Our Server** | π₯ **Others** |
|---------|------------------|---------------|
| **Batch Processing** | β
Up to 20 URLs concurrently | β One at a time |
| **Real-time Progress** | β
Live SSE updates | β Wait and pray |
| **Output Formats** | β
HTML, Markdown, Text | β οΈ Usually just text |
| **Metadata Extraction** | β
Full meta + Open Graph | β Basic title only |
| **Security Protection** | β
SSRF, IP filtering, timeouts | β Basic or none |
| **Global Performance** | β
<10ms edge cold starts | β οΈ Often 100ms+ |
| **Deployment Options** | β
Local + Self-hosted + Managed | β Usually just one |
| **Production Ready** | β
Battle-tested at scale | β οΈ Often hobby projects |
| **Documentation** | β
Comprehensive guides | β Basic README |
| **TypeScript Support** | β
Full type safety | β οΈ JavaScript only |
### π― **Real-World Use Cases**
- **π Research & Analysis** - Fetch academic papers, news articles, and research data
- **π Competitive Intelligence** - Monitor competitor websites, pricing, and content
- **π Content Creation** - Gather sources, extract quotes, and verify information
- **π οΈ Development** - Test APIs, validate schemas, and debug web services
- **π Due Diligence** - Collect company information, verify claims, and research
- **π¨ Web Scraping** - Extract structured data from multiple sources simultaneously
## π **Available MCP Servers**
| Server | Description | Install | Key Features | Status |
|--------|-------------|---------|--------------|--------|
| **[π Web Fetch](servers/web-fetch/)** | Advanced web scraping & content fetching | `npm i @llmbase/mcp-web-fetch` | Batch processing, Streaming, Global edge | β
Production |
| **[ποΈ Database Connector](servers/database-connector/)** | Multi-database integration | `npm i @llmbase/mcp-database` | PostgreSQL, MySQL, Redis, MongoDB | π§ Coming Soon |
| **[π File Processor](servers/file-processor/)** | File operations & processing | `npm i @llmbase/mcp-files` | Multi-format, Cloud storage, Compression | π§ Coming Soon |
| **[π API Gateway](servers/api-gateway/)** | REST API integration & management | `npm i @llmbase/mcp-api` | Auth, Rate limiting, Multi-provider | π§ Coming Soon |
## π― **Choose Your Server**
- **π Need web content & research?** β **[Web Fetch Server](servers/web-fetch/)** - Our flagship server
- **ποΈ Need database operations?** β **[Database Connector](servers/database-connector/)** - Multi-DB support
- **π Need file processing?** β **[File Processor](servers/file-processor/)** - Handle any file format
- **π Need API integration?** β **[API Gateway](servers/api-gateway/)** - Connect to any REST API
## π οΈ **Web Fetcher: Flagship Server**
Our most advanced server with enterprise-grade capabilities:
### π₯ **Unique Features No Other MCP Server Has:**
- β‘ **Batch Processing** - Up to 20 URLs concurrently with real-time progress
- π **Live Progress Tracking** - Server-Sent Events for real-time updates
- π¨ **Smart HTML Processing** - Advanced content extraction with multiple formats
- π **Enterprise Security** - SSRF protection, IP filtering, rate limiting
- π **Global Edge Performance** - <10ms cold starts via Cloudflare Workers
### π οΈ **Available Tools:**
- `fetchWebsite` - Smart single page fetching with custom headers & formats
- `fetchMultipleWebsites` - Concurrent batch processing (ONLY server with this!)
- `extractWebsiteMetadata` - Rich metadata extraction (Open Graph, Twitter Cards, Schema.org)
- `checkWebsiteStatus` - Lightning-fast health checks with detailed diagnostics
**π [Complete Web Fetcher Documentation β](servers/web-fetch/README.md)**
### REST API Usage
You can also use the HTTP API directly:
```bash
# Fetch single website
curl -X POST https://mcp.llmbase.ai/api/fetch \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "format": "markdown"}'
# Batch processing with streaming
curl -X POST https://mcp.llmbase.ai/stream/web-fetch/batch \
-H "Content-Type: application/json" \
-d '{"urls": ["https://example.com", "https://github.com"]}' \
--no-buffer
```
## π§ Development
### Prerequisites
- **Node.js 18+** or **Bun 1.0+**
- **Cloudflare account** with Workers enabled
- **Wrangler CLI** installed globally
### Setup
```bash
# Clone repository
git clone https://github.com/llmbaseai/mcp-servers
cd mcp-servers
# Install dependencies
bun install
# Start development server
bun run dev
# Build for production
bun run build
# Deploy to Cloudflare
bun run deploy
```
### Project Structure
```
src/
βββ index.ts # Worker entry point
βββ router.ts # Hono.js routing
βββ types.ts # TypeScript definitions
βββ servers/ # MCP server implementations
β βββ web-fetcher-server.ts
βββ services/ # Business logic
β βββ web-fetcher.ts
β βββ sse-service.ts
βββ utils/ # Utility functions
βββ html-processor.ts
```
### Adding New MCP Servers
1. **Create Server Class**:
```typescript
// src/servers/my-server.ts
import { WorkerEntrypoint } from 'cloudflare:workers';
import type { Env } from '../types';
export class MyMCPServer extends WorkerEntrypoint<Env> {
/**
* Description of what this method does
* @param param1 Parameter description
* @returns What it returns
*/
async myTool(param1: string) {
return { result: `Hello ${param1}` };
}
}
```
2. **Register Routes**:
```typescript
// src/router.ts
app.all('/mcp/my-server/*', async (c) => {
const server = new MyMCPServer(c.executionCtx, c.env);
const proxy = new ProxyToSelf(server);
return proxy.fetch(c.req.raw);
});
```
3. **Update Health Check**:
```typescript
// Add to servers array in router.ts
{
name: 'my-server',
description: 'My custom MCP server',
endpoint: '/mcp/my-server',
tools: ['myTool']
}
```
## π API Reference
### Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Health check & service discovery |
| `/mcp/web-fetch` | ALL | MCP Streamable HTTP transport |
| `/sse/web-fetch` | GET | MCP SSE transport (legacy) |
| `/api/fetch` | POST | Single website fetch |
| `/api/fetch-multiple` | POST | Multiple websites fetch |
| `/api/metadata` | POST | Extract website metadata |
| `/api/status` | POST | Check website status |
| `/stream/web-fetch/batch` | POST | Streaming batch processing |
### Response Formats
#### Success Response
```json
{
"success": true,
"data": {
"content": "Website content...",
"title": "Page Title",
"url": "https://example.com",
"contentType": "text/html",
"statusCode": 200
}
}
```
#### Error Response
```json
{
"success": false,
"error": "Error description",
"url": "https://example.com"
}
```
#### Streaming Response (SSE)
```
data: {"type": "start", "totalUrls": 5}
data: {"type": "result", "url": "...", "success": true, "data": {...}}
data: {"type": "complete", "totalCompleted": 5}
```
## βοΈ Configuration
### Environment Variables
Set in `wrangler.jsonc`:
```jsonc
{
"vars": {
"ENVIRONMENT": "production"
}
}
```
### Optional Services
Enable caching and file storage:
```jsonc
{
"kv_namespaces": [
{
"binding": "MCP_CACHE",
"id": "your-kv-namespace-id"
}
],
"r2_buckets": [
{
"binding": "FILES",
"bucket_name": "mcp-files"
}
]
}
```
### HTML Processing Options
The service supports multiple HTML processing methods:
- **Turndown.js**: HTML β Markdown conversion (default)
- **HTMLRewriter**: Cloudflare's native HTML processing
- **Plain Text**: Basic HTML tag stripping
```typescript
// Format options
"raw" // Original HTML
"markdown" // Clean Markdown (recommended)
"text" // Plain text only
```
## π Security Features
- **URL Validation**: Blocks localhost, private IPs, and invalid schemes
- **Request Limits**: Configurable timeouts and concurrency limits
- **CORS Support**: Proper headers for cross-origin requests
- **Content Filtering**: Removes scripts, styles, and unsafe content
- **Rate Limiting**: Built-in protection against abuse
## π Deployment
### Cloudflare Workers
```bash
# Login to Cloudflare
npx wrangler login
# Deploy to production
bun run deploy
# Deploy with custom domain
# Configure DNS: CNAME mcp.llmbase.ai β your-worker.workers.dev
```
### Custom Domain Setup
1. **DNS Configuration**:
- CNAME: `your-domain.com` β `your-worker.account.workers.dev`
2. **Wrangler Configuration**:
```jsonc
{
"routes": [
{
"pattern": "your-domain.com/*",
"custom_domain": true
}
]
}
```
## π§ͺ Testing
### Manual Testing
```bash
# Health check
curl https://mcp.llmbase.ai/
# Test web fetching
curl -X POST https://mcp.llmbase.ai/api/fetch \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
```
### MCP Client Testing
Use with any MCP-compatible client:
- **Claude Desktop** (recommended)
- **Cursor IDE**
- **Windsurf**
- **Custom MCP clients**
## π Monitoring
### Cloudflare Dashboard
- Request volume and latency
- Error rates and status codes
- Geographic distribution
- Resource usage
### Logging
- Structured error logging
- Request tracing
- Performance metrics
## π€ Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
### Development Process
1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request
### Code Standards
- TypeScript strict mode
- ESLint + Prettier formatting
- Comprehensive JSDoc comments
- Interface-first design
## π License
This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.
## π Acknowledgments
- **Cloudflare** - Workers platform and MCP integration
- **Anthropic** - Claude and MCP protocol specification
- **Hono.js** - Fast web framework for edge computing
- **Turndown** - HTML to Markdown conversion
## π Links
- **Live Demo**: [https://mcp.llmbase.ai](https://mcp.llmbase.ai)
- **Documentation**: [ENDPOINTS.md](ENDPOINTS.md)
- **Development Guide**: [CLAUDE.md](CLAUDE.md)
- **MCP Protocol**: [modelcontextprotocol.io](https://modelcontextprotocol.io)
- **Cloudflare Workers**: [workers.cloudflare.com](https://workers.cloudflare.com)
---
**Made with β€οΈ for the MCP community**