Skip to main content
Glama
j0hanz

superFetch MCP Server

Fetch URL MCP Server

npm version License: MIT Node.js TypeScript MCP SDK

Install in VS Code Install in VS Code Insiders Install in Cursor

Fetch public web pages and convert them into clean, AI-readable Markdown.

Overview

Fetch URL is a Model Context Protocol (MCP) server that fetches public web pages, extracts meaningful content using Mozilla's Readability algorithm, and converts the result into clean Markdown optimized for LLM context windows. It handles noise removal, caching, SSRF protection, async task execution, and supports both stdio and Streamable HTTP transports.

NOTE

Content extraction quality varies depending on the HTML structure and complexity of the source page. Fetch URL works best with standard article and documentation layouts. Pages relying on client-side JavaScript rendering may yield incomplete results.

Key Features

  • HTML to Markdown — Content extraction via Mozilla Readability + node-html-markdown

  • Noise removal — Strips navigation, ads, cookie banners, and other non-content elements

  • In-memory LRU cache — Faster repeat fetches with configurable TTL (24 h default)

  • Raw URL rewriting — Auto-converts GitHub, GitLab, Bitbucket, and Gist URLs to raw content endpoints

Tech Stack

Component

Technology

Runtime

Node.js >= 24

Language

TypeScript 5.9

MCP SDK

@modelcontextprotocol/sdk ^1.26.0

Content Extraction

@mozilla/readability ^0.6.0

DOM Parsing

linkedom ^0.18.12

Markdown Conversion

node-html-markdown ^2.0.0

Schema Validation

zod ^4.3.6

Package Manager

npm

Architecture

URL → Validate → DNS Preflight → HTTP Fetch → Decompress → Truncate HTML → Readability Extract → Noise Removal → Markdown Convert → Cleanup Pipeline → Cache → Response
  1. URL Validation — Normalize, block private hosts, transform raw-content URLs (GitHub, GitLab, Bitbucket)

  2. Fetch — HTTP request with redirect following, DNS preflight SSRF checks, and size limits (10 MB)

  3. Transform — Offloaded to worker threads: parse HTML with linkedom, extract with Readability, remove DOM noise, convert to Markdown

  4. Cleanup — Multi-pass Markdown normalization (heading promotion, spacing, skip-link removal)

  5. Cache + Respond — Store result in LRU cache, apply inline content limits, return structured content

Repository Structure

fetch-url-mcp/ ├── assets/ # Server icon (logo.svg) ├── examples/ # Client examples ├── scripts/ # Build & test orchestration ├── src/ │ ├── workers/ # Worker-thread child for HTML transforms │ ├── index.ts # CLI entrypoint, transport wiring, shutdown │ ├── server.ts # McpServer lifecycle and registration │ ├── tools.ts # fetch-url tool definition and pipeline │ ├── fetch.ts # URL normalization, SSRF, HTTP fetch │ ├── transform.ts # HTML-to-Markdown pipeline, worker pool │ ├── config.ts # Env-driven configuration │ ├── resources.ts # MCP resource/template registration │ ├── prompts.ts # MCP prompt registration (get-help) │ ├── mcp.ts # Task execution management │ ├── http-native.ts # Streamable HTTP server, auth, sessions │ └── instructions.md # Server instructions embedded at runtime ├── tests/ # Unit/integration tests (Node.js test runner) ├── package.json ├── tsconfig.json └── AGENTS.md

Requirements

  • Node.js >= 24

Quickstart

npx -y @j0hanz/fetch-url-mcp@latest --stdio

Add to your MCP client configuration:

{ "mcpServers": { "fetch-url-mcp": { "command": "npx", "args": ["-y", "@j0hanz/fetch-url-mcp@latest", "--stdio"] } } }

Client Example (CLI)

Build the server and examples, then run the client:

npm run build node dist/examples/mcp-fetch-url-client.js https://example.com

Optional flags:

  • --full reads the cached markdown resource to avoid inline truncation.

  • --task enables task-based execution with streamed status updates.

  • --task-ttl <ms> sets task TTL; --task-poll <ms> sets poll interval.

  • --http http://localhost:3000/mcp connects to the Streamable HTTP server.

  • Progress updates (when emitted) are printed to stderr.

Installation

No installation required — runs directly:

npx -y @j0hanz/fetch-url-mcp@latest --stdio

Global Install

npm install -g @j0hanz/fetch-url-mcp fetch-url-mcp --stdio

From Source

git clone https://github.com/j0hanz/fetch-url-mcp.git cd fetch-url-mcp npm install npm run build node dist/index.js --stdio

Docker

docker compose up --build

Configuration

Runtime Modes

Flag

Description

--stdio, -s

Run in stdio mode (for desktop MCP clients; default)

--http

Run in HTTP mode (Streamable HTTP on port 3000 by default)

--help, -h

Show usage help

--version, -v

Print server version

When no transport flag is passed, the server starts in stdio mode.

Environment Variables

Core Settings

Variable

Default

Description

HOST

127.0.0.1

HTTP server bind address

PORT

3000

HTTP server port (1024–65535)

LOG_LEVEL

info

Log level: debug, info, warn, error

FETCH_TIMEOUT_MS

15000

HTTP fetch timeout in ms (1000–60000)

CACHE_ENABLED

true

Enable/disable in-memory content cache

USER_AGENT

fetch-url-mcp/{version}

Custom User-Agent header

ALLOW_REMOTE

false

Allow remote connections in HTTP mode

ALLOWED_HOSTS

(empty)

Comma-separated host/origin allowlist for HTTP mode

MCP_STRICT_PROTOCOL_VERSION_HEADER

true

Require MCP-Protocol-Version on HTTP session initialize (false allows legacy headerless initialize)

Task Management

Variable

Default

Description

TASKS_MAX_TOTAL

5000

Maximum retained task records across all owners

TASKS_MAX_PER_OWNER

1000

Maximum retained task records per session/client

TASKS_STATUS_NOTIFICATIONS

false

Emit experimental notifications/tasks/status extension

Authentication (HTTP Mode)

Variable

Default

Description

ACCESS_TOKENS

(empty)

Comma-separated static bearer tokens

API_KEY

(empty)

Single API key (added to static tokens)

OAUTH_ISSUER_URL

(empty)

OAuth issuer URL (enables OAuth mode)

OAUTH_AUTHORIZATION_URL

(empty)

OAuth authorization endpoint

OAUTH_TOKEN_URL

(empty)

OAuth token endpoint

OAUTH_INTROSPECTION_URL

(empty)

OAuth token introspection endpoint

OAUTH_REVOCATION_URL

(empty)

OAuth token revocation endpoint

OAUTH_REGISTRATION_URL

(empty)

OAuth dynamic client registration

OAUTH_REQUIRED_SCOPES

(empty)

Required OAuth scopes

OAUTH_CLIENT_ID

(empty)

OAuth client ID

OAUTH_CLIENT_SECRET

(empty)

OAuth client secret

Transform & Workers

Variable

Default

Description

TRANSFORM_WORKER_MODE

threads

Worker mode: threads or process

TRANSFORM_WORKER_MAX_OLD_GENERATION_MB

(unset)

V8 old generation heap limit per worker

TRANSFORM_WORKER_MAX_YOUNG_GENERATION_MB

(unset)

V8 young generation heap limit per worker

TRANSFORM_WORKER_CODE_RANGE_MB

(unset)

V8 code range limit per worker

TRANSFORM_WORKER_STACK_MB

(unset)

Stack size limit per worker

Content Tuning

Variable

Default

Description

MAX_INLINE_CONTENT_CHARS

0

Global inline markdown limit (0 = unlimited)

FETCH_URL_MCP_EXTRA_NOISE_TOKENS

(empty)

Additional CSS class/id tokens for noise removal

FETCH_URL_MCP_EXTRA_NOISE_SELECTORS

(empty)

Additional CSS selectors for noise removal

MARKDOWN_HEADING_KEYWORDS

(built-in list)

Keywords triggering heading promotion

FETCH_URL_MCP_LOCALE

(system)

Locale for content processing

Server Tuning

Variable

Default

Description

SERVER_MAX_CONNECTIONS

0 (unlimited)

Maximum concurrent HTTP connections

SERVER_BLOCK_PRIVATE_CONNECTIONS

false

Block connections from private IP ranges

Hardcoded Defaults

Setting

Value

Max HTML size

10 MB

Max inline content chars

0 (unlimited, configurable)

Fetch timeout

15 s

Transform timeout

30 s

Tool timeout

Fetch + Transform + 5 s padding

Max redirects

5

Cache TTL

86400 s (24 h)

Cache max keys

100

Rate limit

100 requests / 60 s

Max sessions

200

Session TTL

30 min

Max URL length

2048 chars

Worker pool max scale

4

Usage

Stdio Mode

fetch-url-mcp --stdio

The server communicates via JSON-RPC over stdin/stdout. All MCP clients that support stdio transport can connect directly.

HTTP Mode

fetch-url-mcp # or PORT=8080 HOST=0.0.0.0 ALLOW_REMOTE=true fetch-url-mcp

The server starts a Streamable HTTP endpoint at /mcp. Authenticate with bearer tokens via the ACCESS_TOKENS or API_KEY environment variables.

For POST /mcp, clients should send:

  • Accept: application/json, text/event-stream

  • MCP-Protocol-Version: 2025-11-25 (or 2025-03-26 for legacy clients)

MCP Surface

Tools

fetch-url

Fetches a webpage and converts it to clean Markdown format optimized for LLM context.

Useful for:

  • Reading documentation, blog posts, or articles

  • Extracting main content while removing navigation and ads

  • Caching content to speed up repeated queries

Limitations:

  • Does not execute complex client-side JavaScript interactions

  • Inline output may be truncated when MAX_INLINE_CONTENT_CHARS is set

Parameters

Parameter

Type

Required

Default

Description

url

string (URL)

Yes

The URL of the webpage to fetch (http/https, max 2048 chars)

skipNoiseRemoval

boolean

No

false

Preserve navigation, footers, and other elements normally filtered

forceRefresh

boolean

No

false

Bypass cache and fetch fresh content

maxInlineChars

number

No

0

Per-call inline markdown limit (0 = unlimited; global cap still applies)

Returns
{ "url": "https://example.com", "inputUrl": "https://example.com", "resolvedUrl": "https://example.com", "finalUrl": "https://example.com", "title": "Example Domain", "metadata": { "title": "Example Domain", "description": "...", "author": "...", "image": "...", "favicon": "...", "publishedAt": "...", "modifiedAt": "..." }, "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...", "fromCache": false, "fetchedAt": "2026-02-11T12:00:00.000Z", "contentSize": 1234, "truncated": false }

Field

Type

Description

url

string

The canonical URL (pre-raw-transform)

inputUrl

string?

The original URL provided by the caller

resolvedUrl

string?

The normalized/transformed URL that was fetched

finalUrl

string?

Final response URL after redirects

title

string?

Extracted page title

metadata

object?

Extracted metadata (title, description, author, image, favicon, publishedAt, modifiedAt)

markdown

string?

Extracted content in Markdown format

fromCache

boolean?

Whether the response was served from cache

fetchedAt

string?

ISO timestamp for fetch/cache retrieval

contentSize

number?

Full markdown size before inline truncation

truncated

boolean?

Whether inline markdown was truncated

error

string?

Error message if the request failed

statusCode

number?

HTTP status code for failed requests

details

object?

Additional error details

Annotations

Annotation

Value

readOnlyHint

true

destructiveHint

false

idempotentHint

true

openWorldHint

true

Async Task Execution

The fetch-url tool supports optional async task execution (execution.taskSupport: "optional"). Include a task field in the tool call to run the fetch in the background:

{ "method": "tools/call", "params": { "name": "fetch-url", "arguments": { "url": "https://example.com" }, "task": { "ttl": 30000 } } }

Then poll tasks/get until the task status is completed or failed, and retrieve the result via tasks/result.

Prompts

Name

Description

get-help

Returns server usage instructions

Resources

URI Pattern

MIME Type

Description

internal://instructions

text/markdown

Server instructions and usage guidance

internal://cache/{namespace}/{hash}

text/markdown

Cached markdown entries from prior fetch-url calls

Tasks

The server declares full MCP task support:

Endpoint

Description

tasks/list

List tasks (scoped to session/owner)

tasks/get

Get task status by ID

tasks/result

Retrieve completed task result

tasks/cancel

Cancel an in-flight task

HTTP Mode Endpoints

Method

Path

Auth

Description

GET

/health

No

Health check (minimal payload)

GET

/health?verbose=true

Yes*

Detailed diagnostics and runtime metrics

POST

/mcp

Yes

MCP JSON-RPC (Streamable HTTP)

GET

/mcp

Yes

SSE stream for server-initiated messages

DELETE

/mcp

Yes

Terminate MCP session

GET

/mcp/downloads/{namespace}/{hash}

Yes

Download cached content

* verbose=true can be read without auth only for local-only deployments (ALLOW_REMOTE=false).

Session Behavior

  • Sessions are created on the first POST /mcp request with an initialize message

  • Session ID is returned in the mcp-session-id response header

  • Sessions expire after 30 minutes of inactivity (max 200 concurrent)

Authentication

  • Static tokens: Set ACCESS_TOKENS or API_KEY environment variables; pass as Authorization: Bearer <token>

  • OAuth: Configure OAUTH_* environment variables to enable OAuth 2.0 token introspection

Client Configuration Examples

Add to your VS Code settings (.vscode/mcp.json or User Settings):

{ "servers": { "fetch-url-mcp": { "command": "npx", "args": ["-y", "@j0hanz/fetch-url-mcp@latest", "--stdio"] } } }

Add to claude_desktop_config.json:

{ "mcpServers": { "fetch-url-mcp": { "command": "npx", "args": ["-y", "@j0hanz/fetch-url-mcp@latest", "--stdio"] } } }

Install in Cursor

Or manually add to Cursor MCP settings:

{ "mcpServers": { "fetch-url-mcp": { "command": "npx", "args": ["-y", "@j0hanz/fetch-url-mcp@latest", "--stdio"] } } }

Add to your Windsurf MCP configuration:

{ "mcpServers": { "fetch-url-mcp": { "command": "npx", "args": ["-y", "@j0hanz/fetch-url-mcp@latest", "--stdio"] } } }

Use the published image from GitHub Container Registry:

{ "mcpServers": { "fetch-url-mcp": { "command": "docker", "args": [ "run", "-i", "--rm", "ghcr.io/j0hanz/fetch-url-mcp:latest", "--stdio" ] } } }

Or build and run locally:

docker build -t fetch-url-mcp . docker run -i --rm fetch-url-mcp --stdio

Security

SSRF Protection

Fetch URL blocks requests to private and internal network addresses:

  • Blocked hosts: localhost, 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16, 100.64.0.0/10

  • Blocked IPv6: ::1, fc00::/7, fe80::/10, IPv4-mapped private addresses

  • Cloud metadata: 169.254.169.254 (AWS), metadata.google.internal, metadata.azure.com, 100.100.100.200 (Azure IMDS)

DNS preflight checks run on every redirect hop to prevent DNS rebinding attacks.

Stdio Transport Safety

The server never writes non-protocol data to stdout. All logs and diagnostics go to stderr.

Rate Limiting

HTTP mode enforces a rate limit of 100 requests per 60-second window per client.

Content Safety

  • HTML downloads are capped at 10 MB

  • Worker threads run in isolation with configurable resource limits

  • Auth tokens are stored in-memory only and compared using timing-safe equality

Development Workflow

Install Dependencies

npm install

Scripts

Script

Command

Description

dev

npm run dev

TypeScript watch mode

dev:run

npm run dev:run

Run compiled output with watch + .env

build

npm run build

Clean, compile, copy assets, make executable

start

npm start

Run compiled server

test

npm test

Run test suite (Node.js native test runner)

test:coverage

npm run test:coverage

Run tests with coverage

lint

npm run lint

ESLint

lint:fix

npm run lint:fix

ESLint with auto-fix

format

npm run format

Prettier

type-check

npm run type-check

TypeScript type checking

inspector

npm run inspector

Build and launch MCP Inspector

Build and Release

npm run build # Clean → Compile → Copy Assets → chmod npm run prepublishOnly # Lint → Type-Check → Build npm publish # Publish to npm

CI/CD is handled via a GitHub Actions workflow (release.yml) that runs lint, type-check, test, build, and publishes to npm with version bumping.

Troubleshooting

MCP Inspector

Use the built-in inspector to test the server interactively:

npm run inspector

Common Issues

Issue

Solution

VALIDATION_ERROR on URL

URL is blocked (private IP/localhost) or malformed. Do not retry.

queue_full error

Worker pool busy. Wait briefly, then retry or use async task mode.

Garbled output

Binary content (images, PDFs) cannot be converted. Ensure the URL serves HTML.

No output in stdio mode

If you intended HTTP mode, pass --http. Stdio is the default transport.

Auth errors in HTTP mode

Set ACCESS_TOKENS or API_KEY env var and pass as Authorization: Bearer <token>.

Stdout / Stderr Guidance

In stdio mode, stdout is reserved exclusively for MCP JSON-RPC messages. Logs and diagnostics are written to stderr. Never pipe stdout to a log file when using stdio transport.

License

MIT

Install Server
A
security – no known vulnerabilities
F
license - not found
A
quality - confirmed to work

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/j0hanz/super-fetch-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server