mcp-web-fetch

by pavex

Token-efficient web reading and HTTP requests for MCP agents.

An MCP server with two tools. fetch_text strips web pages down to clean readable text, which sharply cuts token usage when an agent needs to read a URL. http_request is a full HTTP client for REST API calls, form submissions, and anything else that needs raw control.

Built for Claude, Cursor, and any MCP-compatible agent. No browser required. Pure Node.js, single bundled file.


Why fetch_text matters for agents

A typical web page often weighs 300–800 KB of raw HTML: scripts, styles, nav bars, footers. Most of it is noise. An agent reading that page burns thousands of tokens on markup it cannot use.

fetch_text scrapes the page and returns only the readable content:

google.com raw HTML   →  ~480 000 chars
google.com fetch_text →       177 chars  (~2 700× smaller)
manifesto page HTML   →   ~42 000 chars
manifesto fetch_text  →     5 800 chars  (~7× smaller)

This is a simple HTML scraper, not a full browser renderer. It does not execute JavaScript, handle SPAs, or bypass bot protection. That is the tradeoff for needing no browser and keeping overhead minimal. It works well for static pages, documentation, articles, and llms.txt files.


Tools

fetch_text — low-token web content

Fetches a URL and returns clean readable text. Skips all scripts, styles, navigation, and layout noise. Extracts <title> separately. Prefers <main> or <article> when available.

| param | type | default | description |
| --- | --- | --- | --- |
| url | string | required | Any valid URL |
| max_chars | number | 20000 | Output character cap |
| timeout_ms | number | 10000 | Request timeout in ms |

Response:

{
  "ok": true,
  "url": "https://example.com/article",
  "status": 200,
  "title": "Article title",
  "text": "Clean readable content without any HTML...",
  "char_count": 4821,
  "truncated": false,
  "elapsed_ms": 248
}

Examples:

# Read an article or documentation page
fetch_text("https://docs.example.com/guide")

# Read a manifesto or about page
fetch_text("https://unpredictablemachine.com/manifesto")

# Read llms.txt
fetch_text("https://example.com/llms.txt")

# Limit output for large pages
fetch_text("https://en.wikipedia.org/wiki/Node.js", max_chars=5000)

Limits:

  • Does not execute JavaScript — SPAs and dynamically rendered content may return empty or partial text

  • Does not handle bot protection or CAPTCHAs

  • Not a replacement for a headless browser
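
The extraction steps described above (separate title, preference for <main> or <article>, removal of scripts, styles, and navigation, then a character cap) can be sketched roughly as follows. This is a dependency-free, regex-based approximation for illustration only. The function name extractText and the exact set of removed tags are assumptions, and the server itself lists node-html-parser in its stack rather than regexes.

```javascript
// Hypothetical helper: a regex-based approximation, NOT the server's actual code.
// HTML entities (&amp; etc.) are left undecoded to keep the sketch short.
function extractText(html, maxChars = 20000) {
  // Pull the <title> out separately.
  const titleMatch = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  const title = titleMatch ? titleMatch[1].trim() : '';

  // Prefer the first <main> or <article> region, else use the whole document.
  const region = html.match(/<(main|article)\b[^>]*>([\s\S]*?)<\/\1>/i);
  let body = region ? region[2] : html;

  // Drop elements whose content is not readable text (tag list is an assumption).
  body = body.replace(
    /<(script|style|nav|footer|head|noscript)\b[^>]*>[\s\S]*?<\/\1>/gi,
    ' '
  );

  // Strip the remaining tags and collapse whitespace.
  const text = body.replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();

  // Apply the output cap and report whether anything was cut.
  const truncated = text.length > maxChars;
  const out = truncated ? text.slice(0, maxChars) : text;
  return { title, text: out, char_count: out.length, truncated };
}
```

Because nothing here runs the page's scripts, a JavaScript-rendered page would yield only whatever text is present in the initial HTML, which matches the limits listed above.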


http_request — full HTTP client

Universal HTTP client with full control over method, headers, and body. Use for REST APIs, form posts, webhooks, localhost, and internal network addresses.

| param | type | default | description |
| --- | --- | --- | --- |
| url | string | required | Any valid URL (https, http, localhost, internal IP) |
| method | string | GET | GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS |
| headers | object | {} | Custom request headers |
| body | string | | Raw request body (XML, form-data, plain text) |
| body_json | object | | Auto-serialized JSON + sets Content-Type: application/json |
| timeout_ms | number | 10000 | Request timeout in ms |
| max_bytes | number | 500000 | Response body size cap |

body_json takes priority over body when both are provided.
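
A minimal sketch of that precedence rule, assuming the parameters are turned into a standard fetch options object. The helper name buildRequestInit is hypothetical, and overriding a caller-supplied Content-Type when body_json is present is an assumption rather than documented behavior.

```javascript
// Hypothetical helper, NOT the server's actual code.
function buildRequestInit({ method = 'GET', headers = {}, body, body_json } = {}) {
  // Headers normalizes header names, so 'content-type' and 'Content-Type' collide.
  const init = { method, headers: new Headers(headers) };

  if (body_json !== undefined) {
    // body_json wins over body and forces the JSON content type (assumed override).
    init.body = JSON.stringify(body_json);
    init.headers.set('Content-Type', 'application/json');
  } else if (body !== undefined) {
    // Raw body is passed through untouched; the caller sets Content-Type.
    init.body = body;
  }
  return init;
}
```

The result can be passed straight to Node's built-in fetch, for example fetch(url, buildRequestInit({ method: 'POST', body_json: { title: 'Hello' } })).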

Response:

{
  "ok": true,
  "url": "https://api.example.com/posts",
  "method": "POST",
  "status": 201,
  "status_text": "Created",
  "content_type": "application/json",
  "headers": { "content-type": "application/json" },
  "body": "{\"id\": 42}",
  "truncated": false,
  "elapsed_ms": 142
}
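
The timeout_ms and max_bytes parameters, together with the truncated flag in the response, could be implemented along these lines. This is a sketch under stated assumptions: the helper names capBody and requestWithLimits are hypothetical, the cap is assumed to count UTF-8 bytes, and the whole body is read before capping, whereas a real implementation might stop reading the stream early.

```javascript
// Hypothetical helpers, NOT the server's actual code.

// Cap a response body at maxBytes of UTF-8 and report whether it was cut.
function capBody(text, maxBytes = 500000) {
  const bytes = Buffer.from(text, 'utf8');
  if (bytes.length <= maxBytes) return { body: text, truncated: false };
  // Cutting inside a multi-byte character may leave a trailing U+FFFD.
  return { body: bytes.subarray(0, maxBytes).toString('utf8'), truncated: true };
}

async function requestWithLimits(url, { timeoutMs = 10000, maxBytes = 500000 } = {}) {
  const started = Date.now();
  // AbortSignal.timeout() aborts the request once timeoutMs has elapsed.
  const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
  const { body, truncated } = capBody(await res.text(), maxBytes);
  return {
    ok: res.ok,
    status: res.status,
    status_text: res.statusText,
    body,
    truncated,
    elapsed_ms: Date.now() - started,
  };
}
```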

Examples:

# REST POST with JSON body
http_request("https://api.example.com/posts",
  method="POST",
  body_json={"title": "Hello", "published": true})

# PUT with Authorization header
http_request("https://api.example.com/users/1",
  method="PUT",
  headers={"Authorization": "Bearer TOKEN"},
  body_json={"name": "Pavel"})

# Raw XML payload
http_request("https://legacy.api/endpoint",
  method="POST",
  headers={"Content-Type": "application/xml"},
  body="<root><item>value</item></root>")

# DELETE
http_request("http://localhost:8080/api/posts/42", method="DELETE")

# Internal network
http_request("http://192.168.1.100:8080/api/status")

When to use which

| situation | tool |
| --- | --- |
| Reading articles, docs, blog posts | fetch_text |
| Reading llms.txt or plain text files | fetch_text |
| REST API calls (POST / PUT / DELETE) | http_request |
| Raw response body or headers needed | http_request |
| Localhost or internal network | both work |
| JavaScript-rendered SPA | neither (use a browser) |


Logging

All requests are logged to .var/requests.log, one JSON line per request:

{"ts":"2026-06-10T08:20:00.000Z","tool":"fetch_text","method":"GET","url":"https://example.com","status":200,"ok":true,"elapsed_ms":248}

The log rotates at about 1 MB and keeps a single .1 backup. Configure or disable it in src/Config.js:

LOG_FILE: '.var/requests.log',  // '' = disabled
LOG_MAX_BYTES: 1_000_000
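
To illustrate the rotation behavior described above, here is a minimal sketch of a JSON-lines logger with a single backup file. The function name logRequest and its option names are hypothetical; only the default path, the size limit, and the "empty string disables logging" rule come from the configuration shown here.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper, NOT the server's actual code.
function logRequest(entry, { logFile = '.var/requests.log', maxBytes = 1_000_000 } = {}) {
  if (!logFile) return; // '' disables logging

  fs.mkdirSync(path.dirname(logFile), { recursive: true });

  // Rotate before writing once the current file has reached the limit.
  if (fs.existsSync(logFile) && fs.statSync(logFile).size >= maxBytes) {
    fs.renameSync(logFile, `${logFile}.1`); // replaces any previous backup
  }

  // One JSON object per line, timestamp first.
  const line = JSON.stringify({ ts: new Date().toISOString(), ...entry });
  fs.appendFileSync(logFile, line + '\n');
}
```

Checking the size before each write means the file can exceed the limit by at most one entry, which is consistent with the approximate threshold stated above.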

Install & build

build.cmd

Installs dependencies, bundles to dist/mcp.js, runs tests. The dist/ folder is self-contained — no node_modules needed at runtime.

Claude Desktop config

{
  "mcpServers": {
    "mcp-web-fetch": {
      "command": "node",
      "args": ["D:/dev/ai/mcp-web-fetch/dist/mcp.js"]
    }
  }
}

Stack

  • Node.js 22+ (native fetch built-in, no extra HTTP dependency)

  • @modelcontextprotocol/sdk

  • node-html-parser — fast pure-JS HTML parser, no native bindings

  • zod + zod-to-json-schema

  • esbuild (build only)
