read-website-fast

by just-every
MIT License

@just-every/mcp-read-website-fast

Fast, token-efficient web content extraction for AI agents - converts websites to clean Markdown.

Overview

Existing MCP web crawlers are slow and consume large quantities of tokens. This stalls the development process and yields incomplete results, because LLMs must parse entire web pages, navigation and boilerplate included.

This MCP package fetches web pages locally, strips noise, and converts content to clean Markdown while preserving links. Designed for Claude Code, IDEs and LLM pipelines with minimal token footprint. Crawl sites locally with minimal dependencies.

Features

  • Fast startup using official MCP SDK with lazy loading for optimal performance
  • Content extraction using Mozilla Readability (same as Firefox Reader View)
  • HTML to Markdown conversion with Turndown + GFM support
  • Smart caching with SHA-256 hashed URLs
  • Polite crawling with robots.txt support and rate limiting
  • Concurrent fetching with configurable depth crawling
  • Stream-first design for low memory usage
  • Link preservation for knowledge graphs
  • Optional chunking for downstream processing
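The "smart caching" above keys cache entries by a SHA-256 hash of the URL. The package's internal API is not documented here, so the following is only a sketch of how such keying can work; the cacheKey helper is hypothetical, not the package's actual function:

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a stable cache filename from a URL,
// assuming the documented "SHA-256 hashed URLs" scheme.
function cacheKey(url: string): string {
  // URL parsing normalizes trivially, e.g. "http://example.com"
  // and "http://example.com/" map to the same string.
  const normalized = new URL(url).toString();
  return createHash("sha256").update(normalized).digest("hex");
}

// The same URL always maps to the same 64-char hex key,
// so repeat fetches can be served from disk.
console.log(cacheKey("https://example.com/article"));
```

Because the key is a fixed-length hex digest, it doubles as a safe filename regardless of how long or strange the original URL is.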

Installation

Claude Code

claude mcp add read-website-fast -s user -- npx -y @just-every/mcp-read-website-fast

VS Code

code --add-mcp '{"name":"read-website-fast","command":"npx","args":["-y","@just-every/mcp-read-website-fast"]}'

Cursor

cursor://anysphere.cursor-deeplink/mcp/install?name=read-website-fast&config=eyJyZWFkLXdlYnNpdGUtZmFzdCI6eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqdXN0LWV2ZXJ5L21jcC1yZWFkLXdlYnNpdGUtZmFzdCJdfX0=

JetBrains IDEs

Settings → Tools → AI Assistant → Model Context Protocol (MCP) → Add

Choose “As JSON” and paste:

{"command":"npx","args":["-y","@just-every/mcp-read-website-fast"]}

Alternatively, type /add in the chat window and fill in the same JSON; both routes register the server in a single step.

Raw JSON (works in any MCP client)

{ "mcpServers": { "read-website-fast": { "command": "npx", "args": ["-y", "@just-every/mcp-read-website-fast"] } } }

Drop this into your client’s mcp.json (e.g. .vscode/mcp.json, ~/.cursor/mcp.json, or .mcp.json for Claude).

Available Tools

  • read_website_fast - Fetches a webpage and converts it to clean Markdown
    • Parameters:
      • url (required): The HTTP/HTTPS URL to fetch
      • depth (optional): Crawl depth (0 = single page)
      • respectRobots (optional): Whether to respect robots.txt
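As a concrete illustration, a client invoking this tool sends an MCP tools/call request. The JSON-RPC envelope below follows the standard MCP convention; only the name and arguments values come from this package, and the argument values shown are examples:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "read_website_fast",
    "arguments": {
      "url": "https://example.com/article",
      "depth": 0,
      "respectRobots": true
    }
  }
}
```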

Available Resources

  • read-website-fast://status - Get cache statistics
  • read-website-fast://clear-cache - Clear the cache directory

Development Usage

Install

npm install
npm run build

Single page fetch

npm run dev fetch https://example.com/article

Crawl with depth

npm run dev fetch https://example.com --depth 2 --concurrency 5

Output formats

# Markdown only (default)
npm run dev fetch https://example.com

# JSON output with metadata
npm run dev fetch https://example.com --output json

# Both URL and markdown
npm run dev fetch https://example.com --output both

CLI Options

  • -d, --depth <number> - Crawl depth (0 = single page, default: 0)
  • -c, --concurrency <number> - Max concurrent requests (default: 3)
  • --no-robots - Ignore robots.txt
  • --all-origins - Allow cross-origin crawling
  • -u, --user-agent <string> - Custom user agent
  • --cache-dir <path> - Cache directory (default: .cache)
  • -t, --timeout <ms> - Request timeout in milliseconds (default: 30000)
  • -o, --output <format> - Output format: json, markdown, or both (default: markdown)
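Several of these flags compose in a single invocation. For example, a one-level crawl with higher concurrency, a longer timeout, and JSON output might look like this (values are illustrative):

```shell
npm run dev fetch https://example.com --depth 1 --concurrency 5 --timeout 60000 --output json
```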

Clear cache

npm run dev clear-cache

Architecture

mcp/
├── src/
│   ├── crawler/    # URL fetching, queue management, robots.txt
│   ├── parser/     # DOM parsing, Readability, Turndown conversion
│   ├── cache/      # Disk-based caching with SHA-256 keys
│   ├── utils/      # Logger, chunker utilities
│   └── index.ts    # CLI entry point

Development

# Run in development mode
npm run dev fetch https://example.com

# Build for production
npm run build

# Run tests
npm test

# Type checking
npm run typecheck

# Linting
npm run lint

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Submit a pull request

Troubleshooting

Cache Issues

npm run dev clear-cache

Timeout Errors

  • Increase timeout with -t flag
  • Check network connectivity
  • Verify URL is accessible

Content Not Extracted

  • Some sites block automated access
  • Try custom user agent with -u flag
  • Check if site requires JavaScript (not supported)

License

MIT



