How do I use spa-reader-mcp?

1. Click "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it shows a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g. "@spa-reader-mcp extract the article content from https://react.dev/blog".

That's it! The server will respond to your query, and you can continue using it as needed.

spa-reader-mcp

MCP server that renders JavaScript SPA pages and extracts LLM-ready Markdown.

Traditional web scrapers fail on Single Page Applications because content is rendered by JavaScript after page load. spa-reader-mcp solves this by launching a headless Chromium browser via Playwright, waiting for the page to fully render, then extracting clean Markdown using Mozilla's Readability and Turndown — ready for LLM consumption.

Features

- `spa_read`: Render any SPA page and extract article content as clean Markdown with optional YAML frontmatter
- `spa_screenshot`: Capture full-page or viewport-sized PNG screenshots of rendered pages
- Singleton browser: Reuses a single Chromium instance across requests for fast, low-overhead rendering
- SSRF protection: Blocks private/loopback IP ranges and restricts URL schemes to http/https
- Selector injection prevention: Rejects Playwright-specific selector syntax (`>>`, `nth=`, `text=`, `has-text`, `:has()`)
- Content truncation: Caps output at 100KB, cutting cleanly at a line boundary
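
The content-truncation feature can be pictured as follows. This is a minimal sketch, not the package's source: it assumes "100KB" means 100 × 1024 UTF-8 bytes and that a "clean" cut means dropping everything after the last whole line that fits. `truncateAtLineBoundary` is a hypothetical name.

```typescript
// Hypothetical helper illustrating line-boundary truncation.
// Assumption: the 100KB cap is measured in UTF-8 bytes.
const MAX_BYTES = 100 * 1024;

function utf8Length(text: string): number {
  return new TextEncoder().encode(text).length;
}

function truncateAtLineBoundary(markdown: string, maxBytes: number = MAX_BYTES): string {
  if (utf8Length(markdown) <= maxBytes) {
    return markdown; // Already under the cap: return unchanged.
  }
  const kept: string[] = [];
  let used = 0;
  for (const line of markdown.split("\n")) {
    // +1 accounts for the newline that joins this line to the previous one.
    const cost = utf8Length(line) + (kept.length > 0 ? 1 : 0);
    if (used + cost > maxBytes) break; // Stop before the first line that would overflow.
    kept.push(line);
    used += cost;
  }
  return kept.join("\n");
}
```

Measuring bytes rather than JavaScript string length keeps the cap honest for non-ASCII text. In this sketch a single line longer than the cap yields an empty string; a real implementation might fall back to a hard cut.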

Requirements

Node.js >= 20
Chromium browser for Playwright:
```
npx playwright install chromium
```

Installation

npx (recommended, zero install)

No global install needed — configure directly in your MCP client (see MCP Configuration).

Global install

```
npm install -g spa-reader-mcp
npx playwright install chromium
```

From source

```
git clone https://github.com/XXO47OXX/spa-reader-mcp.git
cd spa-reader-mcp
pnpm install
pnpm build
npx playwright install chromium
```

MCP Configuration

Claude Desktop

Add to your claude_desktop_config.json:

```
{
  "mcpServers": {
    "spa-reader": {
      "command": "npx",
      "args": ["-y", "spa-reader-mcp"]
    }
  }
}
```

Claude Code

```
claude mcp add spa-reader -- npx -y spa-reader-mcp
```
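
Both clients launch the server as a subprocess and talk to it over MCP's stdio transport, which exchanges newline-delimited JSON-RPC 2.0 messages. The sketch below only builds the `tools/call` request a client would write to the server's stdin to invoke `spa_read`; it assumes the `initialize` handshake has already completed, and `buildToolCall` is a hypothetical helper, not part of spa-reader-mcp or the MCP SDK.

```typescript
// Minimal shape of a JSON-RPC 2.0 request for MCP's tools/call method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: builds one newline-terminated stdio message.
// JSON.stringify without indentation never emits raw newlines (newlines
// inside strings are escaped), so the trailing "\n" is the only delimiter.
function buildToolCall(id: number, name: string, args: Record<string, unknown>): string {
  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
  return JSON.stringify(request) + "\n";
}

const frame = buildToolCall(2, "spa_read", {
  url: "https://react.dev/blog",
  includeMetadata: true,
});
console.log(frame.trimEnd());
```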

Tools

`spa_read`

Render a JavaScript SPA page and extract its content as LLM-ready Markdown.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | string | Yes | (none) | The URL of the SPA page to read |
| `waitForSelector` | string | No | (none) | CSS selector to wait for before extraction |
| `waitTimeout` | number | No | 30000 | Navigation timeout in ms (1000–120000) |
| `includeMetadata` | boolean | No | true | Include title/author/excerpt as YAML frontmatter |
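
The defaults and ranges in this table, together with the selector restrictions listed under Security, can be expressed as a small validation step. This is a sketch under assumptions: `normalizeReadArgs` and its types are hypothetical names, out-of-range timeouts are rejected rather than clamped, and the blocked-token list mirrors the documented one rather than the package's actual code.

```typescript
interface ReadArgs {
  url: string;
  waitForSelector?: string;
  waitTimeout?: number;
  includeMetadata?: boolean;
}

interface NormalizedReadArgs {
  url: string;
  waitForSelector?: string;
  waitTimeout: number;
  includeMetadata: boolean;
}

// Playwright-specific selector syntax documented as rejected.
const BLOCKED_SELECTOR_TOKENS = [">>", "nth=", "text=", "has-text", ":has("];

function normalizeReadArgs(args: ReadArgs): NormalizedReadArgs {
  const waitTimeout = args.waitTimeout ?? 30000; // documented default
  if (!Number.isFinite(waitTimeout) || waitTimeout < 1000 || waitTimeout > 120000) {
    throw new RangeError("waitTimeout must be between 1000 and 120000 ms");
  }
  const selector = args.waitForSelector;
  if (selector !== undefined) {
    // Reject any selector containing a Playwright engine token.
    const bad = BLOCKED_SELECTOR_TOKENS.find((token) => selector.includes(token));
    if (bad) throw new Error(`Unsupported selector syntax: ${bad}`);
  }
  return {
    url: args.url,
    waitForSelector: selector,
    waitTimeout,
    includeMetadata: args.includeMetadata ?? true, // documented default
  };
}
```

Restricting selectors to plain CSS keeps a caller-supplied string from switching Playwright into its text or chained-locator engines.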

Example output:

```
---
title: "Understanding React Server Components"
author: "Dan Abramov"
excerpt: "A deep dive into how RSC works under the hood"
source: "https://example.com/blog/rsc"
---

## Introduction

React Server Components allow you to...
```
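
The frontmatter in the example output can be produced by serializing each metadata field as a double-quoted YAML scalar. A minimal sketch, assuming the field order shown above and that missing fields are simply omitted; `buildFrontmatter` and `ArticleMeta` are hypothetical. It relies on the fact that a JSON string literal is also a valid YAML 1.2 double-quoted scalar, so `JSON.stringify` takes care of quotes, backslashes, and newlines in titles or excerpts.

```typescript
interface ArticleMeta {
  title?: string;
  author?: string;
  excerpt?: string;
  source?: string;
}

// Hypothetical helper: emits a YAML frontmatter block followed by a blank line.
function buildFrontmatter(meta: ArticleMeta): string {
  const keys: (keyof ArticleMeta)[] = ["title", "author", "excerpt", "source"];
  const lines = ["---"];
  for (const key of keys) {
    const value = meta[key];
    if (value) lines.push(`${key}: ${JSON.stringify(value)}`); // skip empty or missing fields
  }
  lines.push("---");
  return lines.join("\n") + "\n\n";
}

const header = buildFrontmatter({
  title: "Understanding React Server Components",
  author: "Dan Abramov",
  excerpt: "A deep dive into how RSC works under the hood",
  source: "https://example.com/blog/rsc",
});
```

The final document would then be `header + markdownBody`, where `markdownBody` is the Turndown output.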

`spa_screenshot`

Take a PNG screenshot of a JavaScript SPA page after rendering.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | string | Yes | (none) | The URL to screenshot |
| `waitForSelector` | string | No | (none) | CSS selector to wait for before capturing |
| `waitTimeout` | number | No | 30000 | Navigation timeout in ms (1000–120000) |
| `width` | number | No | 1280 | Viewport width in pixels (320–3840) |
| `height` | number | No | 720 | Viewport height in pixels (240–2160) |
| `fullPage` | boolean | No | false | Capture the full scrollable page |

Returns the screenshot as a base64-encoded PNG image.
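
On the client side, the base64 payload can be sanity-checked before it is saved or displayed. A minimal sketch, not part of spa-reader-mcp: `readPngSize` is hypothetical. It decodes the string with the standard `atob` global, verifies the 8-byte PNG signature, and reads the dimensions from the IHDR chunk, which the PNG specification requires to come first. With `fullPage: false` the result would be expected to match the requested `width` and `height`, assuming the default device scale factor of 1.

```typescript
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

interface PngSize {
  width: number;
  height: number;
}

// Hypothetical helper: decode a base64 PNG and read width/height from IHDR.
function readPngSize(base64: string): PngSize {
  const bytes = Uint8Array.from(atob(base64), (c) => c.charCodeAt(0));
  if (bytes.length < 24 || PNG_SIGNATURE.some((b, i) => bytes[i] !== b)) {
    throw new Error("Not a PNG image");
  }
  // Bytes 8..11 are the first chunk's length; bytes 12..15 its type.
  const chunkType = String.fromCharCode(...Array.from(bytes.subarray(12, 16)));
  if (chunkType !== "IHDR") throw new Error("Missing IHDR chunk");
  // Width and height are 4-byte big-endian integers at offsets 16 and 20.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { width: view.getUint32(16), height: view.getUint32(20) };
}
```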

Architecture

```
URL
 → Playwright Chromium (headless, singleton)
   → Per-request BrowserContext (isolated cookies/storage)
     → Page navigation + networkidle + optional selector wait
       → Raw HTML
         → Mozilla Readability (article extraction)
           → Turndown (HTML → Markdown)
             → YAML frontmatter + truncation
               → LLM-ready Markdown
```
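
The first stages of this pipeline (context, navigation, waiting, raw HTML) can be sketched against a few structural interfaces that mirror the Playwright methods involved: `browser.newContext()`, `context.newPage()`, `page.goto()` with `waitUntil: "networkidle"`, `page.waitForSelector()`, `page.content()`, and `context.close()`. Declaring them locally keeps the example self-contained and testable with fakes; real code would pass Playwright's own `Browser`. `renderHtml` is a hypothetical name, and applying `waitTimeout` to the selector wait as well is an assumption.

```typescript
// Minimal structural types mirroring the Playwright methods used below.
interface PageLike {
  goto(url: string, options: { waitUntil: "networkidle"; timeout: number }): Promise<unknown>;
  waitForSelector(selector: string, options: { timeout: number }): Promise<unknown>;
  content(): Promise<string>;
}
interface ContextLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}
interface BrowserLike {
  newContext(): Promise<ContextLike>;
}

// Hypothetical render step: one isolated context per request, always closed.
async function renderHtml(
  browser: BrowserLike,
  url: string,
  waitTimeout: number,
  waitForSelector?: string,
): Promise<string> {
  const context = await browser.newContext(); // fresh cookies/storage for this request
  try {
    const page = await context.newPage();
    await page.goto(url, { waitUntil: "networkidle", timeout: waitTimeout });
    if (waitForSelector) {
      await page.waitForSelector(waitForSelector, { timeout: waitTimeout });
    }
    return await page.content(); // raw HTML handed to Readability next
  } finally {
    await context.close(); // runs on success and on navigation errors alike
  }
}
```

Closing the context in `finally` is what keeps a failed navigation from leaking cookies, storage, or open pages into later requests.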

Key design decisions:

- Singleton browser: A single Chromium instance is launched on the first request and reused. This avoids the ~2s cold-start penalty on subsequent calls.
- Per-request BrowserContext: Each request gets an isolated BrowserContext with its own cookies and storage, preventing cross-request data leakage.
- Readability fallback: If Mozilla Readability determines the page isn't article-like, the extractor falls back to converting the full `<body>` HTML.
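
The singleton decision is easiest to get right by caching the launch *promise* rather than the launched browser, so concurrent first requests share one launch instead of racing to start several Chromium processes. A minimal sketch with an injected launcher (in real code, something like Playwright's `chromium.launch({ headless: true })`); `createBrowserSingleton` is hypothetical, and clearing the cache after a failed launch is an assumption about sensible behavior, not documented behavior of this package.

```typescript
// Hypothetical factory: returns a getter that launches at most one browser.
function createBrowserSingleton<B>(launch: () => Promise<B>): () => Promise<B> {
  let pending: Promise<B> | null = null;
  return () => {
    if (!pending) {
      // Cache the promise immediately so concurrent callers share this launch.
      pending = launch().catch((err: unknown) => {
        pending = null; // allow a retry after a failed launch
        throw err;
      });
    }
    return pending;
  };
}
```

Usage would be `const getBrowser = createBrowserSingleton(() => chromium.launch({ headless: true }));`, with every request calling `await getBrowser()` before creating its own context.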

Security

| Protection | Details |
|---|---|
| SSRF prevention | Blocks `localhost`, `127.x.x.x`, `10.x.x.x`, `172.16-31.x.x`, `192.168.x.x`, `::1`, `fe80:`, `169.254.x.x`, `0.0.0.0` |
| Scheme whitelist | Only `http:` and `https:` URLs are allowed |
| Selector injection | Rejects Playwright engine syntax: `>>`, `nth=`, `text=`, `has-text`, `:has()` |
| Content truncation | Output capped at 100KB with a clean line-boundary cut |
| Test bypass | Set `SPA_READER_ALLOW_PRIVATE=1` to allow private IPs (for local development/testing only) |
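
The URL checks in this table can be approximated with the WHATWG `URL` parser. A minimal sketch, not the package's implementation: `assertSafeUrl` and its helpers are hypothetical, the `allowPrivate` flag stands in for the documented `SPA_READER_ALLOW_PRIVATE=1` variable, and only literal hostnames and IP addresses are inspected. A hostname that *resolves* to a private address would need a DNS check before navigation, which this sketch omits.

```typescript
// Documented IPv4 ranges, checked on the canonical dotted-quad host.
function isBlockedIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 127 ||                         // 127.x.x.x loopback
    a === 10 ||                          // 10.x.x.x
    (a === 172 && b >= 16 && b <= 31) || // 172.16-31.x.x
    (a === 192 && b === 168) ||          // 192.168.x.x
    (a === 169 && b === 254) ||          // 169.254.x.x link-local
    host === "0.0.0.0"
  );
}

// URL.hostname keeps IPv6 literals in brackets, e.g. "[::1]".
function isBlockedIPv6(host: string): boolean {
  if (!host.startsWith("[")) return false;
  const addr = host.slice(1, -1);
  return addr === "::1" || addr.startsWith("fe80:");
}

function assertSafeUrl(raw: string, allowPrivate = false): URL {
  const url = new URL(raw); // throws TypeError on malformed input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Blocked scheme: ${url.protocol}`);
  }
  // The parser canonicalizes hosts: http://2130706433/ becomes "127.0.0.1",
  // and IPv6 literals are lowercased and compressed (e.g. "[::1]").
  const host = url.hostname;
  const blocked = host === "localhost" || isBlockedIPv4(host) || isBlockedIPv6(host);
  if (blocked && !allowPrivate) {
    throw new Error(`Blocked private address: ${host}`);
  }
  return url;
}
```

Relying on the parser's canonical host, instead of matching the raw string, is what catches alternate spellings such as decimal or hex IPv4 literals.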

Development

```
# Install dependencies
pnpm install

# Build
pnpm build

# Run tests
pnpm test

# Type check
pnpm lint
```

Attribution & AI Policy

⚠️ Original Design & Concept Protection (First Published: 2026-03-11)

This project represents the original implementation of the following design innovations:

Core Architecture Decisions:

- Singleton browser instance with per-request BrowserContext isolation: enables fast, secure parallel rendering without cross-request contamination
- SSRF protection + Playwright selector injection prevention: a comprehensive security model for untrusted URLs
- Content truncation at line boundaries: preserves semantic integrity while capping output for LLM consumption
- Readability → Turndown pipeline with full-page fallback: robust HTML-to-Markdown extraction for both article and non-article content

Design rationale documented since: 2026-03-11 (Initial MVP commit)

If you build derivative works inspired by these architectural decisions, please acknowledge the original source in your project's README under an "Inspiration" or "Based on" section. Attribution examples:

```
**Inspired by:** [spa-reader-mcp](https://github.com/XXO47OXX/spa-reader-mcp)
- Browser singleton pattern for efficient SPA rendering
- Per-request isolation strategy for security
```

For Forks and Derivatives

If you fork or significantly adapt this codebase, please:

- Retain the copyright notice in all source files
- Include the NOTICE file in your distribution
- Credit the original repository in your README
- For conceptual derivatives: if you adopt the architectural patterns or design decisions, note the original source even if the code is rewritten

For AI Model Training & Web Scraping

The policy for using this codebase in AI/LLM training is set out in the repository's llms.txt file. We request that:

- Models trained on this code retain the copyright attribution
- Training pipelines respect the opt-out signals in robots.txt, llms.txt, and .well-known/tdmrep.json
- Verbatim code reproduction includes a reference to the original repository
- Architectural pattern reuse: if LLM applications adopt the Singleton + BrowserContext + security model without explicit code copying, they should reference the original design in documentation

Provenance identifier: `SPR-XXO47OXX-1f4a9b2c-2026`. This identifier is compiled into the distributed npm packages for traceable attribution.

See llms.txt and NOTICE for the complete policy.

License

MIT