Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@spa-reader-mcp extract the article content from https://react.dev/blog"
That's it! The server will respond to your query, and you can continue using it as needed.
spa-reader-mcp
MCP server that renders JavaScript SPA pages and extracts LLM-ready Markdown.
Traditional web scrapers fail on Single Page Applications because content is rendered by JavaScript after page load. spa-reader-mcp solves this by launching a headless Chromium browser via Playwright, waiting for the page to fully render, then extracting clean Markdown using Mozilla's Readability and Turndown — ready for LLM consumption.
Features
spa_read: Render any SPA page and extract article content as clean Markdown with optional YAML frontmatter
spa_screenshot: Capture full or viewport-sized PNG screenshots of rendered pages
Singleton browser: Reuses a single Chromium instance across requests for fast, low-overhead rendering
SSRF protection: Blocks private/loopback IP ranges and restricts URL schemes to http/https
Selector injection prevention: Rejects Playwright-specific selector syntax (>>, nth=, text=, has-text, :has)
Content truncation: Caps output at 100KB with clean line-boundary truncation
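The content-truncation behaviour can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the project's actual code: the name truncateAtLineBoundary, the MAX_BYTES constant, and the reading of "100KB" as 100 * 1024 UTF-8 bytes are all hypothetical.

```typescript
// Hypothetical sketch: cap Markdown at a byte budget, cutting only at a newline.
// MAX_BYTES assumes "100KB" means 100 * 1024 UTF-8 bytes (an assumption).
const MAX_BYTES = 100 * 1024;

function truncateAtLineBoundary(markdown: string, maxBytes: number = MAX_BYTES): string {
  const encoder = new TextEncoder();
  if (encoder.encode(markdown).length <= maxBytes) {
    return markdown; // already within budget
  }

  const kept: string[] = [];
  let used = 0;
  for (const line of markdown.split("\n")) {
    // +1 accounts for the newline that joins this line to the previous one.
    const cost = encoder.encode(line).length + (kept.length > 0 ? 1 : 0);
    if (used + cost > maxBytes) break;
    kept.push(line);
    used += cost;
  }
  return kept.join("\n");
}
```

Cutting at a newline means a heading, list item, or table row is either kept whole or dropped, which is presumably why the feature list stresses line boundaries.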
Requirements
Node.js >= 20
Chromium browser for Playwright:
npx playwright install chromium
Installation
npx (recommended, zero install)
No global install needed — configure directly in your MCP client (see MCP Configuration).
Global install
npm install -g spa-reader-mcp
npx playwright install chromium
From source
git clone https://github.com/XXO47OXX/spa-reader-mcp.git
cd spa-reader-mcp
pnpm install
pnpm build
npx playwright install chromium
MCP Configuration
Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "spa-reader": {
      "command": "npx",
      "args": ["-y", "spa-reader-mcp"]
    }
  }
}
Claude Code
claude mcp add spa-reader -- npx -y spa-reader-mcp
Tools
spa_read
Render a JavaScript SPA page and extract its content as LLM-ready Markdown.
Parameter | Type | Required | Default | Description |
| string | Yes | — | The URL of the SPA page to read |
| string | No | — | CSS selector to wait for before extraction |
| number | No | 30000 | Navigation timeout in ms (1000–120000) |
| boolean | No | true | Include title/author/excerpt as YAML frontmatter |
Example output:
---
title: "Understanding React Server Components"
author: "Dan Abramov"
excerpt: "A deep dive into how RSC works under the hood"
source: "https://example.com/blog/rsc"
---
## Introduction
React Server Components allow you to...
spa_screenshot
Take a PNG screenshot of a JavaScript SPA page after rendering.
Parameter | Type | Required | Default | Description |
| string | Yes | — | The URL to screenshot |
| string | No | — | CSS selector to wait for before capturing |
| number | No | 30000 | Navigation timeout in ms (1000–120000) |
| number | No | 1280 | Viewport width in pixels (320–3840) |
| number | No | 720 | Viewport height in pixels (240–2160) |
| boolean | No | false | Capture full scrollable page |
Returns the screenshot as a base64-encoded PNG image.
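Because the image arrives as base64 text, a client usually decodes it before saving or displaying it. The following is a minimal sketch assuming a Node.js client; decodePngBase64 is a hypothetical helper name, and the check only confirms the eight-byte PNG signature, not that the rest of the file is valid.

```typescript
// Hypothetical helper: decode a base64 PNG payload and sanity-check its header.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

function decodePngBase64(base64: string): Buffer {
  const bytes = Buffer.from(base64, "base64");
  const looksLikePng =
    bytes.length >= PNG_SIGNATURE.length &&
    PNG_SIGNATURE.every((byte, i) => bytes[i] === byte);
  if (!looksLikePng) {
    throw new Error("Payload does not start with the PNG signature");
  }
  return bytes;
}
```

The returned Buffer can then be written to disk, for example with fs.writeFileSync("page.png", bytes).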
Architecture
URL
→ Playwright Chromium (headless, singleton)
→ Per-request BrowserContext (isolated cookies/storage)
→ Page navigation + networkidle + optional selector wait
→ Raw HTML
→ Mozilla Readability (article extraction)
→ Turndown (HTML → Markdown)
→ YAML frontmatter + truncation
→ LLM-ready Markdown
Key design decisions:
Singleton browser: A single Chromium instance is launched on first request and reused. This avoids the ~2s cold-start penalty on subsequent calls.
Per-request BrowserContext: Each request gets an isolated BrowserContext with its own cookies and storage, preventing cross-request data leakage.
Readability fallback: If Mozilla Readability determines the page isn't article-like, the extractor falls back to converting the full <body> HTML.
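The singleton decision can be sketched generically. The helper below is an illustration rather than the project's code: lazySingleton is a hypothetical name, and the launch callback stands in for whatever starts Chromium (with Playwright that would typically be chromium.launch()).

```typescript
// Hypothetical sketch of a lazy singleton: the expensive resource is created on
// first use and the same promise is handed to every later caller.
function lazySingleton<T>(launch: () => Promise<T>): () => Promise<T> {
  let instance: Promise<T> | undefined;
  return () => {
    if (instance === undefined) {
      // Cache the promise itself so concurrent first calls share one launch.
      instance = launch().catch((err) => {
        instance = undefined; // allow a retry after a failed launch
        throw err;
      });
    }
    return instance;
  };
}
```

Under this pattern, each request would await the shared browser, open its own context with browser.newContext(), and close that context when done, so the browser is shared while cookies and storage are not.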
Security
Protection | Details |
SSRF prevention | Blocks private/loopback IP ranges |
Scheme whitelist | Only http/https |
Selector injection | Rejects Playwright engine syntax: >>, nth=, text=, has-text, :has |
Content truncation | Output capped at 100KB with clean line-boundary cut |
Test bypass | Set |
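The URL and selector checks can be approximated with a short sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the blocked address list covers only the common IPv4 loopback, private, and link-local ranges plus localhost and ::1, and the real implementation may check more (for example, addresses obtained by DNS resolution).

```typescript
// Hypothetical sketch of the two input checks; not the project's actual code.
const ALLOWED_SCHEMES = new Set(["http:", "https:"]);

// Returns true for IPv4 literals in loopback, private, or link-local ranges.
function isPrivateIPv4(host: string): boolean {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!match) return false;
  const a = Number(match[1]);
  const b = Number(match[2]);
  return (
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

function isUrlAllowed(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (!ALLOWED_SCHEMES.has(url.protocol)) return false;
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host === "[::1]") return false;
  return !isPrivateIPv4(host);
}

// Tokens taken from the selector-injection entry in the Features list.
const BLOCKED_SELECTOR_TOKENS = [">>", "nth=", "text=", "has-text", ":has"];

function isSelectorAllowed(selector: string): boolean {
  return !BLOCKED_SELECTOR_TOKENS.some((token) => selector.includes(token));
}
```

Parsing with the URL constructor before inspecting the host also normalizes shorthand IPv4 forms, so the range check sees a canonical dotted address.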
Development
# Install dependencies
pnpm install
# Build
pnpm build
# Run tests
pnpm test
# Type check
pnpm lint
Attribution & AI Policy
Copyright Notice: This project is Copyright (c) 2026 XXO47OXX and licensed under the MIT License.
⚠️ Original Design & Concept Protection (First Published: 2026-03-11)
This project represents the original implementation of the following design innovations:
Core Architecture Decisions:
Singleton browser instance with per-request BrowserContext isolation — enables fast, secure parallel rendering without cross-request contamination
SSRF protection + Playwright selector injection prevention — comprehensive security model for untrusted URLs
Content truncation at line boundaries — preserves semantic integrity while capping output for LLM consumption
Readability → Turndown pipeline with full-page fallback — robust HTML-to-Markdown extraction for both article and non-article content
Design rationale documented since: 2026-03-11 (Initial MVP commit)
If you build derivative works inspired by these architectural decisions, please acknowledge the original source in your project's README under an "Inspiration" or "Based on" section. Attribution examples:
**Inspired by:** [spa-reader-mcp](https://github.com/XXO47OXX/spa-reader-mcp)
- Browser singleton pattern for efficient SPA rendering
- Per-request isolation strategy for security
For Forks and Derivatives
If you fork or significantly adapt this codebase, please:
Retain the copyright notice in all source files
Include this NOTICE file in your distribution
Credit the original repository in your README
For conceptual derivatives: If you adopt the architectural patterns or design decisions, note the original source even if code is rewritten
For AI Model Training & Web Scraping
This codebase's use for AI/LLM training is governed by the llms.txt standard. We request that:
Models trained on this code retain the copyright attribution
Training pipelines respect the opt-out signals in robots.txt, llms.txt, and .well-known/tdmrep.json
Verbatim code reproduction includes a reference to the original repository
Architectural pattern reuse: If LLM applications adopt the Singleton + BrowserContext + security model without explicit code copying, they should reference the original design in documentation
Provenance Identifier: SPR-XXO47OXX-1f4a9b2c-2026
This identifier is compiled into distributed npm packages for traceable attribution.
See llms.txt and NOTICE for the complete policy.
License