spa-reader-mcp

MCP server that renders JavaScript SPA pages and extracts LLM-ready Markdown.

Traditional web scrapers fail on Single Page Applications because content is rendered by JavaScript after page load. spa-reader-mcp solves this by launching a headless Chromium browser via Playwright, waiting for the page to fully render, then extracting clean Markdown using Mozilla's Readability and Turndown — ready for LLM consumption.

Features

  • spa_read — Render any SPA page and extract article content as clean Markdown with optional YAML frontmatter

  • spa_screenshot — Capture full or viewport-sized PNG screenshots of rendered pages

  • Singleton browser — Reuses a single Chromium instance across requests for fast, low-overhead rendering

  • SSRF protection — Blocks private/loopback IP ranges and restricts URL schemes to http/https

  • Selector injection prevention — Rejects Playwright-specific selector syntax (>>, nth=, text=, has-text, :has)

  • Content truncation — Caps output at 100KB with clean line-boundary truncation

Requirements

  • Node.js >= 20

  • Chromium browser for Playwright:

    npx playwright install chromium

Installation

No global install needed — configure directly in your MCP client (see MCP Configuration).

Global install

npm install -g spa-reader-mcp
npx playwright install chromium

From source

git clone https://github.com/XXO47OXX/spa-reader-mcp.git
cd spa-reader-mcp
pnpm install
pnpm build
npx playwright install chromium

MCP Configuration

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "spa-reader": {
      "command": "npx",
      "args": ["-y", "spa-reader-mcp"]
    }
  }
}

Claude Code

claude mcp add spa-reader -- npx -y spa-reader-mcp

Tools

spa_read

Render a JavaScript SPA page and extract its content as LLM-ready Markdown.

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | Yes | - | The URL of the SPA page to read |
| waitForSelector | string | No | - | CSS selector to wait for before extraction |
| waitTimeout | number | No | 30000 | Navigation timeout in ms (1000–120000) |
| includeMetadata | boolean | No | true | Include title/author/excerpt as YAML frontmatter |

Example output:

---
title: "Understanding React Server Components"
author: "Dan Abramov"
excerpt: "A deep dive into how RSC works under the hood"
source: "https://example.com/blog/rsc"
---

## Introduction

React Server Components allow you to...
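As a rough illustration of how frontmatter like the example above could be assembled, here is a minimal sketch. The `PageMetadata` type and the `withFrontmatter` function are hypothetical names, not the project's actual code. The sketch assumes empty fields are skipped and relies on JSON string quoting, which is also valid YAML double-quoted syntax.

```typescript
// Hypothetical shape of the extracted metadata (an assumption for this sketch).
interface PageMetadata {
  title?: string;
  author?: string;
  excerpt?: string;
  source: string;
}

// Prepend YAML frontmatter to the Markdown body unless includeMetadata is false.
function withFrontmatter(
  markdown: string,
  meta: PageMetadata,
  includeMetadata = true,
): string {
  if (!includeMetadata) return markdown;

  const fields: Array<[string, string | undefined]> = [
    ["title", meta.title],
    ["author", meta.author],
    ["excerpt", meta.excerpt],
    ["source", meta.source],
  ];

  // JSON.stringify escapes quotes and backslashes; the result is a valid
  // YAML double-quoted scalar.
  const lines = fields
    .filter(([, value]) => value !== undefined && value !== "")
    .map(([key, value]) => `${key}: ${JSON.stringify(value)}`);

  return ["---", ...lines, "---", "", markdown].join("\n");
}
```

Calling `withFrontmatter(body, meta, false)` returns the body unchanged, which mirrors `includeMetadata: false`.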

spa_screenshot

Take a PNG screenshot of a JavaScript SPA page after rendering.

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | Yes | - | The URL to screenshot |
| waitForSelector | string | No | - | CSS selector to wait for before capturing |
| waitTimeout | number | No | 30000 | Navigation timeout in ms (1000–120000) |
| width | number | No | 1280 | Viewport width in pixels (320–3840) |
| height | number | No | 720 | Viewport height in pixels (240–2160) |
| fullPage | boolean | No | false | Capture full scrollable page |

Returns the screenshot as a base64-encoded PNG image.
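The defaults and ranges in the parameter table can be expressed as a small validation step. The sketch below is illustrative only: `resolveScreenshotOptions` and `LIMITS` are hypothetical names, and it assumes out-of-range values are rejected rather than clamped, which the README does not specify.

```typescript
interface ScreenshotOptions {
  waitTimeout: number;
  width: number;
  height: number;
  fullPage: boolean;
}

// Ranges and defaults copied from the parameter table.
const LIMITS = {
  waitTimeout: { min: 1000, max: 120000, fallback: 30000 },
  width: { min: 320, max: 3840, fallback: 1280 },
  height: { min: 240, max: 2160, fallback: 720 },
} as const;

// Return the default when the value is omitted; throw when it is out of range
// (assumption: the real server may handle this differently).
function inRange(
  name: string,
  value: number | undefined,
  limit: { min: number; max: number; fallback: number },
): number {
  if (value === undefined) return limit.fallback;
  if (!Number.isFinite(value) || value < limit.min || value > limit.max) {
    throw new RangeError(`${name} must be between ${limit.min} and ${limit.max}`);
  }
  return value;
}

function resolveScreenshotOptions(
  input: Partial<ScreenshotOptions> = {},
): ScreenshotOptions {
  return {
    waitTimeout: inRange("waitTimeout", input.waitTimeout, LIMITS.waitTimeout),
    width: inRange("width", input.width, LIMITS.width),
    height: inRange("height", input.height, LIMITS.height),
    fullPage: input.fullPage ?? false,
  };
}
```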

Architecture

URL
 → Playwright Chromium (headless, singleton)
   → Per-request BrowserContext (isolated cookies/storage)
     → Page navigation + networkidle + optional selector wait
       → Raw HTML
         → Mozilla Readability (article extraction)
           → Turndown (HTML → Markdown)
             → YAML frontmatter + truncation
               → LLM-ready Markdown
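The truncation step near the end of this pipeline could look roughly like the sketch below. `truncateAtLineBoundary` is a hypothetical name. The README does not say whether the 100KB cap counts bytes or characters, so this sketch assumes 100 × 1024 UTF-8 bytes.

```typescript
// Keep whole lines until adding the next one would exceed the byte budget.
// Assumption: the cap is measured in UTF-8 bytes (100 * 1024 by default).
function truncateAtLineBoundary(text: string, maxBytes = 100 * 1024): string {
  const encoder = new TextEncoder();
  if (encoder.encode(text).length <= maxBytes) return text;

  const kept: string[] = [];
  let used = 0;
  for (const line of text.split("\n")) {
    const size = encoder.encode(line).length + 1; // +1 for the newline
    if (used + size > maxBytes) break;
    kept.push(line);
    used += size;
  }
  // Joining drops the final newline, so the result never exceeds maxBytes.
  return kept.join("\n");
}
```

Because only complete lines are kept, a multi-byte character is never split. If the very first line is already larger than the budget, this sketch returns an empty string; a real implementation might prefer a different fallback.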

Key design decisions:

  • Singleton browser: A single Chromium instance is launched on first request and reused. This avoids the ~2s cold-start penalty on subsequent calls.

  • Per-request BrowserContext: Each request gets an isolated BrowserContext with its own cookies and storage, preventing cross-request data leakage.

  • Readability fallback: If Mozilla Readability determines the page isn't article-like, the extractor falls back to converting the full <body> HTML.
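The singleton decision can be illustrated with a generic lazy-initialization helper. This is a sketch, not the project's code: `lazySingleton` is a hypothetical name, and the factory below is a stand-in for a real browser launch (with Playwright that would typically be `chromium.launch()`, followed by `browser.newContext()` per request).

```typescript
// Cache the promise itself so that concurrent first calls share one launch.
function lazySingleton<T>(create: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = create().catch((err) => {
        cached = undefined; // allow a retry after a failed launch
        throw err;
      });
    }
    return cached;
  };
}

// Stand-in for launching Chromium; counts how often the factory runs.
let launches = 0;
const getBrowser = lazySingleton(async () => {
  launches += 1;
  return { id: launches };
});
```

Every call to `getBrowser()` returns the same promise, so the expensive launch runs once even when several requests arrive before it completes.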

Security

| Protection | Details |
| --- | --- |
| SSRF prevention | Blocks localhost, 127.x.x.x, 10.x.x.x, 172.16-31.x.x, 192.168.x.x, ::1, fe80:, 169.254.x.x, 0.0.0.0 |
| Scheme whitelist | Only http: and https: URLs are allowed |
| Selector injection | Rejects Playwright engine syntax: >>, nth=, text=, has-text, :has() |
| Content truncation | Output capped at 100KB with clean line-boundary cut |
| Test bypass | Set SPA_READER_ALLOW_PRIVATE=1 to allow private IPs (for local development/testing only) |
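The URL and selector checks listed above might be sketched as follows. All function names here are hypothetical, the `allowPrivate` flag stands in for the `SPA_READER_ALLOW_PRIVATE=1` bypass, and the patterns are a literal reading of the table rather than the project's actual implementation.

```typescript
// IPv4 ranges from the table: loopback, private, link-local, and 0.0.0.0.
const BLOCKED_IPV4 = [
  /^127\./,
  /^10\./,
  /^192\.168\./,
  /^169\.254\./,
  /^172\.(1[6-9]|2\d|3[01])\./,
  /^0\.0\.0\.0$/,
];

function isBlockedHost(hostname: string): boolean {
  // URL.hostname wraps IPv6 literals in brackets, e.g. "[::1]".
  const host = hostname.toLowerCase().replace(/^\[|\]$/g, "");
  if (host === "localhost" || host === "::1") return true;
  if (host.startsWith("fe80:")) return true;
  return BLOCKED_IPV4.some((pattern) => pattern.test(host));
}

// Parse the URL, enforce the scheme whitelist, then apply the host check.
function checkUrl(raw: string, allowPrivate = false): URL {
  const url = new URL(raw); // throws on malformed input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Scheme not allowed: ${url.protocol}`);
  }
  if (!allowPrivate && isBlockedHost(url.hostname)) {
    throw new Error(`Host not allowed: ${url.hostname}`);
  }
  return url;
}

// Plain substring matching on the Playwright-specific tokens from the table.
const FORBIDDEN_SELECTOR_TOKENS = [">>", "nth=", "text=", "has-text", ":has("];

function isSafeSelector(selector: string): boolean {
  return !FORBIDDEN_SELECTOR_TOKENS.some((token) => selector.includes(token));
}
```

A hostname check like this only inspects the literal host in the URL. It does not cover public DNS names that resolve to private addresses, which would need a check on the resolved IP. The substring match is also conservative: an attribute selector such as `[data-text=foo]` would be rejected.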

Development

# Install dependencies
pnpm install

# Build
pnpm build

# Run tests
pnpm test

# Type check
pnpm lint

Attribution & AI Policy

Copyright Notice: This project is Copyright (c) 2026 XXO47OXX and licensed under the MIT License.

⚠️ Original Design & Concept Protection (First Published: 2026-03-11)

This project represents the original implementation of the following design innovations:

Core Architecture Decisions:

  • Singleton browser instance with per-request BrowserContext isolation — enables fast, secure parallel rendering without cross-request contamination

  • SSRF protection + Playwright selector injection prevention — comprehensive security model for untrusted URLs

  • Content truncation at line boundaries — preserves semantic integrity while capping output for LLM consumption

  • Readability → Turndown pipeline with full-page fallback — robust HTML-to-Markdown extraction for both article and non-article content

Design rationale documented since: 2026-03-11 (Initial MVP commit)

If you build derivative works inspired by these architectural decisions, please acknowledge the original source in your project's README under an "Inspiration" or "Based on" section. Attribution examples:

**Inspired by:** [spa-reader-mcp](https://github.com/XXO47OXX/spa-reader-mcp)
- Browser singleton pattern for efficient SPA rendering
- Per-request isolation strategy for security

For Forks and Derivatives

If you fork or significantly adapt this codebase, please:

  • Retain the copyright notice in all source files

  • Include this NOTICE file in your distribution

  • Credit the original repository in your README

  • For conceptual derivatives: If you adopt the architectural patterns or design decisions, note the original source even if code is rewritten

For AI Model Training & Web Scraping

Use of this codebase for AI/LLM training is governed by the llms.txt standard. We request that:

  • Models trained on this code retain the copyright attribution

  • Training pipelines respect the opt-out signals in robots.txt, llms.txt, and .well-known/tdmrep.json

  • Verbatim code reproduction includes a reference to the original repository

  • Architectural pattern reuse: If LLM applications adopt the Singleton + BrowserContext + security model without explicit code copying, they should reference the original design in documentation

Provenance Identifier: SPR-XXO47OXX-1f4a9b2c-2026

This identifier is compiled into distributed npm packages for traceable attribution.

See llms.txt and NOTICE for the complete policy.

License

MIT
