spa_read
Extract content from JavaScript-heavy Single Page Applications by rendering pages with a headless browser and converting them to LLM-ready Markdown.
Instructions
Render a JavaScript SPA page and extract its content as LLM-ready Markdown. Uses a headless browser to execute JavaScript, then extracts the main article content.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL of the SPA page to read | |
| waitForSelector | No | CSS selector to wait for before extraction | |
| waitTimeout | No | Navigation timeout in ms (default: 30000) | |
| includeMetadata | No | Include title/author/excerpt as YAML frontmatter (default: true) | |
| cookies | No | Cookies to inject before page load (e.g., session tokens) | |
| headers | No | Custom HTTP headers (e.g., Authorization) |
Implementation Reference
- src/tools/spa-read.ts:51-89 (handler)The main handler function that executes the spa_read tool logic. It takes URL and options, renders the page using a headless browser, extracts article content, and returns LLM-ready Markdown. Includes error handling and metadata formatting.
async ({ url, waitForSelector, waitTimeout, includeMetadata, cookies, headers }) => { try { const renderResult = await renderPage({ url, waitForSelector, waitTimeout, cookies, headers }); const extraction = extractArticle(renderResult.html, renderResult.url); if (extraction.markdown.length === 0) { return { content: [ { type: "text" as const, text: `Error: Page rendered but no content found at ${url}`, }, ], isError: true, }; } const markdown = formatAsLlmMarkdown( extraction, renderResult.url, includeMetadata ?? true, ); return { content: [{ type: "text" as const, text: markdown }], }; } catch (error) { const message = error instanceof Error ? error.message : String(error); return { content: [ { type: "text" as const, text: `Error reading ${url}: ${message}`, }, ], isError: true, }; } }, - src/tools/spa-read.ts:10-50 (schema)Input validation schemas using Zod: cookieSchema (lines 10-19) defines cookie structure, and the tool input schema (lines 26-50) validates URL, waitForSelector, waitTimeout, includeMetadata, cookies, and headers parameters.
const cookieSchema = z.object({ name: z.string().min(1).describe("Cookie name"), value: z.string().describe("Cookie value"), domain: z.string().optional().describe("Cookie domain (auto-inferred from URL if omitted)"), path: z.string().optional().describe("Cookie path (default: '/')"), secure: z.boolean().optional().describe("Secure flag"), httpOnly: z.boolean().optional().describe("HttpOnly flag"), expires: z.number().optional().describe("Expiration timestamp"), sameSite: z.enum(["Strict", "Lax", "None"]).optional().describe("SameSite attribute"), }); export function registerSpaReadTool(server: McpServer): void { server.tool( "spa_read", "Render a JavaScript SPA page and extract its content as LLM-ready Markdown. " + "Uses a headless browser to execute JavaScript, then extracts the main article content.", { url: z.string().url().describe("The URL of the SPA page to read"), waitForSelector: z .string() .optional() .describe("CSS selector to wait for before extraction"), waitTimeout: z .number() .min(1000) .max(120000) .optional() .describe("Navigation timeout in ms (default: 30000)"), includeMetadata: z .boolean() .optional() .describe("Include title/author/excerpt as YAML frontmatter (default: true)"), cookies: z .array(cookieSchema) .optional() .describe("Cookies to inject before page load (e.g., session tokens)"), headers: z .record(z.string(), z.string()) .optional() .describe("Custom HTTP headers (e.g., Authorization)"), }, - src/tools/spa-read.ts:21-91 (registration)Registration of the spa_read tool with the MCP server. Calls server.tool() with tool name 'spa_read', description, input schema, and handler function.
export function registerSpaReadTool(server: McpServer): void { server.tool( "spa_read", "Render a JavaScript SPA page and extract its content as LLM-ready Markdown. " + "Uses a headless browser to execute JavaScript, then extracts the main article content.", { url: z.string().url().describe("The URL of the SPA page to read"), waitForSelector: z .string() .optional() .describe("CSS selector to wait for before extraction"), waitTimeout: z .number() .min(1000) .max(120000) .optional() .describe("Navigation timeout in ms (default: 30000)"), includeMetadata: z .boolean() .optional() .describe("Include title/author/excerpt as YAML frontmatter (default: true)"), cookies: z .array(cookieSchema) .optional() .describe("Cookies to inject before page load (e.g., session tokens)"), headers: z .record(z.string(), z.string()) .optional() .describe("Custom HTTP headers (e.g., Authorization)"), }, async ({ url, waitForSelector, waitTimeout, includeMetadata, cookies, headers }) => { try { const renderResult = await renderPage({ url, waitForSelector, waitTimeout, cookies, headers }); const extraction = extractArticle(renderResult.html, renderResult.url); if (extraction.markdown.length === 0) { return { content: [ { type: "text" as const, text: `Error: Page rendered but no content found at ${url}`, }, ], isError: true, }; } const markdown = formatAsLlmMarkdown( extraction, renderResult.url, includeMetadata ?? true, ); return { content: [{ type: "text" as const, text: markdown }], }; } catch (error) { const message = error instanceof Error ? error.message : String(error); return { content: [ { type: "text" as const, text: `Error reading ${url}: ${message}`, }, ], isError: true, }; } }, ); } - src/index.ts:8-18 (registration)Import and registration of the spa_read tool in the main server initialization. Imports registerSpaReadTool and calls it with the server instance.
import { registerSpaReadTool } from "./tools/spa-read.js"; import { registerSpaScreenshotTool } from "./tools/spa-screenshot.js"; import { closeBrowser } from "./lib/renderer.js"; const server = new McpServer({ name: "spa-reader", version: "1.0.0", }); registerSpaReadTool(server); registerSpaScreenshotTool(server); - src/lib/renderer.ts:199-224 (helper)Core helper function renderPage() that launches a headless Chromium browser, navigates to the URL, waits for network idle and optional selector, and returns HTML content. Includes URL validation, cookie injection, and header customization.
export async function renderPage(options: RenderOptions): Promise<RenderResult> { const { parsedUrl, timeout, resolvedCookies, cleanedHeaders } = validateOptions(options); const browser = await getBrowser(); const context: BrowserContext = await browser.newContext({ userAgent: "spa-reader-mcp/1.0.0", extraHTTPHeaders: Object.keys(cleanedHeaders).length > 0 ? cleanedHeaders : undefined, }); try { if (resolvedCookies.length > 0) { await context.addCookies(resolvedCookies); } const page = await context.newPage(); await navigateAndWait(page, options, parsedUrl, timeout); const html = await page.content(); const url = page.url(); const title = await page.title(); return { html, url, title }; } finally { await context.close(); } }