fetch_markdown
Convert web pages to clean Markdown with metadata and token counts for LLM processing. Use this tool to extract article content from URLs with a lightweight, browserless approach.
Instructions
Fetch a web page and convert it to clean, LLM-optimized Markdown. Returns the article content with metadata (title, author, date) and token count. Much faster and lighter than browser-based solutions.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL of the web page to fetch and convert | |
| include_header | No | Include title/source/author header in output | true |
| raw | No | Extract full page content instead of just the main article | false |
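
The parameters in the table can be read as the following argument shape. This is a minimal sketch in plain TypeScript with no Zod dependency: `FetchMarkdownArgs`, `NormalizedArgs`, and `normalizeArgs` are hypothetical names used only for illustration and are not part of the server. The defaults (`true` for `include_header`, `false` for `raw`) come from the schema in the Implementation Reference. Validation here uses the standard `URL` constructor, which may accept or reject slightly different strings than Zod's `.url()` check.

```typescript
// Hypothetical types mirroring the documented parameters; not part of the server source.
interface FetchMarkdownArgs {
  url: string;
  include_header?: boolean;
  raw?: boolean;
}

interface NormalizedArgs {
  url: string;
  include_header: boolean;
  raw: boolean;
}

// Apply the documented defaults and reject strings that do not parse as a URL.
function normalizeArgs(args: FetchMarkdownArgs): NormalizedArgs {
  try {
    new URL(args.url); // throws a TypeError when the string cannot be parsed
  } catch {
    throw new Error(`Invalid url: ${args.url}`);
  }
  return {
    url: args.url,
    include_header: args.include_header ?? true,
    raw: args.raw ?? false,
  };
}

// Example: only the required parameter is supplied, so both defaults apply.
const normalized = normalizeArgs({ url: "https://example.com/post" });
console.log(normalized);
```

Using `??` rather than `||` keeps an explicit `false` for `include_header` from being overwritten by the default.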
Implementation Reference
- src/index.ts:31-88 (handler) The async handler function that executes the fetch_markdown tool logic. It fetches the URL, converts HTML to Markdown using the readdown library, and returns the content with metadata (tokens, characters, title, author, date).

  ```typescript
  async ({ url, include_header, raw }) => {
    try {
      const response = await fetch(url, {
        headers: {
          "User-Agent":
            "Mozilla/5.0 (compatible; readdown/0.1; +https://github.com/zcag/readdown)",
          Accept: "text/html,application/xhtml+xml",
        },
        signal: AbortSignal.timeout(15000),
      });

      if (!response.ok) {
        return {
          content: [
            {
              type: "text" as const,
              text: `Failed to fetch ${url}: HTTP ${response.status} ${response.statusText}`,
            },
          ],
          isError: true,
        };
      }

      const html = await response.text();
      const result = readdown(html, {
        url,
        includeHeader: include_header,
        raw,
      });

      const summary = [
        `Tokens: ~${result.tokens}`,
        `Characters: ${result.chars}`,
        result.metadata.title ? `Title: ${result.metadata.title}` : null,
        result.metadata.author ? `Author: ${result.metadata.author}` : null,
        result.metadata.date ? `Date: ${result.metadata.date}` : null,
      ]
        .filter(Boolean)
        .join(" | ");

      return {
        content: [
          { type: "text" as const, text: `[${summary}]\n\n${result.markdown}` },
        ],
      };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return {
        content: [
          {
            type: "text" as const,
            text: `Error fetching ${url}: ${message}`,
          },
        ],
        isError: true,
      };
    }
  }
  ```

- src/index.ts:18-30 (schema) Input schema definition using Zod. Defines three parameters: url (required string URL), include_header (optional boolean, default true), and raw (optional boolean, default false).

  ```typescript
  {
    url: z.string().url().describe("The URL of the web page to fetch and convert"),
    include_header: z
      .boolean()
      .optional()
      .default(true)
      .describe("Include title/source/author header in output"),
    raw: z
      .boolean()
      .optional()
      .default(false)
      .describe("Extract full page content instead of just the main article"),
  },
  ```

- src/index.ts:13-89 (registration) Tool registration using server.tool(). Registers the 'fetch_markdown' tool with its name, description, schema, and handler function.

  ```typescript
  server.tool(
    "fetch_markdown",
    "Fetch a web page and convert it to clean, LLM-optimized Markdown. " +
      "Returns the article content with metadata (title, author, date) and token count. " +
      "Much faster and lighter than browser-based solutions.",
    {
      url: z.string().url().describe("The URL of the web page to fetch and convert"),
      include_header: z
        .boolean()
        .optional()
        .default(true)
        .describe("Include title/source/author header in output"),
      raw: z
        .boolean()
        .optional()
        .default(false)
        .describe("Extract full page content instead of just the main article"),
    },
    async ({ url, include_header, raw }) => {
      try {
        const response = await fetch(url, {
          headers: {
            "User-Agent":
              "Mozilla/5.0 (compatible; readdown/0.1; +https://github.com/zcag/readdown)",
            Accept: "text/html,application/xhtml+xml",
          },
          signal: AbortSignal.timeout(15000),
        });

        if (!response.ok) {
          return {
            content: [
              {
                type: "text" as const,
                text: `Failed to fetch ${url}: HTTP ${response.status} ${response.statusText}`,
              },
            ],
            isError: true,
          };
        }

        const html = await response.text();
        const result = readdown(html, {
          url,
          includeHeader: include_header,
          raw,
        });

        const summary = [
          `Tokens: ~${result.tokens}`,
          `Characters: ${result.chars}`,
          result.metadata.title ? `Title: ${result.metadata.title}` : null,
          result.metadata.author ? `Author: ${result.metadata.author}` : null,
          result.metadata.date ? `Date: ${result.metadata.date}` : null,
        ]
          .filter(Boolean)
          .join(" | ");

        return {
          content: [
            { type: "text" as const, text: `[${summary}]\n\n${result.markdown}` },
          ],
        };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        return {
          content: [
            {
              type: "text" as const,
              text: `Error fetching ${url}: ${message}`,
            },
          ],
          isError: true,
        };
      }
    }
  );
  ```
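
On success, the handler returns a single text item of the form `[summary]`, a blank line, then the Markdown body, where the summary is a ` | `-joined list of `Tokens`, `Characters`, and any available `Title`, `Author`, and `Date` fields. A client that wants those fields back as structured data could split the text as in the sketch below. `ParsedResult` and `parseFetchMarkdownText` are hypothetical names, the sample string is made up for illustration, and the parser assumes no metadata value itself contains ` | ` (a title that does would be truncated).

```typescript
// Hypothetical shape for the parsed summary; field names are illustrative.
interface ParsedResult {
  tokens?: number;
  characters?: number;
  title?: string;
  author?: string;
  date?: string;
  markdown: string;
}

function parseFetchMarkdownText(text: string): ParsedResult {
  // The handler emits "[summary]\n\nmarkdown"; locate the end of the summary.
  const end = text.indexOf("]\n\n");
  if (!text.startsWith("[") || end === -1) {
    // No summary line, e.g. an error message: return the text unchanged.
    return { markdown: text };
  }

  const summary = text.slice(1, end);
  const parsed: ParsedResult = { markdown: text.slice(end + 3) };

  for (const part of summary.split(" | ")) {
    const sep = part.indexOf(": "); // split on the first ": " only
    if (sep === -1) continue;
    const key = part.slice(0, sep);
    const value = part.slice(sep + 2);
    switch (key) {
      case "Tokens":
        parsed.tokens = Number(value.replace("~", "")); // token count is approximate
        break;
      case "Characters":
        parsed.characters = Number(value);
        break;
      case "Title":
        parsed.title = value;
        break;
      case "Author":
        parsed.author = value;
        break;
      case "Date":
        parsed.date = value;
        break;
    }
  }
  return parsed;
}

// Made-up sample in the handler's output format.
const sample =
  "[Tokens: ~120 | Characters: 480 | Title: Hello: World | Date: 2024-01-01]\n\n# Hello\n\nBody";
const parsed = parseFetchMarkdownText(sample);
console.log(parsed);
```

Error responses (`Failed to fetch ...` and `Error fetching ...`) carry no bracketed summary, so the sketch returns them untouched in `markdown`; a real client would more likely check `isError` on the tool result first.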