fetch_txt

Extract plain text from a website by providing its URL. Removes HTML formatting and returns only the content, simplifying data processing and analysis.

Instructions

Fetch a website, return the content as plain text (no HTML)

Input Schema

Name     Required  Description                                 Default
headers  No        Optional headers to include in the request
url      Yes       URL of the website to fetch
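
As an illustration of the schema above, the following is a minimal sketch of the argument shape and a dependency-free check for it. The names `FetchTxtArgs` and `parseFetchTxtArgs` are hypothetical and do not appear in the server; the server itself validates arguments with a Zod schema, so this hand-written check only approximates that behavior.

```typescript
// Argument shape for fetch_txt as described by the input schema:
// a required `url` and an optional map of header names to values.
// `FetchTxtArgs` is a hypothetical name used only for this sketch.
interface FetchTxtArgs {
  url: string;
  headers?: Record<string, string>;
}

// Hand-written stand-in for schema validation (assumption: the real server
// uses Zod instead). Throws on the first problem it finds.
function parseFetchTxtArgs(input: unknown): FetchTxtArgs {
  if (typeof input !== "object" || input === null) {
    throw new Error("arguments must be an object");
  }
  const { url, headers } = input as Record<string, unknown>;

  if (typeof url !== "string") {
    throw new Error("'url' is required and must be a string");
  }
  // The URL constructor throws a TypeError for malformed URLs.
  new URL(url);

  if (headers === undefined) {
    return { url };
  }
  if (typeof headers !== "object" || headers === null || Array.isArray(headers)) {
    throw new Error("'headers' must be an object");
  }
  for (const [name, value] of Object.entries(headers)) {
    if (typeof value !== "string") {
      throw new Error(`header '${name}' must be a string`);
    }
  }
  return { url, headers: headers as Record<string, string> };
}

// Example arguments: only `url` is required.
const exampleArgs = parseFetchTxtArgs({
  url: "https://example.com",
  headers: { Accept: "text/html" },
});
```

A call without `url`, or with a string that is not a well-formed URL, throws instead of returning a value.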

Implementation Reference

  • The core handler for the 'fetch_txt' tool. It fetches the URL, parses the HTML with JSDOM, removes script and style elements, extracts the body text and collapses its whitespace, then returns the text as structured content, or the error message with isError set to true.
    static async txt(requestPayload: RequestPayload) {
      try {
        const response = await this._fetch(requestPayload);
        const html = await response.text();
        const dom = new JSDOM(html);
        const document = dom.window.document;
        const scripts = document.getElementsByTagName("script");
        const styles = document.getElementsByTagName("style");
        Array.from(scripts).forEach((script) => script.remove());
        Array.from(styles).forEach((style) => style.remove());
        const text = document.body.textContent || "";
        const normalizedText = text.replace(/\s+/g, " ").trim();
        return {
          content: [{ type: "text", text: normalizedText }],
          isError: false,
        };
      } catch (error) {
        return {
          content: [{ type: "text", text: (error as Error).message }],
          isError: true,
        };
      }
    }
  • Zod schema for input validation (RequestPayloadSchema) used for all fetch tools including fetch_txt. Defines required 'url' and optional 'headers'.
    import { z } from "zod";

    export const RequestPayloadSchema = z.object({
      url: z.string().url(),
      headers: z.record(z.string()).optional(),
    });

    export type RequestPayload = z.infer<typeof RequestPayloadSchema>;
  • src/index.ts:64-82 (registration)
    Tool registration in ListTools handler: defines name, description, and inputSchema for 'fetch_txt'.
    {
      name: "fetch_txt",
      description: "Fetch a website, return the content as plain text (no HTML)",
      inputSchema: {
        type: "object",
        properties: {
          url: {
            type: "string",
            description: "URL of the website to fetch",
          },
          headers: {
            type: "object",
            description: "Optional headers to include in the request",
          },
        },
        required: ["url"],
      },
    },
  • src/index.ts:118-121 (registration)
    Dispatch/registration in CallToolRequestSchema handler: routes 'fetch_txt' calls to Fetcher.txt after validation.
    if (request.params.name === "fetch_txt") {
      const fetchResult = await Fetcher.txt(validatedArgs);
      return fetchResult;
    }
  • Private helper method used by all fetch tools (including txt) to perform the HTTP fetch with custom User-Agent and error handling.
    private static async _fetch({
      url,
      headers,
    }: RequestPayload): Promise<Response> {
      try {
        const response = await fetch(url, {
          headers: {
            "User-Agent":
              "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
            ...headers,
          },
        });
        if (!response.ok) {
          throw new Error(`HTTP error: ${response.status}`);
        }
        return response;
      } catch (e: unknown) {
        if (e instanceof Error) {
          throw new Error(`Failed to fetch ${url}: ${e.message}`);
        } else {
          throw new Error(`Failed to fetch ${url}: Unknown error`);
        }
      }
    }
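
Three behaviors in the snippets above can be tried in isolation, without a network or JSDOM: the whitespace normalization, the order in which headers are merged, and the result shape with its error flag. The sketch below assumes nothing beyond the regex, the spread order, and the return shape shown in the source; the names `TextResult`, `normalizeText`, `mergeHeaders`, and `toTextResult` are hypothetical and exist only for this example.

```typescript
// Result shape returned by the handler: one text item plus an error flag.
// `TextResult` is a hypothetical name for that shape.
interface TextResult {
  content: { type: "text"; text: string }[];
  isError: boolean;
}

const DEFAULT_USER_AGENT =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36";

// Collapse every run of whitespace (spaces, tabs, newlines) into a single
// space and trim both ends, using the same regex as the handler.
function normalizeText(raw: string): string {
  return raw.replace(/\s+/g, " ").trim();
}

// The default User-Agent is listed first and caller headers are spread after
// it, so a caller-supplied "User-Agent" key (same spelling) replaces the default.
function mergeHeaders(headers?: Record<string, string>): Record<string, string> {
  return { "User-Agent": DEFAULT_USER_AGENT, ...headers };
}

// Hypothetical wrapper mirroring the handler's try/catch: a thrown error is
// reported as a normal result with isError set, not rethrown.
function toTextResult(produce: () => string): TextResult {
  try {
    return {
      content: [{ type: "text", text: normalizeText(produce()) }],
      isError: false,
    };
  } catch (error) {
    return {
      content: [{ type: "text", text: (error as Error).message }],
      isError: true,
    };
  }
}

const okResult = toTextResult(() => "  Hello\n\n  world\t!  ");
const failedResult = toTextResult(() => {
  throw new Error("Failed to fetch https://example.com: HTTP error: 404");
});
```

Because failures come back as ordinary results, a caller has to inspect `isError` to tell a fetch failure apart from page text.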

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/tokenizin/mcp-npx-fetch'
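
The same request can be sketched in TypeScript. This assumes a runtime with a global `fetch` (for example Node 18 or later) and that the endpoint responds with JSON; the response is typed as `unknown` because its fields are not described here. The helper names `serverUrl` and `fetchServerInfo` are hypothetical.

```typescript
// Base path taken from the curl example; everything else is illustrative.
const API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Build the URL for one server entry from its owner and name.
function serverUrl(owner: string, name: string): string {
  return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Fetch one server entry. Assumption: the body is JSON; its shape is left
// as `unknown` and should be checked by the caller.
async function fetchServerInfo(owner: string, name: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, name));
  if (!response.ok) {
    throw new Error(`HTTP error: ${response.status}`);
  }
  return response.json();
}
```

For example, `fetchServerInfo("tokenizin", "mcp-npx-fetch")` requests the same URL as the curl command.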

If you have feedback or need assistance with the MCP directory API, please join our Discord server.