Skip to main content
Glama

extract-html-fragment

Extract specific HTML elements from webpages using CSS selectors to retrieve targeted content for data processing or analysis.

Instructions

Extract a specific HTML fragment from a webpage using CSS selectors

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesURL to fetch
selectorYesCSS selector for the HTML fragment to extract
anchorIdNoOptional anchor ID to locate a specific fragment
methodNoHTTP methodGET
headersNoHTTP headers
bodyNoRequest body for POST requests
timeoutNoRequest timeout in milliseconds
followRedirectsNoWhether to follow redirects
fragmentSelectorNoCSS selector for the HTML fragment to extract (when responseType is html-fragment)

Implementation Reference

  • The primary handler function for the 'extract-html-fragment' tool. Fetches the URL, parses HTML using JSDOM, locates elements via CSS selector (optionally using anchorId), extracts outerHTML of matching elements, and returns a structured JSON result including status, match count, and HTML content.
    // Handler for extract-html-fragment tool async function handleExtractHtmlFragment(args: ExtractHtmlFragmentArgs): Promise<z.infer<typeof CallToolResultSchema>> { try { const { url, selector, anchorId, method, headers, body, timeout, followRedirects } = args; console.error(`Extracting HTML fragment from ${url} using selector "${selector}"`); // Create request options const options: any = { method, headers: headers || {}, body: body || undefined, redirect: followRedirects ? "follow" : "manual", }; if (timeout) { // @ts-ignore - undici specific options options.bodyTimeout = timeout; // @ts-ignore - undici specific options options.headersTimeout = timeout; } // Perform the request const response = await fetch(url, options); if (!response.ok) { throw new Error(`Failed to fetch URL: HTTP ${response.status} ${response.statusText}`); } // Get the HTML content const html = await response.text(); // Parse the HTML using JSDOM const dom = new JSDOM(html); const document = dom.window.document; // Handle anchor if provided if (anchorId) { // Scroll to the anchor element const anchorElement = document.getElementById(anchorId); if (!anchorElement) { throw new Error(`Anchor element with ID "${anchorId}" not found`); } } // Find the element(s) matching the selector const elements = document.querySelectorAll(selector); if (elements.length === 0) { throw new Error(`No elements found matching selector "${selector}"`); } // Create a result object const result: Record<string, any> = { status: response.status, statusText: response.statusText, url: response.url, selector: selector, matchCount: elements.length, }; // Get the HTML content of the selected elements if (elements.length === 1) { // If only one element found, return its outer HTML result.html = elements[0].outerHTML; } else { // If multiple elements found, return an array of their outer HTML result.html = Array.from(elements).map(el => el.outerHTML); } return { content: [ { type: "text", text: JSON.stringify(result, null, 2) } ] }; } catch (error) { console.error(`Error extracting HTML fragment:`, error); return { isError: true, content: [ { type: "text", text: `Error extracting HTML fragment: ${(error as Error).message}` } ] }; } }
  • Zod schema for validating input arguments to the extract-html-fragment tool, used in the handler for parsing args.
    const ExtractHtmlFragmentArgsSchema = z.object({ url: z.string().url().describe("URL to fetch"), selector: z.string().describe("CSS selector for the HTML fragment to extract"), anchorId: z.string().optional().describe("Optional anchor ID to locate a specific fragment"), method: z.enum(["GET", "POST"]).default("GET").describe("HTTP method"), headers: z.record(z.string()).optional().describe("HTTP headers"), body: z.string().optional().describe("Request body for POST requests"), timeout: z.number().positive().optional().describe("Request timeout in milliseconds"), followRedirects: z.boolean().default(true).describe("Whether to follow redirects") });
  • src/index.ts:50-83 (registration)
    Tool registration in the TOOLS array, including name, description, and JSON inputSchema returned by listTools.
    { name: "extract-html-fragment", description: "Extract a specific HTML fragment from a webpage using CSS selectors", inputSchema: { type: "object", properties: { url: { type: "string", description: "URL to fetch" }, selector: { type: "string", description: "CSS selector for the HTML fragment to extract" }, anchorId: { type: "string", description: "Optional anchor ID to locate a specific fragment" }, method: { type: "string", enum: ["GET", "POST"], default: "GET", description: "HTTP method" }, headers: { type: "object", additionalProperties: { type: "string" }, description: "HTTP headers" }, body: { type: "string", description: "Request body for POST requests" }, timeout: { type: "number", description: "Request timeout in milliseconds" }, followRedirects: { type: "boolean", default: true, description: "Whether to follow redirects" }, fragmentSelector: { type: "string", description: "CSS selector for the HTML fragment to extract (when responseType is html-fragment)" } }, required: ["url", "selector"] }
  • src/index.ts:161-162 (registration)
    Registration of the tool handler in the switch statement of the CallToolRequestSchema handler.
    case "extract-html-fragment": return handleExtractHtmlFragment(ExtractHtmlFragmentArgsSchema.parse(args));

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mcollina/mcp-node-fetch'

If you have feedback or need assistance with the MCP directory API, please join our Discord server