extract-html-fragment
Extract specific HTML elements from webpages using CSS selectors to retrieve targeted content for data processing or analysis.
Instructions
Extract a specific HTML fragment from a webpage using CSS selectors
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to fetch | |
| selector | Yes | CSS selector for the HTML fragment to extract | |
| anchorId | No | Optional anchor ID to locate a specific fragment | |
| method | No | HTTP method | GET |
| headers | No | HTTP headers | |
| body | No | Request body for POST requests | |
| timeout | No | Request timeout in milliseconds | |
| followRedirects | No | Whether to follow redirects | |
| fragmentSelector | No | CSS selector for the HTML fragment to extract (when responseType is html-fragment) |
Implementation Reference
- src/index.ts:288-378 (handler)The primary handler function for the 'extract-html-fragment' tool. Fetches the URL, parses HTML using JSDOM, locates elements via CSS selector (optionally using anchorId), extracts outerHTML of matching elements, and returns a structured JSON result including status, match count, and HTML content.// Handler for extract-html-fragment tool async function handleExtractHtmlFragment(args: ExtractHtmlFragmentArgs): Promise<z.infer<typeof CallToolResultSchema>> { try { const { url, selector, anchorId, method, headers, body, timeout, followRedirects } = args; console.error(`Extracting HTML fragment from ${url} using selector "${selector}"`); // Create request options const options: any = { method, headers: headers || {}, body: body || undefined, redirect: followRedirects ? "follow" : "manual", }; if (timeout) { // @ts-ignore - undici specific options options.bodyTimeout = timeout; // @ts-ignore - undici specific options options.headersTimeout = timeout; } // Perform the request const response = await fetch(url, options); if (!response.ok) { throw new Error(`Failed to fetch URL: HTTP ${response.status} ${response.statusText}`); } // Get the HTML content const html = await response.text(); // Parse the HTML using JSDOM const dom = new JSDOM(html); const document = dom.window.document; // Handle anchor if provided if (anchorId) { // Scroll to the anchor element const anchorElement = document.getElementById(anchorId); if (!anchorElement) { throw new Error(`Anchor element with ID "${anchorId}" not found`); } } // Find the element(s) matching the selector const elements = document.querySelectorAll(selector); if (elements.length === 0) { throw new Error(`No elements found matching selector "${selector}"`); } // Create a result object const result: Record<string, any> = { status: response.status, statusText: response.statusText, url: response.url, selector: selector, matchCount: elements.length, }; // Get the HTML content of the selected elements if (elements.length === 1) { // If only one element found, return its outer HTML result.html = elements[0].outerHTML; } else { // If multiple elements found, return an array of their outer HTML result.html = Array.from(elements).map(el => el.outerHTML); } return { content: [ { type: "text", text: JSON.stringify(result, null, 2) } ] }; } catch (error) { console.error(`Error extracting HTML fragment:`, error); return { isError: true, content: [ { type: "text", text: `Error extracting HTML fragment: ${(error as Error).message}` } ] }; } }
- src/index.ts:27-36 (schema)Zod schema for validating input arguments to the extract-html-fragment tool, used in the handler for parsing args.const ExtractHtmlFragmentArgsSchema = z.object({ url: z.string().url().describe("URL to fetch"), selector: z.string().describe("CSS selector for the HTML fragment to extract"), anchorId: z.string().optional().describe("Optional anchor ID to locate a specific fragment"), method: z.enum(["GET", "POST"]).default("GET").describe("HTTP method"), headers: z.record(z.string()).optional().describe("HTTP headers"), body: z.string().optional().describe("Request body for POST requests"), timeout: z.number().positive().optional().describe("Request timeout in milliseconds"), followRedirects: z.boolean().default(true).describe("Whether to follow redirects") });
- src/index.ts:50-83 (registration)Tool registration in the TOOLS array, including name, description, and JSON inputSchema returned by listTools.{ name: "extract-html-fragment", description: "Extract a specific HTML fragment from a webpage using CSS selectors", inputSchema: { type: "object", properties: { url: { type: "string", description: "URL to fetch" }, selector: { type: "string", description: "CSS selector for the HTML fragment to extract" }, anchorId: { type: "string", description: "Optional anchor ID to locate a specific fragment" }, method: { type: "string", enum: ["GET", "POST"], default: "GET", description: "HTTP method" }, headers: { type: "object", additionalProperties: { type: "string" }, description: "HTTP headers" }, body: { type: "string", description: "Request body for POST requests" }, timeout: { type: "number", description: "Request timeout in milliseconds" }, followRedirects: { type: "boolean", default: true, description: "Whether to follow redirects" }, fragmentSelector: { type: "string", description: "CSS selector for the HTML fragment to extract (when responseType is html-fragment)" } }, required: ["url", "selector"] }
- src/index.ts:161-162 (registration)Registration of the tool handler in the switch statement of the CallToolRequestSchema handler.case "extract-html-fragment": return handleExtractHtmlFragment(ExtractHtmlFragmentArgsSchema.parse(args));