Skip to main content
Glama

extract-html-fragment

Extract specific HTML elements from webpages using CSS selectors to retrieve targeted content for data processing or analysis.

Instructions

Extract a specific HTML fragment from a webpage using CSS selectors

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesURL to fetch
selectorYesCSS selector for the HTML fragment to extract
anchorIdNoOptional anchor ID to locate a specific fragment
methodNoHTTP methodGET
headersNoHTTP headers
bodyNoRequest body for POST requests
timeoutNoRequest timeout in milliseconds
followRedirectsNoWhether to follow redirects
fragmentSelectorNoCSS selector for the HTML fragment to extract (when responseType is html-fragment)

Implementation Reference

  • The primary handler function for the 'extract-html-fragment' tool. Fetches the URL, parses HTML using JSDOM, locates elements via CSS selector (optionally using anchorId), extracts outerHTML of matching elements, and returns a structured JSON result including status, match count, and HTML content.
    // Handler for extract-html-fragment tool
    async function handleExtractHtmlFragment(args: ExtractHtmlFragmentArgs): Promise<z.infer<typeof CallToolResultSchema>> {
      try {
        const { url, selector, anchorId, method, headers, body, timeout, followRedirects } = args;
        
        console.error(`Extracting HTML fragment from ${url} using selector "${selector}"`); 
        
        // Create request options
        const options: any = {
          method,
          headers: headers || {},
          body: body || undefined,
          redirect: followRedirects ? "follow" : "manual",
        };
        
        if (timeout) {
          // @ts-ignore - undici specific options
          options.bodyTimeout = timeout;
          // @ts-ignore - undici specific options
          options.headersTimeout = timeout;
        }
        
        // Perform the request
        const response = await fetch(url, options);
        
        if (!response.ok) {
          throw new Error(`Failed to fetch URL: HTTP ${response.status} ${response.statusText}`);
        }
        
        // Get the HTML content
        const html = await response.text();
        
        // Parse the HTML using JSDOM
        const dom = new JSDOM(html);
        const document = dom.window.document;
        
        // Handle anchor if provided
        if (anchorId) {
          // Scroll to the anchor element
          const anchorElement = document.getElementById(anchorId);
          if (!anchorElement) {
            throw new Error(`Anchor element with ID "${anchorId}" not found`);
          }
        }
        
        // Find the element(s) matching the selector
        const elements = document.querySelectorAll(selector);
        
        if (elements.length === 0) {
          throw new Error(`No elements found matching selector "${selector}"`);
        }
        
        // Create a result object
        const result: Record<string, any> = {
          status: response.status,
          statusText: response.statusText,
          url: response.url,
          selector: selector,
          matchCount: elements.length,
        };
        
        // Get the HTML content of the selected elements
        if (elements.length === 1) {
          // If only one element found, return its outer HTML
          result.html = elements[0].outerHTML;
        } else {
          // If multiple elements found, return an array of their outer HTML
          result.html = Array.from(elements).map(el => el.outerHTML);
        }
        
        return {
          content: [
            {
              type: "text",
              text: JSON.stringify(result, null, 2)
            }
          ]
        };
      } catch (error) {
        console.error(`Error extracting HTML fragment:`, error);
        return {
          isError: true,
          content: [
            {
              type: "text",
              text: `Error extracting HTML fragment: ${(error as Error).message}`
            }
          ]
        };
      }
    }
  • Zod schema for validating input arguments to the extract-html-fragment tool, used in the handler for parsing args.
    const ExtractHtmlFragmentArgsSchema = z.object({
      url: z.string().url().describe("URL to fetch"),
      selector: z.string().describe("CSS selector for the HTML fragment to extract"),
      anchorId: z.string().optional().describe("Optional anchor ID to locate a specific fragment"),
      method: z.enum(["GET", "POST"]).default("GET").describe("HTTP method"),
      headers: z.record(z.string()).optional().describe("HTTP headers"),
      body: z.string().optional().describe("Request body for POST requests"),
      timeout: z.number().positive().optional().describe("Request timeout in milliseconds"),
      followRedirects: z.boolean().default(true).describe("Whether to follow redirects")
    });
  • src/index.ts:50-83 (registration)
    Tool registration in the TOOLS array, including name, description, and JSON inputSchema returned by listTools.
    {
      name: "extract-html-fragment",
      description: "Extract a specific HTML fragment from a webpage using CSS selectors",
      inputSchema: {
        type: "object",
        properties: {
          url: { type: "string", description: "URL to fetch" },
          selector: { type: "string", description: "CSS selector for the HTML fragment to extract" },
          anchorId: { type: "string", description: "Optional anchor ID to locate a specific fragment" },
          method: { 
            type: "string", 
            enum: ["GET", "POST"], 
            default: "GET", 
            description: "HTTP method" 
          },
          headers: { 
            type: "object", 
            additionalProperties: { type: "string" }, 
            description: "HTTP headers" 
          },
          body: { type: "string", description: "Request body for POST requests" },
          timeout: { type: "number", description: "Request timeout in milliseconds" },
          followRedirects: { 
            type: "boolean", 
            default: true, 
            description: "Whether to follow redirects" 
          },
          fragmentSelector: { 
            type: "string", 
            description: "CSS selector for the HTML fragment to extract (when responseType is html-fragment)" 
          }
        },
        required: ["url", "selector"]
      }
  • src/index.ts:161-162 (registration)
    Registration of the tool handler in the switch statement of the CallToolRequestSchema handler.
    case "extract-html-fragment":
      return handleExtractHtmlFragment(ExtractHtmlFragmentArgsSchema.parse(args));
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. While 'extract' implies a read operation, it doesn't address important behaviors like error handling, rate limits, authentication needs, or what happens when selectors don't match. The description mentions CSS selectors but doesn't explain the extraction mechanism or output format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that gets straight to the point with zero wasted words. It's appropriately sized for the tool's complexity and front-loads the core functionality without unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 9 parameters, no annotations, and no output schema, the description is inadequate. It doesn't explain what the tool returns, how errors are handled, or provide context about the extraction process. The agent would need to guess about the output format and error conditions when using this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 9 parameters thoroughly. The description adds minimal value beyond what's in the schema - it mentions CSS selectors (which appears twice in the schema) but doesn't provide additional context about parameter interactions or usage patterns.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as extracting HTML fragments from webpages using CSS selectors, which is a specific verb+resource combination. However, it doesn't distinguish this tool from its sibling tools (check-status and fetch-url), which likely have different functions but could overlap in webpage interaction scenarios.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like fetch-url or check-status. It doesn't mention prerequisites, limitations, or typical use cases, leaving the agent to infer usage context from the tool name and parameters alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mcollina/mcp-node-fetch'

If you have feedback or need assistance with the MCP directory API, please join our Discord server