extract_links

Extract all links from a webpage with href values and anchor text, resolving relative URLs while excluding anchors and javascript links.

Instructions

Extract all links from a page with their href and anchor text. Resolves relative URLs. Skips anchors and javascript: links.

Input Schema

| Name | Required | Description                    | Default |
|------|----------|--------------------------------|---------|
| url  | Yes      | The URL to extract links from  |         |
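Under this schema, a tool call's arguments are a single object with a required `url` string. As an illustration only (the server itself validates with zod, per the registration below; `parseArgs` and `ExtractLinksArgs` are hypothetical names, not from the source), the same check can be sketched in plain TypeScript:

```typescript
// Hypothetical sketch of the argument shape the schema enforces.
// The real server uses z.string().url(); this mirrors that check by hand.
interface ExtractLinksArgs {
  url: string;
}

function parseArgs(input: unknown): ExtractLinksArgs {
  if (typeof input !== "object" || input === null || !("url" in input)) {
    throw new Error("Missing required field: url");
  }
  const url = (input as { url: unknown }).url;
  if (typeof url !== "string") {
    throw new Error("url must be a string");
  }
  new URL(url); // throws if the string is not a valid URL
  return { url };
}
```

For example, `parseArgs({ url: "https://example.com" })` succeeds, while an empty object or a non-URL string is rejected.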

Implementation Reference

  • The main extractLinks handler function that fetches a page, parses it with linkedom, finds all anchor tags with href attributes, resolves relative URLs, and returns an array of PageLink objects with href and text properties.
    export async function extractLinks(url: string): Promise<PageLink[]> {
      const html = await fetchPage(url);
      const { document } = parseHTML(html);
      const anchors = Array.from(document.querySelectorAll("a[href]"));
      const links: PageLink[] = [];
      const baseUrl = new URL(url);
      for (const a of anchors) {
        const href = a.getAttribute("href");
        if (!href || href.startsWith("#") || href.startsWith("javascript:"))
          continue;
        try {
          const resolved = new URL(href, baseUrl).href;
          links.push({ href: resolved, text: (a.textContent ?? "").trim() });
        } catch {
          // skip invalid URLs
        }
      }
      return links;
    }
  • src/index.ts:45-62 (registration)
    Tool registration for 'extract_links' using server.tool(), defining the URL input schema with zod validation and the async handler that formats the output links as markdown.
    server.tool(
      "extract_links",
      "Extract all links from a page with their href and anchor text. Resolves relative URLs. Skips anchors and javascript: links.",
      {
        url: z.string().url().describe("The URL to extract links from"),
      },
      async ({ url }) => {
        const links = await extractLinks(url);
        const text =
          links.length === 0
            ? "No links found."
            : links
                .map((l, i) => `${i + 1}. [${l.text || l.href}](${l.href})`)
                .join("\n");
    
        return { content: [{ type: "text", text }] };
      },
    );
  • Type definition for PageLink interface that defines the structure of extracted links with href (string) and text (string) properties.
    export interface PageLink {
      href: string;
      text: string;
    }
  • The fetchPage helper function used by extractLinks to fetch HTML content from URLs with appropriate User-Agent headers and error handling.
    async function fetchPage(url: string): Promise<string> {
      const response = await fetch(url, {
        headers: {
          "User-Agent": "Mozilla/5.0 (compatible; mcp-server-scraper/1.0)",
          Accept: "text/html,application/xhtml+xml",
        },
        redirect: "follow",
      });
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}: ${response.statusText}`);
      }
      return response.text();
    }



MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ofershap/mcp-server-scraper'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.