Skip to main content
Glama

URL Metadata

fetch_url_metadata

Extract metadata from URLs including title, description, images, and publication details to understand webpage content without reading full text.

Instructions

Fetch a URL and extract its metadata: title, description, Open Graph tags (og:image, og:type), Twitter card tags, favicon, canonical URL, author, and publish date. Use to enrich links with context or understand what a page is about without reading the full content.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesThe URL to fetch metadata from
timeoutNoRequest timeout in milliseconds (default 8000)

Implementation Reference

  • The core logic for fetching a URL and extracting its metadata using cheerio.
    async function handler(input: Input) {
      const { url, timeout } = input;
    
      const controller = new AbortController();
      const timer = setTimeout(() => controller.abort(), timeout);
    
      let html: string;
      let finalUrl = url;
      let statusCode: number;
    
      try {
        const response = await fetch(url, {
          signal: controller.signal,
          headers: {
            "User-Agent":
              "Mozilla/5.0 (compatible; AgentToolbelt/1.0; +https://agent-toolbelt-production.up.railway.app)",
            Accept: "text/html,application/xhtml+xml,*/*",
          },
          redirect: "follow",
        });
    
        statusCode = response.status;
        finalUrl = response.url || url;
    
        if (!response.ok) {
          return {
            url,
            finalUrl,
            statusCode,
            error: `HTTP ${statusCode}`,
            metadata: null,
          };
        }
    
        const contentType = response.headers.get("content-type") || "";
        if (!contentType.includes("text/html")) {
          return {
            url,
            finalUrl,
            statusCode,
            contentType,
            error: "Not an HTML page",
            metadata: null,
          };
        }
    
        html = await response.text();
      } catch (err: any) {
        return {
          url,
          finalUrl: url,
          statusCode: null,
          error: err.name === "AbortError" ? "Request timed out" : err.message,
          metadata: null,
        };
      } finally {
        clearTimeout(timer);
      }
    
      const $ = cheerio.load(html);
    
      // Core metadata
      const title =
        $('meta[property="og:title"]').attr("content") ||
        $("title").text().trim() ||
        null;
    
      const description =
        $('meta[property="og:description"]').attr("content") ||
        $('meta[name="description"]').attr("content") ||
        null;
    
      // OG tags
      const og: Record<string, string> = {};
      $("meta[property^='og:']").each((_, el) => {
        const prop = $(el).attr("property")?.replace("og:", "");
        const content = $(el).attr("content");
        if (prop && content) og[prop] = content;
      });
    
      // Twitter card tags
      const twitter: Record<string, string> = {};
      $("meta[name^='twitter:']").each((_, el) => {
        const name = $(el).attr("name")?.replace("twitter:", "");
        const content = $(el).attr("content");
        if (name && content) twitter[name] = content;
      });
    
      // Favicon
      const faviconHref =
        $('link[rel="icon"]').attr("href") ||
        $('link[rel="shortcut icon"]').attr("href") ||
        $('link[rel="apple-touch-icon"]').attr("href") ||
        "/favicon.ico";
      const favicon = absoluteUrl(finalUrl, faviconHref);
    
      // Canonical URL
      const canonical = absoluteUrl(finalUrl, $('link[rel="canonical"]').attr("href"));
    
      // Theme color
      const themeColor = $('meta[name="theme-color"]').attr("content") || null;
    
      // Author
      const author =
        $('meta[name="author"]').attr("content") ||
        $('meta[property="article:author"]').attr("content") ||
        null;
    
      // Published / modified dates
      const publishedTime =
        $('meta[property="article:published_time"]').attr("content") ||
        $('meta[name="date"]').attr("content") ||
        null;
      const modifiedTime =
        $('meta[property="article:modified_time"]').attr("content") || null;
    
      return {
        url,
        finalUrl,
        statusCode,
        metadata: {
          title,
          description,
          favicon,
          canonical,
          author,
          themeColor,
          publishedTime,
          modifiedTime,
          og,
          twitter,
        },
      };
    }
  • Input validation schema for the URL metadata tool using zod.
    const inputSchema = z.object({
      url: z.string().url().describe("The URL to fetch metadata from"),
      timeout: z
        .number()
        .int()
        .min(1000)
        .max(15000)
        .default(8000)
        .describe("Request timeout in milliseconds (1000-15000, default 8000)"),
    });
  • Definition and registration of the url-metadata tool.
    const urlMetadataTool: ToolDefinition<Input> = {
      name: "url-metadata",
      description:
        "Fetch a URL and extract metadata: title, description, Open Graph tags, Twitter card tags, favicon, canonical URL, author, and publish dates. Ideal for agents that need to enrich links with context.",
      version: "1.0.0",
      inputSchema,
      handler,
      metadata: {
        tags: ["url", "metadata", "og-tags", "scraping", "enrichment"],
        pricing: "$0.001 per call",
        exampleInput: {
          url: "https://github.com/anthropics/anthropic-sdk-python",
          timeout: 8000,
        },
      },
    };
    
    registerTool(urlMetadataTool);
  • Helper function to resolve absolute URLs.
    function absoluteUrl(base: string, href: string | undefined): string | null {
      if (!href) return null;
      try {
        return new URL(href, base).href;
      } catch {
        return null;
      }
    }
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It comprehensively lists extracted fields but omits error handling (timeout behavior, invalid URLs, redirects), side effects, or network requirements. Covers happy-path behavior adequately but lacks failure mode disclosure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with zero waste. First sentence front-loads the action and specific extraction targets. Second sentence provides usage intent. No redundant phrases or repetition of structured data.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 2-parameter tool without output schema, the description compensates effectively by enumerating all returned metadata fields (title, OG tags, etc.). Would benefit from mentioning error handling or edge cases (e.g., non-HTML content), but adequately complete for tool selection.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% description coverage ('The URL to fetch metadata from', 'Request timeout in milliseconds'). The description lists extracted fields which contextualizes the URL parameter's purpose, but adds no syntax, format constraints, or examples beyond the schema. Baseline 3 appropriate given schema completeness.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses specific verbs ('Fetch', 'extract') and enumerates exact metadata fields (title, description, Open Graph tags, Twitter cards, favicon, etc.). It clearly distinguishes from siblings like 'audit_dependencies' or 'stock_thesis' by focusing on web content metadata extraction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear usage context ('enrich links with context', 'understand what a page is about without reading the full content'), effectively indicating when to use it. Lacks explicit 'when not to use' or named alternatives, though siblings are sufficiently distinct that direct comparison is unnecessary.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/marras0914/agent-toolbelt'

If you have feedback or need assistance with the MCP directory API, please join our Discord server