
extract

Extract structured data from any URL using a JSON schema. Define the data structure you need and get organized results from web content.

Instructions

Extract structured data from a URL using a JSON schema. Costs 5 credits.

Input Schema

Name     Required   Description                                     Default
url      Yes        URL to extract data from
schema   No         JSON schema defining the structure to extract
prompt   No         Natural language instruction for extraction
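
To make the parameter table concrete, here is a minimal sketch of what arguments for this tool might look like. The `ExtractArgs` type name, the example URLs, and the product fields (`title`, `price`, `inStock`) are assumptions made for illustration; only the three parameter names and their required/optional status come from the table above.

```typescript
// Hypothetical type mirroring the documented parameters: only `url` is required.
type ExtractArgs = {
  url: string;
  schema?: Record<string, unknown>;
  prompt?: string;
};

// Example call arguments. The URL and the schema fields are placeholders,
// not values taken from the SearchClaw documentation.
const exampleArgs: ExtractArgs = {
  url: "https://example.com/products/42",
  schema: {
    type: "object",
    properties: {
      title: { type: "string" },
      price: { type: "number" },
      inStock: { type: "boolean" },
    },
    required: ["title", "price"],
  },
  prompt: "Extract the main product shown on the page.",
};

// The smallest valid argument set: `schema` and `prompt` are both omitted.
const minimalArgs: ExtractArgs = { url: "https://example.com/about" };
```

Whether the service accepts the full JSON Schema vocabulary or only a subset is not stated on this page, so a simple flat schema like the one above is the safer starting point.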

Implementation Reference

  • The handler function for the 'extract' tool that processes the URL, schema, and prompt parameters, constructs the request body, and calls the API endpoint via apiPost('/extract', body)
    async ({ url, schema, prompt }) => {
      const body: Record<string, unknown> = { url };
      if (schema) body.schema = schema;
      if (prompt) body.prompt = prompt;
      return jsonResult(await apiPost("/extract", body));
    }
  • src/index.ts:109-123 (registration)
    Tool registration using server.tool() that defines the 'extract' tool name, description, input schema using Zod (url, optional schema, optional prompt), and the handler function
    server.tool(
      "extract",
      "Extract structured data from a URL using a JSON schema. Costs 5 credits.",
      {
        url: z.string().describe("URL to extract data from"),
        schema: z.record(z.unknown()).optional().describe("JSON schema defining the structure to extract"),
        prompt: z.string().optional().describe("Natural language instruction for extraction"),
      },
      async ({ url, schema, prompt }) => {
        const body: Record<string, unknown> = { url };
        if (schema) body.schema = schema;
        if (prompt) body.prompt = prompt;
        return jsonResult(await apiPost("/extract", body));
      }
    );
  • Input schema definition using Zod: url (required string), schema (optional record for JSON structure), and prompt (optional string for natural language instructions)
    {
      url: z.string().describe("URL to extract data from"),
      schema: z.record(z.unknown()).optional().describe("JSON schema defining the structure to extract"),
      prompt: z.string().optional().describe("Natural language instruction for extraction"),
    },
  • Helper function apiPost() that makes POST requests to the SearchClaw API with 30-second timeout, proper headers, and error handling
    async function apiPost(path: string, body: Record<string, unknown>) {
      const controller = new AbortController();
      const timeout = setTimeout(() => controller.abort(), 30000);
      try {
        const response = await fetch(`${API_BASE}${path}`, {
          method: "POST",
          headers: { ...headers, "Content-Type": "application/json" },
          body: JSON.stringify(body),
          signal: controller.signal,
        });
        if (!response.ok) {
          const text = await response.text();
          throw new Error(`SearchClaw API error ${response.status}: ${text}`);
        }
        return response.json();
      } finally {
        clearTimeout(timeout);
      }
    }
  • Helper function jsonResult() that formats API responses into MCP content format with JSON stringification
    function jsonResult(data: unknown) {
      return { content: [{ type: "text" as const, text: JSON.stringify(data, null, 2) }] };
    }
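
The pieces listed above can be traced end to end without network access. The sketch below reuses the body-construction logic and `jsonResult()` as shown on this page, but swaps `apiPost()` for a hypothetical `fakeApiPost()` stub that returns a canned response; `buildExtractBody`, `fakeApiPost`, `runExtract`, and the `ToolResult` type are names introduced here for illustration and are not part of the server.

```typescript
type ToolResult = { content: { type: "text"; text: string }[] };

// Same logic as the jsonResult() helper shown above.
function jsonResult(data: unknown): ToolResult {
  return { content: [{ type: "text" as const, text: JSON.stringify(data, null, 2) }] };
}

// Mirrors the handler's body construction: `schema` and `prompt` are only
// included when truthy, so an empty-string prompt is dropped as well.
function buildExtractBody(args: {
  url: string;
  schema?: Record<string, unknown>;
  prompt?: string;
}): Record<string, unknown> {
  const body: Record<string, unknown> = { url: args.url };
  if (args.schema) body.schema = args.schema;
  if (args.prompt) body.prompt = args.prompt;
  return body;
}

// Hypothetical stand-in for apiPost(): echoes the request instead of calling
// the real API. The response shape here is invented for the example.
async function fakeApiPost(path: string, body: Record<string, unknown>): Promise<unknown> {
  return { path, received: body };
}

// Same flow as the registered handler, with the stub swapped in.
async function runExtract(args: {
  url: string;
  schema?: Record<string, unknown>;
  prompt?: string;
}): Promise<ToolResult> {
  return jsonResult(await fakeApiPost("/extract", buildExtractBody(args)));
}
```

One consequence of the truthiness checks is that a caller cannot send an explicitly empty `prompt`; the key is simply left out of the request body.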
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It adds valuable context beyond the input schema by stating 'Costs 5 credits,' which informs about resource usage. However, it doesn't cover other behavioral aspects like rate limits, error handling, or output format, so it's not fully comprehensive.
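
Although the description is silent on output format and error handling, the implementation reference on this page suggests both: results come back as a single text item containing pretty-printed JSON, failed requests throw an error whose message starts with `SearchClaw API error <status>:`, and requests are aborted after 30 seconds. A caller-side sketch under those assumptions follows; `parseToolText` and `classifyExtractError` are hypothetical helper names, and treating a timeout as an error named `AbortError` is an assumption about how the runtime's `fetch` reports an aborted request.

```typescript
type ToolResult = { content: { type: string; text?: string }[] };

// Recover the parsed API payload from the MCP-style text result.
function parseToolText(result: ToolResult): unknown {
  const first = result.content[0];
  if (!first || first.type !== "text" || typeof first.text !== "string") {
    throw new Error("expected a single text content item");
  }
  return JSON.parse(first.text);
}

type Failure = { kind: "timeout" | "http" | "other"; status?: number };

// Classify an error thrown by the tool call, based on the message format
// used in apiPost() and on the assumed AbortError name for timeouts.
function classifyExtractError(err: unknown): Failure {
  const name =
    typeof err === "object" && err !== null ? (err as { name?: unknown }).name : undefined;
  if (name === "AbortError") return { kind: "timeout" };
  const message = err instanceof Error ? err.message : String(err);
  const match = /^SearchClaw API error (\d+):/.exec(message);
  if (match) return { kind: "http", status: Number(match[1]) };
  return { kind: "other" };
}
```

The shape of the parsed payload is not documented here, so `parseToolText` deliberately returns `unknown` and leaves validation to the caller.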

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise and front-loaded: 'Extract structured data from a URL using a JSON schema. Costs 5 credits.' Every sentence earns its place by stating the core purpose and a key behavioral trait (cost), with zero waste or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 3 parameters, no annotations, no output schema, and involves data extraction (a moderately complex operation), the description is somewhat incomplete. It covers the purpose and cost but lacks details on output, error cases, or integration with siblings. However, the high schema coverage helps offset some gaps, making it minimally adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, meaning the input schema already documents all parameters (url, schema, prompt) well. The description doesn't add any additional meaning or details about the parameters beyond what's in the schema, so it meets the baseline score of 3 without compensating further.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Extract structured data from a URL using a JSON schema.' It specifies the verb ('extract'), resource ('structured data'), and mechanism ('from a URL using a JSON schema'). However, it doesn't explicitly distinguish this tool from sibling tools like 'browse', 'crawl', or 'search', which might also involve URL processing, so it doesn't reach the highest score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by mentioning 'Costs 5 credits,' which suggests a cost consideration, but it doesn't provide explicit guidance on when to use this tool versus alternatives like 'browse' or 'crawl'. There's no mention of prerequisites, exclusions, or specific scenarios where this tool is preferred, leaving usage context somewhat vague.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/CSteenkamp/searchclaw-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.