
extract-web-data

Extract structured data as JSON from any webpage by providing a URL and a natural language description of the data to retrieve. Ideal for transforming unstructured web content into actionable insights.

Instructions

Extracts structured data as JSON from a web page given a URL using a Natural Language description of the data.

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| prompt | Yes | Natural Language description of the data to extract from the page | |
| url | Yes | The URL of the public webpage to extract data from | |

Input Schema (JSON Schema)

    {
      "properties": {
        "prompt": {
          "description": "Natural Language description of the data to extract from the page",
          "type": "string"
        },
        "url": {
          "description": "The URL of the public webpage to extract data from",
          "type": "string"
        }
      },
      "required": [
        "url",
        "prompt"
      ],
      "type": "object"
    }
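As a rough illustration of the schema above, the following sketch checks an untyped arguments value for the two required string fields. The names `ExtractWebDataArgs` and `isExtractWebDataArgs` are hypothetical and do not come from the server's source; the example URL and prompt are placeholders.

```typescript
// Hypothetical type mirroring the two required string properties in the schema.
interface ExtractWebDataArgs {
  url: string;
  prompt: string;
}

// Hypothetical type guard: true only when both required fields are strings.
function isExtractWebDataArgs(value: unknown): value is ExtractWebDataArgs {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  // typeof checks reject missing fields; String(undefined) would instead
  // produce the non-empty string "undefined" and hide the omission.
  return typeof candidate.url === "string" && typeof candidate.prompt === "string";
}

// Placeholder arguments for illustration only.
const example = {
  url: "https://example.com",
  prompt: "the page title and every link URL",
};

console.log(isExtractWebDataArgs(example)); // true
console.log(isExtractWebDataArgs({ url: "https://example.com" })); // false
```

The guard only checks the types the schema declares; it does not verify that `url` is a well-formed or publicly reachable address.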

Implementation Reference

  • Handler for the 'extract-web-data' tool that validates inputs, calls the AgentQL API to extract structured data from the webpage, and returns the JSON result.
    case EXTRACT_TOOL_NAME: {
      const url = String(request.params.arguments?.url);
      const prompt = String(request.params.arguments?.prompt);
      if (!url || !prompt) {
        throw new Error("Both 'url' and 'prompt' are required");
      }

      const endpoint = 'https://api.agentql.com/v1/query-data';
      const response = await fetch(endpoint, {
        method: 'POST',
        headers: {
          'X-API-Key': `${AGENTQL_API_KEY}`,
          'X-TF-Request-Origin': 'mcp-server',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          url: url,
          prompt: prompt,
          params: {
            wait_for: 0,
            is_scroll_to_bottom_enabled: false,
            mode: 'fast',
            is_screenshot_enabled: false,
          },
        }),
      });

      if (!response.ok) {
        throw new Error(`AgentQL API error: ${response.statusText}\n${await response.text()}`);
      }

      const json = (await response.json()) as AqlResponse;
      return {
        content: [
          {
            type: 'text',
            text: JSON.stringify(json.data, null, 2),
          },
        ],
      };
    }
  • Input schema defining the required 'url' and 'prompt' parameters for the extract-web-data tool.
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'The URL of the public webpage to extract data from',
        },
        prompt: {
          type: 'string',
          description: 'Natural Language description of the data to extract from the page',
        },
      },
      required: ['url', 'prompt'],
    },
  • src/index.ts:34-58 (registration)
    Registration of the 'extract-web-data' tool via the ListToolsRequestHandler, providing name, description, and input schema.
    server.setRequestHandler(ListToolsRequestSchema, async () => {
      return {
        tools: [
          {
            name: EXTRACT_TOOL_NAME,
            description:
              'Extracts structured data as JSON from a web page given a URL using a Natural Language description of the data.',
            inputSchema: {
              type: 'object',
              properties: {
                url: {
                  type: 'string',
                  description: 'The URL of the public webpage to extract data from',
                },
                prompt: {
                  type: 'string',
                  description: 'Natural Language description of the data to extract from the page',
                },
              },
              required: ['url', 'prompt'],
            },
          },
        ],
      };
    });
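To show how a client might reach the registration and handler above, here is a minimal sketch of the two JSON-RPC 2.0 messages that MCP uses for tools: `tools/list` to discover the tool and `tools/call` to invoke it. The helper names are hypothetical, and the sketch assumes that `EXTRACT_TOOL_NAME` resolves to `extract-web-data`, as the title of this page suggests. Transport setup (stdio or HTTP) is omitted.

```typescript
// JSON-RPC 2.0 message shapes used by MCP for listing and calling tools.
interface ListToolsMessage {
  jsonrpc: "2.0";
  id: number;
  method: "tools/list";
}

interface CallToolMessage {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: { url: string; prompt: string };
  };
}

// Assumption: the constant EXTRACT_TOOL_NAME in the server equals this string.
const TOOL_NAME = "extract-web-data";

// Hypothetical helper: builds the discovery request answered by the registration.
function listToolsMessage(id: number): ListToolsMessage {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

// Hypothetical helper: builds the invocation request routed to the handler.
function callExtractMessage(id: number, url: string, prompt: string): CallToolMessage {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: TOOL_NAME, arguments: { url, prompt } },
  };
}

// Placeholder URL and prompt for illustration only.
const message = callExtractMessage(2, "https://example.com", "the page title and all link URLs");
console.log(JSON.stringify(message, null, 2));
```

On success, the handler shown above replies with a single `text` content item whose body is the pretty-printed JSON taken from the `data` field of the AgentQL response.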

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/tinyfish-io/agentql-mcp'
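The same request can be sketched in TypeScript. The helper names `mcpServerUrl` and `fetchMcpServer` are hypothetical, and the sketch assumes the endpoint returns a JSON body; the response shape is not documented on this page, so it is typed as `unknown`.

```typescript
// Base path taken from the curl example above.
const MCP_API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Hypothetical helper: builds the directory API URL for one server.
function mcpServerUrl(owner: string, name: string): string {
  return `${MCP_API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Hypothetical helper: fetches the server record.
// Assumption: the endpoint responds with JSON.
async function fetchMcpServer(owner: string, name: string): Promise<unknown> {
  const response = await fetch(mcpServerUrl(owner, name));
  if (!response.ok) {
    throw new Error(`MCP directory API error: ${response.status} ${response.statusText}`);
  }
  return response.json();
}

console.log(mcpServerUrl("tinyfish-io", "agentql-mcp"));
// prints "https://glama.ai/api/mcp/v1/servers/tinyfish-io/agentql-mcp"
```

Calling `fetchMcpServer("tinyfish-io", "agentql-mcp")` issues the same GET request as the curl command, provided the runtime exposes a global `fetch` (for example Node.js 18 or later).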
