extract-web-data
Extract structured data as JSON from any webpage by providing a URL and a natural language description of the data to retrieve. Ideal for transforming unstructured web content into actionable insights.
Instructions
Extracts structured data as JSON from a web page given a URL using a Natural Language description of the data.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | Natural Language description of the data to extract from the page | |
| url | Yes | The URL of the public webpage to extract data from | |
Input Schema (JSON Schema)
{
  "properties": {
    "prompt": {
      "description": "Natural Language description of the data to extract from the page",
      "type": "string"
    },
    "url": {
      "description": "The URL of the public webpage to extract data from",
      "type": "string"
    }
  },
  "required": [
    "url",
    "prompt"
  ],
  "type": "object"
}
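As a rough illustration of arguments that satisfy this schema, the sketch below builds the JSON-RPC 2.0 `tools/call` request an MCP client would send to invoke the tool. The helper `buildExtractCall`, the two interfaces, and the example URL and prompt are hypothetical names chosen for illustration. The tool name string is assumed to match the `extract-web-data` title of this page.

```typescript
// Arguments mirroring the JSON Schema above: both fields are required strings.
interface ExtractWebDataArgs {
  url: string;
  prompt: string;
}

// Shape of an MCP "tools/call" request carried over JSON-RPC 2.0.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: ExtractWebDataArgs;
  };
}

// Hypothetical helper: wraps the arguments in a tools/call envelope.
// The tool name is an assumption based on this page's title.
function buildExtractCall(id: number, args: ExtractWebDataArgs): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "extract-web-data", arguments: args },
  };
}

// Example values only; any public page and description could be used.
const toolCall = buildExtractCall(1, {
  url: "https://example.com/products",
  prompt: "The name and price of every product on the page",
});

console.log(JSON.stringify(toolCall, null, 2));
```

In practice an MCP client library would normally construct this envelope, so the sketch is mainly useful for seeing where `url` and `prompt` sit in the request.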
Implementation Reference
- src/index.ts:63-104 (handler) Handler for the 'extract-web-data' tool that validates inputs, calls the AgentQL API to extract structured data from the webpage, and returns the JSON result.

      case EXTRACT_TOOL_NAME: {
        const url = String(request.params.arguments?.url);
        const prompt = String(request.params.arguments?.prompt);

        if (!url || !prompt) {
          throw new Error("Both 'url' and 'prompt' are required");
        }

        const endpoint = 'https://api.agentql.com/v1/query-data';

        const response = await fetch(endpoint, {
          method: 'POST',
          headers: {
            'X-API-Key': `${AGENTQL_API_KEY}`,
            'X-TF-Request-Origin': 'mcp-server',
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            url: url,
            prompt: prompt,
            params: {
              wait_for: 0,
              is_scroll_to_bottom_enabled: false,
              mode: 'fast',
              is_screenshot_enabled: false,
            },
          }),
        });

        if (!response.ok) {
          throw new Error(`AgentQL API error: ${response.statusText}\n${await response.text()}`);
        }

        const json = (await response.json()) as AqlResponse;

        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(json.data, null, 2),
            },
          ],
        };
      }

- src/index.ts:41-54 (schema) Input schema defining the required 'url' and 'prompt' parameters for the extract-web-data tool.

      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'The URL of the public webpage to extract data from',
          },
          prompt: {
            type: 'string',
            description: 'Natural Language description of the data to extract from the page',
          },
        },
        required: ['url', 'prompt'],
      },

- src/index.ts:34-58 (registration) Registration of the 'extract-web-data' tool via the ListToolsRequestHandler, providing name, description, and input schema.

      server.setRequestHandler(ListToolsRequestSchema, async () => {
        return {
          tools: [
            {
              name: EXTRACT_TOOL_NAME,
              description:
                'Extracts structured data as JSON from a web page given a URL using a Natural Language description of the data.',
              inputSchema: {
                type: 'object',
                properties: {
                  url: {
                    type: 'string',
                    description: 'The URL of the public webpage to extract data from',
                  },
                  prompt: {
                    type: 'string',
                    description: 'Natural Language description of the data to extract from the page',
                  },
                },
                required: ['url', 'prompt'],
              },
            },
          ],
        };
      });
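
The handler's validate-then-request flow can be sketched in isolation as below. One detail worth noting when reading the handler: in JavaScript, `String(undefined)` evaluates to the string `"undefined"`, so a truthiness check applied after that coercion will not reject a missing argument. The sketch checks the raw value's type instead. `readRequiredString`, `buildQueryDataBody`, and `QUERY_PARAMS` are hypothetical names, not part of the server's source. The `params` values are copied from the handler quoted above.

```typescript
// Request options copied from the handler quoted above.
const QUERY_PARAMS = {
  wait_for: 0,
  is_scroll_to_bottom_enabled: false,
  mode: "fast",
  is_screenshot_enabled: false,
};

// Hypothetical helper: returns the argument if it is a non-empty string,
// otherwise throws. Checking the type before any String() coercion means a
// missing argument (undefined) is rejected rather than turned into "undefined".
function readRequiredString(
  args: Record<string, unknown> | undefined,
  key: string
): string {
  const value = args?.[key];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`'${key}' is required and must be a non-empty string`);
  }
  return value;
}

// Hypothetical helper: validates the tool arguments and serializes the JSON
// body in the same shape the handler sends to the AgentQL endpoint.
function buildQueryDataBody(args: Record<string, unknown> | undefined): string {
  const url = readRequiredString(args, "url");
  const prompt = readRequiredString(args, "prompt");
  return JSON.stringify({ url, prompt, params: QUERY_PARAMS });
}

// Example values only.
const exampleBody = buildQueryDataBody({
  url: "https://example.com/products",
  prompt: "The name and price of every product on the page",
});

console.log(exampleBody);
```

The resulting string could be passed as the `body` of the `fetch` call shown in the handler. The headers and API key handling are left out of this sketch.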