Skip to main content
Glama

extract_data

Extract structured JSON data from web pages using AI, guided by custom instructions and JSON templates. Integrates with ReviewWebsite API for precise data retrieval and formatting.

Instructions

Extract structured data (JSON) from a web page URL using AI via ReviewWeb.site API.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
api_keyNoYour ReviewWebsite API key
debugNoEnable debug mode for detailed logging
delayAfterLoadNoOptional delay after page load in milliseconds
instructionsYesInstructions for the AI on what data to extract
jsonTemplateYesJSON template for structuring the extracted data
modelNoAI model to use for extraction
recursiveNoIf true, recursively scrape all internal URLs and extract data from each
systemPromptNoOptional system prompt to guide the AI
urlYesThe URL to extract data from

Implementation Reference

  • Registration of the MCP tool 'extract_data' with description, input schema, and handler function.
    'extract_data', `Extract structured data (JSON) from a web page URL using AI via ReviewWeb.site API.`, ExtractDataToolArgs.shape, handleExtractData, );
  • The MCP handler function that processes tool calls to 'extract_data', delegates to controller, and formats MCP response.
    async function handleExtractData(args: ExtractDataToolArgsType) { const methodLogger = Logger.forContext( 'tools/reviewwebsite.tool.ts', 'handleExtractData', ); methodLogger.debug(`Extracting data from URL with options:`, { ...args, api_key: args.api_key ? '[REDACTED]' : undefined, }); try { const result = await reviewWebsiteController.extractData( args.url, { instructions: args.instructions, systemPrompt: args.systemPrompt, jsonTemplate: args.jsonTemplate, model: args.model, delayAfterLoad: args.delayAfterLoad, recursive: args.recursive, debug: args.debug, }, { api_key: args.api_key, }, ); return { content: [ { type: 'text' as const, text: result.content, }, ], }; } catch (error) { methodLogger.error(`Error extracting data from URL`, error); return formatErrorForMcpTool(error); } }
  • Zod schema defining the input parameters for the 'extract_data' tool.
    export const ExtractDataToolArgs = z.object({ url: z.string().describe('The URL to extract data from'), instructions: z .string() .describe('Instructions for the AI on what data to extract'), jsonTemplate: z .string() .describe('JSON template for structuring the extracted data'), systemPrompt: z .string() .optional() .describe('Optional system prompt to guide the AI'), model: z.string().optional().describe('AI model to use for extraction'), delayAfterLoad: z .number() .optional() .describe('Optional delay after page load in milliseconds'), recursive: z .boolean() .optional() .describe( 'If true, recursively scrape all internal URLs and extract data from each', ), debug: z .boolean() .optional() .describe('Enable debug mode for detailed logging'), api_key: z.string().optional().describe('Your ReviewWebsite API key'), });
  • Service-level implementation that performs the actual HTTP POST request to the ReviewWebsite API's /extract endpoint.
    async function extractData( url: string, options: ExtractDataOptions, apiKey?: string, ): Promise<any> { const methodLogger = Logger.forContext( 'services/vendor.reviewwebsite.service.ts', 'extractData', ); try { methodLogger.debug('Extracting data from URL', { url, options }); const response = await axios.post( `${API_BASE}/extract`, { url, options, }, { headers: getHeaders(apiKey), }, ); methodLogger.debug('Successfully extracted data from URL'); return response.data; } catch (error) { return handleApiError(error, 'extractData'); } }
  • Controller function that orchestrates the service call, handles API key resolution, error handling, and response formatting.
    async function extractData( url: string, extractOptions: ExtractDataOptions, options: ReviewWebsiteOptions = {}, ): Promise<ControllerResponse> { const methodLogger = Logger.forContext( 'controllers/reviewwebsite.controller.ts', 'extractData', ); methodLogger.debug('Extracting data from URL', { url, extractOptions }); try { const apiKey = getApiKey(options); const result = await reviewWebsiteService.extractData( url, extractOptions, apiKey, ); return { content: JSON.stringify(result, null, 2), }; } catch (error) { return handleControllerError(error, { entityType: 'Data', operation: 'extracting', source: 'controllers/reviewwebsite.controller.ts@extractData', additionalInfo: { url }, }); } }

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mrgoonie/reviewwebsite-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server