extract_data
Extract structured JSON data from web pages using AI, guided by custom instructions and JSON templates. Integrates with ReviewWebsite API for precise data retrieval and formatting.
Instructions
Extract structured data (JSON) from a web page URL using AI via ReviewWeb.site API.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| api_key | No | Your ReviewWebsite API key | |
| debug | No | Enable debug mode for detailed logging | |
| delayAfterLoad | No | Optional delay after page load in milliseconds | |
| instructions | Yes | Instructions for the AI on what data to extract | |
| jsonTemplate | Yes | JSON template for structuring the extracted data | |
| model | No | AI model to use for extraction | |
| recursive | No | If true, recursively scrape all internal URLs and extract data from each | |
| systemPrompt | No | Optional system prompt to guide the AI | |
| url | Yes | The URL to extract data from |
Implementation Reference
- src/tools/reviewwebsite.tool.ts:731-735 (registration)Registration of the MCP tool 'extract_data' with description, input schema, and handler function.'extract_data', `Extract structured data (JSON) from a web page URL using AI via ReviewWeb.site API.`, ExtractDataToolArgs.shape, handleExtractData, );
- src/tools/reviewwebsite.tool.ts:137-176 (handler)The MCP handler function that processes tool calls to 'extract_data', delegates to controller, and formats MCP response.async function handleExtractData(args: ExtractDataToolArgsType) { const methodLogger = Logger.forContext( 'tools/reviewwebsite.tool.ts', 'handleExtractData', ); methodLogger.debug(`Extracting data from URL with options:`, { ...args, api_key: args.api_key ? '[REDACTED]' : undefined, }); try { const result = await reviewWebsiteController.extractData( args.url, { instructions: args.instructions, systemPrompt: args.systemPrompt, jsonTemplate: args.jsonTemplate, model: args.model, delayAfterLoad: args.delayAfterLoad, recursive: args.recursive, debug: args.debug, }, { api_key: args.api_key, }, ); return { content: [ { type: 'text' as const, text: result.content, }, ], }; } catch (error) { methodLogger.error(`Error extracting data from URL`, error); return formatErrorForMcpTool(error); } }
- Zod schema defining the input parameters for the 'extract_data' tool.export const ExtractDataToolArgs = z.object({ url: z.string().describe('The URL to extract data from'), instructions: z .string() .describe('Instructions for the AI on what data to extract'), jsonTemplate: z .string() .describe('JSON template for structuring the extracted data'), systemPrompt: z .string() .optional() .describe('Optional system prompt to guide the AI'), model: z.string().optional().describe('AI model to use for extraction'), delayAfterLoad: z .number() .optional() .describe('Optional delay after page load in milliseconds'), recursive: z .boolean() .optional() .describe( 'If true, recursively scrape all internal URLs and extract data from each', ), debug: z .boolean() .optional() .describe('Enable debug mode for detailed logging'), api_key: z.string().optional().describe('Your ReviewWebsite API key'), });
- Service-level implementation that performs the actual HTTP POST request to the ReviewWebsite API's /extract endpoint.async function extractData( url: string, options: ExtractDataOptions, apiKey?: string, ): Promise<any> { const methodLogger = Logger.forContext( 'services/vendor.reviewwebsite.service.ts', 'extractData', ); try { methodLogger.debug('Extracting data from URL', { url, options }); const response = await axios.post( `${API_BASE}/extract`, { url, options, }, { headers: getHeaders(apiKey), }, ); methodLogger.debug('Successfully extracted data from URL'); return response.data; } catch (error) { return handleApiError(error, 'extractData'); } }
- Controller function that orchestrates the service call, handles API key resolution, error handling, and response formatting.async function extractData( url: string, extractOptions: ExtractDataOptions, options: ReviewWebsiteOptions = {}, ): Promise<ControllerResponse> { const methodLogger = Logger.forContext( 'controllers/reviewwebsite.controller.ts', 'extractData', ); methodLogger.debug('Extracting data from URL', { url, extractOptions }); try { const apiKey = getApiKey(options); const result = await reviewWebsiteService.extractData( url, extractOptions, apiKey, ); return { content: JSON.stringify(result, null, 2), }; } catch (error) { return handleControllerError(error, { entityType: 'Data', operation: 'extracting', source: 'controllers/reviewwebsite.controller.ts@extractData', additionalInfo: { url }, }); } }