scrape_url
Extract HTML content from any URL by integrating with the ReviewWebsite API. Specify a URL, optional delay, and API key to retrieve web page data programmatically.
Instructions
Scrape a URL and return HTML content using ReviewWeb.site API.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| api_key | No | Your ReviewWebsite API key | |
| delayAfterLoad | No | Optional delay after page load in milliseconds | |
| url | Yes | The URL to scrape |
Input Schema (JSON Schema)
{
"$schema": "http://json-schema.org/draft-07/schema#",
"additionalProperties": false,
"properties": {
"api_key": {
"description": "Your ReviewWebsite API key",
"type": "string"
},
"delayAfterLoad": {
"description": "Optional delay after page load in milliseconds",
"type": "number"
},
"url": {
"description": "The URL to scrape",
"type": "string"
}
},
"required": [
"url"
],
"type": "object"
}
Implementation Reference
- src/tools/reviewwebsite.tool.ts:232-263 (handler)MCP tool handler function that receives args, calls the controller's scrapeUrl, formats the response as MCP content or error.async function handleScrapeUrl(args: ScrapeUrlToolArgsType) { const methodLogger = Logger.forContext( 'tools/reviewwebsite.tool.ts', 'handleScrapeUrl', ); methodLogger.debug(`Scraping URL with options:`, { ...args, api_key: args.api_key ? '[REDACTED]' : undefined, }); try { const result = await reviewWebsiteController.scrapeUrl( args.url, args.delayAfterLoad, { api_key: args.api_key, }, ); return { content: [ { type: 'text' as const, text: result.content, }, ], }; } catch (error) { methodLogger.error(`Error scraping URL`, error); return formatErrorForMcpTool(error); } }
- Zod schema defining the input parameters for the scrape_url tool: url (required), delayAfterLoad (optional number), api_key (optional string).export const ScrapeUrlToolArgs = z.object({ url: z.string().describe('The URL to scrape'), delayAfterLoad: z .number() .optional() .describe('Optional delay after page load in milliseconds'), api_key: z.string().optional().describe('Your ReviewWebsite API key'), });
- src/tools/reviewwebsite.tool.ts:745-750 (registration)Registration of the scrape_url tool with the MCP server, providing name, description, input schema, and handler reference.server.tool( 'scrape_url', `Scrape a URL and return HTML content using ReviewWeb.site API.`, ScrapeUrlToolArgs.shape, handleScrapeUrl, );
- Core service implementation that performs the HTTP POST request to the ReviewWebsite API /scrape endpoint to fetch HTML content.async function scrapeUrl( url: string, delayAfterLoad?: number, apiKey?: string, ): Promise<any> { const methodLogger = Logger.forContext( 'services/vendor.reviewwebsite.service.ts', 'scrapeUrl', ); try { methodLogger.debug('Scraping URL', { url, delayAfterLoad }); const params = new URLSearchParams(); params.append('url', url); const response = await axios.post( `${API_BASE}/scrape`, { options: { delayAfterLoad, }, }, { params, headers: getHeaders(apiKey), }, ); methodLogger.debug('Successfully scraped URL'); return response.data; } catch (error) { return handleApiError(error, 'scrapeUrl'); } }
- Controller layer function that resolves API key, calls the service, formats JSON response, and handles errors.async function scrapeUrl( url: string, delayAfterLoad?: number, options: ReviewWebsiteOptions = {}, ): Promise<ControllerResponse> { const methodLogger = Logger.forContext( 'controllers/reviewwebsite.controller.ts', 'scrapeUrl', ); methodLogger.debug('Scraping URL', { url, delayAfterLoad }); try { const apiKey = getApiKey(options); const result = await reviewWebsiteService.scrapeUrl( url, delayAfterLoad, apiKey, ); return { content: JSON.stringify(result, null, 2), }; } catch (error) { return handleControllerError(error, { entityType: 'URL', operation: 'scraping', source: 'controllers/reviewwebsite.controller.ts@scrapeUrl', additionalInfo: { url }, }); } }