extract_data

Extract structured JSON data from web pages using AI, guided by custom instructions and JSON templates. Integrates with ReviewWebsite API for precise data retrieval and formatting.

Instructions

Extract structured data (JSON) from a web page URL using AI via ReviewWeb.site API.

Input Schema

Name             Required  Description
api_key          No        Your ReviewWebsite API key
debug            No        Enable debug mode for detailed logging
delayAfterLoad   No        Optional delay after page load in milliseconds
instructions     Yes       Instructions for the AI on what data to extract
jsonTemplate     Yes       JSON template for structuring the extracted data
model            No        AI model to use for extraction
recursive        No        If true, recursively scrape all internal URLs and extract data from each
systemPrompt     No        Optional system prompt to guide the AI
url              Yes       The URL to extract data from

No defaults are documented for any parameter.
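
To illustrate how the required parameters work together, the sketch below shows a hypothetical set of tool arguments. The URL, instructions, and template values are illustrative only and are not taken from the source.

    // Hypothetical arguments for an extract_data call; only url, instructions,
    // and jsonTemplate are required; the rest are optional tuning knobs.
    const exampleArgs = {
    	url: 'https://example.com/pricing',
    	instructions: 'Extract each pricing plan with its name and monthly price.',
    	jsonTemplate: '{"plans": [{"name": "", "monthlyPrice": ""}]}',
    	delayAfterLoad: 2000, // wait 2s for client-side rendering before extracting
    	recursive: false, // set true to crawl internal links and extract from each page
    };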

Implementation Reference

  • Registration of the MCP tool 'extract_data' with description, input schema, and handler function.
    // Enclosing registration call; the server instance name is assumed from surrounding context.
    server.tool(
    	'extract_data',
    	`Extract structured data (JSON) from a web page URL using AI via ReviewWeb.site API.`,
    	ExtractDataToolArgs.shape,
    	handleExtractData,
    );
  • The MCP handler function that processes tool calls to 'extract_data', delegates to the controller, and formats the MCP response.
    async function handleExtractData(args: ExtractDataToolArgsType) {
    	const methodLogger = Logger.forContext(
    		'tools/reviewwebsite.tool.ts',
    		'handleExtractData',
    	);
    	methodLogger.debug(`Extracting data from URL with options:`, {
    		...args,
    		api_key: args.api_key ? '[REDACTED]' : undefined,
    	});
    
    	try {
    		const result = await reviewWebsiteController.extractData(
    			args.url,
    			{
    				instructions: args.instructions,
    				systemPrompt: args.systemPrompt,
    				jsonTemplate: args.jsonTemplate,
    				model: args.model,
    				delayAfterLoad: args.delayAfterLoad,
    				recursive: args.recursive,
    				debug: args.debug,
    			},
    			{
    				api_key: args.api_key,
    			},
    		);
    
    		return {
    			content: [
    				{
    					type: 'text' as const,
    					text: result.content,
    				},
    			],
    		};
    	} catch (error) {
    		methodLogger.error(`Error extracting data from URL`, error);
    		return formatErrorForMcpTool(error);
    	}
    }
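  • A hypothetical direct invocation of the handler (not from the source), for example from a unit test; the argument values are illustrative, and on success result.content[0].text holds the pretty-printed JSON produced by the controller.
    const result = await handleExtractData({
    	url: 'https://example.com/pricing',
    	instructions: 'Extract each pricing plan with its name and monthly price.',
    	jsonTemplate: '{"plans": [{"name": "", "monthlyPrice": ""}]}',
    });
    const extractedJson = result.content[0].text; // pretty-printed JSON on success, or a formatted MCP error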
  • Zod schema defining the input parameters for the 'extract_data' tool.
    export const ExtractDataToolArgs = z.object({
    	url: z.string().describe('The URL to extract data from'),
    	instructions: z
    		.string()
    		.describe('Instructions for the AI on what data to extract'),
    	jsonTemplate: z
    		.string()
    		.describe('JSON template for structuring the extracted data'),
    	systemPrompt: z
    		.string()
    		.optional()
    		.describe('Optional system prompt to guide the AI'),
    	model: z.string().optional().describe('AI model to use for extraction'),
    	delayAfterLoad: z
    		.number()
    		.optional()
    		.describe('Optional delay after page load in milliseconds'),
    	recursive: z
    		.boolean()
    		.optional()
    		.describe(
    			'If true, recursively scrape all internal URLs and extract data from each',
    		),
    	debug: z
    		.boolean()
    		.optional()
    		.describe('Enable debug mode for detailed logging'),
    	api_key: z.string().optional().describe('Your ReviewWebsite API key'),
    });
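  • The handler's argument type is presumably derived from this schema via the standard Zod inference pattern; the line below is a sketch, as the actual type export is not shown on this page.
    export type ExtractDataToolArgsType = z.infer<typeof ExtractDataToolArgs>;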
  • Service-level implementation that performs the actual HTTP POST request to the ReviewWebsite API's /extract endpoint.
    async function extractData(
    	url: string,
    	options: ExtractDataOptions,
    	apiKey?: string,
    ): Promise<any> {
    	const methodLogger = Logger.forContext(
    		'services/vendor.reviewwebsite.service.ts',
    		'extractData',
    	);
    
    	try {
    		methodLogger.debug('Extracting data from URL', { url, options });
    
    		const response = await axios.post(
    			`${API_BASE}/extract`,
    			{
    				url,
    				options,
    			},
    			{
    				headers: getHeaders(apiKey),
    			},
    		);
    
    		methodLogger.debug('Successfully extracted data from URL');
    		return response.data;
    	} catch (error) {
    		return handleApiError(error, 'extractData');
    	}
    }
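  • For illustration, the request body the service posts to the API's /extract endpoint would look roughly like the object below. The option fields mirror ExtractDataOptions as used by the handler; every value, including the model identifier, is an illustrative assumption rather than a documented default.
    const exampleBody = {
    	url: 'https://example.com/pricing',
    	options: {
    		instructions: 'Extract each pricing plan with its name and monthly price.',
    		jsonTemplate: '{"plans": [{"name": "", "monthlyPrice": ""}]}',
    		systemPrompt: 'You are a precise data-extraction assistant.',
    		model: 'gpt-4o-mini', // assumed identifier; consult the ReviewWebsite docs for supported models
    		delayAfterLoad: 2000,
    		recursive: false,
    		debug: false,
    	},
    };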
  • Controller function that orchestrates the service call, handles API key resolution, error handling, and response formatting.
    async function extractData(
    	url: string,
    	extractOptions: ExtractDataOptions,
    	options: ReviewWebsiteOptions = {},
    ): Promise<ControllerResponse> {
    	const methodLogger = Logger.forContext(
    		'controllers/reviewwebsite.controller.ts',
    		'extractData',
    	);
    
    	methodLogger.debug('Extracting data from URL', { url, extractOptions });
    
    	try {
    		const apiKey = getApiKey(options);
    		const result = await reviewWebsiteService.extractData(
    			url,
    			extractOptions,
    			apiKey,
    		);
    
    		return {
    			content: JSON.stringify(result, null, 2),
    		};
    	} catch (error) {
    		return handleControllerError(error, {
    			entityType: 'Data',
    			operation: 'extracting',
    			source: 'controllers/reviewwebsite.controller.ts@extractData',
    			additionalInfo: { url },
    		});
    	}
    }
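  • A minimal sketch of how getApiKey might resolve the key; the real helper is not shown on this page, and the environment variable name is an assumption. An api_key passed with the tool call would take precedence over any server-side configuration.
    function getApiKey(options: { api_key?: string } = {}): string | undefined {
    	// Prefer the key supplied with the call, then fall back to server configuration (variable name assumed).
    	return options.api_key ?? process.env.REVIEWWEBSITE_API_KEY;
    }
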
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions 'using AI' but does not disclose behavioral traits such as rate limits, authentication requirements (even though api_key appears in the schema), what recursive extraction does, how errors are handled, or what the output looks like. The description is minimal and lacks the operational context needed for a tool with 9 parameters.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that clearly states the core functionality. It's appropriately sized and front-loaded with the essential information. There's no wasted verbiage or unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 9 parameters, no annotations, and no output schema, the description is inadequate. It doesn't explain what the extracted JSON looks like, how the AI extraction works, what happens with recursive mode, or provide any context about the ReviewWeb.site API. The description leaves too many operational questions unanswered for proper agent usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 9 parameters thoroughly. The description adds no meaningful parameter semantics beyond the schema: it does not explain relationships between parameters, such as how 'instructions' and 'jsonTemplate' work together, and it provides no usage examples. A baseline score of 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Extract structured data (JSON) from a web page URL using AI via ReviewWeb.site API.' It specifies the action (extract), the resource (structured data from a web page URL), and the method (AI via a specific API). However, it does not explicitly differentiate this tool from siblings such as 'scrape_url' or 'extract_links', which may have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. With multiple sibling tools like 'scrape_url', 'extract_links', and 'extract_data_multiple', there's no indication of when this specific extraction tool is appropriate, what distinguishes it, or any prerequisites for use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
