# scrape
Extract webpage content including text, metadata, and optional markdown formatting for data collection and analysis.
## Instructions
Tool to scrape a webpage and retrieve its text and, optionally, its markdown content. It also retrieves the page's JSON-LD metadata and head metadata.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL of the webpage to scrape. | |
| includeMarkdown | No | Whether to include markdown content. | false |
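For example, a client calling this tool would pass arguments shaped like the following (the URL is a placeholder; `includeMarkdown` may be omitted, in which case it defaults to `false`):

```typescript
// Hypothetical arguments object for a "scrape" tool call.
// The URL is a placeholder; includeMarkdown is optional (default: false).
const args = {
  url: "https://example.com",
  includeMarkdown: true,
};
console.log(JSON.stringify(args, null, 2));
```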
## Implementation Reference
- `src/index.ts:239-252` (handler): Primary MCP server handler for the `scrape` tool call. Extracts the `url` and `includeMarkdown` arguments, invokes the `searchTools.scrape` method, and returns the result as JSON text content.

  ```typescript
  case "scrape": {
    const url = request.params.arguments?.url as string;
    const includeMarkdown = request.params.arguments
      ?.includeMarkdown as boolean;
    const result = await searchTools.scrape({ url, includeMarkdown });
    return {
      content: [
        {
          type: "text",
          text: JSON.stringify(result, null, 2),
        },
      ],
    };
  }
  ```
- `src/index.ts:144-162` (registration): Registration of the `scrape` tool in the `ListToolsRequestSchema` handler, including its name, description, and input schema.

  ```typescript
  {
    name: "scrape",
    description:
      "Tool to scrape a webpage and retrieve the text and, optionally, the markdown content. It will retrieve also the JSON-LD metadata and the head metadata.",
    inputSchema: {
      type: "object",
      properties: {
        url: {
          type: "string",
          description: "The URL of the webpage to scrape.",
        },
        includeMarkdown: {
          type: "boolean",
          description: "Whether to include markdown content.",
          default: false,
        },
      },
      required: ["url"],
    },
  },
  ```
- `src/types/serper.ts:52-55` (schema): TypeScript interface defining the input parameters for the scrape operation (matches the MCP input schema).

  ```typescript
  export interface IScrapeParams {
    url: string;
    includeMarkdown?: boolean;
  }
  ```
- `src/tools/search-tool.ts:46-53` (helper): Helper method in the `SerperSearchTools` class that wraps the `SerperClient.scrape` call with error handling; invoked by the MCP handler.

  ```typescript
  async scrape(params: IScrapeParams): Promise<IScrapeResult> {
    try {
      const result = await this.serperClient.scrape(params);
      return result;
    } catch (error) {
      throw new Error(`SearchTool: failed to scrape. ${error}`);
    }
  }
  ```
- Core implementation of `scrape` in `SerperClient`: makes an HTTP POST to the scrape.serper.dev API with the given params, and handles the response and errors.

  ```typescript
  async scrape(params: IScrapeParams): Promise<IScrapeResult> {
    if (!params.url) {
      throw new Error("URL is required for scraping");
    }

    try {
      const response = await fetch("https://scrape.serper.dev", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "X-API-KEY": this.apiKey,
        },
        body: JSON.stringify(params),
        redirect: "follow",
      });

      if (!response.ok) {
        const errorText = await response.text();
        throw new Error(
          `Serper API error: ${response.status} ${response.statusText} - ${errorText}`
        );
      }

      const result = (await response.json()) as IScrapeResult;
      return result;
    } catch (error) {
      console.error(error);
      throw error;
    }
  }
  ```
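As a rough sketch of the request shape, the fetch options that `SerperClient.scrape` sends could be assembled like this. The helper `buildScrapeRequest` is hypothetical (not part of the repository), and the API key shown is a placeholder:

```typescript
interface IScrapeParams {
  url: string;
  includeMarkdown?: boolean;
}

// Local stand-in for fetch's RequestInit, so the sketch is self-contained.
interface ScrapeRequestInit {
  method: string;
  headers: Record<string, string>;
  body: string;
  redirect: "follow";
}

// Hypothetical helper that assembles the options SerperClient.scrape
// passes to fetch for https://scrape.serper.dev.
function buildScrapeRequest(
  params: IScrapeParams,
  apiKey: string // placeholder credential in this example
): ScrapeRequestInit {
  if (!params.url) {
    throw new Error("URL is required for scraping");
  }
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-API-KEY": apiKey,
    },
    body: JSON.stringify(params),
    redirect: "follow",
  };
}

const init = buildScrapeRequest({ url: "https://example.com" }, "YOUR_API_KEY");
console.log(init.body); // {"url":"https://example.com"}
```

Note that the whole params object is serialized as the POST body, so optional fields like `includeMarkdown` are forwarded to the API only when the caller sets them.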