scrapeDeep

Extract comprehensive web content, including images, using deep scraping techniques with customizable parameters such as scroll depth, image size, and pagination. Output data to a specified directory for thorough analysis.

Instructions

Maximum extraction web scraping (slower but thorough)

Input Schema

Name            Required  Description                                           Default
downloadImages  No        Whether to download images locally                    false
imageOutput     No        Output directory for downloaded images                configured default
maxImages       No        Maximum number of images to extract                   100
maxScrolls      No        Maximum number of scroll attempts                     20
minImageSize    No        Minimum width/height for images in pixels             100
output          No        Output directory for general results                  configured default
pages           No        Number of pages to scrape (if pagination is present)  1
scrapeImages    No        Whether to include images in the scrape result        false
scrollDelay     No        Delay between scrolls in ms                           3000
url             Yes       URL of the webpage to scrape                          —

Defaults for output and imageOutput come from the server configuration; the remaining defaults are applied in the handler (see the Implementation Reference below).
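
For illustration, a call to this tool might pass arguments like the following (a hypothetical invocation; the values are examples only, and every parameter except url is optional):

    // Hypothetical scrapeDeep arguments; all values here are examples
    const args = {
      url: "https://example.com/articles", // required
      maxScrolls: 10,     // stop after 10 scroll attempts instead of the default 20
      scrollDelay: 2000,  // wait 2 s between scrolls
      pages: 3,           // follow pagination for up to 3 pages
      scrapeImages: true, // include image URLs in the result
      minImageSize: 200   // ignore images smaller than 200 px in either dimension
    };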

Implementation Reference

  • The main handler function for the scrapeDeep tool. It destructures parameters, sets deep scraping options, calls prysm.scrape, processes and limits the result to fit MCP constraints, and handles errors.
    handler: async (params: ScraperBaseParams): Promise<ScraperResponse> => {
      const {
        url,
        maxScrolls = 20,
        scrollDelay = 3000,
        pages = 1,
        scrapeImages = false,
        downloadImages = false,
        maxImages = 100,
        minImageSize = 100,
        output,
        imageOutput
      } = params;

      try {
        // Create options object for the scraper
        const options = {
          maxScrolls,
          scrollDelay,
          pages,
          focused: false,
          standard: false,
          deep: true, // Use deep mode for thorough extraction
          scrapeImages: scrapeImages || downloadImages,
          downloadImages,
          maxImages,
          minImageSize,
          output: output || config.serverOptions.defaultOutputDir, // Use configured default if not provided
          imageOutput: imageOutput || config.serverOptions.defaultImageOutputDir // Use configured default if not provided
        };

        const result = await prysm.scrape(url, options) as ScraperResponse;

        // Limit content size to prevent overwhelming the MCP client
        if (result.content && result.content.length > 0) {
          // Limit the number of content sections
          if (result.content.length > 30) {
            result.content = result.content.slice(0, 30);
            result.content.push("(Content truncated due to size limitations)");
          }
          // Limit the size of each content section
          result.content = result.content.map(section => {
            if (section.length > 10000) {
              return section.substring(0, 10000) + "... (truncated)";
            }
            return section;
          });
        }

        // Limit the number of images to return
        if (result.images && result.images.length > 30) {
          result.images = result.images.slice(0, 30);
        }

        return result;
      } catch (error) {
        console.error(`Error scraping ${url}:`, error);
        // Return a proper error format for MCP
        return {
          title: "Scraping Error",
          content: [`Failed to scrape ${url}: ${error instanceof Error ? error.message : String(error)}`],
          images: [],
          metadata: { error: true },
          url: url,
          structureType: "error",
          paginationType: "none",
          extractionMethod: "none"
        };
      }
    }
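
    As a minimal sketch, invoking this handler directly might look as follows, assuming the module exports the tool object as scrapeDeep (an assumption; this page does not show the export):

      // Sketch only: assumes an exported `scrapeDeep` tool object wrapping the handler above
      const result = await scrapeDeep.handler({ url: "https://example.com/blog" });
      if (result.metadata?.error) {
        console.error(result.content[0]); // the error message built in the catch block
      } else {
        console.log(result.title, `${result.content.length} content section(s)`);
      }

    Note that the handler forces scrapeImages on whenever downloadImages is set, since images must be scraped before they can be downloaded, and it caps output at 30 sections of 10,000 characters each so responses stay within what an MCP client can comfortably handle.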
  • JSON Schema defining the input parameters for the scrapeDeep tool, including required 'url' and optional scraping options.
    parameters: {
      type: 'object',
      properties: {
        url: { type: 'string', description: 'URL of the webpage to scrape' },
        maxScrolls: { type: 'number', description: 'Maximum number of scroll attempts (default: 20)' },
        scrollDelay: { type: 'number', description: 'Delay between scrolls in ms (default: 3000)' },
        pages: { type: 'number', description: 'Number of pages to scrape (if pagination is present)' },
        scrapeImages: { type: 'boolean', description: 'Whether to include images in the scrape result' },
        downloadImages: { type: 'boolean', description: 'Whether to download images locally' },
        maxImages: { type: 'number', description: 'Maximum number of images to extract' },
        minImageSize: { type: 'number', description: 'Minimum width/height for images in pixels' },
        output: { type: 'string', description: 'Output directory for general results' },
        imageOutput: { type: 'string', description: 'Output directory for downloaded images' }
      },
      required: ['url']
    },
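
    Because this is standard JSON Schema, candidate arguments can be checked with a generic validator. A sketch using Ajv (Ajv is not part of this server; it is used here only for illustration, and scrapeDeepParameters stands in for the schema object above):

      import Ajv from "ajv";

      const ajv = new Ajv();
      const validate = ajv.compile(scrapeDeepParameters);
      if (!validate({ maxScrolls: 5 })) {
        console.error(validate.errors); // fails: required property "url" is missing
      }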
  • src/config.ts:65-71 (registration)
    Registration of the scrapeDeep tool in the main MCP server configuration's tools array.
    tools: [
      scrapeFocused,
      scrapeBalanced,
      scrapeDeep,
      // analyzeUrl,
      formatResult
    ],
  • Intermediate registration/export of tool definitions including scrapeDeep in tools/index.ts.
    export const toolDefinitions: ToolDefinition[] = [
      scrapeFocused,
      scrapeBalanced,
      scrapeDeep,
      // analyzeUrl,
      formatResult,
    ];
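
    The ToolDefinition shape implied by these snippets would look roughly like this (field names and types are inferred from usage on this page, not taken from the source):

      // Inferred sketch; the actual interface in the repository may differ
      interface ToolDefinition {
        name: string;        // e.g. "scrapeDeep"
        description: string;
        parameters: object;  // JSON Schema describing the tool's input
        handler: (params: ScraperBaseParams) => Promise<ScraperResponse>;
      }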

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/pinkpixel-dev/prysm-mcp-server'
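
The same request in TypeScript, using the standard fetch API (the response shape is not documented on this page):

    const res = await fetch(
      "https://glama.ai/api/mcp/v1/servers/pinkpixel-dev/prysm-mcp-server"
    );
    const server = await res.json(); // server metadata as JSON
    console.log(server);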

If you have feedback or need assistance with the MCP directory API, please join our Discord server.