firecrawl_scrape
Extract content from webpages in multiple formats, execute custom actions like clicking or scrolling, and filter specific elements for targeted data collection.
Instructions
Scrape a single webpage with advanced options for content extraction. Supports various formats including markdown, HTML, and screenshots. Can execute custom actions like clicking or scrolling before scraping.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL to scrape | |
| formats | No | Content formats to extract (default: ['markdown']) | |
| onlyMainContent | No | Extract only the main content, filtering out navigation, footers, etc. | |
| includeTags | No | HTML tags to specifically include in extraction | |
| excludeTags | No | HTML tags to exclude from extraction | |
| waitFor | No | Time in milliseconds to wait for dynamic content to load | |
| timeout | No | Maximum time in milliseconds to wait for the page to load | |
| actions | No | List of actions to perform before scraping | |
| extract | No | Configuration for structured data extraction | |
| mobile | No | Use mobile viewport | |
| skipTlsVerification | No | Skip TLS certificate verification | |
| removeBase64Images | No | Remove base64 encoded images from output | |
| location | No | Location settings for scraping |
Implementation Reference
- src/index.ts:892-963 (handler)The switch case handler for the 'firecrawl_scrape' tool. Validates input using isScrapeOptions, calls the Firecrawl client's scrapeUrl method with URL and options, handles response formatting based on requested formats (markdown, html, etc.), logs performance and warnings, and returns formatted content or error.case 'firecrawl_scrape': { if (!isScrapeOptions(args)) { throw new Error('Invalid arguments for firecrawl_scrape'); } const { url, ...options } = args; try { const scrapeStartTime = Date.now(); server.sendLoggingMessage({ level: 'info', data: `Starting scrape for URL: ${url} with options: ${JSON.stringify( options )}`, }); const response = await client.scrapeUrl(url, options); // Log performance metrics server.sendLoggingMessage({ level: 'info', data: `Scrape completed in ${Date.now() - scrapeStartTime}ms`, }); if ('success' in response && !response.success) { throw new Error(response.error || 'Scraping failed'); } // Format content based on requested formats const contentParts = []; if (options.formats?.includes('markdown') && response.markdown) { contentParts.push(response.markdown); } if (options.formats?.includes('html') && response.html) { contentParts.push(response.html); } if (options.formats?.includes('rawHtml') && response.rawHtml) { contentParts.push(response.rawHtml); } if (options.formats?.includes('links') && response.links) { contentParts.push(response.links.join('\n')); } if (options.formats?.includes('screenshot') && response.screenshot) { contentParts.push(response.screenshot); } if (options.formats?.includes('extract') && response.extract) { contentParts.push(JSON.stringify(response.extract, null, 2)); } // Add warning to response if present if (response.warning) { server.sendLoggingMessage({ level: 'warning', data: response.warning, }); } return { content: [ { type: 'text', text: contentParts.join('\n\n') || 'No content available' }, ], isError: false, }; } catch (error) { const errorMessage = error instanceof Error ? error.message : String(error); return { content: [{ type: 'text', text: errorMessage }], isError: true, }; } }
- src/index.ts:23-177 (schema)The Tool object definition for 'firecrawl_scrape', including name, description, and comprehensive inputSchema defining parameters like url, formats, actions, extract config, etc., used for input validation.const SCRAPE_TOOL: Tool = { name: 'firecrawl_scrape', description: 'Scrape a single webpage with advanced options for content extraction. ' + 'Supports various formats including markdown, HTML, and screenshots. ' + 'Can execute custom actions like clicking or scrolling before scraping.', inputSchema: { type: 'object', properties: { url: { type: 'string', description: 'The URL to scrape', }, formats: { type: 'array', items: { type: 'string', enum: [ 'markdown', 'html', 'rawHtml', 'screenshot', 'links', 'screenshot@fullPage', 'extract', ], }, description: "Content formats to extract (default: ['markdown'])", }, onlyMainContent: { type: 'boolean', description: 'Extract only the main content, filtering out navigation, footers, etc.', }, includeTags: { type: 'array', items: { type: 'string' }, description: 'HTML tags to specifically include in extraction', }, excludeTags: { type: 'array', items: { type: 'string' }, description: 'HTML tags to exclude from extraction', }, waitFor: { type: 'number', description: 'Time in milliseconds to wait for dynamic content to load', }, timeout: { type: 'number', description: 'Maximum time in milliseconds to wait for the page to load', }, actions: { type: 'array', items: { type: 'object', properties: { type: { type: 'string', enum: [ 'wait', 'click', 'screenshot', 'write', 'press', 'scroll', 'scrape', 'executeJavascript', ], description: 'Type of action to perform', }, selector: { type: 'string', description: 'CSS selector for the target element', }, milliseconds: { type: 'number', description: 'Time to wait in milliseconds (for wait action)', }, text: { type: 'string', description: 'Text to write (for write action)', }, key: { type: 'string', description: 'Key to press (for press action)', }, direction: { type: 'string', enum: ['up', 'down'], description: 'Scroll direction', }, script: { type: 'string', description: 'JavaScript code to execute', }, fullPage: { type: 'boolean', description: 'Take full page screenshot', }, }, required: ['type'], }, description: 'List of actions to perform before scraping', }, extract: { type: 'object', properties: { schema: { type: 'object', description: 'Schema for structured data extraction', }, systemPrompt: { type: 'string', description: 'System prompt for LLM extraction', }, prompt: { type: 'string', description: 'User prompt for LLM extraction', }, }, description: 'Configuration for structured data extraction', }, mobile: { type: 'boolean', description: 'Use mobile viewport', }, skipTlsVerification: { type: 'boolean', description: 'Skip TLS certificate verification', }, removeBase64Images: { type: 'boolean', description: 'Remove base64 encoded images from output', }, location: { type: 'object', properties: { country: { type: 'string', description: 'Country code for geolocation', }, languages: { type: 'array', items: { type: 'string' }, description: 'Language codes for content', }, }, description: 'Location settings for scraping', }, }, required: ['url'], }, };
- src/index.ts:862-874 (registration)Registration of the 'firecrawl_scrape' tool (as SCRAPE_TOOL) in the listToolsRequestHandler, making it available via the MCP listTools capability.server.setRequestHandler(ListToolsRequestSchema, async () => ({ tools: [ SCRAPE_TOOL, MAP_TOOL, CRAWL_TOOL, BATCH_SCRAPE_TOOL, CHECK_BATCH_STATUS_TOOL, CHECK_CRAWL_STATUS_TOOL, SEARCH_TOOL, EXTRACT_TOOL, DEEP_RESEARCH_TOOL, ], }));
- src/index.ts:610-618 (helper)Type guard helper function 'isScrapeOptions' used in the handler to validate that arguments contain a valid 'url' string and conform to ScrapeParams.function isScrapeOptions( args: unknown ): args is ScrapeParams & { url: string } { return ( typeof args === 'object' && args !== null && 'url' in args && typeof (args as { url: unknown }).url === 'string' );