# one_scrape
Extract content from a single webpage in formats such as markdown, HTML, or screenshots. Perform actions like clicking, scrolling, or executing JavaScript before the scrape. Includes options for waiting on dynamic content, structured data extraction, and a mobile viewport.
## Instructions
Scrape a single webpage with advanced options for content extraction. Supports various formats including markdown, HTML, and screenshots. Can execute custom actions like clicking or scrolling before scraping.
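
For orientation, a minimal call only needs `url`; every other field in the schema below is optional. The values in this sketch are illustrative, not taken from the project:

```typescript
// Minimal, illustrative argument object for the one_scrape tool.
// Only `url` is required; omitted fields fall back to the tool's defaults,
// with `formats` defaulting to ['markdown'].
const basicScrapeArgs = {
  url: 'https://example.com/blog/post', // hypothetical URL
  formats: ['markdown'],                // explicit here, but also the default
  onlyMainContent: true,                // strip navigation, footers, etc.
};
```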
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| actions | No | List of actions to perform before scraping | |
| excludeTags | No | HTML tags to exclude from extraction | |
| extract | No | Configuration for structured data extraction | |
| formats | No | Content formats to extract | ['markdown'] |
| includeTags | No | HTML tags to specifically include in extraction | |
| location | No | Location settings for scraping | |
| mobile | No | Use mobile viewport | |
| onlyMainContent | No | Extract only the main content, filtering out navigation, footers, etc. | |
| removeBase64Images | No | Remove base64 encoded images from output | |
| skipTlsVerification | No | Skip TLS certificate verification | |
| timeout | No | Maximum time in milliseconds to wait for the page to load | |
| url | Yes | The URL to scrape | |
| waitFor | No | Time in milliseconds to wait for dynamic content to load | |
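
As a worked example of the optional parameters, the sketch below combines pre-scrape actions, an LLM extraction schema, a mobile viewport, and location settings. All concrete values (URL, selector, prompt, schema fields) are invented for illustration; only the field names come from the schema above.

```typescript
// Illustrative argument object exercising the optional parameters above.
// Every concrete value here (URL, selector, prompt, schema) is hypothetical.
const advancedScrapeArgs = {
  url: 'https://example.com/products',
  formats: ['markdown', 'links', 'extract'],
  mobile: true,
  waitFor: 2000, // give dynamic content 2s to load
  actions: [
    { type: 'wait', milliseconds: 1000 },
    { type: 'click', selector: '#load-more' },
    { type: 'scroll', direction: 'down' },
  ],
  extract: {
    prompt: 'List each product name and price',
    schema: {
      type: 'object',
      properties: {
        products: {
          type: 'array',
          items: {
            type: 'object',
            properties: {
              name: { type: 'string' },
              price: { type: 'string' },
            },
          },
        },
      },
    },
  },
  location: { country: 'US', languages: ['en'] },
};
```

The handler concatenates whatever formats come back into a single text block, as shown in the implementation reference below.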
## Implementation Reference
- `src/index.ts:296-341` (handler): Core handler function that invokes Firecrawl's `scrapeUrl` API with the provided parameters, processes the various response formats (markdown, HTML, etc.), and returns formatted MCP content.

  ```typescript
  async function processScrape(url: string, args: ScrapeParams) {
    const res = await firecrawl.scrapeUrl(url, {
      ...args,
    });

    if (!res.success) {
      throw new Error(`Failed to scrape: ${res.error}`);
    }

    const content: string[] = [];

    if (res.markdown) {
      content.push(res.markdown);
    }
    if (res.rawHtml) {
      content.push(res.rawHtml);
    }
    if (res.links) {
      content.push(res.links.join('\n'));
    }
    if (res.screenshot) {
      content.push(res.screenshot);
    }
    if (res.html) {
      content.push(res.html);
    }
    if (res.extract) {
      content.push(res.extract);
    }

    return {
      content: [
        {
          type: 'text',
          text: content.join('\n\n') || 'No content found',
        },
      ],
      result: res,
      success: true,
    };
  }
  ```
- `src/index.ts:138-178` (handler): Dispatch branch in the `CallToolRequestSchema` handler that validates input, logs progress, calls `processScrape`, and handles errors for the `one_scrape` tool.

  ```typescript
  case 'one_scrape': {
    if (!checkScrapeArgs(args)) {
      throw new Error(`Invalid arguments for tool: [${name}]`);
    }
    try {
      const startTime = Date.now();
      server.sendLoggingMessage({
        level: 'info',
        data: `[${new Date().toISOString()}] Scraping started for url: [${args.url}]`,
      });

      const { url, ...scrapeArgs } = args;
      const { content, success, result } = await processScrape(url, scrapeArgs);

      server.sendLoggingMessage({
        level: 'info',
        data: `[${new Date().toISOString()}] Scraping completed in ${Date.now() - startTime}ms`,
      });

      return {
        content,
        result,
        success,
      };
    } catch (error) {
      server.sendLoggingMessage({
        level: 'error',
        data: `[${new Date().toISOString()}] Error scraping: ${error}`,
      });
      const msg = error instanceof Error ? error.message : 'Unknown error';
      return {
        success: false,
        content: [
          {
            type: 'text',
            text: msg,
          },
        ],
      };
    }
  }
  ```
- `src/tools.ts:97-251` (schema): Defines the `Tool` object for `one_scrape`, including a detailed `inputSchema` with parameters for the URL, formats, actions, extraction schemas, and more.

  ```typescript
  export const SCRAPE_TOOL: Tool = {
    name: 'one_scrape',
    description:
      'Scrape a single webpage with advanced options for content extraction. ' +
      'Supports various formats including markdown, HTML, and screenshots. ' +
      'Can execute custom actions like clicking or scrolling before scraping.',
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'The URL to scrape',
        },
        formats: {
          type: 'array',
          items: {
            type: 'string',
            enum: [
              'markdown',
              'html',
              'rawHtml',
              'screenshot',
              'links',
              'screenshot@fullPage',
              'extract',
            ],
          },
          description: "Content formats to extract (default: ['markdown'])",
        },
        onlyMainContent: {
          type: 'boolean',
          description:
            'Extract only the main content, filtering out navigation, footers, etc.',
        },
        includeTags: {
          type: 'array',
          items: { type: 'string' },
          description: 'HTML tags to specifically include in extraction',
        },
        excludeTags: {
          type: 'array',
          items: { type: 'string' },
          description: 'HTML tags to exclude from extraction',
        },
        waitFor: {
          type: 'number',
          description: 'Time in milliseconds to wait for dynamic content to load',
        },
        timeout: {
          type: 'number',
          description: 'Maximum time in milliseconds to wait for the page to load',
        },
        actions: {
          type: 'array',
          items: {
            type: 'object',
            properties: {
              type: {
                type: 'string',
                enum: [
                  'wait',
                  'click',
                  'screenshot',
                  'write',
                  'press',
                  'scroll',
                  'scrape',
                  'executeJavascript',
                ],
                description: 'Type of action to perform',
              },
              selector: {
                type: 'string',
                description: 'CSS selector for the target element',
              },
              milliseconds: {
                type: 'number',
                description: 'Time to wait in milliseconds (for wait action)',
              },
              text: {
                type: 'string',
                description: 'Text to write (for write action)',
              },
              key: {
                type: 'string',
                description: 'Key to press (for press action)',
              },
              direction: {
                type: 'string',
                enum: ['up', 'down'],
                description: 'Scroll direction',
              },
              script: {
                type: 'string',
                description: 'JavaScript code to execute',
              },
              fullPage: {
                type: 'boolean',
                description: 'Take full page screenshot',
              },
            },
            required: ['type'],
          },
          description: 'List of actions to perform before scraping',
        },
        extract: {
          type: 'object',
          properties: {
            schema: {
              type: 'object',
              description: 'Schema for structured data extraction',
            },
            systemPrompt: {
              type: 'string',
              description: 'System prompt for LLM extraction',
            },
            prompt: {
              type: 'string',
              description: 'User prompt for LLM extraction',
            },
          },
          description: 'Configuration for structured data extraction',
        },
        mobile: {
          type: 'boolean',
          description: 'Use mobile viewport',
        },
        skipTlsVerification: {
          type: 'boolean',
          description: 'Skip TLS certificate verification',
        },
        removeBase64Images: {
          type: 'boolean',
          description: 'Remove base64 encoded images from output',
        },
        location: {
          type: 'object',
          properties: {
            country: {
              type: 'string',
              description: 'Country code for geolocation',
            },
            languages: {
              type: 'array',
              items: { type: 'string' },
              description: 'Language codes for content',
            },
          },
          description: 'Location settings for scraping',
        },
      },
      required: ['url'],
    },
  };
  ```
- `src/index.ts:66-73` (registration): Registers `SCRAPE_TOOL` (`one_scrape`) in the MCP server's `ListTools` response.

  ```typescript
  server.setRequestHandler(ListToolsRequestSchema, async () => ({
    tools: [
      SEARCH_TOOL,
      EXTRACT_TOOL,
      SCRAPE_TOOL,
      MAP_TOOL,
    ],
  }));
  ```
- `src/index.ts:377-384` (helper): Helper function to validate input arguments for the scrape tool, ensuring `url` is present and is a string.

  ```typescript
  function checkScrapeArgs(args: unknown): args is ScrapeParams & { url: string } {
    return (
      typeof args === 'object' &&
      args !== null &&
      'url' in args &&
      typeof args.url === 'string'
    );
  }
  ```
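
To tie the pieces together, here is a hedged sketch of how a client might call `one_scrape` end to end using the `@modelcontextprotocol/sdk` client over stdio. The launch command and `dist/index.js` entry point are assumptions about how this server is built and started, and the running server presumably needs its Firecrawl credentials configured; none of that is shown in the snippets above.

```typescript
// Sketch only: assumes the server is compiled to dist/index.js and runs over stdio.
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

async function main() {
  const transport = new StdioClientTransport({
    command: 'node',
    args: ['dist/index.js'], // assumed build output path
  });
  const client = new Client(
    { name: 'one-scrape-demo', version: '0.1.0' },
    { capabilities: {} }
  );
  await client.connect(transport);

  // Argument names mirror the input schema documented above.
  const result = await client.callTool({
    name: 'one_scrape',
    arguments: {
      url: 'https://example.com',
      formats: ['markdown', 'links'],
      onlyMainContent: true,
    },
  });

  // processScrape joins the requested formats into a single text content item.
  console.log(result.content);

  await client.close();
}

main().catch(console.error);
```

On the server side, this call is routed through the `one_scrape` dispatch branch shown above, validated by `checkScrapeArgs`, and executed by `processScrape`.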