
one_scrape

Extract content from a single webpage in formats like HTML, markdown, or screenshots. Perform actions like clicking, scrolling, or executing JavaScript. Includes options for dynamic content, structured data extraction, and mobile viewport.

Instructions

Scrape a single webpage with advanced options for content extraction. Supports various formats including markdown, HTML, and screenshots. Can execute custom actions like clicking or scrolling before scraping.

Input Schema

Name                 Required  Description
actions              No        List of actions to perform before scraping
excludeTags          No        HTML tags to exclude from extraction
extract              No        Configuration for structured data extraction
formats              No        Content formats to extract (default: ['markdown'])
includeTags          No        HTML tags to specifically include in extraction
location             No        Location settings for scraping
mobile               No        Use mobile viewport
onlyMainContent      No        Extract only the main content, filtering out navigation, footers, etc.
removeBase64Images   No        Remove base64 encoded images from output
skipTlsVerification  No        Skip TLS certificate verification
timeout              No        Maximum time in milliseconds to wait for the page to load
url                  Yes       The URL to scrape
waitFor              No        Time in milliseconds to wait for dynamic content to load
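For example, a call might pass arguments like the following. The URL, selector, and timing values are illustrative placeholders, not defaults:

```typescript
// Illustrative one_scrape arguments; only `url` is required.
// The URL, CSS selector, and timings below are placeholder values.
const scrapeArgs = {
  url: 'https://example.com/article',
  formats: ['markdown', 'links'],
  onlyMainContent: true,
  waitFor: 2000,
  actions: [
    { type: 'wait', milliseconds: 1000 },
    { type: 'click', selector: '#load-more' },
  ],
};

console.log(`scraping ${scrapeArgs.url} as: ${scrapeArgs.formats.join(', ')}`);
```

Everything except `url` can be omitted, in which case `formats` falls back to `['markdown']`.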

Implementation Reference

  • Core handler function that invokes Firecrawl's scrapeUrl API with provided parameters, processes various response formats (markdown, html, etc.), and returns formatted MCP content.
    async function processScrape(url: string, args: ScrapeParams) {
      const res = await firecrawl.scrapeUrl(url, { ...args });
      if (!res.success) {
        throw new Error(`Failed to scrape: ${res.error}`);
      }
      const content: string[] = [];
      if (res.markdown) content.push(res.markdown);
      if (res.rawHtml) content.push(res.rawHtml);
      if (res.links) content.push(res.links.join('\n'));
      if (res.screenshot) content.push(res.screenshot);
      if (res.html) content.push(res.html);
      // extract is structured data, not a string; serialize it for text output
      if (res.extract) content.push(JSON.stringify(res.extract, null, 2));
      return {
        content: [
          {
            type: 'text',
            text: content.join('\n\n') || 'No content found',
          },
        ],
        result: res,
        success: true,
      };
    }
  • Dispatch handler in CallToolRequestSchema that validates input, logs progress, calls processScrape, and handles errors for the 'one_scrape' tool.
    case 'one_scrape': {
      if (!checkScrapeArgs(args)) {
        throw new Error(`Invalid arguments for tool: [${name}]`);
      }
      try {
        const startTime = Date.now();
        server.sendLoggingMessage({
          level: 'info',
          data: `[${new Date().toISOString()}] Scraping started for url: [${args.url}]`,
        });
        const { url, ...scrapeArgs } = args;
        const { content, success, result } = await processScrape(url, scrapeArgs);
        server.sendLoggingMessage({
          level: 'info',
          data: `[${new Date().toISOString()}] Scraping completed in ${Date.now() - startTime}ms`,
        });
        return { content, result, success };
      } catch (error) {
        server.sendLoggingMessage({
          level: 'error',
          data: `[${new Date().toISOString()}] Error scraping: ${error}`,
        });
        const msg = error instanceof Error ? error.message : 'Unknown error';
        return {
          success: false,
          content: [{ type: 'text', text: msg }],
        };
      }
    }
  • Defines the Tool object for 'one_scrape' including detailed inputSchema with parameters for URL, formats, actions, extraction schemas, and more.
    export const SCRAPE_TOOL: Tool = {
      name: 'one_scrape',
      description:
        'Scrape a single webpage with advanced options for content extraction. ' +
        'Supports various formats including markdown, HTML, and screenshots. ' +
        'Can execute custom actions like clicking or scrolling before scraping.',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'The URL to scrape',
          },
          formats: {
            type: 'array',
            items: {
              type: 'string',
              enum: [
                'markdown',
                'html',
                'rawHtml',
                'screenshot',
                'links',
                'screenshot@fullPage',
                'extract',
              ],
            },
            description: "Content formats to extract (default: ['markdown'])",
          },
          onlyMainContent: {
            type: 'boolean',
            description: 'Extract only the main content, filtering out navigation, footers, etc.',
          },
          includeTags: {
            type: 'array',
            items: { type: 'string' },
            description: 'HTML tags to specifically include in extraction',
          },
          excludeTags: {
            type: 'array',
            items: { type: 'string' },
            description: 'HTML tags to exclude from extraction',
          },
          waitFor: {
            type: 'number',
            description: 'Time in milliseconds to wait for dynamic content to load',
          },
          timeout: {
            type: 'number',
            description: 'Maximum time in milliseconds to wait for the page to load',
          },
          actions: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                type: {
                  type: 'string',
                  enum: [
                    'wait',
                    'click',
                    'screenshot',
                    'write',
                    'press',
                    'scroll',
                    'scrape',
                    'executeJavascript',
                  ],
                  description: 'Type of action to perform',
                },
                selector: { type: 'string', description: 'CSS selector for the target element' },
                milliseconds: { type: 'number', description: 'Time to wait in milliseconds (for wait action)' },
                text: { type: 'string', description: 'Text to write (for write action)' },
                key: { type: 'string', description: 'Key to press (for press action)' },
                direction: { type: 'string', enum: ['up', 'down'], description: 'Scroll direction' },
                script: { type: 'string', description: 'JavaScript code to execute' },
                fullPage: { type: 'boolean', description: 'Take full page screenshot' },
              },
              required: ['type'],
            },
            description: 'List of actions to perform before scraping',
          },
          extract: {
            type: 'object',
            properties: {
              schema: { type: 'object', description: 'Schema for structured data extraction' },
              systemPrompt: { type: 'string', description: 'System prompt for LLM extraction' },
              prompt: { type: 'string', description: 'User prompt for LLM extraction' },
            },
            description: 'Configuration for structured data extraction',
          },
          mobile: { type: 'boolean', description: 'Use mobile viewport' },
          skipTlsVerification: { type: 'boolean', description: 'Skip TLS certificate verification' },
          removeBase64Images: { type: 'boolean', description: 'Remove base64 encoded images from output' },
          location: {
            type: 'object',
            properties: {
              country: { type: 'string', description: 'Country code for geolocation' },
              languages: {
                type: 'array',
                items: { type: 'string' },
                description: 'Language codes for content',
              },
            },
            description: 'Location settings for scraping',
          },
        },
        required: ['url'],
      },
    };
  • src/index.ts:66-73 (registration)
    Registers SCRAPE_TOOL (one_scrape) in the MCP server's ListTools response.
    server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: [SEARCH_TOOL, EXTRACT_TOOL, SCRAPE_TOOL, MAP_TOOL],
    }));
  • Helper function (a type guard) that validates input arguments for the scrape tool, ensuring 'url' is present and is a string.
    function checkScrapeArgs(args: unknown): args is ScrapeParams & { url: string } {
      return (
        typeof args === 'object' &&
        args !== null &&
        'url' in args &&
        typeof args.url === 'string'
      );
    }
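A standalone sketch of this guard (with the `ScrapeParams` intersection elided, and renamed to make clear it is not the original function) shows how it accepts any object carrying a string `url` and rejects everything else:

```typescript
// Simplified copy of the checkScrapeArgs pattern: ScrapeParams is elided,
// so the predicate only narrows to { url: string }.
function hasUrl(args: unknown): args is { url: string } {
  return (
    typeof args === 'object' &&
    args !== null &&
    'url' in args &&
    typeof (args as { url: unknown }).url === 'string'
  );
}

console.log(hasUrl({ url: 'https://example.com' })); // true
console.log(hasUrl({ url: 42 }));                    // false
console.log(hasUrl(null));                           // false
```

Because the predicate returns `args is { url: string }`, callers that pass the check get a narrowed type and can destructure `url` without a cast.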

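The response-assembly step inside processScrape can be sketched in isolation. The `res` object below is a hand-built stand-in for a Firecrawl result, not real API output; the handler simply concatenates whichever format fields came back:

```typescript
// Stand-in for a successful firecrawl.scrapeUrl result (not real API output).
const res = {
  markdown: '# Example Title',
  links: ['https://example.com/a', 'https://example.com/b'],
};

// Collect whichever formats are present, in a fixed order, then join them
// into the single text payload returned to the MCP client.
const content: string[] = [];
if (res.markdown) content.push(res.markdown);
if (res.links) content.push(res.links.join('\n'));
const text = content.join('\n\n') || 'No content found';

console.log(text);
```

The `|| 'No content found'` fallback means a scrape that succeeds but yields no requested formats still returns a non-empty text block.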

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/yokingma/one-search-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.