firecrawl_scrape
Extract content from a single URL, returning formats like markdown, HTML, or structured data, with options for actions, tags, and dynamic content handling.
Instructions
Scrape content from a single URL with advanced options.
Best for: Single page content extraction, when you know exactly which page contains the information. Not recommended for: Multiple pages (use batch_scrape), unknown page (use search), structured data (use extract). Common mistakes: Using scrape for a list of URLs (use batch_scrape instead). Prompt Example: "Get the content of the page at https://example.com." Usage Example:
{
"name": "firecrawl_scrape",
"arguments": {
"url": "https://example.com",
"formats": ["markdown"]
}
}Returns: Markdown, HTML, or other formats as specified.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL to scrape | |
| formats | No | Content formats to extract (default: ['markdown']) | |
| onlyMainContent | No | Extract only the main content, filtering out navigation, footers, etc. | |
| includeTags | No | HTML tags to specifically include in extraction | |
| excludeTags | No | HTML tags to exclude from extraction | |
| waitFor | No | Time in milliseconds to wait for dynamic content to load | |
| timeout | No | Maximum time in milliseconds to wait for the page to load | |
| actions | No | List of actions to perform before scraping | |
| extract | No | Configuration for structured data extraction | |
| mobile | No | Use mobile viewport | |
| skipTlsVerification | No | Skip TLS certificate verification | |
| removeBase64Images | No | Remove base64 encoded images from output | |
| location | No | Location settings for scraping |
Implementation Reference
- src/index.ts:995-1074 (handler)Handler for the firecrawl_scrape tool. Validates args, calls FirecrawlApp.scrapeUrl(), formats content based on requested formats (markdown, html, rawHtml, links, screenshot, extract), and returns the result.
case 'firecrawl_scrape': { if (!isScrapeOptions(args)) { throw new Error('Invalid arguments for firecrawl_scrape'); } const { url, ...options } = args; try { const scrapeStartTime = Date.now(); safeLog( 'info', `Starting scrape for URL: ${url} with options: ${JSON.stringify(options)}` ); const response = await client.scrapeUrl(url, { ...options, // @ts-expect-error Extended API options including origin origin: 'mcp-server', }); // Log performance metrics safeLog( 'info', `Scrape completed in ${Date.now() - scrapeStartTime}ms` ); if ('success' in response && !response.success) { throw new Error(response.error || 'Scraping failed'); } // Format content based on requested formats const contentParts = []; if (options.formats?.includes('markdown') && response.markdown) { contentParts.push(response.markdown); } if (options.formats?.includes('html') && response.html) { contentParts.push(response.html); } if (options.formats?.includes('rawHtml') && response.rawHtml) { contentParts.push(response.rawHtml); } if (options.formats?.includes('links') && response.links) { contentParts.push(response.links.join('\n')); } if (options.formats?.includes('screenshot') && response.screenshot) { contentParts.push(response.screenshot); } if (options.formats?.includes('extract') && response.extract) { contentParts.push(JSON.stringify(response.extract, null, 2)); } // If options.formats is empty, default to markdown if (!options.formats || options.formats.length === 0) { options.formats = ['markdown']; } // Add warning to response if present if (response.warning) { safeLog('warning', response.warning); } return { content: [ { type: 'text', text: trimResponseText( contentParts.join('\n\n') || 'No content available' ), }, ], isError: false, }; } catch (error) { const errorMessage = error instanceof Error ? error.message : String(error); return { content: [{ type: 'text', text: trimResponseText(errorMessage) }], isError: true, }; } } - src/index.ts:24-194 (schema)Tool definition with inputSchema for firecrawl_scrape, specifying url (required), formats, onlyMainContent, includeTags, excludeTags, waitFor, timeout, actions, extract, mobile, skipTlsVerification, removeBase64Images, and location.
const SCRAPE_TOOL: Tool = { name: 'firecrawl_scrape', description: ` Scrape content from a single URL with advanced options. **Best for:** Single page content extraction, when you know exactly which page contains the information. **Not recommended for:** Multiple pages (use batch_scrape), unknown page (use search), structured data (use extract). **Common mistakes:** Using scrape for a list of URLs (use batch_scrape instead). **Prompt Example:** "Get the content of the page at https://example.com." **Usage Example:** \`\`\`json { "name": "firecrawl_scrape", "arguments": { "url": "https://example.com", "formats": ["markdown"] } } \`\`\` **Returns:** Markdown, HTML, or other formats as specified. `, inputSchema: { type: 'object', properties: { url: { type: 'string', description: 'The URL to scrape', }, formats: { type: 'array', items: { type: 'string', enum: [ 'markdown', 'html', 'rawHtml', 'screenshot', 'links', 'screenshot@fullPage', 'extract', ], }, default: ['markdown'], description: "Content formats to extract (default: ['markdown'])", }, onlyMainContent: { type: 'boolean', description: 'Extract only the main content, filtering out navigation, footers, etc.', }, includeTags: { type: 'array', items: { type: 'string' }, description: 'HTML tags to specifically include in extraction', }, excludeTags: { type: 'array', items: { type: 'string' }, description: 'HTML tags to exclude from extraction', }, waitFor: { type: 'number', description: 'Time in milliseconds to wait for dynamic content to load', }, timeout: { type: 'number', description: 'Maximum time in milliseconds to wait for the page to load', }, actions: { type: 'array', items: { type: 'object', properties: { type: { type: 'string', enum: [ 'wait', 'click', 'screenshot', 'write', 'press', 'scroll', 'scrape', 'executeJavascript', ], description: 'Type of action to perform', }, selector: { type: 'string', description: 'CSS selector for the target element', }, milliseconds: { type: 'number', description: 'Time to wait in milliseconds (for wait action)', }, text: { type: 'string', description: 'Text to write (for write action)', }, key: { type: 'string', description: 'Key to press (for press action)', }, direction: { type: 'string', enum: ['up', 'down'], description: 'Scroll direction', }, script: { type: 'string', description: 'JavaScript code to execute', }, fullPage: { type: 'boolean', description: 'Take full page screenshot', }, }, required: ['type'], }, description: 'List of actions to perform before scraping', }, extract: { type: 'object', properties: { schema: { type: 'object', description: 'Schema for structured data extraction', }, systemPrompt: { type: 'string', description: 'System prompt for LLM extraction', }, prompt: { type: 'string', description: 'User prompt for LLM extraction', }, }, description: 'Configuration for structured data extraction', }, mobile: { type: 'boolean', description: 'Use mobile viewport', }, skipTlsVerification: { type: 'boolean', description: 'Skip TLS certificate verification', }, removeBase64Images: { type: 'boolean', description: 'Remove base64 encoded images from output', }, location: { type: 'object', properties: { country: { type: 'string', description: 'Country code for geolocation', }, languages: { type: 'array', items: { type: 'string' }, description: 'Language codes for content', }, }, description: 'Location settings for scraping', }, }, required: ['url'], }, }; - src/index.ts:955-966 (registration)Registration of SCRAPE_TOOL (with name 'firecrawl_scrape') in the ListToolsRequestSchema handler, exposing it as an available tool via MCP.
server.setRequestHandler(ListToolsRequestSchema, async () => ({ tools: [ SCRAPE_TOOL, MAP_TOOL, CRAWL_TOOL, CHECK_CRAWL_STATUS_TOOL, SEARCH_TOOL, EXTRACT_TOOL, DEEP_RESEARCH_TOOL, GENERATE_LLMSTXT_TOOL, ], })); - src/index.ts:776-785 (helper)Type guard isScrapeOptions() that validates arguments have a url string field before passing to the scrape handler.
function isScrapeOptions( args: unknown ): args is ScrapeParams & { url: string } { return ( typeof args === 'object' && args !== null && 'url' in args && typeof (args as { url: unknown }).url === 'string' ); } - src/index.ts:1463-1465 (helper)Utility function trimResponseText() used by the handler to trim whitespace from response content.
function trimResponseText(text: string): string { return text.trim(); }