scrape
Extract content from a single webpage in formats like markdown, HTML, links, or screenshots. Configure extraction to focus on main content by including or excluding specific HTML tags.
Instructions
Scrape a single URL and extract content in various formats (markdown, html, links, screenshot). Use this for simple single-page scraping without AI agent capabilities.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL to scrape | |
| formats | No | Output formats to return. Default: ["markdown"]. Can request multiple formats. | |
| onlyMainContent | No | Extract only main content, removing headers, footers, nav, etc. Default: true | |
| includeTags | No | HTML tags to include (e.g., ["article", "main"]) | |
| excludeTags | No | HTML tags to exclude (e.g., ["nav", "footer"]) | |
| waitFor | No | Milliseconds to wait before scraping (for JS rendering) | |
| timeout | No | Request timeout in milliseconds |
Implementation Reference
- src/server.ts:330-386 (handler)MCP tool handler for 'scrape': extracts parameters from args, calls firecrawl.scrape(), handles success/error and formats response as MCP content.case 'scrape': { const { url, formats, onlyMainContent, includeTags, excludeTags, waitFor, timeout, } = args as { url: string; formats?: ('markdown' | 'html' | 'rawHtml' | 'links' | 'screenshot')[]; onlyMainContent?: boolean; includeTags?: string[]; excludeTags?: string[]; waitFor?: number; timeout?: number; }; const result = await firecrawl.scrape({ url, formats, onlyMainContent, includeTags, excludeTags, waitFor, timeout, }); if (!result.success) { return { content: [ { type: 'text', text: `Error: ${result.error}`, }, ], isError: true, }; } return { content: [ { type: 'text', text: JSON.stringify( { success: true, data: result.data, }, null, 2 ), }, ], }; }
- src/server.ts:136-182 (registration)Registration of the 'scrape' tool in the TOOLS array provided to ListToolsRequestHandler, defining name, description, and inputSchema.{ name: 'scrape', description: 'Scrape a single URL and extract content in various formats (markdown, html, links, screenshot). Use this for simple single-page scraping without AI agent capabilities.', inputSchema: { type: 'object', properties: { url: { type: 'string', description: 'The URL to scrape', }, formats: { type: 'array', items: { type: 'string', enum: ['markdown', 'html', 'rawHtml', 'links', 'screenshot'], }, description: 'Output formats to return. Default: ["markdown"]. Can request multiple formats.', }, onlyMainContent: { type: 'boolean', description: 'Extract only main content, removing headers, footers, nav, etc. Default: true', }, includeTags: { type: 'array', items: { type: 'string' }, description: 'HTML tags to include (e.g., ["article", "main"])', }, excludeTags: { type: 'array', items: { type: 'string' }, description: 'HTML tags to exclude (e.g., ["nav", "footer"])', }, waitFor: { type: 'number', description: 'Milliseconds to wait before scraping (for JS rendering)', }, timeout: { type: 'number', description: 'Request timeout in milliseconds', }, }, required: ['url'], }, },
- src/types/firecrawl.ts:30-39 (schema)TypeScript interfaces defining the input (FirecrawlScrapeRequest) and output (FirecrawlScrapeResponse) for the scrape operation, matching the tool inputSchema.export interface FirecrawlScrapeRequest { url: string; formats?: ('markdown' | 'html' | 'rawHtml' | 'links' | 'screenshot')[]; onlyMainContent?: boolean; includeTags?: string[]; excludeTags?: string[]; headers?: Record<string, string>; waitFor?: number; timeout?: number; }
- Core implementation of scrape: HTTP POST to Firecrawl API /v1/scrape endpoint with request params, handles response and errors.async scrape(request: FirecrawlScrapeRequest): Promise<FirecrawlScrapeResponse> { try { const response = await fetch(`${this.apiBase}/v1/scrape`, { method: 'POST', headers: { 'Content-Type': 'application/json', 'Authorization': `Bearer ${this.apiKey}`, }, body: JSON.stringify(request), }); const data = await response.json() as any; if (!response.ok) { return { success: false, error: data.error || `HTTP ${response.status}: ${response.statusText}`, }; } return { success: true, data: data.data, }; } catch (error) { return { success: false, error: error instanceof Error ? error.message : 'Unknown error', }; } }