scrape

scrape

Extract content from a single webpage in formats like markdown, HTML, links, or screenshots. Configure extraction to focus on main content by including or excluding specific HTML tags.

Instructions

Scrape a single URL and extract content in various formats (markdown, html, links, screenshot). Use this for simple single-page scraping without AI agent capabilities.

Input Schema

TableJSON Schema

Name	Required	Description
`url`	Yes	The URL to scrape
`formats`	No	Output formats to return. Default: ["markdown"]. Can request multiple formats.
`onlyMainContent`	No	Extract only main content, removing headers, footers, nav, etc. Default: true
`includeTags`	No	HTML tags to include (e.g., ["article", "main"])
`excludeTags`	No	HTML tags to exclude (e.g., ["nav", "footer"])
`waitFor`	No	Milliseconds to wait before scraping (for JS rendering)
`timeout`	No	Request timeout in milliseconds

Implementation Reference

src/server.ts:330-386 (handler)
MCP tool handler for 'scrape': extracts parameters from args, calls firecrawl.scrape(), handles success/error and formats response as MCP content.
case 'scrape': { const { url, formats, onlyMainContent, includeTags, excludeTags, waitFor, timeout, } = args as { url: string; formats?: ('markdown' | 'html' | 'rawHtml' | 'links' | 'screenshot')[]; onlyMainContent?: boolean; includeTags?: string[]; excludeTags?: string[]; waitFor?: number; timeout?: number; }; const result = await firecrawl.scrape({ url, formats, onlyMainContent, includeTags, excludeTags, waitFor, timeout, }); if (!result.success) { return { content: [ { type: 'text', text: `Error: ${result.error}`, }, ], isError: true, }; } return { content: [ { type: 'text', text: JSON.stringify( { success: true, data: result.data, }, null, 2 ), }, ], }; }
src/server.ts:136-182 (registration)
Registration of the 'scrape' tool in the TOOLS array provided to ListToolsRequestHandler, defining name, description, and inputSchema.
{ name: 'scrape', description: 'Scrape a single URL and extract content in various formats (markdown, html, links, screenshot). Use this for simple single-page scraping without AI agent capabilities.', inputSchema: { type: 'object', properties: { url: { type: 'string', description: 'The URL to scrape', }, formats: { type: 'array', items: { type: 'string', enum: ['markdown', 'html', 'rawHtml', 'links', 'screenshot'], }, description: 'Output formats to return. Default: ["markdown"]. Can request multiple formats.', }, onlyMainContent: { type: 'boolean', description: 'Extract only main content, removing headers, footers, nav, etc. Default: true', }, includeTags: { type: 'array', items: { type: 'string' }, description: 'HTML tags to include (e.g., ["article", "main"])', }, excludeTags: { type: 'array', items: { type: 'string' }, description: 'HTML tags to exclude (e.g., ["nav", "footer"])', }, waitFor: { type: 'number', description: 'Milliseconds to wait before scraping (for JS rendering)', }, timeout: { type: 'number', description: 'Request timeout in milliseconds', }, }, required: ['url'], }, },
src/types/firecrawl.ts:30-39 (schema)
TypeScript interfaces defining the input (FirecrawlScrapeRequest) and output (FirecrawlScrapeResponse) for the scrape operation, matching the tool inputSchema.
export interface FirecrawlScrapeRequest { url: string; formats?: ('markdown' | 'html' | 'rawHtml' | 'links' | 'screenshot')[]; onlyMainContent?: boolean; includeTags?: string[]; excludeTags?: string[]; headers?: Record<string, string>; waitFor?: number; timeout?: number; }
src/services/firecrawl-client.ts:139-169 (helper)
Core implementation of scrape: HTTP POST to Firecrawl API /v1/scrape endpoint with request params, handles response and errors.
async scrape(request: FirecrawlScrapeRequest): Promise<FirecrawlScrapeResponse> { try { const response = await fetch(`${this.apiBase}/v1/scrape`, { method: 'POST', headers: { 'Content-Type': 'application/json', 'Authorization': `Bearer ${this.apiKey}`, }, body: JSON.stringify(request), }); const data = await response.json() as any; if (!response.ok) { return { success: false, error: data.error || `HTTP ${response.status}: ${response.statusText}`, }; } return { success: true, data: data.data, }; } catch (error) { return { success: false, error: error instanceof Error ? error.message : 'Unknown error', }; } }

Firecrawl Agent MCP Server

Instructions

Input Schema

Implementation Reference

Other Tools

Latest Blog Posts

MCP directory API