
firecrawl_crawl

Crawl multiple web pages from a starting URL with depth control, path filtering, and webhook notifications for asynchronous data collection.

Instructions

Start an asynchronous crawl of multiple pages from a starting URL. Supports depth control, path filtering, and webhook notifications.

Input Schema

| Name | Required | Description |
|------|----------|-------------|
| url | Yes | Starting URL for the crawl |
| excludePaths | No | URL paths to exclude from crawling |
| includePaths | No | Only crawl these URL paths |
| maxDepth | No | Maximum link depth to crawl |
| ignoreSitemap | No | Skip sitemap.xml discovery |
| limit | No | Maximum number of pages to crawl |
| allowBackwardLinks | No | Allow crawling links that point to parent directories |
| allowExternalLinks | No | Allow crawling links to external domains |
| webhook | No | Webhook URL (string), or an object with `url` and optional `headers`, notified when the crawl completes |
| deduplicateSimilarURLs | No | Remove similar URLs during crawl |
| ignoreQueryParameters | No | Ignore query parameters when comparing URLs |
| scrapeOptions | No | Options for scraping each page |
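For orientation, here is a hypothetical argument object a client might pass to firecrawl_crawl. Every value below (the URL, paths, limits, and webhook endpoint) is an illustrative assumption, not a default from the schema:

```typescript
// Hypothetical firecrawl_crawl arguments; all values are illustrative.
const crawlArgs = {
  url: 'https://example.com/docs',      // required: where the crawl starts
  includePaths: ['/docs'],              // only follow documentation paths
  excludePaths: ['/docs/archive'],      // skip archived pages
  maxDepth: 2,                          // follow links at most two hops deep
  limit: 50,                            // stop after 50 pages
  ignoreQueryParameters: true,          // URLs differing only in ?query count as one
  scrapeOptions: {
    formats: ['markdown'],              // request markdown output per page
    onlyMainContent: true,              // drop nav, headers, and footers
  },
  webhook: 'https://example.com/hooks/crawl-done', // optional completion callback
};
```

Because the crawl is asynchronous, the call returns a job ID immediately rather than page content; results are retrieved later, for example via the crawl-status tool registered alongside this one (CHECK_CRAWL_STATUS_TOOL in the registration below).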

Implementation Reference

  • Defines the Tool object including name, description, and detailed input schema for the firecrawl_crawl tool.
```typescript
const CRAWL_TOOL: Tool = {
  name: 'firecrawl_crawl',
  description:
    'Start an asynchronous crawl of multiple pages from a starting URL. ' +
    'Supports depth control, path filtering, and webhook notifications.',
  inputSchema: {
    type: 'object',
    properties: {
      url: {
        type: 'string',
        description: 'Starting URL for the crawl',
      },
      excludePaths: {
        type: 'array',
        items: { type: 'string' },
        description: 'URL paths to exclude from crawling',
      },
      includePaths: {
        type: 'array',
        items: { type: 'string' },
        description: 'Only crawl these URL paths',
      },
      maxDepth: {
        type: 'number',
        description: 'Maximum link depth to crawl',
      },
      ignoreSitemap: {
        type: 'boolean',
        description: 'Skip sitemap.xml discovery',
      },
      limit: {
        type: 'number',
        description: 'Maximum number of pages to crawl',
      },
      allowBackwardLinks: {
        type: 'boolean',
        description: 'Allow crawling links that point to parent directories',
      },
      allowExternalLinks: {
        type: 'boolean',
        description: 'Allow crawling links to external domains',
      },
      webhook: {
        oneOf: [
          {
            type: 'string',
            description: 'Webhook URL to notify when crawl is complete',
          },
          {
            type: 'object',
            properties: {
              url: {
                type: 'string',
                description: 'Webhook URL',
              },
              headers: {
                type: 'object',
                description: 'Custom headers for webhook requests',
              },
            },
            required: ['url'],
          },
        ],
      },
      deduplicateSimilarURLs: {
        type: 'boolean',
        description: 'Remove similar URLs during crawl',
      },
      ignoreQueryParameters: {
        type: 'boolean',
        description: 'Ignore query parameters when comparing URLs',
      },
      scrapeOptions: {
        type: 'object',
        properties: {
          formats: {
            type: 'array',
            items: {
              type: 'string',
              enum: [
                'markdown',
                'html',
                'rawHtml',
                'screenshot',
                'links',
                'screenshot@fullPage',
                'extract',
              ],
            },
          },
          onlyMainContent: { type: 'boolean' },
          includeTags: { type: 'array', items: { type: 'string' } },
          excludeTags: { type: 'array', items: { type: 'string' } },
          waitFor: { type: 'number' },
        },
        description: 'Options for scraping each page',
      },
    },
    required: ['url'],
  },
};
```
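Note the oneOf on webhook in the schema above: it accepts either a bare URL string or an object carrying custom headers (for example, an auth token). Both values below are illustrative, not taken from the source:

```typescript
// webhook as a plain string: notify this URL when the crawl completes.
const webhookAsString = 'https://example.com/hooks/crawl-done';

// webhook as an object: same URL plus custom request headers;
// only `url` is required by the schema.
const webhookAsObject = {
  url: 'https://example.com/hooks/crawl-done',
  headers: { Authorization: 'Bearer <token>' },
};
```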
  • src/index.ts:961-973 (registration)
    Registers the CRAWL_TOOL (firecrawl_crawl) in the list of available tools served by the MCP server.
```typescript
tools: [
  SCRAPE_TOOL,
  MAP_TOOL,
  CRAWL_TOOL,
  BATCH_SCRAPE_TOOL,
  CHECK_BATCH_STATUS_TOOL,
  CHECK_CRAWL_STATUS_TOOL,
  SEARCH_TOOL,
  EXTRACT_TOOL,
  DEEP_RESEARCH_TOOL,
  GENERATE_LLMSTXT_TOOL,
],
}));
```
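The trailing `}));` indicates this list is the body of a request handler. A minimal sketch of the likely surrounding code, assuming the standard MCP SDK ListTools pattern (the actual wiring in src/index.ts may differ):

```typescript
import { ListToolsRequestSchema } from '@modelcontextprotocol/sdk/types.js';

// Sketch: advertise every tool, including CRAWL_TOOL, in response to a
// tools/list request. `server` is the MCP Server instance created
// elsewhere in src/index.ts.
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [SCRAPE_TOOL, MAP_TOOL, CRAWL_TOOL /* ...remaining tools */],
}));
```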
  • Handler implementation that validates arguments, initiates asynchronous crawl using Firecrawl client, handles credits and errors, and returns the crawl job ID.
```typescript
case 'firecrawl_crawl': {
  if (!isCrawlOptions(args)) {
    throw new Error('Invalid arguments for firecrawl_crawl');
  }
  const { url, ...options } = args;
  const response = await withRetry(
    async () =>
      // @ts-expect-error Extended API options including origin
      client.asyncCrawlUrl(url, { ...options, origin: 'mcp-server' }),
    'crawl operation'
  );
  if (!response.success) {
    throw new Error(response.error);
  }
  // Monitor credits for cloud API
  if (!FIRECRAWL_API_URL && hasCredits(response)) {
    await updateCreditUsage(response.creditsUsed);
  }
  return {
    content: [
      {
        type: 'text',
        text: trimResponseText(
          `Started crawl for ${url} with job ID: ${response.id}`
        ),
      },
    ],
    isError: false,
  };
}
```
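`withRetry`, `hasCredits`, `updateCreditUsage`, and `trimResponseText` are helpers defined elsewhere in src/index.ts and are not shown on this page. As a rough sketch of what `withRetry` plausibly does (an assumption about its shape, not the project's actual implementation), a generic exponential-backoff wrapper looks like this:

```typescript
// Hypothetical retry helper: re-attempts a failing async operation with
// exponential backoff. The real helper in src/index.ts may differ.
async function withRetry<T>(
  operation: () => Promise<T>,
  label: string,
  maxAttempts = 3,
  baseDelayMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Wait 1s, 2s, 4s, ... before the next attempt.
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
      }
    }
  }
  throw new Error(`${label} failed after ${maxAttempts} attempts: ${lastError}`);
}
```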
  • Type guard helper to validate that arguments match CrawlParams with required 'url' string.
```typescript
function isCrawlOptions(args: unknown): args is CrawlParams & { url: string } {
  return (
    typeof args === 'object' &&
    args !== null &&
    'url' in args &&
    typeof (args as { url: unknown }).url === 'string'
  );
}
```
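The guard checks only that `url` is a string, leaving validation of the remaining optional fields to the Firecrawl API. A small usage sketch, with illustrative sample arguments:

```typescript
// `args` arrives untyped from the MCP request; the guard narrows it
// before `url` is destructured. The object literal is illustrative.
const args: unknown = { url: 'https://example.com', maxDepth: 2 };

if (isCrawlOptions(args)) {
  // Inside this branch, TypeScript knows args.url is a string.
  const { url, ...options } = args;
  console.log(`would crawl ${url} with`, options);
}
```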
