firecrawl_scrape

Extract webpage content in multiple formats like markdown or HTML, execute actions before scraping, and filter specific elements for precise data collection.

Instructions

Scrape a single webpage with advanced options for content extraction. Supports various formats including markdown, HTML, and screenshots. Can execute custom actions like clicking or scrolling before scraping.

Input Schema

TableJSON Schema

Name	Required	Description
`url`	Yes	The URL to scrape
`formats`	No	Content formats to extract (default: ['markdown'])
`onlyMainContent`	No	Extract only the main content, filtering out navigation, footers, etc.
`includeTags`	No	HTML tags to specifically include in extraction
`excludeTags`	No	HTML tags to exclude from extraction
`waitFor`	No	Time in milliseconds to wait for dynamic content to load
`timeout`	No	Maximum time in milliseconds to wait for the page to load
`actions`	No	List of actions to perform before scraping
`extract`	No	Configuration for structured data extraction
`mobile`	No	Use mobile viewport
`skipTlsVerification`	No	Skip TLS certificate verification
`removeBase64Images`	No	Remove base64 encoded images from output
`location`	No	Location settings for scraping

Implementation Reference

src/index.ts:991-1070 (handler)

The primary handler for the 'firecrawl_scrape' tool within the CallToolRequestSchema switch statement. Validates input using isScrapeOptions, invokes the Firecrawl client's scrapeUrl method with user-provided options, processes the response by extracting and formatting content according to specified formats (markdown, html, etc.), handles errors, logs performance metrics, and returns the formatted content.

case 'firecrawl_scrape': {
  if (!isScrapeOptions(args)) {
    throw new Error('Invalid arguments for firecrawl_scrape');
  }
  const { url, ...options } = args;
  try {
    const scrapeStartTime = Date.now();
    safeLog(
      'info',
      `Starting scrape for URL: ${url} with options: ${JSON.stringify(options)}`
    );

    const response = await client.scrapeUrl(url, {
      ...options,
      // @ts-expect-error Extended API options including origin
      origin: 'mcp-server',
    });

    // Log performance metrics
    safeLog(
      'info',
      `Scrape completed in ${Date.now() - scrapeStartTime}ms`
    );

    if ('success' in response && !response.success) {
      throw new Error(response.error || 'Scraping failed');
    }

    // Format content based on requested formats
    const contentParts = [];

    if (options.formats?.includes('markdown') && response.markdown) {
      contentParts.push(response.markdown);
    }
    if (options.formats?.includes('html') && response.html) {
      contentParts.push(response.html);
    }
    if (options.formats?.includes('rawHtml') && response.rawHtml) {
      contentParts.push(response.rawHtml);
    }
    if (options.formats?.includes('links') && response.links) {
      contentParts.push(response.links.join('\n'));
    }
    if (options.formats?.includes('screenshot') && response.screenshot) {
      contentParts.push(response.screenshot);
    }
    if (options.formats?.includes('extract') && response.extract) {
      contentParts.push(JSON.stringify(response.extract, null, 2));
    }

    // If options.formats is empty, default to markdown
    if (!options.formats || options.formats.length === 0) {
      options.formats = ['markdown'];
    }

    // Add warning to response if present
    if (response.warning) {
      safeLog('warning', response.warning);
    }

    return {
      content: [
        {
          type: 'text',
          text: trimResponseText(
            contentParts.join('\n\n') || 'No content available'
          ),
        },
      ],
      isError: false,
    };
  } catch (error) {
    const errorMessage =
      error instanceof Error ? error.message : String(error);
    return {
      content: [{ type: 'text', text: trimResponseText(errorMessage) }],
      isError: true,
    };
  }
}

src/index.ts:23-178 (schema)

Tool definition including name, detailed description, and comprehensive inputSchema specifying parameters like url (required), formats, actions, extraction schema, etc., for validating and documenting the firecrawl_scrape tool inputs.

const SCRAPE_TOOL: Tool = {
  name: 'firecrawl_scrape',
  description:
    'Scrape a single webpage with advanced options for content extraction. ' +
    'Supports various formats including markdown, HTML, and screenshots. ' +
    'Can execute custom actions like clicking or scrolling before scraping.',
  inputSchema: {
    type: 'object',
    properties: {
      url: {
        type: 'string',
        description: 'The URL to scrape',
      },
      formats: {
        type: 'array',
        items: {
          type: 'string',
          enum: [
            'markdown',
            'html',
            'rawHtml',
            'screenshot',
            'links',
            'screenshot@fullPage',
            'extract',
          ],
        },
        default: ['markdown'],
        description: "Content formats to extract (default: ['markdown'])",
      },
      onlyMainContent: {
        type: 'boolean',
        description:
          'Extract only the main content, filtering out navigation, footers, etc.',
      },
      includeTags: {
        type: 'array',
        items: { type: 'string' },
        description: 'HTML tags to specifically include in extraction',
      },
      excludeTags: {
        type: 'array',
        items: { type: 'string' },
        description: 'HTML tags to exclude from extraction',
      },
      waitFor: {
        type: 'number',
        description: 'Time in milliseconds to wait for dynamic content to load',
      },
      timeout: {
        type: 'number',
        description:
          'Maximum time in milliseconds to wait for the page to load',
      },
      actions: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            type: {
              type: 'string',
              enum: [
                'wait',
                'click',
                'screenshot',
                'write',
                'press',
                'scroll',
                'scrape',
                'executeJavascript',
              ],
              description: 'Type of action to perform',
            },
            selector: {
              type: 'string',
              description: 'CSS selector for the target element',
            },
            milliseconds: {
              type: 'number',
              description: 'Time to wait in milliseconds (for wait action)',
            },
            text: {
              type: 'string',
              description: 'Text to write (for write action)',
            },
            key: {
              type: 'string',
              description: 'Key to press (for press action)',
            },
            direction: {
              type: 'string',
              enum: ['up', 'down'],
              description: 'Scroll direction',
            },
            script: {
              type: 'string',
              description: 'JavaScript code to execute',
            },
            fullPage: {
              type: 'boolean',
              description: 'Take full page screenshot',
            },
          },
          required: ['type'],
        },
        description: 'List of actions to perform before scraping',
      },
      extract: {
        type: 'object',
        properties: {
          schema: {
            type: 'object',
            description: 'Schema for structured data extraction',
          },
          systemPrompt: {
            type: 'string',
            description: 'System prompt for LLM extraction',
          },
          prompt: {
            type: 'string',
            description: 'User prompt for LLM extraction',
          },
        },
        description: 'Configuration for structured data extraction',
      },
      mobile: {
        type: 'boolean',
        description: 'Use mobile viewport',
      },
      skipTlsVerification: {
        type: 'boolean',
        description: 'Skip TLS certificate verification',
      },
      removeBase64Images: {
        type: 'boolean',
        description: 'Remove base64 encoded images from output',
      },
      location: {
        type: 'object',
        properties: {
          country: {
            type: 'string',
            description: 'Country code for geolocation',
          },
          languages: {
            type: 'array',
            items: { type: 'string' },
            description: 'Language codes for content',
          },
        },
        description: 'Location settings for scraping',
      },
    },
    required: ['url'],
  },
};

src/index.ts:960-973 (registration)

Registration of the firecrawl_scrape tool (as SCRAPE_TOOL) in the list of available tools returned by the ListToolsRequestHandler.

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    SCRAPE_TOOL,
    MAP_TOOL,
    CRAWL_TOOL,
    BATCH_SCRAPE_TOOL,
    CHECK_BATCH_STATUS_TOOL,
    CHECK_CRAWL_STATUS_TOOL,
    SEARCH_TOOL,
    EXTRACT_TOOL,
    DEEP_RESEARCH_TOOL,
    GENERATE_LLMSTXT_TOOL,
  ],
}));

src/index.ts:681-690 (helper)

Type guard function used in the handler to validate that the arguments match ScrapeParams with a required 'url' string, ensuring input validity before calling the Firecrawl API.

function isScrapeOptions(
  args: unknown
): args is ScrapeParams & { url: string } {
  return (
    typeof args === 'object' &&
    args !== null &&
    'url' in args &&
    typeof (args as { url: unknown }).url === 'string'
  );
}

Firecrawl MCP Server