firecrawl_extract

Instructions

Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction.

Best for: Extracting specific structured data like prices, names, details from web pages.

Not recommended for: When you need the full content of a page (use scrape); when you're not looking for specific structured data.

Arguments:

urls: Array of URLs to extract information from
prompt: Custom prompt for the LLM extraction
systemPrompt: System prompt to guide the LLM
schema: JSON schema for structured data extraction
allowExternalLinks: Allow extraction from external links
enableWebSearch: Enable web search for additional context
includeSubdomains: Include subdomains in extraction

Prompt Example: "Extract the product name, price, and description from these product pages."

Usage Example:

{
  "name": "firecrawl_extract",
  "arguments": {
    "urls": ["https://example.com/page1", "https://example.com/page2"],
    "prompt": "Extract product information including name, price, and description",
    "systemPrompt": "You are a helpful assistant that extracts product information",
    "schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "price": { "type": "number" },
        "description": { "type": "string" }
      },
      "required": ["name", "price"]
    },
    "allowExternalLinks": false,
    "enableWebSearch": false,
    "includeSubdomains": false
  }
}

Returns: Extracted structured data as defined by your schema.

Input Schema

Name	Required	Description
`urls`	Yes	List of URLs to extract information from
`prompt`	No	Prompt for the LLM extraction
`systemPrompt`	No	System prompt for LLM extraction
`schema`	No	JSON schema for structured data extraction
`allowExternalLinks`	No	Allow extraction from external links
`enableWebSearch`	No	Enable web search for additional context
`includeSubdomains`	No	Include subdomains in extraction
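
The table above can be mirrored as a TypeScript type so that a client catches a missing `urls` field at compile time. This is a minimal sketch: `ExtractToolArguments`, `ToolCall`, and `buildExtractCall` are hypothetical names introduced for illustration and are not part of the Firecrawl MCP server. The empty-list check is an extra assumption of this sketch, not something the table requires.

```typescript
// Hypothetical type mirroring the Input Schema table; only `urls` is required.
interface ExtractToolArguments {
  urls: string[];
  prompt?: string;
  systemPrompt?: string;
  schema?: Record<string, unknown>; // a JSON Schema object
  allowExternalLinks?: boolean;
  enableWebSearch?: boolean;
  includeSubdomains?: boolean;
}

// Hypothetical shape of the tool-call payload shown in the usage example.
interface ToolCall {
  name: "firecrawl_extract";
  arguments: ExtractToolArguments;
}

// Hypothetical helper: wraps the arguments in a payload and
// rejects an empty URL list (an assumption of this sketch).
function buildExtractCall(args: ExtractToolArguments): ToolCall {
  if (args.urls.length === 0) {
    throw new Error("urls must contain at least one URL");
  }
  return { name: "firecrawl_extract", arguments: args };
}

const call = buildExtractCall({
  urls: ["https://example.com/page1"],
  prompt: "Extract the product name and price",
  schema: {
    type: "object",
    properties: { name: { type: "string" }, price: { type: "number" } },
    required: ["name", "price"],
  },
});

console.log(JSON.stringify(call, null, 2));
```

Omitted optional fields are simply left out of the payload, which matches the "No" entries in the Required column.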

Implementation Reference

src/index.ts:1202-1260 (handler)

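The handler for firecrawl_extract validates the arguments with isExtractOptions, calls client.extract through withRetry, and wraps the extracted data as JSON text. The excerpt ends before the handler's return statement and catch block.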

case 'firecrawl_extract': {
  if (!isExtractOptions(args)) {
    throw new Error('Invalid arguments for firecrawl_extract');
  }

  try {
    const extractStartTime = Date.now();

    safeLog(
      'info',
      `Starting extraction for URLs: ${args.urls.join(', ')}`
    );

    // Log if using self-hosted instance
    if (FIRECRAWL_API_URL) {
      safeLog('info', 'Using self-hosted instance for extraction');
    }

    const extractResponse = await withRetry(
      async () =>
        client.extract(args.urls, {
          prompt: args.prompt,
          systemPrompt: args.systemPrompt,
          schema: args.schema,
          allowExternalLinks: args.allowExternalLinks,
          enableWebSearch: args.enableWebSearch,
          includeSubdomains: args.includeSubdomains,
          origin: 'mcp-server',
        } as ExtractParams),
      'extract operation'
    );

    // Type guard for successful response
    if (!('success' in extractResponse) || !extractResponse.success) {
      throw new Error(extractResponse.error || 'Extraction failed');
    }

    const response = extractResponse as ExtractResponse;

    // Log performance metrics
    safeLog(
      'info',
      `Extraction completed in ${Date.now() - extractStartTime}ms`
    );

    // Add warning to response if present
    const result = {
      content: [
        {
          type: 'text',
          text: trimResponseText(JSON.stringify(response.data, null, 2)),
        },
      ],
      isError: false,
    };

    if (response.warning) {
      safeLog('warning', response.warning);
    }
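
The success check and result construction in the excerpt above can be isolated as a small function. This is a sketch under stated assumptions: the `ExtractSuccess` and `ExtractFailure` types are hypothetical and inferred only from the fields the handler reads (`success`, `data`, `error`, `warning`), `toToolResult` is an illustrative name, and the server's `trimResponseText` step is left out because its implementation is not shown.

```typescript
// Hypothetical response shapes, inferred from the fields the handler reads.
type ExtractSuccess = { success: true; data: unknown; warning?: string };
type ExtractFailure = { success: false; error?: string };
type ExtractResult = ExtractSuccess | ExtractFailure;

// Illustrative helper: turn an extract response into an MCP-style text result,
// throwing on failure with the same fallback message the handler uses.
function toToolResult(res: ExtractResult) {
  if (!res.success) {
    throw new Error(res.error || "Extraction failed");
  }
  return {
    content: [{ type: "text", text: JSON.stringify(res.data, null, 2) }],
    isError: false,
  };
}

const ok = toToolResult({
  success: true,
  data: { name: "Widget", price: 9.5 },
});
console.log(ok.content[0].text);
```

Modelling the response as a discriminated union lets the compiler narrow on `success`, so the cast to `ExtractResponse` seen in the handler is not needed in this sketch.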

src/index.ts:511-535 (schema)

The tool definition (schema) for firecrawl_extract.

const EXTRACT_TOOL: Tool = {
  name: 'firecrawl_extract',
  description: `
Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction.

**Best for:** Extracting specific structured data like prices, names, details from web pages.
**Not recommended for:** When you need the full content of a page (use scrape); when you're not looking for specific structured data.
**Arguments:**
- urls: Array of URLs to extract information from
- prompt: Custom prompt for the LLM extraction
- systemPrompt: System prompt to guide the LLM
- schema: JSON schema for structured data extraction
- allowExternalLinks: Allow extraction from external links
- enableWebSearch: Enable web search for additional context
- includeSubdomains: Include subdomains in extraction
**Prompt Example:** "Extract the product name, price, and description from these product pages."
**Usage Example:**
\`\`\`json
{
  "name": "firecrawl_extract",
  "arguments": {
    "urls": ["https://example.com/page1", "https://example.com/page2"],
    "prompt": "Extract product information including name, price, and description",
    "systemPrompt": "You are a helpful assistant that extracts product information",
    "schema": {

src/index.ts:830-837 (helper)

Type guard that validates the arguments for firecrawl_extract. It checks only that urls is an array of strings; the optional fields are not validated.

function isExtractOptions(args: unknown): args is ExtractArgs {
  if (typeof args !== 'object' || args === null) return false;
  const { urls } = args as { urls?: unknown };
  return (
    Array.isArray(urls) &&
    urls.every((url): url is string => typeof url === 'string')
  );
}
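
To see what the guard accepts and rejects, it can be run against a few inputs. The function body below is copied from the excerpt; the `ExtractArgs` interface is a hypothetical stand-in, since the server's own definition is not shown here.

```typescript
// Hypothetical stand-in for the server's ExtractArgs type.
interface ExtractArgs {
  urls: string[];
  prompt?: string;
  systemPrompt?: string;
  schema?: object;
  allowExternalLinks?: boolean;
  enableWebSearch?: boolean;
  includeSubdomains?: boolean;
}

// Copied from the excerpt: passes when `urls` is an array of strings.
function isExtractOptions(args: unknown): args is ExtractArgs {
  if (typeof args !== 'object' || args === null) return false;
  const { urls } = args as { urls?: unknown };
  return (
    Array.isArray(urls) &&
    urls.every((url): url is string => typeof url === 'string')
  );
}

console.log(isExtractOptions({ urls: ['https://example.com'] })); // true
console.log(isExtractOptions({ urls: [] })); // true: every() holds on an empty array
console.log(isExtractOptions({ urls: 'https://example.com' })); // false: not an array
console.log(isExtractOptions({ urls: ['a'], prompt: 42 })); // true: prompt is not checked
console.log(isExtractOptions(null)); // false
```

Because an empty array and a non-string `prompt` both pass, a caller that needs stricter validation would have to add its own checks before invoking the tool.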
