firecrawl_scrape

Extract content from a single URL, returning formats like markdown, HTML, or structured data, with options for actions, tags, and dynamic content handling.

Instructions

Scrape content from a single URL with advanced options.

Best for: Single page content extraction, when you know exactly which page contains the information. Not recommended for: Multiple pages (use batch_scrape), unknown page (use search), structured data (use extract). Common mistakes: Using scrape for a list of URLs (use batch_scrape instead). Prompt Example: "Get the content of the page at https://example.com." Usage Example:

{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com",
    "formats": ["markdown"]
  }
}

Returns: Markdown, HTML, or other formats as specified.

Input Schema

TableJSON Schema

Name	Required	Description
`url`	Yes	The URL to scrape
`formats`	No	Content formats to extract (default: ['markdown'])
`onlyMainContent`	No	Extract only the main content, filtering out navigation, footers, etc.
`includeTags`	No	HTML tags to specifically include in extraction
`excludeTags`	No	HTML tags to exclude from extraction
`waitFor`	No	Time in milliseconds to wait for dynamic content to load
`timeout`	No	Maximum time in milliseconds to wait for the page to load
`actions`	No	List of actions to perform before scraping
`extract`	No	Configuration for structured data extraction
`mobile`	No	Use mobile viewport
`skipTlsVerification`	No	Skip TLS certificate verification
`removeBase64Images`	No	Remove base64 encoded images from output
`location`	No	Location settings for scraping

Implementation Reference

src/index.ts:995-1074 (handler)

Handler for the firecrawl_scrape tool. Validates args, calls FirecrawlApp.scrapeUrl(), formats content based on requested formats (markdown, html, rawHtml, links, screenshot, extract), and returns the result.

case 'firecrawl_scrape': {
  if (!isScrapeOptions(args)) {
    throw new Error('Invalid arguments for firecrawl_scrape');
  }
  const { url, ...options } = args;
  try {
    const scrapeStartTime = Date.now();
    safeLog(
      'info',
      `Starting scrape for URL: ${url} with options: ${JSON.stringify(options)}`
    );

    const response = await client.scrapeUrl(url, {
      ...options,
      // @ts-expect-error Extended API options including origin
      origin: 'mcp-server',
    });

    // Log performance metrics
    safeLog(
      'info',
      `Scrape completed in ${Date.now() - scrapeStartTime}ms`
    );

    if ('success' in response && !response.success) {
      throw new Error(response.error || 'Scraping failed');
    }

    // Format content based on requested formats
    const contentParts = [];

    if (options.formats?.includes('markdown') && response.markdown) {
      contentParts.push(response.markdown);
    }
    if (options.formats?.includes('html') && response.html) {
      contentParts.push(response.html);
    }
    if (options.formats?.includes('rawHtml') && response.rawHtml) {
      contentParts.push(response.rawHtml);
    }
    if (options.formats?.includes('links') && response.links) {
      contentParts.push(response.links.join('\n'));
    }
    if (options.formats?.includes('screenshot') && response.screenshot) {
      contentParts.push(response.screenshot);
    }
    if (options.formats?.includes('extract') && response.extract) {
      contentParts.push(JSON.stringify(response.extract, null, 2));
    }

    // If options.formats is empty, default to markdown
    if (!options.formats || options.formats.length === 0) {
      options.formats = ['markdown'];
    }

    // Add warning to response if present
    if (response.warning) {
      safeLog('warning', response.warning);
    }

    return {
      content: [
        {
          type: 'text',
          text: trimResponseText(
            contentParts.join('\n\n') || 'No content available'
          ),
        },
      ],
      isError: false,
    };
  } catch (error) {
    const errorMessage =
      error instanceof Error ? error.message : String(error);
    return {
      content: [{ type: 'text', text: trimResponseText(errorMessage) }],
      isError: true,
    };
  }
}

src/index.ts:24-194 (schema)

Tool definition with inputSchema for firecrawl_scrape, specifying url (required), formats, onlyMainContent, includeTags, excludeTags, waitFor, timeout, actions, extract, mobile, skipTlsVerification, removeBase64Images, and location.

const SCRAPE_TOOL: Tool = {
  name: 'firecrawl_scrape',
  description: `
Scrape content from a single URL with advanced options.

**Best for:** Single page content extraction, when you know exactly which page contains the information.
**Not recommended for:** Multiple pages (use batch_scrape), unknown page (use search), structured data (use extract).
**Common mistakes:** Using scrape for a list of URLs (use batch_scrape instead).
**Prompt Example:** "Get the content of the page at https://example.com."
**Usage Example:**
\`\`\`json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com",
    "formats": ["markdown"]
  }
}
\`\`\`
**Returns:** Markdown, HTML, or other formats as specified.
`,
  inputSchema: {
    type: 'object',
    properties: {
      url: {
        type: 'string',
        description: 'The URL to scrape',
      },
      formats: {
        type: 'array',
        items: {
          type: 'string',
          enum: [
            'markdown',
            'html',
            'rawHtml',
            'screenshot',
            'links',
            'screenshot@fullPage',
            'extract',
          ],
        },
        default: ['markdown'],
        description: "Content formats to extract (default: ['markdown'])",
      },
      onlyMainContent: {
        type: 'boolean',
        description:
          'Extract only the main content, filtering out navigation, footers, etc.',
      },
      includeTags: {
        type: 'array',
        items: { type: 'string' },
        description: 'HTML tags to specifically include in extraction',
      },
      excludeTags: {
        type: 'array',
        items: { type: 'string' },
        description: 'HTML tags to exclude from extraction',
      },
      waitFor: {
        type: 'number',
        description: 'Time in milliseconds to wait for dynamic content to load',
      },
      timeout: {
        type: 'number',
        description:
          'Maximum time in milliseconds to wait for the page to load',
      },
      actions: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            type: {
              type: 'string',
              enum: [
                'wait',
                'click',
                'screenshot',
                'write',
                'press',
                'scroll',
                'scrape',
                'executeJavascript',
              ],
              description: 'Type of action to perform',
            },
            selector: {
              type: 'string',
              description: 'CSS selector for the target element',
            },
            milliseconds: {
              type: 'number',
              description: 'Time to wait in milliseconds (for wait action)',
            },
            text: {
              type: 'string',
              description: 'Text to write (for write action)',
            },
            key: {
              type: 'string',
              description: 'Key to press (for press action)',
            },
            direction: {
              type: 'string',
              enum: ['up', 'down'],
              description: 'Scroll direction',
            },
            script: {
              type: 'string',
              description: 'JavaScript code to execute',
            },
            fullPage: {
              type: 'boolean',
              description: 'Take full page screenshot',
            },
          },
          required: ['type'],
        },
        description: 'List of actions to perform before scraping',
      },
      extract: {
        type: 'object',
        properties: {
          schema: {
            type: 'object',
            description: 'Schema for structured data extraction',
          },
          systemPrompt: {
            type: 'string',
            description: 'System prompt for LLM extraction',
          },
          prompt: {
            type: 'string',
            description: 'User prompt for LLM extraction',
          },
        },
        description: 'Configuration for structured data extraction',
      },
      mobile: {
        type: 'boolean',
        description: 'Use mobile viewport',
      },
      skipTlsVerification: {
        type: 'boolean',
        description: 'Skip TLS certificate verification',
      },
      removeBase64Images: {
        type: 'boolean',
        description: 'Remove base64 encoded images from output',
      },
      location: {
        type: 'object',
        properties: {
          country: {
            type: 'string',
            description: 'Country code for geolocation',
          },
          languages: {
            type: 'array',
            items: { type: 'string' },
            description: 'Language codes for content',
          },
        },
        description: 'Location settings for scraping',
      },
    },
    required: ['url'],
  },
};

src/index.ts:955-966 (registration)

Registration of SCRAPE_TOOL (with name 'firecrawl_scrape') in the ListToolsRequestSchema handler, exposing it as an available tool via MCP.

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    SCRAPE_TOOL,
    MAP_TOOL,
    CRAWL_TOOL,
    CHECK_CRAWL_STATUS_TOOL,
    SEARCH_TOOL,
    EXTRACT_TOOL,
    DEEP_RESEARCH_TOOL,
    GENERATE_LLMSTXT_TOOL,
  ],
}));

src/index.ts:776-785 (helper)

Type guard isScrapeOptions() that validates arguments have a url string field before passing to the scrape handler.

function isScrapeOptions(
  args: unknown
): args is ScrapeParams & { url: string } {
  return (
    typeof args === 'object' &&
    args !== null &&
    'url' in args &&
    typeof (args as { url: unknown }).url === 'string'
  );
}

src/index.ts:1463-1465 (helper)
Utility function trimResponseText() used by the handler to trim whitespace from response content.
```
function trimResponseText(text: string): string {
  return text.trim();
}
```

Firecrawl MCP Server

firecrawl_scrape

Instructions

Input Schema

Implementation Reference

Tool Definition Quality

Other Tools

Latest Blog Posts

MCP directory API