Firecrawl MCP Server

by mcma123

firecrawl_scrape

Extract content from webpages in multiple formats, execute custom actions like clicking or scrolling, and filter specific elements for targeted data collection.

Instructions

Scrape a single webpage with advanced options for content extraction. Supports various formats including markdown, HTML, and screenshots. Can execute custom actions like clicking or scrolling before scraping.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | The URL to scrape | |
| formats | No | Content formats to extract (default: ['markdown']) | |
| onlyMainContent | No | Extract only the main content, filtering out navigation, footers, etc. | |
| includeTags | No | HTML tags to specifically include in extraction | |
| excludeTags | No | HTML tags to exclude from extraction | |
| waitFor | No | Time in milliseconds to wait for dynamic content to load | |
| timeout | No | Maximum time in milliseconds to wait for the page to load | |
| actions | No | List of actions to perform before scraping | |
| extract | No | Configuration for structured data extraction | |
| mobile | No | Use mobile viewport | |
| skipTlsVerification | No | Skip TLS certificate verification | |
| removeBase64Images | No | Remove base64 encoded images from output | |
| location | No | Location settings for scraping | |
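
As a rough illustration of the parameters listed above, the sketch below builds an arguments object for a firecrawl_scrape call. The `ScrapeArgs` and `ScrapeAction` interfaces are hypothetical, trimmed-down types written for this example (the server does not export them), and the URL and selector values are placeholders.

```typescript
// Hypothetical, trimmed-down argument types for illustration only; the field
// names and enum values are copied from the input schema on this page.
type ScrapeFormat =
  | 'markdown' | 'html' | 'rawHtml' | 'screenshot'
  | 'links' | 'screenshot@fullPage' | 'extract';

interface ScrapeAction {
  type:
    | 'wait' | 'click' | 'screenshot' | 'write'
    | 'press' | 'scroll' | 'scrape' | 'executeJavascript';
  selector?: string;
  milliseconds?: number;
  text?: string;
  key?: string;
  direction?: 'up' | 'down';
  script?: string;
  fullPage?: boolean;
}

interface ScrapeArgs {
  url: string;            // the only required field
  formats?: ScrapeFormat[];
  onlyMainContent?: boolean;
  excludeTags?: string[];
  waitFor?: number;       // milliseconds
  timeout?: number;       // milliseconds
  actions?: ScrapeAction[];
}

// Placeholder URL and selector; replace with real values.
const exampleArgs: ScrapeArgs = {
  url: 'https://example.com/articles',
  formats: ['markdown', 'links'],
  onlyMainContent: true,
  excludeTags: ['nav', 'footer'],
  waitFor: 1000,
  timeout: 30000,
  actions: [
    { type: 'click', selector: '#load-more' },
    { type: 'wait', milliseconds: 500 },
    { type: 'scroll', direction: 'down' },
  ],
};

// An MCP tools/call request carries the tool name alongside its arguments.
const toolCall = { name: 'firecrawl_scrape', arguments: exampleArgs };
```

Listing `formats` explicitly, as done here, avoids relying on whatever default applies when the field is omitted.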

Implementation Reference

  • The switch case handler for the 'firecrawl_scrape' tool. Validates input using isScrapeOptions, calls the Firecrawl client's scrapeUrl method with URL and options, handles response formatting based on requested formats (markdown, html, etc.), logs performance and warnings, and returns formatted content or error.
    case 'firecrawl_scrape': {
      if (!isScrapeOptions(args)) {
        throw new Error('Invalid arguments for firecrawl_scrape');
      }
      const { url, ...options } = args;
      try {
        const scrapeStartTime = Date.now();
        server.sendLoggingMessage({
          level: 'info',
          data: `Starting scrape for URL: ${url} with options: ${JSON.stringify(
            options
          )}`,
        });
    
        const response = await client.scrapeUrl(url, options);
    
        // Log performance metrics
        server.sendLoggingMessage({
          level: 'info',
          data: `Scrape completed in ${Date.now() - scrapeStartTime}ms`,
        });
    
        if ('success' in response && !response.success) {
          throw new Error(response.error || 'Scraping failed');
        }
    
        
        // Format content based on requested formats
        const contentParts = [];
        
        if (options.formats?.includes('markdown') && response.markdown) {
          contentParts.push(response.markdown);
        }
        if (options.formats?.includes('html') && response.html) {
          contentParts.push(response.html); 
        }
        if (options.formats?.includes('rawHtml') && response.rawHtml) {
          contentParts.push(response.rawHtml);
        }
        if (options.formats?.includes('links') && response.links) {
          contentParts.push(response.links.join('\n'));
        }
        if (options.formats?.includes('screenshot') && response.screenshot) {
          contentParts.push(response.screenshot);
        }
        if (options.formats?.includes('extract') && response.extract) {
          contentParts.push(JSON.stringify(response.extract, null, 2));
        }
    
        // Log the warning if present
        if (response.warning) {
          server.sendLoggingMessage({
            level: 'warning', 
            data: response.warning,
          });
        }
    
        return {
          content: [
            { type: 'text', text: contentParts.join('\n\n') || 'No content available' },
          ],
          isError: false,
        };
      } catch (error) {
        const errorMessage =
          error instanceof Error ? error.message : String(error);
        return {
          content: [{ type: 'text', text: errorMessage }],
          isError: true,
        };
      }
    }
  • The Tool object definition for 'firecrawl_scrape', including name, description, and comprehensive inputSchema defining parameters like url, formats, actions, extract config, etc., used for input validation.
    const SCRAPE_TOOL: Tool = {
      name: 'firecrawl_scrape',
      description:
        'Scrape a single webpage with advanced options for content extraction. ' +
        'Supports various formats including markdown, HTML, and screenshots. ' +
        'Can execute custom actions like clicking or scrolling before scraping.',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'The URL to scrape',
          },
          formats: {
            type: 'array',
            items: {
              type: 'string',
              enum: [
                'markdown',
                'html',
                'rawHtml',
                'screenshot',
                'links',
                'screenshot@fullPage',
                'extract',
              ],
            },
            description: "Content formats to extract (default: ['markdown'])",
          },
          onlyMainContent: {
            type: 'boolean',
            description:
              'Extract only the main content, filtering out navigation, footers, etc.',
          },
          includeTags: {
            type: 'array',
            items: { type: 'string' },
            description: 'HTML tags to specifically include in extraction',
          },
          excludeTags: {
            type: 'array',
            items: { type: 'string' },
            description: 'HTML tags to exclude from extraction',
          },
          waitFor: {
            type: 'number',
            description: 'Time in milliseconds to wait for dynamic content to load',
          },
          timeout: {
            type: 'number',
            description:
              'Maximum time in milliseconds to wait for the page to load',
          },
          actions: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                type: {
                  type: 'string',
                  enum: [
                    'wait',
                    'click',
                    'screenshot',
                    'write',
                    'press',
                    'scroll',
                    'scrape',
                    'executeJavascript',
                  ],
                  description: 'Type of action to perform',
                },
                selector: {
                  type: 'string',
                  description: 'CSS selector for the target element',
                },
                milliseconds: {
                  type: 'number',
                  description: 'Time to wait in milliseconds (for wait action)',
                },
                text: {
                  type: 'string',
                  description: 'Text to write (for write action)',
                },
                key: {
                  type: 'string',
                  description: 'Key to press (for press action)',
                },
                direction: {
                  type: 'string',
                  enum: ['up', 'down'],
                  description: 'Scroll direction',
                },
                script: {
                  type: 'string',
                  description: 'JavaScript code to execute',
                },
                fullPage: {
                  type: 'boolean',
                  description: 'Take full page screenshot',
                },
              },
              required: ['type'],
            },
            description: 'List of actions to perform before scraping',
          },
          extract: {
            type: 'object',
            properties: {
              schema: {
                type: 'object',
                description: 'Schema for structured data extraction',
              },
              systemPrompt: {
                type: 'string',
                description: 'System prompt for LLM extraction',
              },
              prompt: {
                type: 'string',
                description: 'User prompt for LLM extraction',
              },
            },
            description: 'Configuration for structured data extraction',
          },
          mobile: {
            type: 'boolean',
            description: 'Use mobile viewport',
          },
          skipTlsVerification: {
            type: 'boolean',
            description: 'Skip TLS certificate verification',
          },
          removeBase64Images: {
            type: 'boolean',
            description: 'Remove base64 encoded images from output',
          },
          location: {
            type: 'object',
            properties: {
              country: {
                type: 'string',
                description: 'Country code for geolocation',
              },
              languages: {
                type: 'array',
                items: { type: 'string' },
                description: 'Language codes for content',
              },
            },
            description: 'Location settings for scraping',
          },
        },
        required: ['url'],
      },
    };
  • src/index.ts:862-874 (registration)
    Registration of the 'firecrawl_scrape' tool (as SCRAPE_TOOL) in the ListToolsRequestSchema request handler, which makes it discoverable through the MCP tools/list request.
    server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: [
        SCRAPE_TOOL,
        MAP_TOOL,
        CRAWL_TOOL,
        BATCH_SCRAPE_TOOL,
        CHECK_BATCH_STATUS_TOOL,
        CHECK_CRAWL_STATUS_TOOL,
        SEARCH_TOOL,
        EXTRACT_TOOL,
        DEEP_RESEARCH_TOOL,
      ],
    }));
  • Type guard helper function 'isScrapeOptions' used in the handler. At runtime it only checks that the arguments are a non-null object with a string 'url'; conformance to ScrapeParams is asserted at the type level and not verified.
    function isScrapeOptions(
      args: unknown
    ): args is ScrapeParams & { url: string } {
      return (
        typeof args === 'object' &&
        args !== null &&
        'url' in args &&
        typeof (args as { url: unknown }).url === 'string'
      );
    }
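
To make the scope of the type guard concrete, here is a small self-contained sketch that exercises the same check. `ScrapeParams` is replaced by a placeholder type, since the SDK's real definition is not reproduced on this page; the sample inputs are invented for illustration.

```typescript
// Placeholder standing in for the Firecrawl SDK's ScrapeParams type
// (an assumption made so the example compiles on its own).
type ScrapeParams = Record<string, unknown>;

// Same runtime check as the excerpt: only the presence and type of `url`
// are inspected.
function isScrapeOptions(
  args: unknown
): args is ScrapeParams & { url: string } {
  return (
    typeof args === 'object' &&
    args !== null &&
    'url' in args &&
    typeof (args as { url: unknown }).url === 'string'
  );
}

const accepted = isScrapeOptions({ url: 'https://example.com' }); // true
const missingUrl = isScrapeOptions({ formats: ['markdown'] });    // false
const wrongType = isScrapeOptions({ url: 42 });                   // false

// Passes the guard even though `formats` is a string rather than an array,
// because no field other than `url` is checked.
const unchecked = isScrapeOptions({
  url: 'https://example.com',
  formats: 'markdown',
});                                                               // true
```

Under this reading, malformed optional fields reach the Firecrawl client unvalidated, so any stricter checking would have to happen in the caller or in the upstream API.
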
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden for behavioral disclosure. While it mentions 'advanced options' and capabilities like executing actions before scraping, it lacks critical behavioral details: whether this is a read-only operation, potential rate limits, authentication requirements, error handling, or what happens with dynamic content. For a complex scraping tool with 13 parameters, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately concise, with three sentences that efficiently convey core functionality. The first sentence states the primary purpose, and the next two add key capabilities. There's no unnecessary repetition or fluff, though it could be slightly more structured by explicitly separating core scraping from advanced features.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex scraping tool with 13 parameters, no annotations, and no output schema, the description is inadequate. It doesn't explain what the tool returns (formats, structure, error cases), doesn't mention performance characteristics or limitations, and provides minimal guidance on the sophisticated parameter interactions. The agent would struggle to use this effectively without trial and error.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
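
Since the description does not say what the tool returns, the following sketch mirrors how the handler excerpt in the Implementation Reference assembles its single text item. `ScrapeResult` and `assembleText` are hypothetical names introduced for this example, and the response values are invented; only the selection and joining logic follows the excerpt.

```typescript
// Hypothetical response shape, limited to the fields the handler reads.
interface ScrapeResult {
  markdown?: string;
  html?: string;
  rawHtml?: string;
  links?: string[];
  screenshot?: string;
  extract?: unknown;
}

// Mirrors the handler: a part is emitted only when its format was requested
// AND the response carries that field; parts are joined by a blank line.
function assembleText(
  formats: string[] | undefined,
  response: ScrapeResult
): string {
  const parts: string[] = [];
  if (formats?.includes('markdown') && response.markdown) parts.push(response.markdown);
  if (formats?.includes('html') && response.html) parts.push(response.html);
  if (formats?.includes('rawHtml') && response.rawHtml) parts.push(response.rawHtml);
  if (formats?.includes('links') && response.links) parts.push(response.links.join('\n'));
  if (formats?.includes('screenshot') && response.screenshot) parts.push(response.screenshot);
  if (formats?.includes('extract') && response.extract) {
    parts.push(JSON.stringify(response.extract, null, 2));
  }
  return parts.join('\n\n') || 'No content available';
}

// Invented sample data.
const sample: ScrapeResult = {
  markdown: '# Title',
  links: ['https://example.com/a', 'https://example.com/b'],
};

const both = assembleText(['markdown', 'links'], sample);
// both === '# Title\n\nhttps://example.com/a\nhttps://example.com/b'

const omitted = assembleText(undefined, sample);
// omitted === 'No content available'
```

Two consequences follow from the excerpt as written: when `formats` is omitted, none of the branches match and the text falls back to 'No content available', and a request for only 'screenshot@fullPage' matches no branch either, because the handler tests for the exact string 'screenshot'.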

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 13 parameters thoroughly. The description adds minimal value beyond what's in the schema: it mentions 'advanced options' and 'various formats' but doesn't provide additional semantic context about parameter interactions or usage patterns. The baseline of 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Scrape a single webpage with advanced options for content extraction.' It specifies the verb (scrape) and resource (webpage) and mentions advanced options. However, it doesn't explicitly differentiate from sibling tools like firecrawl_crawl or firecrawl_extract, which likely handle multi-page crawling or extraction-only operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It mentions 'advanced options' but doesn't specify scenarios where this is preferable over simpler scraping methods or when to choose sibling tools like firecrawl_crawl for multi-page operations or firecrawl_extract for extraction-only tasks. No exclusions or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mcma123/firecrawl-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.