Replicant-Partners

Firecrawl Agent MCP Server

scrape

Extract content from a single webpage in formats like markdown, HTML, links, or screenshots. Configure extraction to focus on main content by including or excluding specific HTML tags.

Instructions

Scrape a single URL and extract content in various formats (markdown, html, links, screenshot). Use this for simple single-page scraping without AI agent capabilities.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | The URL to scrape | |
| formats | No | Output formats to return; multiple formats may be requested | ["markdown"] |
| onlyMainContent | No | Extract only main content, removing headers, footers, nav, etc. | true |
| includeTags | No | HTML tags to include (e.g., ["article", "main"]) | |
| excludeTags | No | HTML tags to exclude (e.g., ["nav", "footer"]) | |
| waitFor | No | Milliseconds to wait before scraping (for JS rendering) | |
| timeout | No | Request timeout in milliseconds | |
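
As a hedged illustration, a complete arguments object for this tool might look like the following (the URL and all values are placeholders, not taken from the source):

```typescript
// Hypothetical arguments for the 'scrape' tool; all values are placeholders.
const scrapeArgs = {
  url: 'https://example.com/article', // required
  formats: ['markdown', 'links'],     // request multiple output formats
  onlyMainContent: true,              // strip headers, footers, nav
  excludeTags: ['aside'],             // drop additional elements
  waitFor: 2000,                      // allow 2s for JS rendering
  timeout: 30000,                     // fail after 30s
};

console.log(JSON.stringify(scrapeArgs, null, 2));
```

Only `url` is required; every other field falls back to the defaults in the table above.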

Implementation Reference

  • MCP tool handler for 'scrape': extracts parameters from args, calls firecrawl.scrape(), handles success/error and formats response as MCP content.
    case 'scrape': {
      const {
        url,
        formats,
        onlyMainContent,
        includeTags,
        excludeTags,
        waitFor,
        timeout,
      } = args as {
        url: string;
        formats?: ('markdown' | 'html' | 'rawHtml' | 'links' | 'screenshot')[];
        onlyMainContent?: boolean;
        includeTags?: string[];
        excludeTags?: string[];
        waitFor?: number;
        timeout?: number;
      };
    
      const result = await firecrawl.scrape({
        url,
        formats,
        onlyMainContent,
        includeTags,
        excludeTags,
        waitFor,
        timeout,
      });
    
      if (!result.success) {
        return {
          content: [
            {
              type: 'text',
              text: `Error: ${result.error}`,
            },
          ],
          isError: true,
        };
      }
    
      return {
        content: [
          {
            type: 'text',
            text: JSON.stringify(
              {
                success: true,
                data: result.data,
              },
              null,
              2
            ),
          },
        ],
      };
    }
  • src/server.ts:136-182 (registration)
    Registration of the 'scrape' tool in the TOOLS array provided to ListToolsRequestHandler, defining name, description, and inputSchema.
    {
      name: 'scrape',
      description:
        'Scrape a single URL and extract content in various formats (markdown, html, links, screenshot). Use this for simple single-page scraping without AI agent capabilities.',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'The URL to scrape',
          },
          formats: {
            type: 'array',
            items: {
              type: 'string',
              enum: ['markdown', 'html', 'rawHtml', 'links', 'screenshot'],
            },
            description:
              'Output formats to return. Default: ["markdown"]. Can request multiple formats.',
          },
          onlyMainContent: {
            type: 'boolean',
            description:
              'Extract only main content, removing headers, footers, nav, etc. Default: true',
          },
          includeTags: {
            type: 'array',
            items: { type: 'string' },
            description: 'HTML tags to include (e.g., ["article", "main"])',
          },
          excludeTags: {
            type: 'array',
            items: { type: 'string' },
            description: 'HTML tags to exclude (e.g., ["nav", "footer"])',
          },
          waitFor: {
            type: 'number',
            description: 'Milliseconds to wait before scraping (for JS rendering)',
          },
          timeout: {
            type: 'number',
            description: 'Request timeout in milliseconds',
          },
        },
        required: ['url'],
      },
    },
  • TypeScript interfaces defining the input (FirecrawlScrapeRequest) and output (FirecrawlScrapeResponse) for the scrape operation. The request type mirrors the tool inputSchema, plus an additional headers field that is not exposed through the tool.
    export interface FirecrawlScrapeRequest {
      url: string;
      formats?: ('markdown' | 'html' | 'rawHtml' | 'links' | 'screenshot')[];
      onlyMainContent?: boolean;
      includeTags?: string[];
      excludeTags?: string[];
      headers?: Record<string, string>;
      waitFor?: number;
      timeout?: number;
    }
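    The matching response interface is not reproduced in this excerpt. Inferred from how scrape() builds its return values below, it presumably looks like this (the exact shape of the data payload is an assumption keyed to the requested formats):

```typescript
// Sketch of the response type, inferred from scrape()'s return values.
// The fields inside `data` are assumptions keyed to the requested formats.
export interface FirecrawlScrapeResponse {
  success: boolean;
  data?: {
    markdown?: string;
    html?: string;
    rawHtml?: string;
    links?: string[];
    screenshot?: string; // URL or base64-encoded image (assumption)
  };
  error?: string;
}

// Example values exercising both outcomes.
const ok: FirecrawlScrapeResponse = { success: true, data: { markdown: '# Title' } };
const failed: FirecrawlScrapeResponse = { success: false, error: 'HTTP 404: Not Found' };
console.log(ok.success, failed.error);
```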
  • Core implementation of scrape: HTTP POST to Firecrawl API /v1/scrape endpoint with request params, handles response and errors.
    async scrape(request: FirecrawlScrapeRequest): Promise<FirecrawlScrapeResponse> {
      try {
        const response = await fetch(`${this.apiBase}/v1/scrape`, {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'Authorization': `Bearer ${this.apiKey}`,
          },
          body: JSON.stringify(request),
        });
    
        const data = await response.json() as any;
    
        if (!response.ok) {
          return {
            success: false,
            error: data.error || `HTTP ${response.status}: ${response.statusText}`,
          };
        }
    
        return {
          success: true,
          data: data.data,
        };
      } catch (error) {
        return {
          success: false,
          error: error instanceof Error ? error.message : 'Unknown error',
        };
      }
    }
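    The HTTP-error branch can be exercised offline by driving the method with a stubbed fetch. The sketch below assumes a FirecrawlClient class wrapping the method shown above; the class name and constructor are assumptions, and only the scrape() body is taken from the implementation:

```typescript
// Hedged sketch: the scrape() logic above wrapped in an assumed FirecrawlClient
// class and driven by a stubbed fetch, so the HTTP-error branch runs offline.
type ScrapeResult = { success: boolean; data?: unknown; error?: string };

class FirecrawlClient {
  constructor(private apiKey: string, private apiBase: string) {}

  async scrape(request: { url: string }): Promise<ScrapeResult> {
    try {
      const response = await fetch(`${this.apiBase}/v1/scrape`, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          Authorization: `Bearer ${this.apiKey}`,
        },
        body: JSON.stringify(request),
      });
      const data = (await response.json()) as any;
      if (!response.ok) {
        return {
          success: false,
          error: data.error || `HTTP ${response.status}: ${response.statusText}`,
        };
      }
      return { success: true, data: data.data };
    } catch (error) {
      return {
        success: false,
        error: error instanceof Error ? error.message : 'Unknown error',
      };
    }
  }
}

// Stub fetch: every request gets a 402 with a JSON error body.
(globalThis as any).fetch = async () =>
  new Response(JSON.stringify({ error: 'Payment required' }), { status: 402 });

const client = new FirecrawlClient('test-key', 'https://api.firecrawl.dev');
const result = await client.scrape({ url: 'https://example.com' });
console.log(result); // { success: false, error: 'Payment required' }
```

Note how the API's own error message takes precedence, with the HTTP status line as a fallback when the body carries no error field.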
