Replicant-Partners

Firecrawl Agent MCP Server

scrape

Extract content from a single webpage in formats like markdown, HTML, links, or screenshots. Configure extraction to focus on main content by including or excluding specific HTML tags.

Instructions

Scrape a single URL and extract content in various formats (markdown, html, links, screenshot). Use this for simple single-page scraping without AI agent capabilities.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | The URL to scrape | |
| formats | No | Output formats to return; multiple formats can be requested | ["markdown"] |
| onlyMainContent | No | Extract only main content, removing headers, footers, nav, etc. | true |
| includeTags | No | HTML tags to include (e.g., ["article", "main"]) | |
| excludeTags | No | HTML tags to exclude (e.g., ["nav", "footer"]) | |
| waitFor | No | Milliseconds to wait before scraping (for JS rendering) | |
| timeout | No | Request timeout in milliseconds | |
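As a concrete illustration of the schema above, a tool call might pass arguments like the following (the URL and tag list are made-up example values):

```typescript
// Hypothetical arguments for a 'scrape' tool call; the URL is a placeholder.
const scrapeArgs = {
  url: 'https://example.com/blog/post',
  formats: ['markdown', 'links'],
  onlyMainContent: true,
  excludeTags: ['nav', 'footer'],
  waitFor: 2000,   // give client-side JS 2 seconds to render
  timeout: 30000,  // abort the request after 30 seconds
};

// Only `url` is required; every other field may be omitted.
console.log(JSON.stringify(scrapeArgs, null, 2));
```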

Implementation Reference

  • MCP tool handler for 'scrape': extracts parameters from args, calls firecrawl.scrape(), handles success/error and formats response as MCP content.
    case 'scrape': {
      const {
        url,
        formats,
        onlyMainContent,
        includeTags,
        excludeTags,
        waitFor,
        timeout,
      } = args as {
        url: string;
        formats?: ('markdown' | 'html' | 'rawHtml' | 'links' | 'screenshot')[];
        onlyMainContent?: boolean;
        includeTags?: string[];
        excludeTags?: string[];
        waitFor?: number;
        timeout?: number;
      };
    
      const result = await firecrawl.scrape({
        url,
        formats,
        onlyMainContent,
        includeTags,
        excludeTags,
        waitFor,
        timeout,
      });
    
      if (!result.success) {
        return {
          content: [
            {
              type: 'text',
              text: `Error: ${result.error}`,
            },
          ],
          isError: true,
        };
      }
    
      return {
        content: [
          {
            type: 'text',
            text: JSON.stringify(
              {
                success: true,
                data: result.data,
              },
              null,
              2
            ),
          },
        ],
      };
    }
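On success the handler above returns a single text content item whose text is a JSON-serialized envelope. For a markdown-only scrape, the value delivered to the MCP client would look roughly like this (the scraped markdown is an illustrative stand-in, not real output):

```typescript
// Illustrative success result in the shape built by the handler;
// the markdown payload here is a made-up example.
const result = {
  content: [
    {
      type: 'text' as const,
      text: JSON.stringify(
        { success: true, data: { markdown: '# Page Title\n\nBody text...' } },
        null,
        2
      ),
    },
  ],
};

console.log(result.content[0].text);
```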
  • src/server.ts:136-182 (registration)
    Registration of the 'scrape' tool in the TOOLS array provided to ListToolsRequestHandler, defining name, description, and inputSchema.
    {
      name: 'scrape',
      description:
        'Scrape a single URL and extract content in various formats (markdown, html, links, screenshot). Use this for simple single-page scraping without AI agent capabilities.',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'The URL to scrape',
          },
          formats: {
            type: 'array',
            items: {
              type: 'string',
              enum: ['markdown', 'html', 'rawHtml', 'links', 'screenshot'],
            },
            description:
              'Output formats to return. Default: ["markdown"]. Can request multiple formats.',
          },
          onlyMainContent: {
            type: 'boolean',
            description:
              'Extract only main content, removing headers, footers, nav, etc. Default: true',
          },
          includeTags: {
            type: 'array',
            items: { type: 'string' },
            description: 'HTML tags to include (e.g., ["article", "main"])',
          },
          excludeTags: {
            type: 'array',
            items: { type: 'string' },
            description: 'HTML tags to exclude (e.g., ["nav", "footer"])',
          },
          waitFor: {
            type: 'number',
            description: 'Milliseconds to wait before scraping (for JS rendering)',
          },
          timeout: {
            type: 'number',
            description: 'Request timeout in milliseconds',
          },
        },
        required: ['url'],
      },
    },
  • TypeScript interface defining the input (FirecrawlScrapeRequest) for the scrape operation. It mirrors the tool inputSchema, plus an additional `headers` field that the tool schema does not expose.
    export interface FirecrawlScrapeRequest {
      url: string;
      formats?: ('markdown' | 'html' | 'rawHtml' | 'links' | 'screenshot')[];
      onlyMainContent?: boolean;
      includeTags?: string[];
      excludeTags?: string[];
      headers?: Record<string, string>;
      waitFor?: number;
      timeout?: number;
    }
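The companion FirecrawlScrapeResponse type is not reproduced here. Judging from how the handler and client consume it (`result.success`, `result.data`, `result.error`), a plausible sketch is the following; the field names inside `data` are assumptions based on the supported formats, not the library's authoritative definition:

```typescript
// Sketch of the response shape, inferred from usage in the handler above.
// Fields inside `data` are assumed to mirror the requested formats.
export interface FirecrawlScrapeResponse {
  success: boolean;
  data?: {
    markdown?: string;
    html?: string;
    rawHtml?: string;
    links?: string[];
    screenshot?: string;              // assumed: a URL or base64 payload
    metadata?: Record<string, unknown>;
  };
  error?: string;                     // populated when success is false
}

// Example values conforming to the sketch:
const ok: FirecrawlScrapeResponse = {
  success: true,
  data: { markdown: '# Example', links: ['https://example.com'] },
};
const failed: FirecrawlScrapeResponse = {
  success: false,
  error: 'HTTP 404: Not Found',
};
console.log(ok.success, failed.error);
```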
  • Core implementation of scrape: HTTP POST to Firecrawl API /v1/scrape endpoint with request params, handles response and errors.
    async scrape(request: FirecrawlScrapeRequest): Promise<FirecrawlScrapeResponse> {
      try {
        const response = await fetch(`${this.apiBase}/v1/scrape`, {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'Authorization': `Bearer ${this.apiKey}`,
          },
          body: JSON.stringify(request),
        });
    
        const data = await response.json() as any;
    
        if (!response.ok) {
          return {
            success: false,
            error: data.error || `HTTP ${response.status}: ${response.statusText}`,
          };
        }
    
        return {
          success: true,
          data: data.data,
        };
      } catch (error) {
        return {
          success: false,
          error: error instanceof Error ? error.message : 'Unknown error',
        };
      }
    }
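One detail worth noting in the implementation above: optional parameters the caller leaves out arrive as undefined, and JSON.stringify silently drops object properties whose value is undefined, so the POST body only carries the fields that were actually supplied. A quick demonstration:

```typescript
// JSON.stringify omits properties whose value is undefined, so an unset
// optional field never reaches the Firecrawl API in the request body.
const body = JSON.stringify({
  url: 'https://example.com',
  formats: undefined,
  waitFor: undefined,
});

console.log(body); // only `url` survives serialization
```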
