Glama
firecrawl

firecrawl-mcp-server

firecrawl_parse

Read-only

Parse local documents (PDF, Word, Excel, HTML) into markdown or structured JSON. Extract content with format selection and PDF-specific options.

Instructions

Parse a file using Firecrawl's /v2/parse endpoint.

In local/non-cloud MCP mode, this tool reads filePath from the MCP server filesystem and posts multipart data to the configured self-hosted FIRECRAWL_API_URL, preserving the existing direct-read behavior.

In hosted CLOUD_SERVICE mode, this tool is a two-call flow because hosted MCP cannot read your local filesystem:

  1. Call with filePath, contentType, parse options, and optional declaredSizeBytes. The hosted server mints a short-lived upload URL and returns a safe local curl PUT command plus nextToolCall.

  2. Run the returned curl command locally, then call firecrawl_parse again with uploadRef and the desired parse options. The hosted server calls /v2/parse server-side with your session credential.
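The two hosted calls can be sketched as a pair of argument builders. This is a minimal illustration, not part of the Firecrawl server: the helper names are hypothetical, and only the filePath, contentType, and uploadRef keys and the firecrawl_parse tool name come from the description.

```python
from typing import Any, Dict


def build_phase1_args(file_path: str, content_type: str, **parse_options: Any) -> Dict[str, Any]:
    """Phase 1: describe the local file so the hosted server can mint an upload URL."""
    if "uploadRef" in parse_options:
        raise ValueError("phase 1 takes filePath only; do not pass uploadRef")
    return {"filePath": file_path, "contentType": content_type, **parse_options}


def build_phase2_args(upload_ref: str, **parse_options: Any) -> Dict[str, Any]:
    """Phase 2: parse the uploaded file server-side, referenced by uploadRef only."""
    if "filePath" in parse_options:
        raise ValueError("phase 2 takes uploadRef only; do not pass filePath")
    return {"uploadRef": upload_ref, **parse_options}


def as_tool_call(arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap arguments in a {"name", "arguments"} tool-call envelope."""
    return {"name": "firecrawl_parse", "arguments": arguments}
```

Between the two calls, the curl PUT command returned by phase 1 still has to be run locally; the sketch does not attempt to reproduce that step because its exact form is supplied by the server.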

Best for: Extracting content from a local document (PDF, Word, Excel, HTML, etc.); pulling structured data out of a file with JSON format; converting binary documents into markdown for downstream reasoning.

Not recommended for: Remote URLs (use firecrawl_scrape); multiple files at once (call parse multiple times); documents that require interactive actions, screenshots, or change tracking, none of which the parse endpoint supports.

Common mistakes: In hosted mode, do not pass both filePath and uploadRef. Phase 1 uses filePath only to generate upload instructions; phase 2 uses uploadRef only to parse server-side.

Supported file types: .html, .htm, .xhtml, .pdf, .docx, .doc, .odt, .rtf, .xlsx, .xls

Unsupported options: actions, screenshot/branding/changeTracking formats, waitFor > 0, location, mobile, proxy values other than "auto" or "basic".

Privacy: Set redactPII: true to return content with personally identifiable information redacted.
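A client could check arguments against these lists before calling the tool. The following is a rough sketch under stated assumptions: preflight_problems is a hypothetical helper, the presence of actions, location, or mobile is treated as a problem regardless of value, and format entries are assumed to be either plain strings or objects with a "type" key.

```python
from pathlib import Path
from typing import Any, Dict, List

# Extension list copied from the tool description above.
SUPPORTED_EXTENSIONS = {
    ".html", ".htm", ".xhtml", ".pdf", ".docx",
    ".doc", ".odt", ".rtf", ".xlsx", ".xls",
}
UNSUPPORTED_FORMATS = {"screenshot", "branding", "changeTracking"}
UNSUPPORTED_KEYS = {"actions", "location", "mobile"}
ALLOWED_PROXY = {"auto", "basic"}


def preflight_problems(arguments: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems: List[str] = []

    file_path = arguments.get("filePath")
    if file_path is not None and Path(file_path).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {file_path}")

    # Assumption: any use of these keys is rejected, whatever the value.
    for key in sorted(UNSUPPORTED_KEYS.intersection(arguments)):
        problems.append(f"unsupported option: {key}")

    for fmt in arguments.get("formats", []):
        name = fmt.get("type") if isinstance(fmt, dict) else fmt
        if name in UNSUPPORTED_FORMATS:
            problems.append(f"unsupported format: {name}")

    if arguments.get("waitFor", 0) > 0:
        problems.append("waitFor must be 0 for parse")

    proxy = arguments.get("proxy")
    if proxy is not None and proxy not in ALLOWED_PROXY:
        problems.append(f"unsupported proxy value: {proxy}")

    return problems
```

The server remains the authority on what it accepts; a local check like this only saves a round trip for the obvious cases.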

CRITICAL - Format Selection (same rules as firecrawl_scrape): When the user asks for SPECIFIC data points from a document, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE document content.
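The rule can be expressed as a small decision helper. This is only a sketch: choose_formats is hypothetical, and the object form {"type": "json", "schema": ...} for a JSON format entry is an assumption, since this page does not show the exact shape (the input schema also lists jsonOptions, which may be where the schema belongs).

```python
from typing import Any, Dict, List, Optional


def choose_formats(requested_fields: Optional[Dict[str, str]] = None) -> List[Any]:
    """Pick formats per the rule: specific data points -> JSON with a schema,
    whole-document content -> markdown.

    requested_fields maps a field name to a JSON Schema type name,
    for example {"invoice_number": "string", "total": "number"}.
    """
    if not requested_fields:
        # The user wants the entire document.
        return ["markdown"]

    schema = {
        "type": "object",
        "properties": {
            name: {"type": json_type} for name, json_type in requested_fields.items()
        },
        "required": list(requested_fields),
    }
    # Assumed shape of a JSON format entry; verify against the Firecrawl docs.
    return [{"type": "json", "schema": schema}]
```

For example, a request for the invoice number and total of a PDF would call choose_formats({"invoice_number": "string", "total": "number"}), while a request to summarize the whole file would call choose_formats() and get markdown.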

Handling PDFs: Add "parsers": ["pdf"] (optionally with pdfOptions.maxPages) when parsing a PDF so the PDF engine is invoked explicitly. For very long documents, cap maxPages to keep the response within token limits.
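One way to apply this consistently is a helper that adds the PDF settings whenever the input looks like a PDF. The sketch below is illustrative: with_pdf_options is a hypothetical name, and detecting a PDF from the file extension or the application/pdf content type is an assumption about how a client might decide.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def with_pdf_options(arguments: Dict[str, Any], max_pages: Optional[int] = None) -> Dict[str, Any]:
    """Return a copy of arguments with "parsers": ["pdf"] (and optionally
    pdfOptions.maxPages) added when the input appears to be a PDF."""
    result = dict(arguments)
    path = str(result.get("filePath", ""))
    is_pdf = (
        Path(path).suffix.lower() == ".pdf"
        or result.get("contentType") == "application/pdf"
    )
    if not is_pdf:
        return result

    result["parsers"] = ["pdf"]
    if max_pages is not None:
        if max_pages < 1:
            raise ValueError("max_pages must be at least 1")
        # Preserve any pdfOptions the caller already set.
        result["pdfOptions"] = {**result.get("pdfOptions", {}), "maxPages": max_pages}
    return result
```

In hosted phase 2 neither filePath nor contentType is present, so the caller would need to carry the PDF settings over from phase 1 rather than rely on detection.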

Hosted phase 1 example:

{
  "name": "firecrawl_parse",
  "arguments": {
    "filePath": "/absolute/path/to/document.pdf",
    "contentType": "application/pdf",
    "formats": ["markdown"],
    "parsers": ["pdf"],
    "zeroDataRetention": true
  }
}

Hosted phase 2 example:

{
  "name": "firecrawl_parse",
  "arguments": {
    "uploadRef": "upload-ref-from-phase-1",
    "formats": ["markdown"],
    "parsers": ["pdf"],
    "zeroDataRetention": true
  }
}

Returns: In hosted phase 1, upload instructions. Otherwise, a parsed document containing markdown, html, links, summary, json, or query results, depending on the requested formats.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| proxy | No | | |
| maxAge | No | | |
| formats | No | | |
| parsers | No | | |
| filePath | Yes | Absolute or relative path to a local file to parse. Supported: .html, .htm, .pdf, .docx, .doc, .odt, .rtf, .xlsx, .xls | |
| redactPII | No | | |
| pdfOptions | No | | |
| contentType | No | Optional MIME type override. If omitted, the server infers the file kind from the extension. | |
| excludeTags | No | | |
| includeTags | No | | |
| jsonOptions | No | | |
| queryOptions | No | | |
| storeInCache | No | | |
| onlyMainContent | No | | |
| zeroDataRetention | No | | |
| removeBase64Images | No | | |
| skipTlsVerification | No | | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses two operational modes (local/non-cloud and hosted CLOUD_SERVICE) and details the two-call flow for hosted mode. It lists unsupported options, privacy settings, and format selection rules. The annotations set readOnlyHint=true, and the description is consistent with read-only behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well structured, with sections (best for, not recommended, common mistakes, etc.), and is front-loaded with its purpose. However, it is verbose and could be trimmed by removing redundant explanations, such as the two-call flow being repeated in multiple places.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (17 parameters, two modes, no output schema), the description covers modes, examples, format selection, and limitations. It mentions the return types (upload instructions or a parsed document). It could add more detail on return structure or error handling, but overall it is sufficient for agent usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 12% (2 of 17 parameters), but the description compensates by explaining key parameters such as filePath, formats, and parsers, and the two-phase usage. However, many parameters (proxy, maxAge, excludeTags, etc.) are not explained in detail. The examples cover the critical scenarios, but systematic parameter documentation is lacking.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool parses a local file using Firecrawl's /v2/parse endpoint. It distinguishes the tool from its siblings by explicitly noting that it is not for remote URLs (use firecrawl_scrape) and not for multiple files. Specific use cases are given (PDF, Word, Excel, HTML), and the verb 'Parse' is precise.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool (extracting content from local documents) and when not to (remote URLs, multiple files, interactive documents). It names firecrawl_scrape as the alternative for remote URLs and warns about a common mistake in hosted mode: passing both filePath and uploadRef.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/firecrawl/firecrawl-mcp-server'
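The same request can be made from Python with the standard library. This is a minimal sketch: the function names are hypothetical, and treating the response body as JSON is an assumption, since the response format is not stated here.

```python
import json
import urllib.request

# URL taken from the curl command above.
SERVER_URL = "https://glama.ai/api/mcp/v1/servers/firecrawl/firecrawl-mcp-server"


def build_request(url: str = SERVER_URL) -> urllib.request.Request:
    """Build the GET request without sending it."""
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )


def fetch_server_info(url: str = SERVER_URL, timeout: float = 10.0):
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_server_info() performs the network request; build_request() alone is enough to inspect what would be sent.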

If you have feedback or need assistance with the MCP directory API, please join our Discord server.