firecrawl-mcp-server


firecrawl_parse

Read-only

Parse local documents (PDF, Word, Excel, HTML) into markdown or structured JSON. Extract content with format selection and PDF-specific options.

Instructions

Parse a file using Firecrawl's /v2/parse endpoint.

In local/non-cloud MCP mode, this tool reads filePath from the MCP server filesystem and posts multipart data to the configured self-hosted FIRECRAWL_API_URL, preserving the existing direct-read behavior.

In hosted CLOUD_SERVICE mode, this tool is a two-call flow because hosted MCP cannot read your local filesystem:

1. Call with filePath, contentType, parse options, and optional declaredSizeBytes. The hosted server mints a short-lived upload URL and returns a safe local curl PUT command plus nextToolCall.
2. Run the returned curl command locally, then call firecrawl_parse again with uploadRef and the desired parse options. The hosted server calls /v2/parse server-side with your session credential.
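The two hosted calls can be sketched as a pair of argument builders. This is a minimal illustration, not the server's implementation: the helper names `build_phase1_call` and `build_phase2_call` are hypothetical, and only the argument names (filePath, contentType, declaredSizeBytes, uploadRef) come from the description above.

```python
from typing import Any, Optional


def build_phase1_call(file_path: str, content_type: str,
                      declared_size_bytes: Optional[int] = None,
                      **parse_options: Any) -> dict:
    """Phase 1: ask the hosted server for upload instructions (filePath only)."""
    arguments: dict = {"filePath": file_path, "contentType": content_type}
    if declared_size_bytes is not None:
        arguments["declaredSizeBytes"] = declared_size_bytes
    arguments.update(parse_options)
    return {"name": "firecrawl_parse", "arguments": arguments}


def build_phase2_call(upload_ref: str, **parse_options: Any) -> dict:
    """Phase 2: parse the uploaded file server-side (uploadRef only)."""
    arguments: dict = {"uploadRef": upload_ref}
    arguments.update(parse_options)
    return {"name": "firecrawl_parse", "arguments": arguments}


# The same parse options are sent in both phases.
options = {"formats": ["markdown"], "parsers": ["pdf"]}
phase1 = build_phase1_call("/absolute/path/to/document.pdf", "application/pdf", **options)
# ... run the curl command returned by phase 1 locally, then:
phase2 = build_phase2_call("upload-ref-from-phase-1", **options)
```

The upload reference shown is a placeholder; in practice it would come from the phase 1 response.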

Best for: Extracting content from a local document (PDF, Word, Excel, HTML, etc.); pulling structured data out of a file with JSON format; converting binary documents into markdown for downstream reasoning.

Not recommended for: Remote URLs (use firecrawl_scrape); multiple files at once (call parse multiple times); documents that require interactive actions, screenshots, or change tracking. The parse endpoint does not support those.

Common mistakes: In hosted mode, do not pass both filePath and uploadRef. Phase 1 uses filePath only to generate upload instructions; phase 2 uses uploadRef only to parse server-side.
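The hosted-mode rule about filePath and uploadRef could be checked client-side before a call is sent. The sketch below assumes a hypothetical helper, `check_hosted_arguments`; it is not part of the server, and the error messages are illustrative.

```python
def check_hosted_arguments(arguments: dict) -> str:
    """Return which hosted phase the arguments belong to, or raise if ambiguous."""
    has_path = "filePath" in arguments
    has_ref = "uploadRef" in arguments
    if has_path and has_ref:
        # The common mistake called out in the description.
        raise ValueError("pass filePath (phase 1) or uploadRef (phase 2), not both")
    if not has_path and not has_ref:
        raise ValueError("one of filePath or uploadRef is required")
    return "phase1" if has_path else "phase2"


phase = check_hosted_arguments({"uploadRef": "upload-ref-from-phase-1",
                                "formats": ["markdown"]})
```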

Supported file types: .html, .htm, .xhtml, .pdf, .docx, .doc, .odt, .rtf, .xlsx, .xls

Unsupported options: actions, screenshot/branding/changeTracking formats, waitFor > 0, location, mobile, proxy values other than "auto" or "basic".

Privacy: Set redactPII: true to return content with personally identifiable information redacted.
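The supported file types and unsupported options listed above can be turned into a pre-flight check. This is a sketch under assumptions: `find_problems` is a hypothetical helper, it only inspects string entries in formats, and the server's own validation may differ.

```python
from pathlib import PurePath
from typing import List

SUPPORTED_EXTENSIONS = {".html", ".htm", ".xhtml", ".pdf", ".docx",
                        ".doc", ".odt", ".rtf", ".xlsx", ".xls"}
UNSUPPORTED_FORMATS = {"screenshot", "branding", "changeTracking"}
ALLOWED_PROXY = {"auto", "basic"}


def find_problems(arguments: dict) -> List[str]:
    """List the ways a firecrawl_parse call conflicts with the documented limits."""
    problems: List[str] = []
    path = arguments.get("filePath")
    if path is not None and PurePath(path).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {path}")
    if "actions" in arguments:
        problems.append("actions are not supported")
    for fmt in arguments.get("formats", []):
        if isinstance(fmt, str) and fmt in UNSUPPORTED_FORMATS:
            problems.append(f"unsupported format: {fmt}")
    if arguments.get("waitFor", 0) > 0:
        problems.append("waitFor must be 0")
    for key in ("location", "mobile"):
        if key in arguments:
            problems.append(f"{key} is not supported")
    proxy = arguments.get("proxy")
    if proxy is not None and proxy not in ALLOWED_PROXY:
        problems.append(f"unsupported proxy value: {proxy}")
    return problems


ok = find_problems({"filePath": "report.pdf", "formats": ["markdown"], "proxy": "auto"})
bad = find_problems({"filePath": "notes.txt", "formats": ["screenshot"], "waitFor": 500})
```

Here `ok` is empty, while `bad` reports the file type, the screenshot format, and the non-zero waitFor.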

CRITICAL - Format Selection (same rules as firecrawl_scrape): When the user asks for SPECIFIC data points from a document, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE document content.
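The format rule amounts to a simple branch. The sketch below is an assumption-heavy illustration: this page does not show the exact shape of a JSON-format request, so pairing `"formats": ["json"]` with a `schema` key inside jsonOptions is a guess based on the parameter names in the input schema, and `format_arguments` and the invoice schema are hypothetical. Check the tool's JSON Schema before relying on it.

```python
from typing import Optional


def format_arguments(wants_specific_fields: bool, schema: Optional[dict] = None) -> dict:
    """Pick markdown for whole-document requests, JSON plus schema for specific fields."""
    if not wants_specific_fields:
        return {"formats": ["markdown"]}
    if schema is None:
        raise ValueError("specific data points require a JSON schema")
    # ASSUMPTION: the request shape below is not confirmed by the tool description.
    return {"formats": ["json"], "jsonOptions": {"schema": schema}}


# Hypothetical schema for pulling two fields out of an invoice.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoiceNumber": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoiceNumber", "total"],
}

json_args = format_arguments(True, invoice_schema)
markdown_args = format_arguments(False)
```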

Handling PDFs: Add "parsers": ["pdf"] (optionally with pdfOptions.maxPages) when parsing a PDF so the PDF engine is invoked explicitly. For very long documents, cap maxPages to keep the response within token limits.
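A small helper can apply the PDF guidance automatically. This is a sketch: `with_pdf_options` is a hypothetical name, and detecting PDFs by the filePath extension is an assumption (in hosted phase 2 there is no filePath, so parsers would be set by hand).

```python
from typing import Optional


def with_pdf_options(arguments: dict, max_pages: Optional[int] = None) -> dict:
    """Return a copy of the arguments with the PDF parser requested for .pdf paths."""
    result = dict(arguments)
    if str(result.get("filePath", "")).lower().endswith(".pdf"):
        result["parsers"] = ["pdf"]
        if max_pages is not None:
            # Cap the page count to keep long documents within token limits.
            result["pdfOptions"] = {**result.get("pdfOptions", {}), "maxPages": max_pages}
    return result


original = {"filePath": "/docs/Report.PDF", "formats": ["markdown"]}
capped = with_pdf_options(original, max_pages=10)
untouched = with_pdf_options({"filePath": "notes.docx"})
```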

Hosted phase 1 example:

{
  "name": "firecrawl_parse",
  "arguments": {
    "filePath": "/absolute/path/to/document.pdf",
    "contentType": "application/pdf",
    "formats": ["markdown"],
    "parsers": ["pdf"],
    "zeroDataRetention": true
  }
}

Hosted phase 2 example:

{
  "name": "firecrawl_parse",
  "arguments": {
    "uploadRef": "upload-ref-from-phase-1",
    "formats": ["markdown"],
    "parsers": ["pdf"],
    "zeroDataRetention": true
  }
}

Returns: In hosted phase 1, upload instructions. Otherwise, a parsed document containing markdown, html, links, summary, json, or query results, depending on the requested formats.

Input Schema


Name	Required	Description
`proxy`	No
`maxAge`	No
`formats`	No
`parsers`	No
`filePath`	Yes	Absolute or relative path to a local file to parse. Supported: .html, .htm, .pdf, .docx, .doc, .odt, .rtf, .xlsx, .xls
`redactPII`	No
`pdfOptions`	No
`contentType`	No	Optional MIME type override. If omitted, the server infers the file kind from the extension.
`excludeTags`	No
`includeTags`	No
`jsonOptions`	No
`queryOptions`	No
`storeInCache`	No
`onlyMainContent`	No
`zeroDataRetention`	No
`removeBase64Images`	No
`skipTlsVerification`	No

Tool Definition Quality

Grade: A (4.5/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses two operational modes (local/non-cloud and hosted CLOUD_SERVICE) with a detailed two-call flow for hosted mode. Lists unsupported options, privacy settings, and format selection rules. Annotations indicate readOnlyHint=true, and the description confirms read-only behavior, so there is no contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is well-structured with sections (best for, not recommended, common mistakes, etc.) and front-loaded with purpose. However, it is verbose and could be more concise by trimming redundant explanations, such as repeating the two-call flow in multiple places.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (17 parameters, two modes, no output schema), the description covers modes, examples, format selection, and limitations. It mentions return types (upload instructions or parsed document). It could add more detail on return structure or error handling, but overall it is sufficient for agent usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 12% (2 of 17 parameters), but the description compensates by explaining key parameters like filePath, formats, parsers, and the two-phase usage. However, many parameters (proxy, maxAge, excludeTags, etc.) are not explained in detail. Examples cover critical scenarios, but systematic parameter documentation is lacking.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool parses a local file using Firecrawl's /v2/parse endpoint. It distinguishes from siblings by explicitly noting it is not for remote URLs (use firecrawl_scrape) and not for multiple files. Specific use cases are given (PDF, Word, Excel, HTML), and the verb 'Parse' is precise.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs when to use (extracting content from local documents) and when not (remote URLs, multiple files, interactive documents). Names alternatives like firecrawl_scrape for remote URLs. Warns about common mistakes in hosted mode, such as not mixing filePath and uploadRef.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/firecrawl/firecrawl-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.