firecrawl_parse
Parse local documents (PDF, Word, Excel, HTML) into markdown or structured JSON. Extract content with format selection and PDF-specific options.
Instructions
Parse a file using Firecrawl's /v2/parse endpoint.
In local (non-cloud) MCP mode, the tool reads filePath directly from the MCP server's filesystem and posts it as multipart data to the self-hosted Firecrawl instance configured by FIRECRAWL_API_URL, keeping the original direct-read behavior.
In hosted CLOUD_SERVICE mode, the hosted MCP server cannot read your local filesystem, so parsing takes two calls:
1. Phase 1: call firecrawl_parse with filePath, contentType, the parse options, and optionally declaredSizeBytes. The hosted server mints a short-lived upload URL and returns a safe curl PUT command to run locally, plus nextToolCall.
2. Phase 2: run the returned curl command locally, then call firecrawl_parse again with uploadRef and the desired parse options. The hosted server calls /v2/parse server-side with your session credential.
Best for: Extracting content from a local document (PDF, Word, Excel, HTML, etc.); pulling structured data out of a file with JSON format; converting binary documents into markdown for downstream reasoning.
Not recommended for: Remote URLs (use firecrawl_scrape); multiple files at once (call parse once per file); documents that require interactive actions, screenshots, or change tracking, none of which the parse endpoint supports.
Common mistakes: In hosted mode, do not pass both filePath and uploadRef. Phase 1 uses filePath only, to generate upload instructions; phase 2 uses uploadRef only, to parse server-side.
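A minimal Python sketch of how a client might assemble the two hosted calls while enforcing the "filePath or uploadRef, never both" rule. The helper names (check_hosted_args, hosted_phase1_args, hosted_phase2_args) are hypothetical; only the argument keys come from this page, and running the curl command between the two calls is left to the caller.

```python
from typing import Any, Dict, Optional


# Hypothetical client-side helpers. Only the argument keys (filePath,
# contentType, declaredSizeBytes, uploadRef) come from the firecrawl_parse docs.
def check_hosted_args(args: Dict[str, Any]) -> None:
    """Hosted rule: filePath in phase 1, uploadRef in phase 2, never both."""
    has_path = "filePath" in args
    has_ref = "uploadRef" in args
    if has_path and has_ref:
        raise ValueError("pass filePath (phase 1) or uploadRef (phase 2), not both")
    if not (has_path or has_ref):
        raise ValueError("hosted mode needs filePath (phase 1) or uploadRef (phase 2)")


def hosted_phase1_args(file_path: str, content_type: str,
                       declared_size_bytes: Optional[int] = None,
                       **parse_options: Any) -> Dict[str, Any]:
    """Phase 1: ask the hosted server for upload instructions."""
    args: Dict[str, Any] = {"filePath": file_path, "contentType": content_type}
    if declared_size_bytes is not None:
        args["declaredSizeBytes"] = declared_size_bytes  # optional size hint
    args.update(parse_options)
    check_hosted_args(args)
    return args


def hosted_phase2_args(upload_ref: str, **parse_options: Any) -> Dict[str, Any]:
    """Phase 2: parse the already-uploaded file, referenced by uploadRef only."""
    args: Dict[str, Any] = {"uploadRef": upload_ref}
    args.update(parse_options)
    check_hosted_args(args)
    return args
```

For example, `hosted_phase1_args(path, "application/pdf", declared_size_bytes=os.path.getsize(path), formats=["markdown"], parsers=["pdf"])` builds the phase 1 arguments; after running the returned curl command, `hosted_phase2_args(ref, formats=["markdown"], parsers=["pdf"])` builds phase 2, where `ref` is the uploadRef obtained from phase 1.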
Supported file types: .html, .htm, .xhtml, .pdf, .docx, .doc, .odt, .rtf, .xlsx, .xls
Unsupported options: actions, screenshot/branding/changeTracking formats, waitFor > 0, location, mobile, proxy values other than "auto" or "basic".
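The file-type and option constraints above can be checked before the tool is called. This Python sketch is a hypothetical client-side pre-check (parse_problems is not part of Firecrawl); it encodes only the extensions and unsupported options listed on this page, and it assumes extensions are matched case-insensitively and that object-style formats carry a "type" key. The server remains the authority on what it accepts.

```python
from pathlib import Path
from typing import Any, Dict, List

# Lists copied from the firecrawl_parse description; parse_problems is hypothetical.
SUPPORTED_EXTENSIONS = {".html", ".htm", ".xhtml", ".pdf", ".docx", ".doc",
                        ".odt", ".rtf", ".xlsx", ".xls"}
UNSUPPORTED_FORMATS = {"screenshot", "branding", "changeTracking"}
ALLOWED_PROXIES = {"auto", "basic"}


def _format_name(fmt: Any) -> Any:
    # Assumption: formats are plain strings or objects with a "type" key.
    return fmt.get("type") if isinstance(fmt, dict) else fmt


def parse_problems(file_path: str, options: Dict[str, Any]) -> List[str]:
    """Return human-readable reasons the call would likely be rejected."""
    problems: List[str] = []
    ext = Path(file_path).suffix.lower()  # assumption: case-insensitive match
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if options.get("actions"):  # a non-empty actions list
        problems.append("actions are not supported")
    for fmt in options.get("formats", []):
        if _format_name(fmt) in UNSUPPORTED_FORMATS:
            problems.append(f"format not supported: {_format_name(fmt)}")
    if options.get("waitFor", 0) > 0:
        problems.append("waitFor must be 0")
    for key in ("location", "mobile"):
        if options.get(key):  # flagged only when set to a truthy value
            problems.append(f"{key} is not supported")
    proxy = options.get("proxy")
    if proxy is not None and proxy not in ALLOWED_PROXIES:
        problems.append(f'proxy must be "auto" or "basic", got {proxy!r}')
    return problems
```

An empty list means no listed constraint is violated; it does not guarantee the server will accept the call.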
Privacy: Set redactPII: true to return content with personally identifiable information redacted.
CRITICAL - Format Selection (same rules as firecrawl_scrape): When the user asks for SPECIFIC data points from a document, you MUST use JSON format with a schema. Only use markdown when the user needs the ENTIRE document content.
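A small Python sketch of the selection rule: markdown when the whole document is wanted, a JSON format with a schema when specific fields are requested. Two assumptions are labeled in the code: that the JSON format is passed as an object of the form {"type": "json", "schema": ...}, as with firecrawl_scrape, and that every requested field is a string. The Input Schema also lists a jsonOptions parameter; if your server version reads the schema from there, adapt accordingly. choose_formats is a hypothetical helper.

```python
from typing import Any, List, Optional


def choose_formats(requested_fields: Optional[List[str]]) -> List[Any]:
    """Hypothetical helper: JSON with a schema for specific data points,
    markdown only when the entire document is needed."""
    if not requested_fields:
        return ["markdown"]
    schema = {
        "type": "object",
        # Assumption: every requested field is extracted as a string.
        "properties": {name: {"type": "string"} for name in requested_fields},
        "required": list(requested_fields),
    }
    # Assumption: JSON format object shape mirrors firecrawl_scrape.
    return [{"type": "json", "schema": schema}]
```

For a request like "what is the invoice number and total?", the arguments would use `"formats": choose_formats(["invoice_number", "total"])` rather than `["markdown"]`.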
Handling PDFs:
Add "parsers": ["pdf"] (optionally with pdfOptions.maxPages) when parsing a PDF so the PDF engine is invoked explicitly. For very long documents, cap maxPages to keep the response within token limits.
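The PDF advice above can be sketched as two hypothetical Python helpers: one decides whether the input is a PDF (preferring the contentType override, else the extension), and one returns the parsers and pdfOptions.maxPages options to merge into either a local call or a hosted phase 1/phase 2 call. The helper names and the page cap value are illustrative, not part of Firecrawl.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def looks_like_pdf(file_path: Optional[str] = None,
                   content_type: Optional[str] = None) -> bool:
    """Hypothetical: an explicit MIME type wins; otherwise check the extension."""
    if content_type is not None:
        return content_type.lower() == "application/pdf"
    return file_path is not None and Path(file_path).suffix.lower() == ".pdf"


def pdf_parse_options(max_pages: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical: options that invoke the PDF engine explicitly."""
    options: Dict[str, Any] = {"parsers": ["pdf"]}
    if max_pages is not None:
        if max_pages < 1:
            raise ValueError("max_pages must be a positive integer")
        # Capping pages keeps very long documents within token limits.
        options["pdfOptions"] = {"maxPages": max_pages}
    return options
```

Usage: with `args = {"filePath": path, "formats": ["markdown"]}`, call `args.update(pdf_parse_options(max_pages=50))` when `looks_like_pdf(path)` is true; 50 is an arbitrary illustrative cap.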
Hosted phase 1 example:
{
  "name": "firecrawl_parse",
  "arguments": {
    "filePath": "/absolute/path/to/document.pdf",
    "contentType": "application/pdf",
    "formats": ["markdown"],
    "parsers": ["pdf"],
    "zeroDataRetention": true
  }
}
Hosted phase 2 example:
{
  "name": "firecrawl_parse",
  "arguments": {
    "uploadRef": "upload-ref-from-phase-1",
    "formats": ["markdown"],
    "parsers": ["pdf"],
    "zeroDataRetention": true
  }
}
Returns: in hosted phase 1, upload instructions; otherwise, a parsed document containing markdown, html, links, summary, json, or query results, depending on the requested formats.
Input Schema
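| Name | Required | Description | Default |
|---|---|---|---|
| proxy | No | Proxy mode. Only "auto" and "basic" are supported. | |
| maxAge | No | | |
| formats | No | Output formats, e.g. markdown, html, links, summary, json, or query. | |
| parsers | No | Set to ["pdf"] to invoke the PDF engine explicitly. | |
| filePath | Yes | Absolute or relative path to a local file to parse. Supported: .html, .htm, .xhtml, .pdf, .docx, .doc, .odt, .rtf, .xlsx, .xls. In hosted phase 2, pass uploadRef instead. | |
| redactPII | No | Set to true to return content with personally identifiable information redacted. | |
| pdfOptions | No | PDF-specific options, e.g. maxPages to cap the number of pages parsed. | |
| contentType | No | Optional MIME type override. If omitted, the server infers the file kind from the extension. | |
| excludeTags | No | | |
| includeTags | No | | |
| jsonOptions | No | | |
| queryOptions | No | | |
| storeInCache | No | | |
| onlyMainContent | No | | |
| zeroDataRetention | No | | |
| removeBase64Images | No | | |
| skipTlsVerification | No | | |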