convert_pdf_to_markdown
Convert PDF files to markdown format for LLM processing, extracting text and saving images from documents.
Instructions
Converts a PDF file to markdown format via pymupdf4llm. See pymupdf.readthedocs.io/en/latest/pymupdf4llm for more. The file_path, image_path, and save_path parameters must all be absolute paths, not relative ones. The tool also saves the document's images in the image_path directory. For larger PDF files, use save_path to save the markdown file, then read it in parts.
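Reading a saved markdown file in parts could look like the sketch below. It is an illustration only: `read_markdown_chunk` and its `offset` and `limit` parameters are hypothetical names, not part of the tool, and the 10,000-character default is simply borrowed from the handler's truncation limit.

```python
from pathlib import Path


def read_markdown_chunk(markdown_path: str, offset: int = 0, limit: int = 10_000) -> str:
    """Return up to `limit` characters of a saved markdown file, starting at `offset`.

    Hypothetical helper: the tool itself only writes the file and returns its path.
    """
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    with Path(markdown_path).open("r", encoding="utf-8") as f:
        f.read(offset)  # skip the characters before the requested window
        return f.read(limit)
```

Calling it repeatedly with an increasing `offset` walks through the file one window at a time; an empty string signals that the end of the file has been reached.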
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | Absolute path to the PDF file to convert | |
| image_path | No | Optional. Absolute path to the directory to save the images. If not provided, the images will be saved in the same directory as the PDF file. | |
| save_path | No | Optional. Absolute path to the directory to save the markdown file. If provided, will return the path to the markdown file. If not provided, will return the markdown string. | |
Implementation Reference
- pymupdf4llm_mcp/app.py:20-76 (handler): The handler function that executes the convert_pdf_to_markdown tool. It validates paths, converts the PDF to markdown using pymupdf4llm.to_markdown, and handles image saving, optional markdown file saving, truncation of long content, and errors.

  ```python
  def convert_pdf_to_markdown(
      file_path: Annotated[str, Field(description="Absolute path to the PDF file to convert")],
      image_path: Annotated[
          str | None,
          Field(
              description="Optional. Absolute path to the directory to save the images. "
              "If not provided, the images will be saved in the same directory as the PDF file."
          ),
      ] = None,
      save_path: Annotated[
          str | None,
          Field(
              description="Optional. Absolute path to the directory to save the markdown file. "
              "If provided, will return the path to the markdown file. "
              "If not provided, will return the markdown string."
          ),
      ] = None,
  ) -> dict[str, Any]:
      file_path: Path = Path(file_path).expanduser().resolve()
      if not file_path.exists():
          return {
              "error": f"File not found: {file_path}",
              "success": False,
          }
      image_path = Path(image_path).expanduser().resolve() if image_path else file_path.parent
      try:
          content = pymupdf4llm.to_markdown(file_path, write_images=True, image_path=image_path.as_posix())
          if save_path:
              save_path: Path = Path(save_path).expanduser().resolve()
              save_path.parent.mkdir(parents=True, exist_ok=True)
              content = pymupdf4llm.to_markdown(file_path, write_images=True, image_path=image_path.as_posix())
              with open(save_path, "w", encoding="utf-8") as f:
                  f.write(content)
              return {
                  "success": True,
                  "markdown_path": save_path.expanduser().resolve().absolute().as_posix(),
              }
          else:
              if len(content) > 10000:
                  # Truncate the content to avoid too long response
                  content = content[:10000] + "\n\n... (truncated)"
                  tips = (
                      "The content is too long. Please use `save_path` to save the markdown file and read it partially."
                  )
              else:
                  tips = "All content is returned."
              return {
                  "success": True,
                  "markdown_content": content,
                  "tips": tips,
              }
      except Exception as e:
          return {
              "error": f"Failed to convert PDF to markdown: {e!s}",
              "success": False,
          }
  ```
- pymupdf4llm_mcp/app.py:11-19 (registration): The @mcp.tool decorator registers the convert_pdf_to_markdown function as an MCP tool, providing a detailed description of its usage and parameters.

  ```python
  @mcp.tool(
      description=(
          "Converts a PDF file to markdown format via pymupdf4llm. "
          "This is the best tool to use for reading PDF file. You should always use this tool first. "
          "The `file_path`, `image_path`, and `save_path` parameters should be the absolute path to the PDF file, not a relative path. "
          "This tool will also convert the PDF to images and save them in the `image_path` directory. "
          "For larger PDF files, use `save_path` to save the markdown file then read it partially. "
      )
  )
  ```
- pymupdf4llm_mcp/app.py:21-36 (schema): Pydantic Field annotations define the input schema for the tool parameters: file_path (required str), image_path (optional str), save_path (optional str).

  ```python
  file_path: Annotated[str, Field(description="Absolute path to the PDF file to convert")],
  image_path: Annotated[
      str | None,
      Field(
          description="Optional. Absolute path to the directory to save the images. "
          "If not provided, the images will be saved in the same directory as the PDF file."
      ),
  ] = None,
  save_path: Annotated[
      str | None,
      Field(
          description="Optional. Absolute path to the directory to save the markdown file. "
          "If provided, will return the path to the markdown file. "
          "If not provided, will return the markdown string."
      ),
  ] = None,
  ```
- pymupdf4llm_mcp/app.py:8-8 (registration): Initialization of the FastMCP server instance where tools are registered.

  ```python
  mcp = FastMCP("pymupdf4llm-mcp")
  ```
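The handler returns one of three dictionary shapes: an error, a saved file path, or inline markdown that may have been truncated. A caller might tell them apart along the lines of the sketch below. `describe_result` is a hypothetical helper, and detecting truncation by the trailing marker is an assumption drawn from the handler listing rather than a documented contract.

```python
from typing import Any

# Marker the handler appends when it cuts inline content at 10000 characters.
TRUNCATION_MARKER = "\n\n... (truncated)"


def describe_result(result: dict[str, Any]) -> str:
    """Summarize a convert_pdf_to_markdown result (hypothetical helper)."""
    if not result.get("success"):
        return f"failed: {result.get('error', 'unknown error')}"
    if "markdown_path" in result:
        # save_path was supplied, so the markdown lives on disk.
        return f"saved to {result['markdown_path']}"
    content = result.get("markdown_content", "")
    if content.endswith(TRUNCATION_MARKER):
        return "inline markdown (truncated; retry with save_path)"
    return "inline markdown (complete)"
```

Checking `success` first keeps the error case from being mistaken for an empty document, since an error result carries neither `markdown_path` nor `markdown_content`.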