Skip to main content
Glama

convert_pdf_to_markdown

Convert PDF files to markdown format for LLM processing, extracting text and saving images from documents.

Instructions

Converts a PDF file to markdown format via pymupdf4llm. See pymupdf.readthedocs.io/en/latest/pymupdf4llm for more. The file_path, image_path, and save_path parameters should be the absolute path to the PDF file, not a relative path. This tool will also convert the PDF to images and save them in the image_path directory. For larger PDF files, use save_path to save the markdown file then read it partially.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
file_pathYesAbsolute path to the PDF file to convert
image_pathNoOptional. Absolute path to the directory to save the images. If not provided, the images will be saved in the same directory as the PDF file.
save_pathNoOptional. Absolute path to the directory to save the markdown file. If provided, will return the path to the markdown file. If not provided, will return the markdown string.

Implementation Reference

  • The handler function that executes the convert_pdf_to_markdown tool. It validates paths, converts PDF to markdown using pymupdf4llm.to_markdown, handles image saving, optional markdown file saving, truncation for long content, and error handling.
    def convert_pdf_to_markdown(
        file_path: Annotated[str, Field(description="Absolute path to the PDF file to convert")],
        image_path: Annotated[
            str | None,
            Field(
                description="Optional. Absolute path to the directory to save the images. "
                "If not provided, the images will be saved in the same directory as the PDF file."
            ),
        ] = None,
        save_path: Annotated[
            str | None,
            Field(
                description="Optional. Absolute path to the directory to save the markdown file. "
                "If provided, will return the path to the markdown file. "
                "If not provided, will return the markdown string."
            ),
        ] = None,
    ) -> dict[str, Any]:
        file_path: Path = Path(file_path).expanduser().resolve()
        if not file_path.exists():
            return {
                "error": f"File not found: {file_path}",
                "success": False,
            }
        image_path = Path(image_path).expanduser().resolve() if image_path else file_path.parent
        try:
            content = pymupdf4llm.to_markdown(file_path, write_images=True, image_path=image_path.as_posix())
            if save_path:
                save_path: Path = Path(save_path).expanduser().resolve()
                save_path.parent.mkdir(parents=True, exist_ok=True)
                content = pymupdf4llm.to_markdown(file_path, write_images=True, image_path=image_path.as_posix())
                with open(save_path, "w", encoding="utf-8") as f:
                    f.write(content)
                return {
                    "success": True,
                    "markdown_path": save_path.expanduser().resolve().absolute().as_posix(),
                }
            else:
                if len(content) > 10000:
                    # Truncate the content to avoid too long response
                    content = content[:10000] + "\n\n... (truncated)"
                    tips = (
                        "The content is too long. Please use `save_path` to save the markdown file and read it partially."
                    )
                else:
                    tips = "All content is returned. "
    
                return {
                    "success": True,
                    "markdown_content": content,
                    "tips": tips,
                }
        except Exception as e:
            return {
                "error": f"Failed to convert PDF to markdown: {e!s}",
                "success": False,
            }
  • The @mcp.tool decorator registers the convert_pdf_to_markdown function as an MCP tool, providing a detailed description of its usage and parameters.
    @mcp.tool(
        description=(
            "Converts a PDF file to markdown format via pymupdf4llm. "
            "This is the best tool to use for reading PDF file. You should always use this tool first. "
            "The `file_path`, `image_path`, and `save_path` parameters should be the absolute path to the PDF file, not a relative path. "
            "This tool will also convert the PDF to images and save them in the `image_path` directory. "
            "For larger PDF files, use `save_path` to save the markdown file then read it partially. "
        )
    )
  • Pydantic Field annotations define the input schema for the tool parameters: file_path (required str), image_path (optional str), save_path (optional str).
    file_path: Annotated[str, Field(description="Absolute path to the PDF file to convert")],
    image_path: Annotated[
        str | None,
        Field(
            description="Optional. Absolute path to the directory to save the images. "
            "If not provided, the images will be saved in the same directory as the PDF file."
        ),
    ] = None,
    save_path: Annotated[
        str | None,
        Field(
            description="Optional. Absolute path to the directory to save the markdown file. "
            "If provided, will return the path to the markdown file. "
            "If not provided, will return the markdown string."
        ),
    ] = None,
  • Initialization of the FastMCP server instance where tools are registered.
    mcp = FastMCP("pymupdf4llm-mcp")
Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ai-zerolab/pymupdf4llm-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server