thanharmstrong86

MCP PDF to Markdown Converter

MCP PDF to Markdown Converter and Crawler 📄➡️📝

This project converts PDF documents to Markdown and crawls web content using a Model Context Protocol (MCP) architecture. It comprises two server modules, convert_pdf for PDF upload and conversion and crawl_mcp for web crawling, along with a client application that orchestrates operations using a reactive agent.

Project Structure

The core components of this project are:

  • convert_pdf: A FastMCP server (running on http://127.0.0.1:8001) responsible for handling PDF file uploads and converting them to Markdown. It includes two endpoints:

    • /upload/mcp/upload_pdf_tool: Handles PDF file uploads via multipart form data.

    • /mcp: Converts uploaded PDFs to Markdown using the convert_pdf_to_markdown_tool.

  • crawl_mcp: A server module for crawling web content. For details on running this module, see src/crawl_mcp/README.md.

  • client: A client application that acts as an intelligent agent. It uses LangChain and LangGraph to interact with the MCP servers, upload PDFs, and trigger conversions or crawling tasks.
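As a rough illustration of the upload endpoint described above, the sketch below builds a multipart/form-data POST request for http://127.0.0.1:8001/upload/mcp/upload_pdf_tool using only the standard library. The form field name "file" and the helper name build_upload_request are assumptions made for this example; check src/convert_pdf/README.md or upload_api.py for the field the server actually expects.

```python
import uuid
from urllib import request

# URL assembled from the host, port, and path listed above.
UPLOAD_URL = "http://127.0.0.1:8001/upload/mcp/upload_pdf_tool"


def build_upload_request(filename: str, data: bytes, field: str = "file") -> request.Request:
    """Build (but do not send) a multipart/form-data POST carrying one PDF.

    The form field name ("file") is an assumption, not taken from the project.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return request.Request(UPLOAD_URL, data=body, headers=headers, method="POST")
```

With the convert_pdf server running, the request could be sent with request.urlopen(build_upload_request("sample.pdf", pdf_bytes)). The conversion step on /mcp is an MCP tool call and is better driven through an MCP client, such as the project's own client application.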


Getting Started 🚀

Follow these steps to set up and run the project:

1. Prerequisites

  • Python 3.9+

  • uv: A fast Python package installer and resolver. Install it via pip if not already present:

    pip install uv

2. Project Setup

  1. Clone the repository (if applicable) or navigate to your project root.

    cd /path/to/your/MCP
  2. Create and Sync Virtual Environment: uv will create a .venv directory and install all necessary dependencies based on your pyproject.toml.

    uv sync
  3. Activate the Virtual Environment: This ensures all commands run within your isolated environment.

    • macOS/Linux:

      source .venv/bin/activate
    • Windows (Command Prompt):

      .venv\Scripts\activate.bat
    • Windows (PowerShell):

      .venv\Scripts\Activate.ps1
  4. Create .env file: Create a file named .env in the project root (MCP/) and add your Google Gemini API key:

    GEMINI_API_KEY_2="YOUR_GEMINI_API_KEY_HERE"

    Replace "YOUR_GEMINI_API_KEY_HERE" with your actual API key.
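To catch a missing or placeholder key before starting the servers, a small check like the one below can help. It is a minimal standard-library sketch, not the project's own loading code (which may rely on a library such as python-dotenv); the helper names load_env_file and require_gemini_key are hypothetical.

```python
import os
from pathlib import Path

PLACEHOLDER = "YOUR_GEMINI_API_KEY_HERE"


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blanks and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes such as "..." or '...'.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_gemini_key(path: str = ".env") -> str:
    """Return GEMINI_API_KEY_2 from the environment or the .env file, or raise."""
    key = os.environ.get("GEMINI_API_KEY_2") or load_env_file(path).get("GEMINI_API_KEY_2", "")
    if not key or key == PLACEHOLDER:
        raise RuntimeError("GEMINI_API_KEY_2 is missing or still set to the placeholder")
    return key
```

Calling require_gemini_key() from the project root raises a clear error when the key has not been filled in.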

3. Running the Modules

Each module has its own setup and running instructions. Refer to the module-specific READMEs (src/convert_pdf/README.md and src/crawl_mcp/README.md) for details.

4. Docker

The convert_pdf module can be run using Docker Compose with a single service:

  • Service: mcp-convert-server (port 8001)

  • Functionality: Handles PDF uploads and conversion to Markdown.

To run:

cd src/convert_pdf
docker-compose up --build -d

For crawl_mcp Docker instructions, refer to src/crawl_mcp/README.md.

5. Testing with Client

To test the modules, use the client application located in src/client/. Ensure the relevant servers are running, then run the client script you need (replace <script> with its file name):

uv run python src/client/<script>.py

For example, to test the convert_pdf module, place a PDF file (e.g., input/sample.pdf) in the project's input directory and run:

uv run python src/client/test_client.py

For testing crawl_mcp, refer to its README for specific client instructions.
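Before running the client, it can save time to confirm the two preconditions above: the sample PDF exists and the convert_pdf server accepts connections on port 8001. The following is a minimal sketch under those assumptions; the function name preflight is hypothetical and not part of the project.

```python
import socket
from pathlib import Path


def preflight(pdf_path: str = "input/sample.pdf",
              host: str = "127.0.0.1",
              port: int = 8001,
              timeout: float = 1.0) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems: list[str] = []

    pdf = Path(pdf_path)
    if not pdf.is_file():
        problems.append(f"missing PDF: {pdf}")
    else:
        # PDF files begin with the magic bytes "%PDF-".
        with pdf.open("rb") as fh:
            if fh.read(5) != b"%PDF-":
                problems.append(f"not a PDF file: {pdf}")

    try:
        # A successful TCP connect only shows that something is listening.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        problems.append(f"cannot reach {host}:{port} ({exc})")

    return problems
```

Run from the project root, preflight() returning an empty list suggests test_client.py has what it needs. It does not verify that the listener on port 8001 is actually the convert_pdf server.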

6. Directory Structure

MCP/
├── src/
│   ├── convert_pdf/
│   │   ├── README.md
│   │   ├── src/
│   │   │   ├── __init__.py
│   │   │   ├── convert_mcp.py
│   │   │   ├── pdf2md.py
│   │   │   └── upload_api.py
│   │   ├── uploaded/
│   │   ├── output/
│   │   ├── processed_files.json
│   │   ├── docker-compose.yml
│   │   ├── Dockerfile
│   │   ├── pyproject.toml
│   │   └── uv.lock
│   ├── crawl_mcp/
│   │   ├── README.md
│   │   └── (other module files)
│   ├── client/
│   │   ├── test_client.py
│   │   └── (other client scripts)
├── .env
└── README.md

Notes

  • Ensure the .env file is correctly configured with your API key.

  • The convert_pdf module serves both upload and conversion on port 8001, so a single service covers both steps.

  • For detailed module configurations, refer to the respective READMEs.

  • If you encounter issues (e.g., ClientDisconnect or import errors), check the container logs from src/convert_pdf, where docker-compose.yml lives:

    docker-compose logs mcp-convert-server