PDF Reader MCP Server

README.md•6.26 KiB

# pdf-reader MCP server An MCP server for reading PDFs ## Components ### Resources The server provides academic-aware PDF resources with: - Custom `file://` URI scheme for accessing individual PDFs - Academic structure detection and key section extraction - Metadata enriched with document type classification - Resources optimized for agent understanding ### Academic Prompts The server provides specialized academic analysis prompts: - **summarize-academic-paper**: Intelligent academic paper summarization - Required "file_path" argument for PDF location - Optional "focus" argument (general/methodology/results/implications) - Generates prompts with key sections, citations, and metadata - **analyze-research-methodology**: Deep methodology analysis - Required "file_path" argument for PDF location - Focuses on research design, data collection, and statistical methods ### Enhanced Tools **Basic PDF Processing:** - **load-pdf**: Load and cache a PDF file for processing - **get-metadata**: Get PDF metadata and document information - **extract-images**: Extract embedded images with metadata - **render-page**: Render PDF pages as high-resolution images **Academic Enhancements:** - **extract-academic-text**: Text extraction with proper reading order and math formula preservation - **detect-sections**: Identify academic sections (Abstract, Introduction, Methods, Results, etc.) - **extract-abstract**: Specifically extract the abstract section - **extract-key-sections**: Get key sections optimized for agent understanding - **extract-citations**: Parse in-text citations and reference lists - **chunk-content**: Break content into agent-friendly semantic chunks - **analyze-document-structure**: Comprehensive academic document analysis ## Configuration This PDF reader MCP server provides comprehensive PDF processing capabilities including: - Full text extraction from any PDF - High-resolution image extraction - Table detection and extraction - Annotation and comment extraction - PDF metadata retrieval - Page rendering to images - Document structure analysis ## Installation & Setup ### Prerequisites - Python 3.13 or higher - `uv` package manager (install with `pip install uv`) ### Install Dependencies ```bash uv sync ``` ### IDE Integration #### VSCode with MCP Extension 1. Install the [MCP VSCode Extension](https://marketplace.visualstudio.com/items?itemName=modelcontextprotocol.mcp) 2. Open your VSCode settings (`.vscode/settings.json`) and add: ```json { "mcp.servers": { "pdf-reader": { "url": "http://localhost:8000/sse", "description": "PDF reader with full extraction capabilities" } } } ``` #### WindSurf IDE 1. Open WindSurf settings 2. Navigate to Extensions → MCP Servers 3. Add a new server configuration: ```json { "name": "pdf-reader", "url": "http://localhost:8000/sse", "description": "Comprehensive PDF processing server" } ``` #### Cursor IDE 1. Open Cursor settings (Cmd/Ctrl + ,) 2. Search for "MCP" or navigate to Extensions → MCP 3. Add server configuration: ```json { "mcpServers": { "pdf-reader": { "url": "http://localhost:8000/sse", "description": "PDF reader with text, image, and table extraction" } } } ``` #### Claude Desktop On MacOS: `~/Library/Application\ Support/Claude/claude_desktop_config.json` On Windows: `%APPDATA%/Claude/claude_desktop_config.json` ```json { "mcpServers": { "pdf-reader": { "url": "http://localhost:8000/sse", "description": "PDF reader with comprehensive extraction capabilities" } } } ``` ### Starting the Server Before using the PDF reader in any IDE, start the HTTP server: ```bash # Navigate to the pdf-reader directory cd /path/to/your/pdf-reader # Start the server uv run pdf-reader ``` The server will start on `http://localhost:8000` with the MCP SSE endpoint available at `/sse` for all IDEs to connect to. ## Usage Examples Once configured in your IDE, you can use the PDF reader with natural language commands: ### Basic PDF Processing ``` "Load the research paper at /path/to/paper.pdf and extract all the text" "Get metadata for the PDF document at /documents/report.pdf" "Extract all images from the PDF on page 3" ``` ### Advanced Analysis ``` "Summarize the PDF document in technical style focusing on methodology" "Analyze the structure of this PDF and tell me about its organization" "Extract all tables from the document and show me the data" ``` ### Visual Processing ``` "Render page 5 of the PDF as a high-resolution image" "Extract all annotations and comments from this PDF" "Show me all the images embedded in this document" ``` ### Available Tools | Tool | Description | Parameters | |------|-------------|------------| | `load-pdf` | Load and cache PDF | `file_path`, optional `name` | | `extract-text` | Extract text content | `file_path`, optional `page` | | `extract-images` | Extract embedded images | `file_path`, optional `page` | | `get-metadata` | Get document metadata | `file_path` | | `extract-tables` | Extract table data | `file_path`, optional `page` | | `extract-annotations` | Extract comments/highlights | `file_path` | | `render-page` | Render page as image | `file_path`, `page`, optional `dpi` | ## Development ### Building and Publishing To prepare the package for distribution: 1. Sync dependencies and update lockfile: ```bash uv sync ``` 2. Build package distributions: ```bash uv build ``` This will create source and wheel distributions in the `dist/` directory. 3. Publish to PyPI: ```bash uv publish ``` Note: You'll need to set PyPI credentials via environment variables or command flags: - Token: `--token` or `UV_PUBLISH_TOKEN` - Or username/password: `--username`/`UV_PUBLISH_USERNAME` and `--password`/`UV_PUBLISH_PASSWORD` ### Debugging Since MCP servers run over stdio, debugging can be challenging. For the best debugging experience, we strongly recommend using the [MCP Inspector](https://github.com/modelcontextprotocol/inspector). You can launch the MCP Inspector via [`npm`](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm) with this command: ```bash npx @modelcontextprotocol/inspector uv --directory /Users/cloudchase/Desktop/AverageJoesLab/mcp-servers/pdf-reader run pdf-reader ``` Upon launching, the Inspector will display a URL that you can access in your browser to begin debugging.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/averagejoeslab/pdf-reader-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

README.md•6.26 KiB