PDFDashboardWithMCP

by dakshp26

PDF Dashboard With MCP

Upload PDFs, extract text with PyMuPDF or GLM-OCR (Ollama), and ask questions against the document with a local Ollama model. No API keys.

Features

  • PDF extraction: PyMuPDF for text layers; GLM-OCR when the PDF is scanned or image-only

  • Per-document RAG: each upload gets its own Chroma collection

  • Local chat: LangChain agent with inline citations; choose any installed Ollama model from the dropdown

  • Markdown viewer: read extracted text, preview chunks, download markdown
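The choice between the text layer and OCR in the first feature can be sketched as a per-document heuristic. This is an illustration only: the function name `needs_ocr` and the `min_chars_per_page` threshold are assumptions, not taken from the project, and the page texts would come from whichever extractor is in use (for example PyMuPDF).

```python
from typing import Sequence


def needs_ocr(page_texts: Sequence[str], min_chars_per_page: int = 20) -> bool:
    """Return True when the extracted text layer looks empty or near-empty.

    Hypothetical heuristic: a scanned or image-only PDF usually yields
    little or no text per page, so the average number of non-whitespace
    characters per page is compared against a threshold.
    """
    if not page_texts:
        return True  # nothing extracted at all: fall back to OCR
    total = sum(len("".join(text.split())) for text in page_texts)
    return total / len(page_texts) < min_chars_per_page


if __name__ == "__main__":
    print(needs_ocr(["", "  \n", ""]))                 # blank pages
    print(needs_ocr(["A page with a real text layer." * 3]))
```

A per-page variant (OCR only the pages that fall below the threshold) would follow the same idea; which granularity the project uses is not stated here.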

Prerequisites

  • uv, used below to install dependencies and run the app

  • Ollama, installed and running locally

Setup

1. Clone the repository

git clone https://github.com/dakshp26/PDFDashboardWithMCP.git
cd PDFDashboardWithMCP

2. Install dependencies

uv sync

3. Pull Ollama models

ollama pull qwen2.5:3b       # chat (or another chat model)
ollama pull nomic-embed-text # embeddings
ollama pull glm-ocr          # OCR for scanned PDFs

4. Run the app

uv run streamlit run app/main.py

Open http://localhost:8501 in your browser.

Usage

  1. Upload PDF: open Upload PDF, select a file, wait for extraction to finish

  2. Chat: open Chat, pick the PDF and an Ollama model, ask questions
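Answers in the Chat step carry inline citations. One common way to make that possible is to number the retrieved chunks before handing them to the model, so the reply can refer to `[1]`, `[2]`, and so on. The sketch below assumes each chunk is a `(document, page, text)` tuple; that shape and the name `build_context` are hypothetical and may differ from what the project's agent does.

```python
from typing import List, Tuple


def build_context(chunks: List[Tuple[str, int, str]]) -> str:
    """Format retrieved chunks as numbered sources for a prompt.

    Each chunk is (document_name, page_number, text); this tuple layout
    is an assumption made for illustration.
    """
    lines = []
    for index, (doc, page, text) in enumerate(chunks, start=1):
        lines.append(f"[{index}] ({doc}, p. {page}) {text.strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_context([
        ("report.pdf", 2, " Revenue grew in the third quarter. "),
        ("report.pdf", 5, "Costs were flat."),
    ]))
```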

Project Structure

app/
├── main.py                       # Entry point, page navigation
├── app_pages/
│   ├── landing.py                # Home page
│   ├── process_pdf_upload.py     # Upload + pipeline UI
│   ├── pdf_library.py            # Browse uploaded PDFs (read-only viewer)
│   └── process_pdf.py            # Viewer + chat UI
└── process_pdf/
    ├── extract.py                 # PDF → Markdown (pymupdf4llm + GLM-OCR)
    ├── pipeline.py                # Extraction pipeline with live progress
    ├── rag.py                     # Chunking, embeddings, Chroma persistence
    └── agent.py                   # LangChain agent with retriever tool
mcp_server/
└── server.py                     # MCP server (list_documents, get_document)
data/                             # Runtime data (gitignored)
├── process_pdf/                  # Saved PDFs and extracted markdown
└── process_chroma/               # Chroma vector collections (one per PDF)

NOTE

File-by-file breakdown, execution order, and data flow: see APP_STRUCTURE.md.
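The tree lists rag.py as the place where chunking happens. As a rough picture of what chunking means, here is a minimal fixed-size splitter with overlap. It is a generic sketch, not the project's splitter: the name `chunk_text` and the default sizes are assumptions, and a LangChain text splitter would typically split on separators rather than raw character offsets.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share `overlap` characters so that a sentence cut
    at a boundary still appears whole in one of the two chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```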

Pages

| Page | What it does |
| --- | --- |
| Home | Links and setup summary |
| Upload PDF | Run extraction (text layer, OCR fallback, chunking, embedding); download markdown |
| PDF Library | Open past uploads; view markdown and chunk previews without re-running extraction |
| Chat | Query an indexed PDF with citations |

Extraction progress shows in an st.status block. After processing, the Chroma collection lives in data/process_chroma/ and loads on the next run without re-extracting.
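Reloading a collection on a later run depends on mapping each PDF to the same collection name every time. A minimal sketch of such a mapping follows. The function `collection_name_for` is hypothetical, and the constraints it enforces (short names, alphanumeric first and last character, only letters, digits, hyphens, and underscores) are a conservative reading of Chroma's collection naming rules; check the Chroma documentation for the version in use.

```python
import hashlib
import re
from pathlib import Path


def collection_name_for(pdf_filename: str) -> str:
    """Derive a stable, conservative collection name from a PDF filename.

    A short hash of the full filename is appended so that two files whose
    names sanitize to the same slug still get distinct collections.
    """
    stem = Path(pdf_filename).stem.lower()
    # Replace every run of disallowed characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9_-]+", "-", stem).strip("-_")
    slug = slug[:50].rstrip("-_") or "pdf"
    digest = hashlib.sha1(pdf_filename.encode("utf-8")).hexdigest()[:8]
    return f"{slug}-{digest}"


if __name__ == "__main__":
    print(collection_name_for("My Report (v2).pdf"))
```

Because the name is a pure function of the filename, the same upload resolves to the same collection across runs.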

MCP Server

Two tools for MCP clients (Claude Desktop, Cursor, Claude Code):

  • list_documents: indexed document collections

  • get_document(document, query): semantic search over a collection
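Semantic search of the kind `get_document` performs comes down to ranking stored chunk embeddings by similarity to the query embedding. The toy sketch below does this in memory with cosine similarity. It stands in for what the vector store does internally and is not the project's code: `cosine`, `top_k`, and the hand-written vectors are illustrative assumptions, and real embeddings would come from the embedding model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    chunks: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k chunk texts most similar to the query, best first."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    toy_chunks = {"intro": [1.0, 0.0], "methods": [0.0, 1.0], "summary": [1.0, 1.0]}
    for text, score in top_k([1.0, 0.0], toy_chunks, k=2):
        print(f"{score:.3f}  {text}")
```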

For Claude Desktop, add to claude_desktop_config.json (Windows: %APPDATA%\Claude\claude_desktop_config.json) or use .mcp.json in the project root:

{
  "mcpServers": {
    "PDFDashboardWithMCP": {
      "command": "uv",
      "args": ["run", "--directory", "/absolute/path/to/PDFDashboardWithMCP", "mcp_server/server.py"]
    }
  }
}

For Cursor, add to .cursor/mcp.json in the project root or to the global ~/.cursor/mcp.json:

{
  "mcpServers": {
    "PDFDashboardWithMCP": {
      "command": "uv",
      "args": ["run", "--directory", "/absolute/path/to/PDFDashboardWithMCP", "mcp_server/server.py"]
    }
  }
}

For Claude Code, a project-scoped .mcp.json in the repo root keeps the server tied to this repo:

{
  "mcpServers": {
    "PDFDashboardWithMCP": {
      "command": "uv",
      "args": ["run", "--directory", "/absolute/path/to/PDFDashboardWithMCP", "mcp_server/server.py"]
    }
  }
}

Claude Code reads .mcp.json when you open the project.

Replace /absolute/path/to/PDFDashboardWithMCP with your clone path.
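Instead of editing the path by hand, the same JSON can be generated with the clone path filled in. This is an optional convenience sketch; the helper name `build_mcp_config` is hypothetical, while the keys and arguments mirror the configuration shown above.

```python
import json
from pathlib import Path


def build_mcp_config(repo_path: str) -> dict:
    """Build the mcpServers entry with an absolute path to the clone."""
    repo = Path(repo_path).expanduser().resolve()
    return {
        "mcpServers": {
            "PDFDashboardWithMCP": {
                "command": "uv",
                "args": ["run", "--directory", str(repo), "mcp_server/server.py"],
            }
        }
    }


if __name__ == "__main__":
    # Run from the repository root; paste the output into the client's config.
    print(json.dumps(build_mcp_config("."), indent=2))
```

Letting `json.dumps` write the file content also takes care of escaping, which matters on Windows, where backslashes in a path must be doubled inside JSON strings.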

Ollama must be running with nomic-embed-text pulled before the MCP server can load collections.
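A quick way to confirm this before starting the server is to ask the local Ollama instance which models it has. The sketch below assumes Ollama's default address `http://localhost:11434` and its `/api/tags` endpoint for listing installed models; the helper names are hypothetical, and an untagged model name is treated as `:latest`.

```python
import json
import urllib.request
from typing import Iterable, List


def _normalize(name: str) -> str:
    """Treat an untagged model name as the ':latest' tag."""
    return name if ":" in name else f"{name}:latest"


def missing_models(installed: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required model names that are not installed."""
    have = {_normalize(name) for name in installed}
    return [name for name in required if _normalize(name) not in have]


def installed_models(host: str = "http://localhost:11434") -> List[str]:
    """List model names reported by a running Ollama instance.

    Raises urllib.error.URLError if Ollama is not reachable at `host`.
    """
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as response:
        payload = json.load(response)
    return [model["name"] for model in payload.get("models", [])]


# Example (requires a running Ollama instance):
#   print(missing_models(installed_models(), ["nomic-embed-text"]))
```

An empty list means the embedding model is available; a connection error means Ollama itself is not running.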

Tech Stack

| Component | Library |
| --- | --- |
| UI | Streamlit |
| PDF extraction | langchain-pymupdf4llm, PyMuPDF |
| OCR fallback | Ollama glm-ocr |
| Embeddings | Ollama nomic-embed-text |
| Vector store | Chroma (langchain-chroma) |
| LLM / agent | Ollama chat model (e.g. qwen2.5:3b), LangChain |
| Package manager | uv |
| MCP server | mcp[cli] |
