MCP-based Academic Paper Retrieval RAG System
Enables GitHub Copilot to query a private knowledge base of academic papers via standard Tool Calling, providing inline citations.
A production-oriented, fully pluggable Retrieval-Augmented Generation (RAG) system exposed as an MCP (Model Context Protocol) server — enabling AI coding assistants like GitHub Copilot and Claude Desktop to query private knowledge bases via standard Tool Calling.
Overview
Engineers on R&D teams frequently search academic literature for technical research, but papers are scattered across internal file systems, and keyword search neither captures semantic intent nor integrates with AI coding assistant workflows.
This project solves that by building an MCP-based academic paper retrieval RAG system: ingest your PDFs once, then ask questions directly from GitHub Copilot or Claude Desktop via standard Tool Calling. Responses include inline citations and are grounded in your private document collection. The system supports both API-based LLM backends and fully local Ollama deployment. When configured with local embedding and local LLM backends, it can run without external network dependency, making it suitable for privacy-sensitive or air-gapped environments.
Benchmark results (10 papers, 1,104 chunks, 100 Golden QA pairs):
| Metric | Hybrid Search (baseline) | + BGE Reranker v2-m3 |
|---|---|---|
| Hit Rate@10 | 89% | 89% |
| MRR | 0.61 | 0.83 (+36%) |
| NDCG@10 | 0.68 | 0.84 (+24%) |
Ablation: replacing BGE Reranker v2-m3 with a general-purpose MS MARCO Cross-Encoder degraded MRR on academic text — domain-aligned reranking matters.
Architecture
             PDF Documents
                   ↓
┌─────────────────────────────────────┐
│         Ingestion Pipeline          │
│  Load → Split → Transform → Embed   │
│  → Upsert                           │
└──────────────────┬──────────────────┘
                   │
         ┌─────────┴──────────┐
         │                    │
 ChromaDB (Dense)    BM25 Index (Sparse)
  BGE-M3 vectors      Term frequencies
         │                    │
         └─────────┬──────────┘
                   │ RRF Fusion
                   ↓
          BGE Reranker v2-m3
                   ↓
┌─────────────────────────────────────┐
│             MCP Server              │
│   JSON-RPC 2.0 + Stdio Transport    │
│                                     │
│  • query_knowledge_hub              │
│  • list_collections                 │
│  • get_document_summary             │
└─────────────────────────────────────┘
         ↓                    ↓
  GitHub Copilot       Claude Desktop
Key Features
Two-Stage Hybrid Retrieval
Coarse ranking: BGE-M3 dense vectors + BM25 sparse retrieval run in parallel, fused via Reciprocal Rank Fusion (RRF)
Fine ranking: BGE Reranker v2-m3 (same model family as the embedder) re-scores the top candidates — domain-aligned reranking for academic text
Graceful fallback: reranker failure automatically falls back to fusion order with fallback=True metadata
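Reciprocal Rank Fusion gives each chunk a score equal to the sum of 1 / (k + rank) over every ranked list that contains it, with 1-based ranks. The minimal sketch below shows that step and the reranker fallback described above. It assumes chunks are identified by string IDs. The function names, the reranker callable signature and `k = 60` (the constant used in the original RRF paper) are illustrative assumptions and do not come from this project's code.

```python
from typing import Callable, Dict, List, Optional, Tuple


def rrf_fuse(ranked_lists: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse best-first lists of chunk IDs with Reciprocal Rank Fusion.

    Ranks are 1-based; k=60 is an assumed default, not a project setting.
    """
    scores: Dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def rerank_with_fallback(
    query: str,
    fused: List[Tuple[str, float]],
    rerank: Optional[Callable[[str, List[str]], List[str]]],
    top_n: int = 10,
) -> dict:
    """Rerank the top fused candidates; on any reranker error keep fusion order."""
    candidates = [chunk_id for chunk_id, _ in fused[:top_n]]
    if rerank is None:
        # No reranker configured: fusion order is the intended result.
        return {"results": candidates, "fallback": False}
    try:
        return {"results": rerank(query, candidates), "fallback": False}
    except Exception:
        # Hypothetical behaviour mirroring the README: fall back to RRF order.
        return {"results": candidates, "fallback": True}
```

RRF uses only rank positions. Dense cosine similarities and BM25 scores therefore never need to be normalized onto a common scale before fusion.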
MCP-Compliant RAG Server
Full JSON-RPC 2.0 protocol over stdio transport
Three tools available to any MCP client:
query_knowledge_hub: semantic search with inline citations
list_collections: enumerate available knowledge bases
get_document_summary: retrieve title, summary, tags for a document
Structured multi-modal responses with citation blocks
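An MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request after the `initialize` handshake. On the stdio transport, each message is a single newline-terminated line of JSON. The sketch below only builds and frames such a request. The argument names `query` and `top_k` are hypothetical, and the real input schema comes from the server's `tools/list` response.

```python
import json
from typing import Any, Dict


def make_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Build one newline-delimited JSON-RPC 2.0 `tools/call` message for stdio."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # Without `indent`, json.dumps emits no newlines, so the message stays on
    # one line as stdio framing requires; the trailing "\n" ends the message.
    return json.dumps(message, separators=(",", ":")) + "\n"


# Hypothetical arguments: check the tool's inputSchema from tools/list.
line = make_tool_call(2, "query_knowledge_hub", {"query": "How does RRF fusion work?", "top_k": 5})
print(line, end="")
```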
Intelligent Ingestion Pipeline
5 stages: Load → Split → Transform → Embed → Upsert
pdfplumber PDF parsing with image extraction and [IMAGE: id] placeholder injection
Optional LLM-driven chunk refinement and metadata enrichment (Title / Summary / Tags)
SHA-256 content hashing for idempotent incremental ingestion — re-running on unchanged files is a no-op
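Idempotent ingestion can be sketched as hashing each file's bytes and skipping any file whose digest has already been recorded. This is a hypothetical illustration. The `seen` set stands in for whatever persistent store the project uses, and `ingest_if_changed` and the `ingest` callback are illustrative names. A real implementation would keep the digests on disk so they survive restarts.

```python
import hashlib
from pathlib import Path
from typing import Callable, Set


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file's contents, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def ingest_if_changed(path: Path, seen: Set[str], ingest: Callable[[Path], None]) -> bool:
    """Run `ingest` only for content not seen before; return True if it ran."""
    content_hash = file_sha256(path)
    if content_hash in seen:
        return False  # unchanged content: re-running is a no-op
    ingest(path)
    seen.add(content_hash)  # record only after the ingest call succeeded
    return True
```

Hashing the content rather than the path or modification time means that renaming or touching a file does not trigger re-ingestion. Any edit to the bytes does.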
Fully Pluggable Architecture
Every core component is swappable via config/settings.yaml with zero code changes:
| Component | Supported Backends |
|---|---|
| LLM | OpenAI / Azure / DeepSeek / Ollama |
| Embedding | OpenAI / Azure / Ollama (BGE-M3, nomic-embed-text, …) |
| Vector Store | ChromaDB (local persistence) |
| Reranker | BGE / Cross-Encoder / LLM / None |
| Splitter | Fixed / Recursive / Semantic (TODO) |
| Evaluator | Custom (Hit Rate / MRR / NDCG) / Ragas |
When configured with Ollama-based local LLM and embedding backends, the system can run fully locally with no data leaving the environment.
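The Tech Stack names Abstract Base Classes and the Factory Pattern. Together they suggest a swap-by-config design like the registry sketched below, which maps a config key to a backend class. Everything here is hypothetical: `BaseReranker`, `register`, `NoneReranker` and `build_reranker` are illustrative names, and a plain dict stands in for the parsed `rerank:` section of config/settings.yaml.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class BaseReranker(ABC):
    """Hypothetical abstract interface every reranker backend implements."""

    @abstractmethod
    def rerank(self, query: str, candidates: List[str]) -> List[str]: ...


_REGISTRY: Dict[str, Type[BaseReranker]] = {}


def register(name: str) -> Callable[[Type[BaseReranker]], Type[BaseReranker]]:
    """Class decorator that records a backend under its config name."""
    def decorator(cls: Type[BaseReranker]) -> Type[BaseReranker]:
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("none")
class NoneReranker(BaseReranker):
    """Pass-through backend: keeps the fused order unchanged."""

    def rerank(self, query: str, candidates: List[str]) -> List[str]:
        return list(candidates)


def build_reranker(config: Dict[str, str]) -> BaseReranker:
    """Instantiate the backend named by config['backend'] (a dict stands in for YAML)."""
    backend = config.get("backend", "none")
    cls = _REGISTRY.get(backend)
    if cls is None:
        raise ValueError(f"unknown rerank backend: {backend!r}")
    return cls()
```

Under this design, adding a backend means registering one more subclass. Callers depend only on `BaseReranker`, so switching backends becomes a config edit.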
Observability Dashboard
6-page Streamlit management platform:
| Page | Function |
|---|---|
| System Overview | Live component configuration view |
| Data Browser | Browse documents and chunks with metadata |
| Ingestion Manager | Upload PDFs, monitor real-time progress, delete documents |
| Ingestion Traces | Per-run stage timing (load / split / transform / embed / upsert) |
| Query Traces | Per-query stage breakdown (dense / sparse / fusion / rerank) |
| Evaluation | Run Golden QA evaluation and view Hit Rate / MRR / NDCG results |
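Per-stage timings like those on the Ingestion Traces and Query Traces pages can be captured with a small context manager that appends one JSON Lines record per run. This is a hypothetical sketch. The `Trace` class, its field names and the output file are illustrative, not the project's actual trace schema.

```python
import json
import time
from contextlib import contextmanager
from pathlib import Path
from typing import Dict, Iterator


class Trace:
    """Collects wall-clock durations (ms) for named pipeline stages."""

    def __init__(self, kind: str) -> None:
        self.kind = kind  # e.g. "ingestion" or "query"
        self.stages: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the stage raises, so failed runs still show timing.
            self.stages[name] = (time.perf_counter() - start) * 1000.0

    def append_to(self, path: Path) -> None:
        """Append this trace to `path` as a single JSON line."""
        record = {"kind": self.kind, "ts": time.time(), "stages_ms": self.stages}
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
```

Each stage (dense, sparse, fusion, rerank) would be wrapped in `with trace.stage(...)`. A dashboard page could then read the file line by line without parsing one large JSON document.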
Evaluation Framework
Custom evaluator: Hit Rate@K, MRR, NDCG@10
Ragas integration (Faithfulness / Answer Relevancy / Context Precision) — optional, requires API key
Golden Test Set regression pipeline — every retrieval strategy change produces quantified before/after metrics
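The three retrieval metrics can be computed per question from the positions of the relevant chunks and then averaged over the Golden QA set. The sketch below assumes binary relevance (a chunk is either relevant or not) and the common log2 discount for NDCG. The project's evaluator may define relevance or cut-offs differently.

```python
import math
from typing import List, Sequence, Set


def hit_rate_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """1.0 if any relevant chunk appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in retrieved[:k]) else 0.0


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / (1-based rank of the first relevant chunk); 0.0 if none is retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG@k with a 1 / log2(rank + 1) discount."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(retrieved[:k], start=1) if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean(values: List[float]) -> float:
    """Average per-question scores; MRR is the mean of reciprocal_rank values."""
    return sum(values) / len(values) if values else 0.0
```

A reranker reorders candidates that have already been retrieved. It can therefore raise MRR and NDCG while leaving Hit Rate@10 unchanged, which matches the pattern in the benchmark table.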
Quick Start
Prerequisites
Python 3.10+
Ollama (for local embedding) or an OpenAI-compatible API key
git clone <repo-url>
cd Modular-RAG-MCP-Server
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
Configure
Edit config/settings.yaml to set your embedding provider, embedding model, and reranker:
embedding:
  provider: ollama        # or openai / azure
  model: bge-m3
rerank:
  backend: cross_encoder  # or none / llm
  model: BAAI/bge-reranker-v2-m3
Set API keys if using cloud providers:
export OPENAI_API_KEY="sk-..."
Ingest Documents
python scripts/ingest.py --path /path/to/papers/ --collection research
Query
python scripts/query.py --query "How does RRF fusion work?" --verbose
Launch Dashboard
streamlit run src/observability/dashboard/app.py
# Open http://localhost:8501
Run Evaluation
python scripts/evaluate.py --collection research --output results/eval.json
MCP Integration
GitHub Copilot (VS Code)
Create .vscode/mcp.json:
{
  "servers": {
    "modular-rag": {
      "type": "stdio",
      "command": "python",
      "args": ["src/mcp_server/server.py", "--config", "config/settings.yaml"],
      "env": {
        "OPENAI_API_KEY": "${env:OPENAI_API_KEY}"
      }
    }
  }
}
Claude Desktop
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "modular-rag": {
      "command": "python",
      "args": ["/absolute/path/to/src/mcp_server/server.py"],
      "env": {
        "OPENAI_API_KEY": "your-api-key"
      }
    }
  }
}
Running Tests
pytest -q # full suite (649 tests)
pytest -q tests/unit/ # unit tests only
pytest -q tests/integration/ # integration tests
pytest -q tests/e2e/          # E2E: MCP client, dashboard smoke, recall regression
Tech Stack
Retrieval: BGE-M3 · BM25 · RRF · BGE Reranker v2-m3
Storage: ChromaDB · SQLite · JSON Lines
Protocol: MCP / JSON-RPC 2.0 / Stdio Transport
LLM Backends: OpenAI · Azure OpenAI · Ollama · DeepSeek
Evaluation: Custom (Hit Rate / MRR / NDCG) · Ragas
Dashboard: Streamlit
Testing: Pytest · Unit / Integration / E2E suites
Design Patterns: Abstract Base Class · Factory Pattern · Dependency Injection · TDD