MMWRAG

by mikrominiw

License: MIT Python

A bilingual (Russian/English) RAG over scientific literature (textbooks/papers): vision PDF parsing, BGE-M3 hybrid (dense+sparse) retrieval with a cross-encoder reranker, exposed as an MCP tool (search only — the consumer composes the answer). Every retrieval decision here is measurement-driven — see DECISIONS.md.

Features

Vision PDF parsing behind a swappable interface (cloud PaddleOCR-VL / local PP-StructureV3) — required because the text layer doesn't encode formula structure.
Structure-aware chunking (~512-token packing over blocks, page spans kept for citations).
BGE-M3 dense + sparse embeddings; Qdrant hybrid search with server-side RRF.
Cross-encoder reranker (bge-reranker-v2-m3) over the top-N pool.
Book-aware cross-lingual routing — search(book_id=...) targets a specific book/language.
MCP server (search, list_books) over streamable HTTP — no answer generation.
Eval harness — page-level hit@k / MRR / recall@k, cross-book and cross-lingual.
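
The hybrid search above fuses the dense and sparse rankings with Reciprocal Rank Fusion (RRF). In this project the fusion runs server-side in Qdrant; the pure-Python sketch below only illustrates the idea. The function name `rrf_fuse`, the chunk IDs, and the constant `k=60` (the value commonly cited from the RRF paper) are assumptions, and Qdrant's own constant may differ.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked ID lists with Reciprocal Rank Fusion.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["c1", "c2", "c3"]   # hypothetical chunk IDs ranked by dense similarity
sparse = ["c3", "c1", "c4"]  # the same query ranked by sparse (lexical) score
fused = rrf_fuse([dense, sparse])
print(fused)
```

Chunks that rank well in both lists (`c1`, `c3`) come out ahead of chunks that appear in only one, which is the property that makes RRF a reasonable default when dense and sparse scores are not on a comparable scale.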

Architecture

INDEXING   PDF ─parse─> Page[] ─chunk─> Chunk[] ─BGE-M3 (dense+sparse)─> Qdrant
QUERY      question ─HybridRetriever (RRF)─> top-N ─cross-encoder rerank─> top-k Source[]
MCP        client ─/mcp─> search(query, top_k, book_id) ─> fragments {book_id, pages, text, score}
                          list_books() ─> indexed books + language

Details in ARCHITECTURE.md.
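
The chunk step in the indexing line can be pictured as greedy packing of parsed blocks into a token budget while recording the page span of each chunk. This is a minimal sketch, not the project's code: `Block`, `Chunk`, and `pack_blocks` are hypothetical names, and whitespace splitting stands in for whatever tokenizer the real chunker uses.

```python
from dataclasses import dataclass


@dataclass
class Block:
    page: int
    text: str


@dataclass
class Chunk:
    text: str
    page_start: int
    page_end: int


def _close(blocks):
    """Join the packed blocks and keep the first/last page as the span."""
    return Chunk(
        text="\n\n".join(b.text for b in blocks),
        page_start=blocks[0].page,
        page_end=blocks[-1].page,
    )


def pack_blocks(blocks, max_tokens=512, count_tokens=lambda t: len(t.split())):
    """Greedily pack consecutive blocks into chunks of at most max_tokens.

    A block is never split, so a single oversized block becomes its own chunk.
    """
    chunks, current, used = [], [], 0
    for block in blocks:
        n = count_tokens(block.text)
        if current and used + n > max_tokens:
            chunks.append(_close(current))
            current, used = [], 0
        current.append(block)
        used += n
    if current:
        chunks.append(_close(current))
    return chunks


# Tiny budget so the packing is visible; the README's budget is ~512 tokens.
blocks = [Block(1, "a b c"), Block(2, "d e"), Block(2, "f g h")]
chunks = pack_blocks(blocks, max_tokens=5)
spans = [(c.page_start, c.page_end) for c in chunks]
print(spans)
```

Keeping `page_start` and `page_end` on every chunk is what lets a search result cite a page range such as 158–159 rather than a single page.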

Quickstart

# 1. dependencies (paddlepaddle-gpu is a manual prereq for the PARSING path only)
uv sync

# 2. vector database
docker compose up -d            # Qdrant on :6333

# 3. bring your own PDF and index it
#    parsing needs PADDLEOCR_TOKEN in .env (see .env.example);
#    pipeline: parse(pdf) -> chunk_pages(...) -> index_chunks(...)  (see notebooks/ for examples)

# 4. run the MCP server
uv run python -m src.mcp.server # streamable-http on 127.0.0.1:8000

The corpus is not included for copyright reasons. Search and the MCP server need Qdrant plus the local models (BGE-M3 and the reranker); they run on CPU, but a GPU is faster. Parsing additionally needs a PaddleOCR-VL cloud token.
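
Once the server from step 4 is running, a client can call the `search` tool over streamable HTTP. The sketch below assumes the official `mcp` Python SDK is installed and that the endpoint is `http://127.0.0.1:8000/mcp` (host and port from the quickstart, the `/mcp` path from the architecture diagram). The helper names `build_search_args` and `run_search` are hypothetical, and the argument names follow the `search(query, top_k, book_id)` signature shown above.

```python
MCP_URL = "http://127.0.0.1:8000/mcp"  # assumed endpoint, see the lead-in


def build_search_args(query, top_k=3, book_id=None):
    """Build the argument dict for the `search` tool; book_id is optional."""
    args = {"query": query, "top_k": top_k}
    if book_id is not None:
        args["book_id"] = book_id
    return args


async def run_search(query, top_k=3, book_id=None):
    """Open a session, call `search`, and return the raw tool result."""
    # Imported here so build_search_args stays usable without the SDK installed.
    from mcp import ClientSession
    from mcp.client.streamable_http import streamablehttp_client

    async with streamablehttp_client(MCP_URL) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            return await session.call_tool(
                "search", build_search_args(query, top_k, book_id)
            )
```

With the server up, `asyncio.run(run_search("Riemann integral", book_id="lebl"))` should return the fragments for that book; omitting `book_id` searches all indexed books.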

Demo

A real session against the MCP server (notebooks/mcp_smoke.py, output trimmed to metadata):

tools: ['search', 'list_books']

list_books:
  {'book_id': 'zorich_v1', 'title': 'Zorich — Mathematical Analysis I', 'language': 'ru', 'chunks': 1472}
  {'book_id': 'zorich_v2', 'title': 'Zorich — Mathematical Analysis II', 'language': 'ru', 'chunks': 2526}
  {'book_id': 'lebl', 'title': 'Lebl — Basic Analysis I', 'language': 'en', 'chunks': 722}

search RU (all books), top 3:
  zorich_v1 159 2.125
  zorich_v1 158–159 0.297
  zorich_v2 517 -0.357

search RU routed to lebl (cross-lingual), top 3:
  lebl 135–136 0.123
  lebl 167 -0.047
  lebl 208 -0.141

The last call shows book-aware cross-lingual routing: a Russian query with book_id="lebl" returns the English source (Lebl, pp. 135–136) that a plain cross-book search buries behind the Russian equivalent (see DECISIONS.md §5).
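
One way to picture why routing helps: if the book filter is applied before the candidate pool is truncated, fragments from the requested book can no longer be crowded out by higher-scoring fragments in the query's own language. The sketch below is an illustration under that assumption, not the project's implementation (which is described in DECISIONS.md); `top_n` and all scores are hypothetical.

```python
def top_n(candidates, n, book_id=None):
    """Return the n best candidates, restricting to one book first if asked."""
    pool = [c for c in candidates if book_id is None or c["book_id"] == book_id]
    return sorted(pool, key=lambda c: c["score"], reverse=True)[:n]


# Hypothetical first-stage scores for a Russian query: same-language
# fragments score higher than the English ones.
candidates = [
    {"book_id": "zorich_v1", "score": 0.9},
    {"book_id": "zorich_v1", "score": 0.8},
    {"book_id": "zorich_v2", "score": 0.7},
    {"book_id": "lebl", "score": 0.4},
    {"book_id": "lebl", "score": 0.3},
]

all_books = [c["book_id"] for c in top_n(candidates, 3)]
routed = [c["book_id"] for c in top_n(candidates, 3, book_id="lebl")]
print(all_books, routed)
```

In the unrouted call the pool fills up with Zorich fragments before any Lebl fragment is considered; with `book_id="lebl"` only Lebl fragments compete for the top slots.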

Project structure

src/
  parse/   vision PDF -> Page[]   (cloud / local engines, idempotent cache)
  chunk/   Page[] -> Chunk[]      (structure-aware packing, page spans)
  index/   Chunk[] -> BGE-M3 -> Qdrant   (Embedder / VectorStore interfaces)
  query/   HybridRetriever + RerankingRetriever; answer() with citations
  mcp/     MCP server: search / list_books (pure core + thin FastMCP server)
  eval/    page-level hit@k / MRR / recall@k; cross-book & cross-lingual
tests/     unit tests (pure logic on fakes; integration tests skip offline)
notebooks/ runnable examples & measurement runners (mcp_smoke, eval_*, diag_*)
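
The page-level metrics named for eval/ can be sketched as follows. This is a minimal illustration rather than the harness itself: the function names are hypothetical, and it assumes each retrieved fragment has already been flattened to individual page numbers, best first.

```python
def hit_at_k(ranked_pages, gold_pages, k):
    """1.0 if any of the top-k retrieved pages is a gold page, else 0.0."""
    return float(any(p in gold_pages for p in ranked_pages[:k]))


def reciprocal_rank(ranked_pages, gold_pages):
    """1 / rank of the first gold page, or 0.0 if none is retrieved."""
    for rank, page in enumerate(ranked_pages, start=1):
        if page in gold_pages:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_pages, gold_pages, k):
    """Fraction of gold pages that appear in the top-k retrieved pages."""
    if not gold_pages:
        return 0.0
    return len(set(ranked_pages[:k]) & set(gold_pages)) / len(gold_pages)


ranked = [158, 159, 517]  # hypothetical retrieved pages, best first
gold = {159, 160}         # hypothetical gold pages for one question

print(hit_at_k(ranked, gold, 1), hit_at_k(ranked, gold, 3))
print(reciprocal_rank(ranked, gold), recall_at_k(ranked, gold, 3))
```

MRR is the mean of `reciprocal_rank` over all evaluation questions; hit@k and recall@k are averaged the same way.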

Status & roadmap

The pipeline (parse → chunk → index → query) and the measured retrieval stack (hybrid + reranker) are done, as is the MCP search server. Next: a network model-serving backend, client ingestion, and an agent layer over MCP. The reasoning and numbers behind each choice are in DECISIONS.md.

License

MIT.
