MMWRAG

A bilingual (Russian/English) RAG over scientific literature (textbooks/papers): vision PDF parsing, BGE-M3 hybrid (dense+sparse) retrieval with a cross-encoder reranker, exposed as an MCP tool (search only — the consumer composes the answer). Every retrieval decision here is measurement-driven — see DECISIONS.md.

Features

  • Vision PDF parsing behind a swappable interface (cloud PaddleOCR-VL / local PP-StructureV3) — required because the text layer doesn't encode formula structure.

  • Structure-aware chunking (~512-token packing over blocks, page spans kept for citations).

  • BGE-M3 dense + sparse embeddings; Qdrant hybrid search with server-side RRF.

  • Cross-encoder reranker (bge-reranker-v2-m3) over the top-N pool.

  • Book-aware cross-lingual routing: search(book_id=...) targets a specific book/language.

  • MCP server (search, list_books) over streamable HTTP — no answer generation.

  • Eval harness — page-level hit@k / MRR / recall@k, cross-book and cross-lingual.
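As a rough illustration of the page-level metrics named in the last bullet, here is a minimal sketch. The definitions are assumptions, not taken from the project's eval code: a retrieved chunk counts as a hit when its page span overlaps the gold pages, and recall@k is the fraction of gold pages covered by the top-k chunks. All function names are hypothetical.

```python
from typing import List, Optional, Sequence, Set


def first_hit_rank(retrieved: Sequence[Set[int]], gold: Set[int]) -> Optional[int]:
    """1-based rank of the first chunk whose pages overlap the gold pages."""
    for rank, pages in enumerate(retrieved, start=1):
        if pages & gold:
            return rank
    return None


def hit_at_k(retrieved: Sequence[Set[int]], gold: Set[int], k: int) -> float:
    """1.0 if any of the top-k chunks touches a gold page, else 0.0."""
    return 1.0 if first_hit_rank(retrieved[:k], gold) is not None else 0.0


def reciprocal_rank(retrieved: Sequence[Set[int]], gold: Set[int]) -> float:
    """1 / rank of the first hit, or 0.0 when nothing relevant was retrieved."""
    rank = first_hit_rank(retrieved, gold)
    return 1.0 / rank if rank is not None else 0.0


def recall_at_k(retrieved: Sequence[Set[int]], gold: Set[int], k: int) -> float:
    """Fraction of gold pages covered by the pages of the top-k chunks."""
    if not gold:
        return 0.0
    covered = set().union(*retrieved[:k]) & gold
    return len(covered) / len(gold)


def mean_reciprocal_rank(runs: List[tuple]) -> float:
    """MRR over (retrieved, gold) pairs, one pair per question."""
    return sum(reciprocal_rank(r, g) for r, g in runs) / len(runs) if runs else 0.0


# Illustrative data only: three ranked chunks (as page sets) and two gold pages.
ranked = [{517}, {158, 159}, {159}]
gold_pages = {159, 160}
print(hit_at_k(ranked, gold_pages, 1), hit_at_k(ranked, gold_pages, 3))
print(reciprocal_rank(ranked, gold_pages), recall_at_k(ranked, gold_pages, 3))
```

Scoring on pages rather than chunk ids keeps the numbers comparable when the chunking changes, since the gold labels do not depend on chunk boundaries.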

Architecture

INDEXING   PDF ─parse─> Page[] ─chunk─> Chunk[] ─BGE-M3 (dense+sparse)─> Qdrant
QUERY      question ─HybridRetriever (RRF)─> top-N ─cross-encoder rerank─> top-k Source[]
MCP        client ─/mcp─> search(query, top_k, book_id) ─> fragments {book_id, pages, text, score}
                          list_books() ─> indexed books + language

Details in ARCHITECTURE.md.
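To make the RRF step on the QUERY line concrete, the sketch below fuses a dense and a sparse ranking in plain Python. It is only an illustration of reciprocal rank fusion: in this project the fusion runs server-side in Qdrant, whose constant and rank origin may differ. The function name rrf_fuse and the default k=60 (the value commonly used in the RRF literature) are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def rrf_fuse(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> List[Tuple[Hashable, float]]:
    """Fuse several ranked id lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ids missing from a list simply get no term from it.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative ids only: one ranking per retrieval signal.
dense_ranking = ["a", "b", "c"]
sparse_ranking = ["b", "d", "a"]
fused = rrf_fuse([dense_ranking, sparse_ranking])
print([doc_id for doc_id, _ in fused])  # ['b', 'a', 'd', 'c']
```

RRF uses only ranks, so the dense and sparse scores never need to be put on a common scale before they are combined.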

Quickstart

# 1. dependencies (paddlepaddle-gpu is a manual prereq for the PARSING path only)
uv sync

# 2. vector database
docker compose up -d            # Qdrant on :6333

# 3. bring your own PDF and index it
#    parsing needs PADDLEOCR_TOKEN in .env (see .env.example);
#    pipeline: parse(pdf) -> chunk_pages(...) -> index_chunks(...)  (see notebooks/ for examples)

# 4. run the MCP server
uv run python -m src.mcp.server # streamable-http on 127.0.0.1:8000

The corpus is not included (copyright). Search and the MCP server need Qdrant plus the local models (BGE-M3 and the reranker); these run on CPU, but a GPU is faster. Parsing additionally needs a PaddleOCR-VL cloud token.

Demo

A real session against the MCP server (notebooks/mcp_smoke.py, output trimmed to metadata):

tools: ['search', 'list_books']

list_books:
  {'book_id': 'zorich_v1', 'title': 'Zorich — Mathematical Analysis I', 'language': 'ru', 'chunks': 1472}
  {'book_id': 'zorich_v2', 'title': 'Zorich — Mathematical Analysis II', 'language': 'ru', 'chunks': 2526}
  {'book_id': 'lebl', 'title': 'Lebl — Basic Analysis I', 'language': 'en', 'chunks': 722}

search RU (all books), top 3:
  zorich_v1 159 2.125
  zorich_v1 158–159 0.297
  zorich_v2 517 -0.357

search RU routed to lebl (cross-lingual), top 3:
  lebl 135–136 0.123
  lebl 167 -0.047
  lebl 208 -0.141

The last call shows book-aware cross-lingual routing: a Russian query with book_id="lebl" returns the English source (Lebl, pp. 135–136) that a plain cross-book search buries behind the Russian equivalent (see DECISIONS.md §5).
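A routed call like the last one could be issued from Python roughly as follows. This is a sketch under assumptions, not the contents of notebooks/mcp_smoke.py: it uses the official MCP Python SDK (the mcp package), guesses the endpoint as http://127.0.0.1:8000/mcp from the Quickstart port and the /mcp path in the architecture diagram, and assumes that leaving out book_id means "all books". The helper names are hypothetical; the argument names query, top_k and book_id come from the tool signature above.

```python
from typing import Any, Dict, Optional

# Assumed endpoint: Quickstart host/port plus the /mcp path from the diagram.
MCP_URL = "http://127.0.0.1:8000/mcp"


def build_search_args(
    query: str, top_k: int = 3, book_id: Optional[str] = None
) -> Dict[str, Any]:
    """Arguments for the search tool; book_id is only sent when routing to one book."""
    args: Dict[str, Any] = {"query": query, "top_k": top_k}
    if book_id is not None:
        args["book_id"] = book_id
    return args


async def routed_search(
    query: str, top_k: int = 3, book_id: Optional[str] = None, url: str = MCP_URL
):
    """Call the search tool over streamable HTTP and return the raw content blocks."""
    # Imported lazily so build_search_args works without the SDK installed.
    from mcp import ClientSession
    from mcp.client.streamable_http import streamablehttp_client

    async with streamablehttp_client(url) as (read, write, _get_session_id):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "search", build_search_args(query, top_k, book_id)
            )
            return result.content
```

With the server running, asyncio.run(routed_search(your_query, book_id="lebl")) would correspond to the routed call in the session above, and omitting book_id to the cross-book one.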

Project structure

src/
  parse/   vision PDF -> Page[]   (cloud / local engines, idempotent cache)
  chunk/   Page[] -> Chunk[]      (structure-aware packing, page spans)
  index/   Chunk[] -> BGE-M3 -> Qdrant   (Embedder / VectorStore interfaces)
  query/   HybridRetriever + RerankingRetriever; answer() with citations
  mcp/     MCP server: search / list_books (pure core + thin FastMCP server)
  eval/    page-level hit@k / MRR / recall@k; cross-book & cross-lingual
tests/     unit tests (pure logic on fakes; integration tests skip offline)
notebooks/ runnable examples & measurement runners (mcp_smoke, eval_*, diag_*)
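The chunk/ step (structure-aware packing with page spans) can be pictured with a small greedy packer. This is a hedged sketch, not the project's chunk_pages: Block, Chunk, pack_blocks and the whitespace token counter are hypothetical stand-ins, and a real pipeline would presumably count tokens with the embedder's tokenizer.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Block:
    """Hypothetical parsed block (paragraph, formula, ...) with its source page."""
    page: int
    text: str


@dataclass
class Chunk:
    """Packed text plus the page span kept for citations."""
    text: str
    page_start: int
    page_end: int


def count_tokens(text: str) -> int:
    # Stand-in tokenizer: whitespace-separated words.
    return len(text.split())


def _close(blocks: List[Block]) -> Chunk:
    return Chunk(
        text="\n\n".join(b.text for b in blocks),
        page_start=blocks[0].page,
        page_end=blocks[-1].page,
    )


def pack_blocks(blocks: Iterable[Block], budget: int = 512) -> List[Chunk]:
    """Greedily pack whole blocks into chunks of at most `budget` tokens.

    Blocks are never split, so a single block larger than the budget
    becomes its own oversized chunk.
    """
    chunks: List[Chunk] = []
    current: List[Block] = []
    used = 0
    for block in blocks:
        size = count_tokens(block.text)
        if current and used + size > budget:
            chunks.append(_close(current))
            current, used = [], 0
        current.append(block)
        used += size
    if current:
        chunks.append(_close(current))
    return chunks


# Illustrative input: three blocks over two pages, tiny budget to force a split.
demo = [Block(158, "a b c"), Block(159, "d e"), Block(159, "f g h i")]
for chunk in pack_blocks(demo, budget=5):
    print(chunk.page_start, chunk.page_end, repr(chunk.text))
```

Packing on block boundaries keeps formulas and paragraphs intact, and carrying page_start and page_end on each chunk is what lets a search result cite a page range.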

Status & roadmap

The pipeline (parse → chunk → index → query), the measured retrieval stack (hybrid + reranker), and the MCP search server are done. Next: a network model-serving backend, client ingestion, and an agent layer over MCP. The reasoning and numbers behind each choice are in DECISIONS.md.

License

MIT © 2026 mikrominiw
