jacobhgruber-dev

Academic Research MCP

Unified scholarly search across 60+ academic sources, with a 13,000-document Catholic theological corpus (12 popes, 94 patristic/magisterial sources, 434K cross-references), CMOS 18/SBL 2nd/APA 7th/Turabian 9th citation formatting, 19 specialized source types, CSL-JSON/BibTeX export, and original language tools. Includes 150+ MCP tools, hybrid BM25+vector search, AI-powered research via Elicit, and collaborative rich-text editing (Tiptap + Yjs).

Built on paper-search-mcp by openags (MIT licensed), extended with institutional database connectors, Firecrawl web research, citation graph traversal, and dual-style citation formatting with mode-separated CMOS 18 and SBL 2 output.


Vision

A single interface for all academic research. Instead of bouncing between JSTOR, ProQuest, EBSCOhost, Google Scholar, and a dozen other sites — each with its own login — you describe what you're looking for and the MCP searches everything at once, deduplicates results, downloads accessible PDFs, and extracts full text for the AI to analyze.

  • Phase 1 (now): MCP server. Search, download, and read across all sources from within opencode or any MCP client.

  • Phase 2: Web dashboard. Browse saved papers, manage credentials, generate and export research reports.

  • Phase 3: Hosted server. Access from anywhere, retained institutional logins, shared research sessions.

The MCP server is the engine. Everything else is a client that talks to it.


Architecture

┌─────────────────────────────────────────────────────────────┐
│                     MCP CLIENTS                              │
│   opencode  ·  Claude Desktop  ·  Cursor  ·  VS Code  ·  Web │
└────────────────────────┬────────────────────────────────────┘
                         │  JSON-RPC (stdio / Streamable HTTP)
┌────────────────────────▼────────────────────────────────────┐
│                  ACADEMIC RESEARCH MCP                        │
│                                                              │
│  ┌──────────────────┐  ┌──────────────────────────────────┐ │
│  │  paper_search_mcp │  │  institutional/                  │ │
│  │  (23 free/open    │  │  (5 institutional adapters)       │ │
│  │   source adapters)│  │                                  │ │
│  │                   │  │  jstor.py      proquest.py       │ │
│  │  arxiv.py         │  │  ebscohost.py  project_muse.py   │ │
│  │  pubmed.py        │  │  web_of_science.py               │ │
│  │  semantic.py      │  └──────────────────────────────────┘ │
│  │  citation_graph.py│                                       │
│  │  crossref.py      │                                       │
│  │  openalex.py      │  ┌──────────────────────────────────┐ │
│  │  citation_formatter│  │  Firecrawl web_research (optional)│ │
│  │  specialized_format│  └──────────────────────────────────┘ │
│  │  style_reference.py│                                       │
│  │  ... (16 more)    │                                       │
│  └──────────────────┘                                       │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  Citation & Style Layer                                  │ │
│  │                                                         │ │
│  │  citation_formatter.py — Books, journals, chapters,     │ │
│  │       edited/translated books (5 formats per type)      │ │
│  │  specialized_formatter.py — Bible, Church Fathers, DSS, │ │
│  │       Qur'an, Rabbinic, Vatican (23 doc types), ANE,    │ │
│  │       Papyri, Loeb, Josephus/Philo, Pseudepigrapha,     │ │
│  │       Apostolic Fathers, Nag Hammadi, Commentary,       │ │
│  │       Dictionary, Liturgical, Aquinas, Website/Blog     │ │
│  │  style_reference.py — Queryable CMOS 18 / SBL 2        │ │
│  │       knowledge base (45+ topics, 15 difference pairs)  │ │
│  └────────────────────────────────────────────────────────┘ │
│                                                              │
│  Three-layer design:                                         │
│  Layer 1: search_papers() — concurrent multi-source search   │
│           with deduplication (DOI → title → paper_id)        │
│  Layer 2: Per-source search/download/read tools              │
│           download_with_fallback() — OA-first fallback chain │
│  Layer 3: Citation formatting — CMOS 18 & SBL 2nd ed.        │
│           Notes, bibliographies, author-date, specialized    │
│                                                              │
│  Search Indexes (local):                                     │
│  Tantivy (BM25) + LanceDB (vector) hybrid search             │
│  Surya OCR → PyTesseract fallback for scanned PDFs           │
│  Grobid structured metadata extraction from PDFs             │
└─────────────────────────────────────────────────────────────┘

Every source adapter implements the same PaperSource interface (search, download_pdf, read_paper). Adding a new source takes one Python file and a few lines of registration — no architectural changes.
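As a rough illustration of that adapter pattern, the sketch below defines a stand-in `PaperSource` protocol with the three methods named above and registers a toy in-memory source. The method signatures, the `Paper` dataclass, the `SOURCES` registry, and `register_source` are assumptions made for this example; the project's real interface and registration code may look different.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Paper:
    """Minimal record for the example; the real model has more fields."""
    paper_id: str
    title: str
    doi: str = ""


class PaperSource(Protocol):
    """Hypothetical shape of the shared adapter interface."""

    def search(self, query: str, max_results: int = 10) -> list[Paper]: ...
    def download_pdf(self, paper_id: str, save_path: str) -> str: ...
    def read_paper(self, paper_id: str, save_path: str) -> str: ...


class InMemorySource:
    """Toy adapter backed by a fixed list instead of a remote API."""

    def __init__(self, papers: list[Paper]) -> None:
        self._papers = papers

    def search(self, query: str, max_results: int = 10) -> list[Paper]:
        needle = query.lower()
        hits = [p for p in self._papers if needle in p.title.lower()]
        return hits[:max_results]

    def download_pdf(self, paper_id: str, save_path: str) -> str:
        raise NotImplementedError("this toy source has no PDFs")

    def read_paper(self, paper_id: str, save_path: str) -> str:
        raise NotImplementedError("this toy source has no full text")


# Hypothetical registry mapping a source name to its adapter instance.
SOURCES: dict[str, PaperSource] = {}


def register_source(name: str, source: PaperSource) -> None:
    SOURCES[name] = source


register_source("toy", InMemorySource([Paper("toy:1", "Grace and Free Will")]))
results = SOURCES["toy"].search("grace")
```

Because the protocol is structural, a new adapter only needs to provide the three methods; it does not have to inherit from a base class in this sketch.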


Citation & Style

The MCP includes a full citation formatting system supporting two major academic style guides with mode-separated output verified against the published manuals:

| Style | Version | Source |
|---|---|---|
| CMOS 18 | The Chicago Manual of Style, 18th ed. (2024) | Chapters 13–14 |
| SBL 2 | The SBL Handbook of Style, 2nd ed. (2014) | Chapters 4, 6, 8; Student Supplement |

Each mode produces distinct, correct output. CMOS 18 defers to SBL for biblical/ancient Near Eastern conventions (as the manual itself recommends). SBL 2 defers to CMOS for general formatting (philosophy, psychology, comparative religions). The tool enforces mode-appropriate formatting throughout:

| Rule | CMOS 18 | SBL 2 |
|---|---|---|
| Place of publication | Omitted for post-1900 books (14.30) | Required for all books (6.1.4.2) |
| Publisher names | Full names, strip Inc./Ltd. only (14.32) | Abbreviated per SBLHS 6.1.4.1 (64 publishers) |
| State abbreviations | Traditional (Mass., N.Y., Calif.) | USPS postal (MA, NY, CA) (8.1.1) |
| Bible abbreviations | Periods: Gen., Exod. (14.138) | No periods: Gen, Exod (8.2) |
| Series/journals | Full names | Abbreviated in notes AND bib (6.1.3.5) |
| 3-em dash in bibs | Discontinued (13.72) | Still used (6.2) |
| BCE/CE format | B.C.E., C.E. | BCE, CE (8.1.2) |
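Three of these rules are simple enough to sketch as mode switches. The snippet below is only an illustration of the table, not the formatter's code: the function names, the `"cmos18"` / `"sbl2"` mode strings, and the three-entry state table are assumptions, and it assumes the book name passed in is one that is actually abbreviated.

```python
# Traditional state abbreviations used in CMOS 18 mode (sample entries only).
TRADITIONAL_STATES = {"MA": "Mass.", "NY": "N.Y.", "CA": "Calif."}


def bible_book(abbrev: str, style: str) -> str:
    """CMOS 18 puts a period after the abbreviation; SBL 2 does not."""
    bare = abbrev.rstrip(".")
    return f"{bare}." if style == "cmos18" else bare


def era(label: str, style: str) -> str:
    """Render BCE/CE with periods for CMOS 18 and without for SBL 2."""
    letters = label.replace(".", "").upper()
    if style == "cmos18":
        return "".join(f"{ch}." for ch in letters)
    return letters


def state(postal: str, style: str) -> str:
    """Traditional abbreviation for CMOS 18, USPS postal code for SBL 2."""
    if style == "cmos18":
        return TRADITIONAL_STATES.get(postal, postal)
    return postal
```

For example, `bible_book("Gen", "cmos18")` gives `Gen.` while `bible_book("Gen.", "sbl2")` gives `Gen`.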

Standard source types

Books, journal articles, chapters in edited volumes, edited books, and translated books — in five citation formats: full note, shortened note, bibliography, author-date reference list, and author-date parenthetical citation.

Specialized source types

All 19 specialized source types support three citation formats: full note ("note"), shortened note for second appearance onwards ("note_short"), and bibliography entry ("bibliography"). Primary/ancient sources (Bible, DSS, Josephus/Philo, Pseudepigrapha, Apostolic Fathers, Nag Hammadi, ANE texts, Rabbinic, Qur'an, papyri) return an empty bibliography string — cite the modern edition as a standard book instead. More than 300 exact-output tests verify correctness across all formats and both styles.

Notable formatting details across all formats:

  • En dashes in page ranges (95–96, not 95-96)

  • Commas inside quotation marks per American style ("Title," in *Journal*)

  • CMOS 18 omits place of publication for post-1900 books; SBL 2 requires it

  • Only first author inverted in bibliographies; comma before "and" for two authors

  • Publisher abbreviation per SBL conventions (64 known publishers)

  • State abbreviation per style (USPS postal for SBL 2; traditional for CMOS 18)
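Two of the details in this list, en dashes in page ranges and first-author-only inversion, can be sketched in a few lines. The helper names and the `(given, family)` tuple layout are assumptions for this example rather than the formatter's real API.

```python
import re


def en_dash_ranges(pages: str) -> str:
    """Replace a hyphen between two digits with an en dash."""
    return re.sub(r"(?<=\d)\s*-\s*(?=\d)", "\u2013", pages)


def bibliography_authors(authors: list[tuple[str, str]]) -> str:
    """Invert only the first (given, family) name; keep the rest in order."""
    if not authors:
        return ""
    given, family = authors[0]
    names = [f"{family}, {given}"] + [f"{g} {f}" for g, f in authors[1:]]
    if len(names) == 1:
        return names[0]
    # A comma precedes "and" even when there are only two authors.
    return ", ".join(names[:-1]) + ", and " + names[-1]
```

With these helpers, `en_dash_ranges("95-96")` yields `95–96`, and two authors come out as `Doe, John, and Jane Roe`.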

Specialized biblical / theological sources

Nineteen dedicated formatters handle primary-source citation patterns not covered by standard book/journal rules. All support three formats: full first note, shortened note (second appearance onwards), and bibliography. Examples below — SBL short-note and bibliography shown where they differ from the first note:

| Source type | SBL first note | SBL short note | SBL bibliography |
|---|---|---|---|
| Bible | Gen 1:1 NRSV | Gen 1:1 (drops version) | — (cite edition as book) |
| Church Fathers | Augustine, *Conf.* 8.12.29 (NPNF1 1:180) | Augustine, *Conf.* 8.12.29 | Augustine. *Confessions*. NPNF1 1. Translated by Pilkington. Peabody, MA: Hendrickson, 1994. |
| Josephus / Philo | Josephus, Ant. 2.233-235 (Thackeray, LCL) | Josephus, Ant. 2.233-235 | — (cite Loeb edition as book) |
| Commentary | Hooker, *Saint Mark*, BNTC 2 (Peabody, MA: Hendrickson, 1991), 223. | Hooker, *Saint Mark*, 223 | Hooker, Morna. *The Gospel according to Saint Mark*. BNTC 2. Peabody, MA: Hendrickson, 1991. |
| Dictionary | Walters, "Jacob Narrative," ABD 3:599-609. | Walters, "Jacob Narrative," 599-609 | Walters, Stanley D. "Jacob Narrative." ABD 3:599-609. |
| Dead Sea Scrolls | 1QS (Rule of the Community) 3:13-4:26 | 1QS 3:13-4:26 (drops descriptive name) | |
| Pseudepigrapha | 1 En. 10:1-3 | 1 En. 10:1-3 (same) | |
| Apostolic Fathers | Ign. Eph. 7.2 | Ign. Eph. 7.2 (same) | |
| Nag Hammadi | Gos. Thom. (NHC II, 2; saying 32) | Same | |
| ANE texts | "Enuma Elish," trans. Foster (COS 1.111:391). | Same | |
| Papyri | P.Cair.Zen. 59003 | Same | |
| Qur'an | Qurʾan 2:255 (trans. Yusuf Ali) | Same | |
| Rabbinic | m. Ber. 1:1 / b. Ber. 2a | Same | |
| Website / Blog | Goodacre, "Jesus' Wife Fragment," NT Blog, 9 May 2014, ... | Goodacre, "Jesus' Wife Fragment" | Goodacre, Mark. "Jesus' Wife Fragment." *NT Blog*, 9 May 2014. https://... |
| Loeb | Josephus, Ant. 2.233-235 (Thackeray, LCL) | Josephus, Ant. 2.233-235 | Josephus. *Jewish Antiquities*. Translated by H. St. J. Thackeray. LCL. Cambridge: Harvard University Press, 1930. |
| Vatican (encyclical) | Francis, encyclical *Laudato Si'* (24 May 2015), §139. | *Laudato Si'*, §139 | Francis. *Laudato Si'*. 24 May 2015. https://... (CMOS only; SBL omits URL) |
| Liturgical | *Roman Missal*, 3rd typical ed. (2011), no. 25 | GIRM 25 | *Roman Missal*. 3rd typical ed. Vatican City: Libreria Editrice Vaticana, 2011. |
| Aquinas | Thomas Aquinas, ST, I-II q. 94 a. 2 | ST I-II, q. 94, a. 2 | Aquinas, Thomas. *Summa Theologiae*. Translated by Dominican Fathers. New York: Benziger, 1947. |

Latin vs. English conventions: SBL 2 §8.3.6 and §8.3.14.3 state that Latin abbreviations are preferred in notes for classical and patristic works, but English titles may also be used if consistent. The formatter defaults to Latin abbreviations in SBL mode (Ant., Conf., Haer., ST) and English full titles in CMOS mode (Jewish Antiquities, Confessions, Adversus Haereses, Summa Theologiae). Users can override by providing either work_abbrev or work_title in the paper dict. For Philo and Josephus specifically, "no priority is intended by our listing of the Latin titles first; authors should decide which they prefer and remain consistent" (SBL 2 §8.3.6).
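The default-plus-override behaviour described here might look roughly like the following. This is a sketch under assumptions: the lookup table, the `work` key holding the Latin abbreviation, the mode strings, and the rule for what happens when both overrides are supplied are all invented for illustration; only `work_abbrev` and `work_title` come from the text above.

```python
# Latin abbreviation -> English full title, taken from the examples above.
LATIN_TO_ENGLISH = {
    "Ant.": "Jewish Antiquities",
    "Conf.": "Confessions",
    "Haer.": "Adversus Haereses",
    "ST": "Summa Theologiae",
}


def work_label(paper: dict, style: str) -> str:
    """Choose how to name a classical or patristic work in a note."""
    abbrev = paper.get("work_abbrev")
    title = paper.get("work_title")
    if abbrev and title:
        # Both supplied (assumed rule): let the mode pick between them.
        return abbrev if style == "sbl2" else title
    if abbrev or title:
        # Exactly one supplied: treat it as an explicit user override.
        return abbrev or title
    # No override: Latin abbreviation for SBL 2, English title for CMOS 18.
    key = paper["work"]  # hypothetical field holding the Latin abbreviation
    return key if style == "sbl2" else LATIN_TO_ENGLISH[key]
```

Whichever form is chosen, the same choice should be applied to every citation in a document, in line with the consistency requirement quoted from SBL 2 §8.3.6.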

Vatican document type support

The Vatican formatter handles 32 document types across the full authority hierarchy, including curial types (instruction, decree, declaration, note, notification, response). It also extracts magisterial_type (ordinary/extraordinary) and infallible fields when present. All documents support first note, shortened note, and bibliography. Authoritative documents (encyclicals, constitutions, motu proprios) produce bibliography entries; oral communications (audiences, homilies) produce note-format references only. SBL 2 mode omits URLs for Vatican documents per general SBL conventions.

| Authority level | Types | Example (SBL first note → SBL short) |
|---|---|---|
| Solemn papal | apostolic constitution, dogmatic constitution, encyclical | Francis, encyclical *Laudato Si'* (24 May 2015), §139. → *Laudato Si'*, §139 |
| Major teaching | apostolic exhortation, apostolic letter, motu proprio | Francis, Motu Proprio *Traditionis Custodes* (16 July 2021), art. 3. → *Traditionis Custodes*, art. 3 |
| Conciliar | pastoral constitution, decree, declaration | Second Vatican Council, Dogmatic Constitution *Dei Verbum* (18 November 1965), §12. → *Dei Verbum*, §12 |
| Curial | instruction, rescript | Cite by issuing body, title, date, section |
| Oral / pastoral | general audience, homily, address, speech, allocution, angelus, regina caeli, message | Francis, General Audience "On the Lord's Prayer" (St. Peter's Square, 15 May 2024), 1. → Same |
| Reference | catechism (by paragraph), canon law (CIC, CCEO, CIC/1917) | CIC c. 204, §1 → same |

Canon law bibliography entries resolve edition details dynamically: CIC (Libreria Editrice Vaticana, 1983), CIC/1917 (P. J. Kenedy & Sons, 1918), CCEO (Libreria Editrice Vaticana, 1990). User-supplied publisher, place, and year values override these defaults.

Known-edition defaults

To prevent incomplete bibliography entries, the formatter fills missing publisher/place/year from a known-edition table when the series is identified:

| Series | Default publisher | Default place | Default year |
|---|---|---|---|
| ANF | Hendrickson | Peabody, MA | 1994 |
| NPNF1 / NPNF2 | Hendrickson | Peabody, MA | 1994 |
| PG | J.-P. Migne | Paris | 1857–1866 |
| PL | J.-P. Migne | Paris | 1844–1864 |
| SC (Sources Chrétiennes) | Cerf | Paris | 1941– |

When critical metadata cannot be filled, a [Note: incomplete edition data] annotation is appended to the bibliography entry rather than silently producing an incomplete citation.
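A minimal sketch of that fill-then-annotate step, using the table above as data. The function names and the `series` / `publisher` / `place` / `year` field names are assumptions for this example, not the formatter's actual internals.

```python
# (publisher, place, year) defaults copied from the known-edition table.
KNOWN_EDITIONS = {
    "ANF": ("Hendrickson", "Peabody, MA", "1994"),
    "NPNF1": ("Hendrickson", "Peabody, MA", "1994"),
    "NPNF2": ("Hendrickson", "Peabody, MA", "1994"),
    "PG": ("J.-P. Migne", "Paris", "1857\u20131866"),
    "PL": ("J.-P. Migne", "Paris", "1844\u20131864"),
    "SC": ("Cerf", "Paris", "1941\u2013"),
}
FIELDS = ("publisher", "place", "year")
INCOMPLETE = "[Note: incomplete edition data]"


def fill_edition(paper: dict) -> tuple[dict, bool]:
    """Return a copy with defaults applied and a flag for missing data."""
    filled = dict(paper)
    defaults = KNOWN_EDITIONS.get(paper.get("series", ""))
    if defaults:
        for name, value in zip(FIELDS, defaults):
            # User-supplied values win; only empty fields are filled.
            if not filled.get(name):
                filled[name] = value
    incomplete = any(not filled.get(name) for name in FIELDS)
    return filled, incomplete


def annotate(entry: str, incomplete: bool) -> str:
    """Append the visible warning instead of hiding the gap."""
    return f"{entry} {INCOMPLETE}" if incomplete else entry
```

A record with `series="NPNF1"` and a user-supplied year keeps that year and picks up only the publisher and place, while a record from an unknown series with no edition data gets the annotation.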

Aquinas / scholastic philosophy

Supports all major works of Thomas Aquinas with SBL abbreviation and CMOS full-title modes:

| Work | Abbrev | SBL example |
|---|---|---|
| Summa Theologiae | ST | Aquinas, ST, I-II q. 94 a. 2 |
| Summa Contra Gentiles | SCG | Aquinas, SCG, 1.3 |
| De Ente et Essentia | De ente | Aquinas, De ente, ch. 1 |
| Catena Aurea | Cat. aur. | Aquinas, Cat. aur., Luc. ch. 4, §5 |
| In Metaphysicam | In Meta. | Aquinas, In Meta., bk. 1 lect. 1 |
| Compendium Theologiae | Comp. theol. | Aquinas, Comp. theol., ch. 1 |

Catholic liturgical texts

Eighteen liturgical book names are recognized (including Latin variants), with abbreviation support. The Roman Missal supports 13 canonical edition IDs (mr1474–mr2008) with long-form reverse lookup. Recognized names include:

  • Roman Missal / Missale Romanum → RM

  • Lectionary for Mass / Ordo Lectionum Missae

  • Liturgy of the Hours / Divine Office / Liturgia Horarum → LH

  • Roman Ritual / Rituale Romanum

  • Roman Pontifical / Pontificale Romanum

  • General Instruction of the Roman Missal → GIRM / IGMR (Institutio Generalis Missalis Romani)

  • Roman Gradual / Graduale Romanum

  • Ceremonial of Bishops / Caeremoniale Episcoporum

Verification system

Every format_citation call can return a diagnostic report with:

  • Errors: missing required fields (e.g., no title)

  • Warnings: gentle hints about optional but useful fields

  • Style notes: decisions the formatter made (e.g., "Publisher abbreviated to Fortress per SBLHS 6.1.4.1")
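The three report sections could be modelled as a small dataclass. The sketch below is hypothetical: the `Diagnostics` class, the `check_book` function, the field names, and the specific checks are illustrations of the idea, not the output format of `format_citation`.

```python
from dataclasses import dataclass, field


@dataclass
class Diagnostics:
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
    style_notes: list[str] = field(default_factory=list)


def check_book(paper: dict, style: str) -> Diagnostics:
    """Collect problems and formatter decisions for a book record."""
    report = Diagnostics()
    if not paper.get("title"):
        report.errors.append("missing required field: title")
    year = paper.get("year")
    if year is None:
        report.warnings.append("no year supplied; consider adding one")
    if style == "sbl2" and not paper.get("place"):
        report.warnings.append("SBL 2 expects a place of publication")
    if style == "cmos18" and paper.get("place") and isinstance(year, int) and year > 1900:
        # Record the decision so the user can see why the place is absent.
        report.style_notes.append("Place of publication omitted (CMOS 18, post-1900 book)")
    return report
```

Keeping errors, warnings, and style notes in separate lists lets a client block on the first, surface the second, and log the third.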

Style knowledge base

The query_style_rule tool answers natural-language questions like "how do I cite the Mishnah?" or "what's the abbreviation for Dead Sea Scrolls?" by looking up structured rules in style_reference.py. Coverage includes:

CMOS 18 (21 topics): citation systems, notes, bibliographies, authors, titles, place of publication, publisher formatting, page numbers, editions, book/journal/chapter formats, author-date, punctuation, scriptural references, classical references, ancient/specialized sources, publisher abbreviations, state abbreviations, philosophy (ancient, medieval, modern, comparative), psychology, comparative religions, Catholic theology (Vatican documents, catechism, canon law, audiences/homilies)

SBL 2 (24 topics): relationship to CMOS, notes, bibliographies, place of publication, publisher names (64 known), series/journal abbreviations, biblical citations (all OT/NT/Deuterocanonical), commentaries, dictionaries/encyclopedias, electronic sources, transliteration, term paper formatting, BCE/CE rules, small caps, ancient sources (Hebrew Bible, NT, deuterocanonical, pseudepigrapha, DSS, Philo, Josephus, Mishnah/Talmud/Rabbinic, Targumic, Apostolic Fathers, Nag Hammadi, classical Christian, ANE editions, papyri/ostraca, Greek magical papyri, Church Fathers series, Loeb, Strack-Billerbeck, ANRW, SBL Seminar Papers, Bible versions, Migne Patrologia, Vatican documents including 23 document types, Qur'anic/Islamic), abbreviation authorities, electronic sources, commentary series (AYB, WBC, ICC, Hermeneia, NICOT/NICNT, SP, BNTC, etc.), dictionary series (ABD, TDNT, TDOT, NCE, etc.), philosophy, psychology of religion, comparative religions (Hindu, Buddhist, Sikh, Confucian/Daoist, Zoroastrian), Catholic theology (catechism, conciliar, papal, canon law, SC series, CCCS, NCE), Sources Chrétiennes

Cross-style differences (15 entries): place of publication, publisher names, state abbreviations, 3-em dash, series/journal abbreviations, BCE/CE format, Bible version abbreviations, footnote shortening, bibliography page formatting, publisher/place parentheses, philosophy (ancient), Vatican documents, comparative religions sacred texts, psychology citations, commentary series

Every rule is keyed to its CMOS 18 or SBL 2 section number.


Capability Matrix

Open-Access Sources (Free — No Credentials Required)

| Source | Search | Download | Read | Notes |
|---|---|---|---|---|
| arXiv | ✓ | ✓ | ✓ | Preprints in physics, math, CS |
| PubMed | ✓ | -- | -- | Biomedical metadata; no direct PDF |
| PubMed Central (PMC) | ✓ | ✓ (OA) | ✓ (OA) | Open-access biomedical full-text |
| Europe PMC | ✓ | ✓ (OA) | ✓ (OA) | OA biomedical full-text |
| bioRxiv | ✓ | ✓ | ✓ | Biology preprints (last 30 days) |
| medRxiv | ✓ | ✓ | ✓ | Medical preprints (last 30 days) |
| Semantic Scholar | ✓ | ✓ (OA) | ✓ (OA) | 200M+ papers, citation graph |
| CrossRef | ✓ | -- | -- | DOI-centric metadata, 150M+ records |
| OpenAlex | ✓ | -- | -- | 250M+ works, free API |
| IACR ePrint | ✓ | ✓ | ✓ | Cryptography preprints |
| CORE | ✓ | ✓¹ | ✓¹ | OA repository aggregator |
| dblp | ✓ | -- | -- | Computer science bibliography |
| OpenAIRE | ✓ | -- | -- | European research information |
| ChemRxiv | ✓ | ✓¹ | ✓¹ | Chemistry preprints via CrossRef |
| CiteSeerX | ✓ | ✓¹ | ✓¹ | CS digital library |
| DOAJ | ✓ | ✓¹ | ✓¹ | Directory of OA journals |
| BASE | ✓ | ✓¹ | ✓¹ | Bielefeld Academic Search Engine |
| Zenodo | ✓ | ✓¹ | ✓¹ | OA repository |
| HAL | ✓ | ✓¹ | ✓¹ | French OA archive |
| SSRN | ✓ | ✓¹ | ✓¹ | Social sciences preprint server |
| Unpaywall | ✓² | -- | -- | OA status for any DOI |
| Lens.org | ✓ | -- | -- | Scholarly works + patents from CrossRef, PubMed, OpenAIRE. Free public API, optional API token for higher rate limits |

¹ Record-dependent (only when the source exposes a direct PDF link)
² DOI lookup only (not keyword search)

Specialized Corpora (Free — No Credentials Required)

| Source | Search | Download | Read | Notes |
|---|---|---|---|---|
| CCEL Church Fathers | ✓ | ✓ | ✓ | 38-volume Schaff ANF/NPNF collection (Apostolic Fathers through Seven Ecumenical Councils). Local search index + live CCEL downloads. Scripture cross-reference index (Bible verse → patristic commentary) via lookup_church_fathers_crossref |
| Perseus Digital Library | ✓ | -- | ✓ | 120 major Greek/Latin classical works (Homer through Boethius). Scaife Viewer integration via CTS URN |
| Open Library | ✓ | -- | -- | Internet Archive book catalog. 30M+ books, free no-key API |
| OSF Preprints | ✓ | ✓ | ✓ | 30+ preprint servers (SocArXiv, PsyArXiv, Thesis Commons, engrXiv, etc.). JSON:API |
| IxTheo | ✓ | -- | -- | Index Theologicus — 3M+ theology records from U. Tübingen. Solr/SRU dual-path |
| PhilPapers | ✓ | ✓ | ✓ | Premier philosophy bibliography + full-text via PhilArchive. JSON API |
| Sefaria | ✓ | -- | ✓ | Comprehensive open-access Jewish texts corpus (Tanakh, Talmud, Midrash, commentaries, Kabbalah, liturgy). Bilingual Hebrew/English text via public REST API. Connection graphs and name autocomplete tools |
| Summa Theologica | ✓ | -- | ✓ | English translation (Benziger 1947), 1,761 articles. Local FTS5 full-text index built on first use. |
| Roman Catechism | ✓ | -- | ✓ | Council of Trent (1566) catechism, ~2,230 paragraphs. Local FTS5 full-text index built on first use. |
| Baltimore Catechism | ✓ | -- | ✓ | No. 2 edition (1885), 421 Q&A pairs. Local FTS5 full-text index built on first use. |
| GIRM | ✓ | -- | ✓ | General Instruction of the Roman Missal, 399 paragraphs. Local FTS5 full-text index built on first use. |
| John Henry Newman | ✓ | -- | ✓ | 58 works, 964 chapters from newmanreader.org (the gold standard Newman source). Full-text search across complete works with CMOS citation metadata (publisher, place, year, impression, page ranges). Letters & Diaries: 14/32 volumes indexed. Local SQLite FTS5 index at ~/.paper_search_mcp/newman_index.db built on first use. Paper IDs: newman:{work}/{chapter}. |
| Daily Readings | ✓ | -- | -- | Daily Mass readings from USCCB via catholic-mass-readings API (no credentials required). |
| Catholic Ontology | ✓ | -- | -- | SPARQL search against 120K-triple Catholic Semantic Canon OWL ontology. |
| Liturgical Calendar | ✓ | -- | -- | Feasts, seasons, colors, readings by year/country via Liturgical Calendar API. |

Premium Sources (Free API Key Required)

| Source | Search | Download | Read | Notes |
|---|---|---|---|---|
| Google Books | ✓ | -- | -- | Book search. Activate with GOOGLE_BOOKS_API_KEY (free tier: 1K queries/day) |
| IEEE Xplore | ✓³ | -- | -- | Skeleton — activate with IEEE_API_KEY |
| ACM Digital Library | ✓³ | -- | -- | Skeleton — activate with ACM_API_KEY |

³ Searches metadata; full-text requires institutional access beyond API key

AI-Powered Research Tools (Free API Key Required)

| Source | Search | Read | Report | Notes |
|---|---|---|---|---|
| Elicit | ✓ | ✓ | ✓ | AI research assistant with semantic search across 138M+ papers. Generate structured literature reviews and systematic review workflows. Activate with ELICIT_API_KEY (free tier available) |

Institutional Databases (Proxy/Login Required)

| Source | Search | Download | Read | Activation |
|---|---|---|---|---|
| JSTOR | ✓ | ✓ | ✓ | JSTOR_USERNAME+JSTOR_PASSWORD (Shibboleth auto-login) or JSTOR_PROXY_URL |
| ProQuest | ✓ | ✓ | ✓ | PROQUEST_PROXY_URL / INSTITUTION_PROXY_URL |
| EBSCOhost | ✓ | ✓ | ✓ | EBSCOHOST_API_KEY (API) or EBSCOHOST_PROXY_URL (scraping) |
| Project MUSE | ✓ | ✓ | ✓ | PROJECT_MUSE_PROXY_URL / INSTITUTION_PROXY_URL |
| Web of Science | ✓ | -- | -- | WOS_API_KEY (API) or INSTITUTION_PROXY_URL (scraping). Citation index only — no PDFs. |

Institutional adapters auto-activate when their environment variables are set. No code changes needed.
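One way to picture that auto-activation is a table of alternative variable sets per adapter, checked against the environment at startup. The variable names below come from the Activation column; the `ACTIVATION_RULES` structure and `active_adapters` function are assumptions made for this sketch, not the server's real startup code.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Each adapter lists alternative groups of variables; any one complete
# group is enough to switch the adapter on.
ACTIVATION_RULES = {
    "jstor": [("JSTOR_USERNAME", "JSTOR_PASSWORD"), ("JSTOR_PROXY_URL",)],
    "proquest": [("PROQUEST_PROXY_URL",), ("INSTITUTION_PROXY_URL",)],
    "ebscohost": [("EBSCOHOST_API_KEY",), ("EBSCOHOST_PROXY_URL",)],
    "project_muse": [("PROJECT_MUSE_PROXY_URL",), ("INSTITUTION_PROXY_URL",)],
    "web_of_science": [("WOS_API_KEY",), ("INSTITUTION_PROXY_URL",)],
}


def active_adapters(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the adapters whose required variables are all non-empty."""
    env = os.environ if env is None else env
    return [
        name
        for name, alternatives in ACTIVATION_RULES.items()
        if any(all(env.get(var) for var in group) for group in alternatives)
    ]
```

Under these rules a lone `JSTOR_USERNAME` activates nothing, while a single `INSTITUTION_PROXY_URL` switches on every adapter that lists it as an alternative.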

Web Research (Optional)

| Source | Search | Scrape | Notes |
|---|---|---|---|
| Firecrawl | ✓ | ✓ | Web search + page scraping. Requires FIRECRAWL_API_KEY. |

Discovery Fallback

| Source | Search | Notes |
|---|---|---|
| Google Scholar | ✓ | Optional. Set GOOGLE_SCHOLAR_PROXY_URL to bypass bot detection. Use as discovery/DOI-recovery fallback, not primary index. |
| Sci-Hub | -- | Optional. User-responsibility; disabled by default. |


MCP Tools

Unified (Layer 1)

| Tool | Description |
|---|---|
| search_papers | Concurrent search across all enabled sources with deduplication |
| download_with_fallback | OA-first multi-step download chain: source-native → repository discovery → Unpaywall → optional Sci-Hub |
| traverse_citations | Multi-hop citation graph traversal via Semantic Scholar (BFS, deduplication, configurable depth) |
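The deduplication order shown in the architecture diagram (DOI → title → paper_id) can be sketched as a keyed first-wins pass. This is an illustration under assumptions: the dict field names and the title normalization are invented here, and the real `search_papers` may merge records rather than drop later ones.

```python
import re


def dedup_key(paper: dict) -> tuple[str, str]:
    """Prefer the DOI, then a normalized title, then the source's paper_id."""
    doi = (paper.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9]+", " ", (paper.get("title") or "").lower()).strip()
    if title:
        return ("title", title)
    return ("paper_id", str(paper.get("paper_id", "")))


def deduplicate(papers: list[dict]) -> list[dict]:
    """Keep the first record seen for each key, preserving input order."""
    seen: set[tuple[str, str]] = set()
    unique = []
    for paper in papers:
        key = dedup_key(paper)
        if key not in seen:
            seen.add(key)
            unique.append(paper)
    return unique
```

One limitation of this simple version: a record with a DOI and a record for the same paper without one get different keys, so they are not merged.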

Citation & Style (Layer 3)

Standard Sources (books, journals, chapters)

| Tool | Description |
|---|---|
| format_citation | Format paper metadata into CMOS 18 / SBL 2nd ed. citations (note, bibliography, author-date) with field-level validation |
| format_citation_multi | Produce all citation formats for a paper at once (note, shortened note, bibliography, author-date) |

Additional Style Guides

| Tool | Description |
|---|---|
| format_citation_apa | Format paper metadata into APA 7th edition citations (reference list, in-text parenthetical, in-text narrative) |
| format_citation_apa_multi | Produce all APA 7th formats for a paper at once |
| format_citation_turabian | Format paper metadata into Turabian 9th edition citations (CMOS-based student style) |
| format_specialized_turabian | Format specialized biblical/theological citations in Turabian style |

Export Formats

| Tool | Description |
|---|---|
| format_csl_json | Convert paper metadata to CSL-JSON (Zotero, Mendeley, Pandoc compatible) |
| format_bibtex | Convert paper metadata to BibTeX/BibLaTeX format for LaTeX (with auto-generated citation keys) |
| format_ris | Convert a single paper to RIS format for import into Zotero, EndNote, Mendeley |
| assemble_ris | Convert multiple papers to a concatenated RIS file |
| parse_ris | Parse RIS-formatted text back into paper metadata dicts (import from any reference manager) |
| assemble_bibliography | Assemble a formatted bibliography from multiple papers with auto-sort (author/year) and optional grouping |

In-Text Citation Customization

| Tool | Description |
|---|---|
| format_citation_compound | Format multiple cites as a single in-text citation with prefix/suffix, page locators (7 types), omit-author for narrative citations, and multi-cite compounds with style-aware separators |

Original Language Tools

| Tool | Description |
|---|---|
| lookup_church_fathers_crossref | Search the Church Fathers scripture cross-reference index: given a Bible verse (e.g., "Matthew 5:1-12"), returns all ANF/NPNF passages that cite or comment on it with author, work, section, and surrounding text context |
| lookup_strongs | Look up a biblical word by Strong's number (G2316, H430, etc.) — returns lemma, transliteration, gloss, and definition |
| word_study | Search 202 Greek/Hebrew words by English meaning, lemma, or Strong's number |
| transliterate | Convert Greek or Hebrew text to SBL-style Roman characters |
| parse_latin_word | Parse a Latin word and return its morphological analysis via Whitaker's Words |
| lookup_latin | Look up a Latin or English word in Whitaker's Latin-English dictionary (~39K entries) |

Catholic Reference Tools

| Tool | Description |
|---|---|
| lookup_pope | Look up biographical and pontificate info for 265 popes and 41 antipopes |
| search_prayers | Search 56 Catholic prayers by title, origin, category, or text |
| get_prayer | Get a specific Catholic prayer by ID in up to 8 languages |
| list_prayers | List all available Catholic prayers with titles |
| search_church_history | Search church history timeline for theologians, councils, and events |
| lookup_vulgate | Look up an accented verse in the Clementine Vulgate (35,809 verses) |

Specialized Biblical/Theological Sources

| Tool | Description |
|---|---|
| format_specialized_citation | Format Bible, Church Fathers (ANF/NPNF/PG/PL/SC), Dead Sea Scrolls, Qur'an, Rabbinic, Vatican (23 doc types), ANE texts, papyri, commentary, dictionary, Josephus/Philo, Pseudepigrapha, Apostolic Fathers, Nag Hammadi, Loeb, Liturgical, Aquinas, and website/blog citations — in first note, shortened note, and bibliography formats |

Supported specialized source types: bible, church-fathers, ancient-work, commentary, dictionary-encyclopedia, quran, rabbinic, dead-sea-scrolls, vatican, ancient-near-east, papyri, loeb, josephus-philo, pseudepigrapha, apostolic-fathers, nag-hammadi, website-blog, liturgical, aquinas

Style Knowledge Base

| Tool | Description |
|---|---|
| query_style_rule | Look up any citation rule from CMOS 18, SBL 2nd ed., APA 7th, or Turabian 9th by topic keyword, with source section numbers |
| query_style_difference | Compare how CMOS 18 and SBL 2nd ed. differ on a specific topic |
| abbreviate_publisher | Look up the standard abbreviation for a publisher (64 known publishers, SBL de/suffix conventions) |
| abbreviate_state | Convert a US state to the proper abbreviation (USPS postal for SBL 2; traditional for CMOS 18) |
| abbreviate_journal | Look up the SBL-standard abbreviation for a journal name (e.g., "Journal of Biblical Literature" → "JBL") from 500+ entries |
| lookup_journal_abbrev | Alias for abbreviate_journal |

Free/Open Source Tools (Layer 2 — available without credentials)

search_ccel download_ccel read_ccel_paper
lookup_church_fathers_crossref
search_perseus read_perseus_text
search_open_library
search_osf_preprints download_osf_preprints read_osf_preprints_paper
search_ixtheo
search_philpapers download_philpapers read_philpapers_paper
search_sefaria read_sefaria_text
search_lens read_lens_paper
search_summa_english read_summa_article
search_romanus read_romanus_paragraph
search_baltimore_catechism read_baltimore_catechism_question
search_girm read_girm_paragraph
search_newman read_newman_text
search_daily_readings
search_catholic_ontology
search_liturgical_calendar
search_google_books (activate via GOOGLE_BOOKS_API_KEY)
search_arxiv download_arxiv read_arxiv_paper
search_pubmed
search_biorxiv download_biorxiv read_biorxiv_paper
search_medrxiv download_medrxiv read_medrxiv_paper
search_iacr download_iacr read_iacr_paper
search_semantic download_semantic read_semantic_paper
search_crossref get_crossref_paper_by_doi
search_openalex read_openalex_paper
search_pmc download_pmc read_pmc_paper
search_core download_core read_core_paper
search_europepmc download_europepmc read_europepmc_paper
search_dblp
search_openaire
search_citeseerx download_citeseerx read_citeseerx_paper
search_doaj download_doaj read_doaj_paper
search_base download_base read_base_paper
search_zenodo download_zenodo read_zenodo_paper
search_hal download_hal read_hal_paper
search_ssrn download_ssrn read_ssrn_paper
search_unpaywall

AI Research Tools (Layer 2 — activate via ELICIT_API_KEY)

search_elicit read_elicit_paper
generate_research_report create_systematic_review

Institutional Tools (Layer 2 — activate via env vars)

search_jstor download_jstor read_jstor_paper
search_proquest download_proquest read_proquest_paper
search_ebscohost download_ebscohost read_ebscohost_paper
search_project_muse download_project_muse read_project_muse_paper
search_web_of_science

AI Research Tools (activate via ELICIT_API_KEY)

| Tool | Description |
|---|---|
| generate_research_report | AI-synthesized literature review from top matching papers |
| create_systematic_review | Structured systematic review workflow on Elicit |

Library & Local Search Tools

| Tool | Description |
|---|---|
| build_library_index | (Re)build the local full-text search index from downloaded PDFs |
| search_library | Full-text search across the local library (Tantivy BM25) |
| hybrid_search_library | Combined BM25 + vector similarity search (Tantivy + LanceDB) |
| index_library_document | Index (insert or replace) a single library item |
| delete_library_document | Remove a single document from the local library index |
| parse_references | Parse raw reference strings into structured metadata via Anystyle.io |

Web Research Tool (activate via FIRECRAWL_API_KEY)

web_research — scrape URLs and search the open web via Firecrawl.


Setup

Prerequisites

  • Python 3.10+

  • uv (package manager)

# Clone
git clone https://github.com/aringadre76/mcp-for-research.git
cd mcp-for-research

# Run (uv auto-installs dependencies)
uv run -m paper_search_mcp.server

opencode Configuration

The project includes .opencode/opencode.json with the MCP server already configured. Restart opencode and the tools will be available.

To enable specific sources, fill in their environment variables in .env (copy from .env.example):

cp .env.example .env

Then add your keys:

# Free API keys (improve rate limits)
PAPER_SEARCH_MCP_UNPAYWALL_EMAIL=your@email.com
PAPER_SEARCH_MCP_SEMANTIC_SCHOLAR_API_KEY=
PAPER_SEARCH_MCP_CORE_API_KEY=

# JSTOR — Shibboleth auto-login (browser opens, logs in, saves cookies)
JSTOR_USERNAME=your_username
JSTOR_PASSWORD=your_password

# Other institutional databases (EZproxy)
INSTITUTION_PROXY_URL=https://proxy.library.university.edu/login?url=

# Web research
FIRECRAWL_API_KEY=

JSTOR Setup

JSTOR requires institutional credentials (Shibboleth login). On first use, a Chrome browser window opens and automates login through your institution's SSO. Cookies are persisted to .opencode/cred/jstor-cookie.txt for subsequent PDF downloads.

  1. Place credentials in .opencode/cred/cua-credentials.txt (recommended — avoids truncation risk in JSON configs):

    JSTOR_USERNAME=your_username
    JSTOR_PASSWORD=your_password
  2. Alternatively, set them in .opencode/opencode.json under mcp.academic-research.env:

    "JSTOR_USERNAME": "your_username",
    "JSTOR_PASSWORD": "your_password"
  3. Restart the MCP server. The server auto-detects credentials from either location. On the first JSTOR search, the browser opens, logs in, and saves cookies. Subsequent searches skip login if the session is still valid.

See .opencode/JSTOR.md for implementation details and troubleshooting.
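For reference, reading a `KEY=VALUE` credentials file like the one in step 1 takes only a few lines. The parser below is a sketch, not the server's actual loader: the function name is hypothetical, and the handling of comments and blank lines is an assumption about the file format.

```python
def parse_credentials(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    creds: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, since passwords may contain "=".
        key, value = line.split("=", 1)
        creds[key.strip()] = value.strip()
    return creds


sample = "JSTOR_USERNAME=your_username\nJSTOR_PASSWORD=your_password\n"
creds = parse_credentials(sample)
```

Splitting on the first `=` only keeps passwords that contain `=` intact.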

Optional Dependencies

# OCR support for scanned/image-based PDFs (Surya OCR → PyTesseract fallback)
uv sync --extra ocr

# Full dev toolchain including syrupy snapshots, mypy, ruff
uv sync --extra dev

OCR pipeline: PyMuPDF extracts text from modern PDFs. For scanned/image-based PDFs, the optional [ocr] dependency group adds Surya OCR (state-of-the-art for academic PDFs — handles equations, multi-column layout, multi-language text) with PyTesseract as a legacy fallback.

Grobid: Machine-learning library for extracting structured metadata (title, authors, abstract, references, affiliations) from academic PDFs. Requires a Docker container (docker run -p 8070:8070 lfoppiano/grobid) and the grobid-client Python package.

Anystyle.io: Ruby gem for parsing raw reference strings into structured metadata. Used by the parse_references MCP tool. Install with gem install anystyle.

Hybrid search: Full-text BM25 (Tantivy) + vector similarity (LanceDB) for searching the local downloaded-paper library. sentence-transformers provides the embedding model for vector search.
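The README does not say how the BM25 and vector result lists are merged. One common approach is reciprocal rank fusion (RRF), sketched below purely as an illustration; whether this project uses RRF or a weighted score is not confirmed here, and the function name and k=60 constant are assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked well by both BM25 and vector search rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]     # hypothetical keyword results
vector_hits = ["b", "c", "d"]   # hypothetical embedding results
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

RRF only needs ranks, not raw scores, which avoids having to normalize BM25 scores against cosine similarities.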

OS keychain credential storage: Institutional credentials can be stored securely in the OS keychain (macOS Keychain, Windows Credential Manager, Linux Secret Service) via Python keyring. The connect-institution.py script writes credentials to the keychain; browser-based adapters read from it at runtime.

Other MCP Clients

Claude Desktop — add to claude_desktop_config.json:

{
  "mcpServers": {
    "academic-research": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mcp-for-research", "-m", "paper_search_mcp.server"]
    }
  }
}
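If claude_desktop_config.json already lists other servers, the entry above has to be merged rather than pasted over the file. A minimal sketch of such a merge follows; the helper name is hypothetical, and the caller is assumed to handle reading and writing the file.

```python
import json


def add_academic_research_server(config: dict, project_dir: str) -> dict:
    """Return a copy of config with the academic-research entry added.

    Existing entries under "mcpServers" are preserved; the input is not mutated.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["academic-research"] = {
        "command": "uv",
        "args": ["run", "--directory", project_dir, "-m", "paper_search_mcp.server"],
    }
    merged["mcpServers"] = servers
    return merged


existing = {"mcpServers": {"other-server": {"command": "npx", "args": ["other"]}}}
updated = add_academic_research_server(existing, "/path/to/mcp-for-research")
updated_json = json.dumps(updated, indent=2)  # text to write back to the config file
```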

Cursor: .cursor/mcp.json is included in the repo.


Testing

Framework: pytest with unittest.mock for HTTP mocking (matching the project's unittest style). Snapshot testing via syrupy for citation output and API response regression. Coverage via pytest-cov.

Quick start

# Install the project with dev dependencies
uv sync --extra dev

# Run all tests (skip slow end-to-end scripts)
uv run pytest tests/ -v --ignore=tests/e2e_test.py --ignore=tests/functional_test.py

# Run with coverage report
uv run pytest tests/ --ignore=tests/e2e_test.py --ignore=tests/functional_test.py \
    --cov=paper_search_mcp --cov=institutional --cov-report=term

Test suite (4,500+ tests, all passing)

| Category | Files | Style | Description |
|---|---|---|---|
| Citation & style | test_citation_formatter.py (147), test_specialized_formatter.py (296) | Pure unit (no API) | Canonical tests for all 19 specialized source types across 3 formats (first note, shortened note, bibliography) with mode-separation proof. Covers author edge cases (11+ authors, suffixes, particles), title hazards (embedded quotes, HTML entities, 200+ chars), date anomalies (BCE, forthcoming, n.d.), pagination (Roman numerals, non-contiguous), edition strings, and full Vatican 32-type coverage. |
| Scenario coverage | test_scenario_coverage.py, test_scenario_coverage_r2.py, test_scenario_coverage_r3.py, test_scenario_coverage_r4_*.py (4 files), test_scenario_coverage_r5_batch2_*.py (5 files) | Mocked (unit) | 316 multi-step workflow scenarios across 9 files: search→citation round-trips, dedup edge cases, error propagation, and 36 Catholic user workflows (priest, theologian, lay personas) exercising all 19 specialized source types, both citation styles, style knowledge base, citation graph traversal, download fallback chains, web research, publisher/state abbreviation, and institutional source graceful degradation. |
| Download edge cases | test_download_edge_cases.py | Mocked (unit) | Truncated PDFs, redirect loops, 0-byte responses, disk-full, filename collisions, Sci-Hub CAPTCHA spoofing, repository fallback serial ordering, credential file parsing. |
| Catholic reference tools | test_utility_tools.py (46) | Pure unit | Catholic timeline, Pope database, Latin morphology (Whitaker's Words), Vulgate lookup, and Prayers collection. |
| Catholic FTS5 / Web adapters | test_fts5_adapters.py (32), test_daily_readings.py (9) | Mocked HTTP / Local DB | Test coverage for Summa Theologica, Roman Catechism, Baltimore Catechism, GIRM, Daily Readings, Catholic Ontology, and Liturgical Calendar. |
| Platform unit tests | test_openalex.py, test_pmc.py, test_core.py, test_europepmc.py, test_chemrxiv.py, test_pubmed.py, test_oaipmh.py | Fully mocked HTTP | API response parsing, parameter encoding, error handling, PDF download/read flows |
| Server integration | test_server_unit.py (91), test_fallback.py (16), test_server.py | Mocked searchers + async | Tool wrappers, fallback chains, source orchestration, deduplication, style knowledge base queries |
| Pure logic | test_cli.py, test_config_env.py | Unit | CLI argument parsing, env var precedence, dedupe keys, filename sanitation |
| Institutional adapters | test_jstor.py, test_proquest.py, test_ebscohost.py, test_project_muse.py, test_web_of_science.py, test_inst_browser.py, test_shibboleth_auth.py | Fully mocked | HTML scraping parsers, cookie persistence, browser lifecycle, auth flows |
| Integration/live | test_arxiv.py, test_crossref.py, test_semantic.py, test_iacr.py, test_biorxiv.py, test_medrxiv.py, test_dblp.py, test_doaj.py, test_openaire.py, test_hal.py, test_zenodo.py, test_ssrn.py, test_base.py, test_citeseerx.py, test_google_scholar.py, test_sci_hub.py, test_unpaywall.py | Live API (skip on failure) | Smoke tests against real endpoints |
| End-to-end | e2e_test.py, functional_test.py | Standalone scripts | Full search → download → read chain across all 25+ platforms |

Architecture (tests/conftest.py)

Shared fixtures: mock_session (mock requests.Session), temp_output_dir (uses tmp_path), sample_papers (list of Paper dicts).
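To show what a mocked-session test looks like in this style, here is a small self-contained sketch using only unittest.mock. ExampleSearcher, its URL, and make_mock_session are hypothetical stand-ins; the real mock_session fixture in tests/conftest.py may be shaped differently.

```python
from unittest.mock import MagicMock


class ExampleSearcher:
    """Hypothetical searcher that receives its HTTP session by injection."""

    BASE_URL = "https://api.example.org/search"  # placeholder endpoint

    def __init__(self, session):
        self.session = session

    def search(self, query: str) -> list[str]:
        response = self.session.get(self.BASE_URL, params={"q": query}, timeout=30)
        response.raise_for_status()
        return [item["title"] for item in response.json()["results"]]


def make_mock_session(payload: dict) -> MagicMock:
    """Build a stand-in for requests.Session whose get() returns payload as JSON."""
    session = MagicMock()
    session.get.return_value.json.return_value = payload
    return session


def test_search_parses_titles():
    session = make_mock_session({"results": [{"title": "De Trinitate"}]})
    assert ExampleSearcher(session).search("augustine") == ["De Trinitate"]
    # The mock also records how it was called, so parameter encoding is checkable.
    session.get.assert_called_once_with(
        ExampleSearcher.BASE_URL, params={"q": "augustine"}, timeout=30
    )
```

Injecting the session is what makes the test run without network access.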

CI

GitHub Actions workflow at .github/workflows/test.yml runs on every push and PR against Python 3.10–3.13:

# Installs uv, syncs dev deps, runs pytest with coverage
uv run pytest --cov=paper_search_mcp --cov=institutional --cov-report=term

What's covered vs. what isn't

Covered (~90%): All parsers, date formatters, DOI extractors, abstract reconstructors, filter functions, deduplication logic (including Unicode edge cases), API parameter encoders, PDF download pipelines (including truncation, redirect loops, disk-full, and non-PDF rejection), CLI argument handling, env var loading, citation formatting (all 19 source types with mode separation, author edge cases, suffix handling, name particles, edition strings, date anomalies, pagination variants), style reference query system (45+ topics, 15 difference pairs), citation graph traversal, source registry dynamics, error propagation chains, and every source adapter's core search/read/download path. Multi-step workflow scenarios tested across search→citation round-trips, download fallback chains, and deduplication metadata integrity.

Newly covered since last baseline:

  • L3 Playwright browser automation (JSTOR Shibboleth login, PerimeterX detection, cookie persistence) — tested via HAR replay in CI, live Playwright locally (tests/browser/)

  • External API error branches — VCR cassette fixtures (tests/fixtures/vcr/) record/replay real HTTP responses for 404, 500, timeout, and fallback chain exhaustion

  • Property-based invariants — Hypothesis generative testing for citation formatter correctness across random inputs (tests/property/)

  • MCP security — BibTeX/LaTeX injection prevention, tool name collision detection (tests/security/)

  • Snapshot regression testing — Syrupy snapshot tests for citation output and API response format stability (tests/snapshots/)

Not covered (~10%):

  • Live Playwright sessions in CI — browser tests replay via HAR; full live browser runs require local credentials

  • Multi-mirror Sci-Hub rotation — tested with one mirror; rotation logic needs multi-endpoint live test

  • Feature gaps documented in tests as # DEFERRED PHASE 3 or # NOT A GAP: multi-language citation conventions, cross-form consistency validation, citation graph→formatter pipeline, per-format error diagnosis, RIS export. Several former gaps (CSL-JSON, BibTeX, CLI cite, DOI metadata completion, title-case normalization, accessibility output) are now implemented.

See pyproject.toml for [tool.pytest.ini_options] and [tool.coverage.*] configuration.


Extending: Adding a New Source

  1. Create a Python file implementing the PaperSource ABC (search, download_pdf, read_paper).

  2. Place it in institutional/ (for proxied databases) or paper_search_mcp/academic_platforms/ (for API-based sources).

  3. Register it in paper_search_mcp/server.py:

    • Import and instantiate the searcher

    • Add it to ALL_SOURCES

    • Add a dispatch case in search_papers

    • Register @mcp.tool() functions for search/download/read

Example adapters: institutional/jstor.py (proxy scraping), paper_search_mcp/academic_platforms/semantic.py (REST API).
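The steps above can be sketched end to end with a toy source. Everything below is illustrative: the method signatures, the InMemorySource class, and the shape of ALL_SOURCES are assumptions rather than the project's actual definitions, and the @mcp.tool() registration step is omitted. Check the example adapters for the real interface.

```python
from abc import ABC, abstractmethod


class PaperSource(ABC):
    """Hypothetical stand-in for the project's PaperSource ABC."""

    @abstractmethod
    def search(self, query: str, max_results: int = 10) -> list[dict]:
        """Return metadata dicts for papers matching the query."""

    @abstractmethod
    def download_pdf(self, paper_id: str, save_path: str) -> str:
        """Download the PDF and return the path it was saved to."""

    @abstractmethod
    def read_paper(self, paper_id: str, save_path: str) -> str:
        """Return the extracted full text of the paper."""


class InMemorySource(PaperSource):
    """Toy adapter backed by a list of dicts instead of a remote API."""

    def __init__(self, papers: list[dict]):
        self._papers = {p["paper_id"]: p for p in papers}

    def search(self, query, max_results=10):
        q = query.lower()
        hits = [p for p in self._papers.values() if q in p["title"].lower()]
        return hits[:max_results]

    def download_pdf(self, paper_id, save_path):
        raise NotImplementedError("this toy source has no PDFs")

    def read_paper(self, paper_id, save_path):
        return self._papers[paper_id]["text"]


# Registration: a name -> instance mapping (assumed shape of ALL_SOURCES).
ALL_SOURCES = {
    "inmemory": InMemorySource(
        [{"paper_id": "p1", "title": "On Grace", "text": "Full text."}]
    ),
}


def search_papers(query: str, sources=None) -> list[dict]:
    """Dispatch the query to each selected source and tag results by origin."""
    results = []
    for name in sources or ALL_SOURCES:
        for paper in ALL_SOURCES[name].search(query):
            results.append({**paper, "source": name})
    return results
```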


Roadmap

Phase 1 — Complete ✓

  • 22 free/open source adapters (paper-search-mcp base)

  • 5 institutional database adapters (JSTOR, ProQuest, EBSCOhost, Project MUSE, WoS)

  • Firecrawl web research integration

  • opencode MCP configuration

  • Deduplication (DOI → title → paper_id)

  • OA-first fallback download chain
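The DOI → title → paper_id deduplication order listed above can be sketched as a key function plus a first-seen filter. This is a simplified illustration with hypothetical function names; the project's real normalization rules (for example around Unicode) are likely more careful.

```python
import re


def dedupe_key(paper: dict) -> str:
    """Pick the strongest available identifier: DOI, then title, then paper_id."""
    doi = (paper.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    # Collapse punctuation and whitespace so near-identical titles match.
    title = re.sub(r"[\W_]+", " ", (paper.get("title") or "").casefold()).strip()
    if title:
        return "title:" + title
    return "id:" + str(paper.get("paper_id", ""))


def deduplicate(papers: list[dict]) -> list[dict]:
    """Keep the first paper seen for each key, preserving input order."""
    seen: set[str] = set()
    unique = []
    for paper in papers:
        key = dedupe_key(paper)
        if key in seen:
            continue
        seen.add(key)
        unique.append(paper)
    return unique
```

One limitation of this simple form: a record with a DOI and a record of the same work without one get different keys and are not merged.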

Phase 2 — In Progress

  • Citation graph traversal (multi-hop via Semantic Scholar)

  • CMOS 18 / SBL 2nd edition citation formatting (first note, shortened note, bibliography for all 19 specialized source types)

  • Specialized biblical/theological source formatting (19 source types, 23 Vatican doc types, 6 canon law variants)

  • Style knowledge base with queryable rules (45+ topics, 15 difference pairs)

  • Catholic liturgical and Aquinas/scholastic citation support

  • Known-edition defaults (ANF/NPNF → Hendrickson, PG/PL → Migne, SC → Cerf, CIC → Vaticana)

  • Latin/English title duality for classical and patristic works

  • Robustness hardening: incomplete-metadata annotations, empty-field guards, multi-note chain testing

  • 300 tests, zero failures, all categories ≥ 92/100 confidence

  • Code quality: shared source_registry module, ruff linting (0 issues), mypy type checking (CI), credential parsing consolidation

  • CI pipeline with ruff + mypy + pytest across Python 3.10–3.13

  • Server refactoring: tool factory from source registry (60-line if/elif → 8-line loop, ~800 lines eliminated)

  • Table-driven search_papers dispatch via ALL_SOURCE_INFOS registry

  • Auto-generated search/download/read tools for all 25+ sources (consistent error handling)

  • Missing standalone download/read MCP tools added (PMC, CORE, EuropePMC, OpenAlex — previously only search)

  • download_with_fallback uses live searcher registry instead of hardcoded downloader dict

  • JSTOR PDF download: browser fallback when curl_cffi hits PerimeterX (authenticated Playwright session)

  • Test expansion: 1,683+ tests (up from 1,207), 316 multi-step scenario workflows across 9 scenario coverage files (3 rounds + 9 R4/R5 batch files covering 36 Catholic user workflows)

  • Bug fixes: edition double-suffix, suffix inversion (Jr./III), single-chapter Bible verses, name particles (van Gogh), path traversal validation, _enable_source registry sync, format_citation_multi exception type

  • 7 new specialized source adapters: CCEL Church Fathers (38-volume ANF/NPNF, local search index), Perseus (120 Greek/Latin classical works), Google Books (API-key gated), Open Library (30M+ free catalog), OSF Preprints (30+ preprint servers), IxTheo (3M+ theology records), PhilPapers (philosophy + PhilArchive full-text)

  • APA 7th edition citation formatting (reference list + in-text parenthetical/narrative for books, journals, chapters)

  • Turabian 9th edition citation formatting (CMOS 18 wrapper with student-paper overrides)

  • CSL-JSON output format (Zotero/Mendeley/Pandoc interoperable)

  • BibTeX/BibLaTeX output format (LaTeX-ready with proper escaping and page-range formatting)

  • Original language tools: Strong's dictionary lookup (202 Greek/Hebrew entries), word study search, SBL-style Greek/Hebrew transliteration

  • Style knowledge base expanded: APA 7th (9 topics), Turabian 9th (6 topics)

  • Church Fathers scripture cross-reference index (Bible verse → patristic commentary)

  • RIS export/import (single + batch + parse) with 30+ type code mapping

  • CSL engine integration (citeproc-py + local style cache, 3 bundled styles)

  • Batch bibliography assembly with auto-sort (author/year) and grouping

  • Journal abbreviation database (500+ entries from SBLHS §8.4) + lookup tools

  • In-text citation customization: compound citations with prefix/suffix, page locators (7 types), multi-cite, narrative-citation omit-author

  • Tiptap rich text editor (ProseMirror-based) with citation @mentions in web dashboard. ⚠️ Yjs collaboration hook (useYjsCollaboration.js) exists but is NOT wired to the Tiptap editor — collaborative document sync is pending.

  • Yjs CRDT-based collaborative editing via WebSocket — ⚠️ useYjsCollaboration.js hook written but not integrated with Tiptap. Presence + messaging (WebSocket) is implemented; CRDT-based real-time document sync is pending.

  • Hybrid BM25 + vector search (Tantivy + LanceDB) for the local paper library

  • Surya OCR + PyTesseract fallback for scanned PDFs — ⚠️ Surya integration code exists but the API is broken against the installed package version. PyTesseract fallback works.

  • Grobid structured metadata extraction from PDFs — ⚠️ Integration code exists but is never called in any execution path; grobid-client dependency not listed in pyproject.toml.

  • Anystyle.io reference string parser

  • OS keychain credential storage (macOS Keychain, Windows Credential Manager, Linux Secret Service)

  • Sefaria open Jewish texts corpus adapter (Tanakh, Talmud, Midrash, commentaries)

  • Elicit AI research assistant adapter (semantic search, literature reviews, systematic reviews)

  • Lens.org scholarly works + patents search adapter

  • Catholic reference tools (Latin, popes, prayers, timeline, Vulgate)

  • Additional Catholic source adapters (Summa, Roman Catechism, Baltimore Catechism, GIRM, Daily Readings, Ontology)

  • FTS5 indexer base class refactoring

  • Accordance Bible software integration

  • Anna's Archive book search integration (as optional MCP tool)

  • Research report generation templates

  • Saved search alerts and recurring queries

  • Web dashboard for credential management and report browsing — ⚠️ Partially done: Search, Library, Citations, Projects, Credentials, and Notes pages functional. PDF viewer (Phase 1) working. Backend deployed on Railway. Some backend routes (annotations, PDF proxy, Notes POST) still missing. See docs/PROJECT_STATUS_AND_REMAINING_WORK.md.

  • Hosted server deployed on Railway (JWT auth, encrypted credential vault, rate limiting, usage tracking, admin dashboard). ⚠️ Retained institutional logins: cookie persistence works for JSTOR; other adapters may need re-authentication after session expiry.
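For the BibTeX/BibLaTeX item in the list above, "proper escaping and page-range formatting" usually means two things: escaping LaTeX special characters in field values and writing page ranges with a double hyphen. The sketch below shows both under that assumption; the function names are hypothetical and the project's exporter may handle more cases.

```python
import re

# LaTeX special characters and their escaped forms.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape character by character so replacements are never escaped twice."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


def bibtex_pages(pages: str) -> str:
    """Normalize hyphen, en dash, or em dash ranges to the BibTeX '--' form."""
    return re.sub(r"\s*[-\u2013\u2014]+\s*", "--", pages.strip())
```

Escaping untrusted titles this way is also a basic guard against the LaTeX injection cases mentioned in the testing section.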

Phase 3 — Future

  • Multi-user research sessions

  • Direct Zotero/EndNote export

  • Pre-indexed vector embeddings for LanceDB hybrid search

  • Advanced RAG (retrieval-augmented generation) over the paper library


Credits

Built on paper-search-mcp by P.S. Zhang / openags (MIT License). The core free/open source adapters and two-layer architecture originate from that project.

Institutional database adapters, Firecrawl integration, opencode configuration, and project packaging by Jacob Gruber.


License

MIT — see LICENSE.
