Academic Research MCP
Provides tools for searching, downloading, and reading papers from arXiv, a repository of electronic preprints in various scientific fields.
Provides tools for searching, downloading, and reading papers from Google Scholar, a freely accessible web search engine indexing scholarly literature across disciplines.
Provides tools for searching, downloading, and reading papers from PubMed, a free search engine accessing the MEDLINE database of life sciences and biomedical literature.
Provides tools for searching, downloading, and reading papers from Semantic Scholar, an AI-powered research tool for scientific literature.
Unified scholarly search across 60+ academic sources, with a 13,000-document Catholic theological corpus (12 popes, 94 patristic/magisterial sources, 434K cross-references), CMOS 18/SBL 2nd/APA 7th/Turabian 9th citation formatting, 19 specialized source types, CSL-JSON/BibTeX export, and original language tools. Includes 150+ MCP tools, hybrid BM25+vector search, AI-powered research via Elicit, and collaborative rich-text editing (Tiptap + Yjs).
Built on paper-search-mcp by openags (MIT licensed), extended with institutional database connectors, Firecrawl web research, citation graph traversal, and dual-style citation formatting with mode-separated CMOS 18 and SBL 2 output.
Vision
A single interface for all academic research. Instead of bouncing between JSTOR, ProQuest, EBSCOhost, Google Scholar, and a dozen other sites — each with its own login — you describe what you're looking for and the MCP searches everything at once, deduplicates results, downloads accessible PDFs, and extracts full text for the AI to analyze.
Phase 1 (now), MCP server: search, download, and read across all sources from within opencode or any MCP client.
Phase 2, web dashboard: browse saved papers, manage credentials, generate and export research reports.
Phase 3, hosted server: access from anywhere, retained institutional logins, shared research sessions.
The MCP server is the engine. Everything else is a client that talks to it.
Architecture
┌─────────────────────────────────────────────────────────────┐
│ MCP CLIENTS │
│ opencode · Claude Desktop · Cursor · VS Code · Web │
└────────────────────────┬────────────────────────────────────┘
│ JSON-RPC (stdio / Streamable HTTP)
┌────────────────────────▼────────────────────────────────────┐
│ ACADEMIC RESEARCH MCP │
│ │
│ ┌──────────────────┐ ┌──────────────────────────────────┐ │
│ │ paper_search_mcp │ │ institutional/ │ │
│ │ (23 free/open │ │ (5 institutional adapters) │ │
│ │ source adapters)│ │ │ │
│ │ │ │ jstor.py proquest.py │ │
│ │ arxiv.py │ │ ebscohost.py project_muse.py │ │
│ │ pubmed.py │ │ web_of_science.py │ │
│ │ semantic.py │ └──────────────────────────────────┘ │
│ │ citation_graph.py│ │
│ │ crossref.py │ │
│ │ openalex.py │ ┌──────────────────────────────────┐ │
│ │ citation_formatter│ │ Firecrawl web_research (optional)│ │
│ │ specialized_format│ └──────────────────────────────────┘ │
│ │ style_reference.py│ │
│ │ ... (16 more) │ │
│ └──────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Citation & Style Layer │ │
│ │ │ │
│ │ citation_formatter.py — Books, journals, chapters, │ │
│ │ edited/translated books (5 formats per type) │ │
│ │ specialized_formatter.py — Bible, Church Fathers, DSS, │ │
│ │ Qur'an, Rabbinic, Vatican (23 doc types), ANE, │ │
│ │ Papyri, Loeb, Josephus/Philo, Pseudepigrapha, │ │
│ │ Apostolic Fathers, Nag Hammadi, Commentary, │ │
│ │ Dictionary, Liturgical, Aquinas, Website/Blog │ │
│ │ style_reference.py — Queryable CMOS 18 / SBL 2 │ │
│ │ knowledge base (45+ topics, 15 difference pairs) │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
│ Three-layer design: │
│ Layer 1: search_papers() — concurrent multi-source search │
│ with deduplication (DOI → title → paper_id) │
│ Layer 2: Per-source search/download/read tools │
│ download_with_fallback() — OA-first fallback chain │
│ Layer 3: Citation formatting — CMOS 18 & SBL 2nd ed. │
│ Notes, bibliographies, author-date, specialized │
│ │
│ Search Indexes (local): │
│ Tantivy (BM25) + LanceDB (vector) hybrid search │
│ Surya OCR → PyTesseract fallback for scanned PDFs │
│ Grobid structured metadata extraction from PDFs │
└─────────────────────────────────────────────────────────────┘
Every source adapter implements the same PaperSource interface (search, download_pdf, read_paper). Adding a new source takes one Python file and a few lines of registration, with no architectural changes.
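As a rough illustration of the adapter contract, the sketch below shows what a PaperSource-style interface and a registration step could look like. Only the three method names come from this README; the method signatures, the `SOURCES` registry, the `register` helper, and `ExampleSource` are hypothetical stand-ins, and the real project may differ.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class PaperSource(ABC):
    """Assumed shape of the shared adapter contract (signatures are guesses)."""

    @abstractmethod
    def search(self, query: str, max_results: int = 10) -> List[dict]:
        ...

    @abstractmethod
    def download_pdf(self, paper_id: str, save_path: str) -> str:
        ...

    @abstractmethod
    def read_paper(self, paper_id: str, save_path: str) -> str:
        ...


# Hypothetical registry: the project's actual registration code may differ.
SOURCES: Dict[str, PaperSource] = {}


def register(name: str, source: PaperSource) -> None:
    SOURCES[name] = source


class ExampleSource(PaperSource):
    """Toy adapter backed by an in-memory list instead of a real API."""

    def __init__(self) -> None:
        self._papers = [{"paper_id": "ex-1", "title": "A Toy Paper", "text": "Hello"}]

    def search(self, query: str, max_results: int = 10) -> List[dict]:
        hits = [p for p in self._papers if query.lower() in p["title"].lower()]
        return hits[:max_results]

    def download_pdf(self, paper_id: str, save_path: str) -> str:
        raise NotImplementedError("This toy source has no PDFs")

    def read_paper(self, paper_id: str, save_path: str) -> str:
        for paper in self._papers:
            if paper["paper_id"] == paper_id:
                return paper["text"]
        return ""


register("example", ExampleSource())
```

With this shape, a new source is one class plus one `register(...)` call, which is the property the paragraph above describes.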
Citation & Style
The MCP includes a full citation formatting system supporting two major academic style guides with mode-separated output verified against the published manuals:
Style | Version | Source |
CMOS 18 | The Chicago Manual of Style, 18th ed. (2024) | Chapters 13–14 |
SBL 2 | The SBL Handbook of Style, 2nd ed. (2014) | Chapters 4, 6, 8; Student Supplement |
Each mode produces distinct, correct output. CMOS 18 defers to SBL for biblical/ancient Near Eastern conventions (as the manual itself recommends). SBL 2 defers to CMOS for general formatting (philosophy, psychology, comparative religions). The tool enforces mode-appropriate formatting throughout:
Rule | CMOS 18 | SBL 2 |
Place of publication | Omitted for post-1900 books (14.30) | Required for all books (6.1.4.2) |
Publisher names | Full names, strip Inc./Ltd. only (14.32) | Abbreviated per SBLHS 6.1.4.1 (64 publishers) |
State abbreviations | Traditional (Mass., N.Y., Calif.) | USPS postal (MA, NY, CA) (8.1.1) |
Bible abbreviations | Periods: Gen., Exod. (14.138) | No periods: Gen, Exod (8.2) |
Series/journals | Full names | Abbreviated in notes AND bib (6.1.3.5) |
3-em dash in bibs | Discontinued (13.72) | Still used (6.2) |
BCE/CE format | B.C.E., C.E. | BCE, CE (8.1.2) |
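To make the mode separation concrete, here is a minimal sketch of three rows from the table above (Bible abbreviations, BCE/CE format, state abbreviations) expressed as a per-style rule table. The dictionary layout, the style keys `cmos18` and `sbl2`, and the helper names are assumptions made for illustration, not the formatter's actual internals.

```python
# Illustrative rule table; keys and helper names are assumptions.
STYLE_RULES = {
    "cmos18": {"bible_periods": True, "era_periods": True, "state_style": "traditional"},
    "sbl2": {"bible_periods": False, "era_periods": False, "state_style": "postal"},
}

# Abbreviations taken from the examples in the table above.
STATE_ABBREVIATIONS = {
    "Massachusetts": {"traditional": "Mass.", "postal": "MA"},
    "New York": {"traditional": "N.Y.", "postal": "NY"},
    "California": {"traditional": "Calif.", "postal": "CA"},
}


def bible_book(abbrev: str, style: str) -> str:
    """Add or strip the trailing period on a Bible book abbreviation."""
    base = abbrev.rstrip(".")
    return base + "." if STYLE_RULES[style]["bible_periods"] else base


def era(label: str, style: str) -> str:
    """Render an era label with periods (B.C.E.) or without (BCE)."""
    letters = label.replace(".", "").upper()
    if STYLE_RULES[style]["era_periods"]:
        return "".join(letter + "." for letter in letters)
    return letters


def state(name: str, style: str) -> str:
    """Pick the traditional or USPS postal abbreviation for a state."""
    return STATE_ABBREVIATIONS[name][STYLE_RULES[style]["state_style"]]


print(bible_book("Exod", "cmos18"), bible_book("Exod.", "sbl2"))
print(era("BCE", "cmos18"), era("B.C.E.", "sbl2"))
print(state("Massachusetts", "cmos18"), state("Massachusetts", "sbl2"))
```

Keeping every style-dependent decision in one table is one way to ensure a CMOS convention never leaks into SBL output, or the reverse.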
Standard source types
Books, journal articles, chapters in edited volumes, edited books, and translated books — in five citation formats: full note, shortened note, bibliography, author-date reference list, and author-date parenthetical citation.
Specialized source types
All 19 specialized source types support three citation formats: full note ("note"), shortened note for second appearance onwards ("note_short"), and bibliography entry ("bibliography"). Primary/ancient sources (Bible, DSS, Josephus/Philo, Pseudepigrapha, Apostolic Fathers, Nag Hammadi, ANE texts, Rabbinic, Qur'an, papyri) return an empty bibliography string — cite the modern edition as a standard book instead. More than 300 exact-output tests verify correctness across all formats and both styles.
Notable formatting details across all formats:
En dashes in page ranges (95–96, not 95-96)
Commas inside quotation marks per American style ("Title," in *Journal*)
CMOS 18 omits place of publication for post-1900 books; SBL 2 requires it
Only first author inverted in bibliographies; comma before "and" for two authors
Publisher abbreviation per SBL conventions (64 known publishers)
State abbreviation per style (USPS postal for SBL 2; traditional for CMOS 18)
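Two of the details listed above are easy to show in code: en dashes in page ranges, and inverting only the first author in a bibliography entry. The sketch below is an illustration under assumed function names and an assumed `(given, family)` tuple input; it is not the project's formatter.

```python
import re
from typing import List, Tuple


def normalize_page_range(pages: str) -> str:
    """Replace a hyphen between two digits with an en dash (U+2013)."""
    return re.sub(r"(?<=\d)\s*-\s*(?=\d)", "\u2013", pages)


def bibliography_authors(authors: List[Tuple[str, str]]) -> str:
    """Format (given, family) pairs with only the first author inverted."""
    if not authors:
        return ""
    first = f"{authors[0][1]}, {authors[0][0]}"
    rest = [f"{given} {family}" for given, family in authors[1:]]
    if not rest:
        return first
    if len(rest) == 1:
        # Two authors: comma before "and".
        return f"{first}, and {rest[0]}"
    return f"{first}, " + ", ".join(rest[:-1]) + f", and {rest[-1]}"


print(normalize_page_range("95-96"))
print(bibliography_authors([("John", "Smith"), ("Jane", "Doe")]))
```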
Specialized biblical / theological sources
Nineteen dedicated formatters handle primary-source citation patterns not covered by standard book/journal rules. All support three formats: full first note, shortened note (second appearance onwards), and bibliography. Examples below — SBL short-note and bibliography shown where they differ from the first note:
Source type | SBL first note | SBL short note | SBL bibliography |
Bible | | | -- (cite edition as book) |
Church Fathers | | | |
Josephus / Philo | | | -- (cite Loeb edition as book) |
Commentary | | | |
Dictionary | | | |
Dead Sea Scrolls | | | -- |
Pseudepigrapha | | | -- |
Apostolic Fathers | | | -- |
Nag Hammadi | | Same | -- |
ANE texts | | Same | -- |
Papyri | | Same | -- |
Qur'an | | Same | -- |
Rabbinic | | Same | -- |
Website / Blog | | | |
Loeb | | | |
Vatican (encyclical) | | | |
Liturgical | | | |
Aquinas | | | |
Latin vs. English conventions: SBL 2 §8.3.6 and §8.3.14.3 state that Latin abbreviations are preferred in notes for classical and patristic works, but English titles may also be used if consistent. The formatter defaults to Latin abbreviations in SBL mode (Ant., Conf., Haer., ST) and English full titles in CMOS mode (Jewish Antiquities, Confessions, Adversus Haereses, Summa Theologiae). Users can override by providing either work_abbrev or work_title in the paper dict. For Philo and Josephus specifically, "no priority is intended by our listing of the Latin titles first; authors should decide which they prefer and remain consistent" (SBL 2 §8.3.6).
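The default-and-override behaviour described above might be sketched as follows. The `work_abbrev` and `work_title` keys come from this README; the `work` key, the `DEFAULT_LABELS` table, the style identifiers, and the choice to check `work_abbrev` before `work_title` when both are supplied are assumptions for illustration only.

```python
# Defaults drawn from the examples above; the lookup key is hypothetical.
DEFAULT_LABELS = {
    "jewish-antiquities": {"abbrev": "Ant.", "title": "Jewish Antiquities"},
    "confessions": {"abbrev": "Conf.", "title": "Confessions"},
    "summa-theologiae": {"abbrev": "ST", "title": "Summa Theologiae"},
}


def resolve_work_label(paper: dict, style: str) -> str:
    """Return the label used to cite a work, honouring user overrides."""
    # An explicit override wins regardless of style (precedence is assumed).
    if paper.get("work_abbrev"):
        return paper["work_abbrev"]
    if paper.get("work_title"):
        return paper["work_title"]
    defaults = DEFAULT_LABELS[paper["work"]]
    # SBL mode defaults to the Latin abbreviation, CMOS mode to the full title.
    return defaults["abbrev"] if style == "sbl2" else defaults["title"]


print(resolve_work_label({"work": "confessions"}, "sbl2"))
print(resolve_work_label({"work": "confessions"}, "cmos18"))
print(resolve_work_label({"work": "confessions", "work_title": "Confessions"}, "sbl2"))
```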
Vatican document type support
The Vatican formatter handles 32 document types across the full authority hierarchy, including curial types (instruction, decree, declaration, note, notification, response). It also extracts magisterial_type (ordinary/extraordinary) and infallible fields when present. All documents support first note, shortened note, and bibliography. Authoritative documents (encyclicals, constitutions, motu proprios) produce bibliography entries; oral communications (audiences, homilies) produce note-format references only. SBL 2 mode omits URLs for Vatican documents per general SBL conventions.
Authority level | Types | Example (SBL first note → SBL short) |
Solemn papal | apostolic constitution, dogmatic constitution, encyclical | |
Major teaching | apostolic exhortation, apostolic letter, motu proprio | |
Conciliar | pastoral constitution, decree, declaration | |
Curial | instruction, rescript | Cite by issuing body, title, date, section |
Oral / pastoral | general audience, homily, address, speech, allocution, angelus, regina caeli, message | |
Reference | catechism (by paragraph), canon law (CIC, CCEO, CIC/1917) | |
Canon law bibliography dynamically resolves edition details: CIC (Libreria Editrice Vaticana, 1983), CIC/1917 (P. J. Kenedy & Sons, 1918), CCEO (Libreria Editrice Vaticana, 1990). User-supplied publisher/place/year overrides defaults.
Known-edition defaults
To prevent incomplete bibliography entries, the formatter fills missing publisher/place/year from a known-edition table when the series is identified:
Series | Default publisher | Default place | Default year |
ANF | Hendrickson | Peabody, MA | 1994 |
NPNF1 / NPNF2 | Hendrickson | Peabody, MA | 1994 |
PG | J.-P. Migne | Paris | 1857–1866 |
PL | J.-P. Migne | Paris | 1844–1864 |
SC (Sources Chrétiennes) | Cerf | Paris | 1941– |
When critical metadata cannot be filled, a [Note: incomplete edition data] annotation is appended to the bibliography entry rather than silently producing an incomplete citation.
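The fill-then-annotate behaviour can be sketched as below, using the values from the known-edition table above. The function names, the `series` key, and the exact way the annotation is appended are assumptions; only the table values and the annotation text come from this README.

```python
# Values copied from the known-edition table above.
KNOWN_EDITIONS = {
    "ANF": {"publisher": "Hendrickson", "place": "Peabody, MA", "year": "1994"},
    "NPNF1": {"publisher": "Hendrickson", "place": "Peabody, MA", "year": "1994"},
    "NPNF2": {"publisher": "Hendrickson", "place": "Peabody, MA", "year": "1994"},
    "PG": {"publisher": "J.-P. Migne", "place": "Paris", "year": "1857\u20131866"},
    "PL": {"publisher": "J.-P. Migne", "place": "Paris", "year": "1844\u20131864"},
    "SC": {"publisher": "Cerf", "place": "Paris", "year": "1941\u2013"},
}
REQUIRED_FIELDS = ("publisher", "place", "year")
INCOMPLETE_NOTE = "[Note: incomplete edition data]"


def fill_edition_defaults(paper: dict) -> dict:
    """Fill missing publisher/place/year from the table; user values win."""
    filled = dict(paper)
    defaults = KNOWN_EDITIONS.get(filled.get("series", ""), {})
    for name in REQUIRED_FIELDS:
        if not filled.get(name) and name in defaults:
            filled[name] = defaults[name]
    return filled


def annotate_if_incomplete(entry: str, paper: dict) -> str:
    """Append the note when any required field is still missing."""
    if all(paper.get(name) for name in REQUIRED_FIELDS):
        return entry
    return f"{entry} {INCOMPLETE_NOTE}"


paper = fill_edition_defaults({"series": "ANF", "year": "1885"})
print(paper["publisher"], paper["place"], paper["year"])
```

In this sketch the user-supplied year (1885) survives while publisher and place come from the table.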
Aquinas / scholastic philosophy
Supports all major works of Thomas Aquinas with SBL abbreviation and CMOS full-title modes:
Work | Abbrev | SBL example |
Summa Theologiae | ST | |
Summa Contra Gentiles | SCG | |
De Ente et Essentia | De ente | |
Catena Aurea | Cat. aur. | |
In Metaphysicam | In Meta. | |
Compendium Theologiae | Comp. theol. | |
Catholic liturgical texts
Eighteen liturgical book names recognized (including Latin variants), with abbreviation support. The Roman Missal supports 13 canonical edition IDs (mr1474–mr2008) with long-form reverse lookup:
Roman Missal / Missale Romanum → RM
Lectionary for Mass / Ordo Lectionum Missae
Liturgy of the Hours / Divine Office / Liturgia Horarum → LH
Roman Ritual / Rituale Romanum
Roman Pontifical / Pontificale Romanum
General Instruction of the Roman Missal → GIRM / IGMR (Institutio Generalis Missalis Romani)
Roman Gradual / Graduale Romanum
Ceremonial of Bishops / Caeremoniale Episcoporum
Verification system
Every format_citation call can return a diagnostic report with:
Errors: missing required fields (e.g., no title)
Warnings: gentle hints about optional but useful fields
Style notes: decisions the formatter made (e.g., "Publisher abbreviated to Fortress per SBLHS 6.1.4.1")
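A diagnostic report with these three buckets could be modelled roughly as follows. The `Report` dataclass, the `verify` function, and the specific checks (treating `doi` as an optional field, matching the input string "Fortress Press") are hypothetical; only the three categories and the example style note are taken from the list above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Report:
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    style_notes: List[str] = field(default_factory=list)


def verify(paper: dict, style: str = "sbl2") -> Report:
    """Collect errors, warnings, and style notes for one paper dict."""
    report = Report()
    if not paper.get("title"):
        # Missing required field: reported as an error.
        report.errors.append("Missing required field: title")
    if not paper.get("doi"):
        # Optional but useful field: reported as a warning (assumed check).
        report.warnings.append("Optional field 'doi' not provided")
    if style == "sbl2" and paper.get("publisher") == "Fortress Press":
        # Records a decision the formatter made rather than a problem.
        report.style_notes.append("Publisher abbreviated to Fortress per SBLHS 6.1.4.1")
    return report


report = verify({"publisher": "Fortress Press"})
print(report.errors, report.warnings, report.style_notes)
```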
Style knowledge base
The query_style_rule tool answers natural-language questions like "how do I cite the Mishnah?" or "what's the abbreviation for Dead Sea Scrolls?" by looking up structured rules in style_reference.py. Coverage includes:
CMOS 18 (21 topics): citation systems, notes, bibliographies, authors, titles, place of publication, publisher formatting, page numbers, editions, book/journal/chapter formats, author-date, punctuation, scriptural references, classical references, ancient/specialized sources, publisher abbreviations, state abbreviations, philosophy (ancient, medieval, modern, comparative), psychology, comparative religions, Catholic theology (Vatican documents, catechism, canon law, audiences/homilies)
SBL 2 (24 topics): relationship to CMOS, notes, bibliographies, place of publication, publisher names (64 known), series/journal abbreviations, biblical citations (all OT/NT/Deuterocanonical), commentaries, dictionaries/encyclopedias, electronic sources, transliteration, term paper formatting, BCE/CE rules, small caps, ancient sources (Hebrew Bible, NT, deuterocanonical, pseudepigrapha, DSS, Philo, Josephus, Mishnah/Talmud/Rabbinic, Targumic, Apostolic Fathers, Nag Hammadi, classical Christian, ANE editions, papyri/ostraca, Greek magical papyri, Church Fathers series, Loeb, Strack-Billerbeck, ANRW, SBL Seminar Papers, Bible versions, Migne Patrologia, Vatican documents including 23 document types, Qur'anic/Islamic), abbreviation authorities, electronic sources, commentary series (AYB, WBC, ICC, Hermeneia, NICOT/NICNT, SP, BNTC, etc.), dictionary series (ABD, TDNT, TDOT, NCE, etc.), philosophy, psychology of religion, comparative religions (Hindu, Buddhist, Sikh, Confucian/Daoist, Zoroastrian), Catholic theology (catechism, conciliar, papal, canon law, SC series, CCCS, NCE), Sources Chrétiennes
Cross-style differences (15 entries): place of publication, publisher names, state abbreviations, 3-em dash, series/journal abbreviations, BCE/CE format, Bible version abbreviations, footnote shortening, bibliography page formatting, publisher/place parentheses, philosophy (ancient), Vatican documents, comparative religions sacred texts, psychology citations, commentary series
Every rule keyed to its CMOS 18 or SBL 2 section number.
Capability Matrix
Open-Access Sources (Free — No Credentials Required)
Source | Search | Download | Read | Notes |
arXiv | ✓ | ✓ | ✓ | Preprints in physics, math, CS |
PubMed | ✓ | -- | -- | Biomedical metadata; no direct PDF |
PubMed Central (PMC) | ✓ | ✓ (OA) | ✓ (OA) | Open-access biomedical full-text |
Europe PMC | ✓ | ✓ (OA) | ✓ (OA) | OA biomedical full-text |
bioRxiv | ✓ | ✓ | ✓ | Biology preprints (last 30 days) |
medRxiv | ✓ | ✓ | ✓ | Medical preprints (last 30 days) |
Semantic Scholar | ✓ | ✓ (OA) | ✓ (OA) | 200M+ papers, citation graph |
CrossRef | ✓ | -- | -- | DOI-centric metadata, 150M+ records |
OpenAlex | ✓ | -- | -- | 250M+ works, free API |
IACR ePrint | ✓ | ✓ | ✓ | Cryptography preprints |
CORE | ✓ | ✓¹ | ✓¹ | OA repository aggregator |
dblp | ✓ | -- | -- | Computer science bibliography |
OpenAIRE | ✓ | -- | -- | European research information |
ChemRxiv | ✓ | ✓¹ | ✓¹ | Chemistry preprints via CrossRef |
CiteSeerX | ✓ | ✓¹ | ✓¹ | CS digital library |
DOAJ | ✓ | ✓¹ | ✓¹ | Directory of OA journals |
BASE | ✓ | ✓¹ | ✓¹ | Bielefeld Academic Search Engine |
Zenodo | ✓ | ✓¹ | ✓¹ | OA repository |
HAL | ✓ | ✓¹ | ✓¹ | French OA archive |
SSRN | ✓ | ✓¹ | ✓¹ | Social sciences preprint server |
Unpaywall | ✓² | -- | -- | OA status for any DOI |
Lens.org | ✓ | -- | -- | Scholarly works + patents from CrossRef, PubMed, OpenAIRE. Free public API, optional API token for higher rate limits |
¹ Record-dependent (only when the source exposes a direct PDF link)
² DOI lookup only (not keyword search)
Specialized Corpora (Free — No Credentials Required)
Source | Search | Download | Read | Notes |
CCEL Church Fathers | ✓ | ✓ | ✓ | 38-volume Schaff ANF/NPNF collection (Apostolic Fathers through Seven Ecumenical Councils). Local search index + live CCEL downloads. Scripture cross-reference index (Bible verse → patristic commentary) via |
Perseus Digital Library | ✓ | -- | ✓ | 120 major Greek/Latin classical works (Homer through Boethius). Scaife Viewer integration via CTS URN |
Open Library | ✓ | -- | -- | Internet Archive book catalog. 30M+ books, free no-key API |
OSF Preprints | ✓ | ✓ | ✓ | 30+ preprint servers (SocArXiv, PsyArXiv, Thesis Commons, engrXiv, etc.). JSON:API |
IxTheo | ✓ | -- | -- | Index Theologicus — 3M+ theology records from U. Tübingen. Solr/SRU dual-path |
PhilPapers | ✓ | ✓ | ✓ | Premier philosophy bibliography + full-text via PhilArchive. JSON API |
Sefaria | ✓ | -- | ✓ | Comprehensive open-access Jewish texts corpus (Tanakh, Talmud, Midrash, commentaries, Kabbalah, liturgy). Bilingual Hebrew/English text via public REST API. Connection graphs and name autocomplete tools |
Summa Theologica | ✓ | -- | ✓ | English translation (Benziger 1947), 1,761 articles. Local FTS5 full-text index built on first use. |
Roman Catechism | ✓ | -- | ✓ | Council of Trent (1566) catechism, ~2,230 paragraphs. Local FTS5 full-text index built on first use. |
Baltimore Catechism | ✓ | -- | ✓ | No. 2 edition (1885), 421 Q&A pairs. Local FTS5 full-text index built on first use. |
GIRM | ✓ | -- | ✓ | General Instruction of the Roman Missal, 399 paragraphs. Local FTS5 full-text index built on first use. |
John Henry Newman | ✓ | -- | ✓ | 58 works, 964 chapters from newmanreader.org (the gold standard Newman source). Full-text search across complete works with CMOS citation metadata (publisher, place, year, impression, page ranges). Letters & Diaries: 14/32 volumes indexed. Local SQLite FTS5 index at |
Daily Readings | ✓ | -- | -- | Daily Mass readings from USCCB via catholic-mass-readings API (no credentials required). |
Catholic Ontology | ✓ | -- | -- | SPARQL search against 120K-triple Catholic Semantic Canon OWL ontology. |
Liturgical Calendar | ✓ | -- | -- | Feasts, seasons, colors, readings by year/country via Liturgical Calendar API. |
Premium Sources (Free API Key Required)
Source | Search | Download | Read | Notes |
Google Books | ✓ | -- | -- | Book search. Activate with GOOGLE_BOOKS_API_KEY |
IEEE Xplore | ✓³ | -- | -- | Skeleton — activate with |
ACM Digital Library | ✓³ | -- | -- | Skeleton — activate with |
³ Searches metadata; full-text requires institutional access beyond API key
AI-Powered Research Tools (Free API Key Required)
Source | Search | Read | Report | Notes |
Elicit | ✓ | ✓ | ✓ | AI research assistant with semantic search across 138M+ papers. Generate structured literature reviews and systematic review workflows. Activate with ELICIT_API_KEY |
Institutional Databases (Proxy/Login Required)
Source | Search | Download | Read | Activation |
JSTOR | ✓ | ✓ | ✓ | |
ProQuest | ✓ | ✓ | ✓ | |
EBSCOhost | ✓ | ✓ | ✓ | |
Project MUSE | ✓ | ✓ | ✓ | |
Web of Science | ✓ | -- | -- | |
Institutional adapters auto-activate when their environment variables are set. No code changes needed.
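One way to implement this kind of auto-activation is to map each adapter to the environment variables it needs and enable it only when all of them are set. In the sketch below, the variable names `JSTOR_USERNAME`, `JSTOR_PASSWORD`, and `INSTITUTION_PROXY_URL` appear in the Setup section of this README, but the mapping of adapters to variables and the function name are assumptions; Web of Science is left out because its variables are not shown here.

```python
import os
from typing import Dict, List, Mapping, Optional

# Assumed mapping of adapter name to required environment variables.
REQUIRED_ENV: Dict[str, List[str]] = {
    "jstor": ["JSTOR_USERNAME", "JSTOR_PASSWORD"],
    "proquest": ["INSTITUTION_PROXY_URL"],
    "ebscohost": ["INSTITUTION_PROXY_URL"],
    "project_muse": ["INSTITUTION_PROXY_URL"],
}


def enabled_adapters(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return adapters whose required variables are all set and non-empty."""
    env = os.environ if env is None else env
    return [
        name
        for name, keys in REQUIRED_ENV.items()
        if all(env.get(key) for key in keys)
    ]


print(enabled_adapters({"JSTOR_USERNAME": "u", "JSTOR_PASSWORD": "p"}))
```

Passing a plain dict instead of `os.environ` keeps the check easy to unit-test.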
Web Research (Optional)
Source | Search | Scrape | Notes |
Firecrawl | ✓ | ✓ | Web search + page scraping. Requires FIRECRAWL_API_KEY |
Discovery Fallback
Source | Search | Notes |
Google Scholar | ✓ | Optional. Set |
Sci-Hub | -- | Optional. User-responsibility; disabled by default. |
MCP Tools
Unified (Layer 1)
Tool | Description |
search_papers | Concurrent search across all enabled sources with deduplication |
download_with_fallback | OA-first multi-step download chain: source-native → repository discovery → Unpaywall → optional Sci-Hub |
| Multi-hop citation graph traversal via Semantic Scholar (BFS, deduplication, configurable depth) |
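The concurrent search with DOI → title → paper_id deduplication could look roughly like the sketch below. It is a simplified illustration: `search_all` and `dedupe_key` are hypothetical names, each source is modelled as a plain callable, and a thread pool stands in for whatever concurrency mechanism the server actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def dedupe_key(paper: dict) -> str:
    """Prefer the DOI, then a normalized title, then the source paper_id."""
    doi = (paper.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    title = " ".join((paper.get("title") or "").lower().split())
    if title:
        return "title:" + title
    return "id:" + str(paper.get("paper_id", ""))


def search_all(query: str, sources: Dict[str, Callable[[str], List[dict]]]) -> List[dict]:
    """Query every source concurrently and keep the first copy of each paper."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = [pool.submit(fn, query) for fn in sources.values()]
        batches = [future.result() for future in futures]
    seen = set()
    merged = []
    for batch in batches:
        for paper in batch:
            key = dedupe_key(paper)
            if key not in seen:
                seen.add(key)
                merged.append(paper)
    return merged


sources = {
    "a": lambda q: [{"doi": "10.1/X", "title": "Same Paper"}],
    "b": lambda q: [{"doi": "10.1/x", "title": "Same paper"}, {"title": "Other  Paper"}],
}
print(len(search_all("anything", sources)))
```

This single-key approach misses one case: a record with a DOI and a DOI-less copy of the same title get different keys. A fuller implementation would compare on each key in turn.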
Citation & Style (Layer 3)
Standard Sources (books, journals, chapters)
Tool | Description |
| Format paper metadata into CMOS 18 / SBL 2nd ed. citations (note, bibliography, author-date) with field-level validation |
| Produce all citation formats for a paper at once (note, shortened note, bibliography, author-date) |
Additional Style Guides
Tool | Description |
| Format paper metadata into APA 7th edition citations (reference list, in-text parenthetical, in-text narrative) |
| Produce all APA 7th formats for a paper at once |
| Format paper metadata into Turabian 9th edition citations (CMOS-based student style) |
| Format specialized biblical/theological citations in Turabian style |
Export Formats
Tool | Description |
| Convert paper metadata to CSL-JSON (Zotero, Mendeley, Pandoc compatible) |
| Convert paper metadata to BibTeX/BibLaTeX format for LaTeX (with auto-generated citation keys) |
| Convert a single paper to RIS format for import into Zotero, EndNote, Mendeley |
| Convert multiple papers to a concatenated RIS file |
| Parse RIS-formatted text back into paper metadata dicts (import from any reference manager) |
| Assemble a formatted bibliography from multiple papers with auto-sort (author/year) and optional grouping |
In-Text Citation Customization
Tool | Description |
| Format multiple cites as a single in-text citation with prefix/suffix, page locators (7 types), omit-author for narrative citations, and multi-cite compounds with style-aware separators |
Original Language Tools
Tool | Description |
| Search the Church Fathers scripture cross-reference index: given a Bible verse (e.g., "Matthew 5:1-12"), returns all ANF/NPNF passages that cite or comment on it with author, work, section, and surrounding text context |
| Look up a biblical word by Strong's number (G2316, H430, etc.) — returns lemma, transliteration, gloss, and definition |
| Search 202 Greek/Hebrew words by English meaning, lemma, or Strong's number |
| Convert Greek or Hebrew text to SBL-style Roman characters |
| Parse a Latin word and return its morphological analysis via Whitaker's Words |
| Look up a Latin or English word in Whitaker's Latin-English dictionary (~39K entries) |
Catholic Reference Tools
Tool | Description |
| Look up biographical and pontificate info for 265 popes and 41 antipopes |
| Search 56 Catholic prayers by title, origin, category, or text |
| Get a specific Catholic prayer by ID in up to 8 languages |
| List all available Catholic prayers with titles |
| Search church history timeline for theologians, councils, and events |
| Look up an accented verse in the Clementine Vulgate (35,809 verses) |
Specialized Biblical/Theological Sources
Tool | Description |
| Format Bible, Church Fathers (ANF/NPNF/PG/PL/SC), Dead Sea Scrolls, Qur'an, Rabbinic, Vatican (23 doc types), ANE texts, papyri, commentary, dictionary, Josephus/Philo, Pseudepigrapha, Apostolic Fathers, Nag Hammadi, Loeb, Liturgical, Aquinas, and website/blog citations — in first note, shortened note, and bibliography formats |
Supported specialized source types: bible, church-fathers, ancient-work, commentary, dictionary-encyclopedia, quran, rabbinic, dead-sea-scrolls, vatican, ancient-near-east, papyri, loeb, josephus-philo, pseudepigrapha, apostolic-fathers, nag-hammadi, website-blog, liturgical, aquinas
Style Knowledge Base
Tool | Description |
| Look up any citation rule from CMOS 18, SBL 2nd ed., APA 7th, or Turabian 9th by topic keyword, with source section numbers |
| Compare how CMOS 18 and SBL 2nd ed. differ on a specific topic |
| Look up the standard abbreviation for a publisher (64 known publishers, SBL de/suffix conventions) |
| Convert a US state to the proper abbreviation (USPS postal for SBL 2; traditional for CMOS 18) |
| Look up the SBL-standard abbreviation for a journal name (e.g., "Journal of Biblical Literature" → "JBL") from 500+ entries |
| Alias for |
Free/Open Source Tools (Layer 2 — available without credentials)
search_ccel download_ccel read_ccel_paper
lookup_church_fathers_crossref
search_perseus read_perseus_text
search_open_library
search_osf_preprints download_osf_preprints read_osf_preprints_paper
search_ixtheo
search_philpapers download_philpapers read_philpapers_paper
search_sefaria read_sefaria_text
search_lens read_lens_paper
search_summa_english read_summa_article
search_romanus read_romanus_paragraph
search_baltimore_catechism read_baltimore_catechism_question
search_girm read_girm_paragraph
search_newman read_newman_text
search_daily_readings
search_catholic_ontology
search_liturgical_calendar
search_google_books (activate via GOOGLE_BOOKS_API_KEY)
search_arxiv download_arxiv read_arxiv_paper
search_pubmed
search_biorxiv download_biorxiv read_biorxiv_paper
search_medrxiv download_medrxiv read_medrxiv_paper
search_iacr download_iacr read_iacr_paper
search_semantic download_semantic read_semantic_paper
search_crossref get_crossref_paper_by_doi
search_openalex read_openalex_paper
search_pmc download_pmc read_pmc_paper
search_core download_core read_core_paper
search_europepmc download_europepmc read_europepmc_paper
search_dblp
search_openaire
search_citeseerx download_citeseerx read_citeseerx_paper
search_doaj download_doaj read_doaj_paper
search_base download_base read_base_paper
search_zenodo download_zenodo read_zenodo_paper
search_hal download_hal read_hal_paper
search_ssrn download_ssrn read_ssrn_paper
search_unpaywall
AI Research Tools (Layer 2 — activate via ELICIT_API_KEY)
search_elicit read_elicit_paper
generate_research_report create_systematic_review
Institutional Tools (Layer 2 — activate via env vars)
search_jstor download_jstor read_jstor_paper
search_proquest download_proquest read_proquest_paper
search_ebscohost download_ebscohost read_ebscohost_paper
search_project_muse download_project_muse read_project_muse_paper
search_web_of_science
AI Research Tools (activate via ELICIT_API_KEY)
Tool | Description |
generate_research_report | AI-synthesized literature review from top matching papers |
create_systematic_review | Structured systematic review workflow on Elicit |
Library & Local Search Tools
Tool | Description |
| (Re)build the local full-text search index from downloaded PDFs |
| Full-text search across the local library (Tantivy BM25) |
| Combined BM25 + vector similarity search (Tantivy + LanceDB) |
| Index (insert or replace) a single library item |
| Remove a single document from the local library index |
parse_references | Parse raw reference strings into structured metadata via Anystyle.io |
Web Research Tool (activate via FIRECRAWL_API_KEY)
web_research — scrape URLs and search the open web via Firecrawl.
Setup
Prerequisites
Python 3.10+
uv (package manager)
# Clone
git clone https://github.com/aringadre76/mcp-for-research.git
cd mcp-for-research
# Run (uv auto-installs dependencies)
uv run -m paper_search_mcp.server
opencode Configuration
The project includes .opencode/opencode.json with the MCP server already configured. Restart opencode and the tools will be available.
To enable specific sources, fill in their environment variables in .env (copy from .env.example):
cp .env.example .env
Then add your keys:
# Free API keys (improve rate limits)
PAPER_SEARCH_MCP_UNPAYWALL_EMAIL=your@email.com
PAPER_SEARCH_MCP_SEMANTIC_SCHOLAR_API_KEY=
PAPER_SEARCH_MCP_CORE_API_KEY=
# JSTOR — Shibboleth auto-login (browser opens, logs in, saves cookies)
JSTOR_USERNAME=your_username
JSTOR_PASSWORD=your_password
# Other institutional databases (EZproxy)
INSTITUTION_PROXY_URL=https://proxy.library.university.edu/login?url=
# Web research
FIRECRAWL_API_KEY=
JSTOR Setup
JSTOR requires institutional credentials (Shibboleth login). On first use, a Chrome browser window opens and automates login through your institution's SSO. Cookies are persisted to .opencode/cred/jstor-cookie.txt for subsequent PDF downloads.
Place credentials in .opencode/cred/cua-credentials.txt (recommended; avoids truncation risk in JSON configs):
JSTOR_USERNAME=your_username
JSTOR_PASSWORD=your_password
Alternatively, set them in .opencode/opencode.json under mcp.academic-research.env:
"JSTOR_USERNAME": "your_username",
"JSTOR_PASSWORD": "your_password"
Restart the MCP server. The server auto-detects credentials from either location. On the first JSTOR search, the browser opens, logs in, and saves cookies. Subsequent searches skip login if the session is still valid.
See .opencode/JSTOR.md for implementation details and troubleshooting.
Optional Dependencies
# OCR support for scanned/image-based PDFs (Surya OCR → PyTesseract fallback)
uv sync --extra ocr
# Full dev toolchain including syrupy snapshots, mypy, ruff
uv sync --extra dev
OCR pipeline: PyMuPDF extracts text from modern PDFs. For scanned/image-based PDFs, the optional [ocr] dependency group adds Surya OCR (state-of-the-art for academic PDFs: handles equations, multi-column layout, multi-language text) with PyTesseract as a legacy fallback.
Grobid: Machine-learning library for extracting structured metadata (title, authors, abstract, references, affiliations) from academic PDFs. Requires a Docker container (docker run -p 8070:8070 lfoppiano/grobid) and the grobid-client Python package.
Anystyle.io: Ruby gem for parsing raw reference strings into structured metadata. Used by the parse_references MCP tool. Install with gem install anystyle.
Hybrid search: Full-text BM25 (Tantivy) + vector similarity (LanceDB) for searching the local downloaded-paper library. sentence-transformers provides the embedding model for vector search.
OS keychain credential storage: Institutional credentials can be stored securely in the OS keychain (macOS Keychain, Windows Credential Manager, Linux Secret Service) via Python keyring. The connect-institution.py script writes credentials to the keychain; browser-based adapters read from it at runtime.
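For the hybrid search described above, this README does not say how the BM25 and vector rankings are merged. Reciprocal rank fusion (RRF) is one common way to combine two ranked lists, shown here purely as an illustration and not as the project's method. The function name, the constant `k=60`, and the tie-break on document ID are assumptions of this sketch.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document IDs by summing 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]    # e.g. IDs from a full-text (BM25) query
vector_hits = ["b", "c", "a"]  # e.g. IDs from a vector similarity query
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```

RRF only needs rank positions, so it avoids having to normalize BM25 scores against vector distances.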
Other MCP Clients
Claude Desktop — add to claude_desktop_config.json:
{
"mcpServers": {
"academic-research": {
"command": "uv",
"args": ["run", "--directory", "/path/to/mcp-for-research", "-m", "paper_search_mcp.server"]
}
}
}
Cursor: .cursor/mcp.json is included in the repo.
Testing
Framework: pytest with unittest.mock for HTTP mocking (matching the project's unittest style). Snapshot testing via syrupy for citation output and API response regression. Coverage via pytest-cov.
Quick start
# Install the project with dev dependencies
uv sync --extra dev
# Run all tests (skip slow end-to-end scripts)
uv run pytest tests/ -v --ignore=tests/e2e_test.py --ignore=tests/functional_test.py
# Run with coverage report
uv run pytest tests/ --ignore=tests/e2e_test.py --ignore=tests/functional_test.py \
  --cov=paper_search_mcp --cov=institutional --cov-report=term
Test suite (4,500+ tests, all passing)
| Category | Files | Style | Description |
|---|---|---|---|
| Citation & style | test_citation_formatter.py (147), test_specialized_formatter.py (296) | Pure unit (no API) | Canonical tests for all 19 specialized source types across 3 formats (first note, shortened note, bibliography) with mode-separation proof. Covers author edge cases (11+ authors, suffixes, particles), title hazards (embedded quotes, HTML entities, 200+ chars), date anomalies (BCE, forthcoming, n.d.), pagination (Roman numerals, non-contiguous), edition strings, and full Vatican 32-type coverage. |
| Scenario coverage | test_scenario_coverage.py, test_scenario_coverage_r2.py, test_scenario_coverage_r3.py, test_scenario_coverage_r4_*.py (4 files), test_scenario_coverage_r5_batch2_*.py (5 files) | Mocked (unit) | 316 multi-step workflow scenarios across 9 files: search→citation round-trips, dedup edge cases, error propagation, and 36 Catholic user workflows (priest, theologian, lay personas) exercising all 19 specialized source types, both citation styles, style knowledge base, citation graph traversal, download fallback chains, web research, publisher/state abbreviation, and institutional source graceful degradation. |
| Download edge cases | test_download_edge_cases.py | Mocked (unit) | Truncated PDFs, redirect loops, 0-byte responses, disk-full, filename collisions, Sci-Hub CAPTCHA spoofing, repository fallback serial ordering, credential file parsing. |
| Catholic reference tools | test_utility_tools.py (46) | Pure unit | Catholic timeline, Pope database, Latin morphology (Whitaker's Words), Vulgate lookup, and Prayers collection. |
| Catholic FTS5 / Web adapters | test_fts5_adapters.py (32), test_daily_readings.py (9) | Mocked HTTP / Local DB | Test coverage for Summa Theologica, Roman Catechism, Baltimore Catechism, GIRM, Daily Readings, Catholic Ontology, and Liturgical Calendar. |
| Platform unit tests | test_openalex.py, test_pmc.py, test_core.py, test_europepmc.py, test_chemrxiv.py, test_pubmed.py, test_oaipmh.py | Fully mocked HTTP | API response parsing, parameter encoding, error handling, PDF download/read flows |
| Server integration | test_server_unit.py (91), test_fallback.py (16), test_server.py | Mocked searchers + async | Tool wrappers, fallback chains, source orchestration, deduplication, style knowledge base queries |
| Pure logic | test_cli.py, test_config_env.py | Unit | CLI argument parsing, env var precedence, dedupe keys, filename sanitation |
| Institutional adapters | test_jstor.py, test_proquest.py, test_ebscohost.py, test_project_muse.py, test_web_of_science.py, test_inst_browser.py, test_shibboleth_auth.py | Fully mocked | HTML scraping parsers, cookie persistence, browser lifecycle, auth flows |
| Integration/live | test_arxiv.py, test_crossref.py, test_semantic.py, test_iacr.py, test_biorxiv.py, test_medrxiv.py, test_dblp.py, test_doaj.py, test_openaire.py, test_hal.py, test_zenodo.py, test_ssrn.py, test_base.py, test_citeseerx.py, test_google_scholar.py, test_sci_hub.py, test_unpaywall.py | Live API (skip on failure) | Smoke tests against real endpoints |
| End-to-end | e2e_test.py, functional_test.py | Standalone scripts | Full search → download → read chain across all 25+ platforms |
Architecture (tests/conftest.py)
Shared fixtures: mock_session (mock requests.Session), temp_output_dir (uses tmp_path), sample_papers (list of Paper dicts).
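As a rough illustration of the fully mocked style these fixtures support, the sketch below builds a stand-in for requests.Session with unittest.mock and feeds it a canned JSON payload. The helper names (make_mock_session, search_titles), the example URL, and the response shape are hypothetical and are not taken from tests/conftest.py.

```python
from unittest.mock import MagicMock


def make_mock_session(payload, status_code=200):
    """Build a stand-in for requests.Session whose get() returns canned JSON."""
    response = MagicMock()
    response.status_code = status_code
    response.json.return_value = payload
    session = MagicMock()
    session.get.return_value = response
    return session


def search_titles(session, url, query):
    """Hypothetical adapter function: fetch results and return their titles."""
    response = session.get(url, params={"q": query}, timeout=30)
    if response.status_code != 200:
        return []  # treat any non-200 response as "no results"
    return [item["title"] for item in response.json().get("results", [])]


# No network traffic happens here: the session is entirely mocked.
session = make_mock_session({"results": [{"title": "Paper A"}, {"title": "Paper B"}]})
titles = search_titles(session, "https://example.org/api", "crispr")
```

Because the session is a MagicMock, a test can also assert on how it was called (URL, query parameters, timeout) without touching a real endpoint.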
CI
GitHub Actions workflow at .github/workflows/test.yml runs on every push and PR against Python 3.10–3.13:
# Installs uv, syncs dev deps, runs pytest with coverage
uv run pytest --cov=paper_search_mcp --cov=institutional --cov-report=term
What's covered vs. what isn't
Covered (~90%): All parsers, date formatters, DOI extractors, abstract reconstructors, filter functions, deduplication logic (including Unicode edge cases), API parameter encoders, PDF download pipelines (including truncation, redirect loops, disk-full, and non-PDF rejection), CLI argument handling, env var loading, citation formatting (all 19 source types with mode separation, author edge cases, suffix handling, name particles, edition strings, date anomalies, pagination variants), style reference query system (45+ topics, 15 difference pairs), citation graph traversal, source registry dynamics, error propagation chains, and every source adapter's core search/read/download path. Multi-step workflow scenarios tested across search→citation round-trips, download fallback chains, and deduplication metadata integrity.
Newly covered since last baseline:
L3 Playwright browser automation (JSTOR Shibboleth login, PerimeterX detection, cookie persistence): tested via HAR replay in CI, live Playwright locally (tests/browser/)
External API error branches: VCR cassette fixtures (tests/fixtures/vcr/) record/replay real HTTP responses for 404, 500, timeout, and fallback chain exhaustion
Property-based invariants: Hypothesis generative testing for citation formatter correctness across random inputs (tests/property/)
MCP security: BibTeX/LaTeX injection prevention, tool name collision detection (tests/security/)
Snapshot regression testing: Syrupy snapshot tests for citation output and API response format stability (tests/snapshots/)
Not covered (~10%):
Live Playwright sessions in CI — browser tests replay via HAR; full live browser runs require local credentials
Multi-mirror Sci-Hub rotation — tested with one mirror; rotation logic needs multi-endpoint live test
Feature gaps documented in tests as # DEFERRED PHASE 3 or # NOT A GAP: multi-language citation conventions, cross-form consistency validation, citation graph→formatter pipeline, per-format error diagnosis, RIS export. Several former gaps (CSL-JSON, BibTeX, CLI cite, DOI metadata completion, title-case normalization, accessibility output) are now implemented.
See pyproject.toml for [tool.pytest.ini_options] and [tool.coverage.*] configuration.
Extending: Adding a New Source
Create a Python file implementing the PaperSource ABC (search, download_pdf, read_paper).
Place it in institutional/ (for proxied databases) or paper_search_mcp/academic_platforms/ (for API-based sources).
Register it in paper_search_mcp/server.py:
Import and instantiate the searcher
Add it to ALL_SOURCES
Add a dispatch case in search_papers
Register @mcp.tool() functions for search/download/read
Example adapters: institutional/jstor.py (proxy scraping), paper_search_mcp/academic_platforms/semantic.py (REST API).
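The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the method signatures on PaperSource, the dict-shaped ALL_SOURCES, the ExampleSearcher class, and the simplified search_papers are all assumptions made so the example runs on its own. Check the example adapters for the real interfaces.

```python
from abc import ABC, abstractmethod


class PaperSource(ABC):
    """Assumed shape of the base class; the real signatures may differ."""

    @abstractmethod
    def search(self, query: str, max_results: int = 10) -> list[dict]: ...

    @abstractmethod
    def download_pdf(self, paper_id: str, save_path: str) -> str: ...

    @abstractmethod
    def read_paper(self, paper_id: str, save_path: str) -> str: ...


class ExampleSearcher(PaperSource):
    """Hypothetical adapter returning canned data instead of calling an API."""

    def search(self, query, max_results=10):
        hits = [{"paper_id": "ex-1", "title": f"Result for {query}", "source": "example"}]
        return hits[:max_results]

    def download_pdf(self, paper_id, save_path):
        return f"{save_path}/{paper_id}.pdf"

    def read_paper(self, paper_id, save_path):
        return f"(extracted text of {paper_id})"


# Stand-in for the ALL_SOURCES registration step in server.py (assumed to be a dict here).
ALL_SOURCES = {"example": ExampleSearcher()}


def search_papers(query, sources, max_results=10):
    """Look each requested source up by name and skip names that are not registered."""
    results = []
    for name in sources:
        searcher = ALL_SOURCES.get(name)
        if searcher is not None:
            results.extend(searcher.search(query, max_results))
    return results
```

Subclassing an ABC means a new adapter that forgets one of the three methods fails at instantiation time rather than at the first tool call.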
Roadmap
Phase 1 — Complete ✓
22 free/open source adapters (paper-search-mcp base)
5 institutional database adapters (JSTOR, ProQuest, EBSCOhost, Project MUSE, WoS)
Firecrawl web research integration
opencode MCP configuration
Deduplication (DOI → title → paper_id)
OA-first fallback download chain
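The deduplication order listed above (DOI, then title, then paper_id) could look something like the sketch below. The function names, the dict-based paper records, and the title normalization rule are assumptions for illustration; the project's actual dedupe keys may be computed differently.

```python
import re


def dedupe_key(paper: dict) -> tuple[str, str]:
    """Return the first available key in DOI -> title -> paper_id order."""
    doi = (paper.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    # Assumed normalization: lowercase, collapse punctuation and whitespace runs.
    title = re.sub(r"[^a-z0-9]+", " ", (paper.get("title") or "").lower()).strip()
    if title:
        return ("title", title)
    return ("paper_id", str(paper.get("paper_id", "")))


def deduplicate(papers: list[dict]) -> list[dict]:
    """Keep the first paper seen for each key, preserving input order."""
    seen = set()
    unique = []
    for paper in papers:
        key = dedupe_key(paper)
        if key not in seen:
            seen.add(key)
            unique.append(paper)
    return unique


papers = [
    {"doi": "10.1000/ABC", "title": "Grace and Nature"},
    {"doi": "10.1000/abc", "title": "Grace & Nature (preprint)"},
    {"title": "On  Free Will!", "paper_id": "a1"},
    {"title": "on free will", "paper_id": "b2"},
    {"paper_id": "c3"},
]
unique = deduplicate(papers)
```

One limitation of this simple version: a record with a DOI and a copy of the same paper without one get different key types, so they are not merged. A fuller implementation would need a second pass on titles to catch that case.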
Phase 2 — In Progress
Citation graph traversal (multi-hop via Semantic Scholar)
CMOS 18 / SBL 2nd edition citation formatting (first note, shortened note, bibliography for all 19 specialized source types)
Specialized biblical/theological source formatting (19 source types, 23 Vatican doc types, 6 canon law variants)
Style knowledge base with queryable rules (45+ topics, 15 difference pairs)
Catholic liturgical and Aquinas/scholastic citation support
Known-edition defaults (ANF/NPNF → Hendrickson, PG/PL → Migne, SC → Cerf, CIC → Vaticana)
Latin/English title duality for classical and patristic works
Robustness hardening: incomplete-metadata annotations, empty-field guards, multi-note chain testing
300 tests, zero failures, all categories ≥ 92/100 confidence
Code quality: shared source_registry module, ruff linting (0 issues), mypy type checking (CI), credential parsing consolidation
CI pipeline with ruff + mypy + pytest across Python 3.10–3.13
Server refactoring: tool factory from source registry (60-line if/elif → 8-line loop, ~800 lines eliminated)
Table-driven search_papers dispatch via ALL_SOURCE_INFOS registry
Auto-generated search/download/read tools for all 25+ sources (consistent error handling)
Missing standalone download/read MCP tools added (PMC, CORE, EuropePMC, OpenAlex; previously only search)
download_with_fallback uses live searcher registry instead of hardcoded downloader dict
JSTOR PDF download: browser fallback when curl_cffi hits PerimeterX (authenticated Playwright session)
Test expansion: 1,683+ tests (up from 1,207), 316 multi-step scenario workflows across 9 scenario coverage files (3 rounds + 9 R4/R5 batch files covering 36 Catholic user workflows)
Bug fixes: edition double-suffix, suffix inversion (Jr./III), single-chapter Bible verses, name particles (van Gogh), path traversal validation, _enable_source registry sync, format_citation_multi exception type
7 new specialized source adapters: CCEL Church Fathers (38-volume ANF/NPNF, local search index), Perseus (120 Greek/Latin classical works), Google Books (API-key gated), Open Library (30M+ free catalog), OSF Preprints (30+ preprint servers), IxTheo (3M+ theology records), PhilPapers (philosophy + PhilArchive full-text)
APA 7th edition citation formatting (reference list + in-text parenthetical/narrative for books, journals, chapters)
Turabian 9th edition citation formatting (CMOS 18 wrapper with student-paper overrides)
CSL-JSON output format (Zotero/Mendeley/Pandoc interoperable)
BibTeX/BibLaTeX output format (LaTeX-ready with proper escaping and page-range formatting)
Original language tools: Strong's dictionary lookup (202 Greek/Hebrew entries), word study search, SBL-style Greek/Hebrew transliteration
Style knowledge base expanded: APA 7th (9 topics), Turabian 9th (6 topics)
Church Fathers scripture cross-reference index (Bible verse → patristic commentary)
RIS export/import (single + batch + parse) with 30+ type code mapping
CSL engine integration (citeproc-py + local style cache, 3 bundled styles)
Batch bibliography assembly with auto-sort (author/year) and grouping
Journal abbreviation database (500+ entries from SBLHS §8.4) + lookup tools
In-text citation customization: compound citations with prefix/suffix, page locators (7 types), multi-cite, narrative-citation omit-author
Tiptap rich text editor (ProseMirror-based) with citation @mentions in web dashboard. ⚠️ Yjs collaboration hook (useYjsCollaboration.js) exists but is NOT wired to the Tiptap editor; collaborative document sync is pending.
Yjs CRDT-based collaborative editing via WebSocket. ⚠️ useYjsCollaboration.js hook written but not integrated with Tiptap. Presence + messaging (WebSocket) is implemented; CRDT-based real-time document sync is pending.
Hybrid BM25 + vector search (Tantivy + LanceDB) for the local paper library
Surya OCR + PyTesseract fallback for scanned PDFs — ⚠️ Surya integration code exists but the API is broken against the installed package version. PyTesseract fallback works.
Grobid structured metadata extraction from PDFs. ⚠️ Integration code exists but is never called in any execution path; grobid-client dependency not listed in pyproject.toml.
Anystyle.io reference string parser
OS keychain credential storage (macOS Keychain, Windows Credential Manager, Linux Secret Service)
Sefaria open Jewish texts corpus adapter (Tanakh, Talmud, Midrash, commentaries)
Elicit AI research assistant adapter (semantic search, literature reviews, systematic reviews)
Lens.org scholarly works + patents search adapter
Catholic reference tools (Latin, popes, prayers, timeline, Vulgate)
Additional Catholic source adapters (Summa, Roman Catechism, Baltimore Catechism, GIRM, Daily Readings, Ontology)
FTS5 indexer base class refactoring
Accordance Bible software integration
Anna's Archive book search integration (as optional MCP tool)
Research report generation templates
Saved search alerts and recurring queries
Web dashboard for credential management and report browsing. ⚠️ Partially done: Search, Library, Citations, Projects, Credentials, and Notes pages functional. PDF viewer (Phase 1) working. Backend deployed on Railway. Some backend routes (annotations, PDF proxy, Notes POST) still missing. See docs/PROJECT_STATUS_AND_REMAINING_WORK.md.
Hosted server deployed on Railway (JWT auth, encrypted credential vault, rate limiting, usage tracking, admin dashboard). ⚠️ Retained institutional logins: cookie persistence works for JSTOR; other adapters may need re-authentication after session expiry.
Phase 3 — Future
Multi-user research sessions
Direct Zotero/EndNote export
Pre-indexed vector embeddings for LanceDB hybrid search
Advanced RAG (retrieval-augmented generation) over the paper library
Credits
Built on paper-search-mcp by P.S. Zhang / openags (MIT License). The core free/open source adapters and two-layer architecture originate from that project.
Institutional database adapters, Firecrawl integration, opencode configuration, and project packaging by Jacob Gruber.
License
MIT — see LICENSE.