pubmed-search-mcp
The PubMed Search MCP is an intelligent, agent-first biomedical literature research assistant that provides unified access to multiple academic databases with advanced query intelligence, citation analysis, and full research lifecycle support.
Search & Query Intelligence
Unified multi-source search across PubMed, Europe PMC, CORE, OpenAlex, Semantic Scholar, and CrossRef with deduplication and ranking
Automatic ICD-9/10 → MeSH conversion, MeSH term expansion, synonym discovery, and spelling correction (ESpell)
PICO clinical question parsing — validate P/I/C/O elements and generate structured search pipelines
Query analysis to understand how PubMed interprets a query before execution
Preprint search across arXiv, medRxiv, and bioRxiv
Multi-step DAG search pipelines (YAML/JSON) for reusable, reproducible workflows
Article Discovery & Citation Networks
Find related, citing, and referenced articles; build multi-level citation trees exportable to Cytoscape, D3, Mermaid, GraphML, and more
NIH iCite metrics — relative citation ratio (RCR), NIH percentile, citations per year, and clinical translation scores
Full Text & Figure Access
Multi-source full-text retrieval via Europe PMC XML, Unpaywall, CORE, institutional access, and EZproxy
Extract figures, captions, and image URLs from PMC Open Access articles
Text-mined entity extraction (genes, diseases, chemicals) from Europe PMC
Institutional Access
Configure and test institutional OpenURL/SFX/EZproxy resolvers with presets for major universities
Generate OpenURL access links and diagnose full-text access paths
Biomedical Databases
Search NCBI Gene, PubChem (compounds), and ClinVar (clinical variants) with linked literature retrieval
Research Timeline & Evolution
Build research timelines with automatic milestone detection (clinical trials, FDA approvals, landmark studies)
Visualize as text, Mermaid, mindmap, D3, or TimelineJS; compare timelines across topics
Image Search & Vision
Search biomedical images via Open-i (X-rays, microscopy, CT/MRI) and Europe PMC
Analyze uploaded figures to extract search terms and find related literature
Export & Notes
Export citations in RIS, BibTeX, CSL JSON, MEDLINE, and CSV (compatible with Zotero, EndNote, Mendeley, LaTeX)
Save literature notes as local wiki/Foam-compatible Markdown files with frontmatter and wikilinks
Session & Pipeline Management
Session caching and persistent artifacts to reduce redundant API calls
Save, load, schedule (cron-based), and track execution history of reusable pipelines
Session activity log for reviewing past searches
PubMed Search MCP
Professional Literature Research Assistant for AI Agents - More than just an API wrapper
A Domain-Driven Design (DDD) based MCP server that serves as an intelligent research assistant for AI agents, providing task-oriented literature search and analysis capabilities.
✨ What's Included:
🔧 46 MCP Tools - Streamlined PubMed, Europe PMC, CORE, NCBI database access, and Research Timeline / Context Graph
🖼️ OA Figure Extraction - Pull figure captions, direct image URLs, and PDF links from PMC Open Access articles
📘 Docs Site - Browse language-switchable user and developer guides, architecture, quick reference, pipeline tutorials, source contracts, troubleshooting, and deployment in one place at u9401066.github.io/pubmed-search-mcp
📖 GitHub Wiki - GitHub-native mirror of the same canonical documentation at github.com/u9401066/pubmed-search-mcp/wiki
📚 24 Claude Skills - Ready-to-use workflow guides for AI agents (Claude Code-specific)
📖 Copilot Instructions - VS Code GitHub Copilot integration guide
🌐 Language: English | 繁體中文
📘 Documentation Map: README is the quick project entry point. Use the Docs Site for the best reading experience, the GitHub Wiki for GitHub-native navigation, and source docs for edits: User guide | Advanced workflows | Capability-first guide | Developer guide | Complete index
🚀 Quick Install
Prerequisites
Python 3.10+ - Download
uv (recommended) - Install uv
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
NCBI Email - Required by NCBI API policy. Any valid email address.
NCBI API Key (optional) - Get one here for higher rate limits (10 req/s vs 3 req/s)
OpenAlex API Key (optional) - Set OPENALEX_API_KEY to use authenticated OpenAlex requests instead of mailto-only polite-pool auth. Without source-specific emails, the server reuses the configured runtime contact email for OpenAlex, CrossRef, and Unpaywall.
Install & Run
# Option 1: Zero-install with uvx (recommended for trying out)
uvx pubmed-search-mcp
# Option 2: Add as project dependency
uv add pubmed-search-mcp
# Option 3: pip install
pip install pubmed-search-mcp
Python SDK Facade
For in-process Python integrations, use the stable SDK facade instead of importing MCP tool modules:
import asyncio

from pubmed_search.api import PubMedSearchClient, PubMedSearchConfig


async def main() -> None:
    # unified_search is awaited, so it needs a running event loop.
    client = PubMedSearchClient(PubMedSearchConfig(email="your@email.com"))
    result = await client.unified_search("remimazolam ICU sedation", limit=20)
    print(result.articles)
    print(result.source_counts)
    print(result.artifact)  # artifact locator when persistence is enabled


asyncio.run(main())  # in a notebook, call `await main()` instead
Use uvx pubmed-search-mcp or /mcp for agent tool discovery. Use the SDK for
Python package/notebook calls where a typed object is easier than parsing an MCP
response string.
⚙️ Configuration
This MCP server works with any MCP-compatible AI tool. Choose your preferred client:
VS Code / Cursor (.vscode/mcp.json)
{
"servers": {
"pubmed-search": {
"type": "stdio",
"command": "uvx",
"args": ["pubmed-search-mcp"],
"env": {
"NCBI_EMAIL": "your@email.com"
}
}
}
}
Optional: enable browser-session PDF fallback once and let tools auto-use it:
{
"servers": {
"pubmed-search": {
"type": "stdio",
"command": "uvx",
"args": ["pubmed-search-mcp"],
"env": {
"NCBI_EMAIL": "your@email.com",
"BROWSER_FETCH_CONFIG": "{\"enabled\":true,\"auto_enabled\":true,\"broker_url\":\"http://127.0.0.1:8766/fetch\",\"token\":\"local-dev-token\",\"allowed_hosts\":[\"jamanetwork.com\",\"*.jamanetwork.com\",\"nejm.org\",\"*.nejm.org\"]}"
}
}
}
}
With this setting, get_fulltext will automatically try the local broker for institutional or publisher landing pages. Pass allow_browser_session=false only when you want to suppress it for a specific call.
Run the local broker with download interception:
uv sync --extra browser-broker
uv run playwright install chromium
uv run pubmed-browser-fetch-broker --token local-dev-token
The broker launches a persistent browser profile with download interception enabled. Log in once inside that broker-controlled browser window, and subsequent PDF downloads will be captured automatically without a native "Save As" dialog.
Claude Desktop (claude_desktop_config.json)
{
"mcpServers": {
"pubmed-search": {
"command": "uvx",
"args": ["pubmed-search-mcp"],
"env": {
"NCBI_EMAIL": "your@email.com"
}
}
}
}
Config file location:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
Claude Code
claude mcp add pubmed-search -- uvx pubmed-search-mcp
Or add to .mcp.json in your project root:
{
"mcpServers": {
"pubmed-search": {
"command": "uvx",
"args": ["pubmed-search-mcp"],
"env": {
"NCBI_EMAIL": "your@email.com"
}
}
}
}
Zed AI (settings.json)
Zed editor (zed.dev) supports MCP servers natively. Add to your Zed settings.json:
{
"context_servers": {
"pubmed-search": {
"command": "uvx",
"args": ["pubmed-search-mcp"],
"env": {
"NCBI_EMAIL": "your@email.com"
}
}
}
}
Tip: Open Command Palette → zed: open settings to edit, or go to Agent Panel → Settings → "Add Custom Server".
OpenClaw 🦞 (~/.openclaw/openclaw.json)
OpenClaw uses MCP servers via the mcp-adapter plugin. Install the adapter first:
openclaw plugins install mcp-adapter
Then add to ~/.openclaw/openclaw.json:
{
"plugins": {
"entries": {
"mcp-adapter": {
"enabled": true,
"config": {
"servers": [
{
"name": "pubmed-search",
"transport": "stdio",
"command": "uvx",
"args": ["pubmed-search-mcp"],
"env": {
"NCBI_EMAIL": "your@email.com"
}
}
]
}
}
}
}
}
Restart the gateway after configuration:
openclaw gateway restart
openclaw plugins list # Should show: mcp-adapter | loaded
Cline (cline_mcp_settings.json)
{
"mcpServers": {
"pubmed-search": {
"command": "uvx",
"args": ["pubmed-search-mcp"],
"env": {
"NCBI_EMAIL": "your@email.com",
"S2_API_KEY": "your_semantic_scholar_key",
"PUBMED_SEARCH_DISABLED_SOURCES": ""
},
"alwaysAllow": [],
"disabled": false
}
}
}
Other MCP Clients
Any MCP-compatible client can use this server via stdio transport:
# Command
uvx pubmed-search-mcp
# With environment variable
NCBI_EMAIL=your@email.com uvx pubmed-search-mcp
Note: NCBI_EMAIL is required by NCBI API policy. Optionally set NCBI_API_KEY for higher rate limits (10 req/s vs 3 req/s).
📖 Detailed Integration Guides: See docs/INTEGRATIONS.md for all environment variables, Copilot Studio setup, Docker deployment, proxy configuration, and troubleshooting.
🎯 Design Philosophy
Core Positioning: The intelligent middleware between AI Agents and academic search engines.
Why This Server?
Other tools give you raw API access. We give you vocabulary translation + intelligent routing + research analysis:
Challenge | Our Solution |
Agent uses ICD codes, PubMed needs MeSH | ✅ Auto ICD→MeSH conversion |
Multiple databases, different APIs | ✅ Unified Search single entry point |
Clinical questions need structured search | ✅ PICO handoff + pipeline ( |
Typos in medical terms | ✅ ESpell auto-correction |
Too many results from one source | ✅ Parallel multi-source with dedup |
Need to trace research evolution | ✅ Research Timeline & Tree with landmark detection, diagnostics, and sub-topic branching |
Citation context is unclear | ✅ Citation Tree forward/backward/network |
Can't access full text | ✅ Multi-source fulltext (Europe PMC XML, Unpaywall OA locations, institutional direct/EZproxy, CORE, and downloader fallbacks) |
Gene/drug info scattered across DBs | ✅ NCBI Extended (Gene, PubChem, ClinVar) |
Need cutting-edge preprints | ✅ Preprint search (arXiv, medRxiv, bioRxiv) with peer-review filtering |
Export to reference managers | ✅ One-click export (official RIS/MEDLINE/CSL JSON; local RIS/BibTeX/CSV/MEDLINE/JSON) |
Key Differentiators
Vocabulary Translation Layer - Agent speaks naturally, we translate to each database's terminology (MeSH, ICD-10, text-mined entities)
Unified Search Gateway - One unified_search() call, auto-dispatch to PubMed/Europe PMC/CORE/OpenAlex
PICO Handoff + Pipeline - the Agent extracts P/I/C/O, parse_pico() validates that structured handoff, and the backend template: pico pipeline executes O-aware precision/recall searches
Research Timeline & Lineage Tree - Detect milestones with policy-driven heuristics, identify landmark papers via multi-signal scoring, surface timeline diagnostics, and visualize research evolution as branching trees by sub-topic
Citation Network Analysis - Build multi-level citation trees to map an entire research landscape from a single paper
Full Research Lifecycle - From search → discovery → full text → analysis → export, all in one server
Agent-First Design - Output optimized for machine decision-making, not human reading
📡 External APIs & Data Sources
This MCP server integrates with multiple academic databases and APIs:
Core Data Sources
Source | Coverage | Vocabulary | Auto-Convert | Description |
NCBI PubMed | 36M+ articles | MeSH | ✅ Native | Primary biomedical literature |
NCBI Entrez | Multi-DB | MeSH | ✅ Native | Gene, PubChem, ClinVar |
Europe PMC | 33M+ | Text-mined | ✅ Extraction | Full text XML access |
CORE | 200M+ | None | ➡️ Free-text | Open access aggregator |
Semantic Scholar | 200M+ | S2 Fields | ➡️ Free-text | AI-powered recommendations |
OpenAlex | 250M+ | Concepts | ➡️ Free-text | Open scholarly metadata |
NIH iCite | PubMed | N/A | N/A | Citation metrics (RCR) |
🔑 Key: ✅ = Full vocabulary support | ➡️ = Query pass-through (no controlled vocabulary)
ICD Codes: Auto-detected and converted to MeSH before PubMed search
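As a rough illustration of the detect-then-convert step, the sketch below finds ICD-10-shaped tokens in a query and swaps in MeSH terms. The regex, the two-entry lookup table, and the function name `expand_icd_codes` are assumptions made for this example; the server's actual detection and mapping logic is not shown in this README.

```python
import re

# Hypothetical, deliberately tiny lookup table; a real mapping is far larger.
ICD10_TO_MESH = {
    "I10": "Hypertension",
    "E11.9": "Diabetes Mellitus, Type 2",
}

# Simplified ICD-10 shape: one letter, two digits, optional ".d" or ".dd".
ICD10_PATTERN = re.compile(r"\b[A-Z]\d{2}(?:\.\d{1,2})?\b")


def expand_icd_codes(query: str) -> str:
    """Replace known ICD-10 codes with a quoted MeSH term; leave others as-is."""

    def _swap(match: re.Match) -> str:
        mesh = ICD10_TO_MESH.get(match.group(0))
        return f'"{mesh}"[MeSH]' if mesh else match.group(0)

    return ICD10_PATTERN.sub(_swap, query)


print(expand_icd_codes("I10 treatment in E11.9 patients"))
```

Codes that are not in the table pass through unchanged, so an unrecognized token never removes text from the query.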
Environment Variables
# Required
NCBI_EMAIL=your@email.com # Required by NCBI policy
# Optional - For higher rate limits
NCBI_API_KEY=your_ncbi_api_key # Get from: https://www.ncbi.nlm.nih.gov/account/settings/
CORE_API_KEY=your_core_api_key # Get from: https://core.ac.uk/services/api
CROSSREF_EMAIL=your@email.com # Optional override; defaults to server/NCBI email
UNPAYWALL_EMAIL=your@email.com # Optional override; defaults to server/NCBI email
S2_API_KEY=your_s2_api_key # Alias: SEMANTIC_SCHOLAR_API_KEY
PUBMED_SEARCH_DISABLED_SOURCES= # Example: semantic_scholar
# Optional - Network settings
HTTP_PROXY=http://proxy:8080 # HTTP proxy for API requests
HTTPS_PROXY=https://proxy:8080 # HTTPS proxy for API requests
# Optional - Institutional fulltext access
INSTITUTIONAL_DIRECT_FETCH=true # Try DOI publisher pages before CORE fallback
EZPROXY_ENABLED=false # Enable only after configuring EZPROXY_HOST + cookie
EZPROXY_HOST=ezproxy.example.edu
EZPROXY_COOKIE_FILE=/path/to/cookies.json
# Optional - Local note export
PUBMED_NOTES_DIR=/path/to/wiki/references # save_literature_notes target folder
PUBMED_WORKSPACE_DIR=/path/to/project # fallback: references/ under this workspace
PUBMED_DATA_DIR=~/.pubmed-search-mcp # fallback: references/ under this data dir
CrossRef, Unpaywall, and OpenAlex reuse the runtime server contact email
(NCBI_EMAIL, CLI --email, or detected git email) unless a source-specific
email/API key is configured.
Local note export resolves directories in this order: output_dir argument, PUBMED_NOTES_DIR, PUBMED_WORKSPACE_DIR/references, PUBMED_DATA_DIR/references, then ~/.pubmed-search-mcp/references.
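The resolution order above can be sketched as a small helper. This is an illustration only: `resolve_notes_dir` is a hypothetical name, and the real server may normalize or validate paths differently.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_notes_dir(
    output_dir: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Return the first configured location, following the documented order."""
    env = os.environ if env is None else env
    if output_dir:  # 1. explicit output_dir argument
        return Path(output_dir).expanduser()
    if env.get("PUBMED_NOTES_DIR"):  # 2. dedicated notes folder
        return Path(env["PUBMED_NOTES_DIR"]).expanduser()
    # 3 and 4. references/ under the workspace, then under the data dir
    for var in ("PUBMED_WORKSPACE_DIR", "PUBMED_DATA_DIR"):
        if env.get(var):
            return Path(env[var]).expanduser() / "references"
    # 5. built-in default
    return Path("~/.pubmed-search-mcp/references").expanduser()
```

Passing `env` explicitly keeps the helper easy to exercise without touching the real process environment.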
For LLM wiki compatibility, wiki and foam exports use stable link targets based on PMID, DOI, PMCID, or fallback identifiers; titles remain aliases/display labels, and the response includes wiki_validation for unresolved wikilink checks.
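One way to picture identifier-based link targets is the sketch below. The precedence (PMID, then DOI, then PMCID, then a fallback) mirrors the order listed above but is an assumption, as are the `pmid-` / `doi-` naming scheme and both function names.

```python
import re
from typing import Optional


def wikilink_target(
    pmid: Optional[str] = None,
    doi: Optional[str] = None,
    pmcid: Optional[str] = None,
    fallback_id: str = "unknown",
) -> str:
    """Pick a stable, filename-safe link target; the title is never used."""
    if pmid:
        raw = f"pmid-{pmid}"
    elif doi:
        raw = f"doi-{doi}"
    elif pmcid:
        raw = f"pmcid-{pmcid}"
    else:
        raw = f"ref-{fallback_id}"
    # Replace characters that are awkward in filenames (e.g. "/" in DOIs).
    return re.sub(r"[^A-Za-z0-9._-]+", "-", raw).lower()


def wikilink(target: str, title: str) -> str:
    """Render a wikilink whose display label is the title alias."""
    return f"[[{target}|{title}]]"
```

Because the target never depends on the title, a later title correction changes only the display label and leaves existing links intact.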
🔄 How It Works: The Middleware Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ AI AGENT │
│ │
│ "Find papers about I10 hypertension treatment in diabetic patients" │
│ │
└─────────────────────────────────┬───────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ 🔄 PUBMED SEARCH MCP (MIDDLEWARE) │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ 1️⃣ VOCABULARY TRANSLATION ││
│ │ • ICD-10 "I10" → MeSH "Hypertension" ││
│ │ • "diabetic" → MeSH "Diabetes Mellitus" ││
│ │ • ESpell: "hypertention" → "hypertension" ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ 2️⃣ INTELLIGENT ROUTING ││
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ││
│ │ │ PubMed │ │Europe PMC│ │ CORE │ │ OpenAlex │ ││
│ │ │ 36M+ │ │ 33M+ │ │ 200M+ │ │ 250M+ │ ││
│ │ │ (MeSH) │ │(fulltext)│ │ (OA) │ │(metadata)│ ││
│ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ ││
│ │ └──────────────┴──────────────┴──────────────┘ ││
│ │ ▼ ││
│ │ 3️⃣ RESULT AGGREGATION: Dedupe + Rank + Enrich ││
│ └─────────────────────────────────────────────────────────────────────────┘│
└─────────────────────────────────┬───────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ UNIFIED RESULTS │
│ • 150 unique papers (deduplicated from 4 sources) │
│ • Ranked by relevance + citation impact (RCR) │
│ • Full text links enriched from Europe PMC │
└─────────────────────────────────────────────────────────────────────────────┘
🛠️ MCP Tools Overview
If you want to understand the tool surface as a usable system, do not start by memorizing 46 tool names.
Start with the Tools Usage Guide: it compresses the current 46 tools into 8 capability families, explains the theoretical lower bound, and gives intent-based routing for both humans and agents.
🔍 Search & Query Intelligence
┌─────────────────────────────────────────────────────────────────┐
│ SEARCH ENTRY POINT │
├─────────────────────────────────────────────────────────────────┤
│ │
│ unified_search() ← 🌟 Single entry for all sources │
│ │ │
│ ├── Quick search → Direct multi-source query │
│ ├── PICO hints → Detects comparison, shows P/I/C/O │
│ └── ICD expansion → Auto ICD→MeSH conversion │
│ │
│ Sources: PubMed · Europe PMC · CORE · OpenAlex │
│ Auto: Deduplicate → Rank → Enrich full-text links │
│ │
├─────────────────────────────────────────────────────────────────┤
│ QUERY INTELLIGENCE │
│ │
│ generate_search_queries() → MeSH expansion + synonym discovery │
│ parse_pico() → Agent-provided PICO handoff │
│ analyze_search_query() → Query analysis without execution │
│ │
└─────────────────────────────────────────────────────────────────┘
🔬 Discovery Tools (After Finding Key Papers)
Found important paper (PMID)
│
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ BACKWARD │ │ SIMILAR │ │ FORWARD │
│ ◀────── │ │ ≈≈≈≈≈≈ │ │ ──────▶ │
│ │ │ │ │ │
│ get_article │ │find_related │ │find_citing │
│ _references │ │ _articles │ │ _articles │
│ │ │ │ │ │
│ Foundation │ │ Similar │ │ Follow-up │
│ papers │ │ topic │ │ research │
└─────────────┘ └─────────────┘ └─────────────┘
fetch_article_details() → Detailed article metadata
get_citation_metrics() → iCite RCR, citation percentile
build_citation_tree() → Full network visualization (6 formats)
📚 Full Text, Figure Extraction & Export
Category | Tools |
Full Text |
|
Figures |
|
Figure-aware Full Text |
|
Text Mining |
|
Export |
|
🖼️ OA Figure-First Exploration
Use the PMC Open Access path when an agent needs evidence figures, not just article text:
get_article_figures(identifier="PMC12086443") → Figure labels, captions, image URLs, and PDF/article links
get_fulltext(pmcid="PMC7096777", include_figures=True) → Structured fulltext with figures inline
Figure output preserves article context, so agents can connect each figure back to the sections where it is mentioned
🧬 NCBI Extended Databases
Tool | Description |
| Search NCBI Gene database |
| Gene details by NCBI Gene ID |
| PubMed articles linked to a gene |
| Search PubChem compounds |
| Compound details by PubChem CID |
| PubMed articles linked to a compound |
| Search ClinVar clinical variants |
🕰️ Research Timeline & Lineage Tree
Tool | Description |
| Build timeline/tree with landmark detection and formatted diagnostics. Output: text, tree, mermaid, mindmap, json, json_tree, timeline_js, d3 |
| Analyze milestone distribution with diagnostics payload |
| Compare multiple topic timelines with per-topic diagnostics |
Current timeline and tree outputs are projections, not a persisted chronicle asset. The planned persistent/versioned Research Chronicle is specified in docs/RESEARCH_CHRONICLE_REFACTOR_SPEC.md.
🏥 Institutional Access & ICD Conversion
Tool | Description |
| Configure institution's link resolver |
| Generate OpenURL access link |
| List resolver presets |
| Test resolver configuration |
| Diagnose direct DOI, EZproxy, and OpenURL handoff paths |
| Convert between ICD codes and MeSH terms (bidirectional) |
| Auto-detect ICD codes in queries and expand them to MeSH |
💾 Session Management
Tool | Description |
| Retrieve cached PMID lists |
| Get article from session cache (no API cost) |
| Session status overview |
| Facade for PMIDs, cached articles, history, and persistent artifacts |
Dynamic MCP resources are also available for agents that can read resources directly:
session://context - active session status
session://last-search - latest search metadata
session://last-search/pmids - latest PMID list + CSV form
session://last-search/results - cached article payloads for the latest search
Persistent Artifacts
Persistent MCP output artifacts are saved for reusable unified_search and
get_fulltext responses when session persistence is configured. Tool responses
act like index cards: they include enough counts, source warnings, and artifact
hints for an agent to answer immediately, while the full evidence payload stays
in files that can be read repeatedly. The compact artifact locator includes
artifact_id, artifact_uri, primary_file, summary, file inventory,
read_order, audit status, and exact read_session(...) retrieval hints. Set
PUBMED_ARTIFACT_INCLUDE_LOCAL_PATHS=true only when a local MCP client should
also receive local_path and manifest_path directly.
Remote clients that cannot read the server filesystem can retrieve the same content through the session facade:
read_session(action="list_artifacts")
read_session(action="artifact", artifact_id="...")
read_session(action="artifact", artifact_uri="artifact://...")
read_session(action="artifact", artifact_uri="artifact://...", artifact_file="audit.json")
read_session(action="artifact", artifact_uri="artifact://...", artifact_file="query_strategy.json")
read_session(action="artifact", artifact_uri="artifact://...", artifact_file="results.json", offset=0, max_chars=200000)
read_session(action="list_artifacts", include_local_paths=true)
unified_search artifacts use a research envelope. Start with audit.json for
source-count and completeness warnings, then query_strategy.json for the exact
executed plan, and finally results.json / results.toon for the complete
article list. This keeps MCP response tokens small without losing academic
traceability.
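The read order described above (audit.json, then query_strategy.json, then results.json) can be captured as a list of read_session keyword arguments. The helper name `artifact_read_plan` is hypothetical, and `artifact://demo` is a placeholder URI; only the argument names come from the calls shown earlier in this section.

```python
from typing import Dict, List

# File order recommended above for unified_search artifacts.
READ_ORDER = ["audit.json", "query_strategy.json", "results.json"]


def artifact_read_plan(artifact_uri: str, max_chars: int = 200000) -> List[Dict[str, object]]:
    """Build read_session(...) keyword arguments in the recommended order."""
    plan: List[Dict[str, object]] = []
    for name in READ_ORDER:
        args: Dict[str, object] = {
            "action": "artifact",
            "artifact_uri": artifact_uri,
            "artifact_file": name,
        }
        if name == "results.json":
            # The full article list can be large, so request a bounded slice.
            args.update(offset=0, max_chars=max_chars)
        plan.append(args)
    return plan


for call in artifact_read_plan("artifact://demo"):
    print(call)
```

An agent could stop after audit.json when the completeness warnings already answer its question, which keeps token use low.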
Artifacts are generated from the already-computed result object, so reading an
artifact does not rerun searches or fulltext retrieval.
read_session redacts local filesystem paths by default; local_path and
manifest_path are server-local paths, not portable client paths. Artifacts
from get_fulltext may contain article body text, including subscription or
institutionally accessed content. Store and share them according to publisher,
license, and institutional access terms.
Large get_fulltext responses are returned inline as a preview when an artifact
is available; use the artifact locator to retrieve the saved full content.
When one source fails but the overall search can continue, JSON responses may
include source_errors; markdown responses show a Source warnings line. For
Semantic Scholar HTTP 429s, set S2_API_KEY / SEMANTIC_SCHOLAR_API_KEY, retry
later, or temporarily exclude it with sources="auto,-semantic_scholar" or
PUBMED_SEARCH_DISABLED_SOURCES=semantic_scholar.
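To make the two exclusion mechanisms concrete, here is a minimal sketch of how a sources string such as "auto,-semantic_scholar" and a disabled-sources setting might combine. The parsing rules, the function name `resolve_sources`, and the identifiers `europe_pmc` and `core` are assumptions; only `pubmed`, `openalex`, and `semantic_scholar` appear as source names in this README.

```python
from typing import Iterable, List

# Assumed default set; the identifiers europe_pmc and core are guesses.
DEFAULT_SOURCES = ["pubmed", "europe_pmc", "core", "openalex", "semantic_scholar"]


def resolve_sources(
    spec: str,
    disabled: str = "",
    defaults: Iterable[str] = DEFAULT_SOURCES,
) -> List[str]:
    """Expand "auto", honor "-name" exclusions, and drop disabled sources."""
    selected: List[str] = []
    excluded = {s.strip() for s in disabled.split(",") if s.strip()}
    for token in (t.strip() for t in spec.split(",")):
        if not token:
            continue
        if token == "auto":
            selected.extend(s for s in defaults if s not in selected)
        elif token.startswith("-"):
            excluded.add(token[1:])  # per-call exclusion
        elif token not in selected:
            selected.append(token)
    return [s for s in selected if s not in excluded]


print(resolve_sources("auto,-semantic_scholar"))
```

The per-call form suits a one-off retry after an HTTP 429, while the environment variable suits a deployment that should never contact a given source.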
Pipeline Management
manage_pipeline is the primary facade for pipeline CRUD, history, and scheduling. The more specific pipeline tools remain available as compatibility wrappers.
Tool | Description |
| Primary facade for save, list, load, delete, history, and schedule actions |
| Save a pipeline config for later reuse (YAML/JSON, auto-validated) |
| List saved pipelines (filter by tag/scope) |
| Load pipeline from name or file for review/editing |
| Delete pipeline and its execution history |
| View execution history with article diff analysis |
| Create, update, or remove recurring pipeline schedules |
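A custom DAG pipeline declares dependencies through each step's inputs (as in the brca1_comprehensive example in this README). The sketch below shows one way an execution order could be derived with the standard library; it is an assumption about how such ordering might work, not the server's scheduler, and `execution_order` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List

# Step ids and inputs copied from the brca1_comprehensive example.
steps = [
    {"id": "expand", "action": "expand"},
    {"id": "pubmed", "action": "search"},
    {"id": "expanded", "action": "search", "inputs": ["expand"]},
    {"id": "merged", "action": "merge", "inputs": ["pubmed", "expanded"]},
    {"id": "enriched", "action": "metrics", "inputs": ["merged"]},
]


def execution_order(steps: List[Dict]) -> List[str]:
    """Return step ids so that every step runs after all of its inputs."""
    graph = {s["id"]: set(s.get("inputs", [])) for s in steps}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"pipeline has a dependency cycle: {exc.args[1]}") from exc


order = execution_order(steps)
print(order)
```

Steps with no dependency between them (here `expand` and `pubmed`) may appear in either order, which is also what makes them candidates for parallel execution.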
Step-by-step tutorials:
👁️ Vision & Image Search
Tool | Description |
analyze_figure_for_search | Hand off an uploaded image, image URL, or data URI to agent vision for search-term extraction |
search_biomedical_images | Search biomedical images across Open-i (X-ray, microscopy, photos, diagrams) |
Use analyze_figure_for_search when the user supplies an image and the agent
must interpret its meaning first. The tool returns MCP ImageContent plus
instructions for the LLM agent to extract English biomedical terms, then
continue with search_biomedical_images for similar Open-i images or
unified_search for related papers.
📄 Preprint Search
Search arXiv, medRxiv, and bioRxiv preprint servers via unified_search options flags:
preprints: Search preprint servers and merge preprints into the main aggregated result set with article_type=PREPRINT.
all_types: Keep non-peer-reviewed content already returned by selected scholarly sources even without a preprint-server crawl.
Recommended combinations:
Empty options: Peer-reviewed results only; preprint-like records are filtered.
options="preprints": Searches arXiv, medRxiv, and bioRxiv, then ranks/dedupes those preprints with the main results.
options="preprints, all_types": Same preprint-server crawl, plus other non-peer-reviewed records from selected sources are retained.
options="all_types": No preprint-server crawl, but non-peer-reviewed items from searched sources are retained.
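The four combinations above reduce to two independent switches. The sketch below parses an options string into those switches; the dataclass, its field names, and `preprint_behavior` are illustrative assumptions rather than the server's internal representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreprintBehavior:
    crawl_preprint_servers: bool  # query arXiv, medRxiv, and bioRxiv
    keep_non_peer_reviewed: bool  # retain preprint-like records from other sources


def preprint_behavior(options: str = "") -> PreprintBehavior:
    """Map a comma-separated options string to the two preprint switches."""
    flags = {f.strip() for f in options.split(",") if f.strip()}
    return PreprintBehavior(
        crawl_preprint_servers="preprints" in flags,
        keep_non_peer_reviewed="all_types" in flags,
    )


for opts in ("", "preprints", "preprints, all_types", "all_types"):
    print(repr(opts), preprint_behavior(opts))
```

Unrelated flags in the same string (for example counts_first) are simply ignored by this helper, so it composes with other options.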
Preprint detection — articles are identified as preprints by:
Article type from source API (OpenAlex, CrossRef, Semantic Scholar)
arXiv ID present without PubMed ID
Known preprint server source or journal name
DOI prefix matching preprint servers (e.g., 10.1101/ → bioRxiv/medRxiv, 10.48550/ → arXiv)
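The four detection signals listed above can be combined into a single predicate. This is a minimal sketch under stated assumptions: the record field names (`article_type`, `arxiv_id`, `pmid`, `source`, `journal`, `doi`) and the function name are hypothetical, and the real detector may weigh the signals differently.

```python
from typing import Mapping

PREPRINT_DOI_PREFIXES = ("10.1101/", "10.48550/")
PREPRINT_SERVERS = {"arxiv", "medrxiv", "biorxiv"}


def looks_like_preprint(record: Mapping[str, str]) -> bool:
    """Apply the documented signals in order; any single hit is enough."""
    # 1. Article type reported by the source API.
    if record.get("article_type", "").lower() == "preprint":
        return True
    # 2. arXiv ID present without a PubMed ID.
    if record.get("arxiv_id") and not record.get("pmid"):
        return True
    # 3. Known preprint server as source or journal name.
    source = record.get("source", "").lower()
    journal = record.get("journal", "").lower()
    if source in PREPRINT_SERVERS or journal in PREPRINT_SERVERS:
        return True
    # 4. DOI prefix associated with preprint servers.
    return record.get("doi", "").lower().startswith(PREPRINT_DOI_PREFIXES)
```

The DOI prefix is checked last here because a prefix may also cover non-preprint content from the same registrant, so it is probably the weakest of the four signals.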
🌳 Research Context Graph
unified_search can append a lightweight research lineage view built from PMID-backed ranked results:
Option Flag | Description |
context_graph | Append a lightweight Research Context Graph preview from the current PMID-backed ranked set to Markdown output and include |
This is useful when an agent needs quick thematic branching without making a second build_research_timeline call.
📊 Count-First Orientation
unified_search can also front-load the existing source coverage and decision hints for agents that want routing help before reading the ranked list:
Option Flag | Description |
counts_first | Add a source-count table, coverage summary, and next-tool recommendations to the response |
Example:
unified_search(query="remimazolam ICU sedation", options="counts_first")
This mode is useful when the agent should decide whether to expand a source, inspect the lead PMID, fetch fulltext, extract figures, or pivot into timeline exploration.
⏱️ MCP Progress Reporting
When the MCP client provides a progress token, unified_search, build_research_timeline, analyze_timeline_milestones, compare_timelines, get_fulltext, and get_text_mined_terms emit progress updates for their major phases.
This reduces the "black box" wait time for agents during longer searches.
📋 Agent Usage Examples
1️⃣ Quick Search (Simplest)
# Agent just asks naturally - middleware handles everything
unified_search(query="remimazolam ICU sedation", limit=20)
# Or with clinical codes - auto-converted to MeSH
unified_search(query="I10 treatment in E11.9 patients")
# I10 (ICD-10) → Hypertension; E11.9 (ICD-10) → Type 2 Diabetes
2️⃣ PICO Clinical Question
Simple path - unified_search can search directly (no PICO decomposition):
# unified_search searches as-is; detects "A vs B" pattern and shows PICO hints in metadata
unified_search(query="Is remimazolam better than propofol for ICU sedation?")
# → Multi-source keyword search + PICO hint metadata in output
# ⚠️ This does NOT auto-decompose PICO or expand MeSH!
# For structured PICO search, use the Agent workflow below
Agent workflow - agent-provided PICO + backend pipeline search (recommended for clinical questions):
┌─────────────────────────────────────────────────────────────────────────┐
│ "Is remimazolam better than propofol for ICU sedation?" │
└─────────────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ parse_pico() │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ P │ │ I │ │ C │ │ O │ │
│ │ ICU │ │remimaz- │ │propofol │ │sedation │ │
│ │patients │ │ olam │ │ │ │outcomes │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
└───────┼────────────┼────────────┼────────────┼──────────────────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────┐
│ generate_search_queries() × 4 (parallel) │
│ │
│ P → "Intensive Care Units"[MeSH] │
│ I → "remimazolam" [Supplementary Concept], "CNS 7056" │
│ C → "Propofol"[MeSH], "Diprivan" │
│ O → "Conscious Sedation"[MeSH], "Deep Sedation"[MeSH] │
└─────────────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Agent combines with Boolean logic │
│ │
│ (P) AND (I) AND (C) AND (O) ← High precision │
│ (P) AND (I OR C) AND (O) ← High recall │
└─────────────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ unified_search() (auto multi-source + dedup) │
│ │
│ PubMed + Europe PMC + CORE + OpenAlex → Auto deduplicate & rank │
└─────────────────────────────────────────────────────────────────────────┘
# Step 1: Agent extracts P/I/C/O, then validates the structured handoff
pico = parse_pico(
description="Is remimazolam better than propofol for ICU sedation?",
p="ICU patients requiring sedation",
i="remimazolam",
c="propofol",
o="sedation efficacy, delirium, hypotension"
)
# Returns validation plus a ready-to-run `template: pico` pipeline.
# Step 2: Get MeSH for each element (parallel!)
generate_search_queries(topic="ICU patients") # P
generate_search_queries(topic="remimazolam") # I
generate_search_queries(topic="propofol") # C
generate_search_queries(topic="sedation") # O
# Step 3: Either pass expanded fragments back as p_query/i_query/c_query/o_query
# or let the backend pipeline use the structured P/I/C/O labels.
# Step 4: Search (backend runs O-aware precision/recall searches, dedup, rank)
unified_search(
query="Is remimazolam better than propofol for ICU sedation?",
pipeline=pico["pipeline"]
)
3️⃣ Explore from Key Paper
# Found landmark paper PMID: 33475315
find_related_articles(pmid="33475315") # Similar methodology
find_citing_articles(pmid="33475315") # Who built on this?
get_article_references(pmid="33475315") # What's the foundation?
# Build complete research map
build_citation_tree(pmid="33475315", depth=2, output_format="mermaid")
4️⃣ Gene/Drug Research
# Research a gene
search_gene(query="BRCA1", organism="human")
get_gene_literature(gene_id="672", limit=20)
# Research a drug compound
search_compound(query="propofol")
get_compound_literature(cid="4943", limit=20)
5️⃣ Export Results
# Export last search results
prepare_export(pmids="last", format="ris") # → EndNote/Zotero
prepare_export(pmids="last", format="bibtex", source="local") # → LaTeX
prepare_export(pmids="last", format="csl") # → CSL JSON from the official NCBI Citation API
save_literature_notes(pmids="last") # → local wiki note + Foam-compatible wikilinks + CSL JSON
save_literature_notes(pmids="last", note_format="medpaper", output_dir="./references")
save_literature_notes(pmids="last", template_file="./reference-template.md")
# Retrieve full text for a selected paper from the last search
get_fulltext(pmid="12345678", extended_sources=True)
6️⃣ Preprint Search
# Include preprints alongside peer-reviewed results
unified_search(query="COVID-19 vaccine efficacy", options="preprints")
# → Main aggregated results include labelled arXiv, medRxiv, and bioRxiv preprints
# Include preprints and retain non-peer-reviewed items in main results
unified_search(query="CRISPR gene therapy", options="preprints, all_types")
# → Preprint-server crawl + non-peer-reviewed items retained in main results
# Only peer-reviewed (default behavior)
unified_search("diabetes treatment")
# → Preprints from any source automatically filtered out
# Add a research context graph preview to the same search response
unified_search("remimazolam ICU sedation", options="context_graph")
7️⃣ Pipeline (Reusable Search Plans)
# Save a template-based pipeline through the primary facade
manage_pipeline(
action="save",
name="icu_sedation_weekly",
config="template: pico\nparams:\n P: ICU patients\n I: remimazolam\n C: propofol\n O: delirium",
tags="anesthesia,sedation",
description="Weekly ICU sedation monitoring"
)
# Save a custom DAG pipeline
manage_pipeline(
action="save",
name="brca1_comprehensive",
config="""
steps:
  - id: expand
    action: expand
    params: { topic: BRCA1 breast cancer }
  - id: pubmed
    action: search
    params: { query: BRCA1, sources: pubmed, limit: 50 }
  - id: expanded
    action: search
    inputs: [expand]
    params: { strategy: mesh, sources: "pubmed,openalex", limit: 50 }
  - id: merged
    action: merge
    inputs: [pubmed, expanded]
    params: { method: rrf }
  - id: enriched
    action: metrics
    inputs: [merged]
output:
  limit: 30
  ranking: quality
"""
)
# Execute a saved pipeline
unified_search(pipeline="saved:icu_sedation_weekly")
# List & manage
manage_pipeline(action="list", tag="anesthesia")
manage_pipeline(action="load", source="brca1_comprehensive") # Review YAML
manage_pipeline(action="history", name="icu_sedation_weekly") # View past runs
🔍 Search Mode Comparison
┌─────────────────────────────────────────────────────────────────────────┐
│ SEARCH MODE DECISION TREE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ "What kind of search do I need?" │
│ │ │
│ ├── Know exactly what to search? │
│ │ └── unified_search(query="topic keywords") │
│ │ → Quick, auto-routing to best sources │
│ │ │
│ ├── Have a clinical question (A vs B)? │
│ │ └── Agent P/I/C/O → parse_pico() handoff │
│ │ → unified_search(template:pico) or expanded Boolean │
│ │ │
│ ├── Need comprehensive systematic coverage? │
│ │ └── generate_search_queries() → parallel search │
│ │ → MeSH expansion, multiple strategies, merge │
│ │ │
│ └── Exploring from a key paper? │
│ └── find_related/citing/references → build_citation_tree │
│ → Citation network, research context │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Mode | Entry Point | Best For | Auto-Features |
Quick | | Fast topic search | ICD→MeSH, multi-source, dedup |
PICO | Agent P/I/C/O -> | Clinical questions | Validate handoff -> |
Systematic | | Literature reviews | MeSH expansion, synonyms |
Exploration | | From key paper | Citation network, related |
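The systematic mode merges results from several strategies, and the DAG pipeline example names `method: rrf` for its merge step. Assuming that stands for reciprocal rank fusion, the idea can be sketched as follows. The function name `rrf_merge`, the constant `k=60` (a commonly used default), and the placeholder IDs are illustrative assumptions, not the server's actual implementation.

```python
from collections import defaultdict


def rrf_merge(ranked_lists, k=60, limit=None):
    """Merge several ranked ID lists with reciprocal rank fusion (RRF).

    Each list contributes 1 / (k + rank) to an ID's score, so items that
    rank well in several lists rise to the top of the merged result.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        seen = set()
        for rank, item_id in enumerate(ranking, start=1):
            if item_id in seen:  # count each ID once per list
                continue
            seen.add(item_id)
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order
    merged = sorted(scores, key=lambda item_id: (-scores[item_id], item_id))
    return merged if limit is None else merged[:limit]


# Placeholder IDs standing in for PMIDs from two search strategies
pubmed_hits = ["111", "222", "333"]
expanded_hits = ["222", "444", "111"]
print(rrf_merge([pubmed_hits, expanded_hits]))
# ['222', '111', '444', '333']
```

IDs returned by both strategies ("222" and "111") outrank IDs found by only one, which is the behaviour a merge step needs before metrics-based ranking is applied.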
🤖 Claude Skills (AI Agent Workflows)
Pre-built workflow guides in .claude/skills/, divided into Usage Skills (for using the MCP server) and Development Skills (for maintaining the project):
📚 Usage Skills (10) — For AI Agents Using This MCP Server
Skill | Description |
| Basic search with filters |
| MeSH expansion, comprehensive |
| Clinical question decomposition |
| Citation tree, related articles |
| Gene/PubChem/ClinVar |
| Europe PMC, CORE full text |
| RIS/BibTeX/CSV/CSL export guidance |
| Cross-database unified search |
| Complete tool reference guide |
| Save, load, reuse search plans |
🔧 Development Skills (13) — For Project Contributors
Skill | Description |
| Auto-update CHANGELOG.md |
| DDD architecture refactoring |
| Code quality & security review |
| DDD scaffold for new features |
| Sync docs before commits |
| Pre-commit workflow orchestration |
| Save context to Memory Bank |
| Update Memory Bank files |
| Initialize new projects |
| Multilingual README sync |
| Sync README with code changes |
| Update ROADMAP.md status |
| Generate test suites |
📁 Location: .claude/skills/*/SKILL.md (Claude Code-specific, and the single source of truth for repo skills). Do not mirror or split repo skills into .github/skills/. These repo skills are project-scoped and should remain version-controlled. Personal cross-project skills belong in a user directory such as ~/.copilot/skills/ or ~/.claude/skills/, not in this repository.
🏗️ Architecture (DDD)
This project uses Domain-Driven Design (DDD) architecture, with literature research domain knowledge as the core model.
src/pubmed_search/
├── domain/ # Core business logic
│ └── entities/article.py # UnifiedArticle, Author, etc.
├── application/ # Use cases
│ ├── search/ # QueryAnalyzer, ResultAggregator
│ ├── export/ # Citation export (RIS, BibTeX...)
│ └── session/ # SessionManager
├── infrastructure/ # External systems
│ ├── ncbi/ # Entrez, iCite, Citation Exporter
│ ├── sources/ # Europe PMC, CORE, CrossRef...
│ └── http/ # HTTP clients
├── presentation/ # User interfaces
│ ├── mcp_server/ # MCP tools, prompts, resources
│ │ └── tools/ # discovery, strategy, pico, export...
│ └── api/ # Auxiliary HTTP API routes (not pubmed_search.api)
└── shared/ # Cross-cutting concerns
├── exceptions.py # Unified error handling
└── async_utils.py # Rate limiter, retry, circuit breaker
Internal Mechanisms (Transparent to Agent)
Mechanism | Description |
Session | Auto-create, auto-switch |
Cache | Auto-cache search results, avoid duplicate API calls |
Rate Limit | Auto-comply with NCBI API limits (0.34 s between requests without an API key, 0.1 s with one) |
MeSH Lookup |
|
ESpell | Auto spelling correction ( |
Query Analysis | Each suggested query shows how PubMed actually interprets it |
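The rate-limit row can be illustrated with a minimum-interval limiter. This is a simplified, synchronous sketch: the class `MinIntervalLimiter` and the helper `limiter_for` are hypothetical names, and it assumes the two intervals correspond to NCBI's limits of roughly 3 requests per second without an API key and 10 per second with one. The project's own limiter lives in `shared/async_utils.py` and may work differently.

```python
import time


class MinIntervalLimiter:
    """Block until at least `min_interval` seconds have passed since the last call."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the most recent permitted request

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def limiter_for(api_key=None):
    """Pick the interval from the table: 0.1 s with an API key, 0.34 s without."""
    return MinIntervalLimiter(0.1 if api_key else 0.34)
```

Calling `limiter.wait()` immediately before each outgoing NCBI request keeps consecutive requests at least one interval apart.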
Vocabulary Translation Layer (Key Feature)
Our Core Value: The server is the intelligent middleware between the agent and the search engines. It handles vocabulary standardization automatically, so the agent doesn't need to know each database's terminology.
Different data sources use different controlled vocabulary systems. This server provides automatic conversion:
API / Database | Vocabulary System | Auto-Conversion |
PubMed / NCBI | MeSH (Medical Subject Headings) | ✅ Full support via |
ICD Codes | ICD-10-CM / ICD-9-CM | ✅ Auto-detect & convert to MeSH |
Europe PMC | Text-mined entities (Gene, Disease, Chemical) | ✅ |
OpenAlex | OpenAlex Concepts (deprecated) | ❌ Free-text only |
Semantic Scholar | S2 Field of Study | ❌ Free-text only |
CORE | None | ❌ Free-text only |
CrossRef | None | ❌ Free-text only |
Automatic ICD → MeSH Conversion
When searching with ICD codes (e.g., I10 for Hypertension), unified_search() automatically:
Detects ICD-10/ICD-9 patterns via detect_and_expand_icd_codes()
Looks up corresponding MeSH terms from internal mapping (ICD10_TO_MESH, ICD9_TO_MESH)
Expands query with MeSH synonyms for comprehensive search
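The steps above can be sketched in a few lines. This is an illustration only: `SAMPLE_ICD10_TO_MESH`, `ICD10_PATTERN`, and `expand_icd_codes` are hypothetical names with a two-entry sample mapping and a simplified ICD-10 pattern; the server's own `detect_and_expand_icd_codes()` and mapping tables are not reproduced here.

```python
import re

# Hypothetical sample mapping; the real ICD10_TO_MESH table is not shown in this README
SAMPLE_ICD10_TO_MESH = {
    "I10": "Hypertension",
    "J45": "Asthma",
}

# Simplified ICD-10 shape: one letter, two digits, optional dotted extension
ICD10_PATTERN = re.compile(r"\b[A-Z]\d{2}(?:\.\d{1,4})?\b")


def expand_icd_codes(query, mapping=SAMPLE_ICD10_TO_MESH):
    """Replace each known ICD-10 code with '(code OR MeSH term[MeSH])'."""

    def replace(match):
        code = match.group(0)
        mesh_term = mapping.get(code)
        if mesh_term is None:
            # Looks like a code but is not in the mapping (e.g. "B12"): leave untouched
            return code
        return f"({code} OR {mesh_term}[MeSH])"

    return ICD10_PATTERN.sub(replace, query)


print(expand_icd_codes("I10 treatment outcomes"))
# (I10 OR Hypertension[MeSH]) treatment outcomes
```

Keeping the original code in the OR group preserves matches on records that mention the code literally, while the MeSH term catches indexed articles.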
# Agent calls unified_search with clinical terminology
unified_search(query="I10 treatment outcomes")
# Server auto-expands to PubMed-compatible query
"(I10 OR Hypertension[MeSH]) treatment outcomes"
📖 Full architecture documentation: ARCHITECTURE.md
MeSH Auto-Expansion + Query Analysis
When calling generate_search_queries("remimazolam sedation"), internally it:
ESpell Correction - Fix spelling errors
MeSH Query - Entrez.esearch(db="mesh") to get standard vocabulary
Synonym Extraction - Get synonyms from MeSH Entry Terms
Query Analysis - Analyze how PubMed interprets each query
{
"mesh_terms": [
{
"input": "remimazolam",
"preferred": "remimazolam [Supplementary Concept]",
"synonyms": ["CNS 7056", "ONO 2745"]
}
],
"all_synonyms": ["CNS 7056", "ONO 2745", ...],
"suggested_queries": [
{
"id": "q1_title",
"query": "(remimazolam sedation)[Title]",
"purpose": "Exact title match - highest precision",
"estimated_count": 8,
"pubmed_translation": "\"remimazolam sedation\"[Title]"
},
{
"id": "q3_and",
"query": "(remimazolam AND sedation)",
"purpose": "All keywords required",
"estimated_count": 561,
"pubmed_translation": "(\"remimazolam\"[Supplementary Concept] OR \"remimazolam\"[All Fields]) AND (\"sedate\"[All Fields] OR ...)"
}
]
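}
Value of Query Analysis: An agent may assume that remimazolam AND sedation searches only those two words, but PubMed actually expands the query to the Supplementary Concept plus synonyms; in the example above the estimated count is 561, compared with 8 for the exact-title query. Seeing the translation helps the agent understand the difference between its intent and the search PubMed actually runs.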
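To make the synonym step concrete, here is a minimal sketch of how an agent might fold the `mesh_terms` entries from the response above into a Boolean query. The function `build_expanded_query` is a hypothetical helper, not part of the server, and the query shape it produces is only one reasonable choice.

```python
def build_expanded_query(mesh_terms, extra_keywords=()):
    """OR each term with its synonyms, then AND the groups and extra keywords."""
    groups = []
    for entry in mesh_terms:
        variants = [entry["input"], *entry.get("synonyms", [])]
        unique = list(dict.fromkeys(variants))  # drop duplicates, keep order
        quoted = [f'"{variant}"' for variant in unique]
        groups.append("(" + " OR ".join(quoted) + ")")
    groups.extend(extra_keywords)
    return " AND ".join(groups)


# Same structure as the "mesh_terms" field in the response shown above
mesh_terms = [
    {
        "input": "remimazolam",
        "preferred": "remimazolam [Supplementary Concept]",
        "synonyms": ["CNS 7056", "ONO 2745"],
    }
]
print(build_expanded_query(mesh_terms, ["sedation"]))
# ("remimazolam" OR "CNS 7056" OR "ONO 2745") AND sedation
```

Quoting each variant keeps multi-word synonyms such as "CNS 7056" together as phrases.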
🔒 HTTPS Deployment
Enable HTTPS secure communication for production environments.
Copilot Studio Quick Start
# Step 1: Generate SSL certificates
./scripts/generate-ssl-certs.sh
# Step 2: Start HTTPS service (Docker)
./scripts/start-https-docker.sh up
# Verify deployment
curl -k https://localhost/
HTTPS Endpoints
Service | URL | Description |
MCP | | Streamable HTTP MCP endpoint |
Health | | Health check |
Info | | Runtime transport and endpoint metadata |
Exports | | Prepared export file listing |
Remote MCP Client Configuration
{
"mcpServers": {
"pubmed-search": {
"url": "https://localhost/mcp"
}
}
}
🏢 Microsoft Copilot Studio Integration
Integrate PubMed Search MCP with Microsoft 365 Copilot (Word, Teams, Outlook)!
Quick Start
# Start with Streamable HTTP transport (required by Copilot Studio)
pubmed-search-mcp-http --transport streamable-http --port 8765
# Enable Copilot-compatible HTTP semantics while keeping full tool schemas
pubmed-search-mcp-http --transport streamable-http --copilot-compatible --port 8765
# Or use the dedicated script with ngrok
./scripts/start-copilot-studio.sh --with-ngrok
Copilot Studio Configuration
Field | Value |
Server name |
|
Server URL |
|
Authentication |
|
📖 Full documentation: copilot-studio/README.md
Use pubmed-search-mcp-http --copilot-compatible for packaged Copilot HTTP semantics. run_server.py remains a source-tree development wrapper; use run_copilot.py only when you need simplified tool schemas.
⚠️ Note: SSE transport deprecated since Aug 2025. Use streamable-http.
📖 More documentation:
Architecture → ARCHITECTURE.md
Pipeline tutorial (English) → docs/PIPELINE_MODE_TUTORIAL.en.md
Pipeline tutorial (zh-TW) → docs/PIPELINE_MODE_TUTORIAL.md
Deployment guide → DEPLOYMENT.md
Copilot Studio → copilot-studio/README.md
🔐 Security
Security Features
Layer | Feature | Description |
HTTPS | TLS 1.2/1.3 encryption | All traffic encrypted via Nginx |
Rate Limiting | 30 req/s | Nginx level protection |
Security Headers | XSS/CSRF protection | X-Frame-Options, X-Content-Type-Options |
Streamable HTTP | | Modern MCP transport for remote clients |
No Database | Stateless | No SQL injection risk |
No Secrets | In-memory only | No credentials stored |
See DEPLOYMENT.md for detailed deployment instructions.
📤 Export Formats
Export your search results in formats compatible with major reference managers:
Format | Source | Compatible With | Use Case |
RIS | official or local | EndNote, Zotero, Mendeley | Universal import |
MEDLINE | official or local | PubMed tools | Native PubMed-style archiving |
CSL JSON | official | Citation processors | Programmatic citation styling |
BibTeX | local | LaTeX, Overleaf, JabRef | Academic writing |
CSV | local | Excel, Google Sheets | Data analysis |
JSON | local | Programmatic access | Custom processing |
Exported Fields
Core: PMID, Title, Authors, Journal, Year, Volume, Issue, Pages
Identifiers: DOI, PMC ID, ISSN
Content: Abstract (HTML tags cleaned)
Metadata: Language, Publication Type, Keywords
Access: DOI URL, PMC URL, Full-text availability
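As a rough illustration of how the exported fields map onto a reference-manager format, the sketch below serializes one article to RIS. The function `to_ris`, the dictionary keys, and the sample record are hypothetical placeholders; the tags used (TY, TI, AU, JO, PY, VL, IS, DO, AB, ER) are standard RIS tags, but the server's actual exporter may emit more fields or a different order.

```python
def to_ris(article):
    """Serialize one article dict to a minimal RIS record."""
    lines = ["TY  - JOUR", f"TI  - {article['title']}"]
    for author in article.get("authors", []):
        lines.append(f"AU  - {author}")  # one AU line per author
    optional_fields = [
        ("JO", "journal"),
        ("PY", "year"),
        ("VL", "volume"),
        ("IS", "issue"),
        ("DO", "doi"),
        ("AB", "abstract"),
    ]
    for tag, key in optional_fields:
        value = article.get(key)
        if value:  # skip fields the record does not have
            lines.append(f"{tag}  - {value}")
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines)


# Placeholder record, not a real article
sample = {
    "title": "Example title",
    "authors": ["Hansen, Søren", "Doe, Jane"],
    "journal": "Example Journal",
    "year": 2024,
    "doi": "10.1000/example",
}
print(to_ris(sample))
```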
Special Character Handling
BibTeX exports use pylatexenc for proper LaTeX encoding
Nordic characters (ø, æ, å), umlauts (ü, ö, ä), and accents are correctly converted
Example: Søren Hansen → S{\o}ren Hansen
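The conversion can be pictured with a small standard-library stand-in. This is not pylatexenc and covers only a handful of characters; the table `LATEX_ESCAPES` and the function `to_latex` are hypothetical names used to show the kind of mapping the BibTeX export relies on.

```python
import unicodedata

# Tiny illustrative table; pylatexenc covers far more characters
LATEX_ESCAPES = {
    "ø": r"{\o}", "Ø": r"{\O}",
    "æ": r"{\ae}", "Æ": r"{\AE}",
    "å": r"{\aa}", "Å": r"{\AA}",
    "ü": r'{\"u}', "ö": r'{\"o}', "ä": r'{\"a}',
    "é": r"{\'e}",
}


def to_latex(text):
    """Replace known non-ASCII letters with LaTeX escapes; leave the rest as-is."""
    # Normalize to precomposed form so "u" + combining diaeresis matches "ü"
    normalized = unicodedata.normalize("NFC", text)
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in normalized)


print(to_latex("Søren Hansen"))
# S{\o}ren Hansen
```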
📚 Citation
GitHub shows a "Cite this repository" option generated from CITATION.cff. If you use PubMed Search MCP in research, methods sections, or internal technical reports, prefer the GitHub-generated citation or reuse the repository metadata directly.
@software{pubmed_search_mcp,
title = {PubMed Search MCP},
author = {u9401066},
url = {https://github.com/u9401066/pubmed-search-mcp}
}
📄 License
Apache License 2.0 - see LICENSE