research-hub
Searches arXiv papers as part of the discovery step, enabling retrieval of paper metadata and identifiers.
Provides web search capabilities via the Brave Search API as an alternative search backend for finding papers and sources.
Integrates with Google NotebookLM for bundling sources, uploading, generating briefs, and downloading research summaries.
Writes Markdown notes, tags, and Obsidian Bases dashboards to an Obsidian vault, enabling AI-driven organization and linking of research notes.
Integrates with Semantic Scholar for paper search and metadata enrichment, with configurable API rate limits.
Integrates with Zotero via API to store citations, metadata, and PDFs, and to tag and annotate papers within the Zotero library.
Turn your research stack into an AI-operable workspace. Use Zotero, Obsidian, and NotebookLM together, or start with any two. research-hub gives your AI assistant a real CLI, MCP server, REST API, and dashboard for repeatable literature workflows.

Traditional Chinese: README.zh-TW.md | Watch the full-res mp4
Part of the agentic AI learning roadmap, a 7-stage curated path for building agentic AI, multilingual (zh-TW · zh-Hans · English). This workspace is referenced in §13 (research workflow skills).
Real-use signal: in daily use by 1 PhD researcher (Lehigh CEE) tracking 7+ research clusters across Zotero + Obsidian + NotebookLM. Shipping since Apr 2026, docs updated for v0.95.0.
Real Screenshots
These are generated by a real research-hub vault, not mockups.
Obsidian paper note: Markdown note with title, authors, DOI, Zotero key, tags, cluster, status, and verification metadata.
Obsidian Bases dashboard: generated .base file with sortable paper
metadata and reading status.
Obsidian graph view: managed topic folders and labels can be colored with
research-hub vault graph-colors --refresh.
Generated crystals are also plain Markdown notes under
hub/<cluster>/crystals/*.md, so they can be linked, searched, and read
by MCP tools at low token cost.
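Because crystals are ordinary files, a script can enumerate them without going through research-hub at all. The sketch below assumes the hub/<cluster>/crystals/*.md layout sits directly under the vault root; `list_crystals` is a hypothetical helper for illustration, not part of the research-hub API.

```python
from pathlib import Path


def list_crystals(vault_root, cluster):
    """Return crystal note paths for one cluster, sorted by file name.

    Assumption: hub/<cluster>/crystals/ lives directly under the vault
    root. Adjust the base path if your vault is laid out differently.
    """
    crystal_dir = Path(vault_root) / "hub" / cluster / "crystals"
    if not crystal_dir.is_dir():
        return []  # unknown cluster or no crystals generated yet
    return sorted(crystal_dir.glob("*.md"))
```

Calling `list_crystals("./vault", "<cluster>")` returns an empty list for a cluster that has no crystals yet, so callers do not need to special-case a fresh vault.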
Why this exists
Most research tools are good at one part of the workflow:
Zotero stores citations, metadata, and PDFs.
Obsidian stores notes, links, and synthesis.
NotebookLM turns source bundles into AI-readable briefs.
The painful part is the handoff. research-hub connects those handoffs so an AI agent can search, ingest, tag, summarize, repair, brief, and inspect your workspace without turning your library into an opaque RAG box.
You do not need all three tools on day one.
Your current stack | What research-hub gives you first |
Zotero + Obsidian | Paper search, Zotero metadata, Markdown notes, tags, Obsidian Bases dashboards |
Obsidian + NotebookLM | Local PDF/DOCX/MD/TXT ingest, cluster dashboards, NotebookLM bundles and briefs |
Zotero + NotebookLM | Zotero-backed paper selection, namespaced tags, NotebookLM upload/generate/download |
Zotero + Obsidian + NotebookLM | Full loop: discover -> ingest -> organize -> brief -> answer -> maintain |
No accounts yet | Sample dashboard and local smoke tests before connecting anything |
What it does
research-hub is a local-first orchestration layer for research workflows:
CLI: research-hub auto, import-folder, ask, doctor, tidy, clusters, zotero, notebooklm, crystal, and more.
MCP server: lets Claude Desktop, Claude Code, Cursor, Continue.dev, Cline, Roo Code, OpenClaw, and other MCP hosts operate the same workflow.
REST API: exposes /api/v1/* for browser-only or HTTP-capable assistants.
Portable skill pack: SKILL.md workflow instructions can be installed directly for Claude Code, Codex, Cursor, and Gemini, or copied manually into hosts that support skill/rules directories.
Dashboard: gives humans a live view of clusters, papers, diagnostics, briefs, writing support, and management actions.
Vault format: writes normal Markdown, frontmatter, .base dashboards, cache files, and logs that you can inspect directly.
Authenticity gate (v0.95+): every discovered paper must resolve to a real identifier (DOI / arXiv / PMID) and pass integrity and relevance checks, or it is quarantined with a recorded reason and never written to the vault. No fabricated references; inspect rejects with research-hub quarantine list.
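The fail-closed behaviour of the authenticity gate can be pictured with a small sketch. This is not research-hub's implementation: `Candidate`, `GateResult`, and `authenticity_gate` are hypothetical names, the integrity and relevance checks are passed in as plain callables, and the resolve layer here only checks that an identifier is present rather than resolving it against a registry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    title: str
    doi: Optional[str] = None
    arxiv_id: Optional[str] = None
    pmid: Optional[str] = None


@dataclass
class GateResult:
    accepted: bool
    failing_layer: Optional[str] = None  # "resolve", "integrity", or "relevance"
    reason: Optional[str] = None


def authenticity_gate(paper, integrity_ok, relevance_ok):
    """Accept a paper only if every layer passes; otherwise record why not.

    Assumption: a real gate would also resolve the identifier against
    its registry. This sketch only checks that one is present.
    """
    if not (paper.doi or paper.arxiv_id or paper.pmid):
        return GateResult(False, "resolve", "no DOI, arXiv ID, or PMID")
    if not integrity_ok(paper):
        return GateResult(False, "integrity", "integrity check failed")
    if not relevance_ok(paper):
        return GateResult(False, "relevance", "relevance check failed")
    return GateResult(True)
```

The point of the design is that rejection is the default outcome: a paper reaches the vault only by passing all three layers, and every rejection carries the failing layer and a reason that can be reviewed later.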
The core loop:
topic or source folder
-> discover or import sources
-> verify authenticity (resolve + integrity + relevance) or quarantine
-> enrich metadata
-> write Zotero tags/notes when enabled
-> write Obsidian Markdown notes and cluster dashboards
-> bundle/upload/generate with NotebookLM when enabled
-> cache answers as crystals and structured memory
Is this for me? vs. alternatives
research-hub does not replace Zotero, Obsidian, or NotebookLM. It connects them so an AI agent can operate the workflow.
What you can do | Zotero alone | NotebookLM alone | Generic RAG | Obsidian-Zotero plugin | research-hub |
Search arXiv + Semantic Scholar in one command | No | No | DIY | No | Yes |
Ingest into Zotero and Obsidian and NotebookLM | No | No | DIY | Partial | Yes |
AI brief from your collection | No | Manual | DIY | No | Yes |
Cached canonical answers | No | No | Re-fetches | No | Yes |
Structured memory layer | No | No | Usually chunks | No | Yes |
Direct AI-agent control via MCP | No | No | DIY | No | Yes |
Live dashboard with action buttons | No | No | No | No | Yes |
Per-cluster Obsidian Bases dashboard | No | No | No | No | Yes |
No OpenAI/Anthropic API key required | n/a | Yes | Usually no | n/a | Yes |
Local-first vault you own | Partial | No | Depends | Yes | Yes |
The practical fit: research-hub is most useful if you already use at least two of Zotero, Obsidian, and NotebookLM and want your AI assistant to run the repetitive steps.
Start Here
Pick the path with the fewest moving parts. You can add Zotero, NotebookLM, MCP, or AI-host skills later.
Goal | Accounts needed | Commands |
Preview the dashboard only | None |
|
Try a demo vault | None |
|
Work from local PDFs/DOCX/Markdown | Obsidian optional |
|
Zotero + Obsidian, no browser automation | Zotero |
|
Full Zotero + Obsidian + NotebookLM loop | Zotero + Google |
|
Autonomous agent bootstrap | Existing vault or target folder |
|
After setup, run:
research-hub doctor
research-hub serve --dashboard
For the first real ingestion, keep NotebookLM out of the path until Zotero and Obsidian are healthy:
research-hub auto "agent-based modeling" --max-papers 3 --no-nlm
Then enable NotebookLM after the browser login works:
research-hub notebooklm login --auto-detect
research-hub notebooklm bundle --cluster <slug>
research-hub notebooklm upload --cluster <slug>
research-hub notebooklm generate --cluster <slug> --type brief
research-hub notebooklm download --cluster <slug>
research-hub setup also prints these next steps when it finishes.
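The four non-interactive NotebookLM commands above (bundle, upload, generate, download) can be scripted once the browser login has been done by hand. The following is a minimal sketch, assuming the CLI is on PATH and exits non-zero when a step fails; `notebooklm_commands` and `run_notebooklm_loop` are hypothetical helper names.

```python
import subprocess

# Steps in the order listed above; login is left out because it is interactive.
NOTEBOOKLM_STEPS = (
    ("bundle",),
    ("upload",),
    ("generate", "--type", "brief"),
    ("download",),
)


def notebooklm_commands(cluster):
    """Build the four command lines for one cluster without running them."""
    commands = []
    for step, *extra in NOTEBOOKLM_STEPS:
        commands.append(
            ["research-hub", "notebooklm", step, "--cluster", cluster, *extra]
        )
    return commands


def run_notebooklm_loop(cluster, runner=subprocess.run):
    """Run each step in order; check=True stops at the first failing step."""
    for command in notebooklm_commands(cluster):
        runner(command, check=True)
```

Keeping command construction separate from execution makes it easy to print the plan for review before anything touches the browser session.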
First-Run Checklist
Item | Needed when | How to handle it |
Python 3.10+ | Always | Use the same Python that runs |
Zotero API key + library ID | Zotero-backed paper ingestion | Set |
Obsidian vault | Markdown note workflow | Point |
NotebookLM browser login | NotebookLM upload/generate/download | Run |
LLM CLI for relevance judging |
| Install |
AI-host integration | Claude/Codex/Cursor/Gemini/OpenClaw/etc. | Use MCP/REST for tool-calling hosts; use |
Credential Reference
These variables are required only for Zotero-backed workflows. Local file import, sample dashboards, MCP server startup, and REST API inspection can run without them.
Name | Required | Purpose |
| yes | Zotero web API auth, required for paper ingestion |
| yes | Zotero library identifier |
| no | Uses an S2 API key and defaults to a conservative ~1 request/sec throttle |
| no | Optional S2 request-rate override; leave unset unless your key has a different quota |
| no | Web search backend (alternative to DDG) |
| no | Web search backend (alternative to DDG) |
Semantic Scholar searches are deliberately paced. Without
SEMANTIC_SCHOLAR_API_KEY, research-hub uses a slower anonymous delay
because public traffic shares capacity. With a key, the default is
approximately one request per second and 429 responses are retried with
Retry-After / exponential backoff. If Semantic Scholar grants your key
a different quota, set SEMANTIC_SCHOLAR_RPS instead of editing code.
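The retry policy described above, Retry-After first and exponential backoff otherwise, follows a common pattern that can be sketched in a few lines. The base delay, cap, and attempt count below are illustrative assumptions, not research-hub's actual settings, and `send` is a hypothetical callable standing in for the HTTP request.

```python
import time


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retrying after attempt number `attempt` (0-based).

    A numeric Retry-After value wins; otherwise the delay doubles with
    each attempt, up to `cap`.
    """
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # HTTP-date form of Retry-After: fall back to backoff
    return min(base * (2 ** attempt), cap)


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    return status, body
```

Passing `sleep` as a parameter keeps the function testable without real waiting; in normal use the default `time.sleep` applies.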
Operator Modes
research-hub supports both human-first and agent-first setup.
For a human researcher, research-hub setup runs the onboarding wizard,
installs host-specific skills when it can detect the host, optionally
launches NotebookLM login, and offers a small sample run.
For an autonomous agent or Cowork-style host:
pip install research-hub-pipeline
python -m research_hub describe > capabilities.json
python -m research_hub setup --autonomous --vault ./vault --persona agent
# emits BootstrapReport JSON; exit code 0 if ready, 1 otherwise
Then drive operations via CLI --json mode or the bundled MCP server
(research-hub-mcp). All report-shaped commands accept --json;
capability introspection lives in research-hub describe.
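An agent host can wrap the autonomous setup call and branch on its exit code. The sketch below assumes the BootstrapReport JSON is the only content written to stdout; the report's fields are not documented here, so the code treats it as an opaque dictionary. `interpret_bootstrap` and `bootstrap` are hypothetical helper names.

```python
import json
import subprocess
import sys

SETUP_COMMAND = [
    sys.executable, "-m", "research_hub", "setup",
    "--autonomous", "--vault", "./vault", "--persona", "agent",
]


def interpret_bootstrap(returncode, stdout):
    """Return (ready, report) for a finished setup run.

    Readiness comes from the exit code (0 means ready). The report is
    whatever JSON the command printed, or {} if nothing parseable was
    printed (assumption: stdout carries only the report).
    """
    try:
        report = json.loads(stdout)
    except json.JSONDecodeError:
        report = {}
    return returncode == 0, report


def bootstrap(command=SETUP_COMMAND):
    """Run setup and interpret the result. Requires research-hub to be installed."""
    done = subprocess.run(command, capture_output=True, text=True)
    return interpret_bootstrap(done.returncode, done.stdout)
```

Relying on the exit code for the ready/not-ready decision, and on the JSON only for detail, keeps the caller working even if the report schema changes between releases.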
NotebookLM boundary. NotebookLM upload still requires one-time human-driven browser-based Google OAuth. Headless agents can prepare bundles and read downloaded briefs, but they cannot complete Google's first sign-in or phone challenge by themselves.
Relevance judge boundary. auto_research_topic and research-hub
auto run a fail-closed relevance check by default. With no supported
LLM CLI and no --no-fit-check, auto stops before search and prints
the fix instead of silently producing an empty vault.
Persona | Best for | Install extra |
Researcher | STEM papers, DOI/arXiv, Zotero-first workflows |
|
Humanities | books, quotes, URL-only sources, Zotero + Obsidian |
|
Analyst | industry research, local PDFs/reports, no Zotero required |
|
Internal KM | lab/company knowledge bases, mixed file types |
|
Field presets for discover new, search, and related planning flows
are cs, bio, med, physics, math, social, econ, chem,
astro, edu, and general. There is no hydrology preset; use
general intentionally.
Connect your AI host
research-hub has two AI-facing integration layers:
Layer | Best for | Current status |
MCP / REST | Claude Desktop, Claude Code, Cursor, Continue.dev, Cline, Roo Code, VS Code Copilot, OpenClaw, and other tool-calling hosts | Host-agnostic; configure the MCP server or call the REST API |
Installed | Claude Code, Codex, Cursor, Gemini | Built-in installer targets via |
Manual | Hermes, OpenClaw, other agents with skill/rules directories | Copy or reference the bundled skill directories manually; not release-verified as installer targets |
For Claude Desktop, Cursor, Continue.dev, Cline, VS Code Copilot, OpenClaw, or another MCP host, configure the MCP server:
{
  "mcpServers": {
    "research-hub": {
      "command": "research-hub",
      "args": ["serve"]
    }
  }
}
Restart the host. Then ask naturally:
Find me 5 papers on agent-based modeling and put them in a notebook.
The AI can call auto_research_topic(topic="agent-based modeling", max_papers=5) to ingest papers, generate a NotebookLM brief, and update the vault.
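If the host already has other MCP servers configured, the research-hub entry shown in the JSON above should be merged into the existing file rather than pasted over it. The sketch below shows one way to do that; the config file location differs per host and is left to the caller, and `add_mcp_server` is a hypothetical helper, not a research-hub command.

```python
import json
from pathlib import Path

# Same entry as the JSON snippet above.
RESEARCH_HUB_ENTRY = {"command": "research-hub", "args": ["serve"]}


def add_mcp_server(config_path, name="research-hub", entry=RESEARCH_HUB_ENTRY):
    """Merge one server entry into an MCP host config, keeping other servers.

    Creates the file if it does not exist. The path is host-specific;
    check your host's documentation for where its MCP config lives.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("mcpServers", {})[name] = dict(entry)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Restart the host after writing the file, as with a manual edit.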
Install host-specific skill files for the platforms with known default skill directories:
research-hub install --platform claude-code
research-hub install --platform cursor
research-hub install --platform codex
research-hub install --platform gemini
OpenClaw, Hermes, and other agents can still use research-hub through MCP/REST. If the host supports SKILL.md-style directories or rules files, copy the bundled directories from skills/ or inline the relevant SKILL.md into the host's instructions. research-hub install --platform does not currently verify those hosts.
Browser-only or HTTP-capable AIs can use the REST API after starting the local server with research-hub serve --dashboard:
curl -X POST http://127.0.0.1:8765/api/v1/plan \
-H "Content-Type: application/json" \
-d "{\"intent\":\"research harness engineering\"}"
Full reference: MCP tools, AI integrations, AI host support matrix, and live smoke checklist.
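The curl call to /api/v1/plan can also be issued from Python using only the standard library. This is a sketch under two assumptions: the local server is running on the default address shown above, and the endpoint answers with a JSON body (the response format is not documented in this README). `build_plan_request` and `plan` are hypothetical helper names.

```python
import json
from urllib import request

BASE_URL = "http://127.0.0.1:8765"


def build_plan_request(intent, base_url=BASE_URL):
    """Build (but do not send) the POST /api/v1/plan request."""
    payload = json.dumps({"intent": intent}).encode("utf-8")
    return request.Request(
        f"{base_url}/api/v1/plan",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def plan(intent, timeout=30):
    """Send the request. Requires `research-hub serve --dashboard` to be running.

    Assumption: the response body is JSON.
    """
    with request.urlopen(build_plan_request(intent), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Using `json.dumps` for the payload avoids the shell-escaping needed in the curl version.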
Dashboard tour
research-hub serve --dashboard opens http://127.0.0.1:8765/.
Overview: treemap over clusters, storage map, and health summary.

Library: per-cluster drill-down with papers, sub-topics, and per-paper actions.

Diagnostics: grouped drift alerts and readiness checks.

Manage: CLI actions as buttons, inline result drawer, confirmation modal, and per-paper row actions.

Briefings and Writing tabs are also available. See the dashboard walkthrough and persona variants.
Inside Zotero
Every ingested paper gets a namespaced tag set so you can filter your library by research-hub context:
Tag | Meaning |
| Ingested through this pipeline |
| Which research cluster the paper belongs to |
| arXiv category like |
|
|
| Search backend that discovered it: |
Every paper can also get a child note with Summary / Key Findings / Methodology / Relevance, derived from the Obsidian frontmatter. Papers that were in Zotero before research-hub existed can be backfilled with:
research-hub zotero backfill --tags --notes --apply
Feature matrix
Capability | Command or MCP tool | Notes |
One-shot setup |
| init + install + optional NotebookLM login + guided sample run |
Lazy research pipeline |
| Search, ingest, bundle, upload, generate, download |
Authenticity quarantine review |
| Inspect and optionally restore papers the authenticity gate rejected (with the failing layer + reason) |
Plan before running |
| Suggests field, cluster slug, and max papers |
Zotero hygiene |
| Fills missing tags and notes on legacy items |
Cluster cascade delete |
| Preview impact on Obsidian, Zotero, dedup, memory, and crystals |
No-NotebookLM smoke test |
| Validates search and vault ingest without browser automation |
Local file ingest |
| PDF, DOCX, MD, TXT, URL |
Ad-hoc cluster Q&A |
| Top-level CLI takes cluster first, then question |
NotebookLM operations |
| Browser automation with persistent Chrome |
Pre-computed crystals |
| Canonical answers cached as Markdown |
Structured memory |
| Entities, claims, methods |
Live dashboard |
| HTTP dashboard with action buttons |
Sample preview |
| Temporary bundled vault, no accounts |
Lazy maintenance |
| Doctor, dedup, bases refresh, cleanup preview |
Garbage collection |
| Bundles, debug logs, stale artifacts |
Cluster repair |
| Rebinds orphaned notes |
Obsidian Bases |
| Generated |
Web search |
| Tavily, Brave, Google CSE, DDG fallback |
Troubleshooting
Symptom | Cause | Fix |
| Chrome is missing or patchright cannot find it | Install Chrome, then run |
| New-device or bot challenge | Complete the visible browser sign-in and phone challenge |
| Topic too narrow OR papers were quarantined by the authenticity gate (unresolved DOI, failed integrity, or relevance-unjudged) | Re-run with |
| Fail-closed relevance check and no supported LLM CLI found | Install a judge CLI, or re-run with |
NotebookLM upload or generate fails | NotebookLM UI changed or login expired | Run |
| Google's | Re-run |
| No supported LLM CLI is on PATH | Install one, configure a custom adapter, or use |
Claude Desktop cannot see the MCP server | MCP config is in the wrong file or host was not restarted | Check the host config path and restart Claude Desktop |
| Persona expects Zotero | Re-run |
| Cluster has papers, notes, or Zotero items | Re-run with |
| Cluster is non-empty and you ran | Add |
Zotero items miss | Items were created before v0.61 or pipeline failed mid-run |
|
For broader checks, run:
research-hub doctor --autofix
Known limitations
These are platform or design boundaries, not bugs, so please do not file them as issues. They are documented here so you know what to expect and which workaround to reach for.
Limitation | What's actually happening | What to do |
IEEE Xplore PDFs / URLs are blocked by anti-bot | IEEE returns an "Unable to Load Page" HTML stub to direct fetches. | Configure |
NotebookLM session expires ~every 3.5h | Google's short-lived | Re-run |
The no-LLM BM25 gate is designed to catch blatant cross-field contamination (e.g. pure hydrology with zero AI in an LLM cluster). It cannot tell "AI-agents-in-general" from "AI-agents-in-water-resources": both score similarly on a lexical-only metric, so the gate is recall-biased and keeps both. | For topic-specific subset filtering, use the default LLM-judge path (drop | |
Cluster-overview LLM auto-fill writes English headings even when the scaffold is Chinese |
| Cosmetic; content is correct. If you prefer Chinese headings on the filled overview, hand-curate the section names after the first auto-fill (the markers ensure subsequent runs preserve your edits). |
(CLI is opt-out) | Programmatic callers (tests, library users) get | If you call |
Slow / blocked publisher URLs sometimes poison the NotebookLM bundle | Some publishers (Wiley paywalls, Frontiers oddly-routed PDFs, IEEE) return either a thin stub or an HTML error page that the bundle ladder admits because the URL pre-check passed. Downstream NotebookLM grounds on the stub instead of the paper. | Run |
Docs + Status + Dev
Docs: First 10 minutes, lazy mode, dashboard walkthrough, MCP tools, AI host support matrix, live smoke checklist, personas, NotebookLM setup, EZproxy PDF access, import folder, CLI reference, CHANGELOG.
Status:
Current docs target: v0.95.0; see CHANGELOG for package history, docs/stable-api.md for the supported API surface, and docs/file-formats.md for parseable state-file schemas.
MCP tools: inspect the live list with python -m research_hub describe --filter mcp_tools.
REST endpoints: 12 at /api/v1/*.
Bundled skills: inspect the live list with python -m research_hub describe --filter skills.
Developer setup:
git clone https://github.com/WenyuChiou/research-hub.git
cd research-hub
pip install -e ".[dev,playwright]"
python -m pytest -q
Contributing: CONTRIBUTING.md. Package on PyPI: research-hub-pipeline. CLI entry point: research-hub.
License
MIT. See LICENSE.