
Scholar Agent

Python 3.10+ · License: MIT · MCP Ready


General-purpose LLMs are often inaccurate or out of date in specialized domains. Scholar Agent combines online research with local knowledge accumulation in a knowledge flywheel: each answer is researched, saved locally, and reused, so your AI becomes more accurate in your domain over time. It also builds a human-readable knowledge base you can study directly. It integrates with Claude Code and VS Code Copilot via MCP.

What It Does

Your question
    │
    ▼
Online research (LLM web search + academic APIs)
    │
    ▼
Structured synthesis (with citations, confidence, uncertainty)
    │
    ▼
Local accumulation (Markdown knowledge cards + BM25 index)
    │
    ▼
Next question: AI checks local first ── hit? ──► use directly, fast & accurate
    │ miss
    ▼
Research again → accumulate → reindex ──► knowledge base keeps growing

Each round compounds. Knowledge cards have full lifecycle management: draft → reviewed → trusted → stale → deprecated.
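The lifecycle above can be read as a small state machine. The sketch below is illustrative only: the status names come from this README, but the `CardStatus` enum, the `ALLOWED` transition table, and the `advance` helper are hypothetical and not taken from the project source.

```python
from enum import Enum


class CardStatus(Enum):
    DRAFT = "draft"
    REVIEWED = "reviewed"
    TRUSTED = "trusted"
    STALE = "stale"
    DEPRECATED = "deprecated"


# Assumed transition table: forward along the chain, plus stale -> reviewed
# for cards that are re-verified. The project may allow different moves.
ALLOWED = {
    CardStatus.DRAFT: {CardStatus.REVIEWED, CardStatus.DEPRECATED},
    CardStatus.REVIEWED: {CardStatus.TRUSTED, CardStatus.STALE, CardStatus.DEPRECATED},
    CardStatus.TRUSTED: {CardStatus.STALE, CardStatus.DEPRECATED},
    CardStatus.STALE: {CardStatus.REVIEWED, CardStatus.DEPRECATED},
    CardStatus.DEPRECATED: set(),
}


def advance(current: CardStatus, target: CardStatus) -> CardStatus:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move {current.value} -> {target.value}")
    return target
```

Keeping the allowed moves in one table makes it easy to audit which promotions and demotions a governance tool would accept.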

Academic Research Pipeline

Scholar Agent includes a comprehensive academic paper research pipeline:

  • Paper Search — Search papers from arXiv, DBLP, and Semantic Scholar. Filter by top conferences (CVPR, ICCV, ECCV, ICLR, AAAI, NeurIPS, ICML, ACL, EMNLP, MICCAI)

  • Smart Scoring — Four-dimensional scoring engine (relevance, recency, popularity, quality) ranks papers by your research interests

  • Deep Analysis Notes — Auto-generate 20+ section Obsidian-style markdown notes with <!-- LLM: --> placeholders for AI-assisted completion

  • Figure Extraction — Extract images from arXiv source archives and PDFs (via PyMuPDF)

  • Daily Recommendations — Automated daily paper search, scoring, deduplication, and recommendation note generation

  • Paper → Knowledge Card — Convert paper analyses into knowledge cards that feed back into the knowledge flywheel

  • Keyword Auto-Linking — Scan notes for technical terms and create [[wiki-links]] automatically
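As a rough idea of what keyword auto-linking involves, here is a minimal sketch that wraps the first unlinked occurrence of each known term in [[...]] and leaves existing links untouched. The function name `link_keywords` and the term list are assumptions for illustration; the project's own linker may work differently.

```python
import re

# Existing [[wiki-links]]; the capture group keeps them in re.split output.
LINK = re.compile(r"(\[\[.*?\]\])")


def link_keywords(text: str, terms: list[str]) -> str:
    """Wrap the first unlinked occurrence of each term in [[...]]."""
    # Longer terms first, so "vision transformer" wins over "transformer".
    for term in sorted(terms, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        parts = LINK.split(text)  # even indices: plain text, odd: links
        for i in range(0, len(parts), 2):
            new, count = pattern.subn(
                lambda m: f"[[{m.group(0)}]]", parts[i], count=1
            )
            if count:
                parts[i] = new
                break
        text = "".join(parts)
    return text
```

Splitting on existing links first avoids nesting a new link inside one that is already there.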

Quick Start

Embed into an existing project

cd my-project && git clone https://github.com/zfy465914233/scholar-agent.git
bash scholar-agent/setup.sh
# Restart Claude Code to activate

This will create the directory structure, copy config templates, install skills, and build the knowledge index.

Use as a standalone project

# Clone and install
git clone https://github.com/zfy465914233/scholar-agent.git
cd scholar-agent
pip install -r requirements.txt

# Build the knowledge index
python scripts/local_index.py --output indexes/local/index.json

MCP configs ship with the repository:

  • Claude Code: .mcp.json is ready. cd into the project and start Claude Code.

  • VS Code Copilot: .vscode/mcp.json is ready. Open the project, enable agent mode.
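The knowledge index is described as a BM25 index. For readers unfamiliar with BM25, the sketch below shows the standard Okapi BM25 scoring formula on whitespace-tokenized text. It is a generic illustration, not the code in scripts/local_index.py; the function name `bm25_scores` and the defaults `k1=1.5`, `b=0.75` are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF variant that is never negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Rare terms contribute more through the IDF factor, and long documents are penalized slightly so they do not win purely on length.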

MCP Tools

Core Tools (always available)

| Tool | Description |
| --- | --- |
| query_knowledge | Search local knowledge base |
| save_research | Save structured research results as a knowledge card |
| list_knowledge | Browse all knowledge cards |
| capture_answer | Quick-capture a Q&A pair as a draft card |
| ingest_source | Ingest a URL or raw text into the knowledge base |
| build_graph | Generate an interactive knowledge graph (vis.js) |

Academic Tools (set SCHOLAR_ACADEMIC=1 to enable)

| Tool | Description |
| --- | --- |
| search_papers | Search arXiv + Semantic Scholar with 4-dim scoring |
| search_conf_papers | Search conference papers via DBLP + S2 enrichment |
| analyze_paper | Generate deep-analysis markdown notes (20+ sections) |
| extract_paper_images | Extract figures from arXiv source / PDF |
| paper_to_card | Convert paper analysis into a knowledge card |
| daily_recommend | Daily paper recommendation workflow |
| link_paper_keywords | Auto-link keywords as [[wikilinks]] in notes |

For best analysis quality, follow this order:

  1. Download the paper: download_paper("2510.24701", title="Paper Title", domain="LLM")

  2. Extract images: extract_paper_images("2510.24701") (auto-detects local PDF)

  3. Deep analysis: analyze_paper(paper_json) (auto-detects local PDF, extracts full text)

Tip: Downloading the PDF before analysis enables full-text extraction, producing high-quality notes with specific data, formulas, and experimental results. Without a local PDF, analysis relies on the abstract only.

Configuration

.scholar.json

The .scholar.json file configures knowledge paths and academic research settings. See .scholar.example.json for a full example with comments.

Key sections:

  • knowledge_dir — Path to knowledge cards directory

  • index_path — Path to BM25 search index

  • academic.research_interests — Your research domains, keywords, and arXiv categories

  • academic.scoring — Paper scoring weights and dimensions
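To make the idea of scoring weights concrete, here is a minimal sketch of a four-dimensional weighted score. The dimension names come from this README; the `PaperSignals` class, the `WEIGHTS` values, and the normalization are hypothetical, so check .scholar.example.json for the real keys and defaults.

```python
from dataclasses import dataclass


@dataclass
class PaperSignals:
    relevance: float   # 0..1, match against research interests
    recency: float     # 0..1, newer is higher
    popularity: float  # 0..1, e.g. normalized citation count
    quality: float     # 0..1, e.g. venue tier


# Hypothetical weights, for illustration only.
WEIGHTS = {"relevance": 0.4, "recency": 0.2, "popularity": 0.3, "quality": 0.1}


def score(paper: PaperSignals, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of the four signals, normalized by total weight."""
    total = sum(weights.values())
    return sum(getattr(paper, name) * w for name, w in weights.items()) / total
```

Dividing by the total weight keeps the result in the 0..1 range even if the configured weights do not sum to 1.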

Environment Variables

Copy .env.example to .env and configure:

| Variable | Required | Description |
| --- | --- | --- |
| SCHOLAR_ACADEMIC | No | Set to 1 to enable academic tools |
| S2_API_KEY | No | Semantic Scholar API key (free to obtain) |
| LLM_API_KEY | No | LLM API key for advanced synthesis pipeline |

Project Structure

scholar-agent/
├── mcp_server.py              # MCP server (13 tools)
├── setup_mcp.py               # Embed into existing projects
├── pyproject.toml             # Package configuration
├── .scholar.json              # Project & academic configuration
├── schemas/                   # Answer + evidence JSON schemas
├── scripts/
│   ├── academic/              # Academic research modules
│   │   ├── arxiv_search.py    # arXiv + Semantic Scholar search
│   │   ├── conf_search.py     # Conference paper search (DBLP)
│   │   ├── paper_analyzer.py  # Deep-analysis note generation
│   │   ├── scoring.py         # 4-dim paper scoring engine
│   │   ├── image_extractor.py # Figure extraction from PDFs
│   │   ├── note_linker.py     # Wiki-link discovery + keyword linking
│   │   └── daily_workflow.py  # Daily recommendation pipeline
│   ├── scholar_config.py      # Configuration reader
│   ├── local_index.py         # BM25 index builder
│   ├── local_retrieve.py      # Knowledge retrieval
│   ├── close_knowledge_loop.py # Knowledge card builder
│   └── ...                    # Research, synthesis, governance, graph
├── knowledge/                 # Knowledge cards (gitignored, user-generated)
├── indexes/                   # Generated indexes (gitignored)
└── tests/                     # 247 tests

More Features

  • Multi-perspective research — Parallel research from 5 perspectives (academic, technical, applied, contrarian, historical)

  • Obsidian compatible — Standard Markdown + YAML frontmatter + [[wiki-links]]

  • Knowledge governance CLI — Validate frontmatter, detect orphaned cards, find duplicates, manage lifecycle

  • Provider fault tolerance — Each search source fails independently; falls back to local retrieval when offline
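Provider fault tolerance, as described above, amounts to isolating each source's failure and falling back to local retrieval when nothing comes back. A minimal sketch of that pattern follows; `search_with_fallback` and the provider callables are hypothetical stand-ins, not the project's API.

```python
from typing import Callable

Provider = Callable[[str], list[str]]


def search_with_fallback(
    query: str,
    providers: dict[str, Provider],
    local: Provider,
) -> tuple[list[str], list[str]]:
    """Query every provider independently; use local results if all fail."""
    results: list[str] = []
    failed: list[str] = []
    for name, fetch in providers.items():
        try:
            results.extend(fetch(query))
        except Exception:
            # One broken source must not abort the others.
            failed.append(name)
    if not results:
        # Offline or all sources empty: fall back to the local index.
        results = local(query)
    return results, failed
```

Returning the list of failed providers alongside the results lets the caller report degraded coverage instead of silently hiding it.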

Testing

python -m pytest tests/ -v

247 tests, ~13s. No external services needed.

License

MIT — see LICENSE.
