YGao2005

Scholar Feed MCP Server

Search Papers

search_papers
Read-only

Search over 600,000 CS/AI/ML papers using semantic or keyword queries. Filter by impact, recency, novelty, code availability, and more.

Instructions

Search Scholar Feed's 600k+ CS/AI/ML paper corpus. Defaults to semantic (embedding) search, which finds conceptually related papers even when the user's wording doesn't match the paper's title or abstract. Pass mode='keyword' for exact-string full-text search.

CAVEAT: semantic search often misses old, highly cited CANONICAL papers (e.g. foundational anchors like H2O for KV eviction, GRIT for unified embedding+generation) because the ranker prefers recent, stylistically matched papers. If you're hunting the canonical anchor for an area, parse the top-5 result abstracts for baseline mentions ('we compare against X, Y, Z'), then look up the most-mentioned name directly.

Returns papers with LLM-generated summaries, novelty scores, and structured extraction data. The default response is a lean 14-field shape (arxiv_id, title, authors, year, categories, has_code, github_url, citation_count, venue_name, llm_summary, llm_significance, llm_novelty_score, impact_pct, impact_tier); pass verbose=true or fields=... for the full shape with method/task/dataset extraction.

RANKING BY IMPACT: three different notions, don't confuse them.
(1) PROVEN impact = citations. For 'the important/seminal papers on topic X', pass sort='impactful' (most-cited among the relevant) or sort='balanced' (relevant AND well-cited). This is the right tool for established/foundational work.
(2) FORECAST impact = impact_pct (0-100), an ML per-category percentile of PREDICTED citations, computed only for the last ~90 days; impact_tier is its A+/A/B/C/D grade. For 'what's rising/new in X', pass sort='trending' or filter with impact_min=N. NOTE: impact_pct is NULL on everything older than ~90 days, so impact_min DROPS all established/canonical papers. It is NOT a way to find the influential papers in a niche; use sort='impactful' for that.
(3) ADOPTION impact = GitHub traction. Pass sort='community' to rank by real-world engineering adoption (stars + star-velocity): the papers practitioners are actually running/building on, independent of citations or recency. Filter on it with min_stars=N (minimum GitHub stars) and has_code=true (only papers with a code release); has_code/min_stars surface RUNNABLE/ADOPTED work, the engineering counterpart to citations. github_url_exists=true is the stricter form of has_code (requires a linked repo).
All three impact notions are distinct from llm_novelty_score (new-idea-ness, an orthogonal filter).

FILTERS: supports filtering by category, novelty, recency, method, task, dataset, and contribution type, plus min_citations (minimum PROVEN citations; keeps established papers, unlike the ~90-day impact_min) and an explicit date window via published_after / published_before ('YYYY-MM-DD', as opposed to the rolling lookback of days).

v3 ABSORPTIONS: pass sort='trending' to rank by rising/forecast impact (impact_pct); pass anchor_paper_id to replicate find_similar (q is ignored in anchor mode, results carry similarity_score); pass scope_to_citations_of to restrict search to a paper's citation graph (replaces find_citations_about).
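The canonical-anchor workaround described in the caveat (scan the top-5 results for baseline mentions, then look up the most-mentioned name) could be sketched roughly as follows. This is a heuristic illustration only: the function name, the cue phrases, and the two-word cutoff are assumptions, not part of the Scholar Feed API, and the input is assumed to be a list of abstract or summary strings already pulled from a search response.

```python
import re
from collections import Counter

# Phrases that often introduce baselines. This cue list is an assumption
# and is far from exhaustive.
_BASELINE_CUES = re.compile(
    r"(?:compared? (?:against|with|to)|baselines? (?:such as|like|including)"
    r"|outperforms?)\s+([^.;]+)",
    re.IGNORECASE,
)


def most_mentioned_baseline(texts, top_n=5):
    """Return the name mentioned as a baseline in the most texts, or None."""
    counts = Counter()
    for text in texts[:top_n]:
        names = set()
        for match in _BASELINE_CUES.finditer(text):
            # Split "X, Y and Z" into individual candidate names.
            for part in re.split(r",|\band\b", match.group(1)):
                name = part.strip()
                # Crude noise filter: keep short, name-like fragments only.
                # Names containing '.' (e.g. version numbers) get truncated
                # by the capture group above.
                if name and len(name.split()) <= 2:
                    names.add(name)
        counts.update(names)  # each name counts once per paper
    return counts.most_common(1)[0][0] if counts else None
```

The returned name can then be passed back as q with mode='keyword' (or combined with sort='impactful') to look the anchor paper up directly.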

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| q | No | Search query keywords. Optional when anchor_paper_id is set (anchor mode ignores q and returns papers similar to the anchor). | |
| sort | No | Result ranking: a relevance↔impact dial plus time-based and adoption orders. 'relevance' (default) = best topical match. 'balanced' = relevant AND well-cited. 'impactful' = the most-cited (proven-influential) papers among those relevant to the query; use this for 'the important/seminal papers on topic X'. 'trending' = rising/FORECAST impact (impact_pct, last ~90 days); use for 'what's hot/new in X', NOT for established work. 'recent' = newest first. 'community' = GitHub adoption (stars + star-velocity); surfaces the papers practitioners are actually running/building on, regardless of citations or recency. Proven impact ('impactful'/'balanced') ranks by real citations; 'trending' is a model prediction; 'community' is real-world engineering traction. Pair with get_foundational_lineage for a topic's canonical roots. | |
| anchor_paper_id | No | Return papers similar to this arXiv paper ID (replaces the removed find_similar tool). When set, q is ignored and results carry similarity_score. Example: '2407.15831'. | |
| scope_to_citations_of | No | Restrict search to this paper's citation graph, ranked by relevance to q (replaces the removed find_citations_about tool). Pass the arXiv ID of the paper whose citations you want to search within. | |
| category | No | Filter by arXiv category, e.g. 'cs.AI', 'cs.LG'. | |
| novelty_min | No | Minimum novelty score (0-1). Use 0.5+ for novel papers. | |
| impact_min | No | Minimum impact_pct (0-100), e.g. 80 = top 20% FORECAST impact. This is a RISING-WORK filter: impact_pct is only computed for the last ~90 days, so impact_min restricts results to recent papers predicted to land well AND DROPS everything older. Use it for 'what's rising in X'. Do NOT use it to find the influential/seminal papers in a topic, because it excludes the established work; use sort='impactful' instead. | |
| days | No | Limit to papers published within the last N days. | |
| has_code | No | Filter to papers with a linked code release (has_code=true). Surfaces runnable/reproducible work; pair with min_stars/sort='community' to find the papers practitioners actually adopt. | |
| min_citations | No | Minimum real citation count. Unlike impact_min (a ~90-day FORECAST percentile), this filters on PROVEN citations and keeps established/canonical papers. | |
| min_stars | No | Minimum GitHub stars on the paper's linked repo. A proxy for engineering adoption; surfaces work that practitioners are actually running/building on. Pair with sort='community' to rank by it. | |
| github_url_exists | No | Filter on whether the paper has a linked GitHub URL (true = only papers with a repo). Stricter than has_code (which counts any code link). | |
| published_after | No | Only papers published on or after this date, 'YYYY-MM-DD'. Use with published_before to bound an arbitrary date window (days only gives a rolling N-day lookback). | |
| published_before | No | Only papers published on or before this date, 'YYYY-MM-DD'. Pair with published_after for an explicit window. | |
| method_category | No | Filter by method category, e.g. 'reinforcement learning', 'transformer'. | |
| method_name | No | Filter to papers introducing/using a specific named method, e.g. 'LoRA', 'YOLO', 'DPO'. Case-insensitive substring match on the extracted method_name field. | |
| task | No | Filter by task, e.g. 'image classification', 'question answering' (partial match). | |
| dataset | No | Filter to papers that evaluate on a specific dataset, e.g. 'MMLU', 'ImageNet'. | |
| contribution_type | No | Filter by the paper's contribution type. | |
| task_category | No | Filter by broad research area. | |
| mode | No | Search mode. 'semantic' (default) uses embedding similarity and finds conceptually related papers even without exact keyword matches. 'keyword' uses Postgres full-text search, which is faster but only matches exact terms. | |
| cursor | No | Cursor from the previous response's next_cursor, for keyset pagination. | |
| page | No | Page number. | |
| limit | No | Results per page (max 50). | |
| fields | No | Comma-separated list of fields to return (e.g. 'arxiv_id,title,llm_summary,llm_novelty_score'). If omitted, returns the lean 14-field default unless verbose=true. | |
| verbose | No | If true, returns the full 28-field paper shape (method/task/dataset extraction, application_domain, baselines, etc.). Default false returns the lean 14-field set. Ignored when `fields` is provided. | |
| exclude_ids | No | arXiv IDs to exclude from results (for deduplication across chained calls). | |
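Several of the parameter interactions above can be checked on the client side before a call is made. The sketch below is illustrative: the helper name and the intent labels ('seminal', 'rising', 'adopted') are hypothetical, while the parameter names, sort values, the limit cap of 50, and the 'YYYY-MM-DD' date format come from the schema. Treating impact_min combined with a proven-impact sort as an error is a design choice of this sketch, not server behavior.

```python
import re
from datetime import date

# Hypothetical intent labels mapped to the documented sort values.
SORT_FOR_INTENT = {
    "seminal": "impactful",   # proven impact (citations)
    "rising": "trending",     # forecast impact (impact_pct, last ~90 days)
    "adopted": "community",   # GitHub adoption (stars + star-velocity)
}


def build_search_args(q, intent=None, **filters):
    """Assemble a search_papers argument dict and reject known pitfalls."""
    args = {"q": q, **filters}
    if intent is not None:
        args["sort"] = SORT_FOR_INTENT[intent]

    # impact_min drops everything older than ~90 days, which defeats a
    # search for established work.
    if args.get("sort") in ("impactful", "balanced") and "impact_min" in args:
        raise ValueError("impact_min excludes established papers; use min_citations")

    limit = args.get("limit")
    if limit is not None and limit > 50:
        raise ValueError("limit must be at most 50")

    for key in ("published_after", "published_before"):
        if key in args:
            value = args[key]
            if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
                raise ValueError(f"{key} must be 'YYYY-MM-DD'")
            date.fromisoformat(value)  # raises ValueError on impossible dates
    return args
```

For example, build_search_args("KV cache eviction", intent="seminal", min_citations=100) yields an argument dict with sort='impactful', while adding impact_min=80 to the same call raises a ValueError.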

Output Schema

| Name | Required | Description | Default |
|---|---|---|---|
| papers | No | Matched / returned papers. | |
| total | No | Total results available for the query. | |
| page | No | | |
| limit | No | | |
| mode | No | Search mode actually applied. | |
| direction | No | Citation direction (get_citations: citing or cited_by). | |
| topic | No | | |
| note | No | | |
| not_found | No | Requested IDs that had no match. | |
| next_cursor | No | Keyset cursor for the next page, or null when exhausted. | |
| hits | No | New watch matches (check_watches). | |
| results | No | | |
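Keyset pagination with cursor and next_cursor could be driven by a loop like the one below. It is a sketch under stated assumptions: call_tool is a hypothetical stand-in for whatever MCP client function invokes a tool and returns the parsed response as a dict, and only the papers and next_cursor fields from the output schema are used.

```python
def iter_papers(call_tool, args, max_pages=5):
    """Yield papers page by page until next_cursor is null or max_pages is hit.

    call_tool(tool_name, arguments) is a hypothetical client callable that
    returns the tool response as a dict.
    """
    cursor = None
    for _ in range(max_pages):
        page_args = dict(args)  # never mutate the caller's arguments
        if cursor is not None:
            page_args["cursor"] = cursor
        response = call_tool("search_papers", page_args)
        yield from response.get("papers") or []
        cursor = response.get("next_cursor")
        if cursor is None:  # null cursor means the result set is exhausted
            break
```

The max_pages cap keeps a broad query from walking the whole corpus; the arXiv IDs collected this way can be passed as exclude_ids in later chained calls.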
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses behavioral traits beyond the annotations: default semantic search may miss canonical papers, impact_pct is only computed for the last ~90 days, find_similar and find_citations_about have been absorbed into this tool, and ranking has several intricacies. The annotations only provide readOnlyHint=true and destructiveHint=false, so the description adds substantial context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is long but well-structured with bolding and bullet points, front-loading the main purpose. While verbose, every sentence earns its place given the tool's complexity (27 parameters). Could be slightly more compressed, but structure aids readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers all major aspects: search modes, ranking, filtering, absorbed tools, caveats, and cross-references between parameters. With an output schema present and no required parameters, the description is fully complete for an AI agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, which sets a baseline of 3, but the description adds significant meaning beyond the schema: it explains relationships between parameters (e.g., has_code + min_stars for adoption), caveats for impact_min, and how the sort options relate to the impact notions. This extra context fully justifies a 5.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Search Scholar Feed's 600k+ CS/AI/ML paper corpus.' It specifies verb+resource and distinguishes from siblings by mentioning absorbed tools (find_similar, find_citations_about) and referencing get_foundational_lineage as an alternative for canonical roots.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides extensive guidance: when to use semantic vs keyword search, when to use different sort options (impactful vs trending vs community), caveats about canonical papers, and pairing with other tools like get_foundational_lineage. It also warns about impact_min dropping older papers and clarifies distinct impact notions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/YGao2005/scholar-feed-mcp'
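The same request can be issued from Python with the standard library. This is a minimal sketch: the helper names are illustrative, and the assumption that the endpoint returns a JSON body is not confirmed by the text above.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_server_request(owner, name):
    """Build the GET request for one server entry in the MCP directory."""
    url = f"{API_BASE}/{quote(owner, safe='')}/{quote(name, safe='')}"
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )


def fetch_server_info(owner, name, timeout=10):
    """Send the request and decode the body, assuming it is JSON."""
    request = build_server_request(owner, name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_server_info("YGao2005", "scholar-feed-mcp") performs the same lookup as the curl command.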

If you have feedback or need assistance with the MCP directory API, please join our Discord server.