Search Papers
search_papers
Search over 600,000 CS/AI/ML papers using semantic or keyword queries. Filter by impact, recency, novelty, code availability, and more.
Instructions
Search Scholar Feed's 600k+ CS/AI/ML paper corpus. Defaults to semantic (embedding) search, which finds conceptually related papers even when the user's wording doesn't match the paper's title or abstract. Pass mode='keyword' for exact-string full-text search.

CAVEAT: semantic search often misses old, high-citation CANONICAL papers (e.g. foundational anchors like H2O for KV eviction, GRIT for unified embedding+generation) because the ranker prefers recent, stylistically matched papers. If you're hunting the canonical anchor for an area, parse the top-5 result abstracts for baseline mentions ('we compare against X, Y, Z'), then look up the most-mentioned name directly.

Returns papers with LLM-generated summaries, novelty scores, and structured extraction data. The default response is a lean 14-field shape (arxiv_id, title, authors, year, categories, has_code, github_url, citation_count, venue_name, llm_summary, llm_significance, llm_novelty_score, impact_pct, impact_tier); pass verbose=true or fields=... for the full shape with method/task/dataset extraction.

RANKING BY IMPACT: three different notions, don't confuse them.

(1) PROVEN impact = citations. For 'the important/seminal papers on topic X', pass sort='impactful' (most-cited among the relevant) or sort='balanced' (relevant AND well-cited). This is the right tool for established/foundational work.

(2) FORECAST impact = impact_pct (0-100), an ML per-category percentile of PREDICTED citations, only computed for the last ~90 days; impact_tier is its A+/A/B/C/D grade. For 'what's rising/new in X', pass sort='trending' or filter with impact_min=N. NOTE: impact_pct is NULL on everything older than ~90 days, so impact_min DROPS all established/canonical papers. It is NOT a way to find the influential papers in a niche; use sort='impactful' for that.

(3) ADOPTION impact = GitHub traction. Pass sort='community' to rank by real-world engineering adoption (stars + star-velocity): the papers practitioners are actually running/building on, independent of citations or recency. Filter on it with min_stars=N (minimum GitHub stars) and has_code=true (only papers with a code release); has_code/min_stars surface RUNNABLE/ADOPTED work, the engineering counterpart to citations. github_url_exists=true is the stricter has_code (requires a linked repo).

These impact notions are distinct from llm_novelty_score (new-idea-ness, an orthogonal filter).

Supports filtering by category, novelty, recency, method, task, dataset, and contribution type, plus min_citations (minimum PROVEN citations; unlike the ~90-day impact_min, it keeps established papers) and an explicit date window via published_after / published_before ('YYYY-MM-DD', as opposed to the rolling lookback of days).

v3 ABSORPTIONS: pass sort='trending' to rank by rising/forecast impact (impact_pct); pass anchor_paper_id to replicate find_similar (q is ignored in anchor mode, and results carry similarity_score); pass scope_to_citations_of to restrict search to a paper's citation graph (replaces find_citations_about).
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| q | No | Search query keywords. Optional when anchor_paper_id is set (anchor mode ignores q and returns papers similar to the anchor). | |
| sort | No | Result ranking — a relevance↔impact dial plus time-based and adoption orders. 'relevance' (default) = best topical match. 'balanced' = relevant AND well-cited. 'impactful' = the most-cited (proven-influential) papers among those relevant to the query — use this for 'the important/seminal papers on topic X'. 'trending' = rising/FORECAST impact (impact_pct, last ~90 days) — use for 'what's hot/new in X', NOT for established work. 'recent' = newest first. 'community' = GitHub adoption (stars + star-velocity) — surfaces the papers practitioners are actually running/building on, regardless of citations or recency. Proven impact ('impactful'/'balanced') ranks by real citations; 'trending' is a model prediction; 'community' is real-world engineering traction. Pair with get_foundational_lineage for a topic's canonical roots. | |
| anchor_paper_id | No | Return papers similar to this arXiv paper ID (replaces the removed find_similar tool). When set, q is ignored and results carry similarity_score. Example: '2407.15831'. | |
| scope_to_citations_of | No | Restrict search to this paper's citation graph, ranked by relevance to q (replaces the removed find_citations_about tool). Pass the arXiv ID of the paper whose citations you want to search within. | |
| category | No | Filter by arXiv category e.g. 'cs.AI', 'cs.LG' | |
| novelty_min | No | Minimum novelty score (0-1). Use 0.5+ for novel papers. | |
| impact_min | No | Minimum impact_pct (0-100), e.g. 80 = top 20% FORECAST impact. This is a RISING-WORK filter: impact_pct is only computed for the last ~90 days, so impact_min restricts results to recent papers predicted to land well AND DROPS everything older. Use it for 'what's rising in X'. Do NOT use it to find the influential/seminal papers in a topic — that excludes the established work; use sort='impactful' instead. | |
| days | No | Limit to papers published within N days | |
| has_code | No | Filter to papers with a linked code release (has_code=true). Surfaces runnable/reproducible work — pair with min_stars/sort='community' to find the papers practitioners actually adopt. | |
| min_citations | No | Minimum real citation count. Unlike impact_min (a ~90-day FORECAST percentile), this filters on PROVEN citations and keeps established/canonical papers. | |
| min_stars | No | Minimum GitHub stars on the paper's linked repo. A proxy for engineering adoption — surfaces work that practitioners are actually running/building on. Pair with sort='community' to rank by it. | |
| github_url_exists | No | Filter on whether the paper has a linked GitHub URL (true = only papers with a repo). Stricter than has_code (which counts any code link). | |
| published_after | No | Only papers published on or after this date, 'YYYY-MM-DD'. Use with published_before to bound an arbitrary date window (days only gives a rolling N-day lookback). | |
| published_before | No | Only papers published on or before this date, 'YYYY-MM-DD'. Pair with published_after for an explicit window. | |
| method_category | No | Filter by method category e.g. 'reinforcement learning', 'transformer' | |
| method_name | No | Filter to papers introducing/using a specific named method e.g. 'LoRA', 'YOLO', 'DPO'. Case-insensitive substring match on the extracted method_name field. | |
| task | No | Filter by task e.g. 'image classification', 'question answering' (partial match) | |
| dataset | No | Filter to papers that evaluate on a specific dataset e.g. 'MMLU', 'ImageNet' | |
| contribution_type | No | Filter by paper's contribution type | |
| task_category | No | Filter by broad research area | |
| mode | No | Search mode. 'semantic' (default) uses embedding similarity — finds conceptually related papers even without exact keyword matches. 'keyword' uses Postgres full-text search — faster but only matches exact terms. | |
| cursor | No | Cursor from previous response's next_cursor for keyset pagination | |
| page | No | Page number | |
| limit | No | Results per page (max 50) | |
| fields | No | Comma-separated list of fields to return (e.g. 'arxiv_id,title,llm_summary,llm_novelty_score'). If omitted, returns the lean 14-field default unless verbose=true. | |
| verbose | No | If true, returns the full 28-field paper shape (method/task/dataset extraction, application_domain, baselines, etc.). Default false returns the lean 14-field set. Ignored when `fields` is provided. | |
| exclude_ids | No | arXiv IDs to exclude from results (for deduplication across chained calls) | |
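
As a rough illustration of how the parameters above combine, the sketch below assembles an argument dict for a few common intents. The helper build_search_args and its intent labels are hypothetical and not part of the tool; only the parameter names and the constraints it checks (limit of at most 50, 'YYYY-MM-DD' dates, the impact_min pitfall, q being ignored in anchor mode) come from the table.

```python
from datetime import datetime

# Hypothetical intent labels mapped onto the documented sort values.
INTENT_TO_SORT = {
    "relevance": "relevance",
    "seminal": "impactful",   # proven impact: real citations
    "rising": "trending",     # forecast impact: impact_pct, last ~90 days
    "adopted": "community",   # adoption: GitHub stars + star-velocity
    "newest": "recent",
}


def build_search_args(q=None, *, intent="relevance", anchor_paper_id=None,
                      published_after=None, published_before=None,
                      limit=20, **filters):
    """Assemble a search_papers argument dict (illustrative helper only)."""
    if intent not in INTENT_TO_SORT:
        raise ValueError(f"unknown intent: {intent!r}")
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    if intent == "seminal" and "impact_min" in filters:
        # impact_pct is NULL for papers older than ~90 days, so impact_min
        # would drop exactly the established work this intent is after.
        raise ValueError("use min_citations, not impact_min, for seminal work")

    args = {"sort": INTENT_TO_SORT[intent], "limit": limit, **filters}
    for name, value in (("published_after", published_after),
                        ("published_before", published_before)):
        if value is not None:
            datetime.strptime(value, "%Y-%m-%d")  # raises ValueError if unparseable
            args[name] = value

    if anchor_paper_id is not None:
        args["anchor_paper_id"] = anchor_paper_id  # q is ignored in anchor mode
    elif q is not None:
        args["q"] = q
    return args


seminal = build_search_args("KV cache eviction", intent="seminal", min_citations=50)
similar = build_search_args("ignored text", anchor_paper_id="2407.15831")
```

Rejecting impact_min for the seminal intent is a design choice of this sketch; the tool itself simply returns only recent papers when impact_min is set.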
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| papers | No | Matched / returned papers. | |
| total | No | Total results available for the query. | |
| page | No | | |
| limit | No | | |
| mode | No | Search mode actually applied. | |
| direction | No | Citation direction (get_citations: citing or cited_by). | |
| topic | No | | |
| note | No | | |
| not_found | No | Requested IDs that had no match. | |
| next_cursor | No | Keyset cursor for the next page, or null when exhausted. | |
| hits | No | New watch matches (check_watches). | |
| results | No | | |
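
Keyset pagination with next_cursor might be driven by a loop like the one below. It is a minimal sketch: call_tool stands in for whatever client function invokes the tool and returns the parsed response as a dict (an assumption, not a real API), and the in-memory de-duplication by arxiv_id is an illustrative extra.

```python
def iter_papers(call_tool, args, max_pages=5):
    """Yield papers page by page, following next_cursor until it is null.

    call_tool is a hypothetical callable: call_tool(tool_name, arguments)
    returns the tool's parsed response as a dict.
    """
    seen = set()
    cursor = None
    for _ in range(max_pages):
        page_args = dict(args)
        if cursor is not None:
            page_args["cursor"] = cursor
        response = call_tool("search_papers", page_args)
        for paper in response.get("papers") or []:
            paper_id = paper.get("arxiv_id")
            if paper_id in seen:
                continue  # skip duplicates across pages
            seen.add(paper_id)
            yield paper
        cursor = response.get("next_cursor")
        if cursor is None:
            break  # results exhausted


# Stand-in for a real client, used only to exercise the loop.
_FAKE_PAGES = {
    None: {"papers": [{"arxiv_id": "a"}, {"arxiv_id": "b"}], "next_cursor": "c1"},
    "c1": {"papers": [{"arxiv_id": "b"}, {"arxiv_id": "c"}], "next_cursor": None},
}


def fake_call_tool(name, arguments):
    return _FAKE_PAGES[arguments.get("cursor")]


paper_ids = [p["arxiv_id"] for p in iter_papers(fake_call_tool, {"q": "kv cache"})]
```

When chaining separate searches rather than paging through one, passing the collected IDs as exclude_ids serves the same de-duplication purpose on the server side.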