search_papers
Search arXiv for academic papers using natural language queries, filter by date and category, and sort results by relevance or submission date.
Instructions
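Search for papers on arXiv. The query may be written in natural language: any year from 2000 to 2099 that it contains is removed from the text and used as a date filter, and the remaining words, excluding common stop words and words of two characters or fewer, become keywords matched against paper titles and abstracts. Keywords are combined with OR, while the category and date filters are combined with AND.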
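Parameters:
- `query`: The base search query; natural language is accepted.
- `max_results`: The maximum number of results to return (default 10).
- `start_date`: Start of the search period, as YYYY-MM-DD or YYYY (a bare year means January 1). Takes precedence over the earliest year found in `query`. If only an end bound is available, the search starts at 1991-08-14.
- `end_date`: End of the search period, as YYYY-MM-DD or YYYY (a bare year means December 31). Takes precedence over the latest year found in `query`. If only a start bound is available, the search runs up to today.
- `sort_by_relevance`: If `True`, sorts by relevance; if `False`, sorts by submission date. Results are always returned in descending order.
- `category`: The arXiv category to search in (e.g., `cs.AI`, `cs.CL`, `cs.SE`; default `cs.SE`). An empty string disables the category filter.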
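
The keyword handling described above can be sketched as a standalone function. `build_keyword_clause` is a hypothetical helper, not part of arxiv_mcp.py; it restates the handler's year stripping, stop-word filtering, and `ti:`/`abs:` clause construction so the generated arXiv query fragment can be inspected offline.

```python
import re

# Same stop-word list and year pattern as the handler in arxiv_mcp.py.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "in", "for", "to", "with",
    "on", "is", "are", "was", "were", "it",
}
YEAR_RE = re.compile(r"\b(20\d{2})\b")


def build_keyword_clause(query: str) -> str:
    """Hypothetical helper: turn a natural-language query into a title/abstract clause."""
    # Years are stripped from the text; the handler uses them for date filtering instead.
    query_text = YEAR_RE.sub("", query).strip()
    keywords = [
        word
        for word in query_text.split()
        if word.lower() not in STOP_WORDS and len(word) > 2
    ]
    if keywords:
        # Each keyword may match the title or the abstract; keywords are OR-ed together.
        return "(" + " OR ".join(f'(ti:"{kw}" OR abs:"{kw}")' for kw in keywords) + ")"
    # No usable keywords left: fall back to the whole year-stripped text as one phrase.
    return f'(ti:"{query_text}" OR abs:"{query_text}")'


print(build_keyword_clause("testing of LLM agents in 2024"))
print(build_keyword_clause("AI in 2023"))
```

The second call shows the fallback path: "AI" is only two characters and "in" is a stop word, so no keywords survive and the phrase "AI in" is searched as a whole.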
Input Schema
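| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Base search query; may be natural language. | |
| max_results | No | Maximum number of results to return. | 10 |
| start_date | No | Start of the search period (YYYY-MM-DD or YYYY); overrides years in `query`. | None |
| end_date | No | End of the search period (YYYY-MM-DD or YYYY); overrides years in `query`. | None |
| sort_by_relevance | No | `True` sorts by relevance, `False` by submission date. | True |
| category | No | arXiv category to search in (e.g., `cs.AI`, `cs.CL`, `cs.SE`). | cs.SE |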
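
An MCP client invokes the tool with a JSON-RPC `tools/call` request whose `arguments` object follows the schema above. The sketch below builds such a request; `make_search_request` is a hypothetical helper, and merging in the defaults on the client side is an illustrative choice, since the server applies the same defaults when arguments are omitted.

```python
import json

# Defaults taken from the search_papers signature.
DEFAULTS = {
    "max_results": 10,
    "start_date": None,
    "end_date": None,
    "sort_by_relevance": True,
    "category": "cs.SE",
}


def make_search_request(request_id: int, query: str, **overrides) -> dict:
    """Hypothetical helper: build an MCP tools/call request for search_papers."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown argument(s): {sorted(unknown)}")
    arguments = {"query": query, **DEFAULTS, **overrides}
    # Omit unset dates; the server treats a missing date the same as None.
    arguments = {k: v for k, v in arguments.items() if v is not None}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search_papers", "arguments": arguments},
    }


request = make_search_request(
    1, "program repair with LLMs", start_date="2023-01-01", category="cs.AI"
)
print(json.dumps(request, indent=2))
```

Rejecting unknown keys locally catches typos such as `sort="date"` before the request reaches the server.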
Implementation Reference
- arxiv_searcher/arxiv_mcp.py:158-274 (handler): Primary synchronous implementation of the `search_papers` MCP tool handler. It parses natural-language queries, extracts keywords and dates, builds an advanced arXiv search query with category and date filters, performs the search, and returns structured results.

```python
import re
from datetime import datetime

import arxiv

# `mcp` is the FastMCP server instance created elsewhere in arxiv_mcp.py.


@mcp.tool
def search_papers(
    query: str,
    max_results: int = 10,
    start_date: str | None = None,
    end_date: str | None = None,
    sort_by_relevance: bool = True,
    category: str = "cs.SE",
) -> dict:
    """
    Search for papers on arXiv. It can parse natural language queries, extracting keywords and years for filtering.

    :param query: The base search query. Can be natural language.
    :param max_results: The maximum number of results to return.
    :param start_date: The start date for the search period (YYYY-MM-DD or YYYY). Overrides years in query.
    :param end_date: The end date for the search period (YYYY-MM-DD or YYYY). Overrides years in query.
    :param sort_by_relevance: If True, sorts by relevance. If False, sorts by submission date.
    :param category: The arXiv category to search in (e.g., 'cs.AI', 'cs.CL', 'cs.SE').
    """
    STOP_WORDS = {
        "a", "an", "and", "the", "of", "in", "for", "to", "with",
        "on", "is", "are", "was", "were", "it",
    }

    # Extract years from query to use as date filters if not provided explicitly
    years_in_query = re.findall(r"\b(20\d{2})\b", query)
    query_text = re.sub(r"\b(20\d{2})\b", "", query).strip()

    # Use provided dates or fall back to dates from query
    effective_start_date = start_date
    if not effective_start_date and years_in_query:
        effective_start_date = min(years_in_query)
    effective_end_date = end_date
    if not effective_end_date and years_in_query:
        effective_end_date = max(years_in_query)

    # Process keywords from the query text
    keywords = [
        word
        for word in query_text.split()
        if word.lower() not in STOP_WORDS and len(word) > 2
    ]

    if keywords:
        # Build a structured query from keywords, joining with OR for broader results
        keyword_query = " OR ".join([f'(ti:"{kw}" OR abs:"{kw}")' for kw in keywords])
        query_parts = [f"({keyword_query})"]
    else:
        # Fallback to using the original query text if no keywords are left
        query_parts = [f'(ti:"{query_text}" OR abs:"{query_text}")']

    if category:
        query_parts.append(f"cat:{category}")

    # Add date range to the query
    if effective_start_date or effective_end_date:
        start = "19910814"
        if effective_start_date:
            try:
                dt = datetime.strptime(effective_start_date, "%Y-%m-%d")
            except ValueError:
                dt = datetime.strptime(effective_start_date, "%Y")
            start = dt.strftime("%Y%m%d")
        end = datetime.now().strftime("%Y%m%d")
        if effective_end_date:
            try:
                dt = datetime.strptime(effective_end_date, "%Y-%m-%d")
            except ValueError:
                # A bare year covers the whole year, so end on December 31
                dt = datetime.strptime(effective_end_date, "%Y")
                dt = dt.replace(month=12, day=31)
            end = dt.strftime("%Y%m%d")
        query_parts.append(f"submittedDate:[{start} TO {end}]")

    final_query = " AND ".join(query_parts)
    print(f"[arxiv-search] Query sent: {final_query}")

    sort_criterion = (
        arxiv.SortCriterion.Relevance
        if sort_by_relevance
        else arxiv.SortCriterion.SubmittedDate
    )

    search = arxiv.Search(
        query=final_query,
        max_results=max_results,
        sort_by=sort_criterion,
        sort_order=arxiv.SortOrder.Descending,
    )

    results = []
    for r in search.results():
        results.append(
            {
                "title": r.title,
                "authors": [a.name for a in r.authors],
                "summary": r.summary,
                "pdf_url": r.pdf_url,
                "published_date": r.published.strftime("%Y-%m-%d"),
            }
        )

    return {"query_used": final_query, "results": results}
```
- Asynchronous variant of the `search_papers` MCP tool handler, used for remote deployment. Its docstring and body are identical to the synchronous handler; only the function is declared `async`:

```python
@mcp.tool
async def search_papers(
    query: str,
    max_results: int = 10,
    start_date: str | None = None,
    end_date: str | None = None,
    sort_by_relevance: bool = True,
    category: str = "cs.SE",
) -> dict:
    ...  # docstring and body identical to the synchronous handler above
```
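
Because the async variant reuses the synchronous body, the arXiv request made while iterating `search.results()` still runs on the event-loop thread and blocks it until the request completes. If that matters for a remote deployment, one option (an assumption, not what the reference implementation does) is to move the blocking work into a worker thread with `asyncio.to_thread`. The sketch uses a hypothetical stand-in, `blocking_search`, instead of the real arXiv call so it runs offline.

```python
import asyncio
import time


def blocking_search(query: str, max_results: int = 10) -> dict:
    """Hypothetical stand-in for the synchronous handler body (no network access)."""
    time.sleep(0.2)  # simulates the blocking arXiv HTTP request
    return {"query_used": query, "results": []}


async def search_papers_async(query: str, max_results: int = 10) -> dict:
    # Run the blocking work in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(blocking_search, query, max_results)


async def main() -> list[dict]:
    # The two searches overlap instead of running back to back.
    return await asyncio.gather(
        search_papers_async("fuzzing"),
        search_papers_async("program repair"),
    )


print(asyncio.run(main()))
```

In a real handler, the whole synchronous body (query construction plus the results loop) would be the function handed to `asyncio.to_thread`.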
- arxiv_searcher/arxiv_mcp.py:167-177 (schema): Input schema and parameter documentation for the `search_papers` tool, including descriptions of all arguments and their usage.

```python
"""
Search for papers on arXiv. It can parse natural language queries, extracting keywords and years for filtering.

:param query: The base search query. Can be natural language.
:param max_results: The maximum number of results to return.
:param start_date: The start date for the search period (YYYY-MM-DD or YYYY). Overrides years in query.
:param end_date: The end date for the search period (YYYY-MM-DD or YYYY). Overrides years in query.
:param sort_by_relevance: If True, sorts by relevance. If False, sorts by submission date.
:param category: The arXiv category to search in (e.g., 'cs.AI', 'cs.CL', 'cs.SE').
"""
```
- arxiv_searcher/arxiv_mcp.py:158 (registration): FastMCP tool registration decorator for `search_papers` in the main implementation file: `@mcp.tool`.
- arxiv_searcher/arxiv_mcp.py:102-105 (helper): Documentation in the `search_tips` resource explaining how to use the `search_papers` date parameters:

  > **4. Date Syntax:**
  > - The `search_papers` tool handles dates with the `start_date` and `end_date` parameters.
  > - It is preferable to use these parameters instead of including dates directly in the `query` for greater precision.