search_arxiv
Search arXiv for preprints in physics, math, computer science, and related fields to access free PDFs and full-text content before peer review.
Instructions
Search preprints on arXiv - major open-access preprint server.
USE THIS TOOL WHEN:
- Searching for PREPRINTS (not peer-reviewed yet)
- You need free, immediate access to full-text PDFs
- Searching in: Physics, Mathematics, Computer Science, Statistics,
Quantitative Biology, Quantitative Finance, Electrical Engineering
NOTE: arXiv is a PREPRINT server - papers may not be peer-reviewed.
For peer-reviewed papers, use search_crossref or search_semantic.
WORKFLOW:
1. search_arxiv(query) -> get paper_id (e.g., '2106.12345')
2. download_arxiv(paper_id) -> get PDF (always available)
3. read_arxiv_paper(paper_id) -> get full text as Markdown
Args:
query: Search terms in any supported field.
max_results: Number of results (default: 10).
Returns:
List of paper dicts with: paper_id, title, authors, abstract,
published_date, pdf_url, categories.
Example:
search_arxiv("quantum computing error correction", max_results=5)
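For illustration, one element of the list returned by the example call above might look like the following dict. The values here are hypothetical; only the keys are taken from the Returns section.

```python
# Hypothetical entry from search_arxiv's result list; the values are made
# up, but the keys follow the Returns section above.
sample_paper = {
    "paper_id": "2106.12345",
    "title": "An Illustrative Preprint on Quantum Error Correction",
    "authors": ["A. Author", "B. Author"],
    "abstract": "A short, made-up abstract.",
    "published_date": "2021-06-23T00:00:00Z",
    "pdf_url": "https://arxiv.org/pdf/2106.12345",
    "categories": ["quant-ph", "cs.ET"],
}

# Every result dict should expose at least these keys.
expected_keys = {"paper_id", "title", "authors", "abstract",
                 "published_date", "pdf_url", "categories"}
assert expected_keys <= set(sample_paper)
```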
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Search terms in any supported field. | |
| max_results | No | Number of results. | 10 |
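The generated JSON Schema is not reproduced here; an illustrative equivalent, inferred from the handler signature (`query: str`, `max_results: int = 10`), might look like the following. The schema the server actually emits may differ in wording and extra fields.

```python
# Illustrative JSON Schema for the tool's input, inferred from the handler
# signature shown under Implementation Reference. This is an assumption,
# not the server's literal output.
input_schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "max_results": {"type": "integer", "default": 10},
    },
    "required": ["query"],
}
```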
Implementation Reference
- `paper_find_mcp/server.py:162-192` (handler) — Primary handler for the `search_arxiv` MCP tool. Registered with the `@mcp.tool()` decorator; delegates to the shared `_search` helper using the `'arxiv'` searcher instance.

  ```python
  @mcp.tool()
  async def search_arxiv(query: str, max_results: int = 10) -> List[Dict]:
      """Search preprints on arXiv - major open-access preprint server.

      USE THIS TOOL WHEN:
      - Searching for PREPRINTS (not peer-reviewed yet)
      - You need free, immediate access to full-text PDFs
      - Searching in: Physics, Mathematics, Computer Science, Statistics,
        Quantitative Biology, Quantitative Finance, Electrical Engineering

      NOTE: arXiv is a PREPRINT server - papers may not be peer-reviewed.
      For peer-reviewed papers, use search_crossref or search_semantic.

      WORKFLOW:
      1. search_arxiv(query) -> get paper_id (e.g., '2106.12345')
      2. download_arxiv(paper_id) -> get PDF (always available)
      3. read_arxiv_paper(paper_id) -> get full text as Markdown

      Args:
          query: Search terms in any supported field.
          max_results: Number of results (default: 10).

      Returns:
          List of paper dicts with: paper_id, title, authors, abstract,
          published_date, pdf_url, categories.

      Example:
          search_arxiv("quantum computing error correction", max_results=5)
      """
      return await _search('arxiv', query, max_results)
  ```
- `paper_find_mcp/server.py:95-113` (helper) — Shared `_search` helper that all platform tools (including `search_arxiv`) call to execute searches via platform-specific searcher instances.

  ```python
  async def _search(
      searcher_name: str,
      query: str,
      max_results: int = 10,
      **kwargs
  ) -> List[Dict]:
      """Generic search function."""
      searcher = SEARCHERS.get(searcher_name)
      if not searcher:
          logger.error(f"Unknown searcher: {searcher_name}")
          return []
      try:
          papers = searcher.search(query, max_results=max_results, **kwargs)
          return [paper.to_dict() for paper in papers]
      except Exception as e:
          logger.error(f"Search failed for {searcher_name}: {e}")
          return []
  ```
- `paper_find_mcp/server.py:75-85` (registration) — Global `SEARCHERS` dictionary registering the `ArxivSearcher` instance under the `'arxiv'` key, used by the `_search` helper.

  ```python
  SEARCHERS = {
      'arxiv': ArxivSearcher(),
      'pubmed': PubMedSearcher(),
      'biorxiv': BioRxivSearcher(),
      'medrxiv': MedRxivSearcher(),
      'google_scholar': GoogleScholarSearcher(),
      'iacr': IACRSearcher(),
      'semantic': SemanticSearcher(),
      'crossref': CrossRefSearcher(),
      'repec': RePECSearcher(),
  }
  ```
- Core search logic in `ArxivSearcher.search()`: queries the arXiv API (`export.arxiv.org/api/query`), parses the Atom feed with `feedparser`, and constructs `Paper` objects from the entries.

  ```python
  def search(self, query: str, max_results: int = 10) -> List[Paper]:
      """Search arXiv papers.

      Args:
          query: Search keywords; supports arXiv query syntax,
              e.g. "ti:attention" (title), "au:hinton" (author).
          max_results: Maximum number of results.

      Returns:
          List[Paper]: List of papers.
      """
      params = {
          'search_query': query,
          'max_results': max_results,
          'sortBy': 'submittedDate',
          'sortOrder': 'descending'
      }
      try:
          response = requests.get(self.BASE_URL, params=params, timeout=30)
          response.raise_for_status()
      except requests.RequestException as e:
          logger.error(f"arXiv API request failed: {e}")
          return []

      feed = feedparser.parse(response.content)
      papers = []
      for entry in feed.entries:
          try:
              authors = [author.name for author in entry.authors]
              published = datetime.strptime(entry.published, '%Y-%m-%dT%H:%M:%SZ')
              updated = datetime.strptime(entry.updated, '%Y-%m-%dT%H:%M:%SZ')
              pdf_url = next(
                  (link.href for link in entry.links if link.type == 'application/pdf'),
                  ''
              )
              papers.append(Paper(
                  paper_id=entry.id.split('/')[-1],
                  title=entry.title.replace('\n', ' ').strip(),
                  authors=authors,
                  abstract=entry.summary.replace('\n', ' ').strip(),
                  url=entry.id,
                  pdf_url=pdf_url,
                  published_date=published,
                  updated_date=updated,
                  source='arxiv',
                  categories=[tag.term for tag in entry.tags],
                  keywords=[],
                  doi=entry.get('doi', '')
              ))
          except Exception as e:
              logger.warning(f"Error parsing arXiv entry: {e}")
      return papers
  ```
- Abstract base class `PaperSource` defining the interface shared by all searchers, including `ArxivSearcher`, with the `search` method signature.

  ```python
  class PaperSource:
      """Abstract base class for paper sources"""

      def search(self, query: str, **kwargs) -> List[Paper]:
          raise NotImplementedError

      def download_pdf(self, paper_id: str, save_path: str) -> str:
          raise NotImplementedError

      def read_paper(self, paper_id: str, save_path: str) -> str:
          raise NotImplementedError
  ```
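As a standalone sanity check of the fielded query syntax mentioned in `ArxivSearcher.search()`'s docstring, the parameter dict it sends can be rebuilt in isolation. `build_params` is a hypothetical helper mirroring the code above; no network request is issued here.

```python
# Hypothetical helper mirroring the params dict built in ArxivSearcher.search().
# BASE_URL is arXiv's public Atom API endpoint; nothing is fetched here.
BASE_URL = "http://export.arxiv.org/api/query"

def build_params(query: str, max_results: int = 10) -> dict:
    return {
        "search_query": query,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }

# Fielded query: title contains "attention" AND author matches "hinton".
params = build_params("ti:attention AND au:hinton", max_results=5)
```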