search_arxiv
Search arXiv for academic papers matching a query string and return metadata for each result.
Instructions
Search academic papers from arXiv.
Args:
- query: Search query string (e.g., 'machine learning').
- max_results: Maximum number of papers to return (default: 10).

Returns: List of paper metadata in dictionary format.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Search query string (e.g., 'machine learning') | — |
| max_results | No | Maximum number of papers to return | 10 |
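To illustrate how a client might apply this schema before invoking the tool, here is a minimal sketch; the `normalize_args` helper is hypothetical and not part of the project.

```python
# Hypothetical illustration of the input schema above: 'query' is
# required, 'max_results' is optional and defaults to 10.
def normalize_args(args: dict) -> dict:
    """Reject a missing query and apply the schema default for max_results."""
    if "query" not in args:
        raise ValueError("'query' is required")
    return {"query": args["query"], "max_results": args.get("max_results", 10)}

# A call that omits max_results picks up the default of 10.
print(normalize_args({"query": "machine learning"}))
```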
Implementation Reference
- paper_search_mcp/server.py:358-368 (handler) — the MCP tool wrapper that calls the `async_search` helper with the `arxiv_searcher` instance.

```python
async def search_arxiv(query: str, max_results: int = 10) -> List[Dict]:
    """Search academic papers from arXiv.

    Args:
        query: Search query string (e.g., 'machine learning').
        max_results: Maximum number of papers to return (default: 10).

    Returns:
        List of paper metadata in dictionary format.
    """
    papers = await async_search(arxiv_searcher, query, max_results)
    return papers if papers else []
```

- The core implementation of the arXiv search logic, using feedparser to parse the arXiv API response.

```python
def search(self, query: str, max_results: int = 10) -> List[Paper]:
    params = {
        'search_query': f'all:{query}',
        'max_results': max_results,
        'sortBy': 'submittedDate',
        'sortOrder': 'descending'
    }
    response = None
    for attempt in range(3):
        try:
            response = self.session.get(self.BASE_URL, params=params, timeout=30)
        except requests.RequestException:
            time.sleep((attempt + 1) * 1.5)
            continue
        if response.status_code == 200:
            break
        if response.status_code in (429, 500, 502, 503, 504):
            time.sleep((attempt + 1) * 1.5)
            continue
        break
    if response is None or response.status_code != 200:
        return []

    feed = feedparser.parse(response.content)
    papers = []
    for entry in feed.entries:
        try:
            authors = [author.name for author in entry.authors]
            published = datetime.strptime(entry.published, '%Y-%m-%dT%H:%M:%SZ')
            updated = datetime.strptime(entry.updated, '%Y-%m-%dT%H:%M:%SZ')
            pdf_url = next((link.href for link in entry.links
                            if link.type == 'application/pdf'), '')
            # Try to extract DOI from entry.doi or links or summary
            doi = entry.get('doi', '') or extract_doi(entry.summary) or extract_doi(entry.id)
            for link in entry.links:
                if link.get('title') == 'doi':
                    doi = doi or extract_doi(link.href)
            papers.append(Paper(
                paper_id=entry.id.split('/')[-1],
                title=entry.title,
                authors=authors,
                abstract=entry.summary,
                url=entry.id,
                pdf_url=pdf_url,
                published_date=published,
                updated_date=updated,
                source='arxiv',
                categories=[tag.term for tag in entry.tags],
                keywords=[],
                doi=doi
            ))
        except Exception as e:
            print(f"Error parsing arXiv entry: {e}")
    return papers
```
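The excerpt above calls an `extract_doi` helper that is not shown. A plausible implementation matches the standard DOI pattern (`10.<registrant>/<suffix>`) in free text; the regex below is an assumption for illustration, not the project's actual code.

```python
import re

# Assumed DOI pattern: '10.' + 4-9 registrant digits + '/' + suffix,
# stopping at whitespace or characters that end a URL in running text.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+', re.IGNORECASE)

def extract_doi(text: str) -> str:
    """Return the first DOI-like substring found in text, or '' if none."""
    if not text:
        return ''
    m = DOI_RE.search(text)
    # Trim trailing sentence punctuation that the greedy match may pick up.
    return m.group(0).rstrip('.,;') if m else ''
```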