search_arxiv

by h-lu

Search arXiv for preprints in physics, math, computer science, and related fields to access free PDFs and full-text content before peer review.

Instructions

Search preprints on arXiv, a major open-access preprint server.

USE THIS TOOL WHEN:
- Searching for PREPRINTS (not peer-reviewed yet)
- You need free, immediate access to full-text PDFs
- Searching in: Physics, Mathematics, Computer Science, Statistics, Quantitative Biology, Quantitative Finance, Electrical Engineering

NOTE: arXiv is a PREPRINT server - papers may not be peer-reviewed. For peer-reviewed papers, use search_crossref or search_semantic.

WORKFLOW:
1. search_arxiv(query) -> get paper_id (e.g., '2106.12345')
2. download_arxiv(paper_id) -> get PDF (always available)
3. read_arxiv_paper(paper_id) -> get full text as Markdown

Args:
- query: Search terms in any supported field.
- max_results: Number of results (default: 10).

Returns: List of paper dicts with: paper_id, title, authors, abstract, published_date, pdf_url, categories.

Example: search_arxiv("quantum computing error correction", max_results=5)
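
For orientation, one entry in the returned list might look like the sketch below; the field names follow the Returns description above, while the concrete values are invented for illustration.

    # Hypothetical search_arxiv result entry; all values are made up.
    paper = {
        'paper_id': '2106.12345',
        'title': 'Surface Code Error Correction on Near-Term Hardware',
        'authors': ['A. Researcher', 'B. Scientist'],
        'abstract': 'We study fault-tolerant quantum error correction ...',
        'published_date': '2021-06-23T17:59:59Z',
        'pdf_url': 'https://arxiv.org/pdf/2106.12345',
        'categories': ['quant-ph'],
    }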

Input Schema

Name         Required  Description                            Default
query        Yes       Search terms in any supported field.   (none)
max_results  No        Number of results.                     10
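
Over the wire, a client invokes the tool with a standard MCP tools/call request. A minimal sketch of that payload, written here as a Python dict (the exact JSON-RPC envelope is normally produced by your MCP client library):

    # Sketch of a JSON-RPC tools/call request for search_arxiv; the id is arbitrary.
    request = {
        'jsonrpc': '2.0',
        'id': 1,
        'method': 'tools/call',
        'params': {
            'name': 'search_arxiv',
            'arguments': {'query': 'quantum computing error correction', 'max_results': 5},
        },
    }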

Implementation Reference

  • Primary handler for the search_arxiv MCP tool. Registered with @mcp.tool() decorator. Delegates to shared _search helper using 'arxiv' searcher instance.
    @mcp.tool()
    async def search_arxiv(query: str, max_results: int = 10) -> List[Dict]:
        """Search preprints on arXiv - major open-access preprint server.

        USE THIS TOOL WHEN:
        - Searching for PREPRINTS (not peer-reviewed yet)
        - You need free, immediate access to full-text PDFs
        - Searching in: Physics, Mathematics, Computer Science, Statistics,
          Quantitative Biology, Quantitative Finance, Electrical Engineering

        NOTE: arXiv is a PREPRINT server - papers may not be peer-reviewed.
        For peer-reviewed papers, use search_crossref or search_semantic.

        WORKFLOW:
        1. search_arxiv(query) -> get paper_id (e.g., '2106.12345')
        2. download_arxiv(paper_id) -> get PDF (always available)
        3. read_arxiv_paper(paper_id) -> get full text as Markdown

        Args:
            query: Search terms in any supported field.
            max_results: Number of results (default: 10).

        Returns:
            List of paper dicts with: paper_id, title, authors, abstract,
            published_date, pdf_url, categories.

        Example:
            search_arxiv("quantum computing error correction", max_results=5)
        """
        return await _search('arxiv', query, max_results)
  • Shared helper function _search that all platform tools (including search_arxiv) call to execute searches via platform-specific searcher instances.
    async def _search(
        searcher_name: str,
        query: str,
        max_results: int = 10,
        **kwargs
    ) -> List[Dict]:
        """Generic search function."""
        searcher = SEARCHERS.get(searcher_name)
        if not searcher:
            logger.error(f"Unknown searcher: {searcher_name}")
            return []
        try:
            papers = searcher.search(query, max_results=max_results, **kwargs)
            return [paper.to_dict() for paper in papers]
        except Exception as e:
            logger.error(f"Search failed for {searcher_name}: {e}")
            return []
  • Global SEARCHERS dictionary registering the ArxivSearcher instance under 'arxiv' key, used by _search helper.
    SEARCHERS = {
        'arxiv': ArxivSearcher(),
        'pubmed': PubMedSearcher(),
        'biorxiv': BioRxivSearcher(),
        'medrxiv': MedRxivSearcher(),
        'google_scholar': GoogleScholarSearcher(),
        'iacr': IACRSearcher(),
        'semantic': SemanticSearcher(),
        'crossref': CrossRefSearcher(),
        'repec': RePECSearcher(),
    }
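  • Illustrative dispatch sketch (hypothetical, not from the codebase): the _search helper resolves this table at call time, so a key lookup plus a search call is the entire routing path.
    # Hypothetical direct use of the registry, bypassing the MCP tool layer.
    searcher = SEARCHERS['arxiv']
    papers = searcher.search('cat:cs.CL AND ti:transformer', max_results=3)
    print([paper.title for paper in papers])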
  • Core search logic in ArxivSearcher.search(): queries arXiv API (export.arxiv.org/api/query), parses Atom feed with feedparser, constructs Paper objects from entries.
    def search(self, query: str, max_results: int = 10) -> List[Paper]:
        """Search arXiv papers.

        Args:
            query: Search keywords; supports arXiv query syntax,
                e.g. "ti:attention" (title), "au:hinton" (author).
            max_results: Maximum number of results to return.

        Returns:
            List[Paper]: List of papers.
        """
        params = {
            'search_query': query,
            'max_results': max_results,
            'sortBy': 'submittedDate',
            'sortOrder': 'descending'
        }
        try:
            response = requests.get(self.BASE_URL, params=params, timeout=30)
            response.raise_for_status()
        except requests.RequestException as e:
            logger.error(f"arXiv API request failed: {e}")
            return []

        feed = feedparser.parse(response.content)
        papers = []
        for entry in feed.entries:
            try:
                authors = [author.name for author in entry.authors]
                published = datetime.strptime(entry.published, '%Y-%m-%dT%H:%M:%SZ')
                updated = datetime.strptime(entry.updated, '%Y-%m-%dT%H:%M:%SZ')
                pdf_url = next(
                    (link.href for link in entry.links if link.type == 'application/pdf'),
                    ''
                )
                papers.append(Paper(
                    paper_id=entry.id.split('/')[-1],
                    title=entry.title.replace('\n', ' ').strip(),
                    authors=authors,
                    abstract=entry.summary.replace('\n', ' ').strip(),
                    url=entry.id,
                    pdf_url=pdf_url,
                    published_date=published,
                    updated_date=updated,
                    source='arxiv',
                    categories=[tag.term for tag in entry.tags],
                    keywords=[],
                    doi=entry.get('doi', '')
                ))
            except Exception as e:
                logger.warning(f"Error parsing arXiv entry: {e}")
        return papers
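  • Standalone sketch of the same arXiv Atom API call, for readers who want to try the endpoint (export.arxiv.org/api/query, per the searcher above) outside this codebase; it assumes only the requests and feedparser packages, and the query string is an arbitrary example.
    import requests
    import feedparser

    # Query the public arXiv API with the same parameters ArxivSearcher.search() sends.
    params = {
        'search_query': 'ti:attention AND au:hinton',  # arXiv field syntax
        'max_results': 5,
        'sortBy': 'submittedDate',
        'sortOrder': 'descending',
    }
    response = requests.get('http://export.arxiv.org/api/query', params=params, timeout=30)
    response.raise_for_status()

    # Parse the Atom feed and print id/title pairs, mirroring the entry handling above.
    feed = feedparser.parse(response.content)
    for entry in feed.entries:
        print(entry.id.split('/')[-1], entry.title.replace('\n', ' ').strip())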
  • Abstract base class PaperSource defining the interface that all searchers, including ArxivSearcher, implement: search, download_pdf, and read_paper.
    class PaperSource:
        """Abstract base class for paper sources"""

        def search(self, query: str, **kwargs) -> List[Paper]:
            raise NotImplementedError

        def download_pdf(self, paper_id: str, save_path: str) -> str:
            raise NotImplementedError

        def read_paper(self, paper_id: str, save_path: str) -> str:
            raise NotImplementedError
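  • A new platform plugs in by implementing this interface and registering an instance in SEARCHERS; a minimal hypothetical sketch follows (MySourceSearcher and the key 'mysource' are illustrative, not part of the codebase). Registering it as SEARCHERS['mysource'] = MySourceSearcher() is all the _search helper needs to route queries to it.
    # Hypothetical subclass sketch showing the interface a new source must implement.
    class MySourceSearcher(PaperSource):
        def search(self, query: str, **kwargs) -> List[Paper]:
            # Fetch and parse results from the new source here,
            # returning Paper objects as ArxivSearcher does.
            return []

        def download_pdf(self, paper_id: str, save_path: str) -> str:
            raise NotImplementedError

        def read_paper(self, paper_id: str, save_path: str) -> str:
            raise NotImplementedError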

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/h-lu/paper-search-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.