
Paper Search MCP Server

by h-lu

search_arxiv

Search arXiv for preprints in physics, math, computer science, and related fields to access free PDFs and full-text content before peer review.

Instructions

Search preprints on arXiv - major open-access preprint server.

USE THIS TOOL WHEN:
- Searching for PREPRINTS (not peer-reviewed yet)
- You need free, immediate access to full-text PDFs
- Searching in: Physics, Mathematics, Computer Science, Statistics,
  Quantitative Biology, Quantitative Finance, Electrical Engineering

NOTE: arXiv is a PREPRINT server - papers may not be peer-reviewed.
For peer-reviewed papers, use search_crossref or search_semantic.

WORKFLOW:
1. search_arxiv(query) -> get paper_id (e.g., '2106.12345')
2. download_arxiv(paper_id) -> get PDF (always available)
3. read_arxiv_paper(paper_id) -> get full text as Markdown

Args:
    query: Search terms in any supported field.
    max_results: Number of results (default: 10).

Returns:
    List of paper dicts with: paper_id, title, authors, abstract, 
    published_date, pdf_url, categories.

Example:
    search_arxiv("quantum computing error correction", max_results=5)
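A single entry in the returned list follows the shape described under Returns. An illustrative example (all field values below are made up, only the keys match the documented shape):

```python
# Illustrative only: values are hypothetical, but the keys match the
# documented return shape (paper_id, title, authors, abstract,
# published_date, pdf_url, categories).
example_result = {
    "paper_id": "2106.12345v1",
    "title": "An Example Paper on Quantum Error Correction",
    "authors": ["A. Author", "B. Author"],
    "abstract": "We study an example problem ...",
    "published_date": "2021-06-23T17:59:59",
    "pdf_url": "http://arxiv.org/pdf/2106.12345v1",
    "categories": ["quant-ph", "cs.IT"],
}
print(sorted(example_result))
```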

Input Schema

Name         Required  Description                           Default
query        Yes       Search terms in any supported field.  -
max_results  No        Number of results.                    10
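Expanded, the tool's input schema presumably resembles the following. This is reconstructed from the handler signature `search_arxiv(query: str, max_results: int = 10)`, not copied from the server's published schema:

```python
# Reconstructed from the handler signature; not taken verbatim from
# the server's published JSON Schema.
input_schema = {
    "type": "object",
    "properties": {
        "query": {
            "type": "string",
            "description": "Search terms in any supported field.",
        },
        "max_results": {
            "type": "integer",
            "default": 10,
            "description": "Number of results.",
        },
    },
    "required": ["query"],
}
print(input_schema["required"])
```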

Implementation Reference

  • Primary handler for the search_arxiv MCP tool. Registered with @mcp.tool() decorator. Delegates to shared _search helper using 'arxiv' searcher instance.
    @mcp.tool()
    async def search_arxiv(query: str, max_results: int = 10) -> List[Dict]:
        """Search preprints on arXiv - major open-access preprint server.
        
        USE THIS TOOL WHEN:
        - Searching for PREPRINTS (not peer-reviewed yet)
        - You need free, immediate access to full-text PDFs
        - Searching in: Physics, Mathematics, Computer Science, Statistics,
          Quantitative Biology, Quantitative Finance, Electrical Engineering
        
        NOTE: arXiv is a PREPRINT server - papers may not be peer-reviewed.
        For peer-reviewed papers, use search_crossref or search_semantic.
        
        WORKFLOW:
        1. search_arxiv(query) -> get paper_id (e.g., '2106.12345')
        2. download_arxiv(paper_id) -> get PDF (always available)
        3. read_arxiv_paper(paper_id) -> get full text as Markdown
    
        Args:
            query: Search terms in any supported field.
            max_results: Number of results (default: 10).
        
        Returns:
            List of paper dicts with: paper_id, title, authors, abstract, 
            published_date, pdf_url, categories.
        
        Example:
            search_arxiv("quantum computing error correction", max_results=5)
        """
        return await _search('arxiv', query, max_results)
  • Shared helper function _search that all platform tools (including search_arxiv) call to execute searches via platform-specific searcher instances.
    async def _search(
        searcher_name: str, 
        query: str, 
        max_results: int = 10,
        **kwargs
    ) -> List[Dict]:
        """Generic search helper shared by all platform tools."""
        searcher = SEARCHERS.get(searcher_name)
        if not searcher:
            logger.error(f"Unknown searcher: {searcher_name}")
            return []
        
        try:
            papers = searcher.search(query, max_results=max_results, **kwargs)
            return [paper.to_dict() for paper in papers]
        except Exception as e:
            logger.error(f"Search failed for {searcher_name}: {e}")
            return []
  • Global SEARCHERS dictionary registering the ArxivSearcher instance under 'arxiv' key, used by _search helper.
    SEARCHERS = {
        'arxiv': ArxivSearcher(),
        'pubmed': PubMedSearcher(),
        'biorxiv': BioRxivSearcher(),
        'medrxiv': MedRxivSearcher(),
        'google_scholar': GoogleScholarSearcher(),
        'iacr': IACRSearcher(),
        'semantic': SemanticSearcher(),
        'crossref': CrossRefSearcher(),
        'repec': RePECSearcher(),
    }
  • Core search logic in ArxivSearcher.search(): queries arXiv API (export.arxiv.org/api/query), parses Atom feed with feedparser, constructs Paper objects from entries.
    def search(self, query: str, max_results: int = 10) -> List[Paper]:
        """Search arXiv papers.
        
        Args:
            query: Search keywords; supports arXiv query syntax,
                   e.g. "ti:attention" (title), "au:hinton" (author).
            max_results: Maximum number of results to return.
            
        Returns:
            List[Paper]: List of papers.
        """
        params = {
            'search_query': query,
            'max_results': max_results,
            'sortBy': 'submittedDate',
            'sortOrder': 'descending'
        }
        
        try:
            response = requests.get(self.BASE_URL, params=params, timeout=30)
            response.raise_for_status()
        except requests.RequestException as e:
            logger.error(f"arXiv API request failed: {e}")
            return []
        
        feed = feedparser.parse(response.content)
        papers = []
        
        for entry in feed.entries:
            try:
                authors = [author.name for author in entry.authors]
                published = datetime.strptime(entry.published, '%Y-%m-%dT%H:%M:%SZ')
                updated = datetime.strptime(entry.updated, '%Y-%m-%dT%H:%M:%SZ')
                pdf_url = next(
                    (link.href for link in entry.links if link.type == 'application/pdf'), 
                    ''
                )
                papers.append(Paper(
                    paper_id=entry.id.split('/')[-1],
                    title=entry.title.replace('\n', ' ').strip(),
                    authors=authors,
                    abstract=entry.summary.replace('\n', ' ').strip(),
                    url=entry.id,
                    pdf_url=pdf_url,
                    published_date=published,
                    updated_date=updated,
                    source='arxiv',
                    categories=[tag.term for tag in entry.tags],
                    keywords=[],
                    doi=entry.get('doi', '')
                ))
            except Exception as e:
                logger.warning(f"Error parsing arXiv entry: {e}")
                
        return papers
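Because `query` is passed straight through as `search_query`, arXiv's fielded query syntax works as-is. A small illustrative helper for composing such queries (`build_arxiv_query` is not part of the server):

```python
from urllib.parse import urlencode

# Illustrative helper, not part of the server: composes arXiv fielded
# queries (ti: title, au: author, cat: category).
def build_arxiv_query(title=None, author=None, category=None):
    parts = []
    if title:
        parts.append(f'ti:"{title}"')
    if author:
        parts.append(f"au:{author}")
    if category:
        parts.append(f"cat:{category}")
    return " AND ".join(parts)

query = build_arxiv_query(title="attention", author="vaswani", category="cs.CL")
print(urlencode({"search_query": query, "max_results": 5}))
```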
  • Abstract base class PaperSource defining the interface/schema for all searchers, including ArxivSearcher with search method signature.
    class PaperSource:
        """Abstract base class for paper sources"""
        def search(self, query: str, **kwargs) -> List[Paper]:
            raise NotImplementedError
    
        def download_pdf(self, paper_id: str, save_path: str) -> str:
            raise NotImplementedError
    
        def read_paper(self, paper_id: str, save_path: str) -> str:
            raise NotImplementedError
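A new platform plugs in by subclassing `PaperSource` and registering an instance in `SEARCHERS`. A minimal sketch (`DummySource` is hypothetical, and the base class is repeated here to keep the snippet self-contained):

```python
from typing import List

class PaperSource:
    """Abstract base class for paper sources (as defined above)."""
    def search(self, query: str, **kwargs) -> List[dict]:
        raise NotImplementedError

    def download_pdf(self, paper_id: str, save_path: str) -> str:
        raise NotImplementedError

    def read_paper(self, paper_id: str, save_path: str) -> str:
        raise NotImplementedError

class DummySource(PaperSource):
    """Hypothetical source: returns canned results instead of calling an API."""
    def search(self, query: str, max_results: int = 10, **kwargs) -> List[dict]:
        return [{"paper_id": "dummy.1", "title": query}][:max_results]

# Registering it would make it reachable through _search('dummy', ...):
# SEARCHERS['dummy'] = DummySource()
print(DummySource().search("test"))
```

Unimplemented methods (here `download_pdf` and `read_paper`) still raise `NotImplementedError`, so a partial source can support search-only workflows.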

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/h-lu/paper-search-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.