Paper Search MCP

by openags

search_medrxiv

Search academic papers from medRxiv by medical category to find recent research in specific fields like infectious diseases or oncology.

Instructions

Search academic papers from medRxiv.

Note: medRxiv API filters by category name within the last 30 days, not full-text keyword search. Use a category keyword such as 'infectious_diseases', 'cardiovascular_medicine', 'oncology', etc.

Args:
    query: Category name to filter by (e.g., 'infectious_diseases', 'oncology').
    max_results: Maximum number of papers to return (default: 10).

Returns:
    List of paper metadata in dictionary format.
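
To make the category and date-window behavior concrete, here is a minimal sketch of how a request URL of this shape could be assembled. The helper name `build_medrxiv_url` and the injectable `today` argument are illustrative assumptions, not part of the server. The base URL, the 30-day default, and the lowercase/underscore normalization follow the implementation reference further down this page.

```python
from datetime import date, timedelta
from typing import Optional

# Same endpoint as the implementation reference on this page
BASE_URL = "https://api.biorxiv.org/details/medrxiv"


def build_medrxiv_url(query: str, days: int = 30, cursor: int = 0,
                      today: Optional[date] = None) -> str:
    """Hypothetical helper: build the details URL for one page of results."""
    end = today or date.today()
    start = end - timedelta(days=days)
    # Normalize the category: lowercase, spaces become underscores
    category = query.lower().replace(" ", "_")
    url = f"{BASE_URL}/{start.isoformat()}/{end.isoformat()}/{cursor}"
    if category:
        url += f"?category={category}"
    return url


# A fixed date keeps the example reproducible
example_url = build_medrxiv_url("Infectious Diseases", today=date(2024, 3, 31))
print(example_url)
```

Passing "Infectious Diseases" or 'infectious_diseases' yields the same URL, which is why either spelling should work as the `query` argument.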

Input Schema

Name         Required  Description  Default
query        Yes
max_results  No

Output Schema

Name    Required  Description  Default
result  Yes
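
The output schema only states that a `result` field is required. As a rough illustration of what one entry in that list might contain, the sketch below maps a raw API item to a plain dictionary using the field names that the implementation reference passes to `Paper`. The function `item_to_result` and the sample item (including its DOI) are invented for illustration. The real serialization is done by the project's `Paper` class, which is not shown on this page and may differ.

```python
from datetime import datetime
from typing import Any, Dict


def item_to_result(item: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical mapping from one API item to a result dictionary."""
    version = item.get("version", "1")
    page_url = f"https://www.medrxiv.org/content/{item['doi']}v{version}"
    published = datetime.strptime(item["date"], "%Y-%m-%d")
    return {
        "paper_id": item["doi"],
        "title": item["title"],
        "authors": item["authors"].split("; "),
        "abstract": item["abstract"],
        "url": page_url,
        "pdf_url": f"{page_url}.full.pdf",
        "published_date": published.isoformat(),
        "source": "medrxiv",
        "categories": [item["category"]],
        "doi": item["doi"],
    }


# Invented sample item, used only to show the shape of the output
sample = {
    "doi": "10.1101/2024.01.01.00000000",
    "title": "Example preprint",
    "authors": "Doe, J.; Roe, R.",
    "abstract": "Placeholder abstract.",
    "date": "2024-01-01",
    "version": "2",
    "category": "oncology",
}
result = item_to_result(sample)
print(result["url"])
```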

Implementation Reference

  • Implementation of the MedRxivSearcher class which performs the actual API calls to search for papers on medRxiv.
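    from datetime import datetime, timedelta
    from typing import List

    import requests

    # Paper and PaperSource are defined elsewhere in the project.


    class MedRxivSearcher(PaperSource):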
        """Searcher for medRxiv papers"""
        BASE_URL = "https://api.biorxiv.org/details/medrxiv"
    
        def __init__(self):
            self.session = requests.Session()
            self.session.proxies = {'http': None, 'https': None}
            self.timeout = 30
            self.max_retries = 3
    
        def search(self, query: str, max_results: int = 10, days: int = 30) -> List[Paper]:
            """
            Search for papers on medRxiv by category within the last N days.
    
            Args:
                query: Category name to search for (e.g., "cardiovascular medicine").
                max_results: Maximum number of papers to return.
                days: Number of days to look back for papers.
    
            Returns:
                List of Paper objects matching the category within the specified date range.
            """
            # Calculate date range: last N days
            end_date = datetime.now().strftime('%Y-%m-%d')
            start_date = (datetime.now() - timedelta(days=days)).strftime('%Y-%m-%d')
            
            # Format category: lowercase and replace spaces with underscores
            category = query.lower().replace(' ', '_')
            
            papers = []
            cursor = 0
            while len(papers) < max_results:
                url = f"{self.BASE_URL}/{start_date}/{end_date}/{cursor}"
                if category:
                    url += f"?category={category}"
    
                tries = 0
                has_more = False  # Set to True only when a full page was fetched
                while tries < self.max_retries:
                    try:
                        response = self.session.get(url, timeout=self.timeout)
                        response.raise_for_status()
                        data = response.json()
                        collection = data.get('collection', [])
                        for item in collection:
                            try:
                                date = datetime.strptime(item['date'], '%Y-%m-%d')
                                papers.append(Paper(
                                    paper_id=item['doi'],
                                    title=item['title'],
                                    authors=item['authors'].split('; '),
                                    abstract=item['abstract'],
                                    url=f"https://www.medrxiv.org/content/{item['doi']}v{item.get('version', '1')}",
                                    pdf_url=f"https://www.medrxiv.org/content/{item['doi']}v{item.get('version', '1')}.full.pdf",
                                    published_date=date,
                                    updated_date=date,
                                    source="medrxiv",
                                    categories=[item['category']],
                                    keywords=[],
                                    doi=item['doi']
                                ))
                            except Exception as e:
                                # Skip malformed entries instead of failing the whole page
                                print(f"Error parsing medRxiv entry: {e}")
                        if len(collection) >= 100:
                            # A full page may be followed by more results
                            has_more = True
                            cursor += 100
                        break  # Exit retry loop on success
                    except requests.exceptions.RequestException as e:
                        tries += 1
                        if tries == self.max_retries:
                            print(f"Failed to connect to medRxiv API after {self.max_retries} attempts: {e}")
                            break
                        print(f"Attempt {tries} failed, retrying...")

                # Stop paging on a short page or after the final failed retry.
                # (A while/else here would never reach its else branch, because
                # every exit from the retry loop is a break.)
                if not has_more:
                    break
    
            return papers[:max_results]
  • MCP tool handler `search_medrxiv` that wraps the MedRxivSearcher service.
    async def search_medrxiv(query: str, max_results: int = 10) -> List[Dict]:
        """Search academic papers from medRxiv.
    
        Note: medRxiv API filters by category name within the last 30 days, not full-text
        keyword search. Use a category keyword such as 'infectious_diseases',
        'cardiovascular_medicine', 'oncology', etc.
    
        Args:
            query: Category name to filter by (e.g., 'infectious_diseases', 'oncology').
            max_results: Maximum number of papers to return (default: 10).
        Returns:
            List of paper metadata in dictionary format.
        """
        papers = await async_search(medrxiv_searcher, query, max_results)
        return papers if papers else []
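
The searcher is written to page through the API with a cursor that advances by 100 and to stop once a page comes back short. The sketch below isolates that loop so it can be run without network access. `collect_pages`, `fake_fetch`, and the fake data source are illustrative names rather than project code, and the page size of 100 is taken from the listing above.

```python
from typing import Callable, List

PAGE_SIZE = 100  # page size used by the listing above


def collect_pages(fetch_page: Callable[[int], List[dict]], max_results: int) -> List[dict]:
    """Hypothetical helper: gather items page by page until enough are collected."""
    items: List[dict] = []
    cursor = 0
    while len(items) < max_results:
        page = fetch_page(cursor)
        items.extend(page)
        if len(page) < PAGE_SIZE:
            break  # a short page means there is nothing left to fetch
        cursor += PAGE_SIZE
    return items[:max_results]


# Fake data source with 250 records standing in for the remote API
records = [{"id": i} for i in range(250)]
requested_cursors: List[int] = []


def fake_fetch(cursor: int) -> List[dict]:
    """Return one page of fake records and remember which cursor was requested."""
    requested_cursors.append(cursor)
    return records[cursor:cursor + PAGE_SIZE]


papers = collect_pages(fake_fetch, max_results=150)
print(len(papers), requested_cursors)
```

Injecting the fetch function keeps the paging logic separate from HTTP and retry handling, which makes the stopping conditions easy to check in isolation.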
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behavioral traits: it's a search operation (implied read-only), specifies the API's filtering mechanism (category-based, not full-text), and mentions the time constraint (last 30 days). It doesn't cover rate limits, authentication needs, or pagination, but provides substantial operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded: the first sentence states the core purpose, followed by a critical note about API behavior, then clear parameter documentation. Every sentence adds value with zero waste. The bullet-like format for Args/Returns enhances readability without verbosity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (two parameters, one search operation), the absence of annotations, and the presence of an output schema, the description is complete enough. It covers purpose, usage constraints, parameter semantics, and return format. Because the output schema handles return value details, the description appropriately focuses on operational context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It adds significant meaning beyond the bare schema: explains that 'query' expects category names like 'infectious_diseases' (not arbitrary keywords), provides examples, and clarifies that 'max_results' has a default of 10. This transforms generic parameter names into actionable understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches academic papers from medRxiv, which is a specific verb+resource combination. However, it doesn't explicitly differentiate from sibling tools like 'search_biorxiv' or 'search_arxiv' beyond mentioning the medRxiv source. The purpose is clear but lacks sibling differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool versus alternatives: it notes that the medRxiv API filters by category name within the last 30 days, not full-text keyword search. This clearly distinguishes it from tools that might offer different search capabilities. The guidance is specific and helpful for correct tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/openags/paper-search-mcp'
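
For readers who prefer Python to curl, the same request can be sketched with the standard library. The function names below are illustrative, and treating the response body as JSON is an assumption, since the response format is not described on this page.

```python
import json
import urllib.request

SERVER_URL = "https://glama.ai/api/mcp/v1/servers/openags/paper-search-mcp"


def build_request(url: str = SERVER_URL) -> urllib.request.Request:
    """Build a GET request for the URL used in the curl command."""
    return urllib.request.Request(url, method="GET")


def fetch_server_info(url: str = SERVER_URL, timeout: float = 30.0) -> dict:
    """Send the request and decode the body, assuming the API returns JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_server_info()` performs the network request; `build_request()` alone only constructs it.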
