
ArXiv MCP Server

by huanongfish

search_papers

Searches arXiv research papers by query, filters them by date and category, and returns the results for academic research.

Instructions

Search for papers on arXiv with advanced filtering

Input Schema

Name         Required  Description  Default
query        Yes
max_results  No
date_from    No
date_to      No
categories   No
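
As an illustration of how these five parameters fit together, here is a hypothetical arguments payload with a small stand-alone check against the types declared in the tool's schema. The `validate_arguments` helper and the sample values are assumptions made for this example; they are not part of the server.

```python
from typing import Any, Dict, List

# Parameter types as declared in the tool's input schema; only `query` is required.
SCHEMA_TYPES = {
    "query": str,
    "max_results": int,
    "date_from": str,
    "date_to": str,
    "categories": list,
}
REQUIRED = ["query"]


def validate_arguments(arguments: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    for name in REQUIRED:
        if name not in arguments:
            problems.append(f"missing required parameter: {name}")
    for name, value in arguments.items():
        expected = SCHEMA_TYPES.get(name)
        if expected is None:
            continue  # the schema does not forbid extra properties
        # bool is a subclass of int in Python, but not a JSON integer
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{name} should be of type {expected.__name__}")
    # The schema requires every item of `categories` to be a string.
    categories = arguments.get("categories")
    if isinstance(categories, list) and not all(isinstance(c, str) for c in categories):
        problems.append("categories should contain only strings")
    return problems


# Hypothetical sample payload (values chosen for illustration only).
example_arguments = {
    "query": "graph neural networks",
    "max_results": 5,
    "date_from": "2024-01-01",
    "date_to": "2024-06-30",
    "categories": ["cs.LG", "stat.ML"],
}
```

Calling `validate_arguments(example_arguments)` returns an empty list, while a payload without `query` is reported as missing its required parameter.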

Implementation Reference

  • The handler function that executes the search_papers tool. It searches arXiv using the arxiv library, applies filters for dates and categories, processes results, and returns JSON-formatted paper information.
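    import json
    from datetime import timezone
    from typing import Any, Dict, List

    import arxiv
    import mcp.types as types
    from dateutil import parser

    # `settings`, `_is_within_date_range`, and `_process_paper` are defined elsewhere in the project.
    async def handle_search(arguments: Dict[str, Any]) -> List[types.TextContent]: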
        """Handle paper search requests."""
        try:
            client = arxiv.Client()
            max_results = min(int(arguments.get("max_results", 10)), settings.MAX_RESULTS)
    
            # Build search query with category filtering
            query = arguments["query"]
            if categories := arguments.get("categories"):
                category_filter = " OR ".join(f"cat:{cat}" for cat in categories)
                query = f"({query}) AND ({category_filter})"
    
            search = arxiv.Search(
                query=query,
                max_results=max_results,
                sort_by=arxiv.SortCriterion.SubmittedDate,
            )
    
            # Process results with date filtering
            results = []
            try:
                date_from = (
                    parser.parse(arguments["date_from"]).replace(tzinfo=timezone.utc)
                    if "date_from" in arguments
                    else None
                )
                date_to = (
                    parser.parse(arguments["date_to"]).replace(tzinfo=timezone.utc)
                    if "date_to" in arguments
                    else None
                )
            except (ValueError, TypeError) as e:
                return [
                    types.TextContent(
                        type="text", text=f"Error: Invalid date format - {str(e)}"
                    )
                ]
    
            for paper in client.results(search):
                if _is_within_date_range(paper.published, date_from, date_to):
                    results.append(_process_paper(paper))
    
                if len(results) >= max_results:
                    break
    
            response_data = {"total_results": len(results), "papers": results}
    
            return [
                types.TextContent(type="text", text=json.dumps(response_data, indent=2))
            ]
    
        except Exception as e:
            return [types.TextContent(type="text", text=f"Error: {str(e)}")]
  • Defines the Tool object for 'search_papers' including name, description, and input schema for parameters like query, max_results, date_from, date_to, categories.
    search_tool = types.Tool(
        name="search_papers",
        description="Search for papers on arXiv with advanced filtering",
        inputSchema={
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "max_results": {"type": "integer"},
                "date_from": {"type": "string"},
                "date_to": {"type": "string"},
                "categories": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["query"],
        },
    )
  • Registers the search_tool (and others) by returning it in the list_tools() method, making it discoverable by MCP clients.
    @server.list_tools()
    async def list_tools() -> List[types.Tool]:
        """List available arXiv research tools."""
        return [search_tool, download_tool, list_tool, read_tool]
  • The call_tool handler dispatches to handle_search when name is 'search_papers'; this dispatch is what binds the handler implementation to the tool name.
    @server.call_tool()
    async def call_tool(name: str, arguments: Dict[str, Any]) -> List[types.TextContent]:
        """Handle tool calls for arXiv research functionality."""
        logger.debug(f"Calling tool {name} with arguments {arguments}")
        try:
            if name == "search_papers":
                return await handle_search(arguments)
            elif name == "download_paper":
                return await handle_download(arguments)
            elif name == "list_papers":
                return await handle_list_papers(arguments)
            elif name == "read_paper":
                return await handle_read_paper(arguments)
            else:
                return [types.TextContent(type="text", text=f"Error: Unknown tool {name}")]
        except Exception as e:
            logger.error(f"Tool error: {str(e)}")
            return [types.TextContent(type="text", text=f"Error: {str(e)}")]
  • Helper function to process arXiv paper result into a standardized dictionary with fields like id, title, authors, etc., including resource URI.
    def _process_paper(paper: arxiv.Result) -> Dict[str, Any]:
        """Process paper information with resource URI."""
        return {
            "id": paper.get_short_id(),
            "title": paper.title,
            "authors": [author.name for author in paper.authors],
            "abstract": paper.summary,
            "categories": paper.categories,
            "published": paper.published.isoformat(),
            "url": paper.pdf_url,
            "resource_uri": f"arxiv://{paper.get_short_id()}",
        }
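
The handler's two filtering steps can be reproduced in isolation. The sketch below rebuilds the category query string the same way `handle_search` does, and offers one plausible shape for `_is_within_date_range`, which the handler calls but the reference does not show. The function bodies and the names `build_query` and `is_within_date_range` are assumptions for illustration, not the server's actual code; in particular, whether the real bounds are inclusive is not known from this page.

```python
from datetime import datetime, timezone
from typing import List, Optional


def build_query(query: str, categories: Optional[List[str]] = None) -> str:
    """Combine the free-text query with an OR-joined category filter."""
    if categories:
        category_filter = " OR ".join(f"cat:{cat}" for cat in categories)
        return f"({query}) AND ({category_filter})"
    return query


def is_within_date_range(
    published: datetime,
    date_from: Optional[datetime],
    date_to: Optional[datetime],
) -> bool:
    """Assumed behaviour: inclusive bounds; a missing bound means unbounded."""
    if date_from is not None and published < date_from:
        return False
    if date_to is not None and published > date_to:
        return False
    return True


print(build_query("transformers", ["cs.CL", "cs.LG"]))
# (transformers) AND (cat:cs.CL OR cat:cs.LG)

published = datetime(2024, 3, 1, tzinfo=timezone.utc)
lower_bound = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(is_within_date_range(published, lower_bound, None))
# True
```

Because the date check in `handle_search` runs on results that arXiv has already capped at `max_results` and sorted by submission date, a narrow or older date window may return fewer papers than requested, or none.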
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions 'advanced filtering' but fails to detail key traits such as rate limits, authentication needs, pagination behavior, or what 'advanced' entails. This is inadequate for a search tool with 5 parameters and no output schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose. It avoids redundancy and waste, though it could be more structured by briefly outlining filter types. Overall, it earns its place without verbosity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 parameters, 0% schema coverage, no output schema, no annotations), the description is insufficient. It lacks details on return values, error handling, or practical usage examples, leaving significant gaps for the agent to operate effectively in a context with sibling tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
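
Since the tool has no output schema, a caller has to infer the response shape from the handler: on success the text content is JSON with `total_results` and `papers`, and on failure it is a plain string beginning with `Error:`. A minimal sketch of client-side handling under that reading, with `parse_search_response` and `SearchError` as hypothetical names:

```python
import json
from typing import Any, Dict, List


class SearchError(Exception):
    """Raised when the tool returned an error string instead of JSON."""


def parse_search_response(text: str) -> List[Dict[str, Any]]:
    """Return the list of paper dictionaries from the tool's text content."""
    # The handler reports failures as "Error: <message>" rather than raising.
    if text.startswith("Error:"):
        raise SearchError(text[len("Error:"):].strip())
    data = json.loads(text)
    return data["papers"]
```

Each returned dictionary is expected to carry the keys produced by `_process_paper`: `id`, `title`, `authors`, `abstract`, `categories`, `published`, `url`, and `resource_uri`.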

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate for undocumented parameters. It only vaguely references 'advanced filtering' without explaining what parameters like 'categories' or date ranges entail, how 'query' is interpreted, or default behaviors for 'max_results'. This adds minimal value beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
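
One way to close this gap is to attach a `description` to each property. The sketch below is a suggested revision, not the server's current schema; the wording is inferred from the handler in the implementation reference (a default of 10 results, dates parsed and then treated as UTC, categories OR-ed together as `cat:` terms).

```python
# Suggested revision of the input schema; the description texts are assumptions
# derived from reading the handler, not text published by the server.
documented_input_schema = {
    "type": "object",
    "properties": {
        "query": {
            "type": "string",
            "description": (
                "Search expression passed to arXiv. It is combined with any "
                "category filter using AND."
            ),
        },
        "max_results": {
            "type": "integer",
            "description": (
                "Maximum number of papers to return. Defaults to 10 and is "
                "capped by the server's MAX_RESULTS setting."
            ),
        },
        "date_from": {
            "type": "string",
            "description": (
                "Lower bound on the paper's published date, for example "
                "'2024-01-01'. Interpreted as UTC."
            ),
        },
        "date_to": {
            "type": "string",
            "description": (
                "Upper bound on the paper's published date, for example "
                "'2024-06-30'. Interpreted as UTC."
            ),
        },
        "categories": {
            "type": "array",
            "items": {"type": "string"},
            "description": (
                "arXiv category codes such as 'cs.LG'. A paper matching any "
                "listed category is included."
            ),
        },
    },
    "required": ["query"],
}
```

The structure is unchanged from the published schema, so a dictionary like this could be passed as `inputSchema` without affecting existing callers.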

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Search for papers') and resource ('on arXiv'), specifying 'with advanced filtering' to indicate capability beyond basic search. It distinguishes from siblings like 'list_papers' by emphasizing search functionality, though it doesn't explicitly contrast with 'download_paper' or 'read_paper'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like 'list_papers' or 'download_paper'. The description implies usage for filtered searches but lacks explicit context, prerequisites, or exclusions, leaving the agent to infer based on tool names alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/huanongfish/arxiv-mcp'
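
The same request can be issued from Python's standard library. This is a sketch only: the response format is not described on this page, so the code assumes a JSON body, and `server_url` and `fetch_server_info` are hypothetical helper names.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one server."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server_info(owner: str, name: str):
    """GET the server record; assumes the API answers with a JSON body."""
    request = urllib.request.Request(server_url(owner, name), method="GET")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

For this server, `fetch_server_info("huanongfish", "arxiv-mcp")` targets the same URL as the curl command above.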
