Glama

search_papers

Search arXiv papers with advanced filtering by query, category, date, and author to find relevant academic research efficiently.

Instructions

Search for papers on arXiv with advanced filtering and query optimization.

QUERY CONSTRUCTION GUIDELINES:

  • Use QUOTED PHRASES for exact matches: "multi-agent systems", "neural networks", "machine learning"

  • Combine related concepts with OR: "AI agents" OR "software agents" OR "intelligent agents"

  • Use field-specific searches for precision:

    • ti:"exact title phrase" - search in titles only

    • au:"author name" - search by author

    • abs:"keyword" - search in abstracts only

  • Use ANDNOT to exclude unwanted results: "machine learning" ANDNOT "survey"

  • For best results, use 2-4 core concepts rather than long keyword lists
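If you build queries programmatically, the guidelines above map onto a few small string helpers. This is a minimal sketch, assuming you assemble the query in Python before calling the tool. The helper names `phrase`, `field`, `any_of`, and `exclude` are hypothetical and are not part of the server.

```python
def phrase(text: str) -> str:
    """Wrap a multi-word concept in double quotes for an exact-phrase match."""
    # Strip stray quotes so the phrase cannot break the surrounding syntax.
    return '"' + text.replace('"', "").strip() + '"'


def field(prefix: str, text: str) -> str:
    """Restrict a phrase to one field: ti (title), au (author), or abs (abstract)."""
    return f"{prefix}:{phrase(text)}"


def any_of(*terms: str) -> str:
    """Join alternative terms with OR, grouped in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def exclude(base: str, *unwanted: str) -> str:
    """Drop results matching any unwanted term using ANDNOT."""
    if not unwanted:
        return base
    blocked = unwanted[0] if len(unwanted) == 1 else any_of(*unwanted)
    return f"{base} ANDNOT {blocked}"


# Usage: rebuild some of the documented patterns.
print(any_of(phrase("AI agents"), phrase("software agents"), phrase("intelligent agents")))
# → ("AI agents" OR "software agents" OR "intelligent agents")
print(exclude(phrase("machine learning"), phrase("survey")))
# → "machine learning" ANDNOT "survey"
print(field("ti", "exact title phrase"))
# → ti:"exact title phrase"
```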

ADVANCED SEARCH PATTERNS:

  • Field + phrase: ti:"transformer architecture" for papers with exact title phrase

  • Multiple fields: au:"Smith" AND ti:"quantum" for author Smith's quantum papers

  • Exclusions: "deep learning" ANDNOT ("survey" OR "review") to exclude survey papers

  • Broad + narrow: "artificial intelligence" AND (robotics OR "computer vision")
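Nested patterns such as the exclusion and broad + narrow examples are easy to mistype. As a hedged sketch, a client could check that quotes are paired and parentheses are balanced before calling the tool. The `is_well_formed` helper is hypothetical, and it only checks syntax, not whether arXiv accepts the query.

```python
def is_well_formed(query: str) -> bool:
    """Return True if double quotes are paired and parentheses are balanced.

    A double quote toggles an "inside a phrase" state; parentheses inside a
    quoted phrase are treated as literal text and not counted.
    """
    depth = 0
    in_quotes = False
    for ch in query:
        if ch == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" appeared before its matching "("
                return False
    return depth == 0 and not in_quotes


print(is_well_formed('"deep learning" ANDNOT ("survey" OR "review")'))  # → True
print(is_well_formed('ti:"transformer architecture'))  # → False
```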

CATEGORY FILTERING (highly recommended for relevance):

  • cs.AI: Artificial Intelligence

  • cs.MA: Multi-Agent Systems

  • cs.LG: Machine Learning

  • cs.CL: Computation and Language (NLP)

  • cs.CV: Computer Vision

  • cs.RO: Robotics

  • cs.HC: Human-Computer Interaction

  • cs.CR: Cryptography and Security

  • cs.DB: Databases
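The handler validates categories with `_validate_categories`, whose body is not shown on this page, and then ORs them into a `cat:` filter. The sketch below is a hypothetical stand-in for that validation. It only checks that each entry has the shape of an arXiv category (for example `cs.AI`, `hep-th`, `cond-mat.mes-hall`), not that the category actually exists. The filter construction follows the `" OR ".join(f"cat:{cat}" ...)` pattern used in the handler.

```python
import re
from typing import List

# Hypothetical shape check: an archive such as "cs", "math" or "astro-ph",
# optionally followed by "." and a subject class such as "AI" or "mes-hall".
CATEGORY_PATTERN = re.compile(r"[a-z]+(?:-[a-z]+)?(?:\.[A-Za-z][A-Za-z-]*)?")


def validate_categories(categories: List[str]) -> bool:
    """Return True if the list is non-empty and every entry looks like an arXiv category."""
    return bool(categories) and all(
        isinstance(c, str) and CATEGORY_PATTERN.fullmatch(c) is not None
        for c in categories
    )


def category_filter(categories: List[str]) -> str:
    """OR the categories together so a paper listed in any of them matches."""
    return "(" + " OR ".join(f"cat:{c}" for c in categories) + ")"


print(category_filter(["cs.AI", "cs.MA"]))  # → (cat:cs.AI OR cat:cs.MA)
```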

EXAMPLES OF EFFECTIVE QUERIES:

  • ti:"reinforcement learning" with categories: ["cs.LG", "cs.AI"] - for RL papers by title

  • au:"Hinton" AND "deep learning" with categories: ["cs.LG"] - for Hinton's deep learning work

  • "multi-agent" ANDNOT "survey" with categories: ["cs.MA"] - exclude survey papers

  • abs:"transformer" AND ti:"attention" with categories: ["cs.CL"] - attention papers with transformer abstracts
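Each example pairs a query with a categories list. A minimal sketch of how the two could combine into one arXiv search string, assuming the handler's approach of parenthesizing each part and ORing the categories. The function `build_final_query` is hypothetical. It skips the handler's `_optimize_query` step, which is not shown on this page, and it joins the parts with an explicit `AND`, one of the Boolean operators the arXiv API documents.

```python
from typing import List, Optional


def build_final_query(query: str, categories: Optional[List[str]] = None) -> str:
    """Combine a text query and a category filter into one arXiv search string."""
    parts = []
    if query.strip():
        parts.append(f"({query.strip()})")
    if categories:
        parts.append("(" + " OR ".join(f"cat:{c}" for c in categories) + ")")
    if not parts:
        raise ValueError("No search criteria provided")
    # An explicit AND makes the category filter narrow the text query.
    return " AND ".join(parts)


print(build_final_query('ti:"reinforcement learning"', ["cs.LG", "cs.AI"]))
# → (ti:"reinforcement learning") AND (cat:cs.LG OR cat:cs.AI)
```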

DATE FILTERING: Use YYYY-MM-DD format for historical research:

  • date_to: "2015-12-31" - for foundational/classic work (pre-2016)

  • date_from: "2020-01-01" - for recent developments (2020 onward)

  • Both together for specific time periods
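Date bounds are given as calendar days. The following hedged sketch shows an inclusive, day-level check of a paper's UTC publication date against both bounds. The helper `in_date_range` is hypothetical. Comparing `date` objects rather than midnight timestamps means a `date_to` of "2015-12-31" still keeps papers published later that same day.

```python
from datetime import date, datetime, timezone
from typing import Optional


def in_date_range(
    published: datetime,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
) -> bool:
    """Return True if the publication day lies within [date_from, date_to], both inclusive."""
    if published.tzinfo is None:
        # Treat naive timestamps as UTC.
        published = published.replace(tzinfo=timezone.utc)
    day = published.astimezone(timezone.utc).date()
    if date_from and day < date.fromisoformat(date_from):
        return False
    if date_to and day > date.fromisoformat(date_to):
        return False
    return True


late_on_end_day = datetime(2015, 12, 31, 18, 30, tzinfo=timezone.utc)
print(in_date_range(late_on_end_day, date_to="2015-12-31"))  # → True
```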

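RESULT QUALITY: By default (sort_by: "relevance"), results are sorted by RELEVANCE (most relevant papers first) rather than newest first. This ensures you get the most pertinent results regardless of publication date; set sort_by to "date" when you want the newest papers first.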

TIPS FOR FOUNDATIONAL RESEARCH:

  • Use date_to: "2010-12-31" to find classic papers on BDI, SOAR, ACT-R

  • Combine with field searches: ti:"BDI" AND abs:"belief desire intention"

  • Try author searches: au:"Rao" AND "BDI" for Anand Rao's foundational BDI work
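Put together, the tips above translate into tool-call arguments like the ones below. This is an illustrative sketch only: the argument names come from the input schema, but the specific combinations (including the cs.AI/cs.MA categories for agent research) are examples, not recommendations from the server.

```python
import json

# Hypothetical tool-call arguments combining the foundational-research tips:
# an author search, a title/abstract field search, and an upper date bound.
foundational_searches = [
    {
        "query": 'au:"Rao" AND "BDI"',
        "date_to": "2010-12-31",
        "categories": ["cs.AI", "cs.MA"],
        "max_results": 15,
    },
    {
        "query": 'ti:"BDI" AND abs:"belief desire intention"',
        "date_to": "2010-12-31",
        "sort_by": "relevance",
    },
]

for args in foundational_searches:
    print(json.dumps(args))
```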

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| query | Yes | Search query using quoted phrases for exact matches (e.g., '"machine learning" OR "deep learning"') or specific technical terms. Avoid overly broad or generic terms. | |
| max_results | No | Maximum number of results to return (default: 10, max: 50). Use 15-20 for comprehensive searches. | 10 |
| date_from | No | Start date for papers (YYYY-MM-DD format). Use to find recent work, e.g., '2023-01-01' for the last 2 years. | |
| date_to | No | End date for papers (YYYY-MM-DD format). Use with date_from to find historical work, e.g., '2020-12-31' for older research. | |
| categories | No | Strongly recommended: arXiv categories to focus the search (e.g., ['cs.AI', 'cs.MA'] for agent research, ['cs.LG'] for ML, ['cs.CL'] for NLP, ['cs.CV'] for vision). Greatly improves relevance. | |
| sort_by | No | Sort results by 'relevance' (most relevant first, default) or 'date' (newest first). Use 'relevance' for focused searches, 'date' for recent developments. | relevance |
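The schema marks only `query` as required. As a hedged client-side sketch, a caller could apply the documented defaults and reject malformed values before invoking the tool. The function `normalize_arguments` and the `MAX_RESULTS_CAP` constant are hypothetical. The server itself clamps `max_results` to `settings.MAX_RESULTS`, whose value is not shown here.

```python
from datetime import datetime
from typing import Any, Dict

# Assumed cap taken from the "max: 50" note; the server itself clamps to
# settings.MAX_RESULTS, whose value is not shown on this page.
MAX_RESULTS_CAP = 50


def normalize_arguments(args: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the documented defaults and reject malformed values (hypothetical helper)."""
    query = args.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required and must be a non-empty string")

    out = dict(args)
    # Default of 10, clamped to the range 1..MAX_RESULTS_CAP.
    out["max_results"] = max(1, min(int(args.get("max_results", 10)), MAX_RESULTS_CAP))

    out.setdefault("sort_by", "relevance")
    if out["sort_by"] not in ("relevance", "date"):
        raise ValueError("sort_by must be 'relevance' or 'date'")

    for key in ("date_from", "date_to"):
        if key in out:
            # strptime raises ValueError unless the string is a valid %Y-%m-%d date.
            datetime.strptime(out[key], "%Y-%m-%d")

    categories = out.get("categories", [])
    if not isinstance(categories, list) or not all(isinstance(c, str) for c in categories):
        raise ValueError("categories must be a list of strings")
    return out


print(normalize_arguments({"query": '"neural networks"', "max_results": 200}))
# → {'query': '"neural networks"', 'max_results': 50, 'sort_by': 'relevance'}
```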

Implementation Reference

  • The main handler function that executes the search_papers tool logic. It uses the arxiv Python library to query arXiv, adds category filters to the query, sorts results by relevance or submission date, applies date filters client-side to the returned results, processes paper metadata, and returns a JSON-formatted list of papers.
    async def handle_search(arguments: Dict[str, Any]) -> List[types.TextContent]:
        """Handle paper search requests with improved arXiv API integration."""
        try:
            client = arxiv.Client()
            max_results = min(int(arguments.get("max_results", 10)), settings.MAX_RESULTS)
            base_query = arguments["query"]
    
            logger.debug(
                f"Starting search with query: '{base_query}', max_results: {max_results}"
            )
    
            # Build query components
            query_parts = []
    
            # Add base query with optimization
            if base_query.strip():
                optimized_query = _optimize_query(base_query)
                query_parts.append(f"({optimized_query})")
                if optimized_query != base_query:
                    logger.debug(f"Optimized query: '{base_query}' -> '{optimized_query}'")
    
            # Add category filtering
            if categories := arguments.get("categories"):
                if not _validate_categories(categories):
                    return [
                        types.TextContent(
                            type="text",
                            text="Error: Invalid category provided. Please check arXiv category names.",
                        )
                    ]
                category_filter = " OR ".join(f"cat:{cat}" for cat in categories)
                query_parts.append(f"({category_filter})")
                logger.debug(f"Added category filter: {category_filter}")
    
            # Add date filtering using arXiv API syntax
            # Temporarily disable server-side date filtering due to API issues
            # Will filter client-side for now
            date_from_arg = arguments.get("date_from")
            date_to_arg = arguments.get("date_to")
            if date_from_arg or date_to_arg:
                logger.debug(f"Date filtering requested: {date_from_arg} to {date_to_arg}")
                # We'll handle this client-side after getting results
    
            # Combine query parts
            if not query_parts:
                return [
                    types.TextContent(
                        type="text", text="Error: No search criteria provided"
                    )
                ]
    
            # Combine query parts with an explicit AND so the category filter
            # always narrows the text query
            final_query = " AND ".join(query_parts)
            logger.debug(f"Final arXiv query: {final_query}")
    
            # Request a few extra results because client-side date filtering may
            # discard some, but cap the request to avoid overwhelming the API
            api_max_results = min(max_results + 5, settings.MAX_RESULTS)
    
            # Determine sort method
            sort_by_arg = arguments.get("sort_by", "relevance")
            if sort_by_arg == "date":
                sort_criterion = arxiv.SortCriterion.SubmittedDate
                logger.debug("Using date sorting (newest first)")
            else:
                sort_criterion = arxiv.SortCriterion.Relevance
                logger.debug("Using relevance sorting (most relevant first)")
    
            search = arxiv.Search(
                query=final_query,
                max_results=api_max_results,
                sort_by=sort_criterion,
            )
    
            # Process results with client-side date filtering
            results = []
            result_count = 0
    
            # Parse date filters if provided
            date_from_parsed = None
            date_to_parsed = None
            if date_from_arg:
                try:
                    date_from_parsed = parser.parse(date_from_arg).replace(
                        tzinfo=timezone.utc
                    )
                except (ValueError, TypeError) as e:
                    return [
                        types.TextContent(
                            type="text", text=f"Error: Invalid date_from format - {str(e)}"
                        )
                    ]
    
            if date_to_arg:
                try:
                    date_to_parsed = parser.parse(date_to_arg).replace(tzinfo=timezone.utc)
                except (ValueError, TypeError) as e:
                    return [
                        types.TextContent(
                            type="text", text=f"Error: Invalid date_to format - {str(e)}"
                        )
                    ]
    
            for paper in client.results(search):
                if result_count >= max_results:
                    break
    
                # Apply client-side date filtering
                paper_date = paper.published
                if not paper_date.tzinfo:
                    paper_date = paper_date.replace(tzinfo=timezone.utc)
    
                if date_from_parsed and paper_date < date_from_parsed:
                    continue
                # Compare calendar days so that date_to covers the whole end date
                if date_to_parsed and paper_date.date() > date_to_parsed.date():
                    continue
    
                results.append(_process_paper(paper))
                result_count += 1
    
            logger.info(f"Search completed: {len(results)} results returned")
            response_data = {"total_results": len(results), "papers": results}
    
            return [
                types.TextContent(type="text", text=json.dumps(response_data, indent=2))
            ]
    
        except arxiv.ArxivError as e:
            logger.error(f"ArXiv API error: {e}")
            return [
                types.TextContent(type="text", text=f"Error: ArXiv API error - {str(e)}")
            ]
        except Exception as e:
            logger.error(f"Unexpected search error: {e}")
            return [types.TextContent(type="text", text=f"Error: {str(e)}")]
  • Defines the Tool object for search_papers, including a detailed description and an input schema with parameters for query, max_results, date filters, categories, and sort_by.
    search_tool = types.Tool(
        name="search_papers",
        description="""Search for papers on arXiv with advanced filtering and query optimization.
    
    QUERY CONSTRUCTION GUIDELINES:
    - Use QUOTED PHRASES for exact matches: "multi-agent systems", "neural networks", "machine learning"
    - Combine related concepts with OR: "AI agents" OR "software agents" OR "intelligent agents"  
    - Use field-specific searches for precision:
      - ti:"exact title phrase" - search in titles only
      - au:"author name" - search by author
      - abs:"keyword" - search in abstracts only
    - Use ANDNOT to exclude unwanted results: "machine learning" ANDNOT "survey"
    - For best results, use 2-4 core concepts rather than long keyword lists
    
    ADVANCED SEARCH PATTERNS:
    - Field + phrase: ti:"transformer architecture" for papers with exact title phrase
    - Multiple fields: au:"Smith" AND ti:"quantum" for author Smith's quantum papers  
    - Exclusions: "deep learning" ANDNOT ("survey" OR "review") to exclude survey papers
    - Broad + narrow: "artificial intelligence" AND (robotics OR "computer vision")
    
    CATEGORY FILTERING (highly recommended for relevance):
    - cs.AI: Artificial Intelligence
    - cs.MA: Multi-Agent Systems  
    - cs.LG: Machine Learning
    - cs.CL: Computation and Language (NLP)
    - cs.CV: Computer Vision
    - cs.RO: Robotics
    - cs.HC: Human-Computer Interaction
    - cs.CR: Cryptography and Security
    - cs.DB: Databases
    
    EXAMPLES OF EFFECTIVE QUERIES:
    - ti:"reinforcement learning" with categories: ["cs.LG", "cs.AI"] - for RL papers by title
    - au:"Hinton" AND "deep learning" with categories: ["cs.LG"] - for Hinton's deep learning work
    - "multi-agent" ANDNOT "survey" with categories: ["cs.MA"] - exclude survey papers
    - abs:"transformer" AND ti:"attention" with categories: ["cs.CL"] - attention papers with transformer abstracts
    
    DATE FILTERING: Use YYYY-MM-DD format for historical research:
    - date_to: "2015-12-31" - for foundational/classic work (pre-2016)
    - date_from: "2020-01-01" - for recent developments (2020 onward)
    - Both together for specific time periods
    
    RESULT QUALITY: By default, results are sorted by RELEVANCE (most relevant papers first), not just newest papers.
    This ensures you get the most pertinent results regardless of publication date; use sort_by='date' for newest first.
    
    TIPS FOR FOUNDATIONAL RESEARCH:
    - Use date_to: "2010-12-31" to find classic papers on BDI, SOAR, ACT-R
    - Combine with field searches: ti:"BDI" AND abs:"belief desire intention"  
    - Try author searches: au:"Rao" AND "BDI" for Anand Rao's foundational BDI work""",
        inputSchema={
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": 'Search query using quoted phrases for exact matches (e.g., \'"machine learning" OR "deep learning"\') or specific technical terms. Avoid overly broad or generic terms.',
                },
                "max_results": {
                    "type": "integer",
                    "description": "Maximum number of results to return (default: 10, max: 50). Use 15-20 for comprehensive searches.",
                },
                "date_from": {
                    "type": "string",
                    "description": "Start date for papers (YYYY-MM-DD format). Use to find recent work, e.g., '2023-01-01' for last 2 years.",
                },
                "date_to": {
                    "type": "string",
                    "description": "End date for papers (YYYY-MM-DD format). Use with date_from to find historical work, e.g., '2020-12-31' for older research.",
                },
                "categories": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Strongly recommended: arXiv categories to focus search (e.g., ['cs.AI', 'cs.MA'] for agent research, ['cs.LG'] for ML, ['cs.CL'] for NLP, ['cs.CV'] for vision). Greatly improves relevance.",
                },
                "sort_by": {
                    "type": "string",
                    "enum": ["relevance", "date"],
                    "description": "Sort results by 'relevance' (most relevant first, default) or 'date' (newest first). Use 'relevance' for focused searches, 'date' for recent developments.",
                },
            },
            "required": ["query"],
        },
    )
  • Registers the search_papers tool (as search_tool) in the MCP server's list_tools method, making it discoverable.
    @server.list_tools()
    async def list_tools() -> List[types.Tool]:
        """List available arXiv research tools."""
        return [search_tool, download_tool, list_tool, read_tool]
  • Dispatches calls to the search_papers tool by invoking the handle_search handler function within the server's call_tool method.
    if name == "search_papers":
        return await handle_search(arguments)
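The handler returns a single text item that holds either an error string beginning with "Error:" or JSON of the form {"total_results": ..., "papers": [...]}. Here is a minimal client-side sketch for consuming that text. The function `parse_search_response` is hypothetical, and the per-paper fields come from `_process_paper`, which is not shown on this page, so the sketch leaves each paper dict uninterpreted.

```python
import json
from typing import Any, Dict, List


def parse_search_response(text: str) -> List[Dict[str, Any]]:
    """Convert the text returned by search_papers into a list of paper dicts."""
    # Every failure path in the handler returns a string starting with "Error:".
    if text.startswith("Error:"):
        raise RuntimeError(text)
    data = json.loads(text)
    papers = data.get("papers", [])
    # The handler sets total_results to len(papers), so a mismatch signals
    # a truncated or altered payload.
    if data.get("total_results") != len(papers):
        raise ValueError("total_results does not match the number of papers")
    return papers


# Usage with a placeholder payload; real paper fields come from _process_paper.
payload = json.dumps({"total_results": 1, "papers": [{"title": "placeholder"}]})
print(len(parse_search_response(payload)))  # → 1
```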

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: result sorting (by relevance, not just date), query optimization techniques, and the impact of parameters like categories on relevance. However, it lacks details on rate limits, error handling, or authentication needs, which are common for API tools. The description doesn't contradict any annotations (none exist), so no contradiction flag is triggered.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose, but it's lengthy with multiple sections (guidelines, patterns, filtering, examples, tips). While each section adds value, it could be more concise by integrating some details (e.g., merging examples into guidelines). The structure is logical but verbose, making it less efficient for quick scanning by an AI agent compared to a tighter presentation.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, no output schema, no annotations), the description is largely complete: it covers purpose, usage, parameters, and behavioral traits. However, it lacks output details (e.g., result format, pagination) and doesn't address potential errors or limits, leaving some gaps. The rich parameter explanations compensate partially, but for a search tool with no output schema, more on return values would enhance completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description adds significant value beyond the schema by explaining parameter semantics in depth: it provides query construction guidelines with examples, clarifies date filtering formats and use cases, details category options with relevance impacts, and explains sort_by implications ('relevance' vs. 'date'). This goes well beyond the schema's basic descriptions, though it doesn't cover all parameters equally (e.g., max_results gets less attention).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description immediately states 'Search for papers on arXiv with advanced filtering and query optimization,' which clearly specifies the verb (search), the resource (papers on arXiv), and the scope (advanced filtering and optimization). This distinguishes it from sibling tools such as 'list_papers' (likely a simpler listing) and 'download_paper'/'read_paper' (post-search actions), making the purpose specific and differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool vs. alternatives through detailed query construction guidelines, category filtering recommendations, and examples. It implicitly positions this as the primary search tool for arXiv papers with advanced capabilities, contrasting with simpler sibling tools like 'list_papers' that likely lack such filtering. The tips for foundational research further clarify usage contexts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/blazickjp/arxiv-mcp-server'
