arXiv MCP Server

by lingqukan

query_papers

Search and filter arXiv research papers by date, category, title, or specific IDs to retrieve selected metadata fields from a local database.

Instructions

Query papers from the local database with flexible filtering and field selection.

All filter parameters are combined with AND logic. Within categories, OR logic is used.
If no filter parameters are provided, returns the most recent papers up to max_results.

Args:
    date: Filter by publication date in YYYY-MM-DD format (e.g. "2026-03-18")
    categories: Filter by one or more arXiv categories (OR logic), e.g. ["cs.AI", "cs.LG"]
    title: Filter by title keyword (title field only, not abstract; case-insensitive for ASCII)
    entry_ids: Fetch specific papers by their arXiv entry IDs. Typically used alone;
               combining with other filters applies AND logic and may return fewer results
               than expected if the other conditions do not match.
    fields: Fields to return. Valid: entry_id, title, authors, abstract, url, published, updated, categories.
            Defaults to: entry_id, title, authors, published, url
    max_results: Maximum number of results to return (default: 500)
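
The filter semantics described above can be illustrated with a small in-memory sketch. The `matches` and `query` helpers and the sample records (including the made-up entry IDs) are hypothetical and not part of the server, which applies these filters in SQL. The sketch only shows how AND across parameters and OR within `categories` interact.

```python
from typing import Dict, List, Optional


def matches(
    paper: Dict,
    date: Optional[str] = None,
    categories: Optional[List[str]] = None,
    title: Optional[str] = None,
    entry_ids: Optional[List[str]] = None,
) -> bool:
    """Return True if `paper` passes every supplied filter (AND across filters)."""
    if date and paper["published"][:10] != date:
        return False
    # OR within categories: sharing a single category is enough.
    if categories and not set(categories) & set(paper["categories"]):
        return False
    # Title keyword: substring match on the title only. Note that str.lower()
    # also folds non-ASCII letters, unlike the ASCII-only matching documented
    # for the tool.
    if title and title.lower() not in paper["title"].lower():
        return False
    if entry_ids and paper["entry_id"] not in entry_ids:
        return False
    return True


# Made-up sample records for illustration only.
PAPERS = [
    {"entry_id": "a1", "title": "Graph Transformers",
     "published": "2026-03-18T09:00:00", "categories": ["cs.LG"]},
    {"entry_id": "b2", "title": "Robot Planning",
     "published": "2026-03-18T10:00:00", "categories": ["cs.RO", "cs.AI"]},
    {"entry_id": "c3", "title": "Sparse Transformers",
     "published": "2026-03-17T08:00:00", "categories": ["cs.CL"]},
]


def query(**filters) -> List[str]:
    """Return the entry IDs of the sample papers that pass all filters."""
    return [p["entry_id"] for p in PAPERS if matches(p, **filters)]
```

In this sketch, `query(entry_ids=["c3"], categories=["cs.AI"])` returns an empty list even though `c3` exists, which is the caveat noted for `entry_ids`: the ID filter is ANDed with every other filter.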

Input Schema

Name         Required  Description  Default
date         No
categories   No
title        No
entry_ids    No
fields       No
max_results  No

Output Schema

Name    Required  Description  Default
result  Yes
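
The output schema lists only a single `result` field. Judging from the handler shown under Implementation Reference, that value is a JSON-encoded string holding either an `error` message or a `total` count plus a `papers` list. A minimal sketch of decoding it on the client side follows; the function name `parse_query_result` is hypothetical, and how an MCP client wraps the string in transit is not covered here.

```python
import json
from typing import Dict, List


def parse_query_result(result: str) -> List[Dict]:
    """Decode the JSON string returned by query_papers.

    Raises ValueError when the payload carries an "error" key, otherwise
    returns the list stored under "papers".
    """
    payload = json.loads(result)
    if "error" in payload:
        raise ValueError(payload["error"])
    return payload["papers"]


# Hand-written payloads that mimic the two shapes built by the handler.
ok_result = json.dumps({"total": 1, "papers": [{"entry_id": "x1", "title": "T"}]})
bad_result = json.dumps({"error": "Invalid date format: '2026/03/18'. Expected YYYY-MM-DD."})
```

Raising on the error payload keeps callers from mistaking a validation failure for an empty result set.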

Implementation Reference

  • The MCP tool handler for "query_papers", which defines the tool's interface and performs input validation.
    def query_papers(
        date: Optional[str] = None,
        categories: Optional[List[str]] = None,
        title: Optional[str] = None,
        entry_ids: Optional[List[str]] = None,
        fields: Optional[List[str]] = None,
        max_results: int = 500,
    ) -> str:
        """Query papers from the local database with flexible filtering and field selection.
    
        All filter parameters are combined with AND logic. Within categories, OR logic is used.
        If no filter parameters are provided, returns the most recent papers up to max_results.
    
        Args:
            date: Filter by publication date in YYYY-MM-DD format (e.g. "2026-03-18")
            categories: Filter by one or more arXiv categories (OR logic), e.g. ["cs.AI", "cs.LG"]
            title: Filter by title keyword (title field only, not abstract; case-insensitive for ASCII)
            entry_ids: Fetch specific papers by their arXiv entry IDs. Typically used alone;
                       combining with other filters applies AND logic and may return fewer results
                       than expected if the other conditions do not match.
            fields: Fields to return. Valid: entry_id, title, authors, abstract, url, published, updated, categories.
                    Defaults to: entry_id, title, authors, published, url
            max_results: Maximum number of results to return (default: 500)
        """
        active_fields = fields if fields is not None else DEFAULT_FIELDS
    
        invalid = [f for f in active_fields if f not in VALID_FIELDS]
        if invalid:
            return json.dumps(
                {"error": f"Invalid field(s): {invalid}. Valid fields: {sorted(VALID_FIELDS)}"},
                ensure_ascii=False,
            )
    
        date_re = re.compile(r"^\d{4}-\d{2}-\d{2}$")
        if date and not date_re.match(date):
            return json.dumps(
                {"error": f"Invalid date format: {date!r}. Expected YYYY-MM-DD."},
                ensure_ascii=False,
                indent=2,
            )
    
        logger.info(
            f"Querying papers: date={date!r}, categories={categories}, "
            f"title={title!r}, entry_ids={entry_ids}, fields={active_fields}, max={max_results}"
        )
        db = _get_db()
        papers = db.query_papers(
            date=date,
            categories=categories,
            title=title,
            entry_ids=entry_ids,
            max_results=max_results,
        )
        return json.dumps(
            {
                "total": len(papers),
                "papers": [_build_paper_dict(p, active_fields) for p in papers],
            },
            ensure_ascii=False,
            indent=2,
        )
  • The underlying database logic that performs the SQL query for papers.
    def query_papers(
        self,
        date: Optional[str] = None,
        categories: Optional[List[str]] = None,
        title: Optional[str] = None,
        entry_ids: Optional[List[str]] = None,
        max_results: int = 500,
    ) -> List[ArxivPaper]:
        # Each active filter adds one SQL condition; conditions are ANDed together.
        conditions = []
        params: List[Any] = []

        # Compare only the calendar date of the `published` timestamp.
        if date:
            conditions.append("DATE(published) = ?")
            params.append(date)

        # Match each category as a quoted token inside the stored categories
        # text; a match on any one category is enough (OR).
        if categories:
            cat_clauses = ["categories LIKE ? ESCAPE '\\'" for _ in categories]
            conditions.append("(" + " OR ".join(cat_clauses) + ")")
            params.extend(f'%"{self._escape_like(c)}"%' for c in categories)

        # Substring match on the title, with LIKE wildcards in the keyword escaped.
        if title:
            conditions.append("title LIKE ? ESCAPE '\\'")
            params.append(f"%{self._escape_like(title)}%")

        # One placeholder per ID keeps the IN (...) list fully parameterized.
        if entry_ids:
            placeholders = ",".join("?" for _ in entry_ids)
            conditions.append(f"entry_id IN ({placeholders})")
            params.extend(entry_ids)

        where = ("WHERE " + " AND ".join(conditions)) if conditions else ""
        sql = f"""
            SELECT entry_id, updated, published, title, summary,
                   authors, categories, viewed
            FROM papers
            {where}
            ORDER BY published DESC
            LIMIT ?
        """
        # The LIMIT value is bound last, after all filter parameters.
        params.append(max_results)

        with sqlite3.connect(self.database_path) as conn:
            cursor = conn.cursor()
            cursor.execute(sql, params)
            return [self.convert_to_paper(row) for row in cursor.fetchall()]
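
The query-building approach above can be tried against an in-memory SQLite database. The following is a minimal sketch under stated assumptions, not the server's actual code: the `escape_like` helper is a guess at what `_escape_like` does (its source is not shown), the table is reduced to four columns, and `categories` is assumed to be stored as a JSON array string, which is what the `%"cs.AI"%` pattern suggests.

```python
import json
import sqlite3
from typing import Any, List, Optional, Tuple


def escape_like(value: str) -> str:
    """Escape LIKE wildcards so the input is matched literally (assumed behaviour)."""
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_query(
    categories: Optional[List[str]] = None,
    title: Optional[str] = None,
    max_results: int = 500,
) -> Tuple[str, List[Any]]:
    """Build a parameterized SELECT: AND across filters, OR within categories."""
    conditions: List[str] = []
    params: List[Any] = []
    if categories:
        clauses = ["categories LIKE ? ESCAPE '\\'" for _ in categories]
        conditions.append("(" + " OR ".join(clauses) + ")")
        params.extend(f'%"{escape_like(c)}"%' for c in categories)
    if title:
        conditions.append("title LIKE ? ESCAPE '\\'")
        params.append(f"%{escape_like(title)}%")
    where = ("WHERE " + " AND ".join(conditions)) if conditions else ""
    sql = f"SELECT entry_id FROM papers {where} ORDER BY published DESC LIMIT ?"
    params.append(max_results)
    return sql, params


# Hypothetical, simplified schema and rows for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE papers (entry_id TEXT, published TEXT, title TEXT, categories TEXT)"
)
conn.executemany(
    "INSERT INTO papers VALUES (?, ?, ?, ?)",
    [
        ("p1", "2026-03-18T09:00:00", "100% Sparse Attention", json.dumps(["cs.LG"])),
        ("p2", "2026-03-17T09:00:00", "Dense attention at scale", json.dumps(["cs.AI", "cs.CL"])),
        ("p3", "2026-03-16T09:00:00", "Planning with robots", json.dumps(["cs.RO"])),
    ],
)


def run(**filters) -> List[str]:
    """Execute the built query and return matching entry IDs, newest first."""
    sql, params = build_query(**filters)
    return [row[0] for row in conn.execute(sql, params)]
```

With escaping in place, `run(title="%")` matches only the title that contains a literal percent sign. Without it, the bare `%` would act as a wildcard and match every row.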
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Excellent disclosure of query semantics: explains the default behavior when no filters are provided (returns the most recent papers), documents the logical operators used, and details parameter interaction effects. Because annotations are absent, the description carries the full behavioral burden and does so well, though it lacks an explicit 'read-only' statement.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Structure is exemplary: single-sentence purpose statement upfront, followed by logic explanation, then organized Args block. No filler text; every line conveys specific constraints or behaviors. Docstring-style formatting is readable and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Fully complete for a query tool of this complexity. All 6 optional parameters are documented, filter logic is explained, and since an output schema exists (per context signals), return values need not be described in narrative.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the Args section compensates perfectly by providing detailed semantics for all 6 parameters: date format with example, category logic with examples, field selection with valid values and defaults, and specific usage notes for entry_ids.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Opens with specific verb ('Query'), resource ('papers'), and scope ('local database'), clearly distinguishing from siblings like 'fetch_papers' (external) and 'count_papers_on_date' (aggregation). The phrase 'flexible filtering and field selection' further clarifies capabilities.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear behavioral guidance on filter logic (AND between parameters, OR within categories) and warns that 'entry_ids' is typically used alone with caveats about combining filters. Lacks explicit comparison to sibling tools like 'fetch_papers'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/lingqukan/arxiv-today-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.