AI Research MCP Server

by nanyang12138

search_latest_papers

Find recent AI/ML research papers by searching arXiv, Papers with Code, and Hugging Face using keywords and date filters.

Instructions

Search for latest AI/ML research papers from multiple sources (arXiv, Papers with Code, Hugging Face)

Input Schema

| Name | Required | Description | Default |
| ---- | -------- | ----------- | ------- |
| keywords | No | Keywords to search for (e.g., ['LLM', 'multimodal']) | (none) |
| days | No | Number of days to look back (1-30) | 7 |
| sources | No | Data sources to search | all |
| max_results | No | Maximum number of results per source | 20 |
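For example, an agent call conforming to this schema might pass arguments like the following (the values are illustrative, not from the server):

```python
# Hypothetical arguments for a search_latest_papers call.
arguments = {
    "keywords": ["LLM", "multimodal"],
    "days": 14,                            # look back two weeks (allowed range: 1-30)
    "sources": ["arxiv", "huggingface"],   # skip Papers with Code this time
    "max_results": 10,                     # per source, not a global total
}
```

All four fields are optional; omitting `sources` searches every backend, and omitting `days` or `max_results` falls back to the schema defaults (7 and 20).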

Implementation Reference

  • The primary handler function that executes the search_latest_papers tool. It queries arXiv, PapersWithCode, and HuggingFace based on input parameters, uses caching, aggregates results, and formats them using _format_papers.
    async def _search_latest_papers(
        self,
        keywords: Optional[List[str]] = None,
        days: int = 7,
        sources: Optional[List[str]] = None,
        max_results: int = 20,
    ) -> str:
        """Search for latest papers."""
        if sources is None:
            sources = ["arxiv", "papers_with_code", "huggingface"]
        
        all_papers = []
        
        # Search arXiv
        if "arxiv" in sources:
            cache_key = f"arxiv_{keywords}_{days}"
            cached = self.cache.get(cache_key, self.cache_expiry["arxiv"])
            if cached:
                all_papers.extend(cached)
            else:
                papers = await asyncio.to_thread(
                    self.arxiv.search_papers,
                    keywords=keywords,
                    days=days,
                    max_results=max_results,
                )
                self.cache.set(cache_key, papers)
                all_papers.extend(papers)
        
        # Search Papers with Code
        if "papers_with_code" in sources:
            cache_key = f"pwc_{keywords}_{days}"
            # NOTE: reuses the arXiv expiry window; a dedicated
            # cache_expiry entry for Papers with Code may be intended here
            cached = self.cache.get(cache_key, self.cache_expiry["arxiv"])
            if cached:
                all_papers.extend(cached)
            else:
                papers = await asyncio.to_thread(
                    self.papers_with_code.get_latest_papers,
                    days=days,
                    items_per_page=max_results,
                )
                self.cache.set(cache_key, papers)
                all_papers.extend(papers)
        
        # Get Hugging Face daily papers
        if "huggingface" in sources:
            cache_key = f"hf_daily_{days}"
            # NOTE: reuses the arXiv expiry window for Hugging Face results
            cached = self.cache.get(cache_key, self.cache_expiry["arxiv"])
            if cached:
                all_papers.extend(cached)
            else:
                papers = await asyncio.to_thread(
                    self.huggingface.get_daily_papers,
                    days=min(days, 7),
                )
                self.cache.set(cache_key, papers)
                all_papers.extend(papers)
        
        # Format results
        return self._format_papers(all_papers, keywords)
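The cache class itself is not shown in the listing. A minimal sketch consistent with the cache.get(key, max_age) / cache.set(key, value) calls above (an assumption about the interface, not the server's actual implementation):

```python
import time

class TTLCache:
    """Minimal time-to-live cache sketch (assumed interface): set() stamps a
    value with the current time; get() returns it only while it is younger
    than max_age seconds, otherwise evicts it and returns None."""

    def __init__(self):
        self._store = {}  # key -> (timestamp, value)

    def set(self, key, value):
        self._store[key] = (time.time(), value)

    def get(self, key, max_age):
        entry = self._store.get(key)
        if entry is None:
            return None
        ts, value = entry
        if time.time() - ts > max_age:
            del self._store[key]  # expired
            return None
        return value
```

Under this interface, `self.cache_expiry` would simply be a mapping of source name to max age in seconds.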
  • Registers the search_latest_papers tool in the MCP server's list_tools handler, defining its name, description, and input schema.
    Tool(
        name="search_latest_papers",
        description="Search for latest AI/ML research papers from multiple sources (arXiv, Papers with Code, Hugging Face)",
        inputSchema={
            "type": "object",
            "properties": {
                "keywords": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Keywords to search for (e.g., ['LLM', 'multimodal'])",
                },
                "days": {
                    "type": "integer",
                    "description": "Number of days to look back (1-30)",
                    "default": 7,
                },
                "sources": {
                    "type": "array",
                    "items": {
                        "type": "string",
                        "enum": ["arxiv", "papers_with_code", "huggingface"],
                    },
                    "description": "Data sources to search (default: all)",
                },
                "max_results": {
                    "type": "integer",
                    "description": "Maximum number of results per source",
                    "default": 20,
                },
            },
        },
    ),
  • Tool dispatch logic in the generic call_tool handler that routes calls to search_latest_papers to the specific implementation.
    if name == "search_latest_papers":
        result = await self._search_latest_papers(**arguments)
    elif name == "search_github_repos":
        ...  # handlers for the server's other tools continue here
  • Helper function called by the handler to format the aggregated papers list into a markdown string with deduplication, sorting by date, and rich details.
    def _format_papers(self, papers: List[Dict], keywords: Optional[List[str]] = None) -> str:
        """Format papers as markdown."""
        if not papers:
            return "*No papers found.*"
        
        # Deduplicate by title
        seen_titles = set()
        unique_papers = []
        for paper in papers:
            title = paper.get("title", "").lower()
            if title and title not in seen_titles:
                seen_titles.add(title)
                unique_papers.append(paper)
        
        # Sort by date (most recent first)
        def get_date(paper):
            date_str = paper.get("published") or paper.get("updated") or ""
            try:
                dt = datetime.fromisoformat(date_str.replace("Z", "+00:00"))
                # Ensure timezone-aware
                if dt.tzinfo is None:
                    dt = dt.replace(tzinfo=timezone.utc)
                return dt
            except ValueError:  # missing or unparsable date sorts last
                return datetime.min.replace(tzinfo=timezone.utc)
        
        unique_papers.sort(key=get_date, reverse=True)
        
        lines = []
        for i, paper in enumerate(unique_papers, 1):
            title = paper.get("title", "Untitled")
            authors = paper.get("authors", [])
            author_str = authors[0] if authors else "Unknown"
            if len(authors) > 1:
                author_str += " et al."
            
            url = paper.get("url", "")
            source = paper.get("source", "unknown")
            published = paper.get("published", "")[:10]  # Just the date
            
            lines.append(f"### {i}. [{title}]({url})")
            lines.append(f"*{author_str} • {published} • {source}*")
            
            # Add summary/abstract (truncated)
            summary = paper.get("summary") or paper.get("abstract", "")
            if summary:
                summary = summary.replace("\n", " ")
                if len(summary) > 200:
                    summary = summary[:200] + "..."  # truncate long abstracts
                lines.append(f"\n{summary}")
            
            # Add GitHub link if available
            github_url = paper.get("github_url")
            if github_url:
                stars = paper.get("stars", 0)
                lines.append(f"\nšŸ’» [Code]({github_url}) {'⭐ ' + str(stars) if stars > 0 else ''}")
            
            lines.append("")  # Empty line between papers
        
        return "\n".join(lines)
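The dedup-then-sort behavior above can be exercised standalone. A sketch with hypothetical paper records (the field names mirror what the handler expects; the records themselves are invented):

```python
from datetime import datetime, timezone

# Hypothetical records mimicking what the three sources return.
papers = [
    {"title": "Paper A", "published": "2024-05-01T00:00:00Z"},
    {"title": "paper a", "published": "2024-05-01T00:00:00Z"},  # duplicate (case-insensitive)
    {"title": "Paper B", "published": "2024-06-15T12:00:00+00:00"},
    {"title": "Paper C", "published": "not-a-date"},            # unparsable date
]

# Deduplicate by lowercased title, as _format_papers does.
seen, unique = set(), []
for p in papers:
    t = p.get("title", "").lower()
    if t and t not in seen:
        seen.add(t)
        unique.append(p)

def get_date(p):
    s = p.get("published") or p.get("updated") or ""
    try:
        dt = datetime.fromisoformat(s.replace("Z", "+00:00"))
        return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)
    except ValueError:
        return datetime.min.replace(tzinfo=timezone.utc)  # sorts last

unique.sort(key=get_date, reverse=True)
titles = [p["title"] for p in unique]  # most recent first, bad dates last
```

Here `titles` comes out as `["Paper B", "Paper A", "Paper C"]`: the case-variant duplicate is dropped, and the record with an unparsable date falls to the end.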
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the sources but doesn't describe key behaviors such as rate limits, authentication needs, pagination, error handling, or the format of returned results. For a search tool with multiple parameters and no output schema, this leaves significant gaps in understanding how the tool operates.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
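One way to close this gap is the standard MCP tool-annotation fields. A sketch of what the registration could declare (the values are assumptions about this server's behavior, not taken from its code):

```python
# Sketch: MCP ToolAnnotations fields for a read-only search tool.
# Values are assumptions about search_latest_papers, not from the server.
annotations = {
    "readOnlyHint": True,     # only queries external APIs; modifies nothing
    "destructiveHint": False,
    "idempotentHint": True,   # repeat calls with the same args are safe (results are cached)
    "openWorldHint": True,    # talks to external services (arXiv, PwC, Hugging Face)
}
```

Even with annotations, the description would still need to state rate limits and the markdown output format in prose.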

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose and scope without any redundant information. It's front-loaded with the core functionality and specifies the sources concisely, making it easy for an agent to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 parameters, no annotations, no output schema), the description is incomplete. It lacks information on behavioral traits, output format, and usage context relative to siblings. While concise, it doesn't provide enough detail for an agent to fully understand how to invoke and interpret results from this tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%: all four parameters (keywords, days, sources, max_results) carry descriptions and defaults in the input schema. The tool description adds no parameter semantics beyond that, which meets the baseline expectation when schema coverage is this high.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('search for latest AI/ML research papers') and resources ('from multiple sources'), specifying the domains (arXiv, Papers with Code, Hugging Face). However, it doesn't explicitly differentiate from sibling tools like 'search_by_area' or 'get_daily_papers', which might have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'search_by_area' or 'get_daily_papers'. It mentions the sources but doesn't explain why one would choose this tool over others, leaving the agent to infer usage context from the tool name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
