
paperpal

by jerpint

semantic_search_papers_on_huggingface

Search HuggingFace papers using semantic queries to find relevant academic research based on meaning rather than keywords.

Instructions

Search for papers on HuggingFace using semantic search.

Args:
    query (str): The query term to search for. It will automatically determine if it should use keywords or a natural language query, so format your queries accordingly.
    top_n (int): The number of papers to return. Default is 10, but you can set it to any number.

Returns:
    str: A list of papers with the title, summary, ID, and upvotes.

Input Schema

Name    Required  Description  Default
query   Yes
top_n   No

Output Schema

Name    Required  Description  Default
result  Yes

Implementation Reference

  • The main handler function for the 'semantic_search_papers_on_huggingface' tool. It is registered via the @mcp.tool() decorator, performs the semantic search using a helper function, formats the papers list, and returns it as a string.
    @mcp.tool()
    async def semantic_search_papers_on_huggingface(query: str, top_n: int = 10) -> str:
        """Search for papers on HuggingFace using semantic search.
    
        Args:
            query (str): The query term to search for. It will automatically determine if it should use keywords or a natural language query, so format your queries accordingly.
            top_n (int): The number of papers to return. Default is 10, but you can set it to any number.
    
        Returns:
            str: A list of papers with the title, summary, ID, and upvotes.
        """
        papers: list[HuggingFacePaper] = semantic_search_huggingface_papers(query, top_n)
    
        return stringify_papers(papers)
  • Helper utility to convert a list of paper objects (Arxiv or HuggingFace) into a formatted string output used by the tool.
    def stringify_papers(papers: list[ArxivPaper | HuggingFacePaper]) -> str:
        """Format a list of papers into a string."""
    
        papers_str = "\n---\n".join([str(paper) for paper in papers])
        return f"List of papers:\n---\n{papers_str}\n---\n"
  • Pydantic BaseModel schema defining the structure of HuggingFace paper data, used in the tool's output.
    class HuggingFacePaper(BaseModel):
        title: str
        summary: str
        arxiv_id: str
        upvotes: int
    
        def __str__(self) -> str:
            return f"Title: {self.title}\nSummary: {self.summary}\nID: {self.arxiv_id}\nUpvotes: {self.upvotes}"
  • Core helper function that queries the HuggingFace papers API with the given query, parses the top_n results into HuggingFacePaper models, and handles errors.
    def semantic_search_huggingface_papers(query: str, top_n: int) -> list[HuggingFacePaper]:
        """Search for papers on HuggingFace."""

        url = "https://huggingface.co/api/papers/search"

        try:
            # Pass the query via `params` so httpx percent-encodes it.
            response = httpx.get(url, params={"q": query})
            response.raise_for_status()
            papers_json = response.json()
            papers: list[HuggingFacePaper] = [parse_paper(paper) for paper in papers_json[:top_n]]

            return papers

        except Exception as e:
            # Raise instead of returning error strings, which would not match
            # the declared list[HuggingFacePaper] return type.
            raise RuntimeError(f"Error fetching papers from HuggingFace. Try again later. {e}") from e
  • Utility helper to parse raw JSON dict from HuggingFace API into a HuggingFacePaper model instance.
    def parse_paper(paper: dict) -> HuggingFacePaper:
        """Parse a paper from the HuggingFace API response."""
        return HuggingFacePaper(
            title=paper['paper']["title"],
            summary=paper['paper']["summary"],
            arxiv_id=paper['paper']["id"],
            upvotes=paper['paper']["upvotes"],
        )
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions that the query 'will automatically determine if it should use keywords or a natural language query,' which adds some context about the tool's behavior. However, it lacks details on rate limits, authentication needs, error handling, or what happens with invalid inputs, which are important for a search tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, with the core purpose stated first. The Args and Returns sections are structured clearly, though the 'Returns' section could be more concise (e.g., listing fields without full sentences). Overall, it's efficient with minimal waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (2 parameters, no annotations, but with an output schema), the description is reasonably complete. It explains the parameters and return format, and the output schema likely covers the return structure in detail. However, it could benefit from more behavioral context (e.g., search scope, limitations) to be fully comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds significant meaning beyond the input schema, which has 0% description coverage. It explains that 'query' can be keywords or natural language and will be automatically interpreted, and it specifies the default and flexibility for 'top_n'. This compensates well for the schema's lack of descriptions, though it doesn't cover all possible edge cases (e.g., query length limits).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Search for papers on HuggingFace using semantic search.' This specifies the verb (search), resource (papers on HuggingFace), and method (semantic search). However, it doesn't explicitly differentiate from the sibling tool 'fetch_paper_details_from_arxiv' (which appears to fetch details rather than search), so it doesn't reach the highest score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. While it mentions semantic search, it doesn't explain when to prefer this over keyword-based search or the sibling tool. There's no mention of prerequisites, limitations, or typical use cases, leaving the agent with minimal context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

