Semantic Scholar MCP Server

by fegizii

search_papers

Find academic papers using Semantic Scholar's database with filters for citations, publication types, open access, and year ranges.

Instructions

Search for academic papers using Semantic Scholar.

Args:
    query: Search query string
    limit: Maximum number of results (default: 10, max: 100)
    offset: Number of results to skip (default: 0)
    fields: Comma-separated list of fields to return
    publication_types: Filter by publication types
    open_access_pdf: Filter for papers with open access PDFs
    min_citation_count: Minimum citation count
    year: Publication year or year range (e.g., "2020-2023")
    venue: Publication venue

Returns:
    Formatted search results
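
As a rough illustration of how these arguments translate into a request, the sketch below mirrors the parameter handling shown in the implementation reference on this page. The helper name `build_search_params` and the constant `DEFAULT_FIELDS` are hypothetical and exist only for this example; the server itself builds the dictionary inline inside `search_papers`.

```python
from typing import Any, Dict, Optional

# Field list the tool falls back to when `fields` is not given (taken from the source).
DEFAULT_FIELDS = "paperId,title,authors,year,venue,citationCount,abstract"


def build_search_params(
    query: str,
    limit: int = 10,
    offset: int = 0,
    fields: Optional[str] = None,
    publication_types: Optional[str] = None,
    open_access_pdf: Optional[bool] = None,
    min_citation_count: Optional[int] = None,
    year: Optional[str] = None,
    venue: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: map tool arguments to API query parameters."""
    # `limit` is capped at 100, as in the tool's implementation.
    params: Dict[str, Any] = {"query": query, "limit": min(limit, 100), "offset": offset}
    params["fields"] = fields if fields else DEFAULT_FIELDS

    # Optional filters are only sent when the caller supplied them.
    if publication_types:
        params["publicationTypes"] = publication_types
    if open_access_pdf is not None:
        params["openAccessPdf"] = str(open_access_pdf).lower()
    if min_citation_count is not None:
        params["minCitationCount"] = min_citation_count
    if year:
        params["year"] = year
    if venue:
        params["venue"] = venue
    return params


example = build_search_params(
    "graph neural networks", limit=250, year="2020-2023", open_access_pdf=True
)
print(example)
```

Note that a `limit` above 100 is silently reduced to 100 rather than rejected, and that snake_case argument names become camelCase query parameters.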

Input Schema

Name                 Required   Description   Default
query                Yes
limit                No
offset               No
fields               No
publication_types    No
open_access_pdf      No
min_citation_count   No
year                 No
venue                No

Output Schema

Name     Required   Description   Default
result   Yes

Implementation Reference

  • The @mcp.tool() decorator registers the search_papers function with the FastMCP server.
    @mcp.tool()
  • Core implementation of the search_papers tool: constructs API parameters, calls make_api_request to Semantic Scholar's paper/search endpoint, formats results with format_paper, and returns a formatted string of paper information.
    async def search_papers(
        query: str,
        limit: int = 10,
        offset: int = 0,
        fields: Optional[str] = None,
        publication_types: Optional[str] = None,
        open_access_pdf: Optional[bool] = None,
        min_citation_count: Optional[int] = None,
        year: Optional[str] = None,
        venue: Optional[str] = None,
    ) -> str:
        """
        Search for academic papers using Semantic Scholar.
    
        Args:
            query: Search query string
            limit: Maximum number of results (default: 10, max: 100)
            offset: Number of results to skip (default: 0)
            fields: Comma-separated list of fields to return
            publication_types: Filter by publication types
            open_access_pdf: Filter for papers with open access PDFs
            min_citation_count: Minimum citation count
            year: Publication year or year range (e.g., "2020-2023")
            venue: Publication venue
    
        Returns:
            Formatted search results
        """
        params = {"query": query, "limit": min(limit, 100), "offset": offset}
    
        if fields:
            params["fields"] = fields
        else:
            params["fields"] = "paperId,title,authors,year,venue,citationCount,abstract"
    
        if publication_types:
            params["publicationTypes"] = publication_types
        if open_access_pdf is not None:
            params["openAccessPdf"] = str(open_access_pdf).lower()
        if min_citation_count is not None:
            params["minCitationCount"] = min_citation_count
        if year:
            params["year"] = year
        if venue:
            params["venue"] = venue
    
        result = await make_api_request("paper/search", params)
    
        if result is None:
            return "Error: Failed to fetch results"
    
        if "error" in result:
            return f"Error: {result['error']}"
    
        papers = result.get("data", [])
        total = result.get("total", 0)
    
        if not papers:
            return "No papers found matching your query."
    
        formatted_papers = []
        for i, paper in enumerate(papers, 1):
            formatted_papers.append(f"{i}. {format_paper(paper)}")
    
        result_text = f"Found {total} total papers (showing {len(papers)}):\n\n"
        result_text += "\n\n".join(formatted_papers)
    
        return result_text
  • The docstring and function signature provide the input schema (parameters with types, defaults, descriptions) and output description for the tool, used by FastMCP for JSON schema generation.
  • Helper function used by search_papers to make HTTP requests to the Semantic Scholar API, handling authentication, errors, and rate limits.
    async def make_api_request(
        endpoint: str, params: Optional[Dict[str, Any]] = None, method: str = "GET"
    ) -> Optional[Dict[str, Any]]:
        """Make a request to the Semantic Scholar API."""
        url = f"{BASE_URL}/{endpoint.lstrip('/')}"
    
        headers = {
            "Accept": "application/json",
            "User-Agent": f"semantic-scholar-mcp/{USER_AGENT_VERSION}",
        }
    
        if API_KEY:
            headers["x-api-key"] = API_KEY
    
        try:
            async with httpx.AsyncClient(timeout=API_TIMEOUT) as client:
                if method == "GET":
                    response = await client.get(url, headers=headers, params=params)
                elif method == "POST":
                    response = await client.post(url, headers=headers, json=params)
                else:
                    raise ValueError(f"Unsupported HTTP method: {method}")
    
                response.raise_for_status()
                return response.json()
    
        except httpx.HTTPStatusError as e:
            if e.response.status_code == 403:
                if not API_KEY:
                    return {
                        "error": "Rate limit exceeded. The shared public rate limit (1000 req/sec) may be exceeded. Get a free API key from https://www.semanticscholar.org/product/api for dedicated limits."
                    }
                else:
                    return {
                        "error": f"API key may be invalid or rate limit exceeded: {str(e)}"
                    }
            elif e.response.status_code == 429:
                return {
                    "error": "Rate limit exceeded. Please wait a moment and try again, or get an API key for dedicated higher limits."
                }
            else:
                return {"error": f"HTTP error: {str(e)}"}
        except httpx.HTTPError as e:
            return {"error": f"HTTP error: {str(e)}"}
        except Exception as e:
            return {"error": f"Request failed: {str(e)}"}
  • Helper function used by search_papers to format each paper result into a readable multi-line string.
    def format_paper(paper: Dict[str, Any]) -> str:
        """Format a paper for display."""
        title = paper.get("title", "Unknown Title")
        authors = paper.get("authors", [])
        author_names = [author.get("name", "Unknown") for author in authors[:3]]
        author_str = ", ".join(author_names)
        if len(authors) > 3:
            author_str += f" (and {len(authors) - 3} others)"
    
        year = paper.get("year")
        year_str = f" ({year})" if year else ""
    
        venue = paper.get("venue", "")
        venue_str = f" - {venue}" if venue else ""
    
        citation_count = paper.get("citationCount", 0)
    
        paper_id = paper.get("paperId", "")
    
        return f"Title: {title}\nAuthors: {author_str}{year_str}{venue_str}\nCitations: {citation_count}\nPaper ID: {paper_id}"
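
To make the shape of the returned string concrete, here is a minimal sketch that runs the formatting logic above on a made-up response. `format_paper` is repeated (slightly condensed) so the block runs on its own; `render_results` is a hypothetical name for the last few lines of `search_papers`, and the sample paper, its ID, and its counts are invented for illustration.

```python
from typing import Any, Dict


def format_paper(paper: Dict[str, Any]) -> str:
    """Condensed copy of the helper above, so this sketch is standalone."""
    title = paper.get("title", "Unknown Title")
    authors = paper.get("authors", [])
    author_str = ", ".join(a.get("name", "Unknown") for a in authors[:3])
    if len(authors) > 3:
        author_str += f" (and {len(authors) - 3} others)"
    year = paper.get("year")
    year_str = f" ({year})" if year else ""
    venue = paper.get("venue", "")
    venue_str = f" - {venue}" if venue else ""
    return (
        f"Title: {title}\nAuthors: {author_str}{year_str}{venue_str}\n"
        f"Citations: {paper.get('citationCount', 0)}\n"
        f"Paper ID: {paper.get('paperId', '')}"
    )


def render_results(result: Dict[str, Any]) -> str:
    """Hypothetical name for the result-formatting tail of search_papers."""
    papers = result.get("data", [])
    total = result.get("total", 0)
    if not papers:
        return "No papers found matching your query."
    entries = [f"{i}. {format_paper(p)}" for i, p in enumerate(papers, 1)]
    return f"Found {total} total papers (showing {len(papers)}):\n\n" + "\n\n".join(entries)


# Invented sample data; not a real Semantic Scholar record.
sample_response = {
    "total": 57,
    "data": [
        {
            "paperId": "abc123",
            "title": "An Example Paper",
            "authors": [
                {"name": "A. Author"},
                {"name": "B. Author"},
                {"name": "C. Author"},
                {"name": "D. Author"},
            ],
            "year": 2021,
            "venue": "Example Venue",
            "citationCount": 42,
        }
    ],
}

text = render_results(sample_response)
print(text)
```

Only the first three authors are listed by name; the abstract is requested by default but is not included in the formatted text.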
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions that results are 'formatted' but doesn't describe the format, pagination behavior, rate limits, authentication requirements, or error conditions. For a search tool with 9 parameters and no annotation coverage, this leaves significant behavioral aspects undocumented.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
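
Pagination is one of the behaviors the description leaves implicit. Going by the implementation shown above, the result text starts with a header of the form "Found N total papers (showing M):", and `limit` is capped at 100. A caller could use that to plan follow-up calls, roughly as sketched below. Both function names are hypothetical, and the sketch does not account for any upper bound the upstream API may place on `offset`.

```python
import re
from typing import Iterator, Optional, Tuple

# Matches the header line that search_papers puts at the start of its result text.
_HEADER = re.compile(r"^Found (\d+) total papers \(showing (\d+)\):")


def parse_counts(text: str) -> Optional[Tuple[int, int]]:
    """Return (total, shown) from a result string, or None if there is no header."""
    match = _HEADER.match(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def page_windows(total: int, page_size: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield (offset, limit) pairs that together cover `total` results."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    page_size = min(page_size, 100)  # the tool never returns more than 100 per call
    offset = 0
    while offset < total:
        yield offset, min(page_size, total - offset)
        offset += page_size


first_page = "Found 250 total papers (showing 100):\n\n1. Title: ..."
counts = parse_counts(first_page)
windows = list(page_windows(counts[0])) if counts else []
print(counts, windows)
```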

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and appropriately sized. It begins with a clear purpose statement, then provides a comprehensive parameter list with helpful details, and ends with return information. Every sentence serves a clear purpose with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (9 parameters, no annotations, but with output schema), the description is reasonably complete. It thoroughly documents all parameters and their semantics. The presence of an output schema means the description doesn't need to detail return values. However, it lacks behavioral context like rate limits or error handling that would be helpful for a search tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description provides clear semantic explanations for all 9 parameters beyond what the schema offers (which has 0% description coverage). It explains what each parameter does, provides examples (e.g., year format '2020-2023'), and includes default values and constraints (e.g., 'max: 100' for limit). This effectively compensates for the schema's lack of descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
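
The documented constraints can also be checked on the client side before calling the tool. The sketch below covers only the two `year` forms the description mentions (a single year, or a range such as "2020-2023") and the documented `limit` ceiling of 100. Both helper names are hypothetical. The lower bound on `limit` and the check that a range is not reversed are assumptions added here; the tool itself does not perform them, and the upstream API may accept other `year` forms that this sketch would reject.

```python
import re

# Either "YYYY" or "YYYY-YYYY", the two forms given in the tool description.
_YEAR_FORM = re.compile(r"(\d{4})(?:-(\d{4}))?")


def check_year(value: str) -> str:
    """Return the trimmed year filter, or raise ValueError if it is malformed."""
    value = value.strip()
    match = _YEAR_FORM.fullmatch(value)
    if match is None:
        raise ValueError(f"expected 'YYYY' or 'YYYY-YYYY', got {value!r}")
    start, end = match.group(1), match.group(2)
    # Assumption: a reversed range is treated as a caller mistake.
    if end is not None and int(end) < int(start):
        raise ValueError(f"range end precedes start: {value!r}")
    return value


def clamp_limit(limit: int) -> int:
    """Apply the documented maximum of 100; reject non-positive values (assumption)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return min(limit, 100)


print(check_year(" 2020-2023 "), check_year("2021"), clamp_limit(500))
```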

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Search for academic papers using Semantic Scholar.' It specifies the verb ('search') and resource ('academic papers'), and mentions the data source (Semantic Scholar). However, it doesn't explicitly differentiate this tool from sibling tools like 'search_authors' or 'search_snippets' beyond the resource type.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools like 'search_authors' (for author searches) or 'get_paper' (for retrieving specific papers by ID), nor does it provide any context about when this search tool is appropriate versus other search or retrieval methods.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/fegizii/SemanticScholarMCP'
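
The same request can be issued from Python using only the standard library. This is a sketch that assumes the endpoint returns a JSON body; the response fields are not described on this page, so the result is returned as parsed without interpretation. `server_url` and `fetch_server` are hypothetical helper names, and the URL pattern is inferred from the single curl example above.

```python
import json
import urllib.request
from typing import Any
from urllib.parse import quote

API_ROOT = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory URL for one server, following the curl example."""
    return f"{API_ROOT}/{quote(owner, safe='')}/{quote(name, safe='')}"


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> Any:
    """GET the server record and parse it as JSON (assumed response format)."""
    request = urllib.request.Request(
        server_url(owner, name), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


url = server_url("fegizii", "SemanticScholarMCP")
print(url)
# info = fetch_server("fegizii", "SemanticScholarMCP")  # performs a network request
```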

If you have feedback or need assistance with the MCP directory API, please join our Discord server.