Semantic Scholar MCP Server

by fegizii

get_paper_batch

Retrieve detailed information for multiple academic papers simultaneously using paper IDs from the Semantic Scholar database.

Instructions

Get information for multiple papers in a single request.

Args:
    paper_ids: Comma-separated list of paper IDs
    fields: Comma-separated list of fields to return

Returns:
    Batch paper information

Input Schema

Name       Required  Description  Default
paper_ids  Yes       -            -
fields     No        -            -

Output Schema

Name    Required  Description  Default
result  Yes       -            -

Implementation Reference

  • The primary handler function for the 'get_paper_batch' tool. It is registered as an MCP tool via the @mcp.tool() decorator. Parses comma-separated paper IDs, calls the Semantic Scholar batch API (/paper/batch), processes results, formats each paper using the format_paper helper, and returns a formatted string output.
    @mcp.tool()
    async def get_paper_batch(paper_ids: str, fields: Optional[str] = None) -> str:
        """
        Get information for multiple papers in a single request.
    
        Args:
            paper_ids: Comma-separated list of paper IDs
            fields: Comma-separated list of fields to return
    
        Returns:
            Batch paper information
        """
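        # Split on commas, trim whitespace, and drop empty entries (e.g. from a trailing comma).
        id_list = [pid.strip() for pid in paper_ids.split(",") if pid.strip()]
        if not id_list:
            return "Error: No paper IDs provided"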
    
        params: Dict[str, Any] = {"ids": id_list}
    
        if fields:
            params["fields"] = fields
        else:
            params["fields"] = "paperId,title,authors,year,venue,citationCount,abstract"
    
        result = await make_api_request("paper/batch", params, method="POST")
    
        if result is None:
            return "Error: Failed to fetch papers"
    
        if "error" in result:
            return f"Error: {result['error']}"
    
        papers = result if isinstance(result, list) else result.get("data", [])
    
        if not papers:
            return "No papers found for the provided IDs."
    
        formatted_papers = []
        for i, paper in enumerate(papers, 1):
            if paper is None:
                formatted_papers.append(f"{i}. Paper not found")
            elif isinstance(paper, dict):
                formatted_papers.append(f"{i}. {format_paper(paper)}")
            else:
                formatted_papers.append(f"{i}. Invalid paper data")
    
        result_text = f"Retrieved {len(papers)} papers:\n\n"
        result_text += "\n\n".join(formatted_papers)
    
        return result_text
  • Helper function used by get_paper_batch (and other tools) to format individual paper data into a concise, readable multi-line string including title, authors, year, venue, citations, and paper ID.
    def format_paper(paper: Dict[str, Any]) -> str:
        """Format a paper for display."""
        # Use `or` fallbacks so explicit nulls in the response are handled as well as missing keys.
        title = paper.get("title") or "Unknown Title"
        authors = paper.get("authors") or []
        author_names = [author.get("name") or "Unknown" for author in authors[:3]]
        author_str = ", ".join(author_names)
        if len(authors) > 3:
            author_str += f" (and {len(authors) - 3} others)"
    
        year = paper.get("year")
        year_str = f" ({year})" if year else ""
    
        venue = paper.get("venue") or ""
        venue_str = f" - {venue}" if venue else ""
    
        citation_count = paper.get("citationCount") or 0
    
        paper_id = paper.get("paperId") or ""
    
        return f"Title: {title}\nAuthors: {author_str}{year_str}{venue_str}\nCitations: {citation_count}\nPaper ID: {paper_id}"
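  • Core utility function called by get_paper_batch to perform the actual API request to Semantic Scholar's /paper/batch endpoint via POST. It sets the Accept and User-Agent headers, adds the x-api-key header when an API key is configured, applies a request timeout, and converts HTTP failures (including 403 and 429 rate-limit responses) into {"error": ...} dictionaries instead of raising. For POST requests, params is sent as the JSON body.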
    async def make_api_request(
        endpoint: str, params: Optional[Dict[str, Any]] = None, method: str = "GET"
    ) -> Any:  # Parsed JSON: a dict for most endpoints, a list for paper/batch
        """Make a request to the Semantic Scholar API."""
        url = f"{BASE_URL}/{endpoint.lstrip('/')}"
    
        headers = {
            "Accept": "application/json",
            "User-Agent": f"semantic-scholar-mcp/{USER_AGENT_VERSION}",
        }
    
        if API_KEY:
            headers["x-api-key"] = API_KEY
    
        try:
            async with httpx.AsyncClient(timeout=API_TIMEOUT) as client:
                if method == "GET":
                    response = await client.get(url, headers=headers, params=params)
                elif method == "POST":
                    response = await client.post(url, headers=headers, json=params)
                else:
                    raise ValueError(f"Unsupported HTTP method: {method}")
    
                response.raise_for_status()
                return response.json()
    
        except httpx.HTTPStatusError as e:
            if e.response.status_code == 403:
                if not API_KEY:
                    return {
                        "error": "Rate limit exceeded. The shared public rate limit (1000 req/sec) may be exceeded. Get a free API key from https://www.semanticscholar.org/product/api for dedicated limits."
                    }
                else:
                    return {
                        "error": f"API key may be invalid or rate limit exceeded: {str(e)}"
                    }
            elif e.response.status_code == 429:
                return {
                    "error": "Rate limit exceeded. Please wait a moment and try again, or get an API key for dedicated higher limits."
                }
            else:
                return {"error": f"HTTP error: {str(e)}"}
        except httpx.HTTPError as e:
            return {"error": f"HTTP error: {str(e)}"}
        except Exception as e:
            return {"error": f"Request failed: {str(e)}"}
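The handler above passes both `ids` and `fields` to `make_api_request`, which sends them together as the POST body. As far as the public Semantic Scholar API documentation describes it, the batch endpoint takes `ids` in the JSON body and `fields` as a query parameter, so it may be worth verifying that custom `fields` values actually take effect. The following is a minimal sketch of that request shape, built with the standard library only and without sending anything. `build_batch_request` is a hypothetical helper, and the `BASE_URL` value is an assumption, since the server's own constant is not shown here.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode

# Assumption: the server's BASE_URL is not shown above; this is the public Graph API base.
BASE_URL = "https://api.semanticscholar.org/graph/v1"
# Default field list copied from the get_paper_batch handler.
DEFAULT_FIELDS = "paperId,title,authors,year,venue,citationCount,abstract"


def build_batch_request(paper_ids: str, fields: Optional[str] = None) -> Tuple[str, bytes]:
    """Return (url, json_body) for a POST to paper/batch without sending it."""
    # Trim whitespace and drop empty entries such as those left by a trailing comma.
    ids = [pid.strip() for pid in paper_ids.split(",") if pid.strip()]
    # `fields` goes into the query string, `ids` into the JSON body.
    query = urlencode({"fields": fields or DEFAULT_FIELDS})
    url = f"{BASE_URL}/paper/batch?{query}"
    body = json.dumps({"ids": ids}).encode("utf-8")
    return url, body
```

Calling `build_batch_request("id1, id2", "title,year")` yields a URL ending in `paper/batch?fields=title%2Cyear` and a body containing only the `ids` list.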
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It states the tool retrieves information (implying read-only), but doesn't disclose behavioral traits like rate limits, authentication needs, error handling, or pagination. The description is minimal and lacks context about what 'information' includes or how results are structured, leaving significant gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
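Because the tool returns a plain string rather than structured data, a caller has to infer the outcome from the text. Based on the return statements in the handler, the three outcomes can be told apart by their prefixes. The sketch below shows one way to do that; `classify_batch_result` is a hypothetical helper, not part of the server.

```python
def classify_batch_result(text: str) -> str:
    """Classify the string returned by get_paper_batch by its leading text."""
    if text.startswith("Error:"):
        # Covers request failures, HTTP errors, and rate-limit messages.
        return "error"
    if text.startswith("No papers found"):
        # The API responded, but the result list was empty.
        return "empty"
    if text.startswith("Retrieved "):
        # Normal case; individual entries may still read "Paper not found".
        return "ok"
    return "unknown"
```

Note that an "ok" result can still contain per-entry "Paper not found" lines, since the handler reports unresolved IDs inline rather than as an error.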

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise and well-structured: a clear purpose statement followed by brief, bullet-like sections for Args and Returns. Every sentence earns its place with no redundant information, making it easy to scan and understand quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (2 parameters, no annotations, but with an output schema), the description is minimally adequate. The output schema likely covers return values, reducing the need for detailed Returns explanation. However, the description lacks context on error cases, batch size limits, or sibling tool differentiation, leaving room for improvement in completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
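The handler does not enforce a batch size limit itself, so an oversized ID list would only fail at the API. A caller could split the list before invoking the tool, as in the sketch below. The limit of 500 IDs per request is an assumption taken from the public Semantic Scholar API documentation and should be checked against the current docs; `chunk_ids` and `MAX_BATCH_SIZE` are hypothetical names.

```python
from typing import Iterator, List

# Assumption: the public API documentation lists a limit of 500 IDs per batch request.
MAX_BATCH_SIZE = 500


def chunk_ids(ids: List[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each containing at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

Each chunk can then be joined with commas and passed as `paper_ids` in a separate call.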

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It lists both parameters ('paper_ids' and 'fields') and explains their formats (comma-separated lists), adding meaning beyond the bare schema. However, it doesn't specify valid ID formats, field options, or default behavior when 'fields' is null, leaving some ambiguity. This partial compensation justifies a baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
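The default behavior for `fields` is visible in the handler: when the argument is omitted or empty, a fixed field list is requested. The sketch below restates that rule as a standalone function and additionally trims whitespace around field names, which the original handler does not do. `resolve_fields` is a hypothetical helper; the default string is copied from the handler.

```python
from typing import Optional

# Default field list copied from the get_paper_batch handler.
DEFAULT_FIELDS = "paperId,title,authors,year,venue,citationCount,abstract"


def resolve_fields(fields: Optional[str]) -> str:
    """Return the field list to request, falling back to the handler's default."""
    if not fields or not fields.strip():
        return DEFAULT_FIELDS
    # Trimming is an addition in this sketch; the handler passes `fields` through unchanged.
    names = [name.strip() for name in fields.split(",") if name.strip()]
    return ",".join(names) if names else DEFAULT_FIELDS
```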

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get information for multiple papers in a single request.' This specifies the verb ('get information') and resource ('multiple papers'), distinguishing it from single-paper tools like 'get_paper'. However, it doesn't explicitly differentiate from other batch-capable siblings like 'search_papers', which could also retrieve multiple papers.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention when to prefer this over 'get_paper' (for single papers) or 'search_papers' (for filtered batches), nor does it specify prerequisites or exclusions. The agent must infer usage from the name and parameters alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
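As an illustration of the kind of guidance this dimension asks for, the stub below sketches a docstring that says when to pick this tool over its siblings. It is a hypothetical rewording, not the author's text. The sibling names `get_paper` and `search_papers` come from the review above, and the default field list and the "Paper not found" behavior come from the handler; the function body is a placeholder only.

```python
from typing import Optional


async def get_paper_batch(paper_ids: str, fields: Optional[str] = None) -> str:
    """
    Get information for multiple papers in a single request.

    Use this when you already have two or more paper IDs. For a single
    ID, use get_paper; when you only have a query, use search_papers.

    Args:
        paper_ids: Comma-separated list of paper IDs
        fields: Comma-separated list of fields to return. Defaults to
            paperId,title,authors,year,venue,citationCount,abstract

    Returns:
        Numbered text, one entry per ID. IDs the API could not resolve
        appear as "Paper not found". Failures start with "Error:".
    """
    # Placeholder body: this stub only demonstrates the docstring.
    raise NotImplementedError("docstring sketch only")
```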

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/fegizii/SemanticScholarMCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.