Semantic Scholar MCP Server

get_paper_batch

Retrieve detailed information for multiple academic papers simultaneously using paper IDs from the Semantic Scholar database.

Instructions

Get information for multiple papers in a single request.

Args:
    paper_ids: Comma-separated list of paper IDs
    fields: Comma-separated list of fields to return

Returns:
    Batch paper information

Input Schema


Name	Required	Description	Default
`paper_ids`	Yes
`fields`	No

Output Schema


Name	Required	Description	Default
`result`	Yes

Implementation Reference

src/semantic_scholar_mcp/server.py:240-286 (handler)

The primary handler function for the 'get_paper_batch' tool. It is registered as an MCP tool via the @mcp.tool() decorator. Parses comma-separated paper IDs, calls the Semantic Scholar batch API (/paper/batch), processes results, formats each paper using the format_paper helper, and returns a formatted string output.

@mcp.tool()
async def get_paper_batch(paper_ids: str, fields: Optional[str] = None) -> str:
    """
    Get information for multiple papers in a single request.

    Args:
        paper_ids: Comma-separated list of paper IDs
        fields: Comma-separated list of fields to return

    Returns:
        Batch paper information
    """
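    # Split on commas, trim whitespace, and skip empty entries (e.g. from a trailing comma).
    # The loop variable is named pid to avoid shadowing the built-in id().
    id_list = [pid.strip() for pid in paper_ids.split(",") if pid.strip()]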

    params: Dict[str, Any] = {"ids": id_list}

    if fields:
        params["fields"] = fields
    else:
        params["fields"] = "paperId,title,authors,year,venue,citationCount,abstract"

    result = await make_api_request("paper/batch", params, method="POST")

    if result is None:
        return "Error: Failed to fetch papers"

    if "error" in result:
        return f"Error: {result['error']}"

    papers = result if isinstance(result, list) else result.get("data", [])

    if not papers:
        return "No papers found for the provided IDs."

    formatted_papers = []
    for i, paper in enumerate(papers, 1):
        if paper is None:
            formatted_papers.append(f"{i}. Paper not found")
        elif isinstance(paper, dict):
            formatted_papers.append(f"{i}. {format_paper(paper)}")
        else:
            formatted_papers.append(f"{i}. Invalid paper data")

    result_text = f"Retrieved {len(papers)} papers:\n\n"
    result_text += "\n\n".join(formatted_papers)

    return result_text
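
The argument handling at the top of the handler can be sketched in isolation as below. This is a minimal illustration, not the server's code: `build_batch_params` and `DEFAULT_FIELDS` are hypothetical names introduced here, and the sample IDs are placeholders.

```python
from typing import Any, Dict, List, Optional

# Same default field list the handler falls back to when `fields` is not given.
DEFAULT_FIELDS = "paperId,title,authors,year,venue,citationCount,abstract"


def build_batch_params(paper_ids: str, fields: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: turn the tool's string arguments into a payload dict."""
    # Split on commas, trim whitespace, and drop empty entries such as a trailing comma.
    id_list: List[str] = [pid.strip() for pid in paper_ids.split(",") if pid.strip()]
    return {"ids": id_list, "fields": fields or DEFAULT_FIELDS}


params = build_batch_params("id-one, id-two ,")
print(params["ids"])     # ['id-one', 'id-two']
print(params["fields"])  # the default field list
```

Keeping this step as a pure function makes the comma-separated input format easy to test without calling the API.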

src/semantic_scholar_mcp/server.py:72-91 (helper)

Helper function used by get_paper_batch (and other tools) to format individual paper data into a concise, readable multi-line string including title, authors, year, venue, citations, and paper ID.

def format_paper(paper: Dict[str, Any]) -> str:
    """Format a paper for display."""
    # Use `or` so that explicit null values from the API fall back to the defaults too.
    title = paper.get("title") or "Unknown Title"
    authors = paper.get("authors") or []
    author_names = [author.get("name", "Unknown") for author in authors[:3]]
    author_str = ", ".join(author_names)
    if len(authors) > 3:
        author_str += f" (and {len(authors) - 3} others)"

    year = paper.get("year")
    year_str = f" ({year})" if year else ""

    venue = paper.get("venue", "")
    venue_str = f" - {venue}" if venue else ""

    citation_count = paper.get("citationCount", 0)

    paper_id = paper.get("paperId", "")

    return f"Title: {title}\nAuthors: {author_str}{year_str}{venue_str}\nCitations: {citation_count}\nPaper ID: {paper_id}"
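
To show the shape of the formatted output, the sketch below applies a condensed copy of the helper to a made-up paper record. The sample values are placeholders, not real Semantic Scholar data, and the condensed function is only there so the snippet runs on its own.

```python
from typing import Any, Dict


def format_paper(paper: Dict[str, Any]) -> str:
    """Condensed copy of the helper above, reproduced so this snippet is self-contained."""
    authors = paper.get("authors") or []
    # Only the first three author names are listed; the rest are summarized as a count.
    author_str = ", ".join(a.get("name", "Unknown") for a in authors[:3])
    if len(authors) > 3:
        author_str += f" (and {len(authors) - 3} others)"
    year_str = f" ({paper['year']})" if paper.get("year") else ""
    venue_str = f" - {paper['venue']}" if paper.get("venue") else ""
    return (
        f"Title: {paper.get('title', 'Unknown Title')}\n"
        f"Authors: {author_str}{year_str}{venue_str}\n"
        f"Citations: {paper.get('citationCount', 0)}\n"
        f"Paper ID: {paper.get('paperId', '')}"
    )


# Placeholder record with four authors, so the "(and N others)" suffix appears.
sample = {
    "paperId": "abc123",
    "title": "Example Paper",
    "authors": [{"name": f"Author {n}"} for n in ("One", "Two", "Three", "Four")],
    "year": 2020,
    "venue": "Example Venue",
    "citationCount": 42,
}
print(format_paper(sample))
# Title: Example Paper
# Authors: Author One, Author Two, Author Three (and 1 others) (2020) - Example Venue
# Citations: 42
# Paper ID: abc123
```

Note that the abstract is not part of the formatted output, even though it is in the handler's default field list.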

src/semantic_scholar_mcp/server.py:24-69 (helper)

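Core utility function called by get_paper_batch to send the POST request to Semantic Scholar's /paper/batch endpoint. It adds the x-api-key header when API_KEY is set, applies the API_TIMEOUT timeout, and converts HTTP errors (including 403 and 429 rate-limit responses) and other exceptions into {"error": ...} dictionaries instead of raising.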

async def make_api_request(
    endpoint: str, params: Optional[Dict[str, Any]] = None, method: str = "GET"
) -> Optional[Dict[str, Any]]:
    """Make a request to the Semantic Scholar API."""
    url = f"{BASE_URL}/{endpoint.lstrip('/')}"

    headers = {
        "Accept": "application/json",
        "User-Agent": f"semantic-scholar-mcp/{USER_AGENT_VERSION}",
    }

    if API_KEY:
        headers["x-api-key"] = API_KEY

    try:
        async with httpx.AsyncClient(timeout=API_TIMEOUT) as client:
            if method == "GET":
                response = await client.get(url, headers=headers, params=params)
            elif method == "POST":
                response = await client.post(url, headers=headers, json=params)
            else:
                raise ValueError(f"Unsupported HTTP method: {method}")

            response.raise_for_status()
            return response.json()

    except httpx.HTTPStatusError as e:
        if e.response.status_code == 403:
            if not API_KEY:
                return {
                    "error": "Rate limit exceeded. The shared public rate limit (1000 req/sec) may be exceeded. Get a free API key from https://www.semanticscholar.org/product/api for dedicated limits."
                }
            else:
                return {
                    "error": f"API key may be invalid or rate limit exceeded: {str(e)}"
                }
        elif e.response.status_code == 429:
            return {
                "error": "Rate limit exceeded. Please wait a moment and try again, or get an API key for dedicated higher limits."
            }
        else:
            return {"error": f"HTTP error: {str(e)}"}
    except httpx.HTTPError as e:
        return {"error": f"HTTP error: {str(e)}"}
    except Exception as e:
        return {"error": f"Request failed: {str(e)}"}
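
For POST requests the helper sends the whole params dictionary as the JSON body, so `fields` travels in the body next to `ids`. The public Semantic Scholar API documentation appears to describe `fields` as a query parameter for the batch endpoint, with only `ids` in the body, so this is worth verifying. The sketch below shows that split using only the standard library. `build_batch_request` is a hypothetical name, and the BASE_URL value is an assumption, since the server's own constant is not shown on this page.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode

# Assumption: the public Graph API base URL; the server's BASE_URL is not shown above.
BASE_URL = "https://api.semanticscholar.org/graph/v1"


def build_batch_request(ids: List[str], fields: str) -> Tuple[str, bytes]:
    """Hypothetical helper: return (url, body) with fields in the query and ids in the body."""
    url = f"{BASE_URL}/paper/batch?{urlencode({'fields': fields})}"
    body = json.dumps({"ids": ids}).encode("utf-8")
    return url, body


url, body = build_batch_request(["placeholder-id"], "title,year")
print(url)   # https://api.semanticscholar.org/graph/v1/paper/batch?fields=title%2Cyear
print(body)  # b'{"ids": ["placeholder-id"]}'
```

With httpx, the same split would correspond to passing the field list through `params=` and the ID list through `json=` in a single `client.post` call.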

Tool Definition Quality

B (3.1/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It states the tool retrieves information (implying read-only), but doesn't disclose behavioral traits like rate limits, authentication needs, error handling, or pagination. The description is minimal and lacks context about what 'information' includes or how results are structured, leaving significant gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise and well-structured: a clear purpose statement followed by brief, bullet-like sections for Args and Returns. Every sentence earns its place with no redundant information, making it easy to scan and understand quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (2 parameters, no annotations, but with an output schema), the description is minimally adequate. The output schema likely covers return values, reducing the need for detailed Returns explanation. However, the description lacks context on error cases, batch size limits, or sibling tool differentiation, leaving room for improvement in completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
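
The batch size limit mentioned above could be handled on the caller's side by splitting a long ID list before calling the tool. A minimal sketch, assuming a limit of 500 IDs per request. That figure is taken from the public Semantic Scholar API documentation and should be checked before relying on it. `chunk_ids` and `MAX_BATCH_SIZE` are hypothetical names, not part of this server.

```python
from typing import Iterator, List

# Assumption: 500 IDs per batch request; verify against the current API documentation.
MAX_BATCH_SIZE = 500


def chunk_ids(ids: List[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each at most `size` items long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# 1200 placeholder IDs split into three batches.
batches = list(chunk_ids([f"id{i}" for i in range(1200)]))
print([len(batch) for batch in batches])  # [500, 500, 200]

# Each batch can then be joined into the comma-separated string the tool expects.
first_argument = ",".join(batches[0])
```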

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It lists both parameters ('paper_ids' and 'fields') and explains their formats (comma-separated lists), adding meaning beyond the bare schema. However, it doesn't specify valid ID formats, field options, or default behavior when 'fields' is null, leaving some ambiguity. This partial compensation justifies a baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get information for multiple papers in a single request.' This specifies the verb ('get information') and resource ('multiple papers'), distinguishing it from single-paper tools like 'get_paper'. However, it doesn't explicitly differentiate from other batch-capable siblings like 'search_papers', which could also retrieve multiple papers.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention when to prefer this over 'get_paper' (for single papers) or 'search_papers' (for filtered batches), nor does it specify prerequisites or exclusions. The agent must infer usage from the name and parameters alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
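
One way to address the gaps raised in this review is a fuller docstring. The sketch below is an illustration only, not the author's wording. It restates behavior visible in the handler and helper code on this page (default fields, error strings, "Paper not found" entries) and mentions the sibling tools named in the review. The function body is a stub.

```python
from typing import Optional


def get_paper_batch(paper_ids: str, fields: Optional[str] = None) -> str:
    """
    Look up several known papers by ID in a single read-only request.

    Use this when you already have paper IDs. For one paper, get_paper is
    enough; to discover papers by query, use search_papers instead.

    Args:
        paper_ids: Comma-separated list of paper IDs; surrounding whitespace
            is trimmed.
        fields: Comma-separated list of fields to return. Defaults to
            "paperId,title,authors,year,venue,citationCount,abstract".

    Returns:
        A numbered plain-text list with title, authors, year, venue, citation
        count, and paper ID per entry. IDs that cannot be resolved appear as
        "Paper not found". Failures, including rate-limit responses
        (HTTP 403 or 429), are returned as a string starting with "Error:".
        An API key is optional and raises the available rate limit.
    """
    # Documentation sketch only; the real handler is shown earlier on this page.
    raise NotImplementedError("documentation sketch only")


# Print just the summary line of the docstring.
print(get_paper_batch.__doc__.strip().splitlines()[0])
```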


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/fegizii/SemanticScholarMCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.