
Semantic Scholar MCP Server

by fegizii

get_paper_pdf_info

Check PDF availability for academic papers using Semantic Scholar IDs, DOIs, or ArXiv IDs to determine if full-text documents are accessible.

Instructions

Get PDF availability information for a paper.

Args:
    paper_id: Paper ID (Semantic Scholar ID, DOI, ArXiv ID, etc.)

Returns:
    PDF availability information

Input Schema

Name      Required  Description  Default
paper_id  Yes

Output Schema

Name    Required  Description  Default
result  Yes

Implementation Reference

  • Implementation of the get_paper_pdf_info tool handler. Fetches paper details from Semantic Scholar API focusing on openAccessPdf field, formats availability info including direct URL if available, and lists alternative sources like ArXiv, DOI, PubMed.
    @mcp.tool()
    async def get_paper_pdf_info(paper_id: str) -> str:
        """
        Get PDF availability information for a paper.
    
        Args:
            paper_id: Paper ID (Semantic Scholar ID, DOI, ArXiv ID, etc.)
    
        Returns:
            PDF availability information
        """
        encoded_id = quote(paper_id, safe="")
        result = await make_api_request(
            f"paper/{encoded_id}", {"fields": "paperId,title,openAccessPdf,externalIds"}
        )
    
        if result is None:
            return "Error: Failed to fetch paper information"
    
        if "error" in result:
            return f"Error: {result['error']}"
    
        # The "or" fallbacks also cover keys that are present but null in the response.
        title = result.get("title") or "Unknown Title"
        open_access = result.get("openAccessPdf")
        external_ids = result.get("externalIds") or {}
    
        result_text = f"PDF Information for: {title}\n\n"
    
        if open_access and open_access.get("url"):
            pdf_url = open_access["url"]
            result_text += "✅ Open Access PDF Available\n"
            result_text += f"URL: {pdf_url}\n"
            result_text += "Status: Ready for download\n\n"
        else:
            result_text += "❌ No Open Access PDF Available\n\n"
    
        # Check for potential alternative sources
        result_text += "Alternative sources to check:\n"
        if external_ids.get("ArXiv"):
            result_text += f"- ArXiv: https://arxiv.org/abs/{external_ids['ArXiv']}\n"
        if external_ids.get("DOI"):
            result_text += f"- Publisher (DOI): https://doi.org/{external_ids['DOI']}\n"
        if external_ids.get("PubMed"):
            result_text += (
                f"- PubMed: https://pubmed.ncbi.nlm.nih.gov/{external_ids['PubMed']}/\n"
            )
    
        return result_text
  • The @mcp.tool() decorator registers the get_paper_pdf_info function as an MCP tool.
    @mcp.tool()
  • Helper function used by get_paper_pdf_info to make API requests to Semantic Scholar, handling errors and API keys.
    async def make_api_request(
        endpoint: str, params: Optional[Dict[str, Any]] = None, method: str = "GET"
    ) -> Optional[Dict[str, Any]]:
        """Make a request to the Semantic Scholar API."""
        url = f"{BASE_URL}/{endpoint.lstrip('/')}"
    
        headers = {
            "Accept": "application/json",
            "User-Agent": f"semantic-scholar-mcp/{USER_AGENT_VERSION}",
        }
    
        if API_KEY:
            headers["x-api-key"] = API_KEY
    
        try:
            async with httpx.AsyncClient(timeout=API_TIMEOUT) as client:
                if method == "GET":
                    response = await client.get(url, headers=headers, params=params)
                elif method == "POST":
                    response = await client.post(url, headers=headers, json=params)
                else:
                    raise ValueError(f"Unsupported HTTP method: {method}")
    
                response.raise_for_status()
                return response.json()
    
        except httpx.HTTPStatusError as e:
            if e.response.status_code == 403:
                if not API_KEY:
                    return {
                        "error": "Rate limit exceeded. The shared public rate limit (1000 req/sec) may be exceeded. Get a free API key from https://www.semanticscholar.org/product/api for dedicated limits."
                    }
                else:
                    return {
                        "error": f"API key may be invalid or rate limit exceeded: {str(e)}"
                    }
            elif e.response.status_code == 429:
                return {
                    "error": "Rate limit exceeded. Please wait a moment and try again, or get an API key for dedicated higher limits."
                }
            else:
                return {"error": f"HTTP error: {str(e)}"}
        except httpx.HTTPError as e:
            return {"error": f"HTTP error: {str(e)}"}
        except Exception as e:
            return {"error": f"Request failed: {str(e)}"}
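
The handler above percent-encodes the paper ID with `quote(paper_id, safe="")` before placing it in the endpoint path. The sketch below shows what that call does to a few ID shapes. `build_paper_endpoint` is a hypothetical helper introduced only for illustration, and the sample IDs are illustrative values rather than anything taken from the server's code.

```python
from urllib.parse import quote


def build_paper_endpoint(paper_id: str) -> str:
    """Return the relative endpoint path for a paper lookup (hypothetical helper)."""
    # safe="" means "/" and ":" are percent-encoded too, so the whole ID
    # stays inside a single path segment.
    return f"paper/{quote(paper_id, safe='')}"


examples = [
    "649def34f8be52c8b66281af98ae884c09aef38b",  # 40-character hex ID (illustrative)
    "10.18653/v1/N18-3011",                      # bare DOI, contains slashes
    "ARXIV:2106.15928",                          # prefixed arXiv ID, contains a colon
]

for pid in examples:
    print(build_paper_endpoint(pid))
```

With the default `safe="/"`, the slashes in a DOI would be left unencoded and would read as extra path segments, which is presumably why the handler passes an empty `safe` string.
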
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool gets 'PDF availability information,' which implies a read-only operation, but doesn't clarify aspects like rate limits, authentication needs, error handling, or what specific information is returned (e.g., URLs, access status). This leaves significant gaps for a tool with no annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
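
One way the description could close these gaps is sketched below as a revised docstring. The wording is an assumption inferred from the implementation reference on this page, not the author's text, and the function body is a placeholder.

```python
async def get_paper_pdf_info(paper_id: str) -> str:
    """
    Check whether an open-access PDF is listed for a paper (read-only lookup).

    Sends one GET request to the Semantic Scholar API. It reports
    availability only and does not download the PDF.

    Args:
        paper_id: Paper ID (Semantic Scholar ID, DOI, ArXiv ID, etc.)

    Returns:
        Plain text with the paper title, the open-access PDF URL when one
        is listed, and links built from any ArXiv, DOI, or PubMed
        identifiers. Failures come back as a string starting with
        "Error:" rather than as an exception.

    Notes:
        Works without an API key, but unauthenticated requests share a
        public rate limit. HTTP 403 and 429 responses are reported as
        rate-limit errors (with a key, 403 may also mean the key is invalid).
    """
    # Placeholder body: this sketch is about the docstring only.
    raise NotImplementedError("docstring sketch only")
```
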

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise and well-structured: a clear purpose statement followed by brief sections for 'Args' and 'Returns.' Each sentence earns its place by providing essential information without redundancy, making it easy to scan and understand quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (1 parameter, no nested objects) and the presence of an output schema (which handles return value documentation), the description is adequate but has gaps. It covers the basic purpose and parameter semantics but lacks usage guidelines and behavioral details, making it minimally viable but not fully comprehensive for an agent's needs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaningful context for the single parameter 'paper_id' by specifying it as 'Paper ID (Semantic Scholar ID, DOI, ArXiv ID, etc.),' which clarifies acceptable formats beyond what the schema provides (schema description coverage is 0%). This compensates well for the low schema coverage, though it doesn't detail constraints like length or validation rules.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
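
To make the accepted ID shapes concrete, here is a minimal sketch of a classifier that guesses which kind of identifier a caller passed. `guess_paper_id_kind` and its patterns are hypothetical heuristics for illustration. They are not validation rules taken from the server, which forwards the ID to the API unchanged apart from URL encoding.

```python
import re

# Heuristic patterns (assumptions for illustration only).
_PATTERNS = [
    # Semantic Scholar paper IDs are commonly 40-character hex strings.
    ("Semantic Scholar ID", re.compile(r"^[0-9a-f]{40}$", re.IGNORECASE)),
    # DOIs start with "10.", a registrant code, then a slash and a suffix.
    ("DOI", re.compile(r"^(?:DOI:)?10\.\d{4,9}/\S+$", re.IGNORECASE)),
    # New-style arXiv IDs look like YYMM.NNNNN with an optional version.
    ("ArXiv ID", re.compile(r"^(?:ARXIV:)?\d{4}\.\d{4,5}(?:v\d+)?$", re.IGNORECASE)),
]


def guess_paper_id_kind(paper_id: str) -> str:
    """Return a best-effort label for the ID format, or "unknown"."""
    candidate = paper_id.strip()
    for kind, pattern in _PATTERNS:
        if pattern.match(candidate):
            return kind
    return "unknown"


print(guess_paper_id_kind("10.18653/v1/N18-3011"))
print(guess_paper_id_kind("ARXIV:2106.15928v2"))
print(guess_paper_id_kind("not an id"))
```

A description could list these shapes directly, which would spare an agent from having to guess at the constraints.
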

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get PDF availability information for a paper.' It specifies the verb ('Get') and resource ('PDF availability information for a paper'), making it easy to understand what the tool does. However, it doesn't explicitly differentiate the tool from siblings like 'download_paper_pdf' or 'get_paper', which a score of 5 would require.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools like 'download_paper_pdf' (which might download the PDF) or 'get_paper' (which might retrieve general paper info), leaving the agent to infer usage context without explicit direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/fegizii/SemanticScholarMCP'
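
The same request can be sketched in Python with only the standard library. `server_url` and `fetch_server` are hypothetical helper names, and the sketch assumes the endpoint returns a JSON body, which this page does not state.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one server entry."""
    # Encode each path segment separately so unusual characters stay in place.
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(name, safe='')}"


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> dict:
    """GET the server entry and parse it as JSON (assumed response format)."""
    request = urllib.request.Request(
        server_url(owner, name),
        headers={"Accept": "application/json"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no network access; calling fetch_server does.
print(server_url("fegizii", "SemanticScholarMCP"))
```
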
