semantic_scholar_bulk_papers
Retrieve up to 500 academic papers in a single request from Semantic Scholar's database of 200M+ papers.
Instructions
Retrieve multiple papers in a single request (max 500).
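The tool wraps a single POST call to the Semantic Scholar `paper/batch` endpoint. The following sketch builds an equivalent request with only the standard library and does not send it. It assumes the public Graph API base URL `https://api.semanticscholar.org/graph/v1` and an `x-api-key` header for authentication, because the source's `SEMANTIC_SCHOLAR_API_BASE` and `_get_headers` are not shown. `build_batch_request` is a hypothetical name.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

# Assumption: public Semantic Scholar Graph API base URL (the tool's
# SEMANTIC_SCHOLAR_API_BASE constant is not shown in the source).
API_BASE = "https://api.semanticscholar.org/graph/v1"


def build_batch_request(
    paper_ids: List[str], fields: List[str], api_key: Optional[str] = None
) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a paper/batch POST request."""
    if not 1 <= len(paper_ids) <= 500:
        raise ValueError("paper/batch is called with 1 to 500 IDs")
    # Requested fields travel as one comma-separated "fields" query parameter.
    query = urllib.parse.urlencode({"fields": ",".join(fields)})
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the API key is sent in the x-api-key header.
        headers["x-api-key"] = api_key
    # The IDs go in the JSON body under "ids", as in the tool's json_body.
    body = json.dumps({"ids": paper_ids}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/paper/batch?{query}", data=body, headers=headers, method="POST"
    )


req = build_batch_request(
    ["ARXIV:1706.03762", "DOI:10.18653/v1/N19-1423"], ["paperId", "title", "year"]
)
print(req.get_method(), req.full_url)
```

`urlencode` percent-encodes the commas in the field list as `%2C`; the server decodes them back to the same comma-separated value.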
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
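| params | Yes | Tool input object (`BulkPaperInput`): `paper_ids`, a list of 1 to 500 paper ID strings, and an optional `response_format` (default `ResponseFormat.JSON`) | - |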
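Because the schema accepts between 1 and 500 IDs per call, a longer list has to be split on the client side. This stdlib sketch (no Pydantic) approximates the `BulkPaperInput` constraints and splits a long ID list into compliant batches. `make_bulk_params` and `chunk_ids` are hypothetical helpers, and the `"json"` value for `response_format` is an assumption, since the `ResponseFormat` enum values are not shown.

```python
from typing import Any, Dict, List

MAX_IDS = 500  # mirrors max_length=500 on BulkPaperInput.paper_ids


def make_bulk_params(paper_ids: List[str], response_format: str = "json") -> Dict[str, Any]:
    """Hypothetical helper: build a `params` object that fits the schema."""
    # Approximates str_strip_whitespace=True by stripping each ID.
    ids = [pid.strip() for pid in paper_ids]
    if not 1 <= len(ids) <= MAX_IDS:
        raise ValueError(f"paper_ids must contain 1 to {MAX_IDS} items, got {len(ids)}")
    # extra="forbid" means only these two keys are accepted.
    # Assumption: "json" is the serialized value of ResponseFormat.JSON.
    return {"paper_ids": ids, "response_format": response_format}


def chunk_ids(paper_ids: List[str], size: int = MAX_IDS) -> List[List[str]]:
    """Split a long ID list into consecutive batches of at most `size` IDs."""
    return [paper_ids[i : i + size] for i in range(0, len(paper_ids), size)]


ids = [f"CorpusId:{n}" for n in range(1, 1201)]
calls = [make_bulk_params(batch) for batch in chunk_ids(ids)]
print([len(c["paper_ids"]) for c in calls])  # [500, 500, 200]
```

Each element of `calls` is one tool invocation, so 1,200 IDs cost three requests instead of 1,200.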
Implementation Reference
- The primary handler function for the `semantic_scholar_bulk_papers` tool. It sends one batch POST request to the Semantic Scholar API's `/paper/batch` endpoint to retrieve details for up to 500 papers by ID, and returns the result as either JSON or Markdown.

  ```python
  @mcp.tool(name="semantic_scholar_bulk_papers")
  async def get_bulk_papers(params: BulkPaperInput) -> str:
      """Retrieve multiple papers in a single request (max 500)."""
      logger.info(f"Bulk retrieval: {len(params.paper_ids)} papers")
      response = await _make_request(
          "POST",
          "paper/batch",
          params={"fields": ",".join(PAPER_FIELDS)},
          json_body={"ids": params.paper_ids},
      )
      # Accept either a bare list or a dict wrapping the results under "data".
      papers = response if isinstance(response, list) else response.get("data", [])

      if params.response_format == ResponseFormat.JSON:
          return json.dumps(
              {"requested": len(params.paper_ids), "retrieved": len(papers), "papers": papers},
              indent=2,
          )

      lines = [
          "## Bulk Retrieval",
          f"**Requested:** {len(params.paper_ids)} | **Retrieved:** {len(papers)}",
          "",
      ]
      for paper in papers:
          if paper:  # skip empty (null) entries
              lines.append(_format_paper_markdown(paper))
      return "\n".join(lines)
  ```
- Pydantic input schema validating the tool parameters: a list of paper IDs (1-500 items) and an optional response format (default JSON).

  ```python
  class BulkPaperInput(BaseModel):
      model_config = ConfigDict(str_strip_whitespace=True, extra="forbid")

      paper_ids: List[str] = Field(
          ..., description="List of paper IDs (max 500)", min_length=1, max_length=500
      )
      response_format: ResponseFormat = Field(
          default=ResponseFormat.JSON, description="Output format"
      )
  ```
- Constant list of paper fields requested in the API call to ensure comprehensive metadata retrieval.

  ```python
  PAPER_FIELDS: List[str] = [
      "paperId", "corpusId", "url", "title", "abstract", "venue", "publicationVenue",
      "year", "referenceCount", "citationCount", "influentialCitationCount",
      "isOpenAccess", "openAccessPdf", "fieldsOfStudy", "s2FieldsOfStudy",
      "publicationTypes", "publicationDate", "journal", "citationStyles",
      "authors", "externalIds", "tldr",
  ]
  ```
- Shared HTTP request utility function used by the bulk handler to make the POST request to the Semantic Scholar API, handling errors and authentication.

  ```python
  async def _make_request(
      method: str,
      endpoint: str,
      params: Optional[Dict] = None,
      json_body: Optional[Dict] = None,
  ) -> Dict[str, Any]:
      url = f"{SEMANTIC_SCHOLAR_API_BASE}/{endpoint}"
      async with httpx.AsyncClient(timeout=DEFAULT_TIMEOUT) as client:
          try:
              if method == "GET":
                  resp = await client.get(url, params=params, headers=_get_headers())
              else:
                  resp = await client.post(url, params=params, json=json_body, headers=_get_headers())
              resp.raise_for_status()
              return resp.json()
          except httpx.HTTPStatusError as e:
              _handle_error(e.response.status_code)
          except httpx.TimeoutException:
              raise Exception("Request timed out")
      return {}
  ```
- Utility function to format individual paper data into Markdown for the tool's Markdown response mode.

  ```python
  def _format_paper_markdown(paper: Dict[str, Any]) -> str:
      lines = []
      title = paper.get("title", "Unknown Title")
      year = paper.get("year", "N/A")
      lines.append(f"### {title} ({year})")

      authors = paper.get("authors", [])
      if authors:
          names = [a.get("name", "?") for a in authors[:5]]
          if len(authors) > 5:
              names.append(f"... +{len(authors)-5} more")
          lines.append(f"**Authors:** {', '.join(names)}")

      venue = paper.get("venue") or (paper.get("publicationVenue") or {}).get("name")
      if venue:
          lines.append(f"**Venue:** {venue}")

      citations = paper.get("citationCount", 0)
      influential = paper.get("influentialCitationCount", 0)
      lines.append(f"**Citations:** {citations} ({influential} influential)")

      pdf_info = paper.get("openAccessPdf") or {}
      if pdf_info.get("url"):
          lines.append(f"**Open Access:** [PDF]({pdf_info['url']})")

      fields = paper.get("fieldsOfStudy") or []
      if fields:
          lines.append(f"**Fields:** {', '.join(fields[:5])}")

      tldr = paper.get("tldr") or {}
      if tldr.get("text"):
          lines.append(f"**TL;DR:** {tldr['text']}")

      abstract = paper.get("abstract")
      if abstract:
          lines.append(f"**Abstract:** {abstract[:500]}..." if len(abstract) > 500 else f"**Abstract:** {abstract}")

      ext_ids = paper.get("externalIds") or {}
      ids = []
      if ext_ids.get("DOI"):
          ids.append(f"DOI: {ext_ids['DOI']}")
      if ext_ids.get("ArXiv"):
          ids.append(f"ArXiv: {ext_ids['ArXiv']}")
      if ext_ids.get("PubMed"):
          ids.append(f"PMID: {ext_ids['PubMed']}")
      if ids:
          lines.append(f"**IDs:** {', '.join(ids)}")

      if paper.get("url"):
          lines.append(f"**Link:** [{paper.get('paperId')}]({paper['url']})")
      lines.append("")
      return "\n".join(lines)
  ```
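In JSON mode the handler reports `retrieved` as `len(papers)`, which still counts any empty entries; only the Markdown loop skips them with `if paper:`. The stdlib sketch below mirrors the handler's response normalization and the author-truncation rule from `_format_paper_markdown`. It uses a hypothetical two-item response in which one ID did not resolve, on the assumption (suggested by the `if paper:` guard) that the API returns `null` in that slot. `normalize_batch` and `author_line` are illustrative names, not part of the tool.

```python
from typing import Any, Dict, List, Optional, Union


def normalize_batch(response: Union[List[Any], Dict[str, Any]]) -> List[Optional[Dict[str, Any]]]:
    # Same shape check as the handler: a bare list, or a dict with "data".
    return response if isinstance(response, list) else response.get("data", [])


def author_line(authors: List[Dict[str, Any]], limit: int = 5) -> str:
    # Mirrors _format_paper_markdown: first five names, then a "+N more" suffix.
    names = [a.get("name", "?") for a in authors[:limit]]
    if len(authors) > limit:
        names.append(f"... +{len(authors) - limit} more")
    return f"**Authors:** {', '.join(names)}"


# Hypothetical batch response: the second ID did not resolve, so its slot is None.
response = [
    {"paperId": "abc", "title": "Paper A", "authors": [{"name": f"Author {i}"} for i in range(7)]},
    None,
]
papers = normalize_batch(response)
found = [p for p in papers if p]
print(len(papers), len(found))  # 2 1
print(author_line(found[0]["authors"]))
```

Counting `found` rather than `papers` would give a `retrieved` figure that excludes unresolved IDs.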