download_semantic
by h-lu

Download open-access PDFs from Semantic Scholar when other sources are unavailable. Use this tool as a last resort for academic paper retrieval.

Instructions

Download PDF via Semantic Scholar (open-access only, use as LAST RESORT).

DOWNLOAD PRIORITY (try in order):

1. If arXiv paper -> use download_arxiv(arxiv_id) (always works)
2. If published before 2023 -> use download_scihub(doi)
3. Use this tool as a last resort (may not have a PDF)

Args:
    paper_id: Semantic Scholar ID, or prefixed: 'DOI:xxx', 'ARXIV:xxx', 'PMID:xxx'
    save_path: Directory to save the PDF.

Returns:
    Path to the downloaded PDF, or an error message if no PDF is available.
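The accepted `paper_id` formats above can be illustrated with a small helper. This is a hypothetical client-side sketch, not part of the server: it prefixes bare identifiers into the forms the tool accepts, assuming DOIs start with `10.`, arXiv IDs look like `2301.00001`, and PubMed IDs are all digits.

```python
def normalize_paper_id(raw: str) -> str:
    """Hypothetical helper: map a bare identifier to the prefixed form
    download_semantic accepts ('DOI:', 'ARXIV:', 'PMID:')."""
    raw = raw.strip()
    if raw.startswith("10."):                           # DOIs begin with "10."
        return f"DOI:{raw}"
    if "." in raw and raw.replace(".", "").isdigit():   # e.g. 2301.00001 -> arXiv
        return f"ARXIV:{raw}"
    if raw.isdigit():                                   # PubMed IDs are numeric
        return f"PMID:{raw}"
    return raw                                          # assume Semantic Scholar ID
```

A raw Semantic Scholar hash (e.g. a 40-character hex ID) matches none of the patterns and passes through unchanged.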

Input Schema

| Name      | Required | Description                                                          | Default             |
|-----------|----------|----------------------------------------------------------------------|---------------------|
| paper_id  | Yes      | Semantic Scholar ID, or prefixed: 'DOI:xxx', 'ARXIV:xxx', 'PMID:xxx' | —                   |
| save_path | No       | Directory to save the PDF                                            | get_download_path() |
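A tool call with arguments matching this schema might look as follows. This is an illustrative sketch: the DOI value and `./downloads` directory are examples, not server defaults.

```python
import json

# Illustrative tool-call arguments for download_semantic; the DOI and the
# "./downloads" directory are example values, not defaults of the server.
args = {
    "paper_id": "DOI:10.18653/v1/N19-1423",  # prefixed DOI form
    "save_path": "./downloads",              # optional; omit to use the server default
}
payload = json.dumps(args)
```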

Implementation Reference

  • MCP tool handler for download_semantic. Registers the tool and delegates execution to SemanticSearcher's download_pdf method via the _download helper.
```python
@mcp.tool()
async def download_semantic(paper_id: str, save_path: Optional[str] = None) -> str:
    """Download PDF via Semantic Scholar (open-access only, use as LAST RESORT).

    DOWNLOAD PRIORITY (try in order):
        1. If arXiv paper -> use download_arxiv(arxiv_id) (always works)
        2. If published before 2023 -> use download_scihub(doi)
        3. Use this tool as last resort (may not have PDF)

    Args:
        paper_id: Semantic Scholar ID, or prefixed: 'DOI:xxx', 'ARXIV:xxx', 'PMID:xxx'
        save_path: Directory to save PDF.

    Returns:
        Path to downloaded PDF, or error if not available.
    """
    return await _download('semantic', paper_id, save_path)
```
  • Core implementation of the PDF download logic. Fetches paper details using Semantic Scholar API, extracts open-access PDF URL, downloads and validates the PDF file, handles errors like non-PDF content or timeouts.
```python
def download_pdf(self, paper_id: str, save_path: str) -> str:
    """Download a paper's PDF.

    Args:
        paper_id: Paper ID (multiple formats supported)
        save_path: Directory to save to

    Returns:
        Path of the downloaded file, or an error message
    """
    paper = self.get_paper_details(paper_id)
    if not paper:
        return f"Error: Could not find paper {paper_id}"
    if not paper.pdf_url:
        return f"Error: No PDF URL available for paper {paper_id}"

    pdf_url = paper.pdf_url
    logger.info(f"Downloading PDF from: {pdf_url}")
    try:
        # Download directly with requests
        pdf_response = requests.get(pdf_url, timeout=60)
        pdf_response.raise_for_status()

        # Verify the downloaded content is a PDF
        content_type = pdf_response.headers.get('Content-Type', '')
        content = pdf_response.content

        # Check for a PDF via the magic bytes at the start of the content
        if not content.startswith(b'%PDF') and 'application/pdf' not in content_type:
            logger.warning(f"Downloaded content is not a PDF. Content-Type: {content_type}")
            # If we got an HTML page instead (e.g. OSTI), the real PDF
            # link likely requires browser access
            if b'<html' in content[:1000].lower() or b'<!doctype' in content[:1000].lower():
                logger.error("Downloaded HTML instead of PDF. The URL may require browser access.")
                return f"Error: URL {pdf_url} returned HTML, not PDF. This may require direct browser download."

        # Prepare the save path
        os.makedirs(save_path, exist_ok=True)
        safe_id = paper_id.replace('/', '_').replace(':', '_')
        filename = f"semantic_{safe_id}.pdf"
        pdf_path = os.path.join(save_path, filename)

        with open(pdf_path, "wb") as f:
            f.write(content)

        # Final validation: reject implausibly small files
        file_size = os.path.getsize(pdf_path)
        if file_size < 1000:
            os.remove(pdf_path)
            return f"Error: Downloaded file too small ({file_size} bytes)"

        logger.info(f"PDF downloaded successfully: {pdf_path} ({file_size} bytes)")
        return pdf_path

    except requests.exceptions.Timeout:
        return f"Error: Download timed out for {pdf_url}"
    except requests.exceptions.RequestException as e:
        logger.error(f"PDF download error: {e}")
        return f"Error downloading PDF: {e}"
```
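The content checks in `download_pdf` can be condensed into a standalone sketch. The function name below is hypothetical (not part of the server); it reproduces the same three outcomes: an HTML landing page, a genuine PDF, or unrecognized content.

```python
def classify_download(content: bytes, content_type: str = "") -> str:
    """Condensed sketch of the content checks in download_pdf above.
    The helper name is hypothetical; returns 'pdf', 'html', or 'unknown'."""
    head = content[:1000].lower()
    # HTML landing pages (e.g. OSTI) mean the URL needs a browser
    if b"<html" in head or b"<!doctype" in head:
        return "html"
    # Real PDFs start with the %PDF magic bytes or declare the PDF MIME type
    if content.startswith(b"%PDF") or "application/pdf" in content_type:
        return "pdf"
    return "unknown"  # download_pdf logs a warning but still saves, then size-checks
```

Checking the magic bytes in addition to `Content-Type` matters because some hosts serve PDFs with generic MIME types, and others serve HTML error pages with a PDF `Content-Type`.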
  • Generic helper function used by download_* tools to invoke the specific searcher's download_pdf method, with error handling and path resolution.
```python
async def _download(
    searcher_name: str,
    paper_id: str,
    save_path: Optional[str] = None
) -> str:
    """Generic download helper shared by the download_* tools."""
    if save_path is None:
        save_path = get_download_path()
    searcher = SEARCHERS.get(searcher_name)
    if not searcher:
        return f"Error: Unknown searcher {searcher_name}"
    try:
        return searcher.download_pdf(paper_id, save_path)
    except NotImplementedError as e:
        return str(e)
    except Exception as e:
        logger.error(f"Download failed for {searcher_name}: {e}")
        return f"Error downloading: {str(e)}"
```
  • Instantiation of SemanticSearcher instance in the global SEARCHERS dictionary, used by _download to access the semantic platform implementation.
```python
SEARCHERS = {
    'arxiv': ArxivSearcher(),
    'pubmed': PubMedSearcher(),
    'biorxiv': BioRxivSearcher(),
    'medrxiv': MedRxivSearcher(),
    'google_scholar': GoogleScholarSearcher(),
    'iacr': IACRSearcher(),
    'semantic': SemanticSearcher(),
}
```
  • Type hints defining the input schema (paper_id: str, save_path: Optional[str]) and output (str: path or error). Detailed usage in docstring.
```python
async def download_semantic(paper_id: str, save_path: Optional[str] = None) -> str:
    ...  # full body shown in the tool handler above
```
