Glama
by h-lu

download_scihub

Downloads the PDF of an academic paper by DOI. Intended for papers published before 2023 that are behind a paywall and not available on arXiv.

Instructions

Download paper PDF via Sci-Hub using DOI (for older papers only).

USE THIS TOOL WHEN:
- You have a DOI and need the full PDF
- The paper was published BEFORE 2023
- The paper is behind a paywall and not on arXiv
- You have already searched CrossRef and obtained the DOI

WORKFLOW: search_crossref(query) -> get DOI -> download_scihub(doi)

Args:
- doi: Paper DOI (e.g., '10.1038/nature12373', '10.1126/science.1234567').
- save_path: Directory to save the PDF in (default: ~/paper_downloads).

Returns: Path to the downloaded PDF file (e.g., 'downloads/scihub_10.1038_xxx.pdf'), or an error message if the download fails.

Example: download_scihub("10.1038/nature12373")  # 2013 Nature paper
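The tool expects a bare DOI such as '10.1038/nature12373', and the implementation only strips surrounding whitespace before using it. A minimal sketch of normalizing user input before calling download_scihub follows; normalize_doi and its prefix list are hypothetical (not part of paper-search-mcp), and the regular expression is based on the pattern Crossref has published for modern DOIs, which covers most but not all registered DOIs.

```python
import re

# Based on Crossref's published pattern for modern DOIs (case-insensitive).
# It matches the vast majority of DOIs but is not a full DOI grammar.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

# Hypothetical list of prefixes users commonly paste along with a DOI.
_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def normalize_doi(raw: str) -> str:
    """Strip whitespace and a common URL/label prefix, then check the DOI shape.

    Hypothetical helper for illustration only.
    """
    doi = raw.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Not a DOI: {raw!r}")
    return doi


print(normalize_doi(" https://doi.org/10.1038/nature12373 "))  # 10.1038/nature12373
```

Rejecting malformed input up front gives a clearer error than letting the lookup fail later with "Could not find PDF".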

Input Schema

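Name      | Required | Description                            | Default
doi       | Yes      | Paper DOI, e.g. '10.1038/nature12373'  | (none)
save_path | No       | Directory in which to save the PDF     | None (falls back to ~/paper_downloads)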
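The save_path fallback can be sketched as below. It mirrors the expression used in download_pdf (an empty or missing save_path becomes ~/paper_downloads, and missing parent directories are created). The function name resolve_output_dir is hypothetical, and the call to expanduser() is an addition of this sketch: the code shown in the implementation reference passes save_path to Path() unchanged, so a literal "~/papers" would not be expanded there.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_output_dir(save_path: Optional[str] = None) -> Path:
    """Return the directory PDFs should be written to, creating it if needed.

    Hypothetical helper mirroring the save_path fallback in download_pdf.
    Unlike the original, it expands a leading '~' in save_path.
    """
    if save_path:
        output_dir = Path(save_path).expanduser()
    else:
        output_dir = Path.home() / "paper_downloads"
    # parents=True creates intermediate directories; exist_ok=True makes
    # repeated calls harmless.
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir


# Usage: create a nested target inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = resolve_output_dir(f"{tmp}/nested/pdfs")
    print(target.is_dir())  # True
```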

Implementation Reference

  • MCP tool handler for 'download_scihub'. It is decorated with @mcp.tool(), carries type hints and a detailed docstring that serves as the tool description, and delegates execution to SCIHUB.download_pdf.
    @mcp.tool()
    async def download_scihub(doi: str, save_path: Optional[str] = None) -> str:
        """Download paper PDF via Sci-Hub using DOI (for older papers only).

        USE THIS TOOL WHEN:
        - You have a DOI and need the full PDF
        - The paper was published BEFORE 2023
        - The paper is behind a paywall and not on arXiv
        - You first searched CrossRef and got the DOI

        WORKFLOW: search_crossref(query) -> get DOI -> download_scihub(doi)

        Args:
            doi: Paper DOI (e.g., '10.1038/nature12373', '10.1126/science.1234567').
            save_path: Directory to save PDF (default: ~/paper_downloads).

        Returns:
            Path to downloaded PDF file (e.g., 'downloads/scihub_10.1038_xxx.pdf'),
            or error message if download fails.

        Example:
            download_scihub("10.1038/nature12373")  # 2013 Nature paper
        """
        if save_path is None:
            save_path = get_download_path()
        try:
            return SCIHUB.download_pdf(doi, save_path)
        except Exception as e:
            logger.error(f"Sci-Hub download failed: {e}")
            return f"Error: {e}"
  • Core helper method implementing the Sci-Hub PDF download logic: gets PDF URL from Sci-Hub page, downloads preferentially with curl (fallback to requests), validates PDF header and size.
    def download_pdf(self, doi: str, save_path: Optional[str] = None) -> str:
        """Download a paper PDF by DOI.

        Tries curl first (more reliable) and falls back to requests on failure.

        Args:
            doi: Paper DOI (e.g. "10.1038/nature12373")
            save_path: Directory to save into (default: ~/paper_downloads)

        Returns:
            Path of the downloaded file, or an error message
        """
        if not doi or not doi.strip():
            return "Error: DOI is empty"
        doi = doi.strip()

        # If no path is given, use paper_downloads under the user's home directory
        output_dir = Path(save_path) if save_path else Path.home() / "paper_downloads"
        output_dir.mkdir(parents=True, exist_ok=True)

        try:
            # Get the PDF URL (the HTML page must be parsed with requests)
            pdf_url = self._get_pdf_url(doi)
            if not pdf_url:
                return f"Error: Could not find PDF for DOI {doi} on Sci-Hub"

            # Build the file path from a filesystem-safe version of the DOI
            clean_doi = re.sub(r'[^\w\-_.]', '_', doi)
            file_path = output_dir / f"scihub_{clean_doi}.pdf"

            # Method 1: prefer curl (more reliable)
            if self._download_with_curl(pdf_url, str(file_path)):
                return str(file_path)
            logger.info("curl failed, falling back to requests...")

            # Method 2: fall back to requests (with retries)
            max_retries = 3
            for attempt in range(max_retries):
                try:
                    response = self.session.get(
                        pdf_url,
                        verify=False,  # TLS certificate verification is disabled
                        timeout=(30, 180),  # connect 30 s, read 180 s
                        stream=True,
                    )
                    if response.status_code != 200:
                        logger.warning(f"Download failed with status {response.status_code}")
                        response.close()  # release the unread streamed connection
                        continue

                    # Stream the body to disk in 8 KiB chunks
                    with open(file_path, 'wb') as f:
                        for chunk in response.iter_content(chunk_size=8192):
                            if chunk:
                                f.write(chunk)

                    # Verify the file is a PDF (magic bytes "%PDF"); the file is
                    # closed before it is removed so deletion also works on Windows
                    with open(file_path, 'rb') as f:
                        header = f.read(4)
                    if header != b'%PDF':
                        logger.warning("Downloaded file is not a PDF")
                        os.remove(file_path)
                        continue

                    logger.info(f"PDF downloaded with requests: {file_path}")
                    return str(file_path)
                except requests.exceptions.Timeout:
                    logger.warning(f"Timeout (attempt {attempt + 1}/{max_retries})")
                except Exception as e:
                    logger.warning(f"Download error (attempt {attempt + 1}/{max_retries}): {e}")

            return f"Error: Could not download PDF for DOI {doi}"

        except Exception as e:
            logger.error(f"Download failed for {doi}: {e}")
            return f"Error downloading PDF: {e}"
  • Global instance of SciHubFetcher used by the download_scihub handler.
    SCIHUB = SciHubFetcher()
  • Import of SciHubFetcher class used for the tool implementation.
    from .academic_platforms.sci_hub import SciHubFetcher, check_paper_year
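The file-handling steps of download_pdf can be isolated into two small, testable helpers, sketched below. doi_to_filename reuses the exact substitution from the code above (every character other than a word character, '-', '_' or '.' becomes '_'). looks_like_pdf checks the %PDF magic bytes as the requests path does; its minimum-size check is an assumption added here to reflect the "header and size" validation mentioned in the description, and the 1024-byte threshold is illustrative. Both names are hypothetical.

```python
import re
from pathlib import Path


def doi_to_filename(doi: str) -> str:
    """Turn a DOI into the scihub_<doi>.pdf filename used by download_pdf."""
    clean_doi = re.sub(r"[^\w\-_.]", "_", doi.strip())
    return f"scihub_{clean_doi}.pdf"


def looks_like_pdf(path: Path, min_bytes: int = 1024) -> bool:
    """Return True if the file exists, is at least min_bytes long, and starts
    with the %PDF magic bytes. The size threshold is an assumption."""
    if not path.is_file() or path.stat().st_size < min_bytes:
        return False
    with path.open("rb") as f:
        return f.read(4) == b"%PDF"


print(doi_to_filename("10.1038/nature12373"))  # scihub_10.1038_nature12373.pdf
```

A size floor catches tiny error pages or truncated downloads that happen to start with the right bytes; the magic-byte check catches HTML pages served with a 200 status.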
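The handler is declared async but calls SCIHUB.download_pdf synchronously, and that call can block for a long time (the requests fallback allows a 180 s read timeout per attempt, with up to three attempts). Assuming the MCP server runs tool handlers on a single asyncio event loop, one common way to keep other requests responsive is asyncio.to_thread (Python 3.9+). In the sketch below, blocking_download is a hypothetical stand-in for the real synchronous method, not part of paper-search-mcp.

```python
import asyncio
import time


def blocking_download(doi: str) -> str:
    """Hypothetical stand-in for a synchronous call such as SCIHUB.download_pdf."""
    time.sleep(0.2)  # simulate network I/O that blocks the calling thread
    return f"scihub_{doi.replace('/', '_')}.pdf"


async def handler(doi: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free
    # to serve other requests while the download is in progress.
    return await asyncio.to_thread(blocking_download, doi)


async def main() -> list[str]:
    start = time.perf_counter()
    # The three calls overlap because each runs in its own worker thread.
    results = await asyncio.gather(handler("10.1/a"), handler("10.1/b"), handler("10.1/c"))
    print(f"{len(results)} calls in {time.perf_counter() - start:.2f}s")
    return list(results)


if __name__ == "__main__":
    asyncio.run(main())
```

Calling blocking_download directly inside handler would serialize the three calls and stall the loop for their combined duration.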


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/h-lu/paper-search-mcp'
