download_biorxiv
Download PDF files from bioRxiv using their DOIs. Retrieve open-access preprints and save them to specified directories for academic research.
Instructions
Download PDF from bioRxiv (free and open access).
Args:
paper_id: bioRxiv DOI (e.g., '10.1101/2024.01.01.123456').
save_path: Directory to save PDF.
Returns:
Path to downloaded PDF.
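The tool does not raise on failure. As the implementation reference shows, `download_pdf` and `_download` both report errors as strings beginning with `Error`, and `_download` passes `NotImplementedError` messages through unchanged. A caller should therefore check the returned string before treating it as a path. The following is a minimal sketch of that check. `parse_download_result` is a hypothetical helper and is not part of paper_find_mcp.

```python
from pathlib import Path


def parse_download_result(result: str) -> Path:
    """Hypothetical caller-side helper (not part of paper_find_mcp).

    Converts the string returned by download_biorxiv into a Path and
    raises RuntimeError when the string looks like an error message.
    """
    # download_pdf and _download both prefix failure messages with "Error"
    if result.startswith("Error"):
        raise RuntimeError(result)
    path = Path(result)
    # _download passes NotImplementedError messages through verbatim,
    # so also reject anything that does not look like a PDF path
    if path.suffix.lower() != ".pdf":
        raise RuntimeError(f"Unexpected tool result: {result}")
    return path
```

For example, `parse_download_result("downloads/10.1101_2024.01.01.123456.pdf")` returns the corresponding `Path`, and `parse_download_result("Error downloading PDF: ...")` raises.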
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
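| paper_id | Yes | bioRxiv DOI (e.g., `10.1101/2024.01.01.123456`) | |
| save_path | No | Directory to save the PDF | `None` (falls back to `get_download_path()`) |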
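`download_pdf` inserts `paper_id` directly into the URL and then appends `v1.full.pdf`. It therefore expects a bare DOI such as `10.1101/2024.01.01.123456`. Two kinds of input produce a malformed URL: a DOI pasted as a `https://doi.org/...` link, and a DOI that already carries a version suffix such as `v2`. The sketch below shows one way to pre-process the input. `normalize_biorxiv_doi` is hypothetical. The accepted prefixes and the `10.1101/` check are assumptions based on the docstring example. medRxiv DOIs also start with `10.1101/`, so this check cannot tell the two servers apart.

```python
import re

# Hypothetical helper; the prefixes and version pattern are assumptions
_LINK_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")
_VERSION_SUFFIX = re.compile(r"v\d+$")


def normalize_biorxiv_doi(raw: str) -> str:
    """Return a bare bioRxiv DOI suitable for download_biorxiv's paper_id."""
    doi = raw.strip()
    # Strip a resolver link or "doi:" prefix (matched case-insensitively)
    for prefix in _LINK_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # download_pdf appends "v1.full.pdf" itself, so drop any version suffix
    doi = _VERSION_SUFFIX.sub("", doi)
    if not doi.startswith("10.1101/"):
        raise ValueError(f"Not a bioRxiv DOI: {raw!r}")
    return doi
```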
Implementation Reference
- `paper_find_mcp/server.py:337-348` (handler): MCP tool handler for `download_biorxiv`. It is decorated with `@mcp.tool()` for registration and delegates to the generic `_download` function with the `'biorxiv'` searcher.

  ```python
  @mcp.tool()
  async def download_biorxiv(paper_id: str, save_path: Optional[str] = None) -> str:
      """Download PDF from bioRxiv (free and open access).

      Args:
          paper_id: bioRxiv DOI (e.g., '10.1101/2024.01.01.123456').
          save_path: Directory to save PDF.

      Returns:
          Path to downloaded PDF.
      """
      return await _download('biorxiv', paper_id, save_path)
  ```
- Core implementation of PDF download for bioRxiv papers, in the `BioRxivSearcher.download_pdf` method. It constructs the PDF URL, downloads the file with `requests`, and saves it to disk.

  ```python
  def download_pdf(self, paper_id: str, save_path: str) -> str:
      """Download a PDF.

      Args:
          paper_id: bioRxiv DOI
          save_path: Directory to save to

      Returns:
          Path of the downloaded file, or an error message
      """
      if not paper_id:
          return "Error: paper_id is empty"

      # The version is hardcoded, so this always requests v1 of the preprint
      pdf_url = f"https://www.biorxiv.org/content/{paper_id}v1.full.pdf"

      try:
          response = self.session.get(
              pdf_url,
              timeout=self.timeout,
              headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'}
          )
          response.raise_for_status()

          os.makedirs(save_path, exist_ok=True)
          # Replace '/' in the DOI so it does not create subdirectories
          filename = f"{paper_id.replace('/', '_')}.pdf"
          pdf_path = os.path.join(save_path, filename)

          with open(pdf_path, 'wb') as f:
              f.write(response.content)

          logger.info(f"PDF downloaded: {pdf_path}")
          return pdf_path

      except Exception as e:
          # Failures are reported as a string, not raised
          logger.error(f"PDF download failed: {e}")
          return f"Error downloading PDF: {e}"
  ```
- `paper_find_mcp/server.py:115-135` (helper): Generic `_download` helper used by the platform-specific tool handlers. It retrieves the searcher instance and calls its `download_pdf` method.

  ```python
  async def _download(
      searcher_name: str,
      paper_id: str,
      save_path: Optional[str] = None
  ) -> str:
      """Generic download function."""
      if save_path is None:
          save_path = get_download_path()

      searcher = SEARCHERS.get(searcher_name)
      if not searcher:
          return f"Error: Unknown searcher {searcher_name}"

      try:
          # download_pdf is synchronous, so this call blocks the event loop
          # for the duration of the download
          return searcher.download_pdf(paper_id, save_path)
      except NotImplementedError as e:
          return str(e)
      except Exception as e:
          logger.error(f"Download failed for {searcher_name}: {e}")
          return f"Error downloading: {str(e)}"
  ```
- `paper_find_mcp/server.py:75-85` (registration): Registers the `BioRxivSearcher` instance in the global `SEARCHERS` dictionary, which lets the generic `_download` function look it up under the key `'biorxiv'`.

  ```python
  SEARCHERS = {
      'arxiv': ArxivSearcher(),
      'pubmed': PubMedSearcher(),
      'biorxiv': BioRxivSearcher(),
      'medrxiv': MedRxivSearcher(),
      'google_scholar': GoogleScholarSearcher(),
      'iacr': IACRSearcher(),
      'semantic': SemanticSearcher(),
      'crossref': CrossRefSearcher(),
      'repec': RePECSearcher(),
  }
  ```
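The URL and filename logic in `download_pdf` can be separated into pure functions. This makes it possible to check what a given DOI maps to without network access. The sketch below mirrors that logic. Both function names are illustrative and are not part of paper_find_mcp. The `version` parameter is a hypothetical extension, because the original always requests `v1`.

```python
import os


def biorxiv_pdf_url(paper_id: str, version: int = 1) -> str:
    """Build the full-text PDF URL the same way download_pdf does.

    `version` is a hypothetical extension; download_pdf hardcodes v1.
    """
    if not paper_id:
        raise ValueError("paper_id is empty")
    return f"https://www.biorxiv.org/content/{paper_id}v{version}.full.pdf"


def biorxiv_pdf_path(paper_id: str, save_path: str) -> str:
    """Local path that download_pdf writes to ('/' in the DOI becomes '_')."""
    filename = f"{paper_id.replace('/', '_')}.pdf"
    return os.path.join(save_path, filename)
```

With the default `version=1`, this reproduces the original behavior. Because the original hardcodes `v1`, it downloads the first posted version of a preprint even when later revisions exist.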