download_biorxiv
Download PDF files of bioRxiv papers by DOI. Specify a paper ID (the DOI) and an optional save directory to retrieve research papers for offline access.
Instructions
Download PDF of a bioRxiv paper.
Args:
- paper_id: bioRxiv DOI.
- save_path: Directory to save the PDF (default: './downloads').

Returns: Path to the downloaded PDF file.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| paper_id | Yes | bioRxiv DOI of the paper | |
| save_path | No | Directory to save the PDF | ./downloads |
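
To make the two parameters concrete, the sketch below reproduces the string formatting used by the `download_pdf` implementation listed under Implementation Reference: the DOI is appended to the bioRxiv content URL with a fixed `v1` suffix, and slashes in the DOI become underscores in the file name. The helper names `build_pdf_url` and `build_output_path` and the example DOI are hypothetical, introduced here only for illustration; they are not part of the project.

```python
def build_pdf_url(paper_id: str) -> str:
    """Return the PDF URL requested for a DOI (the pattern always asks for version 1)."""
    if not paper_id:
        raise ValueError("Invalid paper_id: paper_id is empty")
    return f"https://www.biorxiv.org/content/{paper_id}v1.full.pdf"


def build_output_path(paper_id: str, save_path: str = "./downloads") -> str:
    """Return the file path the PDF would be written to."""
    # Slashes in the DOI are replaced so the file name is a single path component.
    return f"{save_path}/{paper_id.replace('/', '_')}.pdf"


example_doi = "10.1101/2024.01.01.123456"  # made-up DOI, for illustration only
print(build_pdf_url(example_doi))
print(build_output_path(example_doi))
```

Because the `v1` suffix is hard-coded in this pattern, a request built this way targets the first posted version of a preprint rather than any later revision.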
Implementation Reference
- The implementation of the download_pdf method within the BioRxivSearcher class, which handles the actual downloading of the PDF file from bioRxiv.

      def download_pdf(self, paper_id: str, save_path: str) -> str:
          """
          Download a PDF for a given paper ID from bioRxiv.

          Args:
              paper_id: The DOI of the paper.
              save_path: Directory to save the PDF.

          Returns:
              Path to the downloaded PDF file.
          """
          if not paper_id:
              raise ValueError("Invalid paper_id: paper_id is empty")

          pdf_url = f"https://www.biorxiv.org/content/{paper_id}v1.full.pdf"
          tries = 0
          while tries < self.max_retries:
              try:
                  # Add User-Agent to avoid potential 403 errors
                  headers = {
                      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
                  }
                  response = self.session.get(pdf_url, timeout=self.timeout, headers=headers)
                  response.raise_for_status()
                  os.makedirs(save_path, exist_ok=True)
                  output_file = f"{save_path}/{paper_id.replace('/', '_')}.pdf"
                  with open(output_file, 'wb') as f:
                      f.write(response.content)
                  return output_file
              except requests.exceptions.RequestException as e:
                  tries += 1
                  if tries == self.max_retries:
                      raise Exception(f"Failed to download PDF after {self.max_retries} attempts: {e}")
                  print(f"Attempt {tries} failed, retrying...")

- paper_search_mcp/server.py:482-491 (handler): The MCP tool handler for 'download_biorxiv', which calls the download_pdf method of the BioRxivSearcher class.

      async def download_biorxiv(paper_id: str, save_path: str = "./downloads") -> str:
          """Download PDF of a bioRxiv paper.

          Args:
              paper_id: bioRxiv DOI.
              save_path: Directory to save the PDF (default: './downloads').

          Returns:
              Path to the downloaded PDF file.
          """
          return biorxiv_searcher.download_pdf(paper_id, save_path)
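
The retry behaviour of `download_pdf` can be exercised without network access. The sketch below is a simplified stand-in, not the project's code: `fetch_with_retries` and `make_flaky_fetch` are hypothetical names, a plain callable replaces `self.session.get`, and `OSError` stands in for `requests.exceptions.RequestException`. It follows the same loop shape as the implementation: no delay between attempts, and the last error wrapped in a new exception once the attempt limit is reached.

```python
from typing import Callable


def fetch_with_retries(fetch: Callable[[], bytes], max_retries: int = 3) -> bytes:
    """Call fetch() until it succeeds or max_retries attempts have failed."""
    tries = 0
    while tries < max_retries:
        try:
            return fetch()
        except OSError as e:  # stand-in for requests.exceptions.RequestException
            tries += 1
            if tries == max_retries:
                # The original raises a bare Exception; RuntimeError is this sketch's choice.
                raise RuntimeError(
                    f"Failed to download PDF after {max_retries} attempts: {e}"
                ) from e
            print(f"Attempt {tries} failed, retrying...")
    # Reached only when max_retries < 1. The original loop would fall through
    # and return None here; raising instead is an assumption of this sketch.
    raise ValueError("max_retries must be at least 1")


def make_flaky_fetch(failures: int) -> Callable[[], bytes]:
    """Build a fake fetch that fails `failures` times, then returns placeholder bytes."""
    state = {"calls": 0}

    def fetch() -> bytes:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionError("simulated network error")
        return b"%PDF-1.4 placeholder"

    return fetch


# Two simulated failures followed by a success, within the three allowed attempts.
data = fetch_with_retries(make_flaky_fetch(failures=2), max_retries=3)
print(len(data), "bytes received")
```

Since the attempts run back to back, a caller hitting rate limits might want a pause between retries; that would be an addition to the behaviour shown on this page, not a description of it.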