download_arxiv
Download PDFs from arXiv using paper IDs to access academic papers for research and study purposes.
Instructions
Download PDF from arXiv (always free and available).
Args:
paper_id: arXiv ID (e.g., '2106.12345', '2312.00001v2').
save_path: Directory to save PDF (default: ~/paper_downloads).
Returns:
Path to downloaded PDF file.
Example:
download_arxiv("2106.12345")
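As a rough illustration of what this call resolves to, the sketch below mirrors the URL and filename rules visible in the ArxivSearcher.download_pdf code referenced on this page. The helper name `arxiv_pdf_target` is hypothetical (it is not part of the server), and the `papers` directory in the usage lines is an arbitrary example. Nothing is downloaded.

```python
import os
from typing import Tuple


def arxiv_pdf_target(paper_id: str, save_path: str) -> Tuple[str, str]:
    """Return (pdf_url, output_file) for an arXiv ID without touching the network.

    Hypothetical helper, for illustration only.
    """
    # '/' and ':' are replaced so the ID can be used as a filename
    # (old-style IDs such as 'hep-th/9901001' contain a slash).
    safe_id = paper_id.replace('/', '_').replace(':', '_')
    output_file = os.path.join(save_path, f"{safe_id}.pdf")
    # The URL uses the ID exactly as given, including any version suffix.
    pdf_url = f"https://arxiv.org/pdf/{paper_id}.pdf"
    return pdf_url, output_file


url, path = arxiv_pdf_target("2312.00001v2", "papers")
print(url)
print(path)
```

Because the version suffix is part of the filename, '2312.00001' and '2312.00001v2' would be stored as separate files under this rule.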
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| paper_id | Yes | arXiv ID (e.g., '2106.12345', '2312.00001v2'). | |
| save_path | No | Directory to save PDF. | ~/paper_downloads |
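
The inputs amount to one required string and one optional string. The dictionary below is a hand-written approximation of such a schema, not the schema the server actually emits; the exact output depends on the MCP framework. `DOWNLOAD_ARXIV_SCHEMA` and `check_arguments` are hypothetical names showing how a client might pre-validate arguments before calling the tool.

```python
from typing import Any, Dict, List

# Hand-written approximation of the tool's input schema (an assumption).
DOWNLOAD_ARXIV_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "paper_id": {"type": ["string"]},
        "save_path": {"type": ["string", "null"], "default": None},
    },
    "required": ["paper_id"],
}


def check_arguments(args: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems: List[str] = []
    properties = DOWNLOAD_ARXIV_SCHEMA["properties"]
    for name in DOWNLOAD_ARXIV_SCHEMA["required"]:
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name, value in args.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unknown argument: {name}")
            continue
        allowed = spec["type"]
        if value is None:
            ok = "null" in allowed
        else:
            ok = isinstance(value, str) and "string" in allowed
        if not ok:
            problems.append(f"{name} has an invalid type")
    return problems


print(check_arguments({"paper_id": "2106.12345"}))
print(check_arguments({"save_path": "papers"}))
```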
Implementation Reference
- paper_find_mcp/server.py:194-208 (handler) MCP tool handler and registration for 'download_arxiv'. This is the entry point decorated with @mcp.tool(), defining the input schema via its arguments and docstring and delegating to the generic _download helper using the 'arxiv' searcher.

      @mcp.tool()
      async def download_arxiv(paper_id: str, save_path: Optional[str] = None) -> str:
          """Download PDF from arXiv (always free and available).

          Args:
              paper_id: arXiv ID (e.g., '2106.12345', '2312.00001v2').
              save_path: Directory to save PDF (default: ~/paper_downloads).

          Returns:
              Path to downloaded PDF file.

          Example:
              download_arxiv("2106.12345")
          """
          return await _download('arxiv', paper_id, save_path)

- paper_find_mcp/server.py:115-135 (helper) Generic download helper used by all platform-specific download tools, including download_arxiv. Looks up the searcher instance by name and calls its download_pdf method; failures are returned as error strings rather than raised.

      async def _download(
          searcher_name: str,
          paper_id: str,
          save_path: Optional[str] = None
      ) -> str:
          """Generic download function."""
          if save_path is None:
              save_path = get_download_path()

          searcher = SEARCHERS.get(searcher_name)
          if not searcher:
              return f"Error: Unknown searcher {searcher_name}"

          try:
              return searcher.download_pdf(paper_id, save_path)
          except NotImplementedError as e:
              return str(e)
          except Exception as e:
              logger.error(f"Download failed for {searcher_name}: {e}")
              return f"Error downloading: {str(e)}"

- Core implementation of PDF download in the ArxivSearcher.download_pdf method. Downloads from https://arxiv.org/pdf/{paper_id}.pdf, returns the existing file if it was already downloaded, sanitizes the filename, and saves to the specified directory.

      def download_pdf(self, paper_id: str, save_path: str) -> str:
          """Download an arXiv paper PDF.

          Args:
              paper_id: arXiv paper ID (e.g., '2106.12345')
              save_path: Directory to save to

          Returns:
              str: Path to the PDF file

          Raises:
              RuntimeError: Raised when the download fails
          """
          # Ensure the directory exists
          os.makedirs(save_path, exist_ok=True)

          # Build the file path
          # Handle IDs with a version suffix (e.g., 2106.12345v2)
          safe_id = paper_id.replace('/', '_').replace(':', '_')
          output_file = os.path.join(save_path, f"{safe_id}.pdf")

          # Check whether the file already exists
          if os.path.exists(output_file):
              logger.info(f"PDF already exists: {output_file}")
              return output_file

          # Download the PDF
          pdf_url = f"https://arxiv.org/pdf/{paper_id}.pdf"

          try:
              response = requests.get(pdf_url, timeout=60)
              response.raise_for_status()

              with open(output_file, 'wb') as f:
                  f.write(response.content)

              logger.info(f"PDF downloaded: {output_file}")
              return output_file

          except requests.RequestException as e:
              raise RuntimeError(f"Failed to download PDF: {e}")

- paper_find_mcp/server.py:75-85 (registration) Registration of searcher instances, including 'arxiv': ArxivSearcher(), which provides the download_pdf implementation used by download_arxiv.

      SEARCHERS = {
          'arxiv': ArxivSearcher(),
          'pubmed': PubMedSearcher(),
          'biorxiv': BioRxivSearcher(),
          'medrxiv': MedRxivSearcher(),
          'google_scholar': GoogleScholarSearcher(),
          'iacr': IACRSearcher(),
          'semantic': SemanticSearcher(),
          'crossref': CrossRefSearcher(),
          'repec': RePECSearcher(),
      }
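
Taken together, the referenced pieces form a name-to-searcher lookup followed by a guarded call. The sketch below reproduces that flow with a stand-in class so it runs without network access. `StubSearcher`, `DEMO_SEARCHERS`, and `demo_download` are hypothetical names, and the stub writes placeholder bytes instead of fetching anything from arXiv.

```python
import asyncio
import os
import tempfile
from typing import Dict


class StubSearcher:
    """Stand-in for ArxivSearcher: writes placeholder bytes, makes no HTTP request."""

    def download_pdf(self, paper_id: str, save_path: str) -> str:
        os.makedirs(save_path, exist_ok=True)
        safe_id = paper_id.replace('/', '_').replace(':', '_')
        output_file = os.path.join(save_path, f"{safe_id}.pdf")
        # Same reuse rule as the referenced method: keep a file that already exists.
        if os.path.exists(output_file):
            return output_file
        with open(output_file, 'wb') as f:
            f.write(b"%PDF-placeholder")
        return output_file


DEMO_SEARCHERS: Dict[str, StubSearcher] = {'arxiv': StubSearcher()}


async def demo_download(searcher_name: str, paper_id: str, save_path: str) -> str:
    """Look up the searcher by name and return a file path or an error string."""
    searcher = DEMO_SEARCHERS.get(searcher_name)
    if not searcher:
        return f"Error: Unknown searcher {searcher_name}"
    try:
        return searcher.download_pdf(paper_id, save_path)
    except Exception as e:
        return f"Error downloading: {e}"


with tempfile.TemporaryDirectory() as tmp:
    print(asyncio.run(demo_download('arxiv', '2106.12345', tmp)))
    print(asyncio.run(demo_download('nope', '2106.12345', tmp)))
```

Since failures come back as strings rather than exceptions in this pattern, a caller would need to inspect the returned value (for example, checking for an "Error" prefix) before treating it as a file path.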