download_article
Download scholarly articles as PDF files from arXiv.org using article titles or arXiv IDs for research and analysis.
Instructions
Download the article as a PDF file. Resolve by arXiv ID or title.
Args: title: Article title. arxiv_id: arXiv ID.
Returns: Success message or structured error JSON.
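A caller usually needs to tell a successful download from a failure. The handler listed under Implementation Reference returns a JSON string with `"status": "ok"` and a `"path"` field on success. The exact shape of the error JSON is not shown on this page, so the sketch below assumes only that an error payload lacks `"status": "ok"`. The function name `parse_download_result` is hypothetical.

```python
import json
from typing import Any, Dict, Optional, Tuple


def parse_download_result(raw: str) -> Tuple[Optional[str], Optional[Dict[str, Any]]]:
    """Return (path, None) on success or (None, error_payload) on failure."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        # Not JSON at all: hand the raw text back so the caller can log it.
        return None, {"raw": raw}

    # Success payloads carry status == "ok" and the saved file path.
    if isinstance(payload, dict) and payload.get("status") == "ok":
        return payload.get("path"), None

    # Assumption: anything else is an error payload of unknown shape.
    return None, (payload if isinstance(payload, dict) else {"raw": raw})
```

Returning the error payload untouched avoids guessing at field names that this page does not document.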
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| title | No | Article title. | |
| arxiv_id | No | arXiv ID. | |
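Neither parameter is marked as required, but the resolver listed under Implementation Reference prefers `arxiv_id` over `title` and reports `MISSING_PARAM` when both are absent. The following is a minimal sketch of that precedence. The server's `ARXIV_ID_RE` is not shown on this page, so the pattern here is an assumption that covers only new-style IDs such as `2301.12345v2`, and the function name `choose_lookup` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: the real ARXIV_ID_RE is not shown and may accept more forms.
ARXIV_ID_RE = re.compile(
    r"^(?:arxiv:)?(?P<id>\d{4}\.\d{4,5}(?:v\d+)?)$",
    re.IGNORECASE,
)


def choose_lookup(
    title: Optional[str] = None,
    arxiv_id: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (kind, value); kind is 'pdf_url', 'title_search', or an error code."""
    if arxiv_id:
        # An ID, when given, wins over the title.
        m = ARXIV_ID_RE.match(arxiv_id.strip())
        if not m:
            return ("INVALID_ID", arxiv_id)
        return ("pdf_url", f"https://arxiv.org/pdf/{m.group('id')}")
    if not title:
        return ("MISSING_PARAM", "")
    # Only the title is available: the server would search arXiv by title here.
    return ("title_search", title)
```

Passing both arguments therefore never triggers a title search, and an invalid ID is reported as an error instead of falling back to the title.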
Implementation Reference
- src/arxiv_server/server.py:175-215 (handler): The @mcp.tool()-decorated handler function that implements the core logic of downloading an arXiv article PDF after resolving the URL via title or ID. Handles retries, saving to disk, and error responses.

      @mcp.tool()
      async def download_article(
          title: Optional[str] = None,
          arxiv_id: Optional[str] = None,
      ) -> str:
          """
          Download the article as a PDF file. Resolve by arXiv ID or title.

          Args:
              title: Article title.
              arxiv_id: arXiv ID.

          Returns:
              Success message or structured error JSON.
          """
          result = await resolve_article(title=title, arxiv_id=arxiv_id)
          if isinstance(result, str):
              return result
          article_url, resolved_id = result

          headers = {"User-Agent": USER_AGENT, "Accept": "application/pdf"}
          file_path = os.path.join(DOWNLOAD_PATH, f"{resolved_id}.pdf")

          async with httpx.AsyncClient(timeout=DEFAULT_TIMEOUT, limits=HTTP_LIMITS) as client:
              for attempt in range(RETRY_ATTEMPTS):
                  try:
                      async with client.stream("GET", article_url, headers=headers) as resp:
                          resp.raise_for_status()
                          with open(file_path, "wb") as f:
                              async for chunk in resp.aiter_bytes():
                                  if chunk:
                                      f.write(chunk)
                      return json.dumps({
                          "status": "ok",
                          "message": "Download successful.",
                          "path": file_path,
                      })
                  except Exception as e:
                      if attempt < RETRY_ATTEMPTS - 1:
                          await _retry_sleep(attempt)
                          continue
                      return _error("DOWNLOAD_FAILED", f"Unable to retrieve or save the article: {e}")
- src/arxiv_server/server.py:123-142 (helper): Key helper function used by download_article (and other tools) to resolve a title or arXiv ID into a direct PDF URL and normalized ID.

      async def resolve_article(title: Optional[str] = None, arxiv_id: Optional[str] = None) -> Tuple[str, str] | str:
          """
          Resolve to a direct PDF URL and arXiv ID using either a title or an arXiv ID.
          Preference order: arxiv_id > title.
          """
          if arxiv_id:
              m = ARXIV_ID_RE.match(arxiv_id.strip())
              if not m:
                  return _error("INVALID_ID", f"Not a valid arXiv ID: {arxiv_id}")
              vid = m.group("id")
              return (f"https://arxiv.org/pdf/{vid}", vid)

          if not title:
              return _error("MISSING_PARAM", "Provide either 'arxiv_id' or 'title'.")

          info = await fetch_information(title)
          if isinstance(info, str):
              return _error("NOT_FOUND", str(info))

          resolved_id = info.id.split("/abs/")[-1]
          direct_pdf_url = f"https://arxiv.org/pdf/{resolved_id}"
          return (direct_pdf_url, resolved_id)
- src/arxiv_server/server.py:68-82 (helper): Helper for fetching PDF bytes with retries. download_article does not use it; it streams the response to disk directly.

      async def get_pdf(url: str) -> Optional[bytes]:
          """Get PDF document as bytes from arXiv.org with retries."""
          headers = {"User-Agent": USER_AGENT, "Accept": "application/pdf"}
          async with httpx.AsyncClient(timeout=DEFAULT_TIMEOUT, limits=HTTP_LIMITS) as client:
              for attempt in range(RETRY_ATTEMPTS):
                  try:
                      resp = await client.get(url, headers=headers)
                      resp.raise_for_status()
                      return resp.content
                  except Exception:
                      if attempt < RETRY_ATTEMPTS - 1:
                          await _retry_sleep(attempt)
                          continue
                      return None
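Both download_article and get_pdf share the same retry loop: try the request, sleep between failed attempts, and give up after the last one. The sketch below isolates that pattern without any network calls. `_retry_sleep` and `RETRY_ATTEMPTS` are not shown on this page, so the exponential backoff, the attempt count of 3, and the names `retry_sleep` and `with_retries` are all assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

RETRY_ATTEMPTS = 3   # assumed value; the server's constant is not shown
BASE_DELAY = 0.01    # assumed base delay in seconds, kept small for demos


async def retry_sleep(attempt: int, base_delay: float = BASE_DELAY) -> None:
    """Assumed backoff: wait base_delay * 2**attempt seconds."""
    await asyncio.sleep(base_delay * (2 ** attempt))


async def with_retries(
    op: Callable[[], Awaitable[T]],
    attempts: int = RETRY_ATTEMPTS,
) -> T:
    """Run op, retrying on any exception; re-raise after the final attempt."""
    for attempt in range(attempts):
        try:
            return await op()
        except Exception:
            if attempt < attempts - 1:
                # Not the last attempt yet: back off, then try again.
                await retry_sleep(attempt)
                continue
            raise
    raise RuntimeError("attempts must be at least 1")
```

Unlike the listed handlers, which convert the final failure into an error JSON string or `None`, this sketch re-raises the last exception so the caller decides how to report it.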