download_doaj
Download the PDF of an article indexed in DOAJ (Directory of Open Access Journals) by its paper identifier, and save it to a directory of your choice.
Instructions
Download PDF for a paper from DOAJ.
Args:
- paper_id: DOAJ paper identifier.
- save_path: Directory to save the PDF (default: './downloads').

Returns:
- str: Path to downloaded PDF.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| paper_id | Yes | DOAJ paper identifier. | |
| save_path | No | Directory to save the PDF. | ./downloads |
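As a rough illustration of the schema above, the following sketch builds an arguments payload for the tool, enforcing the required `paper_id` and applying the documented default for `save_path`. The helper name `build_arguments` and the identifier `10.1234/example` are hypothetical and appear nowhere in the server code; how the payload is sent depends on the MCP client in use.

```python
import json

# Default taken from the input schema table.
DEFAULT_SAVE_PATH = "./downloads"


def build_arguments(paper_id, save_path=None):
    """Hypothetical helper: build the arguments dict for download_doaj."""
    # paper_id is the only required field in the schema.
    if not isinstance(paper_id, str) or not paper_id.strip():
        raise ValueError("paper_id is required and must be a non-empty string")
    return {
        "paper_id": paper_id.strip(),
        # Fall back to the documented default when no directory is given.
        "save_path": save_path if save_path is not None else DEFAULT_SAVE_PATH,
    }


if __name__ == "__main__":
    # "10.1234/example" is a placeholder, not a real article identifier.
    print(json.dumps(build_arguments("10.1234/example"), indent=2))
```

Omitting `save_path` yields the same payload a client would get by relying on the server-side default, so passing it explicitly is only needed to override the target directory.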
Implementation Reference
- paper_search_mcp/server.py:1146-1156 (handler): The MCP tool registration and entry point for 'download_doaj'.

  ```python
  @mcp.tool()
  async def download_doaj(paper_id: str, save_path: str = "./downloads") -> str:
      """Download PDF for a paper from DOAJ.

      Args:
          paper_id: DOAJ paper identifier.
          save_path: Directory to save the PDF (default: './downloads').

      Returns:
          str: Path to downloaded PDF.
      """
      return doaj_searcher.download_pdf(paper_id, save_path)
  ```

- The actual logic implementation for downloading a DOAJ PDF within the DOAJSearcher class. (Snippet simplified)
  ```python
  def download_pdf(self, paper_id: str, save_path: str) -> str:
      """Download PDF for a DOAJ article.

      DOAJ provides direct PDF links for open access articles.

      Args:
          paper_id: DOAJ article ID or DOI
          save_path: Directory to save PDF

      Returns:
          Path to saved PDF file

      Raises:
          ValueError: If paper not found or no PDF available
          IOError: If download fails
      """
      # Try to get paper info first
      papers = self.search(paper_id, max_results=1)
      if not papers:
          raise ValueError(f"DOAJ article not found: {paper_id}")

      paper = papers[0]
      if not paper.pdf_url:
          # Try to construct PDF URL from DOI
          if paper.doi:
              # Some publishers provide direct PDF links via DOI
              pdf_url = f"https://doi.org/{paper.doi}"
              # But we need to check if it's actually a PDF
              # For now, try the URL
              paper.pdf_url = pdf_url
          else:
              raise ValueError(f"No PDF available for DOAJ article: {paper_id}")

      # Download PDF
      import os
      response = self.session.get(paper.pdf_url, timeout=30)
      response.raise_for_status()

      # Check if response is actually PDF
      content_type = response.headers.get('content-type', '')
      if 'pdf' not in content_type.lower() and not paper.pdf_url.lower().endswith('.pdf'):
          logger.warning(f"Response may not be PDF: {content_type}")

      os.makedirs(save_path, exist_ok=True)

      # Create safe filename
  ```
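The snippet above stops at the "Create safe filename" step, and the rest of the original method is not shown here. The sketch below illustrates one way such a step could look; it is not the project's actual code. The helpers `safe_pdf_filename`, `looks_like_pdf`, and `save_pdf_bytes`, as well as the `doaj_` file name prefix, are assumptions made for this example. Unlike the snippet above, which only logs a warning, this sketch rejects content that does not begin with the `%PDF-` header, which also guards against the DOI fallback returning an HTML landing page.

```python
import os
import re


def safe_pdf_filename(paper_id: str) -> str:
    """Hypothetical helper: map an ID or DOI to a filesystem-safe file name."""
    # Collapse every run of characters outside [A-Za-z0-9._-] into "_".
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", paper_id).strip("._")
    # The "doaj_" prefix is an assumption, not taken from the source.
    return f"doaj_{stem or 'paper'}.pdf"


def looks_like_pdf(content: bytes) -> bool:
    """Check for the '%PDF-' marker that begins a PDF file header."""
    return content.startswith(b"%PDF-")


def save_pdf_bytes(content: bytes, paper_id: str, save_path: str) -> str:
    """Hypothetical helper: validate the bytes and write them to save_path."""
    if not looks_like_pdf(content):
        # Stricter than the original snippet, which only logs a warning.
        raise ValueError("downloaded content does not look like a PDF")
    os.makedirs(save_path, exist_ok=True)
    target = os.path.join(save_path, safe_pdf_filename(paper_id))
    with open(target, "wb") as fh:
        fh.write(content)
    return target
```

In a method like `download_pdf`, a call such as `save_pdf_bytes(response.content, paper_id, save_path)` could follow the `os.makedirs` line. Sanitizing the identifier matters because DOIs contain `/`, which would otherwise be interpreted as a path separator.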