read_zenodo_paper
Extract the text content of a Zenodo research paper, given its paper identifier (record ID or DOI).
Instructions
Read and extract text content from a Zenodo paper.
Args:
- `paper_id`: Zenodo paper identifier.
- `save_path`: Directory where the PDF is or will be saved (default: `./downloads`).

Returns: `str`, the extracted text content, or a status/error message if the download or text extraction fails.
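MCP clients invoke this tool through a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool name and an `arguments` object. The helper below is a minimal sketch of building that request by hand. `build_read_zenodo_request` and the record ID `1234567` are hypothetical; in practice an MCP client library would normally construct and send this message for you.

```python
import json
from typing import Optional


def build_read_zenodo_request(request_id: int, paper_id: str, save_path: Optional[str] = None) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request for the read_zenodo_paper tool (illustrative helper)."""
    arguments = {"paper_id": paper_id}
    # Only send save_path when overriding the server-side default ('./downloads').
    if save_path is not None:
        arguments["save_path"] = save_path
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "read_zenodo_paper", "arguments": arguments},
    }
    return json.dumps(request)


# "1234567" is a placeholder record ID, not a real paper.
print(build_read_zenodo_request(1, "1234567"))
```

Omitting `save_path` lets the server fall back to its default directory.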
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| paper_id | Yes | Zenodo paper identifier (record ID or DOI) | |
| save_path | No | Directory where the PDF is or will be saved | ./downloads |
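For reference, the table corresponds roughly to a JSON Schema like the one below. This is a hypothetical reconstruction from the table and the `str` type hints in the handler signature, not the schema the server actually emits (which may carry extra keys such as titles). `resolve_arguments` is likewise an illustrative helper showing how the required field and the default interact.

```python
from typing import Any, Dict

# Hypothetical schema reconstructed from the table; the server's generated
# schema may differ in extra keys (e.g. "title").
READ_ZENODO_INPUT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "paper_id": {"type": "string"},
        "save_path": {"type": "string", "default": "./downloads"},
    },
    "required": ["paper_id"],
}


def resolve_arguments(args: Dict[str, Any], schema: Dict[str, Any] = READ_ZENODO_INPUT_SCHEMA) -> Dict[str, Any]:
    """Check required keys and string types, then fill in schema defaults."""
    missing = [name for name in schema["required"] if name not in args]
    if missing:
        raise ValueError(f"missing required argument(s): {', '.join(missing)}")
    resolved: Dict[str, Any] = {}
    # Keys not declared in the schema are silently dropped.
    for name, spec in schema["properties"].items():
        if name in args:
            value = args[name]
            if spec.get("type") == "string" and not isinstance(value, str):
                raise TypeError(f"{name} must be a string")
            resolved[name] = value
        elif "default" in spec:
            resolved[name] = spec["default"]
    return resolved


print(resolve_arguments({"paper_id": "1234567"}))
```

Only `paper_id` must be supplied; `save_path` falls back to `./downloads`.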
Implementation Reference
- The implementation of `read_paper`, which downloads a PDF from Zenodo and extracts its text.
```python
def read_paper(self, paper_id: str, save_path: str = "./downloads") -> str:
    """Download and extract text from a Zenodo PDF.

    Args:
        paper_id: Zenodo record ID or DOI.
        save_path: Directory where the PDF is/will be saved.

    Returns:
        Extracted text content or error message.
    """
    path = self.download_pdf(paper_id, save_path)
    if not path.endswith(".pdf"):
        return path  # error message
    try:
        try:
            from PyPDF2 import PdfReader
        except ImportError:
            from pypdf import PdfReader
        reader = PdfReader(path)
        text_parts = [page.extract_text() for page in reader.pages if page.extract_text()]
        return "\n\n".join(text_parts) if text_parts else "No extractable text in PDF."
    except ImportError:
        return f"PDF downloaded to {path}. Install 'PyPDF2' or 'pypdf' to extract text."
    except Exception as exc:
        return f"PDF downloaded to {path} but text extraction failed: {exc}"
```

- paper_search_mcp/server.py:1186-1195 (handler): MCP tool handler registration for `read_zenodo_paper` in `server.py`.
```python
async def read_zenodo_paper(paper_id: str, save_path: str = "./downloads") -> str:
    """Read and extract text content from a Zenodo paper.

    Args:
        paper_id: Zenodo paper identifier.
        save_path: Directory where the PDF is/will be saved (default: './downloads').

    Returns:
        str: Extracted text content.
    """
    return zenodo_searcher.read_paper(paper_id, save_path)
```
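Two details of the code above are worth seeing in isolation: how page texts are filtered and joined, and the fact that the `async` handler calls the synchronous `read_paper` directly, which holds the event loop for the whole download and parse. The sketch below reproduces the joining logic with stubbed page texts and shows one possible alternative handler that offloads the blocking call with `asyncio.to_thread` (Python 3.9+). `join_page_texts`, `StubZenodoSearcher`, and the record ID `1234567` are hypothetical; this is not the project's implementation.

```python
import asyncio
from typing import Dict, Iterable, List, Optional


def join_page_texts(page_texts: Iterable[Optional[str]]) -> str:
    """Mirror read_paper's post-processing: skip pages without text, join the rest."""
    # Empty strings (and None) are falsy, so they are dropped just like the
    # `if page.extract_text()` filter; each page's text is evaluated only once here.
    text_parts = [text for text in page_texts if text]
    return "\n\n".join(text_parts) if text_parts else "No extractable text in PDF."


class StubZenodoSearcher:
    """Hypothetical stand-in for zenodo_searcher: uses canned page texts, no download."""

    def __init__(self, pages_by_id: Dict[str, List[Optional[str]]]) -> None:
        self.pages_by_id = pages_by_id

    def read_paper(self, paper_id: str, save_path: str = "./downloads") -> str:
        # The real method downloads to save_path and parses the PDF; the stub
        # ignores save_path, and unknown IDs fall through to the empty-text
        # message here (the real method would return a download error instead).
        return join_page_texts(self.pages_by_id.get(paper_id, []))


zenodo_searcher = StubZenodoSearcher({"1234567": ["Abstract", "", None, "Conclusion"]})


async def read_zenodo_paper(paper_id: str, save_path: str = "./downloads") -> str:
    """Handler variant that runs the blocking read_paper call in a worker thread."""
    # asyncio.to_thread keeps the event loop responsive while the synchronous
    # download and PDF parsing run in a separate thread.
    return await asyncio.to_thread(zenodo_searcher.read_paper, paper_id, save_path)


print(asyncio.run(read_zenodo_paper("1234567")))
```

With a real PDF, the same helper could be fed directly from pypdf, e.g. `join_page_texts(page.extract_text() for page in reader.pages)`, which avoids calling `extract_text()` twice per page.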