read_hal_paper
Extract text content from HAL research papers by providing a paper identifier and optional save path for PDF storage.
Instructions
Read and extract text content from a HAL paper.
Args:
- paper_id: HAL paper identifier.
- save_path: Directory where the PDF is or will be saved (default: './downloads').

Returns:
- str: Extracted text content.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| paper_id | Yes | HAL paper identifier. | |
| save_path | No | Directory where the PDF is or will be saved. | ./downloads |
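As a rough illustration of how these two parameters travel to the server, the sketch below builds an MCP `tools/call` request (JSON-RPC 2.0) for this tool. The helper name `build_tool_call`, the `request_id` parameter, and the identifier `hal-01234567` are hypothetical and only serve the example; how the request is actually sent depends on the client and transport in use.

```python
import json
from typing import Any, Dict


def build_tool_call(
    paper_id: str,
    save_path: str = "./downloads",
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request for read_hal_paper.

    Hypothetical helper: it only assembles the payload, it does not send it.
    """
    if not paper_id:
        # paper_id is the only required argument in the input schema.
        raise ValueError("paper_id is required")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "read_hal_paper",
            "arguments": {"paper_id": paper_id, "save_path": save_path},
        },
    }


if __name__ == "__main__":
    # "hal-01234567" is a made-up identifier used only for illustration.
    print(json.dumps(build_tool_call("hal-01234567"), indent=2))
```

Omitting `save_path` falls back to the schema default `./downloads`, matching the handler signature.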
Implementation Reference
- paper_search_mcp/server.py:1211-1221 (handler): Tool registration for 'read_hal_paper' in server.py.

    @mcp.tool()
    async def read_hal_paper(paper_id: str, save_path: str = "./downloads") -> str:
        """Read and extract text content from a HAL paper.

        Args:
            paper_id: HAL paper identifier.
            save_path: Directory where the PDF is/will be saved (default: './downloads').

        Returns:
            str: Extracted text content.
        """
        return hal_searcher.read_paper(paper_id, save_path)

- Actual implementation logic for reading a HAL paper in HALSearcher class.

    def read_paper(self, paper_id: str, save_path: str = "./downloads") -> str:
        """Download and extract text from a HAL PDF.

        Args:
            paper_id: HAL paper ID.
            save_path: Directory where the PDF is/will be saved.

        Returns:
            Extracted text content or error message.
        """
        path = self.download_pdf(paper_id, save_path)
        if not path.endswith(".pdf"):
            return path
        try:
            try:
                from PyPDF2 import PdfReader
            except ImportError:
                from pypdf import PdfReader
            reader = PdfReader(path)
            text_parts = [page.extract_text() for page in reader.pages if page.extract_text()]
            return "\n\n".join(text_parts) if text_parts else "No extractable text in PDF."
        except ImportError:
            return f"PDF downloaded to {path}. Install 'PyPDF2' or 'pypdf' to extract text."
        except Exception as exc:
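
The excerpt above is cut off in the source. Separately from that, the page-joining step in `read_paper` calls `extract_text()` twice per page, once in the filter and once for the value. The sketch below shows one possible variant that calls it once per page and keeps the same separator and fallback message. The names `PageLike`, `join_page_texts`, and `FakePage` are hypothetical and not part of the project; `FakePage` stands in for a real PDF page so the example runs without PyPDF2 or pypdf installed.

```python
from typing import Iterable, List, Optional, Protocol


class PageLike(Protocol):
    """Anything exposing extract_text(), such as a pypdf page object."""

    def extract_text(self) -> Optional[str]: ...


def join_page_texts(pages: Iterable[PageLike]) -> str:
    """Join non-empty page texts with blank lines, extracting each page once."""
    text_parts: List[str] = []
    for page in pages:
        text = page.extract_text()  # single call per page
        if text:  # skip pages that yield None or an empty string
            text_parts.append(text)
    return "\n\n".join(text_parts) if text_parts else "No extractable text in PDF."


class FakePage:
    """Hypothetical stand-in for a PDF page, used only for this example."""

    def __init__(self, text: Optional[str]) -> None:
        self._text = text

    def extract_text(self) -> Optional[str]:
        return self._text


if __name__ == "__main__":
    pages = [FakePage("Abstract"), FakePage(""), FakePage("Introduction")]
    print(join_page_texts(pages))
```

With a real reader, the same function could be called as `join_page_texts(reader.pages)`, assuming each page provides `extract_text()` as in the excerpt above.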