read_doaj_paper
Extract text content from DOAJ papers by providing the paper identifier. Downloads and processes PDFs to retrieve readable text for research and analysis.
Instructions
Read and extract text content from a DOAJ paper.
Args:
- paper_id: DOAJ paper identifier.
- save_path: Directory where the PDF is/will be saved (default: './downloads').

Returns:
- str: Extracted text content.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| paper_id | Yes | DOAJ paper identifier. | |
| save_path | No | Directory where the PDF is/will be saved. | ./downloads |
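As a rough illustration of how these two parameters travel to the server, the sketch below builds an MCP `tools/call` request for this tool. The helper name `build_read_request`, the request `id`, and the sample identifier are assumptions made for illustration; only the tool name, the parameter names, and the `./downloads` default come from the schema above.

```python
import json


def build_read_request(paper_id: str, save_path: str = "./downloads", request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' request for read_doaj_paper.

    Hypothetical helper: not part of paper_search_mcp.
    """
    # paper_id is the only required field in the input schema.
    if not isinstance(paper_id, str) or not paper_id.strip():
        raise ValueError("paper_id is required and must be a non-empty string")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "read_doaj_paper",
            # save_path falls back to the documented default when omitted.
            "arguments": {"paper_id": paper_id, "save_path": save_path},
        },
    }


if __name__ == "__main__":
    # "example-paper-id" is a placeholder, not a real DOAJ identifier.
    print(json.dumps(build_read_request("example-paper-id"), indent=2))
```

In practice an MCP client library would normally assemble this message; the sketch only makes the shape of the arguments explicit.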
Implementation Reference
- paper_search_mcp/server.py:1133-1143 (handler): The tool registration and handler function 'read_doaj_paper' in the MCP server. It delegates to 'doaj_searcher.read_paper'.

```python
@mcp.tool()
async def read_doaj_paper(paper_id: str, save_path: str = "./downloads") -> str:
    """Read and extract text content from a DOAJ paper.

    Args:
        paper_id: DOAJ paper identifier.
        save_path: Directory where the PDF is/will be saved (default: './downloads').

    Returns:
        str: Extracted text content.
    """
    return doaj_searcher.read_paper(paper_id, save_path)
```

- The actual implementation of 'read_paper' within the 'DOAJSearcher' class, which handles the PDF download and text extraction.
```python
def read_paper(self, paper_id: str, save_path: str = "./downloads") -> str:
    """Read paper text from PDF.

    Args:
        paper_id: Paper identifier
        save_path: Directory where PDF is/will be saved

    Returns:
        Extracted text content

    Raises:
        NotImplementedError: If PDF cannot be read
    """
    try:
        # Try to download PDF first
        pdf_path = self.download_pdf(paper_id, save_path)

        # Extract text from PDF
        from PyPDF2 import PdfReader

        reader = PdfReader(pdf_path)
        text = ""
        for page in reader.pages:
            page_text = page.extract_text()
            if page_text:
                text += page_text + "\n"
        return text.strip()
    except Exception as e:
        logger.error(f"Error reading DOAJ paper {paper_id}: {e}")
        raise NotImplementedError(
```
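The core of `read_paper` is its page loop: extract text from each page, skip pages that yield nothing, join the rest with newlines, and strip the result. A minimal sketch of that loop, isolated so it can be tried without a PDF or PyPDF2 installed, might look like the following. `join_page_text` and `StubPage` are hypothetical names introduced here; the stub only imitates the `extract_text()` method that the original code calls on each page.

```python
from typing import Iterable, Optional, Protocol


class HasExtractText(Protocol):
    """Anything exposing extract_text(), as the pages in the original loop do."""

    def extract_text(self) -> Optional[str]: ...


def join_page_text(pages: Iterable[HasExtractText]) -> str:
    """Concatenate non-empty page texts, one per line, and strip the result."""
    parts = []
    for page in pages:
        page_text = page.extract_text()
        if page_text:  # skip pages that return None or an empty string
            parts.append(page_text)
    return "\n".join(parts).strip()


class StubPage:
    """Hypothetical stand-in for a PDF page, used only for this illustration."""

    def __init__(self, text: Optional[str]) -> None:
        self._text = text

    def extract_text(self) -> Optional[str]:
        return self._text


if __name__ == "__main__":
    pages = [StubPage("Title"), StubPage(""), StubPage(None), StubPage("Body text ")]
    print(repr(join_page_text(pages)))  # 'Title\nBody text'
```

Collecting the pieces in a list and joining once gives the same string as the repeated `text += ...` in the original for these inputs, while avoiding repeated string copies on long documents.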