read_biorxiv_paper
Extract text from bioRxiv paper PDFs, identified by DOI, so the paper can be read and analyzed.
Instructions
Read and extract text content from a bioRxiv paper PDF.
Args:
- paper_id: bioRxiv DOI.
- save_path: Directory where the PDF is or will be saved (default: './downloads').

Returns:
- str: The extracted text content of the paper.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| paper_id | Yes | bioRxiv DOI. | |
| save_path | No | Directory where the PDF is or will be saved. | ./downloads |
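As an illustration of the schema, the arguments for a call might be assembled as in the sketch below. `build_arguments` and `DEFAULT_SAVE_PATH` are hypothetical names introduced here, not part of paper_search_mcp, and the DOI is a made-up placeholder. The sketch only checks what the table states: `paper_id` is required and `save_path` falls back to `./downloads`.

```python
import json
from typing import Optional

# Default taken from the schema table; the constant name is hypothetical.
DEFAULT_SAVE_PATH = "./downloads"


def build_arguments(paper_id: str, save_path: Optional[str] = None) -> dict:
    """Assemble the tool arguments, applying the documented default for save_path."""
    if not isinstance(paper_id, str) or not paper_id.strip():
        raise ValueError("paper_id is required and must be a non-empty string")
    return {
        "paper_id": paper_id.strip(),
        "save_path": save_path if save_path is not None else DEFAULT_SAVE_PATH,
    }


if __name__ == "__main__":
    # Made-up DOI, used only to show the payload shape.
    print(json.dumps(build_arguments("10.1101/2024.01.01.123456"), indent=2))
```

Because `save_path` is optional, a client could also omit it and let the server apply the default; the helper fills it in only to make the resulting payload explicit.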
Implementation Reference
- Implementation of read_paper which downloads and extracts text from a bioRxiv PDF.
```python
def read_paper(self, paper_id: str, save_path: str = "./downloads") -> str:
    """
    Read a paper and convert it to text format.

    Args:
        paper_id: bioRxiv DOI
        save_path: Directory where the PDF is/will be saved

    Returns:
        str: The extracted text content of the paper
    """
    pdf_path = f"{save_path}/{paper_id.replace('/', '_')}.pdf"
    if not os.path.exists(pdf_path):
        pdf_path = self.download_pdf(paper_id, save_path)

    try:
        reader = PdfReader(pdf_path)
        text = ""
        for page in reader.pages:
            text += page.extract_text() + "\n"
        return text.strip()
    except Exception as e:
        print(f"Error reading PDF for paper {paper_id}: {e}")
        return ""
```
- paper_search_mcp/server.py:551-564 (handler) The MCP server wrapper for the read_biorxiv_paper tool that calls the BioRxivSearcher implementation.
```python
async def read_biorxiv_paper(paper_id: str, save_path: str = "./downloads") -> str:
    """Read and extract text content from a bioRxiv paper PDF.

    Args:
        paper_id: bioRxiv DOI.
        save_path: Directory where the PDF is/will be saved (default: './downloads').
    Returns:
        str: The extracted text content of the paper.
    """
    try:
        return biorxiv_searcher.read_paper(paper_id, save_path)
    except Exception as e:
        print(f"Error reading paper {paper_id}: {e}")
        return ""
```
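The cache-then-extract flow of `read_paper` can be tried without network access or a PDF library using the sketch below. It is a minimal illustration, not the project's code: `cached_pdf_path`, `read_with_cache`, `demo`, and the stub download and extract functions are hypothetical names introduced here, and the DOI is made up. Only the path expression and the control flow are taken from the implementation quoted above.

```python
import os
import tempfile
from typing import Callable


def cached_pdf_path(paper_id: str, save_path: str = "./downloads") -> str:
    """Build the local PDF path the same way read_paper does ('/' becomes '_')."""
    return f"{save_path}/{paper_id.replace('/', '_')}.pdf"


def read_with_cache(
    paper_id: str,
    save_path: str,
    download: Callable[[str, str], str],
    extract: Callable[[str], str],
) -> str:
    """Reuse a cached file if present, otherwise download; return '' if extraction fails."""
    pdf_path = cached_pdf_path(paper_id, save_path)
    if not os.path.exists(pdf_path):
        # As in read_paper, the download happens outside the try block.
        pdf_path = download(paper_id, save_path)
    try:
        return extract(pdf_path).strip()
    except Exception as e:
        print(f"Error reading PDF for paper {paper_id}: {e}")
        return ""


def demo() -> tuple:
    """Run the flow twice with stubs; return (first_text, second_text, download_count)."""
    calls = []

    def fake_download(paper_id: str, save_path: str) -> str:
        # Stand-in for download_pdf: writes a small text file instead of a real PDF.
        calls.append(paper_id)
        path = cached_pdf_path(paper_id, save_path)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("  stub page text\n")
        return path

    def fake_extract(path: str) -> str:
        # Stand-in for PDF text extraction: reads the stub file back.
        with open(path, encoding="utf-8") as fh:
            return fh.read()

    doi = "10.1101/2024.01.01.123456"  # made-up DOI
    with tempfile.TemporaryDirectory() as tmp:
        first = read_with_cache(doi, tmp, fake_download, fake_extract)
        second = read_with_cache(doi, tmp, fake_download, fake_extract)
    return first, second, len(calls)
```

In `demo`, the second call finds the file written by the first and skips the download. Passing the download and extract steps in as callables is a choice made for this sketch so the flow can be exercised with stubs; the quoted implementation calls `self.download_pdf` and `PdfReader` directly.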