read_medrxiv_paper
Extract the text content of a medRxiv paper's PDF, identified by its DOI, for analysis and research.
Instructions
Read and extract text content from a medRxiv paper PDF.
Args:
- paper_id: medRxiv DOI.
- save_path: Directory where the PDF is looked up and, if absent, downloaded (default: `./downloads`).

Returns:
- str: The extracted text content of the paper, or an empty string if downloading or text extraction fails.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| paper_id | Yes | medRxiv DOI of the paper | |
| save_path | No | Directory where the PDF is looked up and, if absent, downloaded | ./downloads |
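As an illustration, assuming a client that speaks the MCP JSON-RPC protocol directly, the tool is invoked with a `tools/call` request whose `arguments` object follows the schema in the table. The sketch below builds such a message; the helper `build_read_request`, the request id, and the DOI value are hypothetical placeholders, not part of this server.

```python
import json
from typing import Optional


def build_read_request(paper_id: str, save_path: Optional[str] = None, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for read_medrxiv_paper (hypothetical helper)."""
    arguments = {"paper_id": paper_id}
    # Only send save_path when overriding it; otherwise the tool's default ("./downloads") applies
    if save_path is not None:
        arguments["save_path"] = save_path
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "read_medrxiv_paper", "arguments": arguments},
    }
    return json.dumps(message)


# Placeholder DOI for illustration only
print(build_read_request("10.1101/2024.01.01.00000000"))
```

Because the handler returns an empty string instead of raising when something goes wrong, a caller should treat empty result text as a failure.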
Implementation Reference
- `paper_search_mcp/server.py:568-581` (handler): The tool handler `read_medrxiv_paper` in `server.py` calls the `read_paper` method of `medrxiv_searcher`.
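```python
async def read_medrxiv_paper(paper_id: str, save_path: str = "./downloads") -> str:
    """Read and extract text content from a medRxiv paper PDF.

    Args:
        paper_id: medRxiv DOI.
        save_path: Directory where the PDF is/will be saved (default: './downloads').
    Returns:
        str: The extracted text content of the paper.
    """
    try:
        # Delegate to the shared MedRxivSearcher instance (download if needed, then extract)
        return medrxiv_searcher.read_paper(paper_id, save_path)
    except Exception as e:
        # Errors are swallowed: the tool reports failure as an empty string.
        # print() writes to stdout; if the server runs over the stdio transport, MCP
        # reserves stdout for protocol messages, so stderr is the safer stream here.
        print(f"Error reading paper {paper_id}: {e}")
        return ""
```

- The `MedRxivSearcher.read_paper` implementation downloads the PDF if it is not already present in `save_path`, then extracts its text.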
```python
# Relies on module-level imports: `os` and a `PdfReader` class from the module's PDF library
def read_paper(self, paper_id: str, save_path: str = "./downloads") -> str:
    """
    Read a paper and convert it to text format.

    Args:
        paper_id: medRxiv DOI
        save_path: Directory where the PDF is/will be saved

    Returns:
        str: The extracted text content of the paper
    """
    # Cached PDFs are named after the DOI, with "/" replaced by "_"
    pdf_path = f"{save_path}/{paper_id.replace('/', '_')}.pdf"
    if not os.path.exists(pdf_path):
        # Not cached yet: download it (errors here propagate to the tool handler)
        pdf_path = self.download_pdf(paper_id, save_path)

    try:
        reader = PdfReader(pdf_path)
        text = ""
        # Concatenate the text of every page, one newline after each
        for page in reader.pages:
            text += page.extract_text() + "\n"
        return text.strip()
    except Exception as e:
        print(f"Error reading PDF for paper {paper_id}: {e}")
        return ""
```

- `paper_search_mcp/server.py:567-567` (registration): The `read_medrxiv_paper` tool is registered with the `@mcp.tool()` decorator.

```python
@mcp.tool()
```
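The caching and fallback behavior of `read_paper` can be sketched without the real download or PDF libraries. In this sketch, `cached_pdf_path` and `read_paper_sketch` are hypothetical names, and `fetch_pdf` and `extract_pages` are hypothetical stand-ins for `download_pdf` and the `PdfReader` page loop. It reproduces the DOI-to-filename rule (slashes become underscores), reuses an existing file instead of downloading again, and returns an empty string on any failure, which is the combined effect of `read_paper`'s own `try` block and the handler's.

```python
import os
from typing import Callable, List, Optional


def cached_pdf_path(paper_id: str, save_path: str = "./downloads") -> str:
    # Same naming rule as read_paper: "10.1101/abc" -> "<save_path>/10.1101_abc.pdf"
    return f"{save_path}/{paper_id.replace('/', '_')}.pdf"


def read_paper_sketch(
    paper_id: str,
    save_path: str,
    fetch_pdf: Callable[[str, str], str],                  # hypothetical: downloads, returns PDF path
    extract_pages: Callable[[str], List[Optional[str]]],   # hypothetical: text of each page
) -> str:
    try:
        pdf_path = cached_pdf_path(paper_id, save_path)
        if not os.path.exists(pdf_path):
            # Cache miss: fetch the PDF into save_path
            pdf_path = fetch_pdf(paper_id, save_path)
        pages = extract_pages(pdf_path)
        # Newline-separated pages, then trimmed; same result as read_paper's loop plus strip()
        return "\n".join(page or "" for page in pages).strip()
    except Exception:
        # Any download or extraction error surfaces as an empty string, not an exception
        return ""
```

Injecting the two callables keeps the sketch testable offline; in the actual module, `self.download_pdf` and `PdfReader` are called directly.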