read_arxiv_paper
Extract text content from arXiv paper PDFs using the paper ID to access and process academic research documents for analysis.
Instructions
Read and extract text content from an arXiv paper PDF.
Args:
- paper_id: arXiv paper ID (e.g., '2106.12345').
- save_path: Directory where the PDF is/will be saved (default: './downloads').

Returns:
- str: The extracted text content of the paper.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| paper_id | Yes | arXiv paper ID (e.g., '2106.12345') | |
| save_path | No | Directory where the PDF is/will be saved | ./downloads |
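The two inputs combine into a local file location: the implementation reference builds it as `{save_path}/{paper_id}.pdf`. The sketch below illustrates that mapping and adds a simple ID check. The helper names `is_new_style_arxiv_id` and `local_pdf_path` are hypothetical and not part of the tool, and the regular expression assumes new-style arXiv IDs (`YYMM.NNNN` or `YYMM.NNNNN`, with an optional version suffix such as `v2`).

```python
import re
from pathlib import Path

# Assumption: new-style arXiv IDs are four digits, a dot, then four or five
# digits, optionally followed by a version suffix such as "v2".
_NEW_STYLE_ID = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")


def is_new_style_arxiv_id(paper_id: str) -> bool:
    """Return True if paper_id matches the new-style pattern assumed above."""
    return _NEW_STYLE_ID.fullmatch(paper_id) is not None


def local_pdf_path(paper_id: str, save_path: str = "./downloads") -> Path:
    """Build the expected PDF location, mirroring save_path/paper_id.pdf."""
    return Path(save_path) / f"{paper_id}.pdf"


print(is_new_style_arxiv_id("2106.12345"))      # True
print(is_new_style_arxiv_id("hep-th/9901001"))  # False
print(local_pdf_path("2106.12345").as_posix())  # downloads/2106.12345.pdf
```

Old-style IDs such as `hep-th/9901001` contain a slash, so a path built this way would point into a subdirectory of `save_path`. Whether the tool handles that case is not stated here, so checking the ID before calling may be worthwhile.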
Implementation Reference
- paper_search_mcp/server.py:521-534 (handler) The handler for the read_arxiv_paper tool in the MCP server, which delegates to the ArxivSearcher.

```python
async def read_arxiv_paper(paper_id: str, save_path: str = "./downloads") -> str:
    """Read and extract text content from an arXiv paper PDF.

    Args:
        paper_id: arXiv paper ID (e.g., '2106.12345').
        save_path: Directory where the PDF is/will be saved (default: './downloads').

    Returns:
        str: The extracted text content of the paper.
    """
    try:
        return arxiv_searcher.read_paper(paper_id, save_path)
    except Exception as e:
        print(f"Error reading paper {paper_id}: {e}")
        return ""
```

- The actual implementation logic that reads a PDF file and extracts its text content.

```python
def read_paper(self, paper_id: str, save_path: str = "./downloads") -> str:
    """Read a paper and convert it to text format.

    Args:
        paper_id: arXiv paper ID
        save_path: Directory where the PDF is/will be saved

    Returns:
        str: The extracted text content of the paper
    """
    # First ensure we have the PDF
    pdf_path = f"{save_path}/{paper_id}.pdf"
    if not os.path.exists(pdf_path):
        pdf_path = self.download_pdf(paper_id, save_path)

    # Read the PDF
    try:
        reader = PdfReader(pdf_path)
        text = ""

        # Extract text from each page
        for page in reader.pages:
            text += page.extract_text() + "\n"

        return text.strip()
    except Exception as e:
        print(f"Error reading PDF for paper {paper_id}: {e}")
        return ""
```

- paper_search_mcp/server.py:520-521 (registration) The MCP tool registration decorator for the read_arxiv_paper function.

```python
@mcp.tool()
async def read_arxiv_paper(paper_id: str, save_path: str = "./downloads") -> str:
```
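
The implementation above follows a check-cache, download-if-missing, extract-text flow and returns an empty string when extraction fails. A minimal sketch of that flow follows, with the download and extraction steps passed in as callables so it runs without network access or a PDF library. The names `read_with_cache`, `join_page_texts`, `fake_download`, and `fake_extract` are hypothetical stand-ins, not part of paper_search_mcp. Treating a page with no text as an empty string is an added assumption that the referenced code does not make.

```python
import os
import tempfile
from typing import Callable, Iterable, Optional


def join_page_texts(page_texts: Iterable[Optional[str]]) -> str:
    """Join per-page text with newlines; a missing page text counts as empty."""
    return "\n".join(text or "" for text in page_texts).strip()


def read_with_cache(
    paper_id: str,
    save_path: str,
    download: Callable[[str, str], str],
    extract_pages: Callable[[str], Iterable[Optional[str]]],
) -> str:
    """Return the paper text, downloading only when no local PDF exists."""
    pdf_path = os.path.join(save_path, f"{paper_id}.pdf")
    if not os.path.exists(pdf_path):
        pdf_path = download(paper_id, save_path)
    try:
        return join_page_texts(extract_pages(pdf_path))
    except Exception as exc:
        # Mirrors the reference behaviour: log the error and return "".
        print(f"Error reading PDF for paper {paper_id}: {exc}")
        return ""


# Stand-ins for the real downloader and PDF reader (hypothetical).
calls = []


def fake_download(paper_id: str, save_path: str) -> str:
    calls.append(paper_id)
    os.makedirs(save_path, exist_ok=True)
    path = os.path.join(save_path, f"{paper_id}.pdf")
    with open(path, "wb") as handle:
        handle.write(b"placeholder")
    return path


def fake_extract(pdf_path: str) -> list:
    return ["Page one", None, "Page three"]


with tempfile.TemporaryDirectory() as tmp:
    first = read_with_cache("2106.12345", tmp, fake_download, fake_extract)
    second = read_with_cache("2106.12345", tmp, fake_download, fake_extract)

print(first == second, len(calls))  # True 1: the second call reuses the file
```

Because failures surface as an empty string rather than an exception, a caller cannot tell a paper with no extractable text from a failed read without checking the printed log.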