read_iacr_paper
Extract text content from IACR ePrint papers by providing the paper ID, enabling analysis of cryptographic research documents.
Instructions
Read and extract text content from an IACR ePrint paper PDF.
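Args:
- `paper_id`: IACR paper ID (e.g., `2009/101`).
- `save_path`: Directory where the PDF is or will be saved (default: `./downloads`).

Returns:
- `str`: The extracted text content of the paper, preceded by a metadata header. On failure, an error message string is returned instead.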
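The tool returns a plain string whether it succeeds or fails, so a caller has to inspect the text to tell the cases apart. Going by the `read_paper` code in the Implementation Reference, successful output starts with a `Title:` header, failures start with `Error`, a PDF with no extractable text yields a `PDF downloaded to ..., but unable to extract readable text` notice, and the handler returns `""` if an exception escapes `read_paper`. The helper below is a hypothetical caller-side sketch built on those prefixes. `classify_result` and `ReadOutcome` are not part of paper_search_mcp, and the message formats may change between versions.

```python
from enum import Enum


class ReadOutcome(Enum):
    OK = "ok"
    EMPTY = "empty"      # handler caught an exception and returned ""
    NO_TEXT = "no_text"  # PDF saved, but no page yielded text
    ERROR = "error"      # read_paper returned an "Error..." message


def classify_result(result: str) -> ReadOutcome:
    """Hypothetical helper: classify a read_iacr_paper return value by its prefix."""
    if result == "":
        return ReadOutcome.EMPTY
    if result.startswith("Error"):
        return ReadOutcome.ERROR
    if result.startswith("PDF downloaded to ") and result.endswith(
        "unable to extract readable text"
    ):
        return ReadOutcome.NO_TEXT
    # Successful output begins with the metadata header ("Title: ...").
    return ReadOutcome.OK
```

Matching on message text is brittle, so treat this as a convenience check rather than a stable contract.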
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| paper_id | Yes | IACR paper ID (e.g., `2009/101`) | |
| save_path | No | Directory where the PDF is or will be saved | ./downloads |
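The table above can also be read as a small JSON-Schema-style contract: `paper_id` is a required string, and `save_path` is an optional string that defaults to `./downloads`. The sketch below writes that contract as a Python dict and resolves a call's arguments, filling the default and computing where the PDF will land using the same naming rule as `read_paper` (`/` in the ID becomes `_`). `READ_IACR_PAPER_SCHEMA` and `resolve_arguments` are hypothetical names, and the schema the server actually publishes (generated from the function signature) may differ in detail.

```python
import os
from typing import Any

# JSON-Schema-style description of the inputs in the table above.
# Assumption: mirrors the table; the server's published schema may differ.
READ_IACR_PAPER_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "paper_id": {"type": "string"},
        "save_path": {"type": "string", "default": "./downloads"},
    },
    "required": ["paper_id"],
}


def resolve_arguments(args: dict[str, Any]) -> tuple[str, str, str]:
    """Hypothetical helper: check inputs, fill defaults, and return
    (paper_id, save_path, pdf_path) as read_paper would use them."""
    props = READ_IACR_PAPER_SCHEMA["properties"]
    for name in READ_IACR_PAPER_SCHEMA["required"]:
        if name not in args:
            raise ValueError(f"missing required argument: {name}")
    unknown = set(args) - set(props)
    if unknown:
        raise ValueError(f"unexpected arguments: {sorted(unknown)}")
    for name, value in args.items():
        if not isinstance(value, str):
            raise TypeError(f"{name} must be a string")

    paper_id = args["paper_id"]
    save_path = args.get("save_path", props["save_path"]["default"])
    # Same file-naming rule as read_paper: '2009/101' -> 'iacr_2009_101.pdf'.
    filename = f"iacr_{paper_id.replace('/', '_')}.pdf"
    return paper_id, save_path, os.path.join(save_path, filename)
```

Rejecting unknown keys is stricter than the table requires; it is there to catch typos such as `savepath` early.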
Implementation Reference
- paper_search_mcp/server.py:585-598 (handler): The MCP tool handler for `read_iacr_paper` in server.py, which calls the underlying `iacr_searcher` instance.
```python
async def read_iacr_paper(paper_id: str, save_path: str = "./downloads") -> str:
    """Read and extract text content from an IACR ePrint paper PDF.

    Args:
        paper_id: IACR paper ID (e.g., '2009/101').
        save_path: Directory where the PDF is/will be saved (default: './downloads').

    Returns:
        str: The extracted text content of the paper.
    """
    try:
        return iacr_searcher.read_paper(paper_id, save_path)
    except Exception as e:
        print(f"Error reading paper {paper_id}: {e}")
        return ""
```

- The core implementation of the `read_paper` logic within the `IACRSearcher` class.
```python
# Module-level dependencies this method relies on (the surrounding file is not shown).
import logging
import os

import requests
from PyPDF2 import PdfReader

logger = logging.getLogger(__name__)


class IACRSearcher:
    # ... other methods (including get_paper_details) omitted ...

    def read_paper(self, paper_id: str, save_path: str = "./downloads") -> str:
        """
        Download and extract text from IACR paper PDF

        Args:
            paper_id: IACR paper ID
            save_path: Directory to save downloaded PDF

        Returns:
            str: Extracted text from the PDF or error message
        """
        try:
            # First get paper details to get the PDF URL
            paper = self.get_paper_details(paper_id)
            if not paper or not paper.pdf_url:
                return f"Error: Could not find PDF URL for paper {paper_id}"

            # Download the PDF
            pdf_response = requests.get(paper.pdf_url, timeout=30)
            pdf_response.raise_for_status()

            # Create download directory if it doesn't exist
            os.makedirs(save_path, exist_ok=True)

            # Save the PDF
            filename = f"iacr_{paper_id.replace('/', '_')}.pdf"
            pdf_path = os.path.join(save_path, filename)
            with open(pdf_path, "wb") as f:
                f.write(pdf_response.content)

            # Extract text using PyPDF2, skipping pages that fail or are empty
            reader = PdfReader(pdf_path)
            text = ""
            for page_num, page in enumerate(reader.pages):
                try:
                    page_text = page.extract_text()
                    if page_text:
                        text += f"\n--- Page {page_num + 1} ---\n"
                        text += page_text + "\n"
                except Exception as e:
                    logger.warning(
                        f"Failed to extract text from page {page_num + 1}: {e}"
                    )
                    continue

            if not text.strip():
                return (
                    f"PDF downloaded to {pdf_path}, but unable to extract readable text"
                )

            # Add paper metadata at the beginning
            metadata = f"Title: {paper.title}\n"
            metadata += f"Authors: {', '.join(paper.authors)}\n"
            metadata += f"Published Date: {paper.published_date}\n"
            metadata += f"URL: {paper.url}\n"
            metadata += f"PDF downloaded to: {pdf_path}\n"
            metadata += "=" * 80 + "\n\n"

            return metadata + text.strip()

        except requests.RequestException as e:
            logger.error(f"Error downloading PDF: {e}")
            return f"Error downloading PDF: {e}"
        except Exception as e:
            logger.error(f"Read paper error: {e}")
            return f"Error reading paper: {e}"
```

- paper_search_mcp/server.py:584-585 (registration): Tool registration decorator for `read_iacr_paper`.
```python
@mcp.tool()
async def read_iacr_paper(paper_id: str, save_path: str = "./downloads") -> str:
    ...  # handler body as in the server.py:585-598 excerpt
```
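The text-assembly part of `read_paper` can be exercised without a network connection or a real PDF by pulling it out into pure functions. The sketch below is a restructuring for illustration, not the library's code: `PaperMeta`, `assemble_pages`, and `build_output` are hypothetical names, and the functions take page strings that were already extracted, with `None` or `""` standing in for pages where PyPDF2's `extract_text()` returned nothing or raised. Page markers keep the 1-based page index, so a skipped page leaves a gap in the numbering.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PaperMeta:
    # Hypothetical stand-in for the paper object returned by get_paper_details.
    title: str
    authors: list[str]
    published_date: str
    url: str


def assemble_pages(page_texts: Sequence[Optional[str]]) -> str:
    """Join per-page text with '--- Page N ---' markers, skipping empty pages."""
    text = ""
    for page_num, page_text in enumerate(page_texts):
        if page_text:
            text += f"\n--- Page {page_num + 1} ---\n"
            text += page_text + "\n"
    return text


def build_output(
    meta: PaperMeta, pdf_path: str, page_texts: Sequence[Optional[str]]
) -> str:
    """Prefix the assembled text with a metadata header, mirroring read_paper."""
    text = assemble_pages(page_texts)
    if not text.strip():
        return f"PDF downloaded to {pdf_path}, but unable to extract readable text"
    header = "".join(
        [
            f"Title: {meta.title}\n",
            f"Authors: {', '.join(meta.authors)}\n",
            f"Published Date: {meta.published_date}\n",
            f"URL: {meta.url}\n",
            f"PDF downloaded to: {pdf_path}\n",
            "=" * 80 + "\n\n",
        ]
    )
    return header + text.strip()
```

Keeping the formatting separate from the download and PDF parsing makes the output layout easy to unit-test on its own.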