read_iacr_paper
Downloads an IACR paper by its paper ID and extracts the full text, converting the PDF to Markdown for easy reading and analysis.
Instructions
Download and extract full text from IACR paper.
Args:
paper_id: IACR ID (e.g., '2024/123').
save_path: Directory to save PDF.
Returns:
Full paper text in Markdown format.
Example:
read_iacr_paper("2024/123")
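The implementation shown under Implementation Reference reports failures as plain strings rather than raising, so a caller may want to tell an error message apart from paper text. The following is a minimal sketch of that check, not part of the tool itself: `ReadResult` and `split_result` are hypothetical names, and the prefixes and the `---` separator are taken from the strings built in the reference code.

```python
from typing import NamedTuple, Optional


class ReadResult(NamedTuple):
    ok: bool                 # False when the tool returned an error/notice string
    header: Optional[str]    # metadata block (title, authors, ...) if present
    body: str                # extracted Markdown, or the raw message on failure


def split_result(result: str) -> ReadResult:
    """Separate the metadata header from the paper body (hypothetical helper)."""
    # Failure strings in the reference code start with "Error"; the
    # "no text could be extracted" notice starts with "PDF downloaded to".
    if result.startswith("Error") or result.startswith("PDF downloaded to"):
        return ReadResult(False, None, result)
    # The reference code ends the metadata block with "---" and a blank line.
    header, sep, body = result.partition("\n---\n\n")
    if not sep:
        return ReadResult(True, None, result)
    return ReadResult(True, header, body)
```

A message raised through `NotImplementedError` is returned verbatim by the helper in the reference code, so its wording is not known here and this sketch would not flag it as a failure.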
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| paper_id | Yes | IACR ID (e.g., '2024/123'). | |
| save_path | No | Directory to save PDF. | None (falls back to get_download_path()) |
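As an illustration of the schema above, the sketch below builds the argument object a client might send and rejects obviously malformed IDs before calling the tool. It is an assumption-laden example: `build_arguments` is a hypothetical helper, and the `YYYY/NNN` pattern is inferred from the '2024/123' example rather than from any validation the tool itself performs.

```python
import re
from typing import Dict, Optional

# Assumed ID shape: four-digit year, slash, three to five digits.
# This is a guess based on the documented example, not a rule from the tool.
_PAPER_ID = re.compile(r"\d{4}/\d{3,5}")


def build_arguments(paper_id: str, save_path: Optional[str] = None) -> Dict[str, str]:
    """Return the arguments for read_iacr_paper (hypothetical helper)."""
    paper_id = paper_id.strip()
    if not _PAPER_ID.fullmatch(paper_id):
        raise ValueError(f"Unexpected IACR paper ID format: {paper_id!r}")
    args = {"paper_id": paper_id}
    # save_path is optional; omit it so the server can apply its own default.
    if save_path is not None:
        args["save_path"] = save_path
    return args
```

Leaving `save_path` out of the dictionary, rather than sending an empty string, lets the server-side default apply.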
Implementation Reference
- paper_find_mcp/server.py:527-541 (handler): MCP tool handler and registration for 'read_iacr_paper'. This is the entry point decorated with @mcp.tool(), which delegates to the generic _read helper using the 'iacr' searcher.

      @mcp.tool()
      async def read_iacr_paper(paper_id: str, save_path: Optional[str] = None) -> str:
          """Download and extract full text from IACR paper.

          Args:
              paper_id: IACR ID (e.g., '2024/123').
              save_path: Directory to save PDF.

          Returns:
              Full paper text in Markdown format.

          Example:
              read_iacr_paper("2024/123")
          """
          return await _read('iacr', paper_id, save_path)
- paper_find_mcp/server.py:137-156 (helper): Generic helper function _read that retrieves the appropriate searcher (IACRSearcher for 'iacr') and calls its read_paper method. It falls back to get_download_path() when save_path is None and converts exceptions into error strings.

      async def _read(
          searcher_name: str,
          paper_id: str,
          save_path: Optional[str] = None
      ) -> str:
          """Generic read function"""
          if save_path is None:
              save_path = get_download_path()

          searcher = SEARCHERS.get(searcher_name)
          if not searcher:
              return f"Error: Unknown searcher {searcher_name}"

          try:
              return searcher.read_paper(paper_id, save_path)
          except NotImplementedError as e:
              return str(e)
          except Exception as e:
              logger.error(f"Read failed for {searcher_name}: {e}")
              return f"Error reading paper: {str(e)}"
- Core implementation of paper reading in the IACRSearcher class: fetches paper details, downloads the PDF from eprint.iacr.org/{paper_id}.pdf, extracts Markdown using pymupdf4llm.to_markdown, and prepends metadata.

      def read_paper(self, paper_id: str, save_path: str) -> str:
          """Download and extract the text of an IACR paper.

          Uses PyMuPDF4LLM to extract Markdown.

          Args:
              paper_id: IACR paper ID (e.g., "2009/101")
              save_path: Directory to save the PDF

          Returns:
              str: Extracted Markdown text or an error message
          """
          try:
              # Fetch paper details
              paper = self.get_paper_details(paper_id)
              if not paper or not paper.pdf_url:
                  return f"Error: Could not find PDF URL for paper {paper_id}"

              # Download the PDF
              pdf_response = requests.get(paper.pdf_url, timeout=30)
              pdf_response.raise_for_status()

              # Save the PDF
              os.makedirs(save_path, exist_ok=True)
              filename = f"iacr_{paper_id.replace('/', '_')}.pdf"
              pdf_path = os.path.join(save_path, filename)

              with open(pdf_path, "wb") as f:
                  f.write(pdf_response.content)

              # Extract text with PyMuPDF4LLM
              text = pymupdf4llm.to_markdown(pdf_path, show_progress=False)
              logger.info(f"Extracted {len(text)} characters from {pdf_path}")

              if not text.strip():
                  return f"PDF downloaded to {pdf_path}, but no text could be extracted."

              # Prepend metadata
              metadata = f"# {paper.title}\n\n"
              metadata += f"**Authors**: {', '.join(paper.authors)}\n"
              metadata += f"**Published**: {paper.published_date}\n"
              metadata += f"**URL**: {paper.url}\n"
              metadata += f"**PDF**: {pdf_path}\n\n"
              metadata += "---\n\n"

              return metadata + text

          except requests.RequestException as e:
              logger.error(f"Error downloading PDF: {e}")
              return f"Error downloading PDF: {e}"
          except Exception as e:
              logger.error(f"Read paper error: {e}")
              return f"Error reading paper: {e}"
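The three pieces above can be exercised without network access or a PDF library. The sketch below mirrors the registry lookup, the filename derivation, and the metadata header using a stub in place of IACRSearcher. Everything here is illustrative: `StubSearcher`, `pdf_filename`, `metadata_header`, `read`, the placeholder title, author, date, and URL, and the `downloads` default directory are all assumptions, and the real helper is async and also handles NotImplementedError separately.

```python
import os
from typing import Dict, List


def pdf_filename(paper_id: str) -> str:
    """Mirror the reference naming: '/' in the ID becomes '_'."""
    return f"iacr_{paper_id.replace('/', '_')}.pdf"


def metadata_header(title: str, authors: List[str], published: str,
                    url: str, pdf_path: str) -> str:
    """Build the Markdown header that is prepended to the extracted text."""
    return (
        f"# {title}\n\n"
        f"**Authors**: {', '.join(authors)}\n"
        f"**Published**: {published}\n"
        f"**URL**: {url}\n"
        f"**PDF**: {pdf_path}\n\n"
        "---\n\n"
    )


class StubSearcher:
    """Stand-in for IACRSearcher: no download, no PDF parsing."""

    def read_paper(self, paper_id: str, save_path: str) -> str:
        pdf_path = os.path.join(save_path, pdf_filename(paper_id))
        header = metadata_header(
            "Example Title", ["A. Author"], "2024-01-01",
            f"https://eprint.iacr.org/{paper_id}",  # placeholder URL
            pdf_path,
        )
        return header + "Extracted body text"


# Registry keyed by searcher name, as in the _read helper.
SEARCHERS: Dict[str, StubSearcher] = {"iacr": StubSearcher()}


def read(searcher_name: str, paper_id: str, save_path: str = "downloads") -> str:
    """Synchronous simplification of the dispatch in _read."""
    searcher = SEARCHERS.get(searcher_name)
    if searcher is None:
        return f"Error: Unknown searcher {searcher_name}"
    try:
        return searcher.read_paper(paper_id, save_path)
    except Exception as e:
        # Errors are returned as strings so the tool call itself never raises.
        return f"Error reading paper: {e}"
```

Returning error strings instead of raising keeps every tool call well-formed from the client's point of view, at the cost of requiring callers to inspect the text to detect failures.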