read_biorxiv_paper
Download bioRxiv papers and convert them to Markdown text for analysis, requiring only the paper's DOI.
Instructions
Download and extract full text from bioRxiv paper.
Args:
paper_id: bioRxiv DOI.
save_path: Directory to save PDF.
Returns:
Full paper text in Markdown format.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| paper_id | Yes | bioRxiv DOI. | |
| save_path | No | Directory to save PDF. | None |
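
As a rough illustration of how these two arguments reach the tool, the sketch below builds an MCP `tools/call` request in JSON-RPC 2.0 form. The helper `build_call_request` is hypothetical and not part of paper_find_mcp, and the DOI is a made-up placeholder rather than a real paper.

```python
import json
from typing import Optional


def build_call_request(
    paper_id: str,
    save_path: Optional[str] = None,
    request_id: int = 1,
) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for read_biorxiv_paper.

    Hypothetical helper for illustration only; not part of paper_find_mcp.
    """
    arguments = {"paper_id": paper_id}
    # save_path is optional: when it is omitted, the server falls back to
    # its own default download directory.
    if save_path is not None:
        arguments["save_path"] = save_path
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "read_biorxiv_paper", "arguments": arguments},
    }


# Placeholder DOI, not a real paper.
request = build_call_request("10.1101/2024.01.01.123456")
print(json.dumps(request, indent=2))
```

In practice an MCP client library would normally assemble and send this message, so the dictionary is only meant to show which fields carry `paper_id` and `save_path`.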
Implementation Reference
- paper_find_mcp/server.py:351-362 (handler): MCP tool handler function for 'read_biorxiv_paper'. Defines the input schema via type hints and docstring, and dispatches to the generic _read helper with the 'biorxiv' key.

  ```python
  @mcp.tool()
  async def read_biorxiv_paper(paper_id: str, save_path: Optional[str] = None) -> str:
      """Download and extract full text from bioRxiv paper.

      Args:
          paper_id: bioRxiv DOI.
          save_path: Directory to save PDF.
      Returns:
          Full paper text in Markdown format.
      """
      return await _read('biorxiv', paper_id, save_path)
  ```
- paper_find_mcp/server.py:137-157 (helper): Generic helper function _read that retrieves the platform-specific searcher from the SEARCHERS dict and calls its read_paper method.

  ```python
  async def _read(
      searcher_name: str,
      paper_id: str,
      save_path: Optional[str] = None
  ) -> str:
      """Generic read function."""
      if save_path is None:
          save_path = get_download_path()
      searcher = SEARCHERS.get(searcher_name)
      if not searcher:
          return f"Error: Unknown searcher {searcher_name}"
      try:
          return searcher.read_paper(paper_id, save_path)
      except NotImplementedError as e:
          return str(e)
      except Exception as e:
          logger.error(f"Read failed for {searcher_name}: {e}")
          return f"Error reading paper: {str(e)}"
  ```
- Core implementation in BioRxivSearcher.read_paper: downloads the bioRxiv PDF if it is not already present, then extracts the full text as Markdown using pymupdf4llm.to_markdown.

  ```python
  def read_paper(self, paper_id: str, save_path: str) -> str:
      """Download and extract the paper text.

      Args:
          paper_id: bioRxiv DOI
          save_path: Save directory
      Returns:
          Extracted Markdown text
      """
      pdf_path = os.path.join(save_path, f"{paper_id.replace('/', '_')}.pdf")
      if not os.path.exists(pdf_path):
          result = self.download_pdf(paper_id, save_path)
          if result.startswith("Error"):
              return result
          pdf_path = result
      try:
          text = pymupdf4llm.to_markdown(pdf_path, show_progress=False)
          logger.info(f"Extracted {len(text)} characters from {pdf_path}")
          return text
      except Exception as e:
          logger.error(f"Failed to extract text: {e}")
          return f"Error extracting text: {e}"
  ```
- paper_find_mcp/server.py:75-85 (registration): Global SEARCHERS dictionary initialization, instantiating and registering BioRxivSearcher() under the 'biorxiv' key for use by _read.

  ```python
  SEARCHERS = {
      'arxiv': ArxivSearcher(),
      'pubmed': PubMedSearcher(),
      'biorxiv': BioRxivSearcher(),
      'medrxiv': MedRxivSearcher(),
      'google_scholar': GoogleScholarSearcher(),
      'iacr': IACRSearcher(),
      'semantic': SemanticSearcher(),
      'crossref': CrossRefSearcher(),
      'repec': RePECSearcher(),
  }
  ```