read_medrxiv_paper
Extracts the full text of a medRxiv paper identified by its DOI, downloading the PDF and converting it to Markdown for analysis.
Instructions
Download and extract full text from medRxiv paper.
Args:
paper_id: medRxiv DOI.
save_path: Directory to save PDF.
Returns:
Full paper text in Markdown format.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| paper_id | Yes | medRxiv DOI. | |
| save_path | No | Directory to save PDF. | None |
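
As a rough illustration of the schema above, the following sketch checks an arguments payload before it is sent to the tool. The helper `validate_read_args` and the DOI value are hypothetical and not part of the server; only the parameter names `paper_id` and `save_path` come from this page.

```python
from typing import Any, Optional


def validate_read_args(args: dict[str, Any]) -> tuple[str, Optional[str]]:
    """Hypothetical helper: check a payload against the input schema."""
    paper_id = args.get("paper_id")
    if not isinstance(paper_id, str) or not paper_id.strip():
        raise ValueError("paper_id is required and must be a non-empty string")
    # save_path is optional; None leaves the choice of directory to the server.
    save_path = args.get("save_path")
    if save_path is not None and not isinstance(save_path, str):
        raise ValueError("save_path must be a string when provided")
    return paper_id.strip(), save_path


# Placeholder DOI, for illustration only.
example_args = {"paper_id": "10.1101/2024.01.01.24300000"}
validated = validate_read_args(example_args)
```

Omitting `save_path` is valid here, matching the "No" in the Required column.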
Implementation Reference
- paper_find_mcp/server.py:415-426 (handler) MCP tool handler for 'read_medrxiv_paper'. Decorated with @mcp.tool() for registration and execution. Delegates to the generic _read function using the 'medrxiv' searcher instance.

  ```python
  @mcp.tool()
  async def read_medrxiv_paper(paper_id: str, save_path: Optional[str] = None) -> str:
      """Download and extract full text from medRxiv paper.

      Args:
          paper_id: medRxiv DOI.
          save_path: Directory to save PDF.

      Returns:
          Full paper text in Markdown format.
      """
      return await _read('medrxiv', paper_id, save_path)
  ```
- Core implementation of the paper reading logic in MedRxivSearcher.read_paper(). Downloads the PDF from medRxiv.org if it is not already present, then extracts the full text as Markdown using pymupdf4llm.to_markdown().

  ```python
  def read_paper(self, paper_id: str, save_path: str) -> str:
      """Download and extract the paper text.

      Args:
          paper_id: medRxiv DOI
          save_path: Directory to save the PDF

      Returns:
          Extracted Markdown text
      """
      pdf_path = os.path.join(save_path, f"{paper_id.replace('/', '_')}.pdf")
      if not os.path.exists(pdf_path):
          result = self.download_pdf(paper_id, save_path)
          if result.startswith("Error"):
              return result
          pdf_path = result

      try:
          text = pymupdf4llm.to_markdown(pdf_path, show_progress=False)
          logger.info(f"Extracted {len(text)} characters from {pdf_path}")
          return text
      except Exception as e:
          logger.error(f"Failed to extract text: {e}")
          return f"Error extracting text: {e}"
  ```
- paper_find_mcp/server.py:137-157 (helper) Generic _read helper function called by all read_*_paper tools. Retrieves the platform-specific searcher from the SEARCHERS dict and invokes its read_paper method.

  ```python
  async def _read(
      searcher_name: str,
      paper_id: str,
      save_path: Optional[str] = None
  ) -> str:
      """Generic read function."""
      if save_path is None:
          save_path = get_download_path()

      searcher = SEARCHERS.get(searcher_name)
      if not searcher:
          return f"Error: Unknown searcher {searcher_name}"

      try:
          return searcher.read_paper(paper_id, save_path)
      except NotImplementedError as e:
          return str(e)
      except Exception as e:
          logger.error(f"Read failed for {searcher_name}: {e}")
          return f"Error reading paper: {str(e)}"
  ```
- paper_find_mcp/server.py:75-85 (registration) Global SEARCHERS dictionary where the MedRxivSearcher instance is registered under the 'medrxiv' key, enabling the generic _read function to dispatch to the correct implementation.

  ```python
  SEARCHERS = {
      'arxiv': ArxivSearcher(),
      'pubmed': PubMedSearcher(),
      'biorxiv': BioRxivSearcher(),
      'medrxiv': MedRxivSearcher(),
      'google_scholar': GoogleScholarSearcher(),
      'iacr': IACRSearcher(),
      'semantic': SemanticSearcher(),
      'crossref': CrossRefSearcher(),
      'repec': RePECSearcher(),
  }
  ```