
Paper Search MCP Server

by h-lu

read_biorxiv_paper

Download a bioRxiv paper and convert it to Markdown text for analysis; only the paper's DOI is required.

Instructions

Download and extract full text from bioRxiv paper.

Args:
    paper_id: bioRxiv DOI.
    save_path: Directory to save PDF.

Returns:
    Full paper text in Markdown format.

Input Schema

| Name      | Required | Description               | Default                          |
| --------- | -------- | ------------------------- | -------------------------------- |
| paper_id  | Yes      | bioRxiv DOI               | (none)                           |
| save_path | No       | Directory to save the PDF | Server default download directory |
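A client invokes this tool through the MCP `tools/call` method. A minimal sketch of the request payload (the DOI shown is illustrative, not taken from the server's docs):

```python
import json

# JSON-RPC payload an MCP client would send to call this tool
# (the DOI value is illustrative)
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "read_biorxiv_paper",
        "arguments": {
            "paper_id": "10.1101/2024.01.25.577234",  # bioRxiv DOI (required)
            # "save_path" omitted: the server falls back to its default download dir
        },
    },
}
print(json.dumps(request, indent=2))
```

On success the server responds with the paper's full text as a single Markdown string.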

Implementation Reference

  • MCP tool handler function for 'read_biorxiv_paper'. Defines input schema via type hints and docstring, and dispatches to the generic _read helper with 'biorxiv' key.
    @mcp.tool()
    async def read_biorxiv_paper(paper_id: str, save_path: Optional[str] = None) -> str:
        """Download and extract full text from bioRxiv paper.
        
        Args:
            paper_id: bioRxiv DOI.
            save_path: Directory to save PDF.
        
        Returns:
            Full paper text in Markdown format.
        """
        return await _read('biorxiv', paper_id, save_path)
  • Generic helper function _read that retrieves the platform-specific searcher from SEARCHERS dict and calls its read_paper method.
    async def _read(
        searcher_name: str, 
        paper_id: str, 
        save_path: Optional[str] = None
    ) -> str:
        """通用阅读函数"""
        if save_path is None:
            save_path = get_download_path()
        
        searcher = SEARCHERS.get(searcher_name)
        if not searcher:
            return f"Error: Unknown searcher {searcher_name}"
        
        try:
            return searcher.read_paper(paper_id, save_path)
        except NotImplementedError as e:
            return str(e)
        except Exception as e:
            logger.error(f"Read failed for {searcher_name}: {e}")
            return f"Error reading paper: {str(e)}"
  • Core implementation in BioRxivSearcher.read_paper: downloads bioRxiv PDF if needed, then extracts full text as Markdown using pymupdf4llm.to_markdown.
    def read_paper(self, paper_id: str, save_path: str) -> str:
        """下载并提取论文文本
        
        Args:
            paper_id: bioRxiv DOI
            save_path: 保存目录
            
        Returns:
            提取的 Markdown 文本
        """
        pdf_path = os.path.join(save_path, f"{paper_id.replace('/', '_')}.pdf")
        
        if not os.path.exists(pdf_path):
            result = self.download_pdf(paper_id, save_path)
            if result.startswith("Error"):
                return result
            pdf_path = result
        
        try:
            text = pymupdf4llm.to_markdown(pdf_path, show_progress=False)
            logger.info(f"Extracted {len(text)} characters from {pdf_path}")
            return text
        except Exception as e:
            logger.error(f"Failed to extract text: {e}")
            return f"Error extracting text: {e}"
  • Global SEARCHERS dictionary initialization, instantiating and registering BioRxivSearcher() under 'biorxiv' key for use by _read.
    SEARCHERS = {
        'arxiv': ArxivSearcher(),
        'pubmed': PubMedSearcher(),
        'biorxiv': BioRxivSearcher(),
        'medrxiv': MedRxivSearcher(),
        'google_scholar': GoogleScholarSearcher(),
        'iacr': IACRSearcher(),
        'semantic': SemanticSearcher(),
        'crossref': CrossRefSearcher(),
        'repec': RePECSearcher(),
    }
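The caching step in `BioRxivSearcher.read_paper` derives the cached PDF path from the DOI by replacing slashes with underscores. A small helper reproducing that convention (the function name is mine, not the server's; the DOI is illustrative):

```python
import os

def cached_pdf_path(paper_id: str, save_path: str) -> str:
    """Mirror read_paper's filename convention:
    slashes in the DOI become underscores, with a .pdf suffix."""
    return os.path.join(save_path, f"{paper_id.replace('/', '_')}.pdf")

# e.g. cached_pdf_path("10.1101/2024.01.25.577234", "/tmp")
# yields "/tmp/10.1101_2024.01.25.577234.pdf" on POSIX systems
```

If a file already exists at this path, `read_paper` skips the download and goes straight to text extraction.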


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/h-lu/paper-search-mcp'
