
Paper Search MCP Server

by h-lu

read_semantic_paper

Extract full text from open-access academic papers using Semantic Scholar IDs, converting PDFs to Markdown format for analysis and reference.

Instructions

Read paper via Semantic Scholar (open-access only, use as LAST RESORT).

DOWNLOAD PRIORITY (try in order):
1. If arXiv paper -> use read_arxiv_paper(arxiv_id)
2. If published before 2023 -> use read_scihub_paper(doi)
3. Use this tool as last resort
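
The priority above can be sketched as a small selection helper. This is a hypothetical client-side function, not part of the server; only the three tool names come from the documentation:

```python
from typing import Optional, Tuple

def pick_reader(arxiv_id: Optional[str] = None,
                doi: Optional[str] = None,
                year: Optional[int] = None) -> Tuple[str, str]:
    """Pick which read_* tool to try first, following the priority above.

    Hypothetical helper: returns (tool_name, identifier_to_pass).
    """
    if arxiv_id:  # 1. arXiv papers go through read_arxiv_paper
        return ("read_arxiv_paper", arxiv_id)
    if doi and year is not None and year < 2023:  # 2. older papers via Sci-Hub
        return ("read_scihub_paper", doi)
    # 3. Last resort: Semantic Scholar open-access download
    return ("read_semantic_paper", doi or "")
```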

Args:
    paper_id: Semantic Scholar ID or prefixed ID (DOI:, ARXIV:, PMID:).
    save_path: Directory to save PDF.

Returns:
    Full paper text in Markdown format.
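
The paper_id may be a bare Semantic Scholar ID or carry one of the prefixes listed above. A sketch of how such an ID could be split into scheme and value (hypothetical helper for illustration, not part of the server):

```python
from typing import Optional, Tuple

KNOWN_PREFIXES = ("DOI", "ARXIV", "PMID")

def split_prefixed_id(paper_id: str) -> Tuple[Optional[str], str]:
    """Split 'DOI:10.1145/3292500' into ('DOI', '10.1145/3292500').

    A bare Semantic Scholar ID comes back unchanged as (None, paper_id).
    """
    scheme, sep, rest = paper_id.partition(":")
    if sep and scheme.upper() in KNOWN_PREFIXES:
        return (scheme.upper(), rest)
    return (None, paper_id)
```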

Input Schema

| Name | Required | Description | Default |
|-----------|-----|-----------------------------------------------------------|---|
| paper_id | Yes | Semantic Scholar ID or prefixed ID (DOI:, ARXIV:, PMID:). | — |
| save_path | No | Directory to save PDF. | — |

Implementation Reference

  • MCP tool handler for 'read_semantic_paper'. This is the primary entry point decorated with @mcp.tool(). It delegates execution to the generic _read helper using the 'semantic' searcher instance.
    @mcp.tool()
    async def read_semantic_paper(paper_id: str, save_path: Optional[str] = None) -> str:
        """Read paper via Semantic Scholar (open-access only, use as LAST RESORT).
        
        DOWNLOAD PRIORITY (try in order):
        1. If arXiv paper -> use read_arxiv_paper(arxiv_id)
        2. If published before 2023 -> use read_scihub_paper(doi)
        3. Use this tool as last resort
        
        Args:
            paper_id: Semantic Scholar ID or prefixed ID (DOI:, ARXIV:, PMID:).
            save_path: Directory to save PDF.
        
        Returns:
            Full paper text in Markdown format.
        """
        return await _read('semantic', paper_id, save_path)
  • Generic helper function _read that retrieves the searcher instance from the SEARCHERS dict and invokes searcher.read_paper(paper_id, save_path). Used by all platform-specific read_*_paper tools.
    async def _read(
        searcher_name: str, 
        paper_id: str, 
        save_path: Optional[str] = None
    ) -> str:
        """通用阅读函数"""
        if save_path is None:
            save_path = get_download_path()
        
        searcher = SEARCHERS.get(searcher_name)
        if not searcher:
            return f"Error: Unknown searcher {searcher_name}"
        
        try:
            return searcher.read_paper(paper_id, save_path)
        except NotImplementedError as e:
            return str(e)
        except Exception as e:
            logger.error(f"Read failed for {searcher_name}: {e}")
            return f"Error reading paper: {str(e)}"
  • Core implementation logic in SemanticSearcher.read_paper(). Downloads the open-access PDF (resolved via get_paper_details() and its pdf_url), extracts the full text to Markdown with pymupdf4llm.to_markdown(), and prepends paper metadata.
    def read_paper(self, paper_id: str, save_path: str) -> str:
        """下载并提取论文文本
        
        使用 PyMuPDF4LLM 提取 Markdown 格式。
        
        Args:
            paper_id: 论文 ID
            save_path: 保存目录
            
        Returns:
            提取的文本内容或错误信息
        """
        # 先下载 PDF
        pdf_path = self.download_pdf(paper_id, save_path)
        if pdf_path.startswith("Error"):
            return pdf_path
        
        # Fetch paper metadata
        paper = self.get_paper_details(paper_id)
        
        try:
            text = pymupdf4llm.to_markdown(pdf_path, show_progress=False)
            logger.info(f"Extracted {len(text)} characters using PyMuPDF4LLM")
            
            if not text.strip():
                return f"PDF downloaded to {pdf_path}, but no text could be extracted."
            
            # Prepend metadata
            metadata = ""
            if paper:
                metadata = f"# {paper.title}\n\n"
                metadata += f"**Authors**: {', '.join(paper.authors)}\n"
                metadata += f"**Published**: {paper.published_date}\n"
                metadata += f"**URL**: {paper.url}\n"
                metadata += f"**PDF**: {pdf_path}\n\n"
                metadata += "---\n\n"
            
            return metadata + text
            
        except Exception as e:
            logger.error(f"Failed to extract text: {e}")
            return f"Error extracting text: {e}"
  • Global SEARCHERS dictionary in which the 'semantic' key is bound to a SemanticSearcher() instance; used by the _read and _download helpers to dispatch to the platform-specific implementations.
    SEARCHERS = {
        'arxiv': ArxivSearcher(),
        'pubmed': PubMedSearcher(),
        'biorxiv': BioRxivSearcher(),
        'medrxiv': MedRxivSearcher(),
        'google_scholar': GoogleScholarSearcher(),
        'iacr': IACRSearcher(),
        'semantic': SemanticSearcher(),
        'crossref': CrossRefSearcher(),
        'repec': RePECSearcher(),
    }
