
Paper Search MCP Server

by h-lu

download_semantic

Download open-access PDFs from Semantic Scholar when other sources are unavailable. Use this tool as a last resort for academic paper retrieval.

Instructions

Download PDF via Semantic Scholar (open-access only, use as LAST RESORT).

DOWNLOAD PRIORITY (try in order):
1. If arXiv paper -> use download_arxiv(arxiv_id) (always works)
2. If published before 2023 -> use download_scihub(doi)
3. Use this tool as last resort (may not have PDF)

Args:
    paper_id: Semantic Scholar ID, or prefixed: 'DOI:xxx', 'ARXIV:xxx', 'PMID:xxx'
    save_path: Directory to save PDF.

Returns:
    Path to downloaded PDF, or error if not available.
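The priority order above can be sketched as a small selection helper. This is a hypothetical illustration only; `choose_download_tool` is not part of the server, and the real `download_arxiv`, `download_scihub`, and `download_semantic` are MCP tool invocations:

```python
def choose_download_tool(arxiv_id=None, doi=None, year=None):
    """Pick a download tool per the stated priority (hypothetical helper)."""
    if arxiv_id:
        # 1. arXiv papers: most reliable source
        return ("download_arxiv", arxiv_id)
    if doi and year is not None and year < 2023:
        # 2. Published before 2023: try Sci-Hub by DOI
        return ("download_scihub", doi)
    # 3. Last resort: Semantic Scholar open-access (may not have a PDF)
    return ("download_semantic", f"DOI:{doi}" if doi else None)
```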

Input Schema

Name        Required  Description                                                            Default
paper_id    Yes       Semantic Scholar ID, or prefixed: 'DOI:xxx', 'ARXIV:xxx', 'PMID:xxx'   -
save_path   No        Directory to save PDF.                                                 -

Implementation Reference

  • MCP tool handler for download_semantic. Registers the tool and delegates execution to SemanticSearcher's download_pdf method via the _download helper.
    @mcp.tool()
    async def download_semantic(paper_id: str, save_path: Optional[str] = None) -> str:
        """Download PDF via Semantic Scholar (open-access only, use as LAST RESORT).
        
        DOWNLOAD PRIORITY (try in order):
        1. If arXiv paper -> use download_arxiv(arxiv_id) (always works)
        2. If published before 2023 -> use download_scihub(doi)
        3. Use this tool as last resort (may not have PDF)
        
        Args:
            paper_id: Semantic Scholar ID, or prefixed: 'DOI:xxx', 'ARXIV:xxx', 'PMID:xxx'
            save_path: Directory to save PDF.
        
        Returns:
            Path to downloaded PDF, or error if not available.
        """
        return await _download('semantic', paper_id, save_path)
  • Core implementation of the PDF download logic. Fetches paper details via the Semantic Scholar API, extracts the open-access PDF URL, downloads and validates the PDF file, and handles errors such as non-PDF content and timeouts.
    def download_pdf(self, paper_id: str, save_path: str) -> str:
        """下载论文 PDF
        
        Args:
            paper_id: 论文 ID(支持多种格式)
            save_path: 保存目录
            
        Returns:
            下载的文件路径或错误信息
        """
        paper = self.get_paper_details(paper_id)
        if not paper:
            return f"Error: Could not find paper {paper_id}"
        
        if not paper.pdf_url:
            return f"Error: No PDF URL available for paper {paper_id}"
        
        pdf_url = paper.pdf_url
        logger.info(f"Downloading PDF from: {pdf_url}")
        
        try:
            # Download directly with requests
            pdf_response = requests.get(pdf_url, timeout=60)
            pdf_response.raise_for_status()
            
            # Verify that the downloaded content is a PDF
            content_type = pdf_response.headers.get('Content-Type', '')
            content = pdf_response.content
            
            # Check for the PDF magic bytes at the start of the content
            if not content.startswith(b'%PDF') and 'application/pdf' not in content_type:
                logger.warning(f"Downloaded content is not a PDF. Content-Type: {content_type}")
                # If an HTML page came back instead (e.g. from OSTI), report it as an error
                if b'<html' in content[:1000].lower() or b'<!doctype' in content[:1000].lower():
                    logger.error("Downloaded HTML instead of PDF. The URL may require browser access.")
                    return f"Error: URL {pdf_url} returned HTML, not PDF. This may require direct browser download."
            
            # Prepare the save path
            os.makedirs(save_path, exist_ok=True)
            safe_id = paper_id.replace('/', '_').replace(':', '_')
            filename = f"semantic_{safe_id}.pdf"
            pdf_path = os.path.join(save_path, filename)
            
            with open(pdf_path, "wb") as f:
                f.write(content)
            
            # Final sanity check on the file size
            file_size = os.path.getsize(pdf_path)
            if file_size < 1000:
                os.remove(pdf_path)
                return f"Error: Downloaded file too small ({file_size} bytes)"
            
            logger.info(f"PDF downloaded successfully: {pdf_path} ({file_size} bytes)")
            return pdf_path
            
        except requests.exceptions.Timeout:
            return f"Error: Download timed out for {pdf_url}"
        except requests.exceptions.RequestException as e:
            logger.error(f"PDF download error: {e}")
            return f"Error downloading PDF: {e}"
  • Generic helper function used by download_* tools to invoke the specific searcher's download_pdf method, with error handling and path resolution.
    async def _download(
        searcher_name: str, 
        paper_id: str, 
        save_path: Optional[str] = None
    ) -> str:
        """通用下载函数"""
        if save_path is None:
            save_path = get_download_path()
        
        searcher = SEARCHERS.get(searcher_name)
        if not searcher:
            return f"Error: Unknown searcher {searcher_name}"
        
        try:
            return searcher.download_pdf(paper_id, save_path)
        except NotImplementedError as e:
            return str(e)
        except Exception as e:
            logger.error(f"Download failed for {searcher_name}: {e}")
            return f"Error downloading: {str(e)}"
  • Instantiation of SemanticSearcher instance in the global SEARCHERS dictionary, used by _download to access the semantic platform implementation.
    SEARCHERS = {
        'arxiv': ArxivSearcher(),
        'pubmed': PubMedSearcher(),
        'biorxiv': BioRxivSearcher(),
        'medrxiv': MedRxivSearcher(),
        'google_scholar': GoogleScholarSearcher(),
        'iacr': IACRSearcher(),
        'semantic': SemanticSearcher(),
        # ... (remaining entries elided)
    }
  • Type hints defining the input schema (paper_id: str, save_path: Optional[str]) and output (str: path or error). Detailed usage in docstring.
    async def download_semantic(paper_id: str, save_path: Optional[str] = None) -> str:
        ...

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/h-lu/paper-search-mcp'
