search_scihub

Search and download academic papers from Sci-Hub using DOI or URL. Automatically detects available mirrors and retrieves PDF files when needed.

Instructions

Search and download papers from Sci-Hub using DOI or paper URL. Automatically detects and uses the fastest available mirror.
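This page does not show how the fastest mirror is chosen. As a rough sketch only, one plausible approach is to probe every candidate concurrently and keep the reachable mirror with the lowest latency. The names `Probe` and `pickFastestMirror` are hypothetical and do not come from the server's code; the probe is injected so the selection logic can be exercised without network access.

```typescript
// Hypothetical probe: resolves with a latency in milliseconds,
// or rejects when the mirror cannot be reached.
type Probe = (mirror: string) => Promise<number>;

// Probe all mirrors concurrently and return the reachable one with the
// lowest latency, or null when none of them respond.
async function pickFastestMirror(
  mirrors: string[],
  probe: Probe
): Promise<string | null> {
  const results = await Promise.all(
    mirrors.map(async (mirror) => {
      try {
        return { mirror, latency: await probe(mirror) };
      } catch (err) {
        // An unreachable mirror is ranked last and filtered out below.
        return { mirror, latency: Number.POSITIVE_INFINITY };
      }
    })
  );

  const reachable = results.filter((r) => Number.isFinite(r.latency));
  if (reachable.length === 0) {
    return null;
  }
  reachable.sort((a, b) => a.latency - b.latency);
  return reachable[0].mirror;
}
```

In a real client the probe would likely issue a lightweight request with a short timeout; that part is left out here because the page does not document it.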

Input Schema

Name         Required  Description                                              Default
doiOrUrl     Yes       DOI (e.g., "10.1038/nature12373") or full paper URL
downloadPdf  No        Whether to download the PDF file                         false
savePath     No        Directory to save the PDF file (if downloadPdf is true)
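To make the schema concrete, the sketch below checks and normalizes tool arguments by hand, without any validation library. It assumes the rules listed in the table: `doiOrUrl` is a required non-empty string, `downloadPdf` is optional and falls back to `false`, `savePath` is optional, and unknown keys are dropped. The names `SearchSciHubArgs` and `parseSearchSciHubArgs` are hypothetical, chosen only for illustration.

```typescript
// Hypothetical normalized shape of the tool arguments.
interface SearchSciHubArgs {
  doiOrUrl: string;
  downloadPdf: boolean;
  savePath?: string;
}

// Validate raw input against the table above and apply the default.
// Only the three known keys are copied, so extra keys are discarded.
function parseSearchSciHubArgs(input: unknown): SearchSciHubArgs {
  if (typeof input !== 'object' || input === null) {
    throw new Error('arguments must be an object');
  }
  const { doiOrUrl, downloadPdf, savePath } = input as Record<string, unknown>;

  if (typeof doiOrUrl !== 'string' || doiOrUrl.length === 0) {
    throw new Error('doiOrUrl is required and must be a non-empty string');
  }

  let download = false; // default when downloadPdf is omitted
  if (downloadPdf !== undefined) {
    if (typeof downloadPdf !== 'boolean') {
      throw new Error('downloadPdf must be a boolean');
    }
    download = downloadPdf;
  }

  const args: SearchSciHubArgs = { doiOrUrl, downloadPdf: download };
  if (savePath !== undefined) {
    if (typeof savePath !== 'string') {
      throw new Error('savePath must be a string');
    }
    args.savePath = savePath;
  }
  return args;
}

// Example: only the required field is supplied.
const example = parseSearchSciHubArgs({ doiOrUrl: '10.1038/nature12373' });
console.log(example.downloadPdf);
```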

Implementation Reference

  • Executes the 'search_scihub' tool by calling the SciHub searcher's search method and optionally downloading the PDF.
    case 'search_scihub': {
      const { doiOrUrl, downloadPdf, savePath } = args;
      const resolvedSavePath = savePath || './downloads';

      const results = await searchers.scihub.search(doiOrUrl);
      if (results.length === 0) {
        return jsonTextResponse(`No paper found on Sci-Hub for: ${doiOrUrl}`);
      }

      const paper = results[0];
      let responseText = `Found paper on Sci-Hub:\n\n${JSON.stringify(PaperFactory.toDict(paper), null, 2)}`;

      if (downloadPdf && paper.pdfUrl) {
        try {
          const filePath = await searchers.scihub.downloadPdf(doiOrUrl, { savePath: resolvedSavePath });
          responseText += `\n\nPDF downloaded successfully to: ${filePath}`;
        } catch (downloadError: any) {
          responseText += `\n\nFailed to download PDF: ${downloadError.message}`;
        }
      }

      return jsonTextResponse(responseText);
    }
  • Zod input schema for validating arguments to the 'search_scihub' tool.
    export const SearchSciHubSchema = z
      .object({
        doiOrUrl: z.string().min(1),
        downloadPdf: z.boolean().optional().default(false),
        savePath: z.string().optional()
      })
      .strip();
  • Registers the 'search_scihub' tool in the TOOLS array with metadata and JSON input schema.
    {
      name: 'search_scihub',
      description: 'Search and download papers from Sci-Hub using DOI or paper URL. Automatically detects and uses the fastest available mirror.',
      inputSchema: {
        type: 'object',
        properties: {
          doiOrUrl: {
            type: 'string',
            description: 'DOI (e.g., "10.1038/nature12373") or full paper URL'
          },
          downloadPdf: {
            type: 'boolean',
            description: 'Whether to download the PDF file',
            default: false
          },
          savePath: {
            type: 'string',
            description: 'Directory to save the PDF file (if downloadPdf is true)'
          }
        },
        required: ['doiOrUrl']
      }
    },
  • Core helper function in SciHubSearcher that fetches paper information from Sci-Hub mirrors using web scraping to extract PDF URLs and metadata.
    private async fetchPaperInfo(doiOrUrl: string): Promise<Paper | null> {
      let currentMirror = await this.getCurrentMirror();
      let retries = 0;

      // Normalize the DOI by stripping any leading "doi:" prefix
      const cleanedQuery = doiOrUrl.replace(/^doi:\s*/i, '');

      while (retries < this.maxRetries) {
        try {
          const searchUrl = `${currentMirror}/${cleanedQuery}`;
          logDebug(`Searching on ${currentMirror} for: ${cleanedQuery}`);
          const response = await this.axiosInstance.get(searchUrl);

          if (response.status === 200) {
            const $ = cheerio.load(response.data);

            // Check whether the page contains the paper
            const pdfFrame = $('#pdf');
            const pdfEmbed = $('embed[type="application/pdf"]');
            const pdfIframe = $('iframe[src*=".pdf"]');
            let pdfUrl = '';

            // Try several ways of locating the PDF URL
            if (pdfFrame.length > 0) {
              pdfUrl = pdfFrame.attr('src') || '';
            } else if (pdfEmbed.length > 0) {
              pdfUrl = pdfEmbed.attr('src') || '';
            } else if (pdfIframe.length > 0) {
              pdfUrl = pdfIframe.attr('src') || '';
            } else {
              // Fall back to the download button
              const downloadButton = $('button[onclick*="download"]');
              if (downloadButton.length > 0) {
                const onclickAttr = downloadButton.attr('onclick') || '';
                const match = onclickAttr.match(/location\.href='([^']+)'/);
                if (match) {
                  pdfUrl = match[1];
                }
              }
            }

            // Resolve protocol-relative and root-relative URLs
            if (pdfUrl && !pdfUrl.startsWith('http')) {
              if (pdfUrl.startsWith('//')) {
                pdfUrl = 'https:' + pdfUrl;
              } else if (pdfUrl.startsWith('/')) {
                pdfUrl = currentMirror + pdfUrl;
              }
            }

            if (pdfUrl) {
              // Extract the title from the page title or, if present, the citation block
              let title = $('title').text();
              const citation = $('#citation').text();
              if (citation) {
                // Use the text before the first period of the citation as the title
                const titleMatch = citation.match(/([^.]+)\./);
                if (titleMatch) {
                  title = titleMatch[1].trim();
                }
              }

              // Strip Sci-Hub branding from the title
              title = title.replace(/\s*\|\s*Sci-Hub.*$/, '')
                .replace(/Sci-Hub\s*:\s*/, '')
                .trim();

              return PaperFactory.create({
                paperId: cleanedQuery,
                title: title || `Paper: ${cleanedQuery}`,
                source: 'scihub',
                authors: [],
                abstract: '',
                doi: this.isValidDOIOrURL(cleanedQuery) && cleanedQuery.includes('10.') ? cleanedQuery : '',
                publishedDate: null,
                pdfUrl: pdfUrl,
                url: searchUrl,
                extra: {
                  mirror: currentMirror,
                  fetchedAt: new Date().toISOString()
                }
              });
            } else {
              // No PDF on this mirror: mark it failed and try the next one
              logDebug(`Paper not found on ${currentMirror}`);
              currentMirror = await this.markMirrorFailed(currentMirror);
              retries++;
            }
          } else {
            logDebug(`Unexpected status ${response.status} from ${currentMirror}`);
            currentMirror = await this.markMirrorFailed(currentMirror);
            retries++;
          }
        } catch (error: any) {
          logDebug(`Error fetching from ${currentMirror}:`, error.message);
          currentMirror = await this.markMirrorFailed(currentMirror);
          retries++;
        }
      }

      return null;
    }
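The string handling inside `fetchPaperInfo` can be tried in isolation. The sketch below pulls the prefix stripping, URL resolution, and title cleanup into standalone functions that reuse the same regular expressions and branching as the helper above. The function names `cleanQuery`, `resolvePdfUrl`, and `cleanTitle` are hypothetical and the `.example` hosts are placeholders, not real mirrors.

```typescript
// Strip an optional, case-insensitive "doi:" prefix.
function cleanQuery(doiOrUrl: string): string {
  return doiOrUrl.replace(/^doi:\s*/i, '');
}

// Resolve a scraped src attribute against the mirror's base URL.
// Mirrors the branches in the helper: absolute URLs pass through,
// "//host/..." gets an https scheme, "/path" is joined to the mirror.
function resolvePdfUrl(src: string, mirror: string): string {
  if (src.startsWith('http')) {
    return src;
  }
  if (src.startsWith('//')) {
    return 'https:' + src;
  }
  if (src.startsWith('/')) {
    return mirror + src;
  }
  // Any other relative form is returned unchanged, as in the helper.
  return src;
}

// Prefer the text before the first period of the citation, then strip
// Sci-Hub branding from whichever title was chosen.
function cleanTitle(pageTitle: string, citation: string): string {
  let title = pageTitle;
  const titleMatch = citation.match(/([^.]+)\./);
  if (titleMatch) {
    title = titleMatch[1].trim();
  }
  return title
    .replace(/\s*\|\s*Sci-Hub.*$/, '')
    .replace(/Sci-Hub\s*:\s*/, '')
    .trim();
}

console.log(cleanQuery('doi: 10.1038/nature12373'));
console.log(resolvePdfUrl('//cdn.example/paper.pdf', 'https://mirror.example'));
console.log(cleanTitle('Sci-Hub: Some Paper | Sci-Hub', ''));
```

Because the citation heuristic keeps whatever precedes the first period, a citation that begins with an abbreviated author name (for example "Smith, J. ...") would yield the author fragment instead of the title.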

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Dianel555/paper-search-mcp-nodejs'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.