Paper Search MCP

by Dianel555

search_scihub

Search for academic papers on Sci-Hub by DOI or URL. The tool automatically detects available mirrors and, when downloadPdf is true, downloads the PDF file.

Instructions

Search and download papers from Sci-Hub using DOI or paper URL. Automatically detects and uses the fastest available mirror.

Input Schema

Name | Required | Description | Default
doiOrUrl | Yes | DOI (e.g., "10.1038/nature12373") or full paper URL | (none)
downloadPdf | No | Whether to download the PDF file | false
savePath | No | Directory to save the PDF file (if downloadPdf is true) | none in the schema; the handler falls back to ./downloads
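
The following is a minimal sketch of how the three parameters might be resolved before the tool runs. It mirrors the defaults visible in the Implementation Reference on this page (downloadPdf defaults to false in the schema, and the handler falls back to './downloads' when savePath is missing). The names SearchSciHubArgs, ResolvedArgs, and withDefaults are hypothetical and do not appear in the server's code.

```typescript
// Hypothetical types and helper, for illustration only.
interface SearchSciHubArgs {
  doiOrUrl: string;
  downloadPdf?: boolean;
  savePath?: string;
}

interface ResolvedArgs {
  doiOrUrl: string;
  downloadPdf: boolean;
  savePath: string;
}

function withDefaults(args: SearchSciHubArgs): ResolvedArgs {
  // The schema requires a non-empty string for doiOrUrl.
  if (args.doiOrUrl.length === 0) {
    throw new Error('doiOrUrl must be a non-empty string');
  }
  return {
    doiOrUrl: args.doiOrUrl,
    // Schema default: do not download unless asked.
    downloadPdf: args.downloadPdf ?? false,
    // Handler fallback: an empty or missing savePath becomes './downloads'.
    savePath: args.savePath || './downloads',
  };
}

// Only the required field is supplied here.
const example = withDefaults({ doiOrUrl: '10.1038/nature12373' });
// example is { doiOrUrl: '10.1038/nature12373', downloadPdf: false, savePath: './downloads' }
```

Note that savePath only matters when downloadPdf is true; passing it on its own has no effect on the search.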

Implementation Reference

  • Executes the 'search_scihub' tool by calling the SciHub searcher's search method and optionally downloading the PDF.
    case 'search_scihub': {
      const { doiOrUrl, downloadPdf, savePath } = args;
      const resolvedSavePath = savePath || './downloads';
    
      const results = await searchers.scihub.search(doiOrUrl);
      if (results.length === 0) {
        return jsonTextResponse(`No paper found on Sci-Hub for: ${doiOrUrl}`);
      }
    
      const paper = results[0];
      let responseText = `Found paper on Sci-Hub:\n\n${JSON.stringify(PaperFactory.toDict(paper), null, 2)}`;
    
      if (downloadPdf && paper.pdfUrl) {
        try {
          const filePath = await searchers.scihub.downloadPdf(doiOrUrl, { savePath: resolvedSavePath });
          responseText += `\n\nPDF downloaded successfully to: ${filePath}`;
        } catch (downloadError: any) {
          responseText += `\n\nFailed to download PDF: ${downloadError.message}`;
        }
      }
    
      return jsonTextResponse(responseText);
    }
  • Zod input schema for validating arguments to the 'search_scihub' tool.
    export const SearchSciHubSchema = z
      .object({
        doiOrUrl: z.string().min(1),
        downloadPdf: z.boolean().optional().default(false),
        savePath: z.string().optional()
      })
      .strip();
  • Registers the 'search_scihub' tool in the TOOLS array with metadata and JSON input schema.
    {
      name: 'search_scihub',
      description:
        'Search and download papers from Sci-Hub using DOI or paper URL. Automatically detects and uses the fastest available mirror.',
      inputSchema: {
        type: 'object',
        properties: {
          doiOrUrl: {
            type: 'string',
            description: 'DOI (e.g., "10.1038/nature12373") or full paper URL'
          },
          downloadPdf: {
            type: 'boolean',
            description: 'Whether to download the PDF file',
            default: false
          },
          savePath: {
            type: 'string',
            description: 'Directory to save the PDF file (if downloadPdf is true)'
          }
        },
        required: ['doiOrUrl']
      }
    },
  • Core helper function in SciHubSearcher that fetches paper information from Sci-Hub mirrors using web scraping to extract PDF URLs and metadata.
    private async fetchPaperInfo(doiOrUrl: string): Promise<Paper | null> {
      let currentMirror = await this.getCurrentMirror();
      let retries = 0;
      
      // Strip a leading "doi:" prefix from the query
      const cleanedQuery = doiOrUrl.replace(/^doi:\s*/i, '');
      
      while (retries < this.maxRetries) {
        try {
          const searchUrl = `${currentMirror}/${cleanedQuery}`;
          logDebug(`Searching on ${currentMirror} for: ${cleanedQuery}`);
          
          const response = await this.axiosInstance.get(searchUrl);
          
          if (response.status === 200) {
            const $ = cheerio.load(response.data);
            
            // Check whether the paper was found
            const pdfFrame = $('#pdf');
            const pdfEmbed = $('embed[type="application/pdf"]');
            const pdfIframe = $('iframe[src*=".pdf"]');
            
            let pdfUrl = '';
            
            // Try several ways of locating the PDF URL
            if (pdfFrame.length > 0) {
              pdfUrl = pdfFrame.attr('src') || '';
            } else if (pdfEmbed.length > 0) {
              pdfUrl = pdfEmbed.attr('src') || '';
            } else if (pdfIframe.length > 0) {
              pdfUrl = pdfIframe.attr('src') || '';
            } else {
              // Look for a download button
              const downloadButton = $('button[onclick*="download"]');
              if (downloadButton.length > 0) {
                const onclickAttr = downloadButton.attr('onclick') || '';
                const match = onclickAttr.match(/location\.href='([^']+)'/);
                if (match) {
                  pdfUrl = match[1];
                }
              }
            }
            
            // Resolve relative URLs
            if (pdfUrl && !pdfUrl.startsWith('http')) {
              if (pdfUrl.startsWith('//')) {
                pdfUrl = 'https:' + pdfUrl;
              } else if (pdfUrl.startsWith('/')) {
                pdfUrl = currentMirror + pdfUrl;
              }
            }
            
            if (pdfUrl) {
              // Extract the title (try the page title or the citation info)
              let title = $('title').text();
              const citation = $('#citation').text();
              if (citation) {
                // Extract the title from the citation text
                const titleMatch = citation.match(/([^.]+)\./);
                if (titleMatch) {
                  title = titleMatch[1].trim();
                }
              }
              
              // Clean up the title
              title = title.replace(/\s*\|\s*Sci-Hub.*$/, '')
                          .replace(/Sci-Hub\s*:\s*/, '')
                          .trim();
              
              return PaperFactory.create({
                paperId: cleanedQuery,
                title: title || `Paper: ${cleanedQuery}`,
                source: 'scihub',
                authors: [],
                abstract: '',
                doi: this.isValidDOIOrURL(cleanedQuery) && cleanedQuery.includes('10.') 
                     ? cleanedQuery 
                     : '',
                publishedDate: null,
                pdfUrl: pdfUrl,
                url: searchUrl,
                extra: {
                  mirror: currentMirror,
                  fetchedAt: new Date().toISOString()
                }
              });
            } else {
              logDebug(`Paper not found on ${currentMirror}`);
              currentMirror = await this.markMirrorFailed(currentMirror);
              retries++;
            }
          } else {
            logDebug(`Unexpected status ${response.status} from ${currentMirror}`);
            currentMirror = await this.markMirrorFailed(currentMirror);
            retries++;
          }
        } catch (error: any) {
          logDebug(`Error fetching from ${currentMirror}:`, error.message);
          currentMirror = await this.markMirrorFailed(currentMirror);
          retries++;
        }
      }
      
      return null;
    }
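
The string handling inside fetchPaperInfo can be read in isolation. The sketch below restates three of its steps as pure functions using the same regular expressions and branch order as the listing above: stripping the "doi:" prefix, resolving a scraped PDF URL against the mirror, and deriving a title. The function names cleanQuery, resolvePdfUrl, and extractTitle, and the mirror URL in the example, are hypothetical and are not part of SciHubSearcher.

```typescript
// Hypothetical helpers that restate steps of fetchPaperInfo; not the server's API.

// Remove an optional, case-insensitive "doi:" prefix.
function cleanQuery(doiOrUrl: string): string {
  return doiOrUrl.replace(/^doi:\s*/i, '');
}

// Turn a scraped src attribute into an absolute URL where the listing does so.
function resolvePdfUrl(pdfUrl: string, mirror: string): string {
  if (!pdfUrl || pdfUrl.startsWith('http')) return pdfUrl;
  if (pdfUrl.startsWith('//')) return 'https:' + pdfUrl; // protocol-relative
  if (pdfUrl.startsWith('/')) return mirror + pdfUrl;    // root-relative
  return pdfUrl; // any other relative form is left unchanged, as in the listing
}

// Prefer the text before the first period of the citation, else the page title.
function extractTitle(pageTitle: string, citation: string): string {
  let title = pageTitle;
  const titleMatch = citation.match(/([^.]+)\./);
  if (citation && titleMatch) {
    title = titleMatch[1].trim();
  }
  return title
    .replace(/\s*\|\s*Sci-Hub.*$/, '')
    .replace(/Sci-Hub\s*:\s*/, '')
    .trim();
}

const mirror = 'https://mirror.example'; // placeholder mirror, not a real endpoint
const query = cleanQuery('doi: 10.1038/nature12373');
const absolute = resolvePdfUrl('/downloads/paper.pdf', mirror);
const title = extractTitle('Sci-Hub: Some Paper', '');
// query is '10.1038/nature12373'
// absolute is 'https://mirror.example/downloads/paper.pdf'
// title is 'Some Paper'
```

Two behaviors follow directly from these expressions. A relative src without a leading slash is passed through unresolved, and the citation heuristic returns whatever precedes the first period, so a citation that opens with an author initial (for example "Smith, J. ...") yields "Smith, J" rather than the paper title.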
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses key behavioral traits: automatic mirror detection and PDF download capability. However, it lacks details on error handling, rate limits, authentication needs, or what happens if download fails, leaving gaps for a mutation-capable tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
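
For readers wondering what a failed download looks like to the caller, the handler in the Implementation Reference catches the error and appends a message to the response text instead of throwing. The sketch below isolates that pattern under stated assumptions: the Downloader type and describeDownload function are hypothetical, and the stub downloader stands in for the real searchers.scihub.downloadPdf call.

```typescript
// Hypothetical stand-in for the real download call; it resolves to the saved file path.
type Downloader = (doiOrUrl: string, savePath: string) => Promise<string>;

// Returns the line the handler would append to its response text.
async function describeDownload(
  doiOrUrl: string,
  savePath: string,
  download: Downloader
): Promise<string> {
  try {
    const filePath = await download(doiOrUrl, savePath);
    return `PDF downloaded successfully to: ${filePath}`;
  } catch (err) {
    // A failure is reported as text; the search result is still returned.
    const message = err instanceof Error ? err.message : String(err);
    return `Failed to download PDF: ${message}`;
  }
}

// Stub downloader that always fails, to show the error path.
const failingDownloader: Downloader = async () => {
  throw new Error('mirror timed out');
};

const pendingMessage = describeDownload('10.1038/nature12373', './downloads', failingDownloader);
// pendingMessage resolves to 'Failed to download PDF: mirror timed out'
```

An agent calling the tool would therefore need to inspect the returned text to tell a successful download from a failed one, since neither case raises an error.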

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core functionality ('Search and download papers from Sci-Hub') and adds useful context ('Automatically detects and uses the fastest available mirror') without any wasted words. Every part earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is moderately complete for a tool with 3 parameters and mutation capability (download). It covers the purpose and basic behavior but lacks details on return values, error cases, or advanced usage, which would be helpful for full contextual understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents parameters. The description adds minimal value beyond the schema by implying the tool handles both DOI and URL formats, but does not provide additional syntax or format details. This meets the baseline of 3 for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Search and download papers') and resource ('from Sci-Hub'), distinguishing it from sibling tools like search_arxiv or search_pubmed by specifying the Sci-Hub source. It includes the unique capability of automatic mirror detection, which further differentiates it from generic search tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: for searching Sci-Hub with DOI or URL inputs. It implicitly suggests alternatives like sibling tools for other sources (e.g., search_arxiv for arXiv), but does not explicitly state when not to use it or name specific alternatives, keeping it at a 4.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Dianel555/paper-search-mcp-nodejs'
