Skip to main content
Glama
Oxidane-bot

Paper Download MCP Server

by Oxidane-bot

Paper Download MCP Server

PyPI version Python Version License: MIT Downloads

MCP server for downloading academic papers from multiple sources with intelligent routing.

Note: This project is built on top of scihub-cli, adapting its core functionality for MCP integration. If you find this useful, consider starring both projects!

Features

  • Multi-Source Support: Downloads from multiple sources with automatic fallback

    • arXiv: Prioritized for preprints (free, no API key needed)

    • Unpaywall: For open access papers (requires email)

    • Direct PDF: Handles direct PDF URLs

    • PMC: PubMed Central articles

    • HTML Landing: Extracts PDF links from article pages

    • Sci-Hub: Fallback for older papers (coverage-driven)

    • CORE: Additional OA fallback

  • Intelligent Routing: Priority-based source selection with year-aware routing

  • 3 MCP Tools:

    • paper_download - Download single paper by DOI or URL

    • paper_batch_download - Download multiple papers with progress reporting

    • paper_metadata - Get paper metadata without downloading PDF

  • Clean Filenames: [YYYY] - Paper Title.pdf format

  • Rate Limiting: Built-in delays for API compliance

  • Comprehensive Error Messages: Actionable suggestions on failures

Installation

No manual installation required! Use uvx for automatic environment management:

{ "mcpServers": { "paper-download": { "command": "uvx", "args": ["paper-download-mcp"], "env": { "PAPER_DOWNLOAD_EMAIL": "your-email@university.edu" } } } }

For Developers

git clone <repository-url> cd paper-download-mcp uv sync uv run python -m paper_download_mcp.server

Configuration

Required Environment Variables

  • PAPER_DOWNLOAD_EMAIL: Your email address (required for Unpaywall API compliance)

    • Example: researcher@university.edu

    • Used for Unpaywall API tracking and contact purposes

Optional Environment Variables

  • PAPER_DOWNLOAD_OUTPUT_DIR: Default output directory (default: ./downloads)

Claude Desktop Setup

Add to your claude_desktop_config.json:

{ "mcpServers": { "paper-download": { "command": "uvx", "args": ["paper-download-mcp"], "env": { "PAPER_DOWNLOAD_EMAIL": "your-email@university.edu", "PAPER_DOWNLOAD_OUTPUT_DIR": "/path/to/papers" } } } }

After configuration, restart Claude Desktop.

Tools

paper_download

Download a single academic paper by DOI or URL.

Parameters:

  • identifier (required): DOI or URL (e.g., 10.1038/nature12373)

  • output_dir (optional): Output directory (default: ./downloads)

Example:

Download the paper 10.1038/nature12373

Returns:

  • Markdown with download details (file path, size, source, timing)

  • Error message with suggestions if download fails

paper_batch_download

Download multiple papers sequentially with progress reporting.

Parameters:

  • identifiers (required): List of DOIs or URLs (1-50 maximum)

  • output_dir (optional): Output directory (default: ./downloads)

Example:

Download these papers: 10.1038/nature12373, 10.1126/science.1234567

Returns:

  • Markdown summary with statistics

  • List of successful downloads

  • List of failed downloads with errors

Note: Downloads are sequential with 2-second delays for rate limiting.

paper_metadata

Retrieve paper metadata without downloading the PDF.

Parameters:

  • identifier (required): DOI or URL

Example:

Get metadata for 10.1038/nature12373

Returns:

  • JSON with paper details:

    • DOI, title, year, authors, journal

    • Open access status

    • Available download sources

How It Works

Intelligent Source Routing

The server uses priority routing to keep fast sources first:

  1. arXiv IDs/URLs: Try arXiv first, then OA sources if needed

  2. DOIs (year < 2021): OA sources first, Sci-Hub last

  3. DOIs (year ≥ 2021): OA sources only (Sci-Hub has no coverage)

  4. Year unknown: OA sources first, Sci-Hub last

Download Process

  1. Normalize DOI from URL/identifier

  2. Detect publication year via Crossref API (DOIs only)

  3. Route to appropriate source based on priority/year

  4. Download PDF with retry on failure

  5. Validate file (PDF header, size check)

  6. Generate filename: [YYYY] - Title.pdf

  7. Return absolute file path

Rate Limiting

  • 2-second delay between batch downloads

  • Respects Unpaywall API limits (~100k requests/day)

  • Built-in exponential backoff retry (3 attempts max)

Troubleshooting

"PAPER_DOWNLOAD_EMAIL environment variable is required"

Solution: Set the email in your Claude Desktop config (see Configuration section above).

"Paper not found in any source"

Possible causes:

  • Invalid or incorrect DOI

  • Paper too recent (not yet indexed)

  • Paper behind paywall with no open access version

  • Sci-Hub mirrors temporarily unavailable

Solutions:

  • Verify DOI on doi.org

  • Use paper_metadata to check availability

  • Try again later (mirrors may recover)

Download times out

Causes:

  • Slow network connection

  • Sci-Hub mirror selection taking too long

  • Large PDF file

Solutions:

  • Check internet connection

  • Retry (mirror selection is cached after first success)

  • Single papers typically complete in <15 seconds

Downloaded file is corrupted

The server validates PDFs before returning. If you encounter corruption:

  1. Check disk space

  2. Verify file permissions in output directory

  3. Try different paper (may be source issue)

Testing

MCP Inspector

Test the server with MCP Inspector:

export PAPER_DOWNLOAD_EMAIL=test@example.com npx @modelcontextprotocol/inspector uv run python -m paper_download_mcp.server

Unit Tests

uv run pytest

IMPORTANT: This tool provides access to academic papers through multiple sources:

  • Unpaywall (https://unpaywall.org): Legal open-access aggregator operated by OurResearch. Recommended and prioritized when available.

  • Sci-Hub: Operates in a legal gray area. While it provides access to research, it may violate copyright laws in some jurisdictions. Use at your own risk.

User Responsibilities:

  • You are responsible for compliance with applicable copyright laws in your jurisdiction

  • This tool is intended for research and educational purposes only

  • The maintainers assume no liability for how you use this tool

  • When possible, prefer legal open-access sources (Unpaywall)

By using this tool, you acknowledge these legal considerations and agree to use it responsibly.

Project Structure

paper-download-mcp/ ├── src/ │ └── paper_download_mcp/ │ ├── server.py # FastMCP entry point │ ├── models.py # Pydantic input schemas │ ├── formatters.py # Markdown/JSON formatters │ ├── tools/ │ │ ├── download.py # Download tools │ │ └── metadata.py # Metadata tool │ └── scihub_core/ # Copied from scihub-cli ├── pyproject.toml ├── README.md └── .gitignore

Architecture

Layer Architecture

  1. FastMCP Server Layer: Protocol handling, tool registration, config validation

  2. MCP Tools Layer: Request parsing, response formatting, async coordination

  3. Models & Formatters: Data validation, output serialization

  4. scihub_core Layer: Academic paper logic (unchanged from scihub-cli)

Async Pattern

All tools use asyncio.to_thread() to wrap synchronous scihub-cli code:

@mcp.tool() async def paper_download(...): def _sync_download(): # Synchronous scihub-cli code client = SciHubClient() return client.download_paper(doi) # Run in thread pool result = await asyncio.to_thread(_sync_download) return format_result(result)

This preserves the battle-tested scihub-cli code without modifications.

Performance

Operation

Target

Typical

Max

Get Metadata

<1s

0.5s

2s

Single Download

<5s

2-3s

10s

Batch (10 papers)

<40s

25-30s

60s

Note: First download may take longer (5-10s) due to mirror selection. Subsequent downloads use cached mirror.

Contributing

For Maintainers: Syncing from scihub-cli

The scihub_core/ directory contains code copied from the upstream scihub-cli project. When bugs are fixed or features added to scihub-cli:

Workflow:

  1. Fix/implement in scihub-cli project first

  2. Run tests and commit to scihub-cli

  3. Copy updated files to paper-download-mcp/src/paper_download_mcp/scihub_core/

  4. Test MCP server functionality

  5. Commit with message referencing upstream commit:

    sync: Update <file> from scihub-cli (<description>) Synced from scihub-cli commit <hash> <details of changes>

Last sync: scihub-cli@9787efc (2024-12-02) - Fixed year type bug in UnpaywallSource

License

MIT License - See LICENSE file for details

Credits

Support

For issues or questions:

  1. Check the Troubleshooting section above

  2. Open an issue on GitHub (include error messages and steps to reproduce)


Disclaimer: This tool is provided as-is for research and educational purposes. Users assume all responsibility for compliance with applicable laws and regulations.

-
security - not tested
A
license - permissive license
-
quality - not tested

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Oxidane-bot/paper-download-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server