arXiv CLI & MCP Server

A Python toolkit for searching and downloading papers from arXiv.org, with both a command-line interface and a Model Context Protocol (MCP) server for LLM integration.

LLM agents work well with well-documented CLI tools and MCP servers; this project provides both.

Features

  • Search arXiv papers by title, author, abstract, category, and more

  • Download PDFs automatically with local caching

  • MCP Server for integration with LLM assistants (Claude Desktop, etc.)

  • Typed responses using Pydantic models for clean data handling

  • Rate limiting built-in to respect arXiv API guidelines

  • Comprehensive test suite: 26 integration tests (no mocking)

Installation

Option 1: Install from GitHub (Recommended)

Install directly from the GitHub repository:

```shell
# Install the latest version
uv pip install git+https://github.com/LiamConnell/arxiv_for_agents.git

# Or with pip
pip install git+https://github.com/LiamConnell/arxiv_for_agents.git

# Now you can use the arxiv command
arxiv --help
```

Option 2: Install from Source

Clone the repository and install locally:

```shell
# Clone the repository
git clone https://github.com/LiamConnell/arxiv_for_agents.git
cd arxiv_for_agents

# Install in editable mode
uv pip install -e .

# Now you can use the arxiv command
arxiv --help
```

Option 3: Development Installation

For development with all dependencies:

```shell
# Clone and install with dev dependencies
git clone https://github.com/LiamConnell/arxiv_for_agents.git
cd arxiv_for_agents
uv pip install -e ".[dev]"

# Run tests
uv run pytest
```

Verify Installation

```shell
# If installed as a package
arxiv --help

# Or if using as a module
uv run python -m arxiv --help
```

Usage

Note: If you installed as a package, use arxiv directly. Otherwise, use uv run python -m arxiv.

Search Papers

Search by title:

```shell
# Using the installed package
arxiv search "ti:attention is all you need"

# Or using as a module
uv run python -m arxiv search "ti:attention is all you need"
```

Search by author:

arxiv search "au:Hinton" --max-results 20

Search by category:

arxiv search "cat:cs.AI" --max-results 10

Combined search:

arxiv search "ti:transformer AND au:Vaswani"

Get Specific Paper

Get paper metadata and download PDF:

arxiv get 1706.03762

Get metadata only (no download):

arxiv get 1706.03762 --no-download

Force re-download:

arxiv get 1706.03762 --force

Download PDF

Download just the PDF:

arxiv download 1706.03762

List Downloaded PDFs

arxiv list-downloads

JSON Output

Get results as JSON for scripting:

```shell
arxiv search "ti:neural" --json
arxiv get 1706.03762 --json --no-download
```

Search Query Syntax

The arXiv API supports field-specific searches:

  • ti: - Title

  • au: - Author

  • abs: - Abstract

  • cat: - Category (e.g., cs.AI, cs.LG)

  • all: - All fields (default)

You can combine searches with AND, OR, and ANDNOT:

```shell
arxiv search "ti:neural AND cat:cs.LG"
arxiv search "au:Hinton OR au:Bengio"
```
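
The field prefixes and boolean operators compose mechanically, so query strings can also be built programmatically. A minimal sketch of such a helper (hypothetical, not part of this package):

```python
# Hypothetical helper showing how field-prefixed terms combine
# into a single arXiv query string.
def build_query(*, title=None, author=None, category=None, op="AND"):
    """Join field-specific terms with a boolean operator (AND/OR/ANDNOT)."""
    parts = []
    if title:
        parts.append(f"ti:{title}")
    if author:
        parts.append(f"au:{author}")
    if category:
        parts.append(f"cat:{category}")
    return f" {op} ".join(parts)

print(build_query(title="neural", category="cs.LG"))       # ti:neural AND cat:cs.LG
print(build_query(author="Hinton", category="cs.AI", op="OR"))
```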

Download Directory

PDFs are downloaded to ./.arxiv by default. Change this with:

arxiv --download-dir ./papers search "ti:transformer"

MCP Server (Model Context Protocol)

The arXiv CLI includes a Model Context Protocol (MCP) server that allows LLM assistants (like Claude Desktop) to search and download arXiv papers programmatically.

Running the MCP Server

```shell
# Option 1: Using the script entry point (recommended)
uv run arxiv-mcp

# Option 2: Using the module
uv run python -m arxiv.mcp
```

The server runs in stdio mode and communicates via JSON-RPC over stdin/stdout.
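
As an illustration of what crosses that stdio boundary, here is a sketch of the JSON-RPC frame an MCP client might write to invoke one of the tools below. The exact argument names (`query`, `max_results`) are assumptions based on the tool descriptions, not a verified wire capture:

```python
import json

# Illustrative MCP "tools/call" request, serialized as one JSON-RPC
# 2.0 message (the stdio transport sends newline-delimited JSON).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_papers",
        "arguments": {"query": "ti:transformer", "max_results": 5},
    },
}

frame = json.dumps(request)
print(frame)
```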

MCP Tools

The server provides 4 tools for paper discovery and management:

  1. search_papers - Search arXiv with advanced query syntax

    • Supports field prefixes (ti:, au:, abs:, cat:)

    • Boolean operators (AND, OR, ANDNOT)

    • Pagination and sorting options

    • Returns paper metadata including title, authors, abstract, categories

  2. get_paper - Get detailed information about a specific paper

    • Accepts flexible ID formats (1706.03762, arXiv:1706.03762, 1706.03762v1)

    • Optionally downloads PDF automatically

    • Returns complete metadata including DOI, journal references, comments

  3. download_paper - Download PDF for a specific paper

    • Downloads to local .arxiv directory

    • Returns file path and size information

    • Supports force re-download option

  4. list_downloaded_papers - List all locally downloaded PDFs

    • Shows arxiv IDs, file sizes, and paths

    • Useful for managing local paper collection

MCP Resources

The server exposes 2 resources for direct access:

  • paper://{arxiv_id} - Get formatted paper metadata in markdown

  • downloads://list - Get markdown table of all downloaded papers

MCP Prompts

Pre-built prompt templates to guide usage:

  • search_arxiv_prompt - Guide for searching arXiv papers

  • download_paper_prompt - Guide for downloading and managing papers

Claude Desktop Configuration

Add to your Claude Desktop config file (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

If installed from GitHub/pip:

```json
{
  "mcpServers": {
    "arxiv": {
      "command": "arxiv-mcp"
    }
  }
}
```

If running from source/development:

```json
{
  "mcpServers": {
    "arxiv": {
      "command": "uv",
      "args": ["run", "arxiv-mcp"],
      "cwd": "/path/to/arxiv_for_agents"
    }
  }
}
```

Or use --directory to avoid needing cwd:

```json
{
  "mcpServers": {
    "arxiv": {
      "command": "uv",
      "args": ["--directory", "/path/to/arxiv_for_agents", "run", "arxiv-mcp"]
    }
  }
}
```

MCP Use Cases

Once configured, you can ask Claude to:

  • "Search arXiv for recent papers on transformer architectures"

  • "Find papers by Geoffrey Hinton in the cs.AI category"

  • "Download the 'Attention is All You Need' paper"

  • "Show me papers about neural networks from 2023"

  • "List all the papers I've downloaded"

  • "Get the abstract for arXiv:1706.03762"

The MCP integration allows Claude to autonomously search, retrieve, and manage academic papers from arXiv.

Architecture

Module Structure

```
arxiv/
├── __init__.py       # Package exports
├── __main__.py       # CLI entry point
├── cli.py            # Click commands
├── models.py         # Pydantic models
├── services.py       # API client service
└── mcp/              # MCP server
    ├── __init__.py   # MCP package exports
    ├── __main__.py   # MCP server entry point
    └── server.py     # FastMCP server with tools, resources, prompts

tests/
└── test_services.py  # Integration tests (26 tests)
```

Pydantic Models

All API responses are typed using Pydantic:

```python
from arxiv import ArxivService

service = ArxivService()
result = service.search("ti:neural", max_results=5)

# result is typed as ArxivSearchResult
print(f"Total: {result.total_results}")

for entry in result.entries:
    # entry is typed as ArxivEntry
    print(f"{entry.arxiv_id}: {entry.title}")
    print(f"Authors: {', '.join(a.name for a in entry.authors)}")
```

Key Models

  • ArxivSearchResult: Search results with metadata

    • total_results: Total matching papers

    • entries: List of ArxivEntry objects

  • ArxivEntry: Individual paper

    • arxiv_id: Clean ID (e.g., "1706.03762")

    • title, summary: Paper metadata

    • authors: List of Author objects

    • categories: Subject categories

    • pdf_url: Direct PDF link

    • published, updated: Datetime objects

  • Author: Paper author

    • name: Author name

    • affiliation: Optional affiliation

Testing

Run all 26 integration tests (makes real API calls):

uv run pytest tests/test_services.py -v

Run specific test class:

uv run pytest tests/test_services.py::TestArxivServiceSearch -v

The tests are integration tests that hit the real arXiv API, ensuring the service works with actual data.

API Rate Limiting

The service enforces a 3-second delay between API requests by default (arXiv's recommendation). You can adjust this:

```python
from arxiv import ArxivService

service = ArxivService(rate_limit_delay=5.0)  # 5 seconds
```
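
The mechanism behind such a delay can be sketched as follows: remember when the last request went out and sleep off the remainder before the next one. This is an illustration of the general technique, not necessarily how `ArxivService` implements it internally:

```python
import time

# Minimal rate-limiter sketch: enforce a fixed delay between calls.
class RateLimiter:
    def __init__(self, delay: float = 3.0):
        self.delay = delay
        self._last: float | None = None

    def wait(self) -> None:
        if self._last is not None:
            remaining = self.delay - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

limiter = RateLimiter(delay=0.1)  # short delay for demonstration
start = time.monotonic()
limiter.wait()  # first call: no wait
limiter.wait()  # second call: sleeps off the remaining delay
elapsed = time.monotonic() - start
print(f"elapsed: {elapsed:.2f}s")
```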

Examples

Python API

```python
from arxiv import ArxivService

# Initialize service
service = ArxivService(download_dir="./papers")

# Search
results = service.search(
    query="ti:attention is all you need",
    max_results=5,
    sort_by="relevance",
)
print(f"Found {results.total_results} papers")
for entry in results.entries:
    print(f"- {entry.title}")

# Get specific paper
entry = service.get("1706.03762", download_pdf=True)
print(f"Downloaded: {entry.title}")

# Just download PDF
pdf_path = service.download_pdf("1706.03762")
print(f"PDF saved to: {pdf_path}")
```

CLI Examples

```shell
# Find recent papers in a category
arxiv search "cat:cs.AI" \
  --max-results 10 \
  --sort-by submittedDate \
  --sort-order descending

# Search and output as JSON for processing
arxiv search "ti:transformer" --json | jq '.entries[].title'

# Batch download multiple papers
for id in 1706.03762 1810.04805 2010.11929; do
  arxiv download $id
done
```

Development

The codebase follows these principles:

  1. Type safety: Pydantic models for all API responses

  2. Clean architecture: Separation of CLI, service, and models

  3. Real tests: Integration tests with actual API calls (no mocks)

  4. Rate limiting: Respects arXiv API guidelines

  5. Caching: Automatic local caching to avoid re-downloads

arXiv API Reference

  • Base URL: https://export.arxiv.org/api/query

  • Format: Atom XML

  • Rate limit: 3 seconds between requests (recommended)

  • Documentation: https://info.arxiv.org/help/api/user-manual.html
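
Since the API returns Atom XML, a raw response can be parsed with nothing but the standard library. The feed below is an illustrative fragment, not a captured API response:

```python
import xml.etree.ElementTree as ET

# Illustrative Atom fragment in the shape the arXiv API returns.
SAMPLE = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/1706.03762v1</id>
    <title>Attention Is All You Need</title>
    <author><name>Ashish Vaswani</name></author>
  </entry>
</feed>"""

NS = {"atom": "http://www.w3.org/2005/Atom"}
root = ET.fromstring(SAMPLE)
for item in root.findall("atom:entry", NS):
    title = item.findtext("atom:title", namespaces=NS)
    authors = [a.findtext("atom:name", namespaces=NS)
               for a in item.findall("atom:author", NS)]
    print(title, authors)
```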

License

This is a personal project for interacting with arXiv's public API.
