arXiv CLI & MCP Server

A Python toolkit for searching and downloading papers from arXiv.org, with both a command-line interface and a Model Context Protocol (MCP) server for LLM integration.

CLI agents work well with well-documented CLI tools and/or MCP servers. This project provides both options.

Features

Search arXiv papers by title, author, abstract, category, and more
Download PDFs automatically with local caching
MCP Server for integration with LLM assistants (Claude Desktop, etc.)
Typed responses using Pydantic models for clean data handling
Rate limiting built-in to respect arXiv API guidelines
Comprehensive tests with 26 integration tests (no mocking)

Installation

Option 1: Install from GitHub (Recommended)

Install directly from the GitHub repository:

# Install the latest version uv pip install git+https://github.com/LiamConnell/arxiv_for_agents.git # Or with pip pip install git+https://github.com/LiamConnell/arxiv_for_agents.git # Now you can use the arxiv command arxiv --help

Option 2: Install from Source

Clone the repository and install locally:

# Clone the repository git clone https://github.com/LiamConnell/arxiv_for_agents.git cd arxiv_for_agents # Install in editable mode uv pip install -e . # Now you can use the arxiv command arxiv --help

Option 3: Development Installation

For development with all dependencies:

# Clone and install with dev dependencies git clone https://github.com/LiamConnell/arxiv_for_agents.git cd arxiv_for_agents uv pip install -e ".[dev]" # Run tests uv run pytest

Verify Installation

# If installed as package arxiv --help # Or if using as module uv run python -m arxiv --help

Usage

Note: If you installed as a package, use arxiv directly. Otherwise, use uv run python -m arxiv.

Search Papers

Search by title:

# Using installed package arxiv search "ti:attention is all you need" # Or using as module uv run python -m arxiv search "ti:attention is all you need"

Search by author:

arxiv search "au:Hinton" --max-results 20

Search by category:

arxiv search "cat:cs.AI" --max-results 10

Combined search:

arxiv search "ti:transformer AND au:Vaswani"

Get Specific Paper

Get paper metadata and download PDF:

arxiv get 1706.03762

Get metadata only (no download):

arxiv get 1706.03762 --no-download

Force re-download:

arxiv get 1706.03762 --force

Download PDF

Download just the PDF:

arxiv download 1706.03762

List Downloaded PDFs

arxiv list-downloads

JSON Output

Get results as JSON for scripting:

arxiv search "ti:neural" --json arxiv get 1706.03762 --json --no-download

Search Query Syntax

The arXiv API supports field-specific searches:

ti: - Title
au: - Author
abs: - Abstract
cat: - Category (e.g., cs.AI, cs.LG)
all: - All fields (default)

You can combine searches with AND, OR, and ANDNOT:

arxiv search "ti:neural AND cat:cs.LG" arxiv search "au:Hinton OR au:Bengio"

Download Directory

PDFs are downloaded to ./.arxiv by default. Change this with:

arxiv --download-dir ./papers search "ti:transformer"

MCP Server (Model Context Protocol)

The arXiv CLI includes a Model Context Protocol (MCP) server that allows LLM assistants (like Claude Desktop) to search and download arXiv papers programmatically.

Running the MCP Server

# Option 1: Using the script entry point (recommended) uv run arxiv-mcp # Option 2: Using the module uv run python -m arxiv.mcp

The server runs in stdio mode and communicates via JSON-RPC over stdin/stdout.

MCP Tools

The server provides 4 tools for paper discovery and management:

search_papers - Search arXiv with advanced query syntax
- Supports field prefixes (ti:, au:, abs:, cat:)
- Boolean operators (AND, OR, ANDNOT)
- Pagination and sorting options
- Returns paper metadata including title, authors, abstract, categories
get_paper - Get detailed information about a specific paper
- Accepts flexible ID formats (1706.03762, arXiv:1706.03762, 1706.03762v1)
- Optionally downloads PDF automatically
- Returns complete metadata including DOI, journal references, comments
download_paper - Download PDF for a specific paper
- Downloads to local .arxiv directory
- Returns file path and size information
- Supports force re-download option
list_downloaded_papers - List all locally downloaded PDFs
- Shows arxiv IDs, file sizes, and paths
- Useful for managing local paper collection

MCP Resources

The server exposes 2 resources for direct access:

paper://{arxiv_id} - Get formatted paper metadata in markdown
downloads://list - Get markdown table of all downloaded papers

MCP Prompts

Pre-built prompt templates to guide usage:

search_arxiv_prompt - Guide for searching arXiv papers
download_paper_prompt - Guide for downloading and managing papers

Claude Desktop Configuration

Add to your Claude Desktop config file (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

If installed from GitHub/pip:

{ "mcpServers": { "arxiv": { "command": "arxiv-mcp" } } }

If running from source/development:

{ "mcpServers": { "arxiv": { "command": "uv", "args": ["run", "arxiv-mcp"], "cwd": "/path/to/arxiv_for_agents" } } }

Or use --directory to avoid needing cwd:

{ "mcpServers": { "arxiv": { "command": "uv", "args": ["--directory", "/path/to/arxiv_for_agents", "run", "arxiv-mcp"] } } }

MCP Use Cases

Once configured, you can ask Claude to:

"Search arXiv for recent papers on transformer architectures"
"Find papers by Geoffrey Hinton in the cs.AI category"
"Download the 'Attention is All You Need' paper"
"Show me papers about neural networks from 2023"
"List all the papers I've downloaded"
"Get the abstract for arXiv:1706.03762"

The MCP integration allows Claude to autonomously search, retrieve, and manage academic papers from arXiv.

Architecture

Module Structure

arxiv/ ├── __init__.py # Package exports ├── __main__.py # CLI entry point ├── cli.py # Click commands ├── models.py # Pydantic models ├── services.py # API client service └── mcp/ # MCP server ├── __init__.py # MCP package exports ├── __main__.py # MCP server entry point └── server.py # FastMCP server with tools, resources, prompts tests/ └── test_services.py # Integration tests (26 tests)

Pydantic Models

All API responses are typed using Pydantic:

from arxiv import ArxivService service = ArxivService() result = service.search("ti:neural", max_results=5) # result is typed as ArxivSearchResult print(f"Total: {result.total_results}") for entry in result.entries: # entry is typed as ArxivEntry print(f"{entry.arxiv_id}: {entry.title}") print(f"Authors: {', '.join(a.name for a in entry.authors)}")

Key Models

ArxivSearchResult: Search results with metadata
- total_results: Total matching papers
- entries: List of ArxivEntry objects
ArxivEntry: Individual paper
- arxiv_id: Clean ID (e.g., "1706.03762")
- title, summary: Paper metadata
- authors: List of Author objects
- categories: Subject categories
- pdf_url: Direct PDF link
- published, updated: Datetime objects
Author: Paper author
- name: Author name
- affiliation: Optional affiliation

Testing

Run all 26 integration tests (makes real API calls):

uv run pytest tests/test_services.py -v

Run specific test class:

uv run pytest tests/test_services.py::TestArxivServiceSearch -v

The tests are integration tests that hit the real arXiv API, ensuring the service works with actual data.

API Rate Limiting

The service enforces a 3-second delay between API requests by default (arXiv's recommendation). You can adjust this:

from arxiv import ArxivService service = ArxivService(rate_limit_delay=5.0) # 5 seconds

Examples

Python API

from arxiv import ArxivService # Initialize service service = ArxivService(download_dir="./papers") # Search results = service.search( query="ti:attention is all you need", max_results=5, sort_by="relevance" ) print(f"Found {results.total_results} papers") for entry in results.entries: print(f"- {entry.title}") # Get specific paper entry = service.get("1706.03762", download_pdf=True) print(f"Downloaded: {entry.title}") # Just download PDF pdf_path = service.download_pdf("1706.03762") print(f"PDF saved to: {pdf_path}")

CLI Examples

# Find recent papers in a category arxiv search "cat:cs.AI" \ --max-results 10 \ --sort-by submittedDate \ --sort-order descending # Search and output as JSON for processing arxiv search "ti:transformer" --json | jq '.entries[].title' # Batch download multiple papers for id in 1706.03762 1810.04805 2010.11929; do arxiv download $id done

Development

The codebase follows these principles:

Type safety: Pydantic models for all API responses
Clean architecture: Separation of CLI, service, and models
Real tests: Integration tests with actual API calls (no mocks)
Rate limiting: Respects arXiv API guidelines
Caching: Automatic local caching to avoid re-downloads

arXiv API Reference

Base URL: https://export.arxiv.org/api/query
Format: Atom XML
Rate limit: 3 seconds between requests (recommended)
Documentation: https://info.arxiv.org/help/api/user-manual.html

License

This is a personal project for interacting with arXiv's public API.

arXiv MCP Server