arXiv CLI & MCP Server
A Python toolkit for searching and downloading papers from arXiv.org, with both a command-line interface and a Model Context Protocol (MCP) server for LLM integration.
CLI agents work well with well-documented CLI tools and MCP servers. This project provides both.
Features
Search arXiv papers by title, author, abstract, category, and more
Download PDFs automatically with local caching
MCP Server for integration with LLM assistants (Claude Desktop, etc.)
Typed responses using Pydantic models for clean data handling
Rate limiting built-in to respect arXiv API guidelines
Comprehensive tests with 26 integration tests (no mocking)
Installation
Option 1: Install from GitHub (Recommended)
Install directly from the GitHub repository:
Option 2: Install from Source
Clone the repository and install locally:
Option 3: Development Installation
For development with all dependencies:
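The original install commands were not preserved here; they were presumably along these lines (the repository URL and the extras name are placeholders, not taken from the project):

```shell
# Option 1: directly from GitHub (replace with the real repository URL)
pip install git+https://github.com/<your-username>/<repo>.git

# Option 2: from a local clone
git clone https://github.com/<your-username>/<repo>.git
cd <repo>
pip install .

# Option 3: editable install with development dependencies
pip install -e ".[dev]"
```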
Verify Installation
Usage
Note: If you installed as a package, use `arxiv` directly. Otherwise, use `uv run python -m arxiv`.
Search Papers
Search by title:
Search by author:
Search by category:
Combined search:
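The original command examples were stripped during extraction; invocations presumably looked something like this (the `arxiv search` subcommand name is hypothetical, inferred from the usage note above):

```shell
# Search by title
arxiv search "ti:attention is all you need"

# Search by author
arxiv search "au:hinton"

# Search by category
arxiv search "cat:cs.AI"

# Combined search
arxiv search "au:hinton AND cat:cs.AI"
```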
Get Specific Paper
Get paper metadata and download PDF:
Get metadata only (no download):
Force re-download:
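The commands for this section are also missing; a hedged sketch (subcommand and flag names are hypothetical):

```shell
# Get paper metadata and download the PDF
arxiv get 1706.03762

# Metadata only, no download (hypothetical flag)
arxiv get 1706.03762 --no-download

# Force re-download (hypothetical flag)
arxiv get 1706.03762 --force
```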
Download PDF
Download just the PDF:
List Downloaded PDFs
JSON Output
Get results as JSON for scripting:
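Again, the original examples were lost; they presumably resembled the following (subcommand and flag names are hypothetical):

```shell
# Download just the PDF
arxiv download 1706.03762

# List downloaded PDFs
arxiv list

# JSON output for scripting (hypothetical flag)
arxiv search "ti:transformers" --json
```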
Search Query Syntax
The arXiv API supports field-specific searches:
`ti:` - Title
`au:` - Author
`abs:` - Abstract
`cat:` - Category (e.g., cs.AI, cs.LG)
`all:` - All fields (default)
You can combine searches with `AND`, `OR`, and `ANDNOT`:
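A few example queries in this syntax:

```
ti:"attention is all you need"
au:hinton AND cat:cs.AI
all:transformers ANDNOT cat:cs.CV
```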
Download Directory
PDFs are downloaded to `./.arxiv` by default. Change this with:
MCP Server (Model Context Protocol)
The arXiv CLI includes a Model Context Protocol (MCP) server that allows LLM assistants (like Claude Desktop) to search and download arXiv papers programmatically.
Running the MCP Server
The server runs in stdio mode and communicates via JSON-RPC over stdin/stdout.
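Over stdio, each request is a JSON-RPC 2.0 message. A tool invocation, for instance, looks roughly like this (`tools/call` is the standard MCP method; the argument names here are illustrative, not confirmed from the project):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_papers",
    "arguments": { "query": "ti:transformers", "max_results": 5 }
  }
}
```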
MCP Tools
The server provides 4 tools for paper discovery and management:
search_papers - Search arXiv with advanced query syntax
Supports field prefixes (ti:, au:, abs:, cat:)
Boolean operators (AND, OR, ANDNOT)
Pagination and sorting options
Returns paper metadata including title, authors, abstract, categories
get_paper - Get detailed information about a specific paper
Accepts flexible ID formats (1706.03762, arXiv:1706.03762, 1706.03762v1)
Optionally downloads PDF automatically
Returns complete metadata including DOI, journal references, comments
download_paper - Download PDF for a specific paper
Downloads to the local `.arxiv` directory
Returns file path and size information
Supports force re-download option
list_downloaded_papers - List all locally downloaded PDFs
Shows arxiv IDs, file sizes, and paths
Useful for managing local paper collection
MCP Resources
The server exposes 2 resources for direct access:
paper://{arxiv_id} - Get formatted paper metadata in markdown
downloads://list - Get markdown table of all downloaded papers
MCP Prompts
Pre-built prompt templates to guide usage:
search_arxiv_prompt - Guide for searching arXiv papers
download_paper_prompt - Guide for downloading and managing papers
Claude Desktop Configuration
Add to your Claude Desktop config file (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):
If installed from GitHub/pip:
If running from source/development:
Or use `--directory` to avoid needing `cwd`:
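The original config snippets were not preserved; a typical Claude Desktop entry follows this shape (the server name, command, module path, and directory are placeholders to adjust for your installation):

```json
{
  "mcpServers": {
    "arxiv": {
      "command": "uv",
      "args": ["--directory", "/path/to/arxiv-cli", "run", "python", "-m", "arxiv.mcp_server"]
    }
  }
}
```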
MCP Use Cases
Once configured, you can ask Claude to:
"Search arXiv for recent papers on transformer architectures"
"Find papers by Geoffrey Hinton in the cs.AI category"
"Download the 'Attention is All You Need' paper"
"Show me papers about neural networks from 2023"
"List all the papers I've downloaded"
"Get the abstract for arXiv:1706.03762"
The MCP integration allows Claude to autonomously search, retrieve, and manage academic papers from arXiv.
Architecture
Module Structure
Pydantic Models
All API responses are typed using Pydantic:
Key Models
ArxivSearchResult: Search results with metadata
`total_results`: Total matching papers
`entries`: List of `ArxivEntry` objects
ArxivEntry: Individual paper
`arxiv_id`: Clean ID (e.g., "1706.03762")
`title`, `summary`: Paper metadata
`authors`: List of `Author` objects
`categories`: Subject categories
`pdf_url`: Direct PDF link
`published`, `updated`: Datetime objects
Author: Paper author
`name`: Author name
`affiliation`: Optional affiliation
Testing
Run all 26 integration tests (makes real API calls):
Run specific test class:
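Assuming a standard pytest layout (the test file and class names here are hypothetical), the commands are likely of this form:

```shell
# All 26 integration tests (real API calls; allow time for rate limiting)
pytest tests/ -v

# A specific test class
pytest tests/test_arxiv_service.py::TestSearch -v
```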
The tests are integration tests that hit the real arXiv API, ensuring the service works with actual data.
API Rate Limiting
The service enforces a 3-second delay between API requests by default (arXiv's recommendation). You can adjust this:
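A client-side rate limiter of this kind can be sketched as below. This is illustrative only; the project's actual class and parameter names for adjusting the delay are not shown in this README:

```python
import time

class RateLimiter:
    """Enforces a minimum delay between successive API requests."""

    def __init__(self, min_interval: float = 3.0):
        self.min_interval = min_interval
        self._last_request = 0.0

    def wait(self) -> None:
        """Sleep just long enough to honor the minimum interval."""
        elapsed = time.monotonic() - self._last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last_request = time.monotonic()

# Example: space three "requests" at least 50 ms apart
limiter = RateLimiter(min_interval=0.05)
stamps = []
for _ in range(3):
    limiter.wait()
    stamps.append(time.monotonic())
```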
Examples
Python API
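The Python API examples were stripped during extraction, and the project's actual class and method names are unknown, so here is a hypothetical sketch of a minimal client built directly on the arXiv query endpoint with the standard library (`build_query_url` and `search` are illustrative names, not the project's real API):

```python
from urllib.parse import urlencode
from urllib.request import urlopen

ARXIV_API = "https://export.arxiv.org/api/query"

def build_query_url(query: str, start: int = 0, max_results: int = 10) -> str:
    """Compose an arXiv API query URL from a search expression."""
    params = {"search_query": query, "start": start, "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params)}"

def search(query: str, max_results: int = 10) -> bytes:
    """Fetch raw Atom XML for a search (requires network access)."""
    with urlopen(build_query_url(query, max_results=max_results), timeout=30) as resp:
        return resp.read()

# URL construction works offline:
url = build_query_url("ti:transformers AND cat:cs.LG", max_results=5)
```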
CLI Examples
Development
The codebase follows these principles:
Type safety: Pydantic models for all API responses
Clean architecture: Separation of CLI, service, and models
Real tests: Integration tests with actual API calls (no mocks)
Rate limiting: Respects arXiv API guidelines
Caching: Automatic local caching to avoid re-downloads
arXiv API Reference
Base URL: https://export.arxiv.org/api/query
Format: Atom XML
Rate limit: 3 seconds between requests (recommended)
Documentation: https://info.arxiv.org/help/api/user-manual.html
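Since responses come back as Atom XML, parsing with the standard library looks roughly like this (the sample feed is abbreviated for illustration, not a real API response):

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace prefix for ElementTree

SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/1706.03762v1</id>
    <title>Attention Is All You Need</title>
    <author><name>Ashish Vaswani</name></author>
  </entry>
</feed>"""

def parse_entries(xml_text: str):
    """Extract (arxiv_id, title, authors) tuples from an Atom feed."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall(f"{ATOM}entry"):
        # The <id> element holds a URL; the arXiv ID is its last path segment
        arxiv_id = entry.find(f"{ATOM}id").text.rsplit("/", 1)[-1]
        title = entry.find(f"{ATOM}title").text.strip()
        authors = [a.find(f"{ATOM}name").text
                   for a in entry.findall(f"{ATOM}author")]
        papers.append((arxiv_id, title, authors))
    return papers

papers = parse_entries(SAMPLE_FEED)
```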
License
This is a personal project for interacting with arXiv's public API.