Provides tools for searching arXiv papers by title, author, abstract, and category, downloading PDFs with local caching, and managing a local collection of academic papers from the arXiv repository.
arXiv CLI & MCP Server
A Python toolkit for searching and downloading papers from arXiv.org, with both a command-line interface and a Model Context Protocol (MCP) server for LLM integration.
CLI agents work well with well-documented CLI tools, MCP servers, or both. This project provides both.
Features
Search arXiv papers by title, author, abstract, category, and more
Download PDFs automatically with local caching
MCP Server for integration with LLM assistants (Claude Desktop, etc.)
Typed responses using Pydantic models for clean data handling
Rate limiting built-in to respect arXiv API guidelines
Comprehensive tests with 26 integration tests (no mocking)
Installation
Option 1: Install from GitHub (Recommended)
Install directly from the GitHub repository:
# Install the latest version
uv pip install git+https://github.com/LiamConnell/arxiv_for_agents.git
# Or with pip
pip install git+https://github.com/LiamConnell/arxiv_for_agents.git
# Now you can use the arxiv command
arxiv --help
Option 2: Install from Source
Clone the repository and install locally:
# Clone the repository
git clone https://github.com/LiamConnell/arxiv_for_agents.git
cd arxiv_for_agents
# Install in editable mode
uv pip install -e .
# Now you can use the arxiv command
arxiv --help
Option 3: Development Installation
For development with all dependencies:
# Clone and install with dev dependencies
git clone https://github.com/LiamConnell/arxiv_for_agents.git
cd arxiv_for_agents
uv pip install -e ".[dev]"
# Run tests
uv run pytest
Verify Installation
# If installed as package
arxiv --help
# Or if using as module
uv run python -m arxiv --help
Usage
Note: If you installed as a package, use arxiv directly. Otherwise, use uv run python -m arxiv.
Search Papers
Search by title:
# Using installed package
arxiv search "ti:attention is all you need"
# Or using as module
uv run python -m arxiv search "ti:attention is all you need"
Search by author:
arxiv search "au:Hinton" --max-results 20
Search by category:
arxiv search "cat:cs.AI" --max-results 10
Combined search:
arxiv search "ti:transformer AND au:Vaswani"
Get Specific Paper
Get paper metadata and download PDF:
arxiv get 1706.03762
Get metadata only (no download):
arxiv get 1706.03762 --no-download
Force re-download:
arxiv get 1706.03762 --force
Download PDF
Download just the PDF:
arxiv download 1706.03762
List Downloaded PDFs
arxiv list-downloads
JSON Output
Get results as JSON for scripting:
arxiv search "ti:neural" --json
arxiv get 1706.03762 --json --no-download
Search Query Syntax
The arXiv API supports field-specific searches:
ti: - Title
au: - Author
abs: - Abstract
cat: - Category (e.g., cs.AI, cs.LG)
all: - All fields (default)
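As an illustration of how these prefixes compose with the boolean operators the arXiv API accepts (AND, OR, ANDNOT), here is a minimal sketch of a query builder. The helper names field and combine are hypothetical and are not part of this project; the resulting string is what you would pass to arxiv search.

```python
# Hypothetical helpers (not part of the arxiv package) that assemble
# a query string from field prefixes and one boolean operator.
ALLOWED_PREFIXES = {"ti", "au", "abs", "cat", "all"}
ALLOWED_OPERATORS = {"AND", "OR", "ANDNOT"}


def field(prefix: str, value: str) -> str:
    """Return one field-scoped term such as 'ti:transformer'."""
    if prefix not in ALLOWED_PREFIXES:
        raise ValueError(f"unknown field prefix: {prefix!r}")
    return f"{prefix}:{value}"


def combine(operator: str, *terms: str) -> str:
    """Join terms with a single boolean operator."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unknown operator: {operator!r}")
    return f" {operator} ".join(terms)


query = combine("AND", field("ti", "transformer"), field("au", "Vaswani"))
print(query)  # ti:transformer AND au:Vaswani
```

Validating the prefix and operator up front catches typos before a request is sent, which matters when every request is subject to the rate limit.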
You can combine searches with AND, OR, and ANDNOT:
arxiv search "ti:neural AND cat:cs.LG"
arxiv search "au:Hinton OR au:Bengio"
Download Directory
PDFs are downloaded to ./.arxiv by default. Change this with:
arxiv --download-dir ./papers search "ti:transformer"
MCP Server (Model Context Protocol)
The arXiv CLI includes a Model Context Protocol (MCP) server that allows LLM assistants (like Claude Desktop) to search and download arXiv papers programmatically.
Running the MCP Server
# Option 1: Using the script entry point (recommended)
uv run arxiv-mcp
# Option 2: Using the module
uv run python -m arxiv.mcp
The server runs in stdio mode and communicates via JSON-RPC over stdin/stdout.
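To make the stdio transport more concrete, the following sketch builds the kind of JSON-RPC 2.0 message an MCP client writes to the server's stdin when it calls a tool. The envelope (jsonrpc, id, method tools/call, params with name and arguments) follows the MCP specification; the argument names query and max_results are assumptions borrowed from the Python API and may not match the tool's actual schema. A real client also performs an initialize handshake first, which is omitted here.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a single newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport is newline-delimited, so each message must stay on one line.
    return json.dumps(message) + "\n"


# Argument names below are assumed for illustration only.
line = make_tool_call(1, "search_papers", {"query": "cat:cs.AI", "max_results": 5})
decoded = json.loads(line)
print(decoded["method"], decoded["params"]["name"])
```

In practice an MCP client library handles this framing, so the sketch is mainly useful for debugging what travels over the pipe.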
MCP Tools
The server provides 4 tools for paper discovery and management:
search_papers - Search arXiv with advanced query syntax
Supports field prefixes (ti:, au:, abs:, cat:)
Boolean operators (AND, OR, ANDNOT)
Pagination and sorting options
Returns paper metadata including title, authors, abstract, categories
get_paper - Get detailed information about a specific paper
Accepts flexible ID formats (1706.03762, arXiv:1706.03762, 1706.03762v1)
Optionally downloads PDF automatically
Returns complete metadata including DOI, journal references, comments
download_paper - Download PDF for a specific paper
Downloads to local .arxiv directory
Returns file path and size information
Supports force re-download option
list_downloaded_papers - List all locally downloaded PDFs
Shows arxiv IDs, file sizes, and paths
Useful for managing local paper collection
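The flexible ID formats accepted by get_paper suggest some normalization step. The sketch below shows one way that could work; it is an assumption, not the project's actual implementation, and the function name normalize_arxiv_id is hypothetical. It only covers new-style IDs (YYMM.NNNNN) and does not handle old-style IDs such as hep-th/9901001.

```python
import re

# Optional "arXiv:" prefix, the core identifier, and an optional version suffix.
_NEW_STYLE_ID = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(?:v\d+)?$", re.IGNORECASE)


def normalize_arxiv_id(raw: str) -> str:
    """Strip the 'arXiv:' prefix and version suffix from a new-style arXiv ID."""
    match = _NEW_STYLE_ID.match(raw.strip())
    if match is None:
        raise ValueError(f"not a recognised arXiv ID: {raw!r}")
    return match.group(1)


for raw in ("1706.03762", "arXiv:1706.03762", "1706.03762v1"):
    print(raw, "->", normalize_arxiv_id(raw))
```

All three inputs reduce to 1706.03762, which matches the "clean ID" form described for ArxivEntry.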
MCP Resources
The server exposes 2 resources for direct access:
paper://{arxiv_id} - Get formatted paper metadata in markdown
downloads://list - Get markdown table of all downloaded papers
MCP Prompts
Pre-built prompt templates to guide usage:
search_arxiv_prompt - Guide for searching arXiv papers
download_paper_prompt - Guide for downloading and managing papers
Claude Desktop Configuration
Add to your Claude Desktop config file (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
If installed from GitHub/pip:
{
  "mcpServers": {
    "arxiv": {
      "command": "arxiv-mcp"
    }
  }
}
If running from source/development:
{
  "mcpServers": {
    "arxiv": {
      "command": "uv",
      "args": ["run", "arxiv-mcp"],
      "cwd": "/path/to/arxiv_for_agents"
    }
  }
}
Or use --directory to avoid needing cwd:
{
  "mcpServers": {
    "arxiv": {
      "command": "uv",
      "args": ["--directory", "/path/to/arxiv_for_agents", "run", "arxiv-mcp"]
    }
  }
}
MCP Use Cases
Once configured, you can ask Claude to:
"Search arXiv for recent papers on transformer architectures"
"Find papers by Geoffrey Hinton in the cs.AI category"
"Download the 'Attention is All You Need' paper"
"Show me papers about neural networks from 2023"
"List all the papers I've downloaded"
"Get the abstract for arXiv:1706.03762"
The MCP integration allows Claude to autonomously search, retrieve, and manage academic papers from arXiv.
Architecture
Module Structure
arxiv/
├── __init__.py          # Package exports
├── __main__.py          # CLI entry point
├── cli.py               # Click commands
├── models.py            # Pydantic models
├── services.py          # API client service
└── mcp/                 # MCP server
    ├── __init__.py      # MCP package exports
    ├── __main__.py      # MCP server entry point
    └── server.py        # FastMCP server with tools, resources, prompts
tests/
└── test_services.py     # Integration tests (26 tests)
Pydantic Models
All API responses are typed using Pydantic:
from arxiv import ArxivService
service = ArxivService()
result = service.search("ti:neural", max_results=5)
# result is typed as ArxivSearchResult
print(f"Total: {result.total_results}")
for entry in result.entries:
    # entry is typed as ArxivEntry
    print(f"{entry.arxiv_id}: {entry.title}")
    print(f"Authors: {', '.join(a.name for a in entry.authors)}")
Key Models
ArxivSearchResult: Search results with metadata
total_results: Total matching papers
entries: List of ArxivEntry objects
ArxivEntry: Individual paper
arxiv_id: Clean ID (e.g., "1706.03762")
title, summary: Paper metadata
authors: List of Author objects
categories: Subject categories
pdf_url: Direct PDF link
published, updated: Datetime objects
Author: Paper author
name: Author name
affiliation: Optional affiliation
Testing
Run all 26 integration tests (makes real API calls):
uv run pytest tests/test_services.py -v
Run specific test class:
uv run pytest tests/test_services.py::TestArxivServiceSearch -v
The tests are integration tests that hit the real arXiv API, ensuring the service works with actual data.
API Rate Limiting
The service enforces a 3-second delay between API requests by default (arXiv's recommendation). You can adjust this:
from arxiv import ArxivService
service = ArxivService(rate_limit_delay=5.0)  # 5 seconds
Examples
Python API
from arxiv import ArxivService
# Initialize service
service = ArxivService(download_dir="./papers")
# Search
results = service.search(
    query="ti:attention is all you need",
    max_results=5,
    sort_by="relevance"
)
print(f"Found {results.total_results} papers")
for entry in results.entries:
    print(f"- {entry.title}")
# Get specific paper
entry = service.get("1706.03762", download_pdf=True)
print(f"Downloaded: {entry.title}")
# Just download PDF
pdf_path = service.download_pdf("1706.03762")
print(f"PDF saved to: {pdf_path}")
CLI Examples
# Find recent papers in a category
arxiv search "cat:cs.AI" \
    --max-results 10 \
    --sort-by submittedDate \
    --sort-order descending
# Search and output as JSON for processing
arxiv search "ti:transformer" --json | jq '.entries[].title'
# Batch download multiple papers
for id in 1706.03762 1810.04805 2010.11929; do
    arxiv download "$id"
done
Development
The codebase follows these principles:
Type safety: Pydantic models for all API responses
Clean architecture: Separation of CLI, service, and models
Real tests: Integration tests with actual API calls (no mocks)
Rate limiting: Respects arXiv API guidelines
Caching: Automatic local caching to avoid re-downloads
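To illustrate the rate-limiting principle, here is a minimal sketch of a delay-based limiter of the kind a 3-second default implies. It is not taken from the project's source: the RateLimiter class and its injectable clock and sleep parameters are hypothetical, chosen so the behaviour can be demonstrated without actually waiting.

```python
import time


class RateLimiter:
    """Enforce a minimum delay between consecutive calls (hypothetical helper)."""

    def __init__(self, delay: float = 3.0, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous call, if any

    def wait(self) -> float:
        """Sleep until `delay` seconds have passed since the last call; return seconds slept."""
        slept = 0.0
        if self._last_call is not None:
            remaining = self.delay - (self._clock() - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last_call = self._clock()
        return slept


# Demonstration with a fake clock so the example runs instantly.
fake_now = [0.0]


def fake_sleep(seconds: float) -> None:
    fake_now[0] += seconds


limiter = RateLimiter(delay=3.0, clock=lambda: fake_now[0], sleep=fake_sleep)
first = limiter.wait()   # first call never waits
fake_now[0] += 1.0       # pretend one second of other work happened
second = limiter.wait()  # waits out the remaining two seconds
print(first, second)  # 0.0 2.0
```

Measuring from the previous call rather than sleeping a fixed amount means time already spent on other work counts toward the delay.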
arXiv API Reference
Base URL: https://export.arxiv.org/api/query
Format: Atom XML
Rate limit: 3 seconds between requests (recommended)
Documentation: https://info.arxiv.org/help/api/user-manual.html
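For readers curious what talking to this endpoint looks like without the toolkit, the sketch below builds a query URL and extracts titles from an Atom feed using only the standard library. The search_query, start, and max_results parameters are from the arXiv API user manual; the function names are hypothetical, and the sample feed is a hand-written, heavily trimmed stand-in rather than a real response. No network request is made.

```python
import urllib.parse
import xml.etree.ElementTree as ET

BASE_URL = "https://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace


def build_query_url(search_query: str, start: int = 0, max_results: int = 10) -> str:
    """Return the full query URL with URL-encoded parameters."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def parse_titles(atom_xml: str) -> list[str]:
    """Extract the title of every <entry> element in an Atom feed."""
    root = ET.fromstring(atom_xml)
    return [
        entry.findtext(f"{ATOM}title", default="").strip()
        for entry in root.iter(f"{ATOM}entry")
    ]


# Hand-written sample for illustration; real responses carry many more fields.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/1706.03762v1</id>
    <title>Attention Is All You Need</title>
  </entry>
</feed>"""

url = build_query_url("cat:cs.AI", max_results=5)
titles = parse_titles(SAMPLE_FEED)
print(url)
print(titles)
```

A real client would fetch the URL, wait at least the recommended 3 seconds between requests, and pass the response body to the parser.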
License
This is a personal project for interacting with arXiv's public API.