# arXiv CLI & MCP Server

A Python toolkit for searching and downloading papers from arXiv.org, with both a command-line interface and a Model Context Protocol (MCP) server for LLM integration.

CLI agents work well with well-documented CLI tools and/or MCP servers. This project provides both options.

## Features

- **Search** arXiv papers by title, author, abstract, category, and more
- **Download** PDFs automatically with local caching
- **MCP Server** for integration with LLM assistants (Claude Desktop, etc.)
- **Typed responses** using Pydantic models for clean data handling
- **Rate limiting** built in to respect arXiv API guidelines
- **Comprehensive tests** with 26 integration tests (no mocking)

## Installation

### Option 1: Install from GitHub (Recommended)

Install directly from the GitHub repository:

```bash
# Install the latest version
uv pip install git+https://github.com/LiamConnell/arxiv_for_agents.git

# Or with pip
pip install git+https://github.com/LiamConnell/arxiv_for_agents.git

# Now you can use the arxiv command
arxiv --help
```

### Option 2: Install from Source

Clone the repository and install locally:

```bash
# Clone the repository
git clone https://github.com/LiamConnell/arxiv_for_agents.git
cd arxiv_for_agents

# Install in editable mode
uv pip install -e .

# Now you can use the arxiv command
arxiv --help
```

### Option 3: Development Installation

For development with all dependencies:

```bash
# Clone and install with dev dependencies
git clone https://github.com/LiamConnell/arxiv_for_agents.git
cd arxiv_for_agents
uv pip install -e ".[dev]"

# Run tests
uv run pytest
```

### Verify Installation

```bash
# If installed as a package
arxiv --help

# Or if using as a module
uv run python -m arxiv --help
```

## Usage

**Note:** If you installed as a package, use `arxiv` directly. Otherwise, use `uv run python -m arxiv`.

### Search Papers

Search by title:

```bash
# Using the installed package
arxiv search "ti:attention is all you need"

# Or using as a module
uv run python -m arxiv search "ti:attention is all you need"
```

Search by author:

```bash
arxiv search "au:Hinton" --max-results 20
```

Search by category:

```bash
arxiv search "cat:cs.AI" --max-results 10
```

Combined search:

```bash
arxiv search "ti:transformer AND au:Vaswani"
```

### Get a Specific Paper

Get paper metadata and download the PDF:

```bash
arxiv get 1706.03762
```

Get metadata only (no download):

```bash
arxiv get 1706.03762 --no-download
```

Force re-download:

```bash
arxiv get 1706.03762 --force
```

### Download PDF

Download just the PDF:

```bash
arxiv download 1706.03762
```

### List Downloaded PDFs

```bash
arxiv list-downloads
```

### JSON Output

Get results as JSON for scripting:

```bash
arxiv search "ti:neural" --json
arxiv get 1706.03762 --json --no-download
```

## Search Query Syntax

The arXiv API supports field-specific searches:

- `ti:` - Title
- `au:` - Author
- `abs:` - Abstract
- `cat:` - Category (e.g., cs.AI, cs.LG)
- `all:` - All fields (default)

You can combine searches with `AND`, `OR`, and `ANDNOT`:

```bash
arxiv search "ti:neural AND cat:cs.LG"
arxiv search "au:Hinton OR au:Bengio"
```

## Download Directory

PDFs are downloaded to `./.arxiv` by default. Change this with:

```bash
arxiv --download-dir ./papers search "ti:transformer"
```
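The `--json` output also composes with `download` for simple batch workflows. A minimal sketch, assuming `jq` and `xargs` are available; the `.entries[].arxiv_id` path mirrors the `ArxivEntry` fields documented under Key Models below:

```bash
# Grab the IDs of a few recent cs.LG papers and download each PDF
arxiv search "cat:cs.LG" --max-results 5 --json \
  | jq -r '.entries[].arxiv_id' \
  | xargs -n1 arxiv download
```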
## MCP Server (Model Context Protocol)

The arXiv CLI includes a Model Context Protocol (MCP) server that allows LLM assistants (like Claude Desktop) to search and download arXiv papers programmatically.

### Running the MCP Server

```bash
# Option 1: Using the script entry point (recommended)
uv run arxiv-mcp

# Option 2: Using the module
uv run python -m arxiv.mcp
```

The server runs in stdio mode and communicates via JSON-RPC over stdin/stdout.

### MCP Tools

The server provides 4 tools for paper discovery and management:

1. **search_papers** - Search arXiv with advanced query syntax
   - Supports field prefixes (ti:, au:, abs:, cat:)
   - Boolean operators (AND, OR, ANDNOT)
   - Pagination and sorting options
   - Returns paper metadata including title, authors, abstract, categories
2. **get_paper** - Get detailed information about a specific paper
   - Accepts flexible ID formats (1706.03762, arXiv:1706.03762, 1706.03762v1)
   - Optionally downloads the PDF automatically
   - Returns complete metadata including DOI, journal references, comments
3. **download_paper** - Download the PDF for a specific paper
   - Downloads to the local `.arxiv` directory
   - Returns file path and size information
   - Supports force re-download
4. **list_downloaded_papers** - List all locally downloaded PDFs
   - Shows arXiv IDs, file sizes, and paths
   - Useful for managing the local paper collection

### MCP Resources

The server exposes 2 resources for direct access:

- **paper://{arxiv_id}** - Get formatted paper metadata in markdown
- **downloads://list** - Get a markdown table of all downloaded papers

### MCP Prompts

Pre-built prompt templates to guide usage:

- **search_arxiv_prompt** - Guide for searching arXiv papers
- **download_paper_prompt** - Guide for downloading and managing papers

### Claude Desktop Configuration

Add to your Claude Desktop config file (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):

**If installed from GitHub/pip:**

```json
{
  "mcpServers": {
    "arxiv": {
      "command": "arxiv-mcp"
    }
  }
}
```

**If running from source/development:**

```json
{
  "mcpServers": {
    "arxiv": {
      "command": "uv",
      "args": ["run", "arxiv-mcp"],
      "cwd": "/path/to/arxiv_for_agents"
    }
  }
}
```

Or use `--directory` to avoid needing `cwd`:

```json
{
  "mcpServers": {
    "arxiv": {
      "command": "uv",
      "args": ["--directory", "/path/to/arxiv_for_agents", "run", "arxiv-mcp"]
    }
  }
}
```

### MCP Use Cases

Once configured, you can ask Claude to:

- "Search arXiv for recent papers on transformer architectures"
- "Find papers by Geoffrey Hinton in the cs.AI category"
- "Download the 'Attention is All You Need' paper"
- "Show me papers about neural networks from 2023"
- "List all the papers I've downloaded"
- "Get the abstract for arXiv:1706.03762"

The MCP integration allows Claude to autonomously search, retrieve, and manage academic papers from arXiv.
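Outside of Claude Desktop, you can also exercise the server directly from Python, which is handy for debugging. A minimal sketch, assuming the official `mcp` client SDK is installed (it is not a dependency of this project); the `query` and `max_results` argument names for `search_papers` are assumptions that mirror the CLI flags, not a confirmed tool schema:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Spawn the arxiv-mcp server as a subprocess speaking JSON-RPC over stdio
    server = StdioServerParameters(command="arxiv-mcp")

    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # List the tools the server advertises (search_papers, get_paper, ...)
            tools = await session.list_tools()
            print("Tools:", [tool.name for tool in tools.tools])

            # Call search_papers; argument names are assumed, check the tool schema
            result = await session.call_tool(
                "search_papers",
                {"query": "ti:attention is all you need", "max_results": 3},
            )
            for block in result.content:
                print(getattr(block, "text", block))


asyncio.run(main())
```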
## Architecture

### Module Structure

```
arxiv/
├── __init__.py          # Package exports
├── __main__.py          # CLI entry point
├── cli.py               # Click commands
├── models.py            # Pydantic models
├── services.py          # API client service
└── mcp/                 # MCP server
    ├── __init__.py      # MCP package exports
    ├── __main__.py      # MCP server entry point
    └── server.py        # FastMCP server with tools, resources, prompts

tests/
└── test_services.py     # Integration tests (26 tests)
```

### Pydantic Models

All API responses are typed using Pydantic:

```python
from arxiv import ArxivService

service = ArxivService()
result = service.search("ti:neural", max_results=5)

# result is typed as ArxivSearchResult
print(f"Total: {result.total_results}")

for entry in result.entries:
    # entry is typed as ArxivEntry
    print(f"{entry.arxiv_id}: {entry.title}")
    print(f"Authors: {', '.join(a.name for a in entry.authors)}")
```

### Key Models

- **ArxivSearchResult**: Search results with metadata
  - `total_results`: Total matching papers
  - `entries`: List of ArxivEntry objects
- **ArxivEntry**: Individual paper
  - `arxiv_id`: Clean ID (e.g., "1706.03762")
  - `title`, `summary`: Paper metadata
  - `authors`: List of Author objects
  - `categories`: Subject categories
  - `pdf_url`: Direct PDF link
  - `published`, `updated`: Datetime objects
- **Author**: Paper author
  - `name`: Author name
  - `affiliation`: Optional affiliation

## Testing

Run all 26 integration tests (makes real API calls):

```bash
uv run pytest tests/test_services.py -v
```

Run a specific test class:

```bash
uv run pytest tests/test_services.py::TestArxivServiceSearch -v
```

The tests are integration tests that hit the real arXiv API, ensuring the service works with actual data.

## API Rate Limiting

The service enforces a 3-second delay between API requests by default (arXiv's recommendation). You can adjust this:

```python
from arxiv import ArxivService

service = ArxivService(rate_limit_delay=5.0)  # 5 seconds
```

## Examples

### Python API

```python
from arxiv import ArxivService

# Initialize service
service = ArxivService(download_dir="./papers")

# Search
results = service.search(
    query="ti:attention is all you need",
    max_results=5,
    sort_by="relevance"
)
print(f"Found {results.total_results} papers")
for entry in results.entries:
    print(f"- {entry.title}")

# Get specific paper
entry = service.get("1706.03762", download_pdf=True)
print(f"Downloaded: {entry.title}")

# Just download PDF
pdf_path = service.download_pdf("1706.03762")
print(f"PDF saved to: {pdf_path}")
```

### CLI Examples

```bash
# Find recent papers in a category
arxiv search "cat:cs.AI" \
  --max-results 10 \
  --sort-by submittedDate \
  --sort-order descending

# Search and output as JSON for processing
arxiv search "ti:transformer" --json | jq '.entries[].title'

# Batch download multiple papers
for id in 1706.03762 1810.04805 2010.11929; do
  arxiv download $id
done
```

## Development

The codebase follows these principles:

1. **Type safety**: Pydantic models for all API responses
2. **Clean architecture**: Separation of CLI, service, and models
3. **Real tests**: Integration tests with actual API calls (no mocks)
4. **Rate limiting**: Respects arXiv API guidelines
5. **Caching**: Automatic local caching to avoid re-downloads

## arXiv API Reference

- Base URL: https://export.arxiv.org/api/query
- Format: Atom XML
- Rate limit: 3 seconds between requests (recommended)
- Documentation: https://info.arxiv.org/help/api/user-manual.html
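For reference, you can query the raw endpoint directly to see the Atom feed that the service parses. The `search_query`, `start`, and `max_results` parameters below come from the arXiv API user manual linked above:

```bash
# Fetch the raw Atom XML for a small title search (same endpoint the service wraps)
curl "https://export.arxiv.org/api/query?search_query=ti:transformer&start=0&max_results=3"
```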
## License

This is a personal project for interacting with arXiv's public API.