# arXiv CLI & MCP Server
A Python toolkit for searching and downloading papers from arXiv.org, with both a command-line interface and a Model Context Protocol (MCP) server for LLM integration.
Coding agents work best with well-documented CLI tools or MCP servers; this project provides both.
## Features
- **Search** arXiv papers by title, author, abstract, category, and more
- **Download** PDFs automatically with local caching
- **MCP Server** for integration with LLM assistants (Claude Desktop, etc.)
- **Typed responses** using Pydantic models for clean data handling
- **Rate limiting** built-in to respect arXiv API guidelines
- **Comprehensive tests** with 26 integration tests (no mocking)
## Installation
### Option 1: Install from GitHub (Recommended)
Install directly from the GitHub repository:
```bash
# Install the latest version
uv pip install git+https://github.com/LiamConnell/arxiv_for_agents.git
# Or with pip
pip install git+https://github.com/LiamConnell/arxiv_for_agents.git
# Now you can use the arxiv command
arxiv --help
```
### Option 2: Install from Source
Clone the repository and install locally:
```bash
# Clone the repository
git clone https://github.com/LiamConnell/arxiv_for_agents.git
cd arxiv_for_agents
# Install in editable mode
uv pip install -e .
# Now you can use the arxiv command
arxiv --help
```
### Option 3: Development Installation
For development with all dependencies:
```bash
# Clone and install with dev dependencies
git clone https://github.com/LiamConnell/arxiv_for_agents.git
cd arxiv_for_agents
uv pip install -e ".[dev]"
# Run tests
uv run pytest
```
### Verify Installation
```bash
# If installed as package
arxiv --help
# Or if using as module
uv run python -m arxiv --help
```
## Usage
**Note:** If you installed as a package, use `arxiv` directly. Otherwise, use `uv run python -m arxiv`.
### Search Papers
Search by title:
```bash
# Using installed package
arxiv search "ti:attention is all you need"
# Or using as module
uv run python -m arxiv search "ti:attention is all you need"
```
Search by author:
```bash
arxiv search "au:Hinton" --max-results 20
```
Search by category:
```bash
arxiv search "cat:cs.AI" --max-results 10
```
Combined search:
```bash
arxiv search "ti:transformer AND au:Vaswani"
```
### Get Specific Paper
Get paper metadata and download PDF:
```bash
arxiv get 1706.03762
```
Get metadata only (no download):
```bash
arxiv get 1706.03762 --no-download
```
Force re-download:
```bash
arxiv get 1706.03762 --force
```
### Download PDF
Download just the PDF:
```bash
arxiv download 1706.03762
```
### List Downloaded PDFs
```bash
arxiv list-downloads
```
### JSON Output
Get results as JSON for scripting:
```bash
arxiv search "ti:neural" --json
arxiv get 1706.03762 --json --no-download
```
## Search Query Syntax
The arXiv API supports field-specific searches:
- `ti:` - Title
- `au:` - Author
- `abs:` - Abstract
- `cat:` - Category (e.g., cs.AI, cs.LG)
- `all:` - All fields (default)
You can combine searches with `AND`, `OR`, and `ANDNOT`:
```bash
arxiv search "ti:neural AND cat:cs.LG"
arxiv search "au:Hinton OR au:Bengio"
```
## Download Directory
PDFs are downloaded to `./.arxiv` by default. Change this with:
```bash
arxiv --download-dir ./papers search "ti:transformer"
```
## MCP Server (Model Context Protocol)
The arXiv CLI includes a Model Context Protocol (MCP) server that allows LLM assistants (like Claude Desktop) to search and download arXiv papers programmatically.
### Running the MCP Server
```bash
# Option 1: Using the script entry point (recommended)
uv run arxiv-mcp
# Option 2: Using the module
uv run python -m arxiv.mcp
```
The server runs in stdio mode and communicates via JSON-RPC over stdin/stdout.
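For orientation, the first message an MCP client sends over the server's stdin is a JSON-RPC 2.0 `initialize` request. The sketch below builds such a message with the standard library; the `protocolVersion` value and `clientInfo` contents are illustrative assumptions, not taken from this project.

```python
import json

# Illustrative JSON-RPC 2.0 "initialize" request, as an MCP client
# (e.g. Claude Desktop) would send it to the server's stdin.
# The protocolVersion and clientInfo values are assumptions for illustration.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1"},
    },
}

# The stdio transport exchanges one JSON message per line.
print(json.dumps(request))
```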
### MCP Tools
The server provides 4 tools for paper discovery and management:
1. **search_papers** - Search arXiv with advanced query syntax
- Supports field prefixes (ti:, au:, abs:, cat:)
- Boolean operators (AND, OR, ANDNOT)
- Pagination and sorting options
- Returns paper metadata including title, authors, abstract, categories
2. **get_paper** - Get detailed information about a specific paper
- Accepts flexible ID formats (1706.03762, arXiv:1706.03762, 1706.03762v1)
- Optionally downloads PDF automatically
- Returns complete metadata including DOI, journal references, comments
3. **download_paper** - Download PDF for a specific paper
- Downloads to local `.arxiv` directory
- Returns file path and size information
- Supports force re-download option
4. **list_downloaded_papers** - List all locally downloaded PDFs
- Shows arxiv IDs, file sizes, and paths
- Useful for managing local paper collection
### MCP Resources
The server exposes 2 resources for direct access:
- **paper://{arxiv_id}** - Get formatted paper metadata in markdown
- **downloads://list** - Get markdown table of all downloaded papers
### MCP Prompts
Pre-built prompt templates to guide usage:
- **search_arxiv_prompt** - Guide for searching arXiv papers
- **download_paper_prompt** - Guide for downloading and managing papers
### Claude Desktop Configuration
Add to your Claude Desktop config file (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):
**If installed from GitHub/pip:**
```json
{
  "mcpServers": {
    "arxiv": {
      "command": "arxiv-mcp"
    }
  }
}
```
**If running from source/development:**
```json
{
  "mcpServers": {
    "arxiv": {
      "command": "uv",
      "args": ["run", "arxiv-mcp"],
      "cwd": "/path/to/arxiv_for_agents"
    }
  }
}
```
Or use `--directory` to avoid needing `cwd`:
```json
{
  "mcpServers": {
    "arxiv": {
      "command": "uv",
      "args": ["--directory", "/path/to/arxiv_for_agents", "run", "arxiv-mcp"]
    }
  }
}
```
### MCP Use Cases
Once configured, you can ask Claude to:
- "Search arXiv for recent papers on transformer architectures"
- "Find papers by Geoffrey Hinton in the cs.AI category"
- "Download the 'Attention is All You Need' paper"
- "Show me papers about neural networks from 2023"
- "List all the papers I've downloaded"
- "Get the abstract for arXiv:1706.03762"
The MCP integration allows Claude to autonomously search, retrieve, and manage academic papers from arXiv.
## Architecture
### Module Structure
```
arxiv/
├── __init__.py          # Package exports
├── __main__.py          # CLI entry point
├── cli.py               # Click commands
├── models.py            # Pydantic models
├── services.py          # API client service
└── mcp/                 # MCP server
    ├── __init__.py      # MCP package exports
    ├── __main__.py      # MCP server entry point
    └── server.py        # FastMCP server with tools, resources, prompts
tests/
└── test_services.py     # Integration tests (26 tests)
```
### Pydantic Models
All API responses are typed using Pydantic:
```python
from arxiv import ArxivService

service = ArxivService()
result = service.search("ti:neural", max_results=5)

# result is typed as ArxivSearchResult
print(f"Total: {result.total_results}")
for entry in result.entries:
    # entry is typed as ArxivEntry
    print(f"{entry.arxiv_id}: {entry.title}")
    print(f"Authors: {', '.join(a.name for a in entry.authors)}")
```
### Key Models
- **ArxivSearchResult**: Search results with metadata
- `total_results`: Total matching papers
- `entries`: List of ArxivEntry objects
- **ArxivEntry**: Individual paper
- `arxiv_id`: Clean ID (e.g., "1706.03762")
- `title`, `summary`: Paper metadata
- `authors`: List of Author objects
- `categories`: Subject categories
- `pdf_url`: Direct PDF link
- `published`, `updated`: Datetime objects
- **Author**: Paper author
- `name`: Author name
- `affiliation`: Optional affiliation
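The shapes listed above can be sketched with stdlib dataclasses. The real project defines these as Pydantic models; dataclasses are used here only to keep the example dependency-free, and the sample values are illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Dependency-free sketch of the model shapes described above.
# The actual project uses Pydantic; field names follow the list above.

@dataclass
class Author:
    name: str
    affiliation: Optional[str] = None

@dataclass
class ArxivEntry:
    arxiv_id: str
    title: str
    summary: str
    pdf_url: str
    published: datetime
    updated: datetime
    authors: list = field(default_factory=list)
    categories: list = field(default_factory=list)

# Illustrative instance for a well-known paper
entry = ArxivEntry(
    arxiv_id="1706.03762",
    title="Attention Is All You Need",
    summary="...",
    pdf_url="https://arxiv.org/pdf/1706.03762",
    published=datetime(2017, 6, 12),
    updated=datetime(2017, 6, 12),
    authors=[Author(name="Ashish Vaswani")],
    categories=["cs.CL"],
)
print(entry.arxiv_id, entry.authors[0].name)
```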
## Testing
Run all 26 integration tests (makes real API calls):
```bash
uv run pytest tests/test_services.py -v
```
Run specific test class:
```bash
uv run pytest tests/test_services.py::TestArxivServiceSearch -v
```
The tests are integration tests that hit the real arXiv API, ensuring the service works with actual data.
## API Rate Limiting
The service enforces a 3-second delay between API requests by default (arXiv's recommendation). You can adjust this:
```python
from arxiv import ArxivService
service = ArxivService(rate_limit_delay=5.0) # 5 seconds
```
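Internally, this kind of rate limiting amounts to a sleep-before-request guard. A minimal sketch (not the project's actual implementation, which lives in `services.py`):

```python
import time

class RateLimiter:
    """Enforce a minimum delay between consecutive calls (illustrative sketch)."""

    def __init__(self, delay: float = 3.0) -> None:
        self.delay = delay
        self._last_call = float("-inf")  # no prior call yet

    def wait(self) -> None:
        # Sleep only for the remainder of the delay window, if any.
        remaining = self.delay - (time.monotonic() - self._last_call)
        if remaining > 0:
            time.sleep(remaining)
        self._last_call = time.monotonic()

limiter = RateLimiter(delay=0.1)  # short delay so the demo runs quickly
limiter.wait()                    # first call returns immediately
start = time.monotonic()
limiter.wait()                    # second call sleeps the remaining ~0.1 s
elapsed = time.monotonic() - start
print(f"Second call waited {elapsed:.2f}s")
```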
## Examples
### Python API
```python
from arxiv import ArxivService

# Initialize service
service = ArxivService(download_dir="./papers")

# Search
results = service.search(
    query="ti:attention is all you need",
    max_results=5,
    sort_by="relevance",
)
print(f"Found {results.total_results} papers")
for entry in results.entries:
    print(f"- {entry.title}")

# Get specific paper
entry = service.get("1706.03762", download_pdf=True)
print(f"Downloaded: {entry.title}")

# Just download the PDF
pdf_path = service.download_pdf("1706.03762")
print(f"PDF saved to: {pdf_path}")
```
### CLI Examples
```bash
# Find recent papers in a category
arxiv search "cat:cs.AI" \
  --max-results 10 \
  --sort-by submittedDate \
  --sort-order descending

# Search and output as JSON for processing
arxiv search "ti:transformer" --json | jq '.entries[].title'

# Batch-download multiple papers
for id in 1706.03762 1810.04805 2010.11929; do
  arxiv download "$id"
done
```
## Development
The codebase follows these principles:
1. **Type safety**: Pydantic models for all API responses
2. **Clean architecture**: Separation of CLI, service, and models
3. **Real tests**: Integration tests with actual API calls (no mocks)
4. **Rate limiting**: Respects arXiv API guidelines
5. **Caching**: Automatic local caching to avoid re-downloads
## arXiv API Reference
- Base URL: https://export.arxiv.org/api/query
- Format: Atom XML
- Rate limit: 3 seconds between requests (recommended)
- Documentation: https://info.arxiv.org/help/api/user-manual.html
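For scripting directly against the raw API, a query URL can be assembled with the standard library. The parameter names below (`search_query`, `start`, `max_results`, `sortBy`, `sortOrder`) come from the arXiv API user manual, not from this project:

```python
from urllib.parse import urlencode

BASE_URL = "https://export.arxiv.org/api/query"

# Parameter names per the arXiv API user manual
params = {
    "search_query": "ti:transformer AND cat:cs.LG",
    "start": 0,
    "max_results": 5,
    "sortBy": "submittedDate",
    "sortOrder": "descending",
}

# The API returns an Atom XML feed; fetch this URL with any HTTP client.
url = f"{BASE_URL}?{urlencode(params)}"
print(url)
```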
## License
This is a personal project for interacting with arXiv's public API.