arxiv-mcp-server
Provides tools to search arXiv papers, fetch metadata, read full-text content, and list category taxonomy, enabling AI agents to access and analyze academic papers from arXiv.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@arxiv-mcp-serversearch for the latest papers on large language models"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
Public Hosted Server: https://arxiv.caseyjhand.com/mcp
Tools
Four tools for searching and reading arXiv papers:
Tool Name | Description |
| Search arXiv papers by query with category and sort filters. |
| Get full metadata for one or more arXiv papers by ID. |
| Fetch the full text content of an arXiv paper from its HTML rendering. |
| List arXiv category taxonomy, optionally filtered by group. |
arxiv_search
Search for papers using free-text queries with field prefixes and boolean operators.
Field prefixes:
ti:(title),au:(author),abs:(abstract),cat:(category),all:(all fields)Boolean operators:
AND,OR,ANDNOTOptional category filter, sorting (relevance, submitted, updated), and pagination
Returns up to 50 results per request with full metadata including abstract
arxiv_get_metadata
Fetch full metadata for one or more papers by known arXiv ID.
Batch fetch up to 10 papers in a single request
Accepts both versioned (
2401.12345v2) and unversioned (2401.12345) IDsLegacy ID format supported (
hep-th/9901001)Reports not-found IDs separately from found papers
arxiv_read_paper
Read the full HTML content of an arXiv paper.
Tries native arXiv HTML first, falls back to ar5iv for broader coverage
Strips HTML head/boilerplate and collapses MathML to dollar-delimited LaTeX (
$…$inline,$$…$$block) so the character budget targets paper contentmax_charactersdefaults to 100,000; raw HTML can be 500KB-3MB+ for math-heavy papersReturns raw HTML — no parsing or extraction; the LLM interprets content directly
arxiv_list_categories
List arXiv category codes and names for discovery.
~155 categories across 8 top-level groups (cs, math, physics, q-bio, q-fin, stat, eess, econ)
Optional group filter to narrow results
Static data — always succeeds
Resources
URI Pattern | Description |
| Paper metadata by arXiv ID. |
| Full arXiv category taxonomy. |
Features
Built on @cyanheads/mcp-ts-core:
Declarative tool definitions — single file per tool, framework handles registration and validation
Unified error handling across all tools
Pluggable auth (
none,jwt,oauth)Structured logging with optional OpenTelemetry tracing
Runs locally (stdio/HTTP) from the same codebase
arXiv-specific:
Read-only, no authentication required — arXiv API is free, metadata is CC0
Rate-limited request queue enforcing arXiv's 3-second crawl delay
Adaptive cooldown on rate-limit (5s → 10s → 20s → 30s), honors
Retry-AfterRetry with exponential backoff for transient failures
HTML content fallback chain: native arXiv HTML → ar5iv
Full arXiv category taxonomy embedded as static data
Optional local OAI-PMH metadata mirror (SQLite + FTS5) — opt-in, eliminates rate-limit exposure for
arxiv_searchandarxiv_get_metadata. See Optional: Local Mirror.
Getting Started
Public Hosted Instance
A public instance is available at https://arxiv.caseyjhand.com/mcp — no installation required. Point any MCP client at it via Streamable HTTP:
{
"mcpServers": {
"arxiv": {
"type": "streamable-http",
"url": "https://arxiv.caseyjhand.com/mcp"
}
}
}Self-Hosted / Local
Add to your MCP client config (e.g., claude_desktop_config.json):
{
"mcpServers": {
"arxiv": {
"type": "stdio",
"command": "bunx",
"args": ["@cyanheads/arxiv-mcp-server@latest"]
}
}
}Prerequisites
Bun v1.3.0 or higher.
Installation
Clone the repository:
git clone https://github.com/cyanheads/arxiv-mcp-server.gitNavigate into the directory:
cd arxiv-mcp-serverInstall dependencies:
bun installConfiguration
All configuration is optional — the server works out of the box with sensible defaults.
Variable | Description | Default |
| arXiv API base URL. |
|
| Minimum delay between arXiv API requests (ms). |
|
| Timeout for HTML content fetches (ms). |
|
| Timeout for API search/metadata requests (ms). |
|
| Enable local OAI-PMH metadata mirror for search and metadata. |
|
| SQLite path for the mirror. |
|
| UTC cron expression for in-process daily refresh (HTTP mode only). | unset |
| Fall through to live API on local ID-lookup miss. |
|
| Route |
|
| arXiv OAI-PMH endpoint base URL. |
|
| Minimum delay between OAI-PMH requests (ms). |
|
| Transport: |
|
| Port for HTTP server. |
|
| Auth mode: |
|
| Log level (RFC 5424). |
|
Running the Server
Local Development
Build and run:
bun run build bun run start:http # or start:stdioRun checks and tests:
bun run devcheck # Lint, format, typecheck, audit bun run test # Vitest
Optional: Local Mirror
For self-hosted deployments behind a single egress IP, arXiv's ~3-second per-IP crawl delay serializes concurrent users. An optional local mirror eliminates rate-limit exposure for arxiv_search and arxiv_get_metadata by serving from a SQLite + FTS5 store harvested via OAI-PMH. arxiv_read_paper continues to use the live API — full-content harvest is forbidden by arXiv's data policy.
Disabled by default. To enable:
# 1. Cold-start harvest (~4.4h sequential, resumable from checkpoint). One-time per installation.
bun run mirror:init
# 2. Enable the mirror.
export ARXIV_MIRROR_ENABLED=true
# 3. Start the server — reads switch to the mirror once the harvest completes.
bun run start:httpDaily incremental refresh (~30s, <1 page) via:
bun run mirror:refresh # wire to cron / systemd timer / launchd, OR
# set ARXIV_MIRROR_REFRESH_CRON in HTTP mode to schedule in-process
bun run mirror:verify # PRAGMA integrity_check + quick_checkBehavior notes. Ranking divergence: FTS5 BM25 differs from arXiv's internal ranking, so sortBy=relevance against the mirror returns a different top-K than the live API. Queries sorted by submitted descending within ARXIV_MIRROR_RECENT_DAYS_LIVE days route to the live API to cover the nightly-update gap. The mirror stores the latest version only; per-version reads continue to use the live API. See #12 for the full design.
Docker
docker build -t arxiv-mcp-server .
docker run -p 3010:3010 arxiv-mcp-serverProject Structure
Directory | Purpose |
| Tool definitions ( |
| Resource definitions ( |
|
|
| Optional OAI-PMH mirror — harvester, SQLite + FTS5 store, query translator, runner. |
| Environment variable parsing and validation with Zod. |
| Mirror lifecycle scripts ( |
| Unit and integration tests. |
| Design document and directory structure. |
Development Guide
See CLAUDE.md for development guidelines and architectural rules. The short version:
Handlers throw, framework catches — no
try/catchin tool logicUse
ctx.logfor domain-specific loggingRate limiting is managed by
ArxivService— don't add per-tool delaysarXiv API returns HTTP 200 for everything — check content-type and response body
Contributing
Issues and pull requests are welcome. Run checks before submitting:
bun run devcheck
bun testLicense
Apache-2.0 — see LICENSE for details.
This server cannot be installed
Maintenance
Latest Blog Posts
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/cyanheads/arxiv-mcp-server'
If you have feedback or need assistance with the MCP directory API, please join our Discord server