arxiv-mcp
Provides tools for searching arXiv papers, fetching metadata, reading papers as Markdown, exporting BibTeX, and downloading PDFs.
arxiv-toolkit
A TypeScript library exposed as a CLI (arxiv) and an MCP server (arxiv-mcp) for
searching arXiv, fetching metadata, and reading papers as clean, section-aware Markdown
(HTML → ar5iv → PDF fallback). API-first over arXiv's official endpoints, with a lazy
browser fallback (off by default).
Search & discovery: full-text and field-scoped search (title, author, abstract, category), boolean queries, sorting, pagination, and a "recent in a category" listing.
Read full text: section-aware Markdown (or plain text), chunkable via maxChars/cursor so an LLM can read large papers within a context budget.
Metadata & export: rich metadata for one or many IDs and BibTeX export (canonical arXiv endpoint, with an offline @misc generator fallback).
Polite & portable: per-host rate limiting, retry/backoff, aggressive caching, OS-native paths. No browser required.
Install
Global
npm install -g arxiv-toolkit
After global install, both bins are on PATH:
arxiv search "transformer attention"
arxiv-mcp # starts the stdio MCP server
npx (no global install)
Gotcha: the bin names (arxiv, arxiv-mcp) differ from the package name (arxiv-toolkit). npx arxiv-toolkit ... does not resolve to the bins; use --package:
npx -y --package arxiv-toolkit arxiv search "transformer attention"
npx -y --package arxiv-toolkit arxiv read 2310.06825
npx -y --package arxiv-toolkit arxiv-mcp
CLI usage
arxiv <command> [options]
Commands:
search [query] Search arXiv (query optional if a field flag is given).
get <id...> Fetch metadata for one or more IDs.
read <id> Read a paper as Markdown/text.
download <id...> Save PDF(s) to disk.
recent <category> Latest papers in a category.
cache <clear|path> Cache maintenance.
Global options:
--json JSON output (scripting)
--no-cache Bypass cache
--cache-dir <dir> Override cache directory
--browser Enable browser fallback (off by default)
--quiet Suppress hints/non-fatal warnings
--verbose Print stack traces on error
search
arxiv search "diffusion models" --author "ho" --category cs.LG --sort submitted --max 20 --json
Flags: --author --category --title --abstract --sort relevance|submitted|updated --order asc|desc --max <n> --start <n> --json. For large result sets (>1000), a narrowing hint is printed to stderr (suppressed by --quiet).
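To make the flag-to-query mapping concrete, here is a minimal sketch of how field-scoped flags could be turned into a request against arXiv's public query API. The field prefixes (all, au, ti, abs, cat) and the sortBy/sortOrder values belong to the arXiv API. The SearchOptions shape, the buildSearchUrl name, and the default descending order are assumptions made for illustration and are not taken from the toolkit's source.

```typescript
// Hypothetical option shape mirroring the CLI flags; not the toolkit's real types.
interface SearchOptions {
  query?: string;
  author?: string;
  title?: string;
  abstract?: string;
  category?: string;
  sort?: "relevance" | "submitted" | "updated";
  order?: "asc" | "desc";
  max?: number;
  start?: number;
}

// arXiv API names for the sort keys accepted by --sort.
const SORT_BY = {
  relevance: "relevance",
  submitted: "submittedDate",
  updated: "lastUpdatedDate",
} as const;

// Quote multi-word values so the API matches them as a phrase.
function term(prefix: string, value: string): string {
  const v = value.trim();
  return /\s/.test(v) ? `${prefix}:"${v}"` : `${prefix}:${v}`;
}

function buildSearchUrl(opts: SearchOptions): string {
  const parts: string[] = [];
  if (opts.query) parts.push(term("all", opts.query));
  if (opts.author) parts.push(term("au", opts.author));
  if (opts.title) parts.push(term("ti", opts.title));
  if (opts.abstract) parts.push(term("abs", opts.abstract));
  if (opts.category) parts.push(term("cat", opts.category));
  if (parts.length === 0) {
    throw new Error("a query or at least one field flag is required");
  }

  const params: Array<[string, string]> = [
    ["search_query", parts.join(" AND ")],
    ["sortBy", SORT_BY[opts.sort ?? "relevance"]],
    ["sortOrder", opts.order === "asc" ? "ascending" : "descending"], // assumed default
    ["start", String(opts.start ?? 0)],
    ["max_results", String(opts.max ?? 25)],
  ];
  const qs = params.map(([k, v]) => `${k}=${encodeURIComponent(v)}`).join("&");
  return `https://export.arxiv.org/api/query?${qs}`;
}
```

This sketch treats the free-text query as a single phrase, so a user-supplied boolean expression would need its own parsing step.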
get (metadata + BibTeX)
arxiv get 2310.06825 cond-mat/0011267
arxiv get 2310.06825 --bibtex --json
get accepts multiple IDs; the metadata is batched (≤50 IDs per request) and returned in input order. --bibtex emits canonical BibTeX from arXiv's https://arxiv.org/bibtex/{id} endpoint, falling back to a generated @misc entry offline.
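The batching behaviour can be sketched in two small pure functions: one splits the ID list into requests of at most 50, the other puts whatever the API returned back into the caller's order. The PaperMeta shape and both function names are hypothetical; only the batch size of 50 and the input-order guarantee come from the description above.

```typescript
// Hypothetical record shape; the real metadata type is richer.
interface PaperMeta {
  id: string;
  title: string;
}

const BATCH_SIZE = 50; // per-request ID limit stated above

// Split a list into consecutive groups of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Reorder records (which may arrive in any order) to match the requested IDs.
// IDs with no matching record are skipped.
function inInputOrder(ids: readonly string[], records: readonly PaperMeta[]): PaperMeta[] {
  const byId = new Map<string, PaperMeta>();
  for (const record of records) byId.set(record.id, record);

  const ordered: PaperMeta[] = [];
  for (const id of ids) {
    const record = byId.get(id);
    if (record !== undefined) ordered.push(record);
  }
  return ordered;
}
```

A caller would fetch each group from chunk(ids, BATCH_SIZE), concatenate the results, and pass them through inInputOrder once at the end.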
read (full text)
arxiv read 2310.06825
arxiv read 2310.06825 --format text --section "Method"
arxiv read 2310.06825 --source pdf --max-chars 12000 --out paper.md
Flags: --source auto|html|pdf (default auto: native HTML → ar5iv → PDF), --format markdown|text (default markdown), --section <name> (return one section by S1-style id or title substring), --max-chars <n> (soft chunk target; snaps to whole-section boundaries), --out <file>. Use --max-chars to read a paper section-by-section; the nextCursor field in --json output is the authoritative "more remains" signal.
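One way to picture the "soft target that snaps to section boundaries" is the sketch below. It assumes the paper is already parsed into sections and that the cursor is a plain section index. The real cursor format is not documented here and may be opaque. Apart from nextCursor, the type and field names are illustrative.

```typescript
// Hypothetical shapes; only the nextCursor name comes from the text above.
interface Section {
  id: string;
  title: string;
  text: string;
}

interface Chunk {
  sections: Section[];
  nextCursor: number | null; // null means nothing remains
}

// Return whole sections starting at `cursor`, stopping before the section that
// would push the total past maxChars. At least one section is always returned,
// so a single oversized section still makes progress (hence a "soft" target).
function readChunk(sections: readonly Section[], maxChars: number, cursor = 0): Chunk {
  const picked: Section[] = [];
  let used = 0;
  for (const section of sections.slice(cursor)) {
    if (picked.length > 0 && used + section.text.length > maxChars) break;
    picked.push(section);
    used += section.text.length;
  }
  const next = cursor + picked.length;
  return { sections: picked, nextCursor: next < sections.length ? next : null };
}
```

A reader loop would call readChunk repeatedly, feeding each nextCursor back in, and stop when it is null rather than inferring completion from the chunk size.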
download
arxiv download 2310.06825 cond-mat/0011267 --out ./papers
download <id...> saves each PDF (old-style IDs are sanitized on disk: cond-mat/0011267 → cond-mat_0011267.pdf). The absolute saved path is printed per ID; processing continues on error and the process exits non-zero if any ID failed.
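The file-name rule and the continue-on-error behaviour can be illustrated with a short sketch. The function names, the save callback, and the specific exit code of 1 are assumptions; the text above only promises a non-zero exit.

```typescript
// Old-style IDs contain a slash, which is not safe inside a file name.
function pdfFileName(id: string): string {
  return `${id.replace(/\//g, "_")}.pdf`;
}

// Attempt every ID, count failures, and derive the exit code at the end.
// `save` is a stand-in for the real download-and-write step.
function downloadAll(
  ids: readonly string[],
  save: (id: string, fileName: string) => void,
): number {
  let failed = 0;
  for (const id of ids) {
    try {
      save(id, pdfFileName(id));
    } catch {
      failed += 1; // keep going: one bad ID should not abort the rest
    }
  }
  return failed > 0 ? 1 : 0; // assumed exit code; any non-zero value would do
}
```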
recent
arxiv recent cs.CL --max 10 --json
cache
arxiv cache clear # empty the cache
arxiv cache path # print the cache directory
MCP server
arxiv-mcp is a Model Context Protocol stdio server exposing the same core as five tools: arxiv_search, arxiv_get_metadata, arxiv_read_paper, arxiv_list_recent, arxiv_download.
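For readers new to MCP, the sketch below shows what a tool invocation looks like on the wire. The JSON-RPC 2.0 envelope, the tools/call method, and newline-delimited framing on stdio come from the MCP specification. The argument names (query, maxResults) are hypothetical, since the tools' input schemas are not listed here.

```typescript
// JSON-RPC 2.0 envelope that MCP uses for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// The five tool names listed above.
const TOOLS = [
  "arxiv_search",
  "arxiv_get_metadata",
  "arxiv_read_paper",
  "arxiv_list_recent",
  "arxiv_download",
] as const;
type ToolName = (typeof TOOLS)[number];

function toolCall(id: number, name: ToolName, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Argument names are assumptions; consult the schema returned by tools/list.
const request = toolCall(1, "arxiv_search", { query: "quantum computing", maxResults: 5 });

// A stdio client writes one JSON message per line to the server's stdin.
const wire = JSON.stringify(request) + "\n";
```

In practice an MCP client library handles this framing, along with the initialize handshake that must precede any tool call.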
Claude Code
Register the server for your user scope:
claude mcp add arxiv --scope user -- npx -y --package arxiv-toolkit arxiv-mcp
Options go before the name and -- goes before the command. The registered server name arxiv and the bin arxiv-mcp are intentionally distinct (logical name vs. launcher). Verify with claude mcp list.
Config-file forms
Equivalent static config for .mcp.json (Claude Code) or claude_desktop_config.json (Claude Desktop):
{
  "mcpServers": {
    "arxiv": {
      "command": "npx",
      "args": ["-y", "--package", "arxiv-toolkit", "arxiv-mcp"]
    }
  }
}
With a global install, use the bin directly:
{
  "mcpServers": {
    "arxiv": {
      "command": "arxiv-mcp"
    }
  }
}
Tools
| Tool | Purpose |
| arxiv_search | Search arXiv; returns |
| arxiv_get_metadata | Metadata for one or more IDs; optional BibTeX. |
| arxiv_read_paper | Section-aware Markdown/text with |
| arxiv_list_recent | Recent papers in a category. |
| arxiv_download | Save a PDF; returns the absolute path + a |
Browser fallback (off by default)
The API-first path (official arXiv endpoints) is the default and needs no browser. An
optional browser fallback (playwright-core, an optionalDependency, lazy-loaded) can
retry the same URLs when the API path fails for a non-content reason (e.g. a
challenge/403, or repeated 5xx/connection/TLS failure after retries are exhausted).
It is not triggered by a clean 404 (a legitimate "not available here" → the source
matrix continues to the next source).
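The trigger rule above can be written down as a small decision function. This is a sketch only: the Failure classification and the step names are hypothetical, and it assumes the function is consulted after the retry/backoff layer has already given up.

```typescript
// Hypothetical failure classification; the toolkit's internal error types may differ.
type Failure =
  | { kind: "http"; status: number } // final HTTP status after retries
  | { kind: "network" };             // connection or TLS failure after retries

type NextStep = "next-source" | "browser-fallback" | "fail";

function nextStep(failure: Failure, browserEnabled: boolean): NextStep {
  // A clean 404 means "not available here": move on to the next source.
  if (failure.kind === "http" && failure.status === 404) return "next-source";

  // Non-content failures: a challenge/403, a persistent 5xx, or a dead connection.
  const blocked = failure.kind === "http" && failure.status === 403;
  const serverError = failure.kind === "http" && failure.status >= 500;
  const nonContent = blocked || serverError || failure.kind === "network";

  // The fallback is opt-in, so without it the failure is simply reported.
  return nonContent && browserEnabled ? "browser-fallback" : "fail";
}
```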
Enable it with any of:
the --browser CLI flag,
the ARXIV_BROWSER=1 environment variable, or
"browserFallback": true in the config file.
If no browser binary is installed when the fallback is engaged, arxiv-toolkit raises a
clear UnsupportedError with install guidance and leaves the API path unaffected — it
never breaks the default flow. Cache maintenance is CLI/ops-only; there is no MCP cache
tool.
Configuration
Configuration is resolved with precedence: CLI flag → environment variable → config file → default. The config file is <configDir>/config.json (a Partial<ArxivConfig> JSON object; unknown keys are ignored).
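The precedence rule can be sketched as "first defined value wins", checked in the order flag, environment, file, default. The Config subset below is hypothetical: browserFallback, ARXIV_BROWSER, and ARXIV_CONTACT appear in this README, while contactEmail and pageSize are illustrative field names.

```typescript
// Hypothetical config subset; only browserFallback is a key named in this README.
interface Config {
  browserFallback: boolean;
  contactEmail: string;
  pageSize: number;
}

const DEFAULTS: Config = { browserFallback: false, contactEmail: "", pageSize: 25 };

// Return the first candidate that is not undefined, else the fallback.
function firstDefined<T>(fallback: T, ...candidates: Array<T | undefined>): T {
  for (const candidate of candidates) {
    if (candidate !== undefined) return candidate;
  }
  return fallback;
}

// Translate environment variables into config fields, setting only what is present.
function fromEnv(env: Record<string, string | undefined>): Partial<Config> {
  const out: Partial<Config> = {};
  const browser = env["ARXIV_BROWSER"];
  const contact = env["ARXIV_CONTACT"];
  if (browser !== undefined) out.browserFallback = browser === "1";
  if (contact !== undefined) out.contactEmail = contact;
  return out;
}

// Precedence: CLI flag, then environment variable, then config file, then default.
function resolveConfig(
  flags: Partial<Config>,
  env: Partial<Config>,
  file: Partial<Config>,
): Config {
  return {
    browserFallback: firstDefined(
      DEFAULTS.browserFallback, flags.browserFallback, env.browserFallback, file.browserFallback),
    contactEmail: firstDefined(
      DEFAULTS.contactEmail, flags.contactEmail, env.contactEmail, file.contactEmail),
    pageSize: firstDefined(DEFAULTS.pageSize, flags.pageSize, env.pageSize, file.pageSize),
  };
}
```

Checking for undefined rather than falsiness matters here: an explicit false or 0 from a higher layer must still override a lower one.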
Env var | Field | Notes |
|
| Cache directory. |
|
| Default |
|
| Per-host min-interval (default 3000). |
|
| Default page size (default 25; the 2000 clamp is fixed). |
|
|
|
|
|
|
|
| Email used in the User-Agent. |
|
| Overrides the entire UA string. |
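As an illustration of the per-host minimum interval (3000 ms by default), the sketch below keeps the next allowed start time per host and reports how long a caller should wait. The class and method names are hypothetical, and the clock is passed in explicitly so the logic stays easy to test. The toolkit's actual limiter may be structured differently.

```typescript
// Minimal per-host throttle: each host gets its own "next allowed" timestamp.
class HostThrottle {
  private readonly nextAllowed = new Map<string, number>();
  private readonly minIntervalMs: number;

  constructor(minIntervalMs = 3000) {
    this.minIntervalMs = minIntervalMs;
  }

  // Reserve the next slot for `host` and return the delay (ms) before using it.
  reserve(host: string, now: number): number {
    const earliest = this.nextAllowed.get(host) ?? now;
    const startAt = Math.max(now, earliest);
    this.nextAllowed.set(host, startAt + this.minIntervalMs);
    return startAt - now;
  }
}
```

A caller would sleep for the returned delay before issuing the request. Because slots are reserved up front, several queued requests to one host end up spaced a full interval apart, while other hosts are unaffected.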
Paths are cross-platform via env-paths. A descriptive User-Agent with a contact email
is sent on every request; please set ARXIV_CONTACT to your email so arXiv can reach you
if your usage causes problems.
Bulk access (out of scope)
This toolkit is for targeted search and reading, not bulk harvesting. For large-scale access use arXiv's official bulk channels:
OAI-PMH: https://oaipmh.arxiv.org/oai
AWS S3 (requester-pays): s3://arxiv (pdf/, src/ + manifests). See arXiv S3 bulk data.
Kaggle: Cornell University/arxiv dump.
See arXiv bulk data for guidance and etiquette.
License
MIT. See LICENSE.