The ArXiv MCP Server enables you to search, download, and analyze academic papers from arXiv within research workflows and multi-agent systems.
- **Search Papers (`search_papers`)**: Query arXiv with advanced syntax including quoted phrases, boolean operators (`AND`, `OR`, `ANDNOT`), field-specific searches (`ti:`, `au:`, `abs:`), category filtering (e.g., `cs.AI`, `cs.LG`, `cs.CL`), date ranges, and sorting by relevance or date, returning up to 50 results.
- **Download Papers (`download_paper`)**: Fetch a paper PDF by arXiv ID and convert it to clean markdown optimized for LLM consumption; supports checking conversion status without re-downloading.
- **List Papers (`list_papers`)**: View all papers previously downloaded and stored locally.
- **Read Papers (`read_paper`)**: Access the full markdown content of a downloaded paper by its arXiv ID.
- **Deep Paper Analysis (prompt)**: Use the built-in `deep-paper-analysis` prompt for a structured examination covering executive summary, methodology, results, implications, and future directions.
- **Multi-agent Integration**: Function as a specialist research agent within frameworks like Microsoft Magentic-UI or AutoGen, supporting a multi-stage pipeline of planning → discovery → download → analysis → synthesis.
- **Observability**: All tool calls are instrumented with OpenTelemetry tracing for monitoring and debugging.
Enables searching arXiv's research paper repository, downloading papers by ID, accessing paper content, and managing locally stored papers with support for filters like date ranges and categories.
# arXiv Deep Research
A Model Context Protocol (MCP) server for searching, downloading, and reading arXiv papers — designed as a specialist agent for integration into multi-agent systems like Microsoft Magentic-UI and AutoGen.
The idea: rather than treating arXiv search as a simple lookup tool, this server is structured as a first-class research agent — one you can plug directly into a Magentic-One-style team as an `McpAgent`, giving an Orchestrator access to the full scientific literature as a delegatable resource.
## Integration with Magentic-UI

Magentic-UI supports custom `McpAgent` instances via `mcp_agent_configs` in its config file. This server plugs in directly:
```yaml
# examples/magentic_ui_config.yaml
client:
  mcp_agent_configs:
    - agent_name: ArxivResearcher
      description: >
        Specialist agent for searching and reading arXiv papers.
        Use when the task requires finding academic papers, understanding
        research literature, or retrieving technical details from published work.
      server_params:
        type: StdioServerParams
        command: python
        args: ["-m", "arxiv_mcp_server"]
        env:
          PYTHONPATH: /path/to/arxiv-deep-research/src
```

Once registered, the Magentic-UI Orchestrator can delegate research subtasks to this agent through the standard Task Ledger / Progress Ledger pattern — exactly how WebSurfer handles web browsing, but for academic literature.
## Integration with AutoGen AgentChat

See `examples/autogen_research_team.py` for a complete 3-agent team:

```
Orchestrator (MagenticOneGroupChat)
├── ArxivSurfer  ← this MCP server, wrapped via StdioServerParams + mcp_server_tools
└── Coder        ← synthesizes findings into structured markdown reports
```

```shell
pip install "autogen-agentchat" "autogen-ext[openai]" "mcp>=1.2.0"
export OPENAI_API_KEY=...
python examples/autogen_research_team.py
```

## Tools
| Tool | Description |
|------|-------------|
| `search_papers` | Query arXiv with advanced filters: date range, category, sort by relevance or date |
| `download_paper` | Fetch a paper PDF and convert to clean markdown for LLM consumption |
| `read_paper` | Access previously downloaded paper content |
| `list_papers` | View all papers in local storage |
### search_papers

Supports rich query syntax — quoted phrases, boolean operators, field-specific search (`ti:`, `au:`, `abs:`), and category filtering:

```json
{
  "query": "\"multi-agent\" AND \"orchestration\" ANDNOT survey",
  "max_results": 10,
  "date_from": "2024-01-01",
  "categories": ["cs.AI", "cs.MA"],
  "sort_by": "relevance"
}
```

## Multi-stage research pipeline
At a high level, arxiv-deep-research runs a simple but powerful multi‑stage loop:
1. **Plan the research task.** A coordinator agent (for example the AutoGen `MagenticOneGroupChat` Orchestrator) takes the user goal and breaks it into sub-tasks.
2. **Discover candidate papers.** The coordinator calls the MCP `search_papers` tool to find relevant arXiv papers by topic, category, and date.
3. **Download and normalize content.** For selected IDs, it calls `download_paper`, which fetches the PDF and converts it into clean markdown for LLMs to read.
4. **Deep paper analysis.** The coordinator (or another agent) uses the `deep-paper-analysis` prompt to ask for a structured analysis of a given paper ID, optionally across multiple calls as you explore related work.
5. **Synthesis and reporting.** A downstream agent such as `Coder` (in the AutoGen example) turns these analyses into a final research report: summaries, comparison tables, open problems, and next-step suggestions.
You can run this pipeline manually by calling the tools and prompts from any MCP‑aware client, or automatically using the sample AutoGen team.
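In code, the loop reduces to a plain function over pluggable callables. The sketch below is purely illustrative (none of these names exist in the repo); in the real system each callable is an MCP tool or an LLM agent, and planning is handled by the coordinator:

```python
# Minimal sketch of stages 2-5 (planning is left to the coordinating agent).
# `search`, `download`, `analyze`, and `synthesize` are illustrative stand-ins
# for the MCP tools and agents described above.
def run_research_pipeline(goal, search, download, analyze, synthesize):
    paper_ids = search(goal)                      # discover candidate papers
    docs = [download(pid) for pid in paper_ids]   # fetch + normalize to markdown
    analyses = [analyze(doc) for doc in docs]     # structured per-paper analysis
    return synthesize(analyses)                   # final report

# Example run with trivial stand-ins:
report = run_research_pipeline(
    "multi-agent orchestration",
    search=lambda goal: ["2308.08155", "2411.04468"],
    download=lambda pid: f"# Paper {pid}\n(markdown body)",
    analyze=lambda doc: doc.splitlines()[0],
    synthesize=lambda parts: "\n".join(parts),
)
```

The value of this shape is that each stage can be swapped independently: a different search backend, a different analysis prompt, or a different reporting agent, without touching the loop itself.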
## Evaluation Benchmark

The repo includes a retrieval quality benchmark (`eval/benchmark.py`) measuring:

- **Precision@K** — fraction of the top-K results that are relevant
- **Recall@K** — fraction of known relevant papers found in the top-K
- **MRR** — Mean Reciprocal Rank of the first relevant result
Ground-truth queries are seeded from landmark papers (AutoGen 2308.08155, Magentic-One 2411.04468, RAG 2005.11401, CoT 2201.11903) and can be extended automatically using the synthetic data pipeline below.
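For reference, the three metrics are only a few lines each. A minimal sketch (function names here are illustrative; the actual implementations live in `eval/benchmark.py` and may differ):

```python
# Retrieval metrics over lists of arXiv IDs, as described above.
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved IDs that are relevant."""
    top = retrieved[:k]
    return sum(1 for pid in top if pid in relevant) / k

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the known-relevant IDs that appear in the top-k."""
    top = set(retrieved[:k])
    return len(relevant & top) / len(relevant)

def mean_reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant result, 0.0 if none is retrieved."""
    for rank, pid in enumerate(retrieved, start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0
```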
```shell
python eval/benchmark.py --k 10 --output results.json
```

## Synthetic Eval Data Generation (AgentInstruct-style)

`scripts/generate_eval_tasks.py` implements a 4-stage pipeline that generates diverse benchmark queries from arXiv abstracts — mirroring the AgentInstruct approach:

1. **Seed collection** → fetch paper abstracts from arXiv by category
2. **Content transform** → extract key concepts and problem statements
3. **Instruction gen** → generate realistic research queries via GPT-4o-mini
4. **Instruction refine** → create harder variants at subtopic intersections

```shell
export OPENAI_API_KEY=...
python scripts/generate_eval_tasks.py --seed-category cs.AI --num-seeds 20 --output eval/generated_queries.json
```

Output includes easy/medium/hard difficulty tiers for stratified evaluation.
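Conceptually, stages 2-4 reduce to a chain of LLM transforms over the seed abstracts. Here is a minimal sketch with a pluggable `llm` callable; all names and prompts are illustrative, and the real script calls GPT-4o-mini with richer prompts:

```python
# Illustrative reduction of stages 2-4; `llm` is any callable
# mapping a prompt string to a completion string.
def generate_queries(abstracts, llm):
    # Stage 2: content transform - pull out key concepts per abstract
    concepts = [llm(f"Extract the key concepts from: {a}") for a in abstracts]
    # Stage 3: instruction generation - one realistic query per concept set
    easy = [llm(f"Write a realistic research query about: {c}") for c in concepts]
    # Stage 4: instruction refinement - a harder variant at the intersection
    hard = [llm(f"Combine these into one harder query: {easy}")]
    return {"easy": easy, "hard": hard}

# With a trivial fake `llm` you can see the shape of the output:
queries = generate_queries(
    ["agent orchestration", "retrieval augmentation"],
    llm=lambda prompt: prompt.upper(),
)
```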
## Observability: OpenTelemetry Tracing
Every tool call is instrumented with OpenTelemetry spans (mirrors AutoGen v0.4's built-in OTel support):
```shell
# Console output (no infrastructure needed)
export ARXIV_MCP_TRACE_CONSOLE=true
python -m arxiv_mcp_server

# OTLP export to Jaeger / Azure Monitor
docker run -d --name jaeger -p 16686:16686 -p 4317:4317 jaegertracing/all-in-one
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_SERVICE_NAME=arxiv-mcp-server
python -m arxiv_mcp_server

# View traces: http://localhost:16686
```

Spans recorded: `mcp.tool.search_papers`, `mcp.tool.download_paper`, `mcp.tool.read_paper` — each with query, categories, result count, latency, and error status as attributes.

Tracing is a zero-cost no-op when `opentelemetry-sdk` is not installed.
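The no-op fallback is a standard pattern: attempt the OpenTelemetry import at module load and degrade to a pass-through decorator if it fails. A hedged sketch of what a `@trace_tool`-style decorator can look like (the repo's `tracing.py` may differ in details):

```python
import functools

try:
    from opentelemetry import trace
    _tracer = trace.get_tracer("arxiv-mcp-server")
except ImportError:          # opentelemetry not installed: tracing disabled
    _tracer = None

def trace_tool(name):
    """Wrap a tool handler in a named span, or do nothing if OTel is missing."""
    def decorator(fn):
        if _tracer is None:
            return fn        # zero-cost no-op path: original function unchanged
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            with _tracer.start_as_current_span(name):
                return fn(*args, **kwargs)
        return wrapper
    return decorator

# Hypothetical handler shown only to demonstrate the decorator:
@trace_tool("mcp.tool.search_papers")
def search_papers(query):
    return [query]
```

Because the decorator is resolved once at import time, the "SDK absent" path adds no per-call overhead at all.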
## Installation

Requires Python 3.11+.

```shell
git clone https://github.com/freyzo/arxiv-deep-research
cd arxiv-deep-research
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

# Optional: OTel tracing
pip install -e ".[tracing]"
```

### Claude Desktop
```json
{
  "mcpServers": {
    "arxiv": {
      "command": "/path/to/.venv/bin/python",
      "args": ["-m", "arxiv_mcp_server", "--storage-path", "/path/to/papers"]
    }
  }
}
```

### Cursor
```json
{
  "mcpServers": {
    "arxiv": {
      "command": "python",
      "args": ["-m", "arxiv_mcp_server"],
      "env": { "PYTHONPATH": "/path/to/arxiv-deep-research/src" }
    }
  }
}
```

## Prompts
### deep-paper-analysis

Comprehensive analysis workflow covering executive summary, methodology, results, implications, and future directions:

```json
{ "paper_id": "2401.12345" }
```

## Running and resuming research sessions
There are two main ways to run research sessions today.
### 1. AutoGen multi-agent team (recommended demo)

This uses OpenAI models to coordinate a full research workflow.

```shell
cd arxiv-deep-research
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
pip install "autogen-agentchat" "autogen-ext[openai]" "mcp>=1.2.0"
export OPENAI_API_KEY=your_openai_key
python examples/autogen_research_team.py
```

This starts an interactive console UI where:
- the Orchestrator plans the work,
- `ArxivSurfer` searches and downloads papers via MCP, and
- `Coder` writes the final markdown report.

To resume a session, you can:

- run the script again and paste the previous summary as part of a new task, or
- keep the same console session open and give the team a follow-up instruction (for example, "Now focus on safety trade-offs").
### 2. Direct MCP usage from tools like Claude Desktop or Cursor

You can also talk to the MCP server directly and build your own loop:

```shell
cd arxiv-deep-research
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
export ARXIV_MCP_TRACE_CONSOLE=true  # optional
python -m arxiv_mcp_server
```

While this server runs, any MCP-aware client can:

- call `search_papers` and `download_paper`,
- use `read_paper` to pull content into the chat, and
- call the `deep-paper-analysis` prompt multiple times.
The prompt handler keeps a simple global research context, so repeated calls in the same process will mention previously analyzed paper IDs and encourage the model to connect them. In practice, "resuming" a research session means:

- keeping the same MCP server process alive, and
- issuing new `deep-paper-analysis` calls for new paper IDs from the same client or workspace.
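A process-global context of this kind can be as small as a module-level list of analyzed IDs. The sketch below is illustrative only (hypothetical names, not the repo's exact code), but shows why the context survives repeated prompt calls yet is lost on restart:

```python
# Module-level research context: shared by all prompt calls in one
# server process, reset when the process exits.
_analyzed_papers: list[str] = []

def deep_paper_analysis_prompt(paper_id: str) -> str:
    """Build the analysis prompt, referencing previously analyzed papers."""
    prior = [p for p in _analyzed_papers if p != paper_id]
    _analyzed_papers.append(paper_id)
    prompt = f"Analyze arXiv paper {paper_id} in depth."
    if prior:
        prompt += f" Relate it to previously analyzed papers: {', '.join(prior)}."
    return prompt
```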
## Repository Structure

```
arxiv-deep-research/
├── src/arxiv_mcp_server/
│   ├── server.py                  # MCP server + OTel init
│   ├── tracing.py                 # @trace_tool decorator, OTLP + console exporters
│   ├── config.py
│   ├── tools/                     # search, download, read, list
│   └── prompts/                   # deep research analysis prompt
├── examples/
│   ├── autogen_research_team.py   # Magentic-One-style 3-agent team
│   └── magentic_ui_config.yaml    # McpAgent config for Magentic-UI
├── eval/
│   └── benchmark.py               # Precision@K / Recall@K / MRR harness
├── scripts/
│   └── generate_eval_tasks.py     # AgentInstruct-style query generator
└── pyproject.toml
```

## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| — | — | Paper storage location |
| `ARXIV_MCP_TRACE_CONSOLE` | — | Enable console trace output |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | — | OTLP endpoint (e.g. `http://localhost:4317`) |
| `OTEL_SERVICE_NAME` | — | Service name in traces |
If you use the optional eval data generator, you also need:

| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | Used by `scripts/generate_eval_tasks.py` |
## Known issues

- **Model support is OpenAI-only today.** The AutoGen research team and the synthetic eval generator both call OpenAI models (`gpt-4o` / `gpt-4o-mini`) via the OpenAI Python SDK. There is no first-class support for other model providers yet, even though the design would support it.
- **No MCP Resources yet.** Papers are exposed only via tools (`read_paper`) rather than as MCP Resources with stable `arxiv://` URIs. MCP clients that prefer Resources cannot list papers yet.
- **Limited testing.** The core retrieval and eval logic has very light automated testing; metric functions and tool handlers should gain unit tests over time.
## Roadmap

Planned improvements (subject to change):

- **Gemini / Gemma support via `google-genai`.** Add an optional `google-genai` dependency and a small runner that can call Gemini/Gemma models using `GEMINI_API_KEY`. Expose this as an alternative backend for the research team demo and the eval generator.
- **MCP Resources for downloaded papers.** Implement `list_resources` / `read_resource` so downloaded PDFs appear as `arxiv://paper_id` resources in MCP clients.
- **Stronger testing and evals.** Add unit tests for metrics, search helpers, and prompt handlers. Automate running `eval/benchmark.py` and track regressions over time.
- **Richer research sessions.** Replace the simple global research context with explicit session IDs and persisted state, so "resume session X" becomes a first-class feature across restarts.
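A first-class version of session resumption could key persisted state by session ID, for example with a small JSON-backed store. This is purely illustrative; none of these names exist in the repo today:

```python
import json
from pathlib import Path

class SessionStore:
    """Persist analyzed paper IDs per session so 'resume session X'
    could survive server restarts (illustrative sketch only)."""

    def __init__(self, path: Path):
        self.path = path
        # Load prior sessions if the file exists, else start empty.
        self.sessions = json.loads(path.read_text()) if path.exists() else {}

    def add_paper(self, session_id: str, paper_id: str) -> None:
        self.sessions.setdefault(session_id, []).append(paper_id)
        self.path.write_text(json.dumps(self.sessions))

    def resume(self, session_id: str) -> list[str]:
        return self.sessions.get(session_id, [])
```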