ArXiv MCP Server
Server Quality Checklist
Latest release: v1.0.0
- Disambiguation: 5/5. Each tool has a clearly distinct purpose: download_paper fetches and stores papers, list_papers shows available resources, read_paper accesses stored content, and search_papers finds papers on arXiv. There is no overlap in functionality, making tool selection straightforward for an agent.
- Naming Consistency: 5/5. All tool names follow a consistent verb_noun pattern (e.g., download_paper, list_papers, read_paper, search_papers). This uniformity enhances readability and predictability, allowing agents to easily understand and use the toolset.
- Tool Count: 5/5. With 4 tools, the server is well-scoped for its arXiv paper management purpose. Each tool serves a specific and necessary function (search, download, list, read), providing a complete workflow without being overly complex or sparse.
- Completeness: 5/5. The toolset offers complete coverage for the arXiv paper domain: search_papers finds papers, download_paper acquires them, list_papers manages resources, and read_paper accesses content. This covers the full lifecycle from discovery to reading, with no apparent gaps for agent operations.
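At the protocol level, the search, download, list, and read lifecycle described above is a sequence of MCP `tools/call` requests. The sketch below only builds the JSON-RPC 2.0 payloads a client would send; it does not start the server or open a transport. `paper_id` is the parameter named in the tool scores, and `max_results` and `sort_by` are named in the search_papers scores, but `query`, `categories`, all argument values and types, and the sample arXiv ID are assumptions for illustration.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' request for one MCP tool invocation."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Placeholder arXiv ID, used only for illustration.
PAPER_ID = "1706.03762"

# Discovery -> acquisition -> inventory -> reading.
workflow = [
    # Assumed argument names and value types for search_papers.
    tool_call(1, "search_papers", {
        "query": "transformer attention",
        "max_results": 5,
        "categories": ["cs.CL"],
        "sort_by": "relevance",
    }),
    tool_call(2, "download_paper", {"paper_id": PAPER_ID}),
    tool_call(3, "list_papers", {}),  # list_papers takes no parameters
    tool_call(4, "read_paper", {"paper_id": PAPER_ID}),
]

for request in workflow:
    print(json.dumps(request))
```

An MCP client library would normally build these envelopes for you; spelling them out shows that each tool is addressed only by its name and an arguments object.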
Average 3.3/5 across 4 of 4 tools scored.
See the Tool Scores section below for per-tool breakdowns.
- 11 of 18 issues responded to in the last 6 months
- No commit activity data available
- Last stable release on
- No critical vulnerability alerts
- No high-severity vulnerability alerts
- No code scanning findings
- CI is passing
Add a LICENSE file by following GitHub's guide. Once GitHub recognizes the license, the system will automatically detect it within a few hours.
If the license does not appear after some time, you can manually trigger a new scan using the MCP server admin interface.
MCP servers without a LICENSE cannot be installed.
This repository includes a README.md file.
Tools from this server were used 10 times in the last 30 days.
Add a glama.json file to provide metadata about your server.
This server has been verified by its author.
Add related servers to improve discoverability.
How do I sync the server with GitHub?
Servers are automatically synced at least once per day, but you can also sync manually at any time to instantly update the server profile.
To manually sync the server, click the "Sync Server" button in the MCP server admin interface.
How is the quality score calculated?
The overall quality score combines two components: Tool Definition Quality (70%) and Server Coherence (30%).
Tool Definition Quality measures how well each tool describes itself to AI agents. Every tool receives a Tool Definition Quality Score (TDQS) from 1 to 5, weighted across six dimensions: Purpose Clarity (25%), Usage Guidelines (20%), Behavioral Transparency (20%), Parameter Semantics (15%), Conciseness & Structure (10%), and Contextual Completeness (10%). The server-level definition quality score is 60% of the mean TDQS across all tools plus 40% of the minimum TDQS, so a single poorly described tool pulls the score down.
Server Coherence evaluates how well the tools work together as a set, scoring four dimensions equally: Disambiguation (can agents tell tools apart?), Naming Consistency, Tool Count Appropriateness, and Completeness (are there gaps in the tool surface?).
Tiers are derived from the overall score: A (≥3.5), B (≥3.0), C (≥2.0), D (≥1.0), F (<1.0). B and above is considered passing.
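The formula above can be made concrete with a short Python sketch. It encodes the stated weights and tier thresholds and applies them to the dimension scores listed under Tool Scores; the helper names are illustrative, and the mapping of each score card to a tool name is inferred from the card's wording. Under this reading of the formula, the per-tool scores average about 3.34, consistent with the reported 3.3/5. The overall score the sketch prints is derived here and is not a figure published on this page.

```python
# Tool Definition Quality weights per dimension, as stated above.
TDQS_WEIGHTS = {
    "purpose": 0.25,       # Purpose Clarity
    "usage": 0.20,         # Usage Guidelines
    "behavior": 0.20,      # Behavioral Transparency
    "parameters": 0.15,    # Parameter Semantics
    "conciseness": 0.10,   # Conciseness & Structure
    "completeness": 0.10,  # Contextual Completeness
}


def tdqs(scores: dict) -> float:
    """Weighted 1-5 Tool Definition Quality Score for one tool."""
    return sum(weight * scores[dim] for dim, weight in TDQS_WEIGHTS.items())


def definition_quality(per_tool: list) -> float:
    """Server-level definition quality: 60% mean TDQS + 40% minimum TDQS."""
    return 0.6 * (sum(per_tool) / len(per_tool)) + 0.4 * min(per_tool)


def coherence(disambiguation: float, naming: float, tool_count: float, completeness: float) -> float:
    """Server Coherence: four dimensions weighted equally."""
    return (disambiguation + naming + tool_count + completeness) / 4


def overall(per_tool: list, coherence_score: float) -> float:
    """70% definition quality + 30% server coherence."""
    return 0.7 * definition_quality(per_tool) + 0.3 * coherence_score


def tier(score: float) -> str:
    """Map an overall score to a letter tier using the stated thresholds."""
    for threshold, letter in ((3.5, "A"), (3.0, "B"), (2.0, "C"), (1.0, "D")):
        if score >= threshold:
            return letter
    return "F"


# Dimension scores as listed under Tool Scores (tool names inferred from each card).
TOOLS = {
    "download_paper": dict(purpose=4, usage=2, behavior=2, parameters=3, conciseness=5, completeness=2),
    "read_paper": dict(purpose=4, usage=2, behavior=2, parameters=3, conciseness=5, completeness=2),
    "list_papers": dict(purpose=4, usage=2, behavior=2, parameters=4, conciseness=5, completeness=2),
    "search_papers": dict(purpose=5, usage=5, behavior=4, parameters=4, conciseness=3, completeness=4),
}

per_tool = {name: tdqs(scores) for name, scores in TOOLS.items()}
mean_tdqs = sum(per_tool.values()) / len(per_tool)
score = overall(list(per_tool.values()), coherence(5, 5, 5, 5))

print({name: round(value, 2) for name, value in per_tool.items()})
print(f"mean TDQS {mean_tdqs:.1f}, overall {score:.2f}, tier {tier(score)}")
```

The minimum term is what makes the two weakest tools matter: raising download_paper and read_paper lifts both the mean and the 40% minimum component.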
Tool Scores
download_paper
- Behavior: 2/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions 'create a resource' but doesn't specify what that entails (e.g., file format, storage location, or permissions required). It also omits details like rate limits, error handling, or whether the download is destructive or idempotent, leaving significant gaps.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
- Conciseness: 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It is front-loaded and wastes no space, making it easy to parse quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
- Completeness: 2/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of a download operation with no annotations and no output schema, the description is insufficient. It doesn't explain the return values, error conditions, or the nature of the created resource, leaving the agent with incomplete information to use the tool effectively in context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
- Parameters: 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the input schema already documents both parameters ('paper_id' and 'check_status') thoroughly. The description adds nothing beyond the schema (for example, it does not explain how the two parameters interact or what creating the 'resource' involves), so it meets the baseline score of 3 for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
- Purpose: 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('download a paper') and the outcome ('create a resource for it'), which is specific and actionable. However, it doesn't explicitly differentiate from sibling tools like 'read_paper' or 'list_papers', which might have overlapping or related functionality, preventing a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
- Usage Guidelines: 2/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'read_paper' or 'search_papers'. It lacks context about prerequisites, such as needing a valid arXiv ID, or exclusions, such as when 'check_status' might be preferred over full download.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
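The Behavior rationale for download_paper notes that it ships without annotations, so its description carries all of the behavioral disclosure. As a hedged illustration of what structured annotations could add, the sketch below uses the hint names from the MCP specification's `ToolAnnotations` object (`readOnlyHint`, `destructiveHint`, `idempotentHint`, `openWorldHint`) and applies the spec's documented defaults when a hint is missing. The hint values for download_paper are hypothetical, not taken from this server, and `side_effect_summary` is an illustrative helper. The specification treats annotations as untrusted hints, so they complement a clear description rather than replace it.

```python
# Hint names follow the MCP ToolAnnotations object; the values are hypothetical.
download_paper_annotations = {
    "title": "Download arXiv paper",
    "readOnlyHint": False,     # assumed: the tool creates a stored resource
    "destructiveHint": False,  # assumed: downloading does not delete existing data
    "idempotentHint": True,    # assumed: repeating a download yields the same stored paper
    "openWorldHint": True,     # assumed: the tool contacts arXiv, an external system
}


def side_effect_summary(annotations: dict) -> list:
    """Translate annotation hints into plain statements, using the spec's defaults."""
    notes = []
    # readOnlyHint defaults to False.
    if annotations.get("readOnlyHint", False):
        notes.append("does not modify its environment")
    else:
        notes.append("may modify its environment")
        # destructiveHint (default True) and idempotentHint (default False)
        # are only meaningful for tools that are not read-only.
        if annotations.get("destructiveHint", True):
            notes.append("may perform destructive updates")
        if annotations.get("idempotentHint", False):
            notes.append("repeated calls with the same arguments have no extra effect")
    # openWorldHint defaults to True.
    if annotations.get("openWorldHint", True):
        notes.append("interacts with external systems")
    return notes


print(side_effect_summary(download_paper_annotations))
```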
read_paper
- Behavior: 2/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, the description carries the full burden of behavioral disclosure. It states the tool reads content in markdown format, but does not mention permissions, rate limits, error handling, or what 'stored paper' entails (e.g., local cache vs. remote source). This leaves significant gaps for a read operation.
- Conciseness: 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core action ('read the full content') and includes key details (resource, format). There is no wasted verbiage, making it highly concise and well-structured.
- Completeness: 2/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no annotations and no output schema, the description is incomplete. It lacks details on return values (e.g., structure of markdown content), error cases, or behavioral traits like caching. For a tool with one parameter but undefined output, this leaves the agent with insufficient context.
- Parameters: 3/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with the parameter 'paper_id' documented as 'The arXiv ID of the paper to read'. The description adds nothing beyond this, such as format examples or constraints, so it meets the baseline score of 3 for high schema coverage but earns nothing more.
- Purpose: 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('read the full content'), the resource ('stored paper'), and the output format ('in markdown format'). This implicitly separates it from siblings such as 'download_paper' (likely for file retrieval) and 'list_papers'/'search_papers' (for listing and searching), but the text never contrasts them explicitly.
- Usage Guidelines: 2/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives like 'download_paper' or 'search_papers'. The description implies usage for reading content, but lacks explicit context, prerequisites, or exclusions, leaving the agent to infer based on tool names alone.
list_papers
- Behavior: 2/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It states the tool lists papers but doesn't mention any behavioral traits such as pagination, sorting, filtering, rate limits, or what 'available as resources' entails. This leaves significant gaps for a tool that likely returns a list of items.
- Conciseness: 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, clear sentence with no wasted words. It's front-loaded with the core purpose and appropriately sized for a simple list tool, making it highly efficient.
- Completeness: 2/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of annotations and output schema, the description is incomplete. It doesn't explain what 'papers' or 'resources' mean, the format of the returned list, or any constraints (e.g., access limits). For a tool with no structured metadata, more context is needed to be fully helpful.
- Parameters: 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The tool takes no parameters, so its empty input schema counts as 100% description coverage. With no parameter semantics to add, the description receives the baseline score of 4 for parameterless tools and introduces no confusion or omissions.
- Purpose: 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('List') and resource ('all existing papers available as resources'), providing a specific verb+resource combination. However, it doesn't differentiate from sibling tools like 'search_papers' or 'read_paper', which would be needed for a score of 5.
- Usage Guidelines: 2/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'search_papers' or 'download_paper'. It lacks context about usage scenarios, exclusions, or prerequisites, leaving the agent with minimal direction.
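The Parameters rationales lean on "schema description coverage", and the list_papers card counts an empty schema as fully covered. A minimal sketch of that metric, assuming it is the share of top-level input properties that carry a `description` (the exact computation is not documented on this page); `description_coverage` is a hypothetical helper. The read_paper description string is quoted from its rationale, while the property types are assumptions.

```python
def description_coverage(input_schema: dict) -> float:
    """Fraction of top-level input properties that carry a non-empty description.

    A schema with no properties is treated as 100% covered, matching the
    list_papers rationale.
    """
    properties = input_schema.get("properties", {})
    if not properties:
        return 1.0
    described = sum(1 for spec in properties.values() if spec.get("description"))
    return described / len(properties)


# list_papers takes no parameters.
LIST_PAPERS_SCHEMA = {"type": "object", "properties": {}}

# read_paper's documented parameter; the "string" type is assumed.
READ_PAPER_SCHEMA = {
    "type": "object",
    "properties": {
        "paper_id": {"type": "string", "description": "The arXiv ID of the paper to read"},
    },
}

print(description_coverage(LIST_PAPERS_SCHEMA), description_coverage(READ_PAPER_SCHEMA))
```

Full coverage only sets the baseline (3 for documented parameters, 4 for no parameters); higher Parameters scores, like search_papers' 4/5, come from the description adding meaning the schema lacks.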
search_papers
- Behavior: 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: result sorting (by relevance, not just date), query optimization techniques, and the impact of parameters like categories on relevance. However, it lacks details on rate limits, error handling, or authentication needs, which are common for API tools. The description doesn't contradict any annotations (none exist), so no contradiction flag is triggered.
- Conciseness: 3/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose, but it's lengthy with multiple sections (guidelines, patterns, filtering, examples, tips). While each section adds value, it could be more concise by integrating some details (e.g., merging examples into guidelines). The structure is logical but verbose, making it less efficient for quick scanning by an AI agent compared to a tighter presentation.
- Completeness: 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (6 parameters, no output schema, no annotations), the description is largely complete: it covers purpose, usage, parameters, and behavioral traits. However, it lacks output details (e.g., result format, pagination) and doesn't address potential errors or limits, leaving some gaps. The rich parameter explanations compensate partially, but for a search tool with no output schema, more on return values would enhance completeness.
- Parameters: 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the baseline is 3. The description adds significant value beyond the schema by explaining parameter semantics in depth: it provides query construction guidelines with examples, clarifies date filtering formats and use cases, details category options with relevance impacts, and explains sort_by implications ('relevance' vs. 'date'). This goes well beyond the schema's basic descriptions, though it doesn't cover all parameters equally (e.g., max_results gets less attention).
- Purpose: 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description immediately states 'Search for papers on arXiv with advanced filtering and query optimization,' which clearly specifies the verb (search), resource (papers on arXiv), and scope (advanced filtering/optimization). It distinguishes from sibling tools like 'list_papers' (likely simpler listing) and 'download_paper'/'read_paper' (post-search actions), making the purpose specific and differentiated.
- Usage Guidelines: 5/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool vs. alternatives through detailed query construction guidelines, category filtering recommendations, and examples. It implicitly positions this as the primary search tool for arXiv papers with advanced capabilities, contrasting with simpler sibling tools like 'list_papers' that likely lack such filtering. The tips for foundational research further clarify usage contexts.
GitHub Badge
Glama performs regular codebase and documentation scans to:
- Confirm that the MCP server is working as expected.
- Confirm that there are no obvious security issues.
- Evaluate tool definition quality.
Our badge communicates server capabilities, safety, and installation instructions.
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/blazickjp/arxiv-mcp-server'
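The same lookup can be scripted. The following standard-library Python sketch builds the URL from what appears to be the server's GitHub owner and repository name, then decodes the JSON response without assuming any particular fields. `server_url` and `fetch_server` are illustrative names, not a published client.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers/"


def server_url(owner: str, repo: str) -> str:
    """Build the directory API URL for a server identified by owner/repo."""
    return API_BASE + urllib.parse.quote(owner) + "/" + urllib.parse.quote(repo)


def fetch_server(owner: str, repo: str, timeout: float = 10.0):
    """GET the server record and return the decoded JSON body as-is."""
    request = urllib.request.Request(server_url(owner, repo), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server("blazickjp", "arxiv-mcp-server")` issues the same GET request as the curl command; inspect the returned object before relying on any of its keys.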
If you have feedback or need assistance with the MCP directory API, please join our Discord server.