arxiv-mcp-server
Server Details
Search arXiv, fetch paper metadata, and read full-text content.
- Status: Healthy
- Last Tested
- Transport: Streamable HTTP
- URL
- Repository: cyanheads/arxiv-mcp-server
- GitHub Stars: 1
- Server Listing: arxiv-mcp-server
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 4.3/5 across 4 of 4 tools scored.
Each tool serves a distinct purpose: getting metadata by ID, listing categories, reading full text, and searching. No overlap in functionality.
All tools share the 'arxiv_' prefix and follow a verb_noun pattern (get_metadata, list_categories, read_paper); arxiv_search uses the verb alone.
Four tools is appropriate for an arXiv interface: search, metadata retrieval, full-text reading, and category discovery. No unnecessary bloat.
Covers the core arXiv operations: searching, metadata retrieval, full-text access, and category listing. No obvious missing functionality for a read-only research tool.
Available Tools
4 tools
arxiv_get_metadata · Arxiv Get Metadata · A · Read-only
Get full metadata for one or more arXiv papers by ID. Use when you have known IDs from citations, prior search results, or memory.
| Name | Required | Description | Default |
|---|---|---|---|
| paper_ids | Yes | arXiv paper ID or array of up to 10 IDs. Format: "2401.12345" or "2401.12345v2" (with version). Also accepts legacy IDs like "hep-th/9901001". | |
Output Schema
| Name | Required | Description |
|---|---|---|
| papers | Yes | Papers found. May be fewer than requested if some IDs are invalid. |
| not_found | No | Per-input explanations for inputs that could not be returned. Absent when nothing failed. |
| totalSucceeded | Yes | Number of successful items in 'papers' |
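Because paper_ids accepts at most 10 IDs per call, a longer list has to be split before calling the tool. The sketch below shows one way to do that in Python. The regular expressions are assumptions inferred from the example IDs in the table above, not the server's own validation rules, and the helper names are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Assumed patterns, inferred from the documented examples:
# new-style "2401.12345" / "2401.12345v2" and legacy "hep-th/9901001".
NEW_STYLE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
LEGACY = re.compile(r"^[a-z-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")

MAX_IDS_PER_CALL = 10  # documented limit for paper_ids


def looks_like_arxiv_id(paper_id: str) -> bool:
    """Return True when the string resembles a new-style or legacy arXiv ID."""
    return bool(NEW_STYLE.match(paper_id) or LEGACY.match(paper_id))


def batch_paper_ids(paper_ids: Iterable[str]) -> Iterator[dict]:
    """Yield argument objects of at most 10 IDs each, skipping malformed IDs."""
    valid = [p for p in paper_ids if looks_like_arxiv_id(p)]
    for i in range(0, len(valid), MAX_IDS_PER_CALL):
        yield {"paper_ids": valid[i:i + MAX_IDS_PER_CALL]}
```

Each yielded dictionary can be passed as the arguments of one arxiv_get_metadata call. The server may still report IDs in not_found, so the response should be checked rather than assumed complete.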
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, so no contradiction. Description adds 'full metadata' but is otherwise consistent; no need for further behavioral details.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, no wasted words, key information front-loaded: 'Get full metadata... by ID.' Efficient.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given an output schema exists and annotations cover safety, the description provides purpose, usage context, and parameter meaning sufficiently for a simple lookup tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% describing the paper_ids parameter with examples and format. Description adds no new parameter details beyond usage context.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it retrieves full metadata for arXiv papers by ID, distinguishing it from sibling tools like arxiv_search (no ID required) and arxiv_read_paper (reads paper content).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly specifies when to use: 'when you have known IDs from citations, prior search results, or memory', implying alternatives for unknown IDs.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arxiv_list_categories · Arxiv List Categories · A · Read-only
List arXiv category codes and names. Useful for discovering valid category filters for arxiv_search.
| Name | Required | Description | Default |
|---|---|---|---|
| group | No | Filter by top-level group (e.g., "cs", "math", "physics"). Returns all categories if omitted. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| notice | No | Guidance when the group filter returns no categories. |
| categories | Yes | arXiv categories matching the filter. |
| totalCount | Yes | Total number of categories returned. |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, and the description adds no additional behavioral traits beyond the fact that it lists categories. The description does not mention any edge cases or further behavioral characteristics.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, front-loaded with the main purpose followed by usage guidance, no wasted words.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity, the description covers the function and use case adequately. An output schema exists, so explaining return values is not necessary.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with parameter fully described in schema. The description adds no extra meaning beyond what the schema provides, meeting the baseline.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the specific verb 'List' and the resource 'arXiv category codes and names', clearly distinguishing it from sibling tools like arxiv_search or arxiv_read_paper.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states that the tool is 'Useful for discovering valid category filters for arxiv_search', providing clear context and a direct use case.
arxiv_read_paper · Arxiv Read Paper · A · Read-only
Fetch the full text of an arXiv paper as HTML. Tries arxiv.org/html first; falls back to ar5iv.labs.arxiv.org when the native render is unavailable. PDF-only papers (no HTML render on either source) return an html_unavailable error with the pdf_url for direct download. Page through long papers with the start and max_characters parameters.
| Name | Required | Description | Default |
|---|---|---|---|
| start | No | Character offset into the cleaned body to begin reading from. Defaults to 0. Use with max_characters to page through long papers — e.g., start=100000 with max_characters=100000 returns chars 100,000–199,999. The total length is reported as body_characters in the response. | |
| paper_id | Yes | arXiv paper ID (e.g., "2401.12345" or "2401.12345v2"). | |
| max_characters | No | Maximum characters of paper body content to return. Defaults to 100,000. HTML head/boilerplate is stripped before counting. When truncated, a notice and total character count are included. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| start | Yes | Character offset of the first character in content within the cleaned body. |
| title | Yes | Paper title (from metadata, not parsed from HTML). |
| source | Yes | Which HTML source the content was fetched from. |
| content | Yes | Cleaned paper body HTML for the requested slice. Empty when start is past body_characters. |
| pdf_url | Yes | Direct PDF download URL. |
| paper_id | Yes | arXiv paper ID. |
| truncated | Yes | True when more body content exists past this slice (start + content.length < body_characters). |
| abstract_url | Yes | arXiv abstract page URL for attribution. |
| body_characters | Yes | Character count of the full cleaned body HTML. Use with start and max_characters to page. Typically 3-4× smaller than total_characters for math-heavy papers. |
| total_characters | Yes | Character count of the original unprocessed HTML body. |
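The paging behavior described by start, max_characters, and truncated can be sketched as a simple loop. In this Python sketch, `call_tool` is a hypothetical callable standing in for whatever MCP client function sends a tool call and returns the structured result as a dictionary; only the field names come from the schemas above.

```python
from typing import Callable, Iterator


def read_all_slices(
    call_tool: Callable[[str, dict], dict],  # hypothetical MCP client hook
    paper_id: str,
    max_characters: int = 100_000,
) -> Iterator[str]:
    """Yield successive content slices until the response reports truncated=False."""
    start = 0
    while True:
        result = call_tool(
            "arxiv_read_paper",
            {"paper_id": paper_id, "start": start, "max_characters": max_characters},
        )
        yield result["content"]
        # Stop when nothing remains, or when an empty slice signals we overshot.
        if not result["truncated"] or not result["content"]:
            break
        # The next slice begins right after the characters just received.
        start = result["start"] + len(result["content"])
```

Advancing by the length of the returned content, rather than by max_characters, keeps the loop correct even if a slice comes back shorter than requested.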
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond the readOnlyHint annotation, the description elaborates on fallback mechanism, error handling for PDF-only papers (returns pdf_url), paging via start and max_characters, and stripping of boilerplate. This provides rich behavioral context.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four well-structured sentences: main purpose, fallback, error case, paging. Every sentence adds value, no fluff. Front-loaded with core action.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description covers all necessary aspects: what it does, fallback, error case, paging, and output schema exists for return values. An agent can correctly select and invoke this tool.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so baseline is 3. The description adds paging guidance beyond schema, but the schema already explains each parameter. No significant extra meaning added.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'Fetch the full text of an arXiv paper as HTML,' using a specific verb and resource. The tool's purpose is distinct from siblings (metadata, categories, search), so it differentiates effectively.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explains when to use the tool (to fetch full text), details fallback behavior (arxiv.org/html then ar5iv), and describes paging, but does not explicitly compare with siblings or state when not to use. Still clear and helpful.
arxiv_search · Arxiv Search · A · Read-only
Search arXiv papers by query with category and sort filters. Returns paper metadata including title, authors, abstract, categories, and links.
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Search query. Field prefixes: ti: (title), au: (author — token-based; quote multi-token names like au:"hinton g" or pair with a topical clause to disambiguate common surnames), abs: (abstract), cat: (category — exact code match, not fuzzy), co: (comment), jr: (journal ref), all: (all fields). Boolean operators: AND, OR, ANDNOT. Examples: "au:bengio AND ti:attention", "all:transformer AND cat:cs.CL". | |
| start | No | Pagination offset (0-10000). Use with max_results to page through results. E.g., start=10 with max_results=10 returns results 11-20. | |
| sort_by | No | Sort criterion. Use "submitted" for newest papers, "relevance" for best query matches. | relevance |
| category | No | Filter results to a specific arXiv category (e.g., "cs.CL", "math.AG"). Use arxiv_list_categories to discover valid codes. | |
| sort_order | No | Sort direction. "descending" returns newest/most relevant first. | descending |
| max_results | No | Maximum results to return (1-50). Default 10. Each result includes title, authors, abstract, and metadata — keep low to limit response size. | |
Output Schema
| Name | Required | Description |
|---|---|---|
| notice | No | Recovery guidance when results are empty or paging overshot. Absent on successful pages. |
| papers | Yes | Matching papers with full metadata. |
| pageStart | Yes | Pagination offset of this result page. |
| totalFound | Yes | Total matching papers reported by arXiv (before pagination). |
| effectiveQuery | Yes | The query as sent to arXiv after input normalization. |
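The query syntax and the documented ranges for start and max_results can be combined into a small argument builder. This Python sketch only assembles the argument object. The function names are hypothetical, and joining every clause with AND is a simplifying assumption (the tool also accepts OR and ANDNOT).

```python
from typing import Iterable, Optional, Tuple


def field_clause(prefix: str, value: str) -> str:
    """Format one clause such as ti:attention, quoting multi-token values."""
    value = value.strip()
    if " " in value:
        value = f'"{value}"'
    return f"{prefix}:{value}"


def build_search_args(
    clauses: Iterable[Tuple[str, str]],
    start: int = 0,
    max_results: int = 10,
    sort_by: str = "relevance",
    category: Optional[str] = None,
) -> dict:
    """Build arxiv_search arguments, enforcing the documented numeric ranges."""
    parts = [field_clause(prefix, value) for prefix, value in clauses]
    if not parts:
        raise ValueError("at least one clause is required")
    if not 0 <= start <= 10_000:
        raise ValueError("start must be between 0 and 10000")
    if not 1 <= max_results <= 50:
        raise ValueError("max_results must be between 1 and 50")
    args = {
        "query": " AND ".join(parts),  # assumption: all clauses are ANDed
        "start": start,
        "max_results": max_results,
        "sort_by": sort_by,
    }
    if category is not None:
        args["category"] = category
    return args
```

For example, `build_search_args([("au", "bengio"), ("ti", "attention")])` produces the query `au:bengio AND ti:attention`, matching the first example in the parameter table.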
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, and the description aligns with a read-only search. No additional behavioral context (e.g., rate limits, API specifics) is provided beyond schema constraints, but it doesn't contradict annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two concise sentences, front-loaded with the action and resource, and contains no superfluous words.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the rich schema with 100% parameter coverage and output schema, the description provides sufficient context for the search functionality. It could mention pagination behavior but is generally complete.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with detailed descriptions for each parameter. The tool description adds no extra parameter meaning beyond what the schema already provides.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'search', the resource 'arXiv papers', and the filtering capabilities (query, category, sort). It distinguishes from sibling tools like arxiv_get_metadata and arxiv_list_categories by focusing on query-based search.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for searching papers but does not explicitly provide when-not-to-use or contrast with siblings. However, sibling names make the differentiation clear.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}
The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
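As a rough illustration, the file can be generated with a few lines of Python. The `web_root` argument is an assumption: it stands for whichever directory your server exposes at the root of its domain, which this page does not specify. The function name is hypothetical.

```python
import json
from pathlib import Path


def write_glama_json(web_root: Path, maintainer_emails: list[str]) -> Path:
    """Write .well-known/glama.json under web_root and return its path."""
    if not maintainer_emails:
        raise ValueError("at least one maintainer email is required")
    document = {
        "$schema": "https://glama.ai/mcp/schemas/connector.json",
        "maintainers": [{"email": email} for email in maintainer_emails],
    }
    target = web_root / ".well-known" / "glama.json"
    # Create the .well-known directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(document, indent=2) + "\n", encoding="utf-8")
    return target
```

Whether the file is actually reachable at /.well-known/glama.json depends on how your web server maps `web_root`, so it is worth fetching the URL once after publishing.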
- Control your server's listing on Glama, including description and metadata
- Access analytics and receive server usage reports
- Get monitoring and health status updates for your server
- Feature your server to boost visibility and reach more users
For users:
- Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
- Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
- Centralized credential management – store and rotate API keys and OAuth tokens in one place
- Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
- Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
- Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
- Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
- The server is experiencing an outage
- The URL of the server is wrong
- Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.