Jina AI Remote MCP Server

Official
by jina-ai

search_arxiv

Search academic papers and preprints on arXiv to find research papers, scientific studies, and technical literature across fields like AI, physics, and mathematics.

Instructions

Search academic papers and preprints on arXiv repository. Perfect for finding research papers, scientific studies, technical papers, and academic literature. Use this when researching scientific topics, looking for papers by specific authors, or finding the latest research in fields like AI, physics, mathematics, computer science, etc.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| query | Yes | Academic search terms, author names, or research topics (e.g., 'transformer neural networks', 'Einstein relativity', 'machine learning optimization'). Can be a single query string or an array of queries for parallel search. | |
| num | No | Maximum number of academic papers to return, between 1-100 | 30 |
| tbs | No | Time-based search parameter, e.g., 'qdr:h' for past hour, can be qdr:h, qdr:d, qdr:w, qdr:m, qdr:y | |
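
The constraints in the schema descriptions can also be checked on the client before calling the tool. The following is a minimal sketch only: `SearchArxivInput`, `TBS_VALUES`, and `validateSearchArxivInput` are hypothetical names that do not appear in the server's source, and treating `num` as an integer is an assumption, since the schema description only says "between 1-100".

```typescript
// Hypothetical client-side check; none of these names come from the server.
interface SearchArxivInput {
  query: string | string[];
  num?: number;
  tbs?: string;
}

// The five time filters listed in the tbs description.
const TBS_VALUES: readonly string[] = ["qdr:h", "qdr:d", "qdr:w", "qdr:m", "qdr:y"];

// Returns a list of human-readable problems; an empty list means the input looks valid.
function validateSearchArxivInput(input: SearchArxivInput): string[] {
  const problems: string[] = [];
  const queries = Array.isArray(input.query) ? input.query : [input.query];

  if (queries.length === 0 || queries.some((q) => q.trim() === "")) {
    problems.push("query must be a non-empty string or a non-empty array of non-empty strings");
  }
  // Assumption: num is meant to be an integer within the documented 1-100 range.
  if (input.num !== undefined && (!Number.isInteger(input.num) || input.num < 1 || input.num > 100)) {
    problems.push("num must be an integer between 1 and 100");
  }
  if (input.tbs !== undefined && !TBS_VALUES.includes(input.tbs)) {
    problems.push("tbs must be one of " + TBS_VALUES.join(", "));
  }
  return problems;
}
```

For example, `validateSearchArxivInput({ query: "Einstein relativity", num: 250 })` reports a single problem, because `num` falls outside the documented range, while `{ query: ["transformer neural networks", "machine learning optimization"], tbs: "qdr:m" }` passes.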

Implementation Reference

  • MCP tool handler for 'search_arxiv': registers the tool with Zod schema for input validation, handles single or multiple queries (parallel execution), authenticates with bearer token, and delegates to executeArxivSearch utility for actual API call.
    if (isToolEnabled("search_arxiv")) {
    	server.tool(
    		"search_arxiv",
    		"Search academic papers and preprints on arXiv repository. Perfect for finding research papers, scientific studies, technical papers, and academic literature. Use this when researching scientific topics, looking for papers by specific authors, or finding the latest research in fields like AI, physics, mathematics, computer science, etc.",
    		{
    			query: z.union([z.string(), z.array(z.string())]).describe("Academic search terms, author names, or research topics (e.g., 'transformer neural networks', 'Einstein relativity', 'machine learning optimization'). Can be a single query string or an array of queries for parallel search."),
    			num: z.number().default(30).describe("Maximum number of academic papers to return, between 1-100"),
    			tbs: z.string().optional().describe("Time-based search parameter, e.g., 'qdr:h' for past hour, can be qdr:h, qdr:d, qdr:w, qdr:m, qdr:y")
    		},
    		async ({ query, num, tbs }: { query: string | string[]; num: number; tbs?: string }) => {
    			try {
    				const props = getProps();
    
    				const tokenError = checkBearerToken(props.bearerToken);
    				if (tokenError) {
    					return tokenError;
    				}
    
    				// Handle single query or single-element array
    				if (typeof query === 'string' || (Array.isArray(query) && query.length === 1)) {
    					const singleQuery = typeof query === 'string' ? query : query[0];
    					const searchResult = await executeArxivSearch({ query: singleQuery, num, tbs }, props.bearerToken);
    
    					return {
    						content: formatSingleSearchResultToContentItems(searchResult),
    					};
    				}
    
    				// Handle multiple queries with parallel search
    				if (Array.isArray(query) && query.length > 1) {
    					const searches = query.map(q => ({ query: q, num, tbs }));
    
    					const uniqueSearches = searches.filter((search, index, self) =>
    						index === self.findIndex(s => s.query === search.query)
    					);
    
    					const arxivSearchFunction = async (searchArgs: SearchArxivArgs) => {
    						return executeArxivSearch(searchArgs, props.bearerToken);
    					};
    
    					const results = await executeParallelSearches(uniqueSearches, arxivSearchFunction, { timeout: 30000 });
    
    					return {
    						content: formatParallelSearchResultsToContentItems(results),
    					};
    				}
    
    				return createErrorResponse("Invalid query format");
    			} catch (error) {
    				return createErrorResponse(`Error: ${error instanceof Error ? error.message : String(error)}`);
    			}
    		},
    	);
    }
  • Core helper function implementing the arXiv search logic: makes HTTP POST to Jina AI search API (svip.jina.ai) with domain='arxiv', query, num results, and optional tbs filter, handles errors, returns results or error.
    export async function executeArxivSearch(
        searchArgs: SearchArxivArgs,
        bearerToken: string
    ): Promise<SearchResultOrError> {
        try {
            const response = await fetch('https://svip.jina.ai/', {
                method: 'POST',
                headers: {
                    'Accept': 'application/json',
                    'Content-Type': 'application/json',
                    'Authorization': `Bearer ${bearerToken}`,
                },
                body: JSON.stringify({
                    q: searchArgs.query,
                    domain: 'arxiv',
                    num: searchArgs.num || 30,
                    ...(searchArgs.tbs && { tbs: searchArgs.tbs })
                }),
            });
    
            if (!response.ok) {
                return { error: `arXiv search failed for query "${searchArgs.query}": ${response.statusText}` };
            }
    
            const data = await response.json() as any;
            return { query: searchArgs.query, results: data.results || [] };
        } catch (error) {
            return { error: `arXiv search failed for query "${searchArgs.query}": ${error instanceof Error ? error.message : String(error)}` };
        }
    }
  • TypeScript interface defining the input schema for arXiv search arguments, used by the handler and helpers.
    export interface SearchArxivArgs {
        query: string;
        num?: number;
        tbs?: string;
    }
  • src/index.ts:99-102 (registration)
    Top-level registration call to registerJinaTools which includes the search_arxiv tool among others, with optional enabledTools filter.
    // Register all Jina AI tools with optional filtering
    registerJinaTools(server, () => currentProps, enabledTools);
    
    return server;
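
The handler above deduplicates array queries and passes them to `executeParallelSearches` with a 30-second timeout, but the source of that helper is not shown on this page. The sketch below illustrates one way such a step could work. `dedupeByQuery` and `runParallelWithTimeout` are hypothetical names, the `SearchResultOrError` shape is inferred from the values `executeArxivSearch` returns, and applying the timeout to each search individually is an assumption rather than the confirmed behavior.

```typescript
// Shape inferred from what executeArxivSearch returns in the excerpt above.
type SearchResultOrError = { query: string; results: unknown[] } | { error: string };

interface SearchArxivArgs {
  query: string;
  num?: number;
  tbs?: string;
}

// Same first-occurrence filter the handler uses: keep the first search per query string.
function dedupeByQuery(searches: SearchArxivArgs[]): SearchArxivArgs[] {
  return searches.filter((s, i, self) => i === self.findIndex((o) => o.query === s.query));
}

// Hypothetical stand-in for executeParallelSearches: runs every search concurrently
// and turns any search that exceeds timeoutMs into an error entry instead of throwing.
async function runParallelWithTimeout(
  searches: SearchArxivArgs[],
  searchFn: (args: SearchArxivArgs) => Promise<SearchResultOrError>,
  timeoutMs: number,
): Promise<SearchResultOrError[]> {
  return Promise.all(
    searches.map(async (args) => {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<SearchResultOrError>((resolve) => {
        timer = setTimeout(
          () => resolve({ error: `Timed out after ${timeoutMs} ms for query "${args.query}"` }),
          timeoutMs,
        );
      });
      try {
        // Whichever settles first wins: the real search or the timeout entry.
        return await Promise.race([searchFn(args), timeout]);
      } finally {
        clearTimeout(timer); // avoid leaving a pending timer behind
      }
    }),
  );
}
```

Resolving timeouts to `{ error }` entries rather than rejecting keeps one slow query from discarding the results of the others, which matches the error-as-value style `executeArxivSearch` already uses.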
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions the tool is 'perfect for finding research papers' and lists use cases, but doesn't disclose behavioral traits like rate limits, authentication needs, pagination behavior, error handling, or what the return format looks like. For a search tool with no annotation coverage, this leaves significant gaps in understanding how it behaves.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized at three sentences: the first states the purpose, and the other two give usage guidance. It is front-loaded with the core function and avoids unnecessary repetition. Every sentence adds value, though the examples could be integrated more tightly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description provides good purpose and usage context but lacks behavioral details (e.g., return format, error cases). For a search tool with 3 parameters and 100% schema coverage, it's adequate but has clear gaps in transparency that reduce completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema (e.g., it doesn't explain query syntax further or provide examples of tbs values). Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches academic papers and preprints on arXiv repository, specifying the resource (arXiv repository) and verb (search). It distinguishes from siblings like search_web, search_images, and search_ssrn by focusing specifically on academic/scientific content, making the purpose specific and differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use it: 'when researching scientific topics, looking for papers by specific authors, or finding the latest research in fields like AI, physics, mathematics, computer science, etc.' It doesn't explicitly state when NOT to use it or name alternatives (e.g., parallel_search_arxiv), but the context is sufficiently detailed for informed usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jina-ai/MCP'
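
The same request can be made from TypeScript. This is a rough sketch under stated assumptions: `serverUrl` and `fetchServerInfo` are hypothetical helper names, the `/servers/{owner}/{name}` path pattern is inferred only from the single URL in the curl command, the `Accept` header is an assumption, and the response is typed as `unknown` because its shape is not documented here.

```typescript
// Base path taken from the curl example above.
const MCP_API_BASE = "https://glama.ai/api/mcp/v1";

// Builds the server detail URL; the owner/name path pattern is an assumption.
function serverUrl(owner: string, name: string): string {
  return `${MCP_API_BASE}/servers/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Fetches the server record; the response shape is not documented on this page,
// so it is returned as unknown and left for the caller to inspect.
async function fetchServerInfo(owner: string, name: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, name), {
    method: "GET",
    headers: { Accept: "application/json" },
  });
  if (!response.ok) {
    throw new Error(`MCP directory request failed: ${response.status} ${response.statusText}`);
  }
  return response.json();
}
```

Calling `fetchServerInfo("jina-ai", "MCP")` requests the same URL as the curl command.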

If you have feedback or need assistance with the MCP directory API, please join our Discord server.