search_arxiv
Find academic papers on arXiv by entering a search query. Customize results by specifying the maximum number of papers to retrieve. Ideal for researchers and students.
Instructions
Search arXiv for academic papers
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| maxResults | No | Maximum results to return (values above 50 are capped at 50) | 10 |
| query | Yes | Search query | |
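
As a rough illustration of how the two inputs interact, the sketch below mirrors the `Math.min(args.maxResults || 10, 50)` expression from the handler listed under Implementation Reference. The `SearchArxivArgs` interface and the `effectiveMaxResults` name are assumptions made for this example and do not appear in the source.

```typescript
// Hypothetical argument shape mirroring the input schema table above.
interface SearchArxivArgs {
  query: string;
  maxResults?: number;
}

// Mirrors the handler's expression: a missing (or zero) value falls back
// to 10, and anything above 50 is capped at 50.
function effectiveMaxResults(args: SearchArxivArgs): number {
  return Math.min(args.maxResults || 10, 50);
}

const exampleCalls: SearchArxivArgs[] = [
  { query: 'graph neural networks' },
  { query: 'graph neural networks', maxResults: 5 },
  { query: 'graph neural networks', maxResults: 200 },
];

for (const args of exampleCalls) {
  console.log(JSON.stringify(args), '->', effectiveMaxResults(args));
}
```

Because the fallback uses `||`, passing `maxResults: 0` behaves the same as omitting the field.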
Implementation Reference
- src/tools/academic/simple.ts:14-194 (registration) Registers the 'search_arxiv' tool in the ToolRegistry, including name, description, schema, and execute handler.

  ```typescript
  registry.registerTool({
    name: 'search_arxiv',
    description: 'Search arXiv for academic papers',
    category: 'academic',
    source: 'arxiv.org',
    inputSchema: {
      type: 'object',
      properties: {
        query: { type: 'string', description: 'Search query' },
        maxResults: { type: 'number', description: 'Maximum results to return' }
      },
      required: ['query']
    },
    execute: async (args: ToolInput): Promise<ToolOutput> => {
      // Handler body omitted here; it is listed in full under the
      // (handler) entry for src/tools/academic/simple.ts:27-192.
    }
  });
  ```
- src/tools/academic/simple.ts:27-192 (handler) Core execution logic for search_arxiv: queries the arXiv API across multiple endpoints with retries, parses the XML results, falls back to a general search engine if the API fails, and returns structured error data instead of throwing.

  ```typescript
  execute: async (args: ToolInput): Promise<ToolOutput> => {
    const query = args.query || '';
    const maxResults = Math.min(args.maxResults || 10, 50); // Limit to 50 results

    // Declare lastError at function scope
    let lastError: any = null;

    try {
      const startTime = Date.now();

      // Try arXiv API with enhanced retry mechanism
      let results = [];
      let apiSuccess = false;

      // Try multiple endpoints with different configurations
      const apiConfigs = [
        {
          url: 'https://export.arxiv.org/api/query',
          timeout: 20000,
          headers: {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            'Accept': 'application/atom+xml'
          }
        },
        {
          url: 'http://export.arxiv.org/api/query',
          timeout: 15000,
          headers: {
            'User-Agent': 'Open-Search-MCP/1.0',
            'Accept': 'application/atom+xml'
          }
        },
        {
          url: 'https://arxiv.org/api/query',
          timeout: 10000,
          headers: {
            'User-Agent': 'Open-Search-MCP/1.0',
            'Accept': 'application/atom+xml'
          }
        }
      ];

      for (const config of apiConfigs) {
        for (let attempt = 0; attempt < 3; attempt++) {
          try {
            const params = {
              search_query: `all:${encodeURIComponent(query)}`,
              start: 0,
              max_results: maxResults,
              sortBy: 'relevance',
              sortOrder: 'descending'
            };

            const response = await axios.get(config.url, {
              params,
              timeout: config.timeout,
              headers: config.headers,
              maxRedirects: 5,
              validateStatus: (status) => status < 500 // Accept 4xx but retry on 5xx
            });

            if (response.status === 200 && response.data) {
              // Parse XML response
              const xmlData = response.data;
              results = parseArxivXML(xmlData);

              if (results.length > 0) {
                apiSuccess = true;
                break;
              }
            }
          } catch (apiError) {
            lastError = apiError;
            // Wait before retry
            if (attempt < 2) {
              await new Promise(resolve => setTimeout(resolve, 1000 * (attempt + 1)));
            }
          }
        }
        if (apiSuccess) break;
      }

      // If API fails, try search engine as fallback
      if (!apiSuccess || results.length === 0) {
        try {
          console.log('arXiv API failed, trying search engine fallback...');
          const searchQuery = `site:arxiv.org "${query}" filetype:pdf`;
          const searchEngine = await import('../../engines/search-engine-manager.js');
          const searchResults = await searchEngine.SearchEngineManager.getInstance().search(searchQuery, {
            maxResults: maxResults * 2,
            timeout: 10000
          });

          if (searchResults && searchResults.results && searchResults.results.length > 0) {
            results = extractArxivResultsFromSearch(searchResults.html || '', query);
            console.log(`Found ${results.length} results from search engine fallback`);
          }
        } catch (searchError) {
          console.log('Search engine fallback also failed:', searchError);
        }
      }

      const searchTime = Date.now() - startTime;

      // If no results found, provide helpful error message
      if (results.length === 0) {
        return {
          success: false,
          error: 'No arXiv papers found for this query',
          data: {
            source: 'arXiv',
            query,
            results: [],
            totalResults: 0,
            searchTime,
            apiUsed: apiSuccess,
            suggestions: [
              'Try broader search terms',
              'Check spelling of technical terms',
              'Use different keywords or synonyms',
              'Try searching without quotes'
            ],
            lastError: lastError ? (lastError instanceof Error ? lastError.message : String(lastError)) : null
          }
        };
      }

      return {
        success: true,
        data: {
          source: apiSuccess ? 'arXiv API' : 'arXiv (Search Engine)',
          query,
          results: results.slice(0, maxResults),
          totalResults: results.length,
          searchTime,
          apiUsed: apiSuccess,
          fallbackUsed: !apiSuccess
        },
        metadata: {
          totalResults: results.length,
          searchTime,
          sources: ['arxiv.org'],
          cached: false,
          apiSuccess,
          fallbackUsed: !apiSuccess
        }
      };
    } catch (error) {
      return {
        success: false,
        error: `arXiv search failed: ${error instanceof Error ? error.message : String(error)}`,
        data: {
          source: 'arXiv',
          query,
          results: [],
          totalResults: 0,
          apiUsed: false,
          lastError: lastError ? (lastError instanceof Error ? lastError.message : String(lastError)) : null,
          suggestions: [
            'Check your internet connection',
            'Try again in a few moments',
            'Use different search terms',
            'Contact support if the problem persists'
          ]
        }
      };
    }
  }
  ```
- src/tools/academic/simple.ts:19-25 (schema) Input schema defining the required 'query' string and the optional 'maxResults' number for the tool.

  ```typescript
  inputSchema: {
    type: 'object',
    properties: {
      query: { type: 'string', description: 'Search query' },
      maxResults: { type: 'number', description: 'Maximum results to return' }
    },
    required: ['query']
  }
  ```
- src/utils/input-validator.ts:119-125 (schema) Zod schema used for validating search_arxiv inputs in the global input validator, mapped at line 221.

  ```typescript
  academicSearch: z.object({
    query: CommonSchemas.searchQuery,
    limit: CommonSchemas.resultsLimit.optional().default(10),
    category: CommonSchemas.category.optional(),
    dateFrom: CommonSchemas.dateString.optional(),
    dateTo: CommonSchemas.dateString.optional(),
  }),
  ```
- src/index.ts:229-229 (registration) Calls registerAcademicTools, which registers the search_arxiv tool during server initialization.

  ```typescript
  registerAcademicTools(this.toolRegistry); // 1 tool: search_arxiv
  ```
- src/tools/academic/simple.ts:277-305 (helper) Helper function that parses the arXiv API XML response into structured paper results.

  ```typescript
  function parseArxivXML(xmlData: string): any[] {
    const results: any[] = [];

    try {
      // Simple XML parsing for arXiv entries
      const entryRegex = /<entry>(.*?)<\/entry>/gs;
      const entries = xmlData.match(entryRegex) || [];

      for (const entry of entries) {
        const result = {
          id: extractXMLValue(entry, 'id'),
          title: extractXMLValue(entry, 'title')?.replace(/\s+/g, ' ').trim(),
          summary: extractXMLValue(entry, 'summary')?.replace(/\s+/g, ' ').trim(),
          authors: extractAuthors(entry),
          published: extractXMLValue(entry, 'published'),
          updated: extractXMLValue(entry, 'updated'),
          categories: extractCategories(entry),
          url: extractXMLValue(entry, 'id'),
          pdfUrl: extractPdfUrl(entry)
        };

        if (result.title && result.summary) {
          results.push(result);
        }
      }
    } catch (error) {}

    return results;
  }
  ```
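
The parsing helper above calls `extractXMLValue` and `extractAuthors`, whose bodies are not part of this listing. The sketch below shows one way such regex-based helpers could work on an Atom feed. Everything in it is an assumption for illustration: the helper bodies, the reduced `Paper` shape, the `parseEntries` name, and the sample feed (including its made-up identifier). It is not the repository's implementation, and it does not decode XML entities.

```typescript
// Hypothetical, reduced result shape; the real results carry more fields.
interface Paper {
  id: string;
  title: string;
  authors: string[];
}

// Assumed helper: text of the first <tag>...</tag> element found in `xml`.
function extractXMLValue(xml: string, tag: string): string | undefined {
  const pattern = new RegExp(`<${tag}(?:\\s[^>]*)?>([\\s\\S]*?)</${tag}>`);
  const match = xml.match(pattern);
  return match ? match[1].trim() : undefined;
}

// Assumed helper: collects every <name> nested inside an <author> element.
function extractAuthors(entry: string): string[] {
  const names: string[] = [];
  const authorRegex = /<author>([\s\S]*?)<\/author>/g;
  let match: RegExpExecArray | null;
  while ((match = authorRegex.exec(entry)) !== null) {
    const name = extractXMLValue(match[1], 'name');
    if (name) names.push(name);
  }
  return names;
}

// Splits the feed into <entry> blocks and keeps entries with an id and title.
function parseEntries(feed: string): Paper[] {
  const papers: Paper[] = [];
  const entries = feed.match(/<entry>[\s\S]*?<\/entry>/g) || [];
  for (const entry of entries) {
    const id = extractXMLValue(entry, 'id');
    // Collapse the line breaks that Atom titles often contain.
    const title = extractXMLValue(entry, 'title')?.replace(/\s+/g, ' ');
    if (id && title) {
      papers.push({ id, title, authors: extractAuthors(entry) });
    }
  }
  return papers;
}

// Made-up sample feed, used only to exercise the sketch.
const sampleFeed = `
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>An Example
      Title</title>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
  </entry>
</feed>`;

const papers = parseEntries(sampleFeed);
console.log(papers);
```

Regex extraction like this is compact but brittle: it ignores namespaces prefixes, CDATA sections, and entity decoding, so a dedicated XML parser may be the safer choice for anything beyond simple Atom entries.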