Open Search MCP

by flyanima

search_arxiv

Find academic papers on arXiv by entering a search query. Optionally set the maximum number of papers to return; the handler defaults to 10 and caps the value at 50. Ideal for researchers and students.

Instructions

Search arXiv for academic papers

Input Schema

Name        Required  Description                Default
maxResults  No        Maximum results to return  -
query       Yes       Search query               -
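
As a rough illustration of how these two inputs end up in an arXiv API request, the sketch below applies the same default (10) and cap (50) that the handler in the Implementation Reference uses, then serializes the request parameters. `SearchArxivArgs` and `buildArxivQuery` are illustrative names, not part of the project, and the standard `URLSearchParams` class stands in for the HTTP client's own parameter serialization. Because `URLSearchParams` percent-encodes values itself, the query is passed in raw here.

```typescript
// Sketch only: SearchArxivArgs and buildArxivQuery are hypothetical names.
interface SearchArxivArgs {
  query: string;
  maxResults?: number;
}

function buildArxivQuery(args: SearchArxivArgs): string {
  // Same default and cap as the handler: Math.min(args.maxResults || 10, 50).
  const maxResults = Math.min(args.maxResults || 10, 50);

  // URLSearchParams encodes each value, so no manual encodeURIComponent call is needed.
  const params = new URLSearchParams({
    search_query: `all:${args.query}`,
    start: "0",
    max_results: String(maxResults),
    sortBy: "relevance",
    sortOrder: "descending",
  });
  return params.toString();
}

// A request for 200 results is capped at 50.
console.log(buildArxivQuery({ query: "graph neural networks", maxResults: 200 }));
// → search_query=all%3Agraph+neural+networks&start=0&max_results=50&sortBy=relevance&sortOrder=descending
```

Omitting `maxResults`, or passing 0, yields `max_results=10`, since `||` treats both as "not set".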

Implementation Reference

  • Registers the 'search_arxiv' tool in the ToolRegistry, including name, description, schema, and execute handler.
    registry.registerTool({
      name: 'search_arxiv',
      description: 'Search arXiv for academic papers',
      category: 'academic',
      source: 'arxiv.org',
      inputSchema: {
        type: 'object',
        properties: {
          query: { type: 'string', description: 'Search query' },
          maxResults: { type: 'number', description: 'Maximum results to return' }
        },
        required: ['query']
      },
      execute: async (args: ToolInput): Promise<ToolOutput> => {
        // Handler body: identical to the execute handler listed in the next item.
      }
    });
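  • Core execution logic for search_arxiv: queries the arXiv API across three endpoint configurations with up to three attempts each (waiting 1 s, then 2 s, after a thrown error), parses the XML results, falls back to a general search engine if the API returns nothing, and reports failures in a structured error response.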
    execute: async (args: ToolInput): Promise<ToolOutput> => {
      const query = args.query || '';
      const maxResults = Math.min(args.maxResults || 10, 50); // Limit to 50 results

      // Declare lastError at function scope
      let lastError: any = null;

      try {
        const startTime = Date.now();

        // Try arXiv API with enhanced retry mechanism
        let results = [];
        let apiSuccess = false;

        // Try multiple endpoints with different configurations
        const apiConfigs = [
          {
            url: 'https://export.arxiv.org/api/query',
            timeout: 20000,
            headers: {
              'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
              'Accept': 'application/atom+xml'
            }
          },
          {
            url: 'http://export.arxiv.org/api/query',
            timeout: 15000,
            headers: {
              'User-Agent': 'Open-Search-MCP/1.0',
              'Accept': 'application/atom+xml'
            }
          },
          {
            url: 'https://arxiv.org/api/query',
            timeout: 10000,
            headers: {
              'User-Agent': 'Open-Search-MCP/1.0',
              'Accept': 'application/atom+xml'
            }
          }
        ];

        for (const config of apiConfigs) {
          for (let attempt = 0; attempt < 3; attempt++) {
            try {
              const params = {
                search_query: `all:${encodeURIComponent(query)}`,
                start: 0,
                max_results: maxResults,
                sortBy: 'relevance',
                sortOrder: 'descending'
              };

              const response = await axios.get(config.url, {
                params,
                timeout: config.timeout,
                headers: config.headers,
                maxRedirects: 5,
                validateStatus: (status) => status < 500 // Accept 4xx but retry on 5xx
              });

              if (response.status === 200 && response.data) {
                // Parse XML response
                const xmlData = response.data;
                results = parseArxivXML(xmlData);
                if (results.length > 0) {
                  apiSuccess = true;
                  break;
                }
              }
            } catch (apiError) {
              lastError = apiError;
              // Wait before retry
              if (attempt < 2) {
                await new Promise(resolve => setTimeout(resolve, 1000 * (attempt + 1)));
              }
            }
          }
          if (apiSuccess) break;
        }

        // If API fails, try search engine as fallback
        if (!apiSuccess || results.length === 0) {
          try {
            console.log('arXiv API failed, trying search engine fallback...');
            const searchQuery = `site:arxiv.org "${query}" filetype:pdf`;
            const searchEngine = await import('../../engines/search-engine-manager.js');
            const searchResults = await searchEngine.SearchEngineManager.getInstance().search(searchQuery, {
              maxResults: maxResults * 2,
              timeout: 10000
            });

            if (searchResults && searchResults.results && searchResults.results.length > 0) {
              results = extractArxivResultsFromSearch(searchResults.html || '', query);
              console.log(`Found ${results.length} results from search engine fallback`);
            }
          } catch (searchError) {
            console.log('Search engine fallback also failed:', searchError);
          }
        }

        const searchTime = Date.now() - startTime;

        // If no results found, provide helpful error message
        if (results.length === 0) {
          return {
            success: false,
            error: 'No arXiv papers found for this query',
            data: {
              source: 'arXiv',
              query,
              results: [],
              totalResults: 0,
              searchTime,
              apiUsed: apiSuccess,
              suggestions: [
                'Try broader search terms',
                'Check spelling of technical terms',
                'Use different keywords or synonyms',
                'Try searching without quotes'
              ],
              lastError: lastError ? (lastError instanceof Error ? lastError.message : String(lastError)) : null
            }
          };
        }

        return {
          success: true,
          data: {
            source: apiSuccess ? 'arXiv API' : 'arXiv (Search Engine)',
            query,
            results: results.slice(0, maxResults),
            totalResults: results.length,
            searchTime,
            apiUsed: apiSuccess,
            fallbackUsed: !apiSuccess
          },
          metadata: {
            totalResults: results.length,
            searchTime,
            sources: ['arxiv.org'],
            cached: false,
            apiSuccess,
            fallbackUsed: !apiSuccess
          }
        };
      } catch (error) {
        return {
          success: false,
          error: `arXiv search failed: ${error instanceof Error ? error.message : String(error)}`,
          data: {
            source: 'arXiv',
            query,
            results: [],
            totalResults: 0,
            apiUsed: false,
            lastError: lastError ? (lastError instanceof Error ? lastError.message : String(lastError)) : null,
            suggestions: [
              'Check your internet connection',
              'Try again in a few moments',
              'Use different search terms',
              'Contact support if the problem persists'
            ]
          }
        };
      }
    }
  • Input schema defining required 'query' string and optional 'maxResults' number for the tool.
    inputSchema: {
      type: 'object',
      properties: {
        query: { type: 'string', description: 'Search query' },
        maxResults: { type: 'number', description: 'Maximum results to return' }
      },
      required: ['query']
    },
  • Zod schema used for validating search_arxiv inputs in the global input validator, mapped at line 221.
    academicSearch: z.object({
      query: CommonSchemas.searchQuery,
      limit: CommonSchemas.resultsLimit.optional().default(10),
      category: CommonSchemas.category.optional(),
      dateFrom: CommonSchemas.dateString.optional(),
      dateTo: CommonSchemas.dateString.optional(),
    }),
  • src/index.ts:229-229 (registration)
    Calls registerAcademicTools which registers the search_arxiv tool during server initialization.
    registerAcademicTools(this.toolRegistry); // 1 tool: search_arxiv
  • Helper function to parse arXiv API XML response into structured paper results.
    function parseArxivXML(xmlData: string): any[] {
      const results: any[] = [];

      try {
        // Simple XML parsing for arXiv entries
        const entryRegex = /<entry>(.*?)<\/entry>/gs;
        const entries = xmlData.match(entryRegex) || [];

        for (const entry of entries) {
          const result = {
            id: extractXMLValue(entry, 'id'),
            title: extractXMLValue(entry, 'title')?.replace(/\s+/g, ' ').trim(),
            summary: extractXMLValue(entry, 'summary')?.replace(/\s+/g, ' ').trim(),
            authors: extractAuthors(entry),
            published: extractXMLValue(entry, 'published'),
            updated: extractXMLValue(entry, 'updated'),
            categories: extractCategories(entry),
            url: extractXMLValue(entry, 'id'),
            pdfUrl: extractPdfUrl(entry)
          };

          if (result.title && result.summary) {
            results.push(result);
          }
        }
      } catch (error) {}

      return results;
    }
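
The regex-based parsing in `parseArxivXML` can be tried in isolation with the minimal sketch below. The source of `extractXMLValue` is not shown on this page, so the version here is a hypothetical stand-in, and `PaperSummary`, `parseEntries`, and the sample feed (with made-up identifiers) are likewise illustrative. The sketch covers only the `id`, `title`, and `summary` fields, and uses `[\s\S]` in place of the `s` regex flag so it does not depend on the compile target.

```typescript
// Hypothetical stand-in for the project's extractXMLValue helper (source not shown).
function extractXMLValue(xml: string, tag: string): string | undefined {
  const pattern = new RegExp(`<${tag}(?:\\s[^>]*)?>([\\s\\S]*?)</${tag}>`);
  const match = xml.match(pattern);
  return match ? match[1].trim() : undefined;
}

// Illustrative result type covering a subset of the fields parseArxivXML returns.
interface PaperSummary {
  id?: string;
  title?: string;
  summary?: string;
}

function parseEntries(xml: string): PaperSummary[] {
  // Non-greedy match so each <entry>...</entry> pair is captured separately.
  const entries = xml.match(/<entry>[\s\S]*?<\/entry>/g) || [];
  return entries
    .map((entry) => ({
      id: extractXMLValue(entry, "id"),
      // Collapse line breaks and runs of spaces, as parseArxivXML does.
      title: extractXMLValue(entry, "title")?.replace(/\s+/g, " ").trim(),
      summary: extractXMLValue(entry, "summary")?.replace(/\s+/g, " ").trim(),
    }))
    // Entries missing a title or summary are dropped, matching the original filter.
    .filter((paper) => Boolean(paper.title && paper.summary));
}

// Made-up Atom fragment for demonstration; not real arXiv data.
const sampleFeed = `
<feed>
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>An Example
      Title</title>
    <summary>  A made-up abstract.  </summary>
  </entry>
  <entry>
    <id>http://arxiv.org/abs/0000.00001v1</id>
    <title>Entry without a summary</title>
  </entry>
</feed>`;

console.log(parseEntries(sampleFeed));
```

Only the first entry survives, with its title normalized to "An Example Title". Regex extraction like this is brittle against attributes, namespaces, and nested tags, so a dedicated XML parser may be the safer choice if the feed format varies.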

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/flyanima/open-search-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.