Scholarly Research MCP Server

TROUBLESHOOTING.md•25.6 kB

# Troubleshooting and FAQ This document provides solutions to common issues, frequently asked questions, and troubleshooting steps for the Scholarly Research MCP Server. ## Common Issues and Solutions ### **Search and API Issues** #### **Issue: "No results found" for valid search queries** **Symptoms:** - Search returns empty results for common terms - No error message, just empty result set - Works for some queries but not others **Possible Causes:** 1. **Rate Limiting**: API rate limits exceeded 2. **Network Issues**: Connection problems to external services 3. **API Key Issues**: Invalid or expired API keys 4. **Service Outages**: External services temporarily unavailable 5. **Query Format**: Search query format not supported **Solutions:** ```bash # Check API key validity echo $PUBMED_API_KEY echo $JSTOR_API_KEY # Test network connectivity curl -I https://eutils.ncbi.nlm.nih.gov/entrez/eutils/ curl -I https://scholar.google.com/ # Check rate limiting status # Wait 1-2 minutes before retrying ``` **Prevention:** - Implement proper rate limiting in your requests - Use exponential backoff for retries - Monitor API usage and quotas #### **Issue: "API Error: Rate limit exceeded"** **Symptoms:** - Error message indicating rate limit exceeded - Requests fail after successful initial requests - Error occurs consistently after certain number of requests **Solutions:** ```typescript // Implement exponential backoff async function searchWithRetry(query: string, maxRetries = 3) { for (let attempt = 1; attempt <= maxRetries; attempt++) { try { return await searchAdapter.searchPapers({ query }); } catch (error) { if (error.message.includes('rate limit') && attempt < maxRetries) { const delay = Math.pow(2, attempt) * 1000; // Exponential backoff await new Promise(resolve => setTimeout(resolve, delay)); continue; } throw error; } } } ``` **Rate Limiting Guidelines:** - **PubMed**: Maximum 10 requests per second - **Google Scholar**: Built-in delays (2-5 seconds between requests) - **JSTOR**: Check API documentation for specific limits - **Firecrawl**: Professional rate limiting included #### **Issue: "Connection timeout" errors** **Symptoms:** - Requests hang for extended periods - Timeout errors after 30+ seconds - Intermittent connection failures **Possible Causes:** 1. **Slow Network**: High latency or low bandwidth 2. **Firewall Issues**: Corporate or network firewalls blocking requests 3. **DNS Issues**: Domain resolution problems 4. **Proxy Configuration**: Incorrect proxy settings **Solutions:** ```bash # Test network connectivity ping eutils.ncbi.nlm.nih.gov nslookup scholar.google.com # Check firewall settings sudo ufw status sudo iptables -L # Test with different DNS servers nslookup scholar.google.com 8.8.8.8 nslookup scholar.google.com 1.1.1.1 ``` **Configuration Adjustments:** ```typescript // Increase timeout values const searchOptions = { query: 'machine learning', timeout: 60000, // 60 seconds retryAttempts: 3 }; // Use connection pooling const adapter = new PubMedAdapter({ timeout: 30000, keepAlive: true, maxSockets: 10 }); ``` ### **Content Extraction Issues** #### **Issue: "Failed to extract full text" errors** **Symptoms:** - Paper metadata retrieved but full text extraction fails - Error messages about content extraction - Incomplete paper content **Possible Causes:** 1. **Access Restrictions**: Paper requires subscription or institutional access 2. **Format Issues**: Unsupported document format 3. **Content Changes**: Website structure changed 4. **Anti-Scraping**: Website blocking automated access **Solutions:** ```typescript // Implement fallback extraction methods async function extractContentWithFallback(pmid: string) { try { // Primary method return await extractPrimaryMethod(pmid); } catch (error) { console.warn('Primary extraction failed:', error.message); try { // Fallback method return await extractFallbackMethod(pmid); } catch (fallbackError) { console.error('All extraction methods failed:', fallbackError.message); throw new Error('Content extraction failed for all methods'); } } } // Check access availability async function checkPaperAccess(pmid: string) { const accessInfo = await getAccessInfo(pmid); if (accessInfo.requiresSubscription) { throw new Error('Paper requires institutional subscription'); } if (accessInfo.format === 'pdf-only') { return await extractFromPDF(pmid); } return await extractFromHTML(pmid); } ``` **Access Verification:** ```bash # Check if paper is publicly accessible curl -I "https://pubmed.ncbi.nlm.nih.gov/12345/" curl -I "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12345/" # Verify institutional access (if applicable) # Check your institution's proxy settings ``` #### **Issue: "Invalid or corrupted content" in extracted text** **Symptoms:** - Extracted text contains HTML tags - Text appears garbled or unreadable - Missing or incomplete sections - Duplicate content **Solutions:** ```typescript // Enhanced text cleaning function cleanExtractedText(text: string): string { return text .replace(/<[^>]*>/g, '') // Remove HTML tags .replace(/&[a-zA-Z]+;/g, ' ') // Decode HTML entities .replace(/\s+/g, ' ') // Normalize whitespace .replace(/[^\w\s\-.,;:!?()]/g, '') // Remove special characters .trim(); } // Section validation function validateSection(section: PaperSection): boolean { if (!section.content || section.content.length < 10) { return false; // Too short to be valid } if (section.content.includes('Loading...') || section.content.includes('Access Denied')) { return false; // Invalid content markers } return true; } // Content quality scoring function calculateContentQuality(text: string): number { let score = 0; // Check for meaningful content if (text.length > 100) score += 20; if (text.includes('Introduction')) score += 15; if (text.includes('Methods')) score += 15; if (text.includes('Results')) score += 15; if (text.includes('Discussion')) score += 15; if (text.includes('Conclusion')) score += 15; if (text.includes('References')) score += 5; // Penalize HTML artifacts if (text.includes('<')) score -= 10; if (text.includes('&')) score -= 5; return Math.max(0, Math.min(100, score)); } ``` ### **Google Scholar Scraping Issues** #### **Issue: "Browser automation failed" errors** **Symptoms:** - Puppeteer browser launch failures - Headless Chrome/Chromium issues - Browser crashes during scraping **Possible Causes:** 1. **Missing Dependencies**: Chrome/Chromium not installed 2. **System Resources**: Insufficient memory or CPU 3. **Permission Issues**: System security restrictions 4. **Dependency Conflicts**: Version incompatibilities **Solutions:** ```bash # Install Chrome/Chromium dependencies sudo apt-get update sudo apt-get install -y \ chromium-browser \ libxss1 \ libnss3 \ libatk-bridge2.0-0 \ libgtk-3-0 \ libgbm1 # Check system resources free -h df -h nproc # Verify Puppeteer installation npm list puppeteer npx puppeteer --version ``` **Configuration Adjustments:** ```typescript // Robust browser configuration const browserOptions = { headless: true, args: [ '--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage', '--disable-accelerated-2d-canvas', '--no-first-run', '--no-zygote', '--disable-gpu', '--disable-background-timer-throttling', '--disable-backgrounding-occluded-windows', '--disable-renderer-backgrounding', '--disable-features=TranslateUI', '--disable-ipc-flooding-protection' ], timeout: 30000, ignoreHTTPSErrors: true }; // Memory management const browser = await puppeteer.launch(browserOptions); const page = await browser.newPage(); // Set memory limits await page.setCacheEnabled(false); await page.setRequestInterception(true); page.on('request', (req) => { if (req.resourceType() === 'image' || req.resourceType() === 'stylesheet') { req.abort(); } else { req.continue(); } }); ``` #### **Issue: "Google Scholar blocking requests"** **Symptoms:** - Consistent access denied errors - CAPTCHA challenges - IP address blocking - Unusual traffic detection **Solutions:** ```typescript // Enhanced anti-detection measures async function setupStealthPage(page: Page) { // Set realistic user agent await page.setUserAgent( 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' + '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36' ); // Set viewport await page.setViewport({ width: 1920, height: 1080 }); // Set extra headers await page.setExtraHTTPHeaders({ 'Accept-Language': 'en-US,en;q=0.9', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 'DNT': '1', 'Connection': 'keep-alive', 'Upgrade-Insecure-Requests': '1' }); // Disable webdriver detection await page.evaluateOnNewDocument(() => { Object.defineProperty(navigator, 'webdriver', { get: () => undefined, }); // Remove webdriver properties delete (navigator as any).__proto__.webdriver; }); } // Implement request delays async function searchWithDelays(query: string) { const page = await browser.newPage(); await setupStealthPage(page); // Random delays between actions const randomDelay = (min: number, max: number) => Math.random() * (max - min) + min; await page.goto('https://scholar.google.com'); await page.waitForTimeout(randomDelay(1000, 3000)); await page.type('input[name="q"]', query); await page.waitForTimeout(randomDelay(500, 1500)); await page.click('input[name="btnG"]'); await page.waitForTimeout(randomDelay(2000, 5000)); // Continue with result extraction... } ``` **Alternative Solutions:** - **Use Firecrawl**: Professional web scraping service with IP rotation - **Implement Proxy Rotation**: Use multiple proxy servers - **Reduce Request Frequency**: Increase delays between requests - **Use Different User Agents**: Rotate through realistic browser configurations ### **User Preference Issues** #### **Issue: "Preferences not saved" or "Settings reset"** **Symptoms:** - User preferences revert to defaults - Configuration changes don't persist - Settings lost after restart **Possible Causes:** 1. **File Permissions**: Insufficient write permissions 2. **Directory Issues**: Preferences directory doesn't exist 3. **JSON Parsing Errors**: Corrupted preference file 4. **Disk Space**: Insufficient storage space **Solutions:** ```bash # Check file permissions ls -la ~/.mcp-scholarly-research/ chmod 755 ~/.mcp-scholarly-research/ chmod 644 ~/.mcp-scholarly-research/preferences.json # Verify directory exists mkdir -p ~/.mcp-scholarly-research/ # Check disk space df -h ~/ # Validate JSON file cat ~/.mcp-scholarly-research/preferences.json | jq . ``` **Debugging Code:** ```typescript // Enhanced preference management with error handling class RobustUserPreferencesManager { private preferencesPath: string; private preferences: UserPreferences; constructor() { this.preferencesPath = this.getPreferencesPath(); this.ensureDirectoryExists(); this.preferences = this.loadPreferences(); } private getPreferencesPath(): string { const homeDir = process.env.HOME || process.env.USERPROFILE; if (!homeDir) { throw new Error('Cannot determine home directory'); } return path.join(homeDir, '.mcp-scholarly-research', 'preferences.json'); } private ensureDirectoryExists(): void { const dir = path.dirname(this.preferencesPath); if (!fs.existsSync(dir)) { try { fs.mkdirSync(dir, { recursive: true, mode: 0o755 }); } catch (error) { console.error('Failed to create preferences directory:', error); throw new Error(`Cannot create preferences directory: ${dir}`); } } } private loadPreferences(): UserPreferences { try { if (!fs.existsSync(this.preferencesPath)) { return this.getDefaultPreferences(); } const data = fs.readFileSync(this.preferencesPath, 'utf8'); const parsed = JSON.parse(data); // Validate preferences structure if (!this.validatePreferences(parsed)) { console.warn('Invalid preferences file, using defaults'); return this.getDefaultPreferences(); } return parsed; } catch (error) { console.error('Failed to load preferences:', error); return this.getDefaultPreferences(); } } private savePreferences(): void { try { const data = JSON.stringify(this.preferences, null, 2); fs.writeFileSync(this.preferencesPath, data, { mode: 0o644 }); } catch (error) { console.error('Failed to save preferences:', error); throw new Error(`Cannot save preferences: ${error.message}`); } } private validatePreferences(prefs: any): prefs is UserPreferences { // Implement comprehensive validation return ( prefs && typeof prefs === 'object' && prefs.sources && prefs.search && prefs.display && prefs.cache ); } } ``` #### **Issue: "Invalid preference values" errors** **Symptoms:** - Error messages about invalid preference values - Preferences not applying correctly - Validation failures **Solutions:** ```typescript // Comprehensive preference validation function validateUserPreferences(prefs: any): UserPreferences { const schema = z.object({ sources: z.record(z.string(), z.object({ name: z.string(), enabled: z.boolean(), priority: z.number().int().min(1).max(100), maxResults: z.number().int().min(1).max(10000), timeout: z.number().int().min(1000).max(60000) })), search: z.object({ defaultMaxResults: z.number().int().min(1).max(1000), defaultSortBy: z.enum(['relevance', 'date', 'citations']), preferFirecrawl: z.boolean(), enableDeduplication: z.boolean() }), display: z.object({ showAbstracts: z.boolean(), showCitations: z.boolean(), maxAbstractLength: z.number().int().min(10).max(10000) }) }); try { return schema.parse(prefs); } catch (error) { console.error('Preference validation failed:', error); throw new Error('Invalid preference values detected'); } } // Safe preference updates function updatePreferencesSafely( current: UserPreferences, updates: Partial<UserPreferences> ): UserPreferences { const merged = { ...current, ...updates }; try { return validateUserPreferences(merged); } catch (error) { console.error('Preference update validation failed:', error); return current; // Return current preferences unchanged } } ``` ### **Performance and Resource Issues** #### **Issue: "High memory usage" or "Out of memory" errors** **Symptoms:** - Application consumes excessive RAM - Memory usage grows over time - Out of memory crashes - Slow performance **Solutions:** ```typescript // Memory-efficient data processing class MemoryEfficientProcessor { private maxMemoryUsage = 500 * 1024 * 1024; // 500MB async processLargeDataset<T>( data: T[], processor: (item: T) => Promise<any> ): Promise<any[]> { const results: any[] = []; const batchSize = 1000; for (let i = 0; i < data.length; i += batchSize) { const batch = data.slice(i, i + batchSize); // Process batch const batchResults = await Promise.all( batch.map(item => processor(item)) ); results.push(...batchResults); // Check memory usage if (this.shouldGarbageCollect()) { this.forceGarbageCollection(); } // Yield control to event loop await new Promise(resolve => setImmediate(resolve)); } return results; } private shouldGarbageCollect(): boolean { const usage = process.memoryUsage(); return usage.heapUsed > this.maxMemoryUsage; } private forceGarbageCollection(): void { if (global.gc) { global.gc(); } } } // Resource cleanup class ResourceManager { private resources: Set<{ close?: () => void; destroy?: () => void }> = new Set(); addResource(resource: any): void { this.resources.add(resource); } cleanup(): void { for (const resource of this.resources) { try { if (resource.close) resource.close(); if (resource.destroy) resource.destroy(); } catch (error) { console.error('Resource cleanup failed:', error); } } this.resources.clear(); } } ``` #### **Issue: "Slow search performance"** **Symptoms:** - Search requests take too long - UI becomes unresponsive - Timeout errors on complex queries **Solutions:** ```typescript // Performance optimization class OptimizedSearchAdapter { private cache = new Map<string, { data: any; timestamp: number }>(); private cacheTTL = 5 * 60 * 1000; // 5 minutes async searchPapers(options: SearchOptions): Promise<SearchResult> { const cacheKey = this.generateCacheKey(options); // Check cache first const cached = this.cache.get(cacheKey); if (cached && Date.now() - cached.timestamp < this.cacheTTL) { return cached.data; } // Perform search with timeout const searchPromise = this.performSearch(options); const timeoutPromise = new Promise((_, reject) => setTimeout(() => reject(new Error('Search timeout')), 30000) ); try { const result = await Promise.race([searchPromise, timeoutPromise]); // Cache result this.cache.set(cacheKey, { data: result, timestamp: Date.now() }); return result; } catch (error) { if (error.message === 'Search timeout') { throw new Error('Search took too long, please try a more specific query'); } throw error; } } private generateCacheKey(options: SearchOptions): string { return JSON.stringify(options); } // Implement search result streaming for large datasets async *streamSearchResults(options: SearchOptions): AsyncGenerator<SearchResult> { const batchSize = 100; let offset = 0; while (true) { const batch = await this.searchBatch(options, offset, batchSize); if (batch.length === 0) break; yield batch; offset += batchSize; // Yield control to prevent blocking await new Promise(resolve => setImmediate(resolve)); } } } ``` ## Frequently Asked Questions ### **General Usage** #### **Q: How do I get started with the tool?** **A:** The easiest way is to use the online version at [tool URL]. For local installation: ```bash # Install Node.js 18+ # Clone the repository git clone https://github.com/your-username/mcp-for-research.git cd mcp-for-research # Install dependencies npm install # Set up environment variables cp .env.example .env # Edit .env with your API keys # Start the server npm run dev ``` #### **Q: What data sources are supported?** **A:** Currently supported sources: - **PubMed**: NCBI E-utilities API (requires API key) - **Google Scholar**: Web scraping via Puppeteer or Firecrawl - **JSTOR**: API integration (requires API key) #### **Q: How do I get API keys?** **A:** - **PubMed**: Request from [NCBI](https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/) - **JSTOR**: Contact JSTOR for Data for Research access - **Firecrawl**: Sign up at [Firecrawl](https://firecrawl.dev) ### **Technical Questions** #### **Q: What are the system requirements?** **A:** - **Node.js**: 18.0.0 or higher - **Memory**: Minimum 512MB RAM (1GB recommended) - **Storage**: 2GB free disk space - **Network**: Stable internet connection - **OS**: Linux, macOS, or Windows (WSL2 recommended) #### **Q: How do I handle rate limiting?** **A:** The tool includes built-in rate limiting: - **PubMed**: 10 requests per second - **Google Scholar**: Built-in delays (2-5 seconds) - **Automatic retry**: Exponential backoff for failed requests - **Configurable**: Adjust limits in environment variables #### **Q: Can I use this in production?** **A:** Yes, but consider: - **API Quotas**: Monitor usage and implement proper rate limiting - **Error Handling**: Implement comprehensive error handling - **Monitoring**: Add logging and performance monitoring - **Scaling**: Consider horizontal scaling for high load - **Security**: Review and secure API keys and configurations ### **Troubleshooting Questions** #### **Q: Why am I getting "No results found"?** **A:** Common causes and solutions: 1. **Check API keys**: Verify they're valid and not expired 2. **Rate limiting**: Wait before retrying 3. **Network issues**: Check internet connectivity 4. **Query format**: Try simpler, more specific terms 5. **Service status**: Check if external services are down #### **Q: How do I fix "Browser automation failed"?** **A:** 1. **Install dependencies**: Ensure Chrome/Chromium is installed 2. **Check permissions**: Verify system access rights 3. **Use Firecrawl**: Switch to professional web scraping service 4. **System resources**: Ensure sufficient RAM and CPU 5. **Update Puppeteer**: Use latest version #### **Q: Preferences not saving - what should I do?** **A:** 1. **Check permissions**: Verify write access to home directory 2. **Directory creation**: Ensure `.mcp-scholarly-research` exists 3. **File validation**: Check if preferences.json is valid JSON 4. **Disk space**: Verify sufficient storage space 5. **Restart application**: Restart after configuration changes ## Getting Help ### **Self-Service Resources** #### **Documentation** - **README.md**: Quick start and basic usage - **API_REFERENCE.md**: Complete API documentation - **DEVELOPMENT.md**: Development setup and workflow - **ARCHITECTURE.md**: System architecture details #### **Logs and Debugging** ```bash # Enable debug logging export LOG_LEVEL=debug npm run dev # Check application logs tail -f logs/combined.log tail -f logs/error.log # Monitor system resources htop iotop ``` #### **Testing and Validation** ```bash # Run test suite npm run test:all-tools-bash # Check specific functionality npm run test:unit npm run test:integration # Validate configuration npm run type-check npm run lint ``` ### **Community Support** #### **GitHub Issues** - **Bug Reports**: Include error messages, logs, and reproduction steps - **Feature Requests**: Describe use case and expected behavior - **Documentation**: Report unclear or missing documentation #### **Issue Template** ```markdown ## Issue Description Brief description of the problem ## Steps to Reproduce 1. Step one 2. Step two 3. Step three ## Expected Behavior What should happen ## Actual Behavior What actually happens ## Environment - OS: [e.g., Ubuntu 20.04] - Node.js: [e.g., 18.17.0] - npm: [e.g., 9.6.7] - Tool version: [e.g., latest] ## Error Messages Include any error messages or logs ## Additional Information Any other relevant details ``` ### **Professional Support** #### **When to Seek Professional Help** - **Critical Production Issues**: System down or major functionality broken - **Complex Integration**: Custom deployments or enterprise integrations - **Performance Problems**: Significant performance degradation - **Security Concerns**: Potential security vulnerabilities #### **Support Channels** - **GitHub Issues**: For bug reports and feature requests - **Discussions**: For questions and community support - **Email**: For professional support inquiries - **Documentation**: Comprehensive guides and references ## Prevention and Best Practices ### **Regular Maintenance** #### **System Health Checks** ```bash # Daily checks npm audit npm outdated df -h free -h # Weekly checks npm run test:all-tools-bash npm run quality npm run build:all ``` #### **Monitoring and Alerting** ```typescript // Health check endpoint app.get('/health', (req, res) => { const health = { status: 'healthy', timestamp: new Date().toISOString(), uptime: process.uptime(), memory: process.memoryUsage(), version: process.version }; res.json(health); }); // Performance monitoring const performanceMonitor = new PerformanceMonitor(); performanceMonitor.startTimer('search-operation'); // ... perform operation performanceMonitor.endTimer('search-operation'); ``` ### **Configuration Management** #### **Environment-Specific Configs** ```bash # Development NODE_ENV=development LOG_LEVEL=debug DEBUG=true # Production NODE_ENV=production LOG_LEVEL=warn DEBUG=false # Testing NODE_ENV=test LOG_LEVEL=error TESTING=true ``` #### **Configuration Validation** ```typescript // Validate environment configuration function validateEnvironment(): void { const required = ['PUBMED_API_KEY', 'NODE_ENV']; const missing = required.filter(key => !process.env[key]); if (missing.length > 0) { throw new Error(`Missing required environment variables: ${missing.join(', ')}`); } // Validate API keys if (process.env.PUBMED_API_KEY && process.env.PUBMED_API_KEY.length < 10) { throw new Error('Invalid PubMed API key format'); } } ``` This comprehensive troubleshooting guide provides solutions to common issues and best practices for maintaining a healthy Scholarly Research MCP Server installation.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/aringadre76/mcp-for-research'

If you have feedback or need assistance with the MCP directory API, please join our Discord server