MCP Meetup-Claude Integration Server

004-error-handling-strategy.md•8.6 KiB

# ADR-004: Error Handling and Resilience Strategy

## Status

Accepted

## Context

The MCP server integrates multiple external services (Meetup.com APIs, Anthropic Claude API) and performs complex data processing. We need a comprehensive error handling strategy that ensures reliability, provides meaningful feedback to users, and maintains system stability under various failure conditions.

## Decision

We will implement a multi-layered error handling strategy with graceful degradation, comprehensive logging, and user-friendly error messages.

## Error Categories and Handling

### 1. Authentication Errors

**Scenarios:**

- Missing or invalid API credentials
- Expired access tokens
- OAuth flow failures

**Handling Strategy:**

```python
try:
    headers = self.auth_manager.get_auth_headers()
except Exception as e:
    logging.error(f"Authentication error: {e}")
    return [TextContent(
        type="text",
        text="Authentication required. Please use get_oauth_url tool to set up authentication."
    )]
```

### 2. API Communication Errors

**Scenarios:**

- Network timeouts
- HTTP error responses (4xx, 5xx)
- Rate limiting
- Service unavailability

**Handling Strategy:**

```python
try:
    async with self.session.get(url, params=params, headers=headers) as response:
        if response.status == 200:
            # Process successful response
            data = await response.json()
        elif response.status == 429:
            logging.warning("Rate limited - suggest retry")
        else:
            error_text = await response.text()
            logging.error(f"API error {response.status}: {error_text}")
# A total-timeout expiry surfaces as asyncio.TimeoutError, not ClientError
except (aiohttp.ClientError, asyncio.TimeoutError) as e:
    logging.error(f"Network error: {e}")
    # Fallback to alternative API or cached data
```

### 3. Data Processing Errors

**Scenarios:**

- Malformed API responses
- Missing required fields
- Invalid data types
- Parsing failures

**Handling Strategy:**

```python
for event_data in events_data:
    try:
        event = self._parse_rest_event(event_data)
        events.append(event)
    except Exception as e:
        logging.warning(f"Error parsing event {event_data.get('id', 'unknown')}: {e}")
        continue  # Skip malformed event, continue processing others
```

### 4. LLM Integration Errors

**Scenarios:**

- Claude API failures
- Token limit exceeded
- Content policy violations
- Service quotas exceeded

**Handling Strategy:**

```python
try:
    response = await self.claude_integration.generate_response(prompt, max_tokens=2000)
    return [TextContent(type="text", text=response)]
except Exception as e:
    logging.error(f"Claude integration error: {e}")
    return [TextContent(
        type="text",
        text="Unable to generate AI recommendations. Here are the raw events found: [event list]"
    )]
```

## Resilience Patterns

### 1. Graceful Degradation

- **Primary API Failure**: Fall back to secondary API (REST → GraphQL)
- **Claude API Failure**: Return structured event data without AI enhancement
- **Partial Data**: Continue processing with available data, log missing fields

### 2. Circuit Breaker Pattern

```python
class APICircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.last_failure_time = None
        self.state = "CLOSED"  # CLOSED, OPEN, HALF_OPEN
```

### 3. Retry Logic

```python
async def api_call_with_retry(self, url, params, max_retries=3):
    for attempt in range(max_retries):
        try:
            return await self._make_api_call(url, params)
        except TransientError:
            if attempt == max_retries - 1:
                raise  # Out of retries: surface the last error
            await asyncio.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s, ...
```

### 4. Timeout Management

```python
timeout = aiohttp.ClientTimeout(total=30)  # 30-second budget per request
async with aiohttp.ClientSession(timeout=timeout) as session:
    # API calls made through this session inherit the timeout
    ...
```

## Logging Strategy

### Log Levels and Usage:

- **ERROR**: Authentication failures, API errors, system failures
- **WARNING**: Parsing errors, fallback usage, rate limiting
- **INFO**: Successful operations, server lifecycle events
- **DEBUG**: Detailed request/response data, parameter extraction details

### Structured Logging:

```python
logging.error("API request failed", extra={
    "api_endpoint": url,
    "status_code": response.status,
    "user_query": query_text,
    "error_type": "http_error"
})
```

### Log Sanitization:

- Remove sensitive data (API keys, personal information)
- Truncate large payloads
- Hash user identifiers for privacy

## User-Facing Error Messages

### Principles:

1. **Actionable**: Tell users what they can do to fix the issue
2. **Clear**: Avoid technical jargon
3. **Helpful**: Provide alternative suggestions when possible
4. **Consistent**: Use standardized message formats

### Error Message Templates:

```python
ERROR_MESSAGES = {
    "auth_required": "Authentication required. Please use the get_oauth_url tool to set up your Meetup.com credentials.",
    "no_events_found": "No events found matching '{query}'. Try different keywords, location, or time range.",
    "api_unavailable": "Meetup.com service is temporarily unavailable. Please try again later.",
    "invalid_query": "Unable to understand query '{query}'. Try using clearer time references (today, tomorrow) and location (near me, in [city])."
}
```

## Monitoring and Alerting

### Key Metrics:

- API success/failure rates
- Response times
- Error categorization and frequency
- User query success rates
- Token usage and limits

### Health Checks:

```python
async def health_check(self):
    """Perform system health check."""
    checks = {
        "auth_status": await self._check_authentication(),
        "meetup_api": await self._check_meetup_api(),
        "claude_api": await self._check_claude_api(),
        "config_valid": Config.validate()
    }
    return checks
```

## Error Recovery Strategies

### 1. Automatic Recovery:

- Token refresh on authentication errors
- API endpoint switching on service errors
- Retry with backoff on transient errors

### 2. Manual Recovery Guidance:

- Clear instructions for credential setup
- Troubleshooting guides in documentation
- Self-service diagnostic tools

### 3. Fallback Data Sources:

- Cached event data for offline operation
- Default recommendations when personalization fails
- Static content when dynamic content is unavailable

## Testing Strategy

### Error Simulation:

```python
from unittest import mock

import aiohttp
import pytest


@pytest.mark.asyncio
async def test_api_failure_handling():
    """Test handling of API failures."""
    with mock.patch('aiohttp.ClientSession.get') as mock_get:
        mock_get.side_effect = aiohttp.ClientError("Network error")
        result = await server._handle_search_events({"query": "test"})
        # Check the length first so the index below cannot raise
        assert len(result) == 1
        assert "error" in result[0].text.lower()
```

### Error Injection:

- Network failure simulation
- API response corruption
- Authentication token expiration
- Rate limiting scenarios

## Benefits

### For Users:

- Consistent experience even during service issues
- Clear guidance on resolving problems
- Minimal service interruption

### For Developers:

- Comprehensive error visibility
- Easy debugging and troubleshooting
- Predictable error handling patterns

### For Operations:

- Proactive issue detection
- Clear escalation paths
- Automated recovery where possible

## Implementation Guidelines

### Exception Hierarchy:

```python
class MeetupServerError(Exception):
    """Base exception for MCP server errors."""


class AuthenticationError(MeetupServerError):
    """Authentication-related errors."""


class APIError(MeetupServerError):
    """External API communication errors."""


class DataProcessingError(MeetupServerError):
    """Data parsing and processing errors."""
```

### Error Context Preservation:

```python
try:
    events = await self.search_events_rest(query)
except APIError as e:
    # exc_info=True keeps the full traceback in the log before falling back
    logging.error(f"REST API failed: {e}", exc_info=True)
    # Preserve context for fallback
    events = await self.search_events_graphql(query)
```

## Success Metrics

- **Availability**: > 99% uptime for core functionality
- **Error Recovery**: < 5% of errors require manual intervention
- **User Experience**: < 2% of queries result in unhelpful error messages
- **Mean Time to Recovery**: < 5 minutes for transient issues

## Future Enhancements

- **Distributed Tracing**: For complex error scenarios across services
- **Predictive Error Detection**: ML-based anomaly detection
- **Self-Healing**: Automated recovery and configuration adjustment
- **Error Analytics**: Trend analysis and proactive issue prevention

---

**Date**: 2025-07-27
**Author**: Dan Shields
**Dependencies**: All previous ADRs
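
The Circuit Breaker Pattern section shows only the constructor of `APICircuitBreaker`. As an illustration of how the CLOSED, OPEN, and HALF_OPEN states might interact, here is a minimal sketch. Everything beyond the constructor fields is an assumption rather than part of this ADR: the `CircuitOpenError` exception, the `allow_request`, `record_success`, `record_failure`, and `call` methods, and the injectable `clock` parameter are all hypothetical names.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open (hypothetical name)."""


class APICircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60, clock=time.monotonic):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.last_failure_time = None
        self.state = "CLOSED"  # CLOSED, OPEN, HALF_OPEN
        self._clock = clock  # assumption: injectable clock so tests need not sleep

    def allow_request(self):
        """Return True if a call may proceed; OPEN becomes HALF_OPEN after the timeout."""
        if self.state == "OPEN":
            if self._clock() - self.last_failure_time >= self.recovery_timeout:
                # Simplification: any call made while HALF_OPEN counts as a trial call
                self.state = "HALF_OPEN"
                return True
            return False
        return True

    def record_success(self):
        """A successful call closes the circuit and resets the failure counter."""
        self.failure_count = 0
        self.state = "CLOSED"

    def record_failure(self):
        """Open the circuit on a failed trial call or once the threshold is reached."""
        self.failure_count += 1
        self.last_failure_time = self._clock()
        if self.state == "HALF_OPEN" or self.failure_count >= self.failure_threshold:
            self.state = "OPEN"

    async def call(self, func, *args, **kwargs):
        """Run an async callable through the breaker, failing fast while it is open."""
        if not self.allow_request():
            raise CircuitOpenError("circuit is open; skipping call")
        try:
            result = await func(*args, **kwargs)
        except Exception:
            self.record_failure()
            raise
        self.record_success()
        return result
```

A caller could wrap each outbound request as `await breaker.call(self._make_api_call, url, params)` and treat `CircuitOpenError` as a signal to use the fallback API or cached data described under Graceful Degradation.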
