Copilot Studio Agent Direct Line MCP Server

by bradcstevens
ERROR_HANDLING.md (14.8 kB)
# Error Handling and Resilience

This document describes the comprehensive error handling and resilience patterns implemented in the Copilot Studio Agent Direct Line MCP Server.

## Overview

The server implements a multi-layered error handling strategy:

1. **Typed Error Classes** - Specific error types for different failure scenarios
2. **Circuit Breaker Pattern** - Prevents cascading failures
3. **Retry Strategies** - Automatic retry with exponential backoff
4. **Error Transformation** - Convert errors to MCP-compatible format
5. **Graceful Degradation** - Maintain service availability during failures

## Error Types

### Base Error: `ApplicationError`

All custom errors extend `ApplicationError`, which provides:

- **Error Category** - Classification (AUTHENTICATION, NETWORK, etc.)
- **Severity Level** - LOW, MEDIUM, HIGH, CRITICAL
- **Retryability** - Whether the operation can be retried
- **Retry Strategy** - Automatic retry configuration
- **User Message** - User-friendly error description
- **Recovery Action** - Suggested steps to resolve the error
- **Metadata** - Additional context information

```typescript
import { ApplicationError, ErrorCategory, ErrorSeverity } from './types/errors.js';

throw new ApplicationError('Operation failed', {
  category: ErrorCategory.NETWORK,
  severity: ErrorSeverity.MEDIUM,
  retryable: true,
  userMessage: 'Network connection issue. Retrying...',
  recoveryAction: 'Check network connectivity and retry',
  metadata: { attemptNumber: 3 }
});
```

### Authentication Errors

**AuthenticationError** (401 Unauthorized)

- User authentication failed
- Not retryable
- Recovery: Re-authenticate using OAuth flow

**AuthorizationError** (403 Forbidden)

- User lacks required permissions
- Not retryable
- Recovery: Request appropriate permissions

**OAuthError**

- OAuth flow failures (invalid_grant, server_error, etc.)
- Conditionally retryable based on error code
- Recovery: Varies by OAuth error code

**TokenRefreshError**

- Token refresh failures
- Retryable with exponential backoff
- Recovery: Retry or re-authenticate if retries fail

### Network & Service Errors

**NetworkError**

- Network connectivity issues
- Retryable (5 attempts, exponential backoff)
- Recovery: Check network connectivity

**ServiceUnavailableError** (503)

- Service temporarily down
- Retryable (5 attempts, up to 60s delay)
- Recovery: Wait for service recovery

**TimeoutError**

- Request timeouts
- Retryable (3 attempts, exponential backoff)
- Recovery: Automatic timeout handling

**RateLimitError** (429)

- Rate limit exceeded
- Retryable after delay specified by server
- Recovery: Wait specified time before retry

### System Errors

**CircuitBreakerError**

- Circuit breaker is open
- Not retryable (wait for recovery)
- Recovery: Wait for circuit breaker recovery

**ConfigurationError**

- Invalid configuration
- Not retryable
- Severity: CRITICAL
- Recovery: Fix configuration and restart

**ValidationError**

- Invalid input
- Not retryable
- Severity: LOW
- Recovery: Correct input and retry

## Circuit Breaker

### Overview

The Circuit Breaker pattern prevents cascading failures by "opening" when failure thresholds are exceeded.

### States

1. **CLOSED** - Normal operation, requests pass through
2. **OPEN** - Too many failures, requests fail fast
3. **HALF_OPEN** - Testing if service recovered

### Configuration

```typescript
import { CircuitBreaker, FailureType } from './utils/circuit-breaker.js';

const circuitBreaker = new CircuitBreaker({
  failureThreshold: 5,       // Open after 5 failures
  failureWindow: 30000,      // Within 30 seconds
  recoveryTimeout: 60000,    // Wait 60s before testing recovery
  successThreshold: 3,       // Close after 3 successes in HALF_OPEN
  excludedFailureTypes: [    // Don't count these toward threshold
    FailureType.AUTH_SERVICE // User auth errors don't open circuit
  ]
});
```

### Failure Classification

The circuit breaker classifies failures into types:

- **NETWORK** - Network connectivity issues
- **TIMEOUT** - Request timeouts
- **SERVER_ERROR** - 5xx server errors
- **AUTH_SERVICE** - OAuth/Authentication service failures
- **RATE_LIMIT** - Rate limiting
- **UNKNOWN** - Unclassified failures

You can exclude certain failure types from opening the circuit. For example, OAuth user authentication failures (401/403) shouldn't open the circuit since they represent user errors, not service issues.

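To make the classification concrete, here is a minimal sketch of how a failure might be mapped to one of the types above and then checked against an exclusion list. Everything in it is an assumption made for illustration: the `classifyFailure` and `countsTowardThreshold` helpers, the `FailureLike` shape, the mapping of 401/403 to `AUTH_SERVICE`, and the local `FailureType` stand-in are hypothetical and are not taken from `src/utils/circuit-breaker.ts`.

```typescript
// Local stand-in for the FailureType values listed above (hypothetical, for illustration).
const FailureType = {
  NETWORK: 'NETWORK',
  TIMEOUT: 'TIMEOUT',
  SERVER_ERROR: 'SERVER_ERROR',
  AUTH_SERVICE: 'AUTH_SERVICE',
  RATE_LIMIT: 'RATE_LIMIT',
  UNKNOWN: 'UNKNOWN',
} as const;
type FailureType = (typeof FailureType)[keyof typeof FailureType];

// Hypothetical minimal shape of a failure: an optional HTTP status,
// an error name, and a Node.js-style system error code.
interface FailureLike {
  status?: number;
  name?: string;
  code?: string;
}

// Map a failure to a type. HTTP status is checked first, then name/code.
function classifyFailure(error: FailureLike): FailureType {
  const { status, name, code } = error;
  if (status === 401 || status === 403) return FailureType.AUTH_SERVICE; // assumed mapping
  if (status === 429) return FailureType.RATE_LIMIT;
  if (status !== undefined && status >= 500 && status <= 599) return FailureType.SERVER_ERROR;
  if (name === 'TimeoutError' || code === 'ETIMEDOUT') return FailureType.TIMEOUT;
  if (code === 'ECONNREFUSED' || code === 'ECONNRESET' || code === 'ENOTFOUND') {
    return FailureType.NETWORK;
  }
  return FailureType.UNKNOWN;
}

// A failure counts toward the open threshold only if its type is not excluded.
function countsTowardThreshold(
  error: FailureLike,
  excluded: readonly FailureType[],
): boolean {
  return !excluded.includes(classifyFailure(error));
}
```

With `AUTH_SERVICE` excluded, a 401 in this sketch would be classified but would not move the circuit toward OPEN, while a 503 would.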
### Usage

```typescript
const result = await circuitBreaker.execute(async () => {
  // Your operation here
  return await someApiCall();
});

// Get circuit breaker metrics
const metrics = circuitBreaker.getMetrics();
console.log(`Circuit state: ${metrics.state}`);
console.log(`Failures: ${metrics.failures}`);
console.log(`Rejections: ${metrics.rejections}`);
```

## Retry Strategies

### Basic Retry

```typescript
import { retryWithBackoff } from './utils/retry.js';

const result = await retryWithBackoff(
  async () => await apiCall(),
  {
    maxRetries: 3,
    baseDelay: 1000, // 1 second
    maxDelay: 4000,  // 4 seconds max
    onRetry: (attempt, error) => {
      console.log(`Retry attempt ${attempt}:`, error);
    }
  }
);
```

### Advanced Retry with Strategy

```typescript
import { retryWithStrategy } from './utils/retry.js';

const result = await retryWithStrategy(
  async () => await apiCall(),
  {
    strategy: {
      maxRetries: 5,
      initialDelay: 1000,
      maxDelay: 30000,
      backoffMultiplier: 2, // Exponential: 1s, 2s, 4s, 8s, 16s
      retryableStatuses: [408, 429, 500, 502, 503, 504]
    },
    onRetry: (attempt, error, delay) => {
      console.log(`Retry ${attempt} after ${delay}ms`);
    },
    shouldRetry: (error) => {
      // Custom retry logic
      return error.name !== 'AuthenticationError';
    }
  }
);
```

### OAuth-Specific Retry

```typescript
import { retryOAuthOperation } from './utils/retry.js';

const tokens = await retryOAuthOperation(
  async () => await tokenExchange(code),
  {
    onRetry: (attempt, error, delay) => {
      console.warn(`Token exchange retry ${attempt}, waiting ${delay}ms`);
    }
  }
);
```

This automatically:

- Retries only server errors (500-504), not auth errors (401, 403)
- Skips retry for `invalid_grant` (user must re-authenticate)
- Uses exponential backoff with jitter (1s → 2s → 4s → 8s)

### Retry with Circuit Breaker

```typescript
import { retryWithCircuitBreaker } from './utils/retry.js';
// Assumes CircuitState is exported alongside CircuitBreaker
import { CircuitState } from './utils/circuit-breaker.js';

const result = await retryWithCircuitBreaker(
  async () => await apiCall(),
  { strategy: { maxRetries: 3 } },
  // Keep retrying only while the circuit is not open
  () => circuitBreaker.getState() !== CircuitState.OPEN
);
```

### Retry with Deadline

```typescript
import { retryWithDeadline } from './utils/retry.js';

const result = await retryWithDeadline(
  async () => await apiCall(),
  10000, // 10 second deadline
  {
    strategy: { maxRetries: 10 } // But stop at deadline
  }
);
```

## Error Transformation for MCP

### Transform to MCP Format

```typescript
import { transformToMCPError } from './utils/error-transformer.js';

try {
  await someOperation();
} catch (error) {
  const mcpError = transformToMCPError(error, {
    operationId: 'send_message',
    conversationId: 'abc123'
  });

  // Send to MCP client
  return { error: mcpError };
}
```

### OAuth Error Transformation

```typescript
import { transformOAuthError } from './utils/error-transformer.js';

try {
  await tokenExchange(code);
} catch (error) {
  const mcpError = transformOAuthError(error, {
    flow: 'token_exchange',
    step: 'code_exchange'
  });

  return { error: mcpError };
}
```

### MCP Error Format

```json
{
  "code": "AUTHENTICATION",
  "message": "Authentication failed. Please log in again.",
  "data": {
    "severity": "HIGH",
    "retryable": false,
    "recoveryAction": "Re-authenticate using the OAuth flow",
    "timestamp": "2025-01-07T12:00:00.000Z",
    "flow": "token_exchange",
    "step": "code_exchange"
  }
}
```

## Implementation Examples

### EntraID Client with Full Error Handling

```typescript
import { CircuitBreaker, FailureType } from './utils/circuit-breaker.js';
import { retryOAuthOperation } from './utils/retry.js';
import { TokenRefreshError } from './types/errors.js';

class EntraIDClient {
  private circuitBreaker: CircuitBreaker;
  // Declaration and setup of msalClient and config are omitted here

  constructor() {
    // Circuit breaker excludes user auth errors
    this.circuitBreaker = new CircuitBreaker({
      failureThreshold: 5,
      failureWindow: 30000,
      recoveryTimeout: 60000,
      successThreshold: 3,
      excludedFailureTypes: [FailureType.AUTH_SERVICE]
    });
  }

  async refreshAccessToken(refreshToken: string) {
    return this.circuitBreaker.execute(async () => {
      return retryOAuthOperation(
        async () => {
          try {
            const response = await this.msalClient.acquireTokenByRefreshToken({
              refreshToken,
              scopes: this.config.scopes
            });

            if (!response) {
              throw new TokenRefreshError('Token refresh returned empty response');
            }

            return response;
          } catch (error) {
            // Pass through errors that are already typed (such as the
            // empty-response case above) instead of wrapping them twice
            if (error instanceof TokenRefreshError) {
              throw error;
            }

            // A caught value is `unknown` under strict TypeScript, so narrow it first
            const message = error instanceof Error ? error.message : String(error);

            // Check for invalid_grant - requires re-authentication
            if (message.includes('invalid_grant')) {
              throw new TokenRefreshError(
                'Refresh token is invalid or expired. User must re-authenticate.',
                { requiresReauth: true }
              );
            }

            throw new TokenRefreshError(
              message || 'Token refresh failed',
              { originalError: message }
            );
          }
        },
        {
          onRetry: (attempt, error, delay) => {
            console.warn(`Token refresh retry ${attempt}, waiting ${delay}ms`);
          }
        }
      );
    });
  }
}
```

### Health Check with Circuit Breaker Monitoring

```typescript
import { HealthCheckService } from './services/health-check.js';

const healthCheck = new HealthCheckService();

// Include circuit breaker state in health checks
const result = await healthCheck.check();
console.log('Health:', result.status);
console.log('Circuit Breakers:', result.circuitBreakers);
```

## Best Practices

### 1. Use Specific Error Types

Always throw the most specific error type:

```typescript
// ✅ Good
throw new TokenRefreshError('Token expired');

// ❌ Bad
throw new Error('Token expired');
```

### 2. Include Context in Errors

Provide helpful metadata:

```typescript
throw new OAuthError(
  'Authorization failed',
  'invalid_grant',
  'Refresh token expired',
  {
    flow: 'token_refresh',
    scopes: ['User.Read'],
    tenantId: 'abc-123'
  }
);
```

### 3. Don't Retry User Errors

User errors (401, 403, validation) should not be retried:

```typescript
// Circuit breaker excludes user auth failures
const circuitBreaker = new CircuitBreaker({
  /* other options */
  excludedFailureTypes: [FailureType.AUTH_SERVICE]
});

// OAuth retry skips auth errors automatically
await retryOAuthOperation(fn); // Won't retry 401/403
```

### 4. Use Circuit Breakers for External Services

Wrap all external API calls:

```typescript
const authCircuit = new CircuitBreaker({ /* config */ });
const botCircuit = new CircuitBreaker({ /* config */ });

await authCircuit.execute(() => oauthCall());
await botCircuit.execute(() => botFrameworkCall());
```

### 5. Transform Errors at API Boundaries

Convert to MCP format at the boundary:

```typescript
// In MCP tool handler
export async function handleTool(request) {
  try {
    return await processRequest(request);
  } catch (error) {
    return {
      error: transformToMCPError(error, {
        toolName: request.name,
        requestId: request.id
      })
    };
  }
}
```

### 6. Monitor Circuit Breaker State

Include in health checks and monitoring:

```typescript
const metrics = circuitBreaker.getMetrics();

if (metrics.state === CircuitState.OPEN) {
  console.error('Circuit breaker open!', metrics);
  // Alert monitoring system
}
```

## Error Flow Diagram

```
Request → Circuit Breaker (Check State)
    ↓
CLOSED?
    ↓
Execute with Retry
    ↓
Success? ─────────→ Return Result
    ↓
Error
    ↓
Classify Failure Type
    ↓
Should Count? ───→ Increment Failure Count
    ↓
Should Retry?
    ↓
YES ──→ Exponential Backoff → Retry
    ↓
NO
    ↓
Transform to MCP Error
    ↓
Return Error Response
```

## Monitoring and Observability

### Circuit Breaker Metrics

```typescript
const metrics = circuitBreaker.getMetrics();
// {
//   state: 'CLOSED',
//   failures: 2,
//   successes: 10,
//   rejections: 0,
//   lastFailureTime: 1704628800000,
//   lastStateChange: 1704628800000
// }
```

### Error Context

All errors include:

- Timestamp
- Severity level
- Retry strategy
- Recovery action
- Relevant metadata

### Health Checks

The `/health` endpoint includes circuit breaker state:

```json
{
  "status": "healthy",
  "timestamp": "2025-01-07T12:00:00.000Z",
  "uptime": 3600000,
  "version": "1.0.5",
  "circuitBreakers": {
    "oauth": {
      "state": "CLOSED",
      "metrics": {
        "failures": 0,
        "successes": 42
      }
    }
  }
}
```

## Testing Error Scenarios

### Simulating Failures

```typescript
// Test circuit breaker opening (failureThreshold is 5)
for (let i = 0; i < 6; i++) {
  try {
    await circuitBreaker.execute(async () => {
      throw new Error('Simulated failure');
    });
  } catch (error) {
    // Circuit should open after 5 failures
  }
}

const metrics = circuitBreaker.getMetrics();
assert(metrics.state === CircuitState.OPEN);
```

### Testing Retry Logic

```typescript
// Any assertion helper works; Node's built-in assert is used here
import assert from 'node:assert';
import { retryWithStrategy } from './utils/retry.js';
import { NetworkError } from './types/errors.js';

let attempts = 0;

const result = await retryWithStrategy(
  async () => {
    attempts++;
    // Fail the first two attempts, succeed on the third
    if (attempts < 3) {
      throw new NetworkError('Simulated network error');
    }
    return 'success';
  },
  { strategy: { maxRetries: 5 } }
);

assert(result === 'success');
assert(attempts === 3);
```

## Conclusion

This comprehensive error handling system provides:

- **Resilience** - Automatic recovery from transient failures
- **Clarity** - Clear error messages and recovery actions
- **Control** - Fine-grained retry and circuit breaker configuration
- **Observability** - Detailed metrics and monitoring
- **Consistency** - Standardized error format across the application

For questions or issues, refer to the implementation in:

- `src/types/errors.ts` - Error type definitions
- `src/utils/circuit-breaker.ts` - Circuit breaker implementation
- `src/utils/retry.ts` - Retry strategies
- `src/utils/error-transformer.ts` - MCP error transformation
- `src/services/entraid-client.ts` - Example usage
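
As a companion to the Retry Strategies section, the exponential backoff schedule quoted there (1s, 2s, 4s, 8s, 16s, capped by `maxDelay`) can be sketched as follows. `computeBackoffDelay` and `BackoffOptions` are hypothetical names introduced for this illustration, and the jitter formula is an assumption; the real logic in `src/utils/retry.ts` may compute delays differently.

```typescript
// Hypothetical options mirroring the strategy fields used in this document.
interface BackoffOptions {
  initialDelay: number;      // delay before the first retry, in ms
  maxDelay: number;          // upper bound for any single delay, in ms
  backoffMultiplier: number; // growth factor per attempt
  jitterRatio?: number;      // assumed: fraction of the delay added as random jitter (0 = none)
}

// Delay before retry number `attempt` (1-based).
// `random` is injectable so the function can be tested deterministically.
function computeBackoffDelay(
  attempt: number,
  options: BackoffOptions,
  random: () => number = Math.random,
): number {
  const { initialDelay, maxDelay, backoffMultiplier, jitterRatio = 0 } = options;
  const exponential = initialDelay * Math.pow(backoffMultiplier, attempt - 1);
  const capped = Math.min(exponential, maxDelay);
  const jitter = capped * jitterRatio * random();
  // Never exceed maxDelay, even after jitter is added
  return Math.min(Math.round(capped + jitter), maxDelay);
}

// Schedule for the Advanced Retry settings, without jitter
const schedule = [1, 2, 3, 4, 5].map((attempt) =>
  computeBackoffDelay(attempt, {
    initialDelay: 1000,
    maxDelay: 30000,
    backoffMultiplier: 2,
  }),
);
// schedule is [1000, 2000, 4000, 8000, 16000]
```

Capping after the jitter is added keeps every delay within `maxDelay`, which makes the worst-case total wait easy to reason about when combining retries with a deadline.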
