# LLM Fallback System - Detailed Flow

## LLM Fallback Flow Diagram

```mermaid
flowchart TD
    Start([LLM Request]) --> CheckCache{Provider<br/>Health Cache}
    CheckCache -->|Last Success<br/>< 5 min ago| UseCached[Use Last<br/>Successful Provider]
    CheckCache -->|No Cache| StartFallback[Start Fallback<br/>Chain]

    UseCached --> TryProvider1
    StartFallback --> TryProvider1

    TryProvider1[Try Provider 1:<br/>Euri] --> CheckCB1{Circuit<br/>Breaker<br/>Open?}
    CheckCB1 -->|Closed| CheckRL1{Rate<br/>Limit<br/>OK?}
    CheckCB1 -->|Open| Skip1[Skip Euri]
    CheckRL1 -->|OK| CallEuri[Call Euri API]
    CheckRL1 -->|Limited| Skip1

    CallEuri --> EuriResponse{Success?}
    EuriResponse -->|✓ Success| RecordSuccess1[Record Success<br/>Close Circuit<br/>Update Cache]
    EuriResponse -->|✗ Timeout| RecordFailure1[Record Failure<br/>Increment Counter]
    EuriResponse -->|✗ Error| RecordFailure1

    RecordSuccess1 --> Return1([Return Response])
    RecordFailure1 --> CheckThreshold1{Failure<br/>Threshold<br/>Reached?}
    CheckThreshold1 -->|Yes| OpenCircuit1[Open Circuit<br/>Breaker]
    CheckThreshold1 -->|No| NextProvider1
    OpenCircuit1 --> NextProvider1[Try Provider 2:<br/>Deepseek]
    Skip1 --> NextProvider1

    NextProvider1 --> CheckCB2{Circuit<br/>Breaker<br/>Open?}
    CheckCB2 -->|Closed| CheckRL2{Rate<br/>Limit<br/>OK?}
    CheckCB2 -->|Open| Skip2[Skip Deepseek]
    CheckRL2 -->|OK| CallDeepseek[Call Deepseek API]
    CheckRL2 -->|Limited| Skip2

    CallDeepseek --> DeepseekResponse{Success?}
    DeepseekResponse -->|✓ Success| RecordSuccess2[Record Success<br/>Close Circuit<br/>Update Cache]
    DeepseekResponse -->|✗ Failure| RecordFailure2[Record Failure]
    RecordSuccess2 --> Return2([Return Response])
    RecordFailure2 --> NextProvider2[Try Provider 3:<br/>Gemini]
    Skip2 --> NextProvider2

    NextProvider2 --> CheckCB3{Circuit<br/>Breaker<br/>Open?}
    CheckCB3 -->|Closed| CheckRL3{Rate<br/>Limit<br/>OK?}
    CheckCB3 -->|Open| Skip3[Skip Gemini]
    CheckRL3 -->|OK| CallGemini[Call Gemini API]
    CheckRL3 -->|Limited| Skip3

    CallGemini --> GeminiResponse{Success?}
    GeminiResponse -->|✓ Success| RecordSuccess3[Record Success<br/>Close Circuit<br/>Update Cache]
    GeminiResponse -->|✗ Failure| RecordFailure3[Record Failure]
    RecordSuccess3 --> Return3([Return Response])
    RecordFailure3 --> NextProvider3[Try Provider 4:<br/>Claude]
    Skip3 --> NextProvider3

    NextProvider3 --> CheckCB4{Circuit<br/>Breaker<br/>Open?}
    CheckCB4 -->|Closed| CheckRL4{Rate<br/>Limit<br/>OK?}
    CheckCB4 -->|Open| AllFailed[All Providers<br/>Unavailable]
    CheckRL4 -->|OK| CallClaude[Call Claude API]
    CheckRL4 -->|Limited| AllFailed

    CallClaude --> ClaudeResponse{Success?}
    ClaudeResponse -->|✓ Success| RecordSuccess4[Record Success<br/>Close Circuit<br/>Update Cache]
    ClaudeResponse -->|✗ Failure| RecordFailure4[Record Failure]
    RecordSuccess4 --> Return4([Return Response])
    RecordFailure4 --> AllFailed

    AllFailed --> LogError[Log Error<br/>Send Alert]
    LogError --> ThrowError([Throw Exception])

    style Start fill:#667eea,color:#fff
    style Return1 fill:#10b981,color:#fff
    style Return2 fill:#10b981,color:#fff
    style Return3 fill:#10b981,color:#fff
    style Return4 fill:#10b981,color:#fff
    style ThrowError fill:#ef4444,color:#fff
    style CallEuri fill:#f59e0b,color:#fff
    style CallDeepseek fill:#f59e0b,color:#fff
    style CallGemini fill:#f59e0b,color:#fff
    style CallClaude fill:#f59e0b,color:#fff
```
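In code, the chain above reduces to a loop over providers in priority order, consulting the circuit breaker and rate limiter before each call. A minimal Python sketch, assuming hypothetical `CircuitBreaker`, `RateLimiter`, and provider-client interfaces (all names are illustrative, not the project's actual API):

```python
# Minimal sketch of the fallback chain (illustrative names, not the real API).
class AllProvidersFailedError(Exception):
    """Raised when every provider in the chain is skipped or fails."""

def generate(prompt: str, providers, breakers, limiters):
    """Try each provider in priority order; return the first success."""
    for provider in providers:  # e.g. [euri, deepseek, gemini, claude]
        breaker = breakers[provider.name]
        limiter = limiters[provider.name]
        if breaker.is_open() or not limiter.try_acquire():
            continue  # skip: circuit open or rate limited
        try:
            response = provider.call(prompt)
        except Exception:
            breaker.record_failure()  # may open the circuit at the threshold
            continue
        breaker.record_success()      # closes the circuit, updates the cache
        return response
    raise AllProvidersFailedError("all providers unavailable")
```

Sketches of the breaker and limiter interfaces assumed here follow in the next two sections.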
## Circuit Breaker State Machine

```mermaid
stateDiagram-v2
    [*] --> Closed
    Closed --> Open : Failure threshold reached<br/>(5 consecutive failures)
    Open --> HalfOpen : Timeout elapsed<br/>(60 seconds)
    HalfOpen --> Closed : Success threshold reached<br/>(3 consecutive successes)
    HalfOpen --> Open : Any failure

    state Closed {
        [*] --> Monitoring
        Monitoring --> CountingFailures : Request Failed
        CountingFailures --> Monitoring : Request Succeeded
    }

    state Open {
        [*] --> Blocking
        Blocking --> WaitingForTimeout : Waiting...
        WaitingForTimeout --> [*] : Timeout Reached
    }

    state HalfOpen {
        [*] --> Testing
        Testing --> CountingSuccesses : Request Succeeded
        CountingSuccesses --> Testing : Next Request
    }

    note right of Closed
        Normal Operation
        All requests pass through
        Failure count tracked
    end note

    note right of Open
        Blocking Requests
        Provider marked unavailable
        Wait for timeout
    end note

    note right of HalfOpen
        Testing Recovery
        Limited requests allowed
        Evaluating health
    end note
```
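The three states map directly onto a small class. A hedged sketch of one possible implementation, using the thresholds from the diagram (5 failures, 60-second timeout, 3 successes); class and method names are illustrative:

```python
import time

class CircuitBreaker:
    """Per-provider breaker: Closed -> Open -> Half-Open -> Closed."""

    def __init__(self, failure_threshold=5, timeout=60.0, recovery_threshold=3):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.recovery_threshold = recovery_threshold
        self.failures = 0
        self.successes = 0
        self.state = "closed"
        self.opened_at = 0.0

    def is_open(self) -> bool:
        # After the timeout, an open circuit becomes half-open and lets
        # test requests through.
        if self.state == "open" and time.monotonic() - self.opened_at >= self.timeout:
            self.state = "half_open"
            self.successes = 0
        return self.state == "open"

    def record_failure(self) -> None:
        self.failures += 1
        # Any failure while half-open, or reaching the threshold while
        # closed, (re)opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = time.monotonic()
            self.failures = 0

    def record_success(self) -> None:
        if self.state == "half_open":
            self.successes += 1
            if self.successes >= self.recovery_threshold:
                self.state = "closed"
        self.failures = 0  # success resets the consecutive-failure count
```

The constructor defaults mirror the `circuit_breaker` block in the configuration example further below.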
## Rate Limiter Token Bucket

```mermaid
flowchart LR
    subgraph "Token Bucket Algorithm"
        Bucket[(Token Bucket<br/>Max: 100 tokens<br/>Current: 75 tokens)]
        Refill[Refill Rate:<br/>100 tokens/min<br/>≈ 1.67 tokens/sec]
    end

    Request[Incoming Request] --> CheckTokens{Tokens<br/>Available?}
    CheckTokens -->|Yes<br/>Tokens ≥ 1| TakeToken[Take 1 Token]
    CheckTokens -->|No<br/>Tokens < 1| RateLimited[Rate Limited<br/>Return 429]

    TakeToken --> ProcessRequest[Process Request]
    ProcessRequest --> Success([Success])

    RateLimited --> Wait{Wait for<br/>Token?}
    Wait -->|Yes| WaitForRefill[Wait for Refill]
    Wait -->|No| Reject([Reject Request])
    WaitForRefill --> CheckTokens

    Refill -.->|Add tokens<br/>continuously| Bucket

    style Bucket fill:#10b981,color:#fff
    style Success fill:#10b981,color:#fff
    style RateLimited fill:#f59e0b,color:#fff
    style Reject fill:#ef4444,color:#fff
```
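A continuous-refill token bucket is only a few lines: tokens accrue lazily on each acquire attempt, capped at the bucket capacity. A minimal sketch, assuming the rates from the diagram (100 tokens at ≈1.67 tokens/sec); the names are illustrative:

```python
import time

class TokenBucket:
    """Token-bucket limiter: refills continuously, spends 1 token per call."""

    def __init__(self, capacity: float = 100, refill_per_sec: float = 100 / 60):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec  # ~1.67 tokens/sec
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        # Credit tokens for the time elapsed since the last refill.
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available; otherwise report rate-limited."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The defaults line up with `calls_per_minute: 100` in the configuration example further below.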
## LLM Provider Health Check

```mermaid
sequenceDiagram
    participant Scheduler
    participant HealthCheck
    participant Provider1 as Euri
    participant Provider2 as Deepseek
    participant Provider3 as Gemini
    participant Provider4 as Claude
    participant Monitor

    loop Every 5 minutes
        Scheduler->>HealthCheck: Run Health Check

        par Check All Providers
            HealthCheck->>Provider1: Send Test Request
            Provider1-->>HealthCheck: Response / Timeout
            HealthCheck->>Monitor: Update Euri Status
        and
            HealthCheck->>Provider2: Send Test Request
            Provider2-->>HealthCheck: Response / Timeout
            HealthCheck->>Monitor: Update Deepseek Status
        and
            HealthCheck->>Provider3: Send Test Request
            Provider3-->>HealthCheck: Response / Timeout
            HealthCheck->>Monitor: Update Gemini Status
        and
            HealthCheck->>Provider4: Send Test Request
            Provider4-->>HealthCheck: Response / Timeout
            HealthCheck->>Monitor: Update Claude Status
        end

        Monitor->>Monitor: Calculate Success Rates
        Monitor->>Monitor: Update Provider Rankings
        Monitor->>Monitor: Check for Alerts

        alt All Providers Failing
            Monitor->>Scheduler: Send Alert
        end
    end
```

## Cost Tracking Flow

```mermaid
flowchart TD
    Request[LLM Request] --> SelectProvider[Select Provider]
    SelectProvider --> SendRequest[Send Request to Provider]
    SendRequest --> Response[Receive Response]
    Response --> ExtractUsage[Extract Token Usage]
    ExtractUsage --> CalcCost[Calculate Cost]
    CalcCost --> UpdateMetrics[Update Metrics]

    subgraph "Cost Calculation"
        CalcCost --> GetRate[Get Provider Rate]
        GetRate --> Multiply[Cost = Tokens × Rate]
    end

    subgraph "Metrics Update"
        UpdateMetrics --> ProviderCost[Provider Total Cost]
        UpdateMetrics --> TotalCost[System Total Cost]
        UpdateMetrics --> RequestCount[Request Count]
    end

    UpdateMetrics --> SaveToDB[(Save to Database)]
    SaveToDB --> CheckBudget{Budget<br/>Exceeded?}
    CheckBudget -->|No| Continue([Continue])
    CheckBudget -->|Yes| Alert[Send Budget Alert]
    Alert --> Continue

    style CalcCost fill:#f59e0b,color:#fff
    style SaveToDB fill:#10b981,color:#fff
    style Alert fill:#ef4444,color:#fff
```

## Retry Logic with Exponential Backoff

```mermaid
flowchart TD
    Start([API Request]) --> Attempt1[Attempt 1]
    Attempt1 --> Check1{Success?}
    Check1 -->|✓| Success([Return Result])
    Check1 -->|✗| Wait1[Wait 1 second]

    Wait1 --> Attempt2[Attempt 2]
    Attempt2 --> Check2{Success?}
    Check2 -->|✓| Success
    Check2 -->|✗| Wait2[Wait 2 seconds<br/>Exponential: 1×2]

    Wait2 --> Attempt3[Attempt 3]
    Attempt3 --> Check3{Success?}
    Check3 -->|✓| Success
    Check3 -->|✗| MaxRetries[Max Retries Reached]

    MaxRetries --> Fallback{Fallback<br/>Available?}
    Fallback -->|Yes| NextProvider[Try Next Provider]
    Fallback -->|No| Failed([All Attempts Failed])
    NextProvider --> Start

    style Success fill:#10b981,color:#fff
    style Failed fill:#ef4444,color:#fff
    style Wait1 fill:#f59e0b,color:#fff
    style Wait2 fill:#f59e0b,color:#fff
```
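Per-provider retries compose with the fallback chain: each provider gets a few attempts with doubling waits before the chain moves on. A minimal sketch of such a wrapper (the `call` parameter stands in for whatever client function issues the request; names are illustrative):

```python
import time

def call_with_backoff(call, max_attempts: int = 3, base_delay: float = 1.0):
    """Retry `call()` with exponential backoff: wait 1 s, then 2 s, ...

    Re-raises the last error once max_attempts is exhausted, so the
    fallback chain can move on to the next provider.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # max retries reached; let the caller fall back
            time.sleep(base_delay * 2 ** (attempt - 1))  # 1 s, 2 s, 4 s, ...
```

With `max_retries: 2` from the provider configuration below, each provider would get three attempts in total, matching the diagram.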
## LLM Request Lifecycle

```mermaid
sequenceDiagram
    participant App as Application
    participant LLM as LLM Manager
    participant Cache as Provider Cache
    participant CB as Circuit Breaker
    participant RL as Rate Limiter
    participant Provider as LLM Provider
    participant Monitor as Monitoring

    App->>LLM: generate(prompt)
    LLM->>Cache: Get last successful provider
    Cache-->>LLM: Provider: Euri (cached 2min ago)

    LLM->>CB: Check Euri circuit
    CB-->>LLM: Circuit: CLOSED

    LLM->>RL: Request token

    alt Token Available
        RL-->>LLM: Token granted
        LLM->>Provider: API Request
        Provider-->>LLM: Response (200 OK)
        LLM->>CB: Record success
        LLM->>Monitor: Log usage (tokens, cost, latency)
        LLM->>Cache: Update cache (provider: Euri)
        LLM-->>App: Return response
    else Rate Limited
        RL-->>LLM: Rate limited
        LLM->>LLM: Try next provider
        Note over LLM: Repeat process with Deepseek
    end

    alt Provider Fails
        Provider-->>LLM: Error (500)
        LLM->>CB: Record failure
        CB->>CB: Increment failure count
        LLM->>Monitor: Log error
        LLM->>LLM: Try next provider
    end
```

## Provider Priority and Routing

```mermaid
graph TD
    Request[New LLM Request]

    subgraph "Priority Order"
        P1[1. Euri<br/>Priority: 1<br/>Cost: $$$<br/>Latency: ★★★★]
        P2[2. Deepseek<br/>Priority: 2<br/>Cost: $<br/>Latency: ★★★]
        P3[3. Gemini<br/>Priority: 3<br/>Cost: $$<br/>Latency: ★★★★★]
        P4[4. Claude<br/>Priority: 4<br/>Cost: $$$$$<br/>Latency: ★★★★]
    end

    Request --> CheckMode{Request<br/>Mode}
    CheckMode -->|Normal| CheckHealth1{Euri<br/>Healthy?}
    CheckMode -->|Force Provider| DirectRoute[Use Specified<br/>Provider]

    CheckHealth1 -->|Yes| P1
    CheckHealth1 -->|No| CheckHealth2{Deepseek<br/>Healthy?}
    CheckHealth2 -->|Yes| P2
    CheckHealth2 -->|No| CheckHealth3{Gemini<br/>Healthy?}
    CheckHealth3 -->|Yes| P3
    CheckHealth3 -->|No| CheckHealth4{Claude<br/>Healthy?}
    CheckHealth4 -->|Yes| P4
    CheckHealth4 -->|No| AllDown[All Providers<br/>Unavailable]

    DirectRoute --> P1
    DirectRoute --> P2
    DirectRoute --> P3
    DirectRoute --> P4

    P1 --> Result
    P2 --> Result
    P3 --> Result
    P4 --> Result
    AllDown --> Error

    Result([Success])
    Error([Error])

    style P1 fill:#10b981,color:#fff
    style P2 fill:#10b981,color:#fff
    style P3 fill:#10b981,color:#fff
    style P4 fill:#10b981,color:#fff
    style Error fill:#ef4444,color:#fff
```

---

## Configuration Example

```yaml
llm:
  providers:
    - name: euri
      enabled: true
      priority: 1
      api_base: "https://api.euri.ai/v1"
      model: "euri-default"
      timeout: 30
      max_retries: 2
      temperature: 0.7
      max_tokens: 2000

    - name: deepseek
      enabled: true
      priority: 2
      api_base: "https://api.deepseek.com/v1"
      model: "deepseek-chat"
      timeout: 30
      max_retries: 2

    - name: gemini
      enabled: true
      priority: 3
      model: "gemini-pro"
      timeout: 30
      max_retries: 2

    - name: claude
      enabled: true
      priority: 4
      model: "claude-3-5-sonnet-20241022"
      timeout: 30
      max_retries: 2

  # Circuit Breaker Configuration
  circuit_breaker:
    enabled: true
    failure_threshold: 5    # Open after 5 failures
    timeout: 60             # Try again after 60 seconds
    recovery_threshold: 3   # Close after 3 successes

  # Rate Limiting Configuration
  rate_limit:
    enabled: true
    calls_per_minute: 100
    burst_limit: 10

  # Cost Tracking
  cost_tracking:
    enabled: true
    log_usage: true
    budget_limit: 100.00    # USD per month
```

---

## Key Metrics Tracked

| Metric | Description | Purpose |
|--------|-------------|---------|
| **Total Requests** | All LLM requests made | Overall usage tracking |
| **Successful Calls** | Requests that succeeded | Success rate calculation |
| **Failed Calls** | Requests that failed | Error rate monitoring |
| **Fallback Count** | Times fallback was used | System reliability indicator |
| **Provider Success Rate** | Success rate per provider | Provider health |
| **Average Latency** | Average response time | Performance monitoring |
| **Total Cost** | Cumulative cost | Budget tracking |
| **Cost per Provider** | Cost breakdown | Cost optimization |
| **Circuit Breaker Opens** | Times circuit opened | Stability indicator |
| **Rate Limit Hits** | Times rate limited | Capacity planning |

---

## Failure Scenarios and Handling

### Scenario 1: Primary Provider Down

```
Request → Euri (Circuit Open) → Skip → Deepseek (Success) → Return

Fallback triggered: Yes
Impact: +200ms latency
Cost: Reduced (Deepseek cheaper than Euri)
```

### Scenario 2: Rate Limit Exceeded

```
Request → Euri (Rate Limited) → Skip → Deepseek (Success) → Return

Fallback triggered: Yes
Impact: Minimal (fast skip)
Cost: Reduced
```

### Scenario 3: All Providers Failing

```
Request → Euri (Fail) → Deepseek (Fail) → Gemini (Fail) → Claude (Fail) → Error

Fallback triggered: Yes (exhausted)
Impact: Request fails
Alert: Sent to monitoring
```

### Scenario 4: Intermittent Failures

```
Request 1 → Euri (Success)
Request 2 → Euri (Fail) → Circuit: 1/5
Request 3 → Euri (Fail) → Circuit: 2/5
Request 4 → Euri (Fail) → Circuit: 3/5
Request 5 → Euri (Fail) → Circuit: 4/5
Request 6 → Euri (Fail) → Circuit: 5/5 → OPEN
Request 7 → Euri (Skipped) → Deepseek (Success)
...60 seconds later...
Request N → Euri (Half-Open) → Test request
```

---

**Status**: ✅ Production-Grade Reliability System
