# LLM Fallback System - Detailed Flow
## LLM Fallback Flow Diagram
```mermaid
flowchart TD
Start([LLM Request]) --> CheckCache{Provider<br/>Health Cache}
CheckCache -->|Last Success<br/>< 5 min ago| UseCached[Use Last<br/>Successful Provider]
CheckCache -->|No Cache| StartFallback[Start Fallback<br/>Chain]
UseCached --> TryProvider1
StartFallback --> TryProvider1
TryProvider1[Try Provider 1:<br/>Euri] --> CheckCB1{Circuit<br/>Breaker<br/>Open?}
CheckCB1 -->|Closed| CheckRL1{Rate<br/>Limit<br/>OK?}
CheckCB1 -->|Open| Skip1[Skip Euri]
CheckRL1 -->|OK| CallEuri[Call Euri API]
CheckRL1 -->|Limited| Skip1
CallEuri --> EuriResponse{Success?}
EuriResponse -->|✓ Success| RecordSuccess1[Record Success<br/>Close Circuit<br/>Update Cache]
EuriResponse -->|✗ Timeout| RecordFailure1[Record Failure<br/>Increment Counter]
EuriResponse -->|✗ Error| RecordFailure1
RecordSuccess1 --> Return1([Return Response])
RecordFailure1 --> CheckThreshold1{Failure<br/>Threshold<br/>Reached?}
CheckThreshold1 -->|Yes| OpenCircuit1[Open Circuit<br/>Breaker]
CheckThreshold1 -->|No| NextProvider1
OpenCircuit1 --> NextProvider1[Try Provider 2:<br/>Deepseek]
Skip1 --> NextProvider1
NextProvider1 --> CheckCB2{Circuit<br/>Breaker<br/>Open?}
CheckCB2 -->|Closed| CheckRL2{Rate<br/>Limit<br/>OK?}
CheckCB2 -->|Open| Skip2[Skip Deepseek]
CheckRL2 -->|OK| CallDeepseek[Call Deepseek API]
CheckRL2 -->|Limited| Skip2
CallDeepseek --> DeepseekResponse{Success?}
DeepseekResponse -->|✓ Success| RecordSuccess2[Record Success<br/>Close Circuit<br/>Update Cache]
DeepseekResponse -->|✗ Failure| RecordFailure2[Record Failure]
RecordSuccess2 --> Return2([Return Response])
RecordFailure2 --> NextProvider2[Try Provider 3:<br/>Gemini]
Skip2 --> NextProvider2
NextProvider2 --> CheckCB3{Circuit<br/>Breaker<br/>Open?}
CheckCB3 -->|Closed| CheckRL3{Rate<br/>Limit<br/>OK?}
CheckCB3 -->|Open| Skip3[Skip Gemini]
CheckRL3 -->|OK| CallGemini[Call Gemini API]
CheckRL3 -->|Limited| Skip3
CallGemini --> GeminiResponse{Success?}
GeminiResponse -->|✓ Success| RecordSuccess3[Record Success<br/>Close Circuit<br/>Update Cache]
GeminiResponse -->|✗ Failure| RecordFailure3[Record Failure]
RecordSuccess3 --> Return3([Return Response])
RecordFailure3 --> NextProvider3[Try Provider 4:<br/>Claude]
Skip3 --> NextProvider3
NextProvider3 --> CheckCB4{Circuit<br/>Breaker<br/>Open?}
CheckCB4 -->|Closed| CheckRL4{Rate<br/>Limit<br/>OK?}
CheckCB4 -->|Open| AllFailed[All Providers<br/>Unavailable]
CheckRL4 -->|OK| CallClaude[Call Claude API]
CheckRL4 -->|Limited| AllFailed
CallClaude --> ClaudeResponse{Success?}
ClaudeResponse -->|✓ Success| RecordSuccess4[Record Success<br/>Close Circuit<br/>Update Cache]
ClaudeResponse -->|✗ Failure| RecordFailure4[Record Failure]
RecordSuccess4 --> Return4([Return Response])
RecordFailure4 --> AllFailed
AllFailed --> LogError[Log Error<br/>Send Alert]
LogError --> ThrowError([Throw Exception])
style Start fill:#667eea,color:#fff
style Return1 fill:#10b981,color:#fff
style Return2 fill:#10b981,color:#fff
style Return3 fill:#10b981,color:#fff
style Return4 fill:#10b981,color:#fff
style ThrowError fill:#ef4444,color:#fff
style CallEuri fill:#f59e0b,color:#fff
style CallDeepseek fill:#f59e0b,color:#fff
style CallGemini fill:#f59e0b,color:#fff
style CallClaude fill:#f59e0b,color:#fff
```
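The chain above can be sketched in a few lines. The `Provider` wrapper and its method names here are illustrative assumptions, not the actual implementation; the real system presumably separates the circuit breaker and rate limiter into their own components.

```python
class ProviderUnavailable(Exception):
    """Raised when every provider in the chain has been exhausted."""

class Provider:
    """Illustrative provider wrapper with an inline failure counter."""
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.failures = 0
        self.circuit_open = False

    def acquire_token(self):
        return True  # rate-limiter stub: always has capacity

    def call(self, prompt):
        if not self.healthy:
            raise RuntimeError(f"{self.name} error")
        return f"{self.name}: response to {prompt!r}"

    def record_failure(self, threshold=5):
        self.failures += 1
        if self.failures >= threshold:
            self.circuit_open = True  # open after N consecutive failures

    def record_success(self):
        self.failures = 0
        self.circuit_open = False  # success closes the circuit

def generate(prompt, providers):
    """Walk the chain in priority order, skipping broken or limited providers."""
    for p in providers:
        if p.circuit_open or not p.acquire_token():
            continue  # skip: circuit open or rate limited
        try:
            result = p.call(prompt)
        except Exception:
            p.record_failure()
            continue  # fall through to the next provider
        p.record_success()
        return result
    raise ProviderUnavailable("all providers unavailable")
```

For example, with `[Provider("euri", healthy=False), Provider("deepseek")]` the first call fails over to Deepseek and records one failure against Euri.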
## Circuit Breaker State Machine
```mermaid
stateDiagram-v2
[*] --> Closed
Closed --> Open : Failure threshold reached<br/>(5 consecutive failures)
Open --> HalfOpen : Timeout elapsed<br/>(60 seconds)
HalfOpen --> Closed : Success threshold reached<br/>(3 consecutive successes)
HalfOpen --> Open : Any failure
state Closed {
[*] --> Monitoring
Monitoring --> CountingFailures : Request Failed
CountingFailures --> Monitoring : Request Succeeded
}
state Open {
[*] --> Blocking
Blocking --> WaitingForTimeout : Waiting...
WaitingForTimeout --> [*] : Timeout Reached
}
state HalfOpen {
[*] --> Testing
Testing --> CountingSuccesses : Request Succeeded
CountingSuccesses --> Testing : Next Request
}
note right of Closed
Normal Operation
All requests pass through
Failure count tracked
end note
note right of Open
Blocking Requests
Provider marked unavailable
Wait for timeout
end note
note right of HalfOpen
Testing Recovery
Limited requests allowed
Evaluating health
end note
```
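A minimal sketch of the three-state breaker above, using the thresholds from the diagram (5 failures to open, 60-second timeout, 3 successes to close). This is a single-threaded illustration; a production breaker would need locking.

```python
import time

class CircuitBreaker:
    """Closed -> Open on repeated failure; Open -> HalfOpen after a
    timeout; HalfOpen -> Closed on repeated success, or back to Open
    on any failure."""
    def __init__(self, failure_threshold=5, timeout=60.0, recovery_threshold=3):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.recovery_threshold = recovery_threshold
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow_request(self):
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.timeout:
                self.state = "half_open"  # timeout elapsed: probe recovery
                self.successes = 0
                return True
            return False                  # still blocking
        return True                       # closed or half_open pass through

    def record_success(self):
        if self.state == "half_open":
            self.successes += 1
            if self.successes >= self.recovery_threshold:
                self._close()             # recovered
        else:
            self.failures = 0             # reset the consecutive-failure count

    def record_failure(self):
        if self.state == "half_open":
            self._open()                  # any failure reopens immediately
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self):
        self.state = "open"
        self.opened_at = time.monotonic()

    def _close(self):
        self.state = "closed"
        self.failures = 0
```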
## Rate Limiter Token Bucket
```mermaid
flowchart LR
subgraph "Token Bucket Algorithm"
Bucket[(Token Bucket<br/>Max: 100 tokens<br/>Current: 75 tokens)]
Refill[Refill Rate:<br/>100 tokens/min<br/>≈ 1.67 tokens/sec]
end
Request[Incoming Request] --> CheckTokens{Tokens<br/>Available?}
CheckTokens -->|Yes<br/>Tokens ≥ 1| TakeToken[Take 1 Token]
CheckTokens -->|No<br/>Tokens < 1| RateLimited[Rate Limited<br/>Return 429]
TakeToken --> ProcessRequest[Process Request]
ProcessRequest --> Success([Success])
RateLimited --> Wait{Wait for<br/>Token?}
Wait -->|Yes| WaitForRefill[Wait for Refill]
Wait -->|No| Reject([Reject Request])
WaitForRefill --> CheckTokens
Refill -.->|Add tokens<br/>continuously| Bucket
style Bucket fill:#10b981,color:#fff
style Success fill:#10b981,color:#fff
style RateLimited fill:#f59e0b,color:#fff
style Reject fill:#ef4444,color:#fff
```
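The bucket above translates to a short class. Defaults match the diagram (capacity 100, refilled at 100 tokens/min, i.e. about 1.67 tokens/sec); this sketch is not thread-safe.

```python
import time

class TokenBucket:
    """Continuously refilled token bucket; callers take one token per request."""
    def __init__(self, capacity=100, refill_per_sec=100 / 60):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Add tokens for the elapsed time, capped at the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now

    def try_acquire(self, n=1):
        """Take n tokens if available; otherwise report rate-limited."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False  # caller may wait for a refill or reject with 429
```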
## LLM Provider Health Check
```mermaid
sequenceDiagram
participant Scheduler
participant HealthCheck
participant Provider1 as Euri
participant Provider2 as Deepseek
participant Provider3 as Gemini
participant Provider4 as Claude
participant Monitor
loop Every 5 minutes
Scheduler->>HealthCheck: Run Health Check
par Check All Providers
HealthCheck->>Provider1: Send Test Request
Provider1-->>HealthCheck: Response / Timeout
HealthCheck->>Monitor: Update Euri Status
and
HealthCheck->>Provider2: Send Test Request
Provider2-->>HealthCheck: Response / Timeout
HealthCheck->>Monitor: Update Deepseek Status
and
HealthCheck->>Provider3: Send Test Request
Provider3-->>HealthCheck: Response / Timeout
HealthCheck->>Monitor: Update Gemini Status
and
HealthCheck->>Provider4: Send Test Request
Provider4-->>HealthCheck: Response / Timeout
HealthCheck->>Monitor: Update Claude Status
end
Monitor->>Monitor: Calculate Success Rates
Monitor->>Monitor: Update Provider Rankings
Monitor->>Monitor: Check for Alerts
alt All Providers Failing
Monitor->>Scheduler: Send Alert
end
end
```
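The `par` block above probes all providers concurrently. A sketch using a thread pool follows; the `probe` function is a stand-in (here it hard-codes Gemini as failing for illustration), where the real one would send a short test request with a timeout and record latency.

```python
from concurrent.futures import ThreadPoolExecutor

PROVIDERS = ["euri", "deepseek", "gemini", "claude"]

def probe(provider):
    """Hypothetical test request; stand-in logic for illustration only."""
    return provider != "gemini"  # pretend every provider except Gemini answers

def run_health_check(providers=PROVIDERS):
    """Probe all providers in parallel and flag the all-failing alert case."""
    with ThreadPoolExecutor(max_workers=len(providers)) as pool:
        statuses = dict(zip(providers, pool.map(probe, providers)))
    all_failing = not any(statuses.values())  # triggers the alert branch
    return statuses, all_failing
```

A scheduler would call `run_health_check` every 5 minutes and push the statuses into the monitor.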
## Cost Tracking Flow
```mermaid
flowchart TD
Request[LLM Request] --> SelectProvider[Select Provider]
SelectProvider --> SendRequest[Send Request to Provider]
SendRequest --> Response[Receive Response]
Response --> ExtractUsage[Extract Token Usage]
ExtractUsage --> CalcCost[Calculate Cost]
CalcCost --> UpdateMetrics[Update Metrics]
subgraph "Cost Calculation"
CalcCost --> GetRate[Get Provider Rate]
GetRate --> Multiply[Cost = Tokens × Rate]
end
subgraph "Metrics Update"
UpdateMetrics --> ProviderCost[Provider Total Cost]
UpdateMetrics --> TotalCost[System Total Cost]
UpdateMetrics --> RequestCount[Request Count]
end
UpdateMetrics --> SaveToDB[(Save to Database)]
SaveToDB --> CheckBudget{Budget<br/>Exceeded?}
CheckBudget -->|No| Continue([Continue])
CheckBudget -->|Yes| Alert[Send Budget Alert]
Alert --> Continue
style CalcCost fill:#f59e0b,color:#fff
style SaveToDB fill:#10b981,color:#fff
style Alert fill:#ef4444,color:#fff
```
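The cost-calculation and metrics-update subgraphs amount to a multiply and a few counters. The per-1K-token rates below are made-up placeholders; real provider pricing differs and usually splits input and output tokens.

```python
# Illustrative per-1K-token rates in USD (placeholders, not real pricing).
RATES_PER_1K = {"euri": 0.010, "deepseek": 0.002, "gemini": 0.005, "claude": 0.015}

def calculate_cost(provider, tokens_used):
    """Cost = tokens x rate, as in the Cost Calculation subgraph."""
    return tokens_used / 1000 * RATES_PER_1K[provider]

def update_metrics(metrics, provider, tokens_used, budget_limit=100.00):
    """Accumulate per-provider cost, system total, and request count,
    then check the budget as in the final decision node."""
    cost = calculate_cost(provider, tokens_used)
    metrics["per_provider"][provider] = metrics["per_provider"].get(provider, 0.0) + cost
    metrics["total_cost"] += cost
    metrics["request_count"] += 1
    over_budget = metrics["total_cost"] > budget_limit  # would send the alert
    return cost, over_budget
```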
## Retry Logic with Exponential Backoff
```mermaid
flowchart TD
Start([API Request]) --> Attempt1[Attempt 1]
Attempt1 --> Check1{Success?}
Check1 -->|✓| Success([Return Result])
Check1 -->|✗| Wait1[Wait 1 second]
Wait1 --> Attempt2[Attempt 2]
Attempt2 --> Check2{Success?}
Check2 -->|✓| Success
Check2 -->|✗| Wait2[Wait 2 seconds<br/>Exponential: 1×2]
Wait2 --> Attempt3[Attempt 3]
Attempt3 --> Check3{Success?}
Check3 -->|✓| Success
Check3 -->|✗| MaxRetries[Max Retries Reached]
MaxRetries --> Fallback{Fallback<br/>Available?}
Fallback -->|Yes| NextProvider[Try Next Provider]
Fallback -->|No| Failed([All Attempts Failed])
NextProvider --> Start
style Success fill:#10b981,color:#fff
style Failed fill:#ef4444,color:#fff
style Wait1 fill:#f59e0b,color:#fff
style Wait2 fill:#f59e0b,color:#fff
```
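The retry loop above, with delays doubling per attempt (1s, then 2s), can be sketched as a small wrapper. In the real system this would distinguish retryable errors (timeouts, 5xx) from permanent ones and probably add jitter.

```python
import time

def call_with_retries(call, max_attempts=3, base_delay=1.0):
    """Retry with exponential backoff: wait base_delay * 2**attempt
    between tries, matching the diagram's three attempts."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # max retries reached: caller falls back to the next provider
            time.sleep(base_delay * 2 ** attempt)
```

On exhaustion the exception propagates, which is the point where the fallback chain moves on to the next provider.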
## LLM Request Lifecycle
```mermaid
sequenceDiagram
participant App as Application
participant LLM as LLM Manager
participant Cache as Provider Cache
participant CB as Circuit Breaker
participant RL as Rate Limiter
participant Provider as LLM Provider
participant Monitor as Monitoring
App->>LLM: generate(prompt)
LLM->>Cache: Get last successful provider
Cache-->>LLM: Provider: Euri (cached 2min ago)
LLM->>CB: Check Euri circuit
CB-->>LLM: Circuit: CLOSED
LLM->>RL: Request token
alt Token Available
RL-->>LLM: Token granted
LLM->>Provider: API Request
Provider-->>LLM: Response (200 OK)
LLM->>CB: Record success
LLM->>Monitor: Log usage (tokens, cost, latency)
LLM->>Cache: Update cache (provider: Euri)
LLM-->>App: Return response
else Rate Limited
RL-->>LLM: Rate limited
LLM->>LLM: Try next provider
Note over LLM: Repeat process with Deepseek
end
alt Provider Fails
Provider-->>LLM: Error (500)
LLM->>CB: Record failure
CB->>CB: Increment failure count
LLM->>Monitor: Log error
LLM->>LLM: Try next provider
end
```
## Provider Priority and Routing
```mermaid
graph TD
Request[New LLM Request]
subgraph "Priority Order"
P1[1. Euri<br/>Priority: 1<br/>Cost: $$$<br/>Latency: ★★★★]
P2[2. Deepseek<br/>Priority: 2<br/>Cost: $<br/>Latency: ★★★]
P3[3. Gemini<br/>Priority: 3<br/>Cost: $$<br/>Latency: ★★★★★]
P4[4. Claude<br/>Priority: 4<br/>Cost: $$$$$<br/>Latency: ★★★★]
end
Request --> CheckMode{Request<br/>Mode}
CheckMode -->|Normal| CheckHealth1{Euri<br/>Healthy?}
CheckMode -->|Force Provider| DirectRoute[Use Specified<br/>Provider]
CheckHealth1 -->|Yes| P1
CheckHealth1 -->|No| CheckHealth2{Deepseek<br/>Healthy?}
CheckHealth2 -->|Yes| P2
CheckHealth2 -->|No| CheckHealth3{Gemini<br/>Healthy?}
CheckHealth3 -->|Yes| P3
CheckHealth3 -->|No| CheckHealth4{Claude<br/>Healthy?}
CheckHealth4 -->|Yes| P4
CheckHealth4 -->|No| AllDown[All Providers<br/>Unavailable]
DirectRoute --> P1
DirectRoute --> P2
DirectRoute --> P3
DirectRoute --> P4
P1 --> Result
P2 --> Result
P3 --> Result
P4 --> Result
AllDown --> Error
Result([Success])
Error([Error])
style P1 fill:#10b981,color:#fff
style P2 fill:#10b981,color:#fff
style P3 fill:#10b981,color:#fff
style P4 fill:#10b981,color:#fff
style Error fill:#ef4444,color:#fff
```
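Routing reduces to "forced provider, else first healthy by priority". A sketch, with provider records shaped as plain dicts for illustration:

```python
def route(providers, forced=None):
    """Pick a provider: a forced provider bypasses the health walk
    ("Force Provider" branch); otherwise return the first healthy
    provider in ascending priority order."""
    if forced is not None:
        return forced
    for p in sorted(providers, key=lambda p: p["priority"]):
        if p["healthy"]:
            return p["name"]
    raise RuntimeError("all providers unavailable")
```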
---
## Configuration Example
```yaml
llm:
  providers:
    - name: euri
      enabled: true
      priority: 1
      api_base: "https://api.euri.ai/v1"
      model: "euri-default"
      timeout: 30
      max_retries: 2
      temperature: 0.7
      max_tokens: 2000
    - name: deepseek
      enabled: true
      priority: 2
      api_base: "https://api.deepseek.com/v1"
      model: "deepseek-chat"
      timeout: 30
      max_retries: 2
    - name: gemini
      enabled: true
      priority: 3
      model: "gemini-pro"
      timeout: 30
      max_retries: 2
    - name: claude
      enabled: true
      priority: 4
      model: "claude-3-5-sonnet-20241022"
      timeout: 30
      max_retries: 2

  # Circuit Breaker Configuration
  circuit_breaker:
    enabled: true
    failure_threshold: 5   # Open after 5 consecutive failures
    timeout: 60            # Try again (half-open) after 60 seconds
    recovery_threshold: 3  # Close after 3 consecutive successes

  # Rate Limiting Configuration
  rate_limit:
    enabled: true
    calls_per_minute: 100
    burst_limit: 10

  # Cost Tracking
  cost_tracking:
    enabled: true
    log_usage: true
    budget_limit: 100.00   # USD per month
```
---
## Key Metrics Tracked
| Metric | Description | Purpose |
|--------|-------------|---------|
| **Total Requests** | All LLM requests made | Overall usage tracking |
| **Successful Calls** | Requests that succeeded | Success rate calculation |
| **Failed Calls** | Requests that failed | Error rate monitoring |
| **Fallback Count** | Times fallback was used | System reliability indicator |
| **Provider Success Rate** | Success rate per provider | Provider health |
| **Average Latency** | Average response time | Performance monitoring |
| **Total Cost** | Cumulative cost | Budget tracking |
| **Cost per Provider** | Cost breakdown | Cost optimization |
| **Circuit Breaker Opens** | Times circuit opened | Stability indicator |
| **Rate Limit Hits** | Times rate limited | Capacity planning |
---
## Failure Scenarios and Handling
### Scenario 1: Primary Provider Down
```
Request → Euri (Circuit Open) → Skip → Deepseek (Success) → Return
Fallback triggered: Yes
Impact: +200ms latency
Cost: Reduced (Deepseek cheaper than Euri)
```
### Scenario 2: Rate Limit Exceeded
```
Request → Euri (Rate Limited) → Skip → Deepseek (Success) → Return
Fallback triggered: Yes
Impact: Minimal (fast skip)
Cost: Reduced
```
### Scenario 3: All Providers Failing
```
Request → Euri (Fail) → Deepseek (Fail) → Gemini (Fail) → Claude (Fail) → Error
Fallback triggered: Yes (exhausted)
Impact: Request fails
Alert: Sent to monitoring
```
### Scenario 4: Intermittent Failures
```
Request 1 → Euri (Success)
Request 2 → Euri (Fail) → Circuit: 1/5
Request 3 → Euri (Fail) → Circuit: 2/5
Request 4 → Euri (Fail) → Circuit: 3/5
Request 5 → Euri (Fail) → Circuit: 4/5
Request 6 → Euri (Fail) → Circuit: 5/5 → OPEN
Request 7 → Euri (Skipped) → Deepseek (Success)
...60 seconds later...
Request N → Euri (Half-Open) → Test request
```
---
**Status**: ✅ Production-Grade Reliability System