SERVER_REDUNDANCY.md•5.44 kB
# 🔄 Server Redundancy & Business Continuity
Understanding the democratic server model and planning for continuity.
## 🎯 The Democratic Server Model
### Core Concept
There's no dedicated server. ANY AI can be the server - it's democratic!
### How It Works
1. **First AI to start wins** - Binds to port 9876 and becomes server
2. **Other AIs become clients** - Connect to the existing server
3. **Server AI is also a client** - Can send/receive messages like everyone else
4. **When server exits** - Server dies, messages lost, port freed
### Example Timeline
```
9:00 AM - Gemini starts → Port free → Gemini is server + client
9:05 AM - Claude starts → Port busy → Claude is client only
9:10 AM - Nessa starts → Port busy → Nessa is client only
9:30 AM - Gemini exits → Server dies, all queued messages lost
9:35 AM - Fred starts → Port free → Fred is new server!
- Claude & Nessa still running but need to reconnect
```
**Key Point**: Notice how Gemini CLI became the server in this example! ANY AI with MCP support can win the election.
## ⚠️ Current Limitations
### What Happens When the Server Dies
1. **All queued messages are lost** - They exist only in server memory
2. **Active clients can't take over** - They're stuck in client mode
3. **No automatic failover** - Manual intervention required
4. **Port becomes available** - Next NEW AI can claim it
### Recovery Scenarios
**Scenario A: Natural Recovery**
- Server exits
- NEW AI starts fresh
- Automatically becomes new server
- Existing clients can communicate through new server
**Scenario B: Manual Recovery**
- Server exits
- Active client (like Fred) wants to be server
- Fred must exit and restart
- If first to restart, becomes server
## 📊 Business Continuity Options
### Option 1: Manual Failover (Current)
**Pros**: Simple, no code changes
**Cons**: Requires manual intervention, messages lost
**Use when**: Development/testing, low message volume
### Potential **(Future)** Approaches
### Option 2: External Dedicated Server
Run the server as a standalone process:
**Pros**: Server independent of any AI session
**Cons**: Loses democratic elegance, needs dedicated process
**Use when**: Production environment, high reliability needed
### Option 3: Add Persistence (Future)
Add SQLite to persist messages:
```python
# Instead of in-memory queues
messages_db = sqlite3.connect('messages.db')
```
**Pros**: Messages survive server restart
**Cons**: Adds complexity, performance impact
**Use when**: Message persistence critical
### Option 4: Multi-Server Federation (Future)
Multiple servers that sync:
```
Server A (port 9876) ←→ Server B (port 9877)
```
**Pros**: True redundancy, no single point of failure
**Cons**: Complex synchronization, consistency challenges
**Use when**: Enterprise requirements
## 🚀 Best Practices
### For Development
1. **Accept message loss** - Due to being in a dev environment
2. **Start Claude first** - Most stable as server
3. **Document who's server** - Avoid confusion
### For (Future)Production
1. **Run dedicated server** - Using standalone process
2. **Monitor server health** - Alert on crashes
3. **Backup important messages** - Don't rely on queue
4. **Plan restart procedures** - Who restarts when
## 🔧 Monitoring Server Status
### Check if server is running:
```bash
netstat -an | grep 9876
# or
nc -zv localhost 9876
```
### See who's the server:
- First AI to successfully bind port 9876
- Check logs for "Started message broker"
### Health check script:
```python
import socket
import json
def check_ipc_health():
try:
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(2)
sock.connect(('localhost', 9876))
# Try listing instances
request = {"action": "list"}
sock.send(json.dumps(request).encode())
response = sock.recv(1024)
return "IPC Server: UP"
except:
return "IPC Server: DOWN"
```
## 📝 Planning Your Deployment
### Questions to Answer:
1. **Message Loss Tolerance**
- Can you afford to lose queued messages?
- How critical is message delivery?
2. **Uptime Requirements**
- 99% (86 min downtime/week) → Manual failover OK
- 99.9% (10 min/week) → Need dedicated server
- 99.99% (1 min/week) → Need redundancy
3. **Recovery Time Objective**
- Minutes → Manual restart acceptable
- Seconds → Need automatic failover
4. **Message Volume**
- Low (< 100/day) → In-memory OK
- High (> 1000/day) → Consider persistence
### Deployment Recommendations:
**Casual Use**:
- Use as-is
- Accept democratic model
- Manual recovery when needed
**Team Collaboration**:
- Dedicated server process
- Document restart procedures
- Monitor server health
**Production Service**:
- Implement persistence
- Automatic restart
- Consider federation
## 🎭 The Philosophy
The democratic server model reflects a philosophical choice:
**Simplicity > Complexity**
- One codebase for all roles
- No special infrastructure
- Easy to understand
**Equality > Hierarchy**
- No "special" server machine
- Any AI can lead
- Shared responsibility
**Pragmatism > Perfection**
- Works well for intended use
- Clear upgrade path
- Start simple, grow as needed
In Summary - Enabling AI collaboration with minimal friction was the goal. The democratic model achieves this while leaving room for growth.