# Lessons Learned from Failed Implementation Attempt
## Overview
This document captures the key issues discovered in the failed implementation attempt and provides clear guidance on what to avoid in the correct implementation.
## Critical Mistakes to Avoid
### 1. ❌ **File-First Discovery**
**What was done wrong:**
- Used files as the primary source of instance data
- Hub reads directory to discover instances
- Network connections were secondary
**What should be done:**
- Network connections are the ONLY source of truth
- Files are merely signals to initiate connections
- Once connected, forget about the file
### 2. ❌ **Mutex-Based State Management**
**What was done wrong:**
```go
type Registry struct {
mu sync.RWMutex // WRONG!
instance *Instance
// ...
}
```
**What should be done:**
```go
type Registry struct {
registerChan chan *Instance // RIGHT!
unregisterChan chan struct{}
// All operations through channels
}
```
### 3. ❌ **Registration Before Listening**
**What was done wrong:**
- Started registration in a goroutine
- No guarantee server was actually listening
- Race condition between file creation and port readiness
**What should be done:**
1. Call `net.Listen()` first
2. Get actual port from listener
3. THEN create instance file
4. Never create file before server is ready
### 4. ❌ **Complex Transport Mixing**
**What was done wrong:**
- Tried to support both stdio and HTTP in hub mode
- Complex conditional logic for transport selection
- Confusion about which mode uses which transport
**What should be done:**
- Hub mode (`--mcp`): ALWAYS stdio, no exceptions
- Instance mode (`--no-tui`): ALWAYS HTTP
- Never mix transports in the same mode
### 5. ❌ **Passive Hub Discovery**
**What was done wrong:**
- Hub waits for files to appear
- Polls directory for changes
- Reactive instead of proactive
**What should be done:**
- Instances connect TO the hub (future enhancement)
- For now: Hub watches for files but immediately connects
- Connection establishment is the key event, not file appearance
### 6. ❌ **Custom Health Checks**
**What was done wrong:**
- Implemented custom "ensure" messages
- 200ms polling loops
- Complex retry logic
**What should be done:**
- Use MCP protocol's built-in ping/pong
- Simple 5-second intervals
- Let MCP handle the protocol details
### 7. ❌ **File I/O in Critical Paths**
**What was done wrong:**
- Heartbeat writes every 10 seconds
- File reads on every instance list
- Blocking file operations
**What should be done:**
- Write instance file ONCE at startup
- Delete file ONCE at shutdown
- Everything else is in-memory/network
### 8. ❌ **Over-Engineered State Machine**
**What was done wrong:**
- Complex state transitions (7+ states)
- Difficult to reason about
- Too many edge cases
**What should be done:**
- Simple states: Discovered → Connected → Disconnected
- Health is binary: responding to pings or not
- Retry logic is simple exponential backoff
## Correct Architecture Principles
### 1. **Hub is a Simple Proxy**
- Receives stdio MCP commands
- Forwards to HTTP instance servers
- Maintains session → instance mapping
- That's it!
### 2. **Instances are Independent**
- Run their own HTTP MCP servers
- Write a signal file after listening
- Don't know or care about the hub
### 3. **Discovery is Lightweight**
- Watch directory for new JSON files
- Read file once to get port/metadata
- Connect immediately
- Never read the file again
### 4. **Connections are Everything**
- Active TCP connection = instance available
- No connection = instance unavailable
- No complex state tracking needed
### 5. **Channels, Not Mutexes**
- Single goroutine owns each piece of state
- Communication via channels
- No shared memory, no locks
## Implementation Order
To avoid these mistakes, follow this strict order:
1. **Start with stdio hub** (Step 1)
- Just stdio server
- Hardcoded tool responses
- No discovery yet
2. **Add discovery watching** (Step 2)
- Watch for files
- Parse them
- Store in memory
- Still no connections
3. **Add connection management** (Step 3)
- Connect to discovered instances
- Simple connected/disconnected states
- Channel-based operations
4. **Add tool proxying** (Step 4)
- Forward tools to connected instances
- Handle errors gracefully
5. **Add health monitoring** (Step 5)
- MCP ping/pong only
- Mark unresponsive instances
6. **Test everything** (Step 6)
- Verify each component
- End-to-end testing
## Red Flags to Watch For
If you find yourself:
- Writing mutexes → STOP, use channels
- Reading files repeatedly → STOP, read once
- Polling in tight loops → STOP, use events
- Making it complicated → STOP, simplify
## The Golden Rule
**"The hub knows about instances through network connections, not files. Files are just the doorbell; connections are the conversation."**