# Lessons Learned from Failed Implementation Attempt

## Overview

This document captures the key issues discovered in the failed implementation attempt and provides clear guidance on what to avoid in the correct implementation.

## Critical Mistakes to Avoid

### 1. ❌ **File-First Discovery**

**What was done wrong:**
- Used files as the primary source of instance data
- Hub reads directory to discover instances
- Network connections were secondary

**What should be done:**
- Network connections are the ONLY source of truth
- Files are merely signals to initiate connections
- Once connected, forget about the file

### 2. ❌ **Mutex-Based State Management**

**What was done wrong:**
```go
type Registry struct {
    mu       sync.RWMutex // WRONG!
    instance *Instance
    // ...
}
```

**What should be done:**
```go
type Registry struct {
    registerChan   chan *Instance // RIGHT!
    unregisterChan chan struct{}
    // All operations through channels
}
```

### 3. ❌ **Registration Before Listening**

**What was done wrong:**
- Started registration in a goroutine
- No guarantee server was actually listening
- Race condition between file creation and port readiness

**What should be done:**
1. Call `net.Listen()` first
2. Get actual port from listener
3. THEN create instance file
4. Never create file before server is ready

### 4. ❌ **Complex Transport Mixing**

**What was done wrong:**
- Tried to support both stdio and HTTP in hub mode
- Complex conditional logic for transport selection
- Confusion about which mode uses which transport

**What should be done:**
- Hub mode (`--mcp`): ALWAYS stdio, no exceptions
- Instance mode (`--no-tui`): ALWAYS HTTP
- Never mix transports in the same mode

### 5. ❌ **Passive Hub Discovery**

**What was done wrong:**
- Hub waits for files to appear
- Polls directory for changes
- Reactive instead of proactive

**What should be done:**
- Instances connect TO the hub (future enhancement)
- For now: Hub watches for files but immediately connects
- Connection establishment is the key event, not file appearance

### 6. ❌ **Custom Health Checks**

**What was done wrong:**
- Implemented custom "ensure" messages
- 200ms polling loops
- Complex retry logic

**What should be done:**
- Use MCP protocol's built-in ping/pong
- Simple 5-second intervals
- Let MCP handle the protocol details

### 7. ❌ **File I/O in Critical Paths**

**What was done wrong:**
- Heartbeat writes every 10 seconds
- File reads on every instance list
- Blocking file operations

**What should be done:**
- Write instance file ONCE at startup
- Delete file ONCE at shutdown
- Everything else is in-memory/network

### 8. ❌ **Over-Engineered State Machine**

**What was done wrong:**
- Complex state transitions (7+ states)
- Difficult to reason about
- Too many edge cases

**What should be done:**
- Simple states: Discovered → Connected → Disconnected
- Health is binary: responding to pings or not
- Retry logic is simple exponential backoff

## Correct Architecture Principles

### 1. **Hub is a Simple Proxy**
- Receives stdio MCP commands
- Forwards to HTTP instance servers
- Maintains session → instance mapping
- That's it!

### 2. **Instances are Independent**
- Run their own HTTP MCP servers
- Write a signal file after listening
- Don't know or care about the hub

### 3. **Discovery is Lightweight**
- Watch directory for new JSON files
- Read file once to get port/metadata
- Connect immediately
- Never read the file again

### 4. **Connections are Everything**
- Active TCP connection = instance available
- No connection = instance unavailable
- No complex state tracking needed

### 5. **Channels, Not Mutexes**
- Single goroutine owns each piece of state
- Communication via channels
- No shared memory, no locks

## Implementation Order

To avoid these mistakes, follow this strict order:

1. **Start with stdio hub** (Step 1)
   - Just stdio server
   - Hardcoded tool responses
   - No discovery yet

2. **Add discovery watching** (Step 2)
   - Watch for files
   - Parse them
   - Store in memory
   - Still no connections

3. **Add connection management** (Step 3)
   - Connect to discovered instances
   - Simple connected/disconnected states
   - Channel-based operations

4. **Add tool proxying** (Step 4)
   - Forward tools to connected instances
   - Handle errors gracefully

5. **Add health monitoring** (Step 5)
   - MCP ping/pong only
   - Mark unresponsive instances

6. **Test everything** (Step 6)
   - Verify each component
   - End-to-end testing

## Red Flags to Watch For

If you find yourself:
- Writing mutexes → STOP, use channels
- Reading files repeatedly → STOP, read once
- Polling in tight loops → STOP, use events
- Making it complicated → STOP, simplify

## The Golden Rule

**"The hub knows about instances through network connections, not files. Files are just the doorbell; connections are the conversation."**
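
The "Channels, Not Mutexes" principle can be made concrete with a minimal sketch. It keeps the two channel fields shown under mistake 2 and assumes the rest: the `Instance` fields, the `getChan` query channel, and the `run` loop are hypothetical and are not taken from the project's code. Shutdown of the owner goroutine is left out for brevity.

```go
package main

import "fmt"

// Instance is a hypothetical record; the source does not show its fields.
type Instance struct {
	ID   string
	Port int
}

// Registry keeps its state inside one goroutine. registerChan and
// unregisterChan follow the source; getChan is an assumption added
// so that callers can read the state without a lock.
type Registry struct {
	registerChan   chan *Instance
	unregisterChan chan struct{}
	getChan        chan chan *Instance
}

// NewRegistry starts the owner goroutine and returns a handle to it.
func NewRegistry() *Registry {
	r := &Registry{
		registerChan:   make(chan *Instance),
		unregisterChan: make(chan struct{}),
		getChan:        make(chan chan *Instance),
	}
	go r.run()
	return r
}

// run is the only code that reads or writes current, so no mutex is needed.
func (r *Registry) run() {
	var current *Instance
	for {
		select {
		case inst := <-r.registerChan:
			current = inst
		case <-r.unregisterChan:
			current = nil
		case reply := <-r.getChan:
			reply <- current
		}
	}
}

// Register and Unregister block until the owner goroutine has taken the message.
func (r *Registry) Register(inst *Instance) { r.registerChan <- inst }
func (r *Registry) Unregister()             { r.unregisterChan <- struct{}{} }

// Get asks the owner goroutine for the current instance (nil if none).
// Callers should treat the returned value as read-only.
func (r *Registry) Get() *Instance {
	reply := make(chan *Instance)
	r.getChan <- reply
	return <-reply
}

func main() {
	r := NewRegistry()
	r.Register(&Instance{ID: "demo", Port: 7777})
	fmt.Println(r.Get().ID)     // demo
	r.Unregister()
	fmt.Println(r.Get() == nil) // true
}
```

Because the channels are unbuffered and `run` handles one message at a time, a `Get` issued after `Register` returns always observes the registered instance.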
