# Bidirectional Communication: Chat Client ↔ 3D App
**Date**: Discussion Document
**Topic**: Enabling bidirectional communication between MCP chat clients (ChatGPT, Claude) and the 3D visualization app
**⚠️ STATUS UPDATE**: This document describes the design and implementation approach. **Bidirectional communication has been fully implemented** as of the current codebase. See `bidirectional-communication-flow.md` for the complete implementation details.
## Table of Contents
1. [Current Architecture](#current-architecture)
2. [The Problem](#the-problem)
3. [Why Bidirectional Communication?](#why-bidirectional-communication)
4. [Technical Challenges](#technical-challenges)
5. [Approaches and Trade-offs](#approaches-and-trade-offs)
6. [MCP Protocol Constraints](#mcp-protocol-constraints)
7. [WebSocket Considerations](#websocket-considerations)
8. [Implementation Considerations](#implementation-considerations)
9. [Use Cases and Examples](#use-cases-and-examples)
---
## Current Architecture
### One-Way Command Flow
The current system implements a **unidirectional** communication pattern:
```
┌─────────────────┐
│ MCP Client │ (ChatGPT, Claude Desktop)
│ (Chat Client) │
└────────┬────────┘
│ HTTP/SSE (MCP Protocol)
│ Tool Calls: change_model_color, set_key_light_intensity, etc.
▼
┌─────────────────┐
│ MCP Server │ (server.js)
│ (Node.js) │
└────────┬────────┘
│ WebSocket (ws://localhost:3001)
│ Commands: {type: 'changeColor', color: '#ff0000'}
▼
┌─────────────────┐
│ Browser App │ (Application.js)
│ (3D Scene) │
└─────────────────┘
```
### Current Capabilities
**What works:**
- ✅ Chat client can send commands to the 3D app
- ✅ Commands are routed to specific browser sessions
- ✅ Browser app executes commands and updates the scene
- ✅ Multiple users can have isolated sessions
**What's missing:**
- ❌ Chat client cannot query current app state
- ❌ Chat client cannot verify if commands succeeded
- ❌ Chat client cannot detect what's currently displayed
- ❌ Chat client operates "blindly" - fire and forget
### Current Command Types
The system currently supports these one-way commands:
- `changeColor` - Change model color
- `changeSize` - Change model size
- `scaleCube` - Scale model independently
- `changeBackgroundColor` - Change scene background
- `setKeyLightIntensity` - Adjust key light intensity
- `setKeyLightPosition` - Move key light
- `setKeyLightColor` - Change key light color
- `setKeyLightSize` - Resize key light area
- `setFillLightIntensity` - Adjust fill light intensity
- `setFillLightPosition` - Move fill light
- `setFillLightColor` - Change fill light color
- `setFillLightSize` - Resize fill light area
- `toolCall` - Notification of tool invocation
---
## The Problem
### The "Fire and Forget" Limitation
Currently, when ChatGPT calls a tool like `change_model_color`, the flow is:
1. ChatGPT sends tool call → MCP Server
2. MCP Server sends WebSocket command → Browser
3. Browser updates scene
4. MCP Server returns success message to ChatGPT
5. **ChatGPT has no way to verify what actually happened**
### Example Problem Scenarios
**Scenario 1: State Verification**
```
User: "Change the model to red"
ChatGPT: [calls change_model_color with "#ff0000"]
ChatGPT: "I've changed the model to red"
Reality: Browser might be disconnected, color might be invalid,
or user might have manually changed it to blue
```
**Scenario 2: Conditional Logic**
```
User: "If the background is dark, make it lighter"
ChatGPT: [Cannot check current background state]
ChatGPT: "I don't know what the current background color is"
```
**Scenario 3: Incremental Changes**
```
User: "Make the key light brighter"
ChatGPT: [Cannot check current intensity]
ChatGPT: [Sets intensity to arbitrary value instead of incrementing]
```
**Scenario 4: Error Detection**
```
User: "Change the model color to red"
ChatGPT: [calls change_model_color]
Browser: [Disconnected - command never executed]
ChatGPT: "Done!" [But nothing actually happened]
```
---
## Why Bidirectional Communication?
### Benefits
1. **State Awareness**: Chat client knows what's currently displayed
2. **Error Detection**: Can verify commands actually executed
3. **Conditional Logic**: Can make decisions based on current state
4. **Incremental Changes**: Can modify existing values (e.g., "make it brighter")
5. **User Experience**: More accurate responses and better error handling
6. **Debugging**: Can diagnose issues by querying state
### Query Capabilities Needed
The chat client should be able to query:
- **Model State**: Color, scale, position, rotation
- **Scene State**: Background color, camera position
- **Lighting State**: Key/fill light intensity, color, position, size
- **Connection State**: Is browser connected? Last update time?
- **Error State**: Any recent errors or failures?
---
## Technical Challenges
### Challenge 1: MCP Protocol is Request-Response
**The Constraint:**
- MCP tools are synchronous function calls
- Tool handler must return immediately
- No built-in mechanism for async queries
**The Problem:**
- Querying browser state requires a WebSocket round-trip
- WebSocket communication is asynchronous
- Tool handler can't wait for browser response
**Possible Solutions:**
1. Use async/await with timeout
2. Cache state on server side
3. Use Promise-based state queries
### Challenge 2: WebSocket Direction
**Current State:**
- Server → Browser: ✅ Implemented
- Browser → Server: ⚠️ Partially implemented (only `registerSession`)
**What's Needed:**
- Browser must be able to send state updates
- Server must be able to request state
- Need request-response correlation (matching requests to responses)
### Challenge 3: State Synchronization
**The Problem:**
- Server doesn't know current browser state
- State can change from:
- MCP tool commands
- User interactions (mouse, keyboard)
- Browser-side code changes
- Network disconnections
**Questions:**
- Should state be cached on server?
- How to handle stale cache?
- When to refresh state?
- How to handle concurrent updates?
### Challenge 4: Session Isolation
**Current Architecture:**
- Each browser session has a unique session ID
- Commands are routed to specific sessions
- State queries must be session-specific
**Considerations:**
- State cache must be per-session
- Query requests must include session ID
- Need to handle multiple simultaneous queries
### Challenge 5: Timing and Latency
**The Problem:**
- WebSocket round-trip adds latency (50-500ms typical)
- MCP tool calls are synchronous
- User expects immediate response
**Trade-offs:**
- Fast but potentially stale (cached state)
- Accurate but slow (on-demand query)
- Hybrid approach (cache with refresh option)
---
## Approaches and Trade-offs
### Approach 1: Request-Response Pattern (On-Demand Query)
**How it works:**
1. ChatGPT calls `get_app_state` tool
2. Server sends WebSocket message: `{type: 'requestState', requestId: 'uuid'}`
3. Browser responds: `{type: 'stateResponse', requestId: 'uuid', state: {...}}`
4. Server waits for response (with timeout)
5. Tool handler returns state to ChatGPT
**Implementation Pattern:**
```javascript
// Pseudo-code
async function getAppState() {
const requestId = generateUUID();
const statePromise = waitForStateResponse(requestId, timeout);
sendToSession(sessionId, {
type: 'requestState',
requestId: requestId
});
const state = await statePromise;
return state;
}
```
**Pros:**
- ✅ Always returns current state
- ✅ No cache synchronization issues
- ✅ Simple mental model
- ✅ Accurate for real-time queries
**Cons:**
- ❌ Requires browser to be connected
- ❌ Adds latency (100-500ms)
- ❌ Requires async handling in tool handlers
- ❌ Fails if browser disconnects
- ❌ More complex error handling
**Best for:**
- When accuracy is critical
- When latency is acceptable
- When browser connection is reliable
---
### Approach 2: State Caching (Server-Side Cache)
**How it works:**
1. Browser sends state updates after each change: `{type: 'stateUpdate', state: {...}}`
2. Server caches state per session: `sessionStateCache[sessionId] = state`
3. Query tools return cached state immediately
4. No WebSocket round-trip needed
**Implementation Pattern:**
```javascript
// Browser sends state after each command
function handleCommand(command) {
executeCommand(command);
const currentState = sceneManager.getState();
wsClient.send({
type: 'stateUpdate',
state: currentState
});
}
// Server caches state
const sessionStateCache = new Map();
ws.on('message', (data) => {
if (data.type === 'stateUpdate') {
sessionStateCache.set(sessionId, data.state);
}
});
// Query tool returns cached state
function getAppState() {
return sessionStateCache.get(sessionId) || null;
}
```
**Pros:**
- ✅ Fast queries (no round-trip)
- ✅ Works even if browser disconnects briefly
- ✅ Can show "last known state"
- ✅ Simple synchronous tool handlers
- ✅ Lower WebSocket traffic
**Cons:**
- ❌ Cache can be stale
- ❌ Need to track all state changes
- ❌ More complex state management
- ❌ Cache invalidation challenges
- ❌ May miss user-initiated changes
**Best for:**
- When speed is critical
- When approximate state is acceptable
- When state changes are infrequent
- When browser connection is unreliable
---
### Approach 3: Hybrid Approach (Cache with Refresh)
**How it works:**
1. Server maintains cached state (like Approach 2)
2. Query tools can specify `forceRefresh: true`
3. If refresh requested, do on-demand query (like Approach 1)
4. Otherwise, return cached state immediately
**Implementation Pattern:**
```javascript
async function getAppState(options = {}) {
if (options.forceRefresh) {
// On-demand query
return await queryStateFromBrowser();
} else {
// Return cached state
return sessionStateCache.get(sessionId) || null;
}
}
```
**Pros:**
- ✅ Best of both worlds
- ✅ Fast by default, accurate when needed
- ✅ Flexible query options
- ✅ Can handle both use cases
**Cons:**
- ❌ More complex implementation
- ❌ Need to implement both patterns
- ❌ More code to maintain
**Best for:**
- Production systems
- When both speed and accuracy matter
- When you want flexibility
---
### Approach 4: Event-Driven State Updates
**How it works:**
1. Browser sends state updates on every change (like Approach 2)
2. Server also sends state updates to chat client via MCP notifications
3. Chat client maintains its own state cache
4. Query tools can use either cache or request fresh state
**Implementation Pattern:**
```javascript
// Browser sends state update
wsClient.send({
type: 'stateUpdate',
state: currentState
});
// Server forwards to MCP client via notifications
mcpServer.notify('app_state_changed', {
sessionId: sessionId,
state: currentState
});
```
**Pros:**
- ✅ Chat client always has latest state
- ✅ No query needed for most cases
- ✅ Real-time state synchronization
- ✅ Can trigger chat client actions
**Cons:**
- ❌ Requires MCP notification support
- ❌ More complex architecture
- ❌ Higher message volume
- ❌ May not be supported by all MCP clients
**Best for:**
- Real-time collaborative scenarios
- When state changes frequently
- When chat client needs to react to changes
---
## MCP Protocol Constraints
### Tool Call Limitations
**Synchronous Nature:**
- MCP tools are function calls that must return immediately
- Tool handler signature: `async (params) => { return result; }`
- No built-in mechanism for waiting on external events
**Response Format:**
- Tools return `{ content: [...], isError: boolean }`
- Content can be text, images, or resources
- Must be serializable JSON
**No Built-in Query Pattern:**
- MCP doesn't have a separate "query" vs "command" distinction
- All tools are treated the same way
- Query tools are just tools that don't modify state
### Workarounds
**1. Async/Await with Timeout:**
```javascript
async function getState() {
const promise = waitForResponse(requestId, 2000); // 2s timeout
sendRequest(requestId);
try {
return await promise;
} catch (timeout) {
return { error: 'Browser not responding' };
}
}
```
**2. Cached State:**
```javascript
function getState() {
return cachedState || { error: 'State not available' };
}
```
**3. Promise-based State:**
```javascript
const pendingQueries = new Map();
function getState() {
return new Promise((resolve, reject) => {
const requestId = generateUUID();
pendingQueries.set(requestId, { resolve, reject });
sendRequest(requestId);
setTimeout(() => {
pendingQueries.delete(requestId);
reject(new Error('Timeout'));
}, 2000);
});
}
```
### MCP Resources (Alternative Approach)
**What are MCP Resources?**
- MCP supports "resources" that can be read
- Resources are like files or data sources
- Can be queried independently of tools
**Could Resources Work?**
- ✅ Resources can be read on-demand
- ✅ Can represent app state as a resource
- ❌ Still need to get state from browser
- ❌ Same async challenges apply
**Example:**
```javascript
mcpServer.setResourceHandler('app://state', async (uri) => {
const state = await getStateFromBrowser();
return {
contents: [{
uri: uri,
mimeType: 'application/json',
text: JSON.stringify(state)
}]
};
});
```
---
## WebSocket Considerations
### Current WebSocket Implementation
**Server Side (server.js):**
- WebSocket server on port 3001
- Handles `registerSession` messages from browser
- Sends commands to browser via `sendToSession()`
- Stores connections in `Map<sessionId, WebSocket>`
**Browser Side (WebSocketClient.js):**
- Connects to WebSocket server
- Registers session on connection
- Receives commands via `onmessage` handler
- Can send messages via `ws.send()`
### Adding Bidirectional Communication
**What's Needed:**
1. **Request-Response Correlation:**
- Generate unique request IDs
- Match responses to requests
- Handle timeouts and errors
2. **State Query Protocol:**
- Define message types: `requestState`, `stateResponse`
- Include request IDs for correlation
- Handle error cases
3. **State Update Protocol:**
- Define message type: `stateUpdate`
- Include full or partial state
- Handle state serialization
### Message Protocol Design
**Request State:**
```json
{
"type": "requestState",
"requestId": "uuid-here",
"sessionId": "session-id"
}
```
**State Response:**
```json
{
"type": "stateResponse",
"requestId": "uuid-here",
"state": {
"model": {
"color": "#ff0000",
"scale": { "x": 1.0, "y": 1.0, "z": 1.0 }
},
"background": "#000000",
"keyLight": {
"intensity": 1.0,
"color": "#ffffff",
"position": { "x": 5, "y": 5, "z": 5 }
}
}
}
```
**State Update (Push):**
```json
{
"type": "stateUpdate",
"state": { /* same as stateResponse.state */ },
"timestamp": 1234567890
}
```
**Error Response:**
```json
{
"type": "stateError",
"requestId": "uuid-here",
"error": "Browser not connected"
}
```
---
## Implementation Considerations
### State Serialization
**What to Include:**
- Model properties (color, scale, position, rotation)
- Scene properties (background color)
- Lighting properties (key/fill light: intensity, color, position, size)
- Camera properties (position, rotation, zoom)
- Connection status
- Error state
**What to Exclude:**
- Internal Three.js objects (not serializable)
- Renderer state (not relevant)
- Temporary UI state
- Large binary data
**Serialization Strategy:**
```javascript
function serializeState(sceneManager) {
return {
model: {
color: '#' + sceneManager.model.getMaterial().color.getHexString(),
scale: {
x: sceneManager.model.getMesh().scale.x,
y: sceneManager.model.getMesh().scale.y,
z: sceneManager.model.getMesh().scale.z
}
},
// ... etc
};
}
```
### Error Handling
**Scenarios to Handle:**
1. Browser not connected
2. Browser disconnected during query
3. State query timeout
4. Invalid state response
5. Browser error during state retrieval
**Error Response Strategy:**
```javascript
// Return error in tool response
return {
content: [{
type: 'text',
text: 'Unable to retrieve app state: Browser not connected'
}],
isError: true
};
```
### Performance Considerations
**Query Frequency:**
- How often will state be queried?
- Can we batch multiple queries?
- Should we rate-limit queries?
**State Size:**
- How large is the state object?
- Should we support partial state queries?
- Can we compress state data?
**Caching Strategy:**
- How long to cache state?
- When to invalidate cache?
- How to handle stale cache?
### Testing Considerations
**Test Scenarios:**
1. Query state when browser connected
2. Query state when browser disconnected
3. Query state during state change
4. Multiple simultaneous queries
5. Query timeout handling
6. State update after command
7. State update after user interaction
---
## Use Cases and Examples
### Use Case 1: State Verification
**User Request:**
```
"Change the model to red and tell me what color it is now"
```
**Current Behavior:**
```
ChatGPT: "I've changed the model to red. I cannot verify the current color."
```
**With Bidirectional Communication:**
```
ChatGPT: [calls change_model_color]
ChatGPT: [calls get_model_state]
ChatGPT: "I've changed the model to red. The current color is #ff0000."
```
### Use Case 2: Conditional Logic
**User Request:**
```
"If the background is dark, make it lighter"
```
**Current Behavior:**
```
ChatGPT: "I don't know what the current background color is."
```
**With Bidirectional Communication:**
```
ChatGPT: [calls get_scene_state]
ChatGPT: [checks background color]
ChatGPT: [calls change_background_color with lighter color]
ChatGPT: "The background was dark, so I've lightened it."
```
### Use Case 3: Incremental Changes
**User Request:**
```
"Make the key light brighter"
```
**Current Behavior:**
```
ChatGPT: [sets intensity to arbitrary value like 2.0]
ChatGPT: "I've set the key light intensity to 2.0"
```
**With Bidirectional Communication:**
```
ChatGPT: [calls get_key_light_state]
ChatGPT: [reads current intensity: 1.0]
ChatGPT: [calls set_key_light_intensity with 1.5]
ChatGPT: "I've increased the key light intensity from 1.0 to 1.5"
```
### Use Case 4: Error Detection
**User Request:**
```
"Change the model color to red"
```
**Current Behavior:**
```
ChatGPT: [calls change_model_color]
Browser: [Disconnected - command never executed]
ChatGPT: "Done!" [But nothing happened]
```
**With Bidirectional Communication:**
```
ChatGPT: [calls change_model_color]
ChatGPT: [calls get_model_state]
ChatGPT: [detects color is still not red]
ChatGPT: "I attempted to change the color, but it appears the browser
is not connected. Please ensure the 3D app is open."
```
### Use Case 5: State Comparison
**User Request:**
```
"What's different between the current scene and a neutral setup?"
```
**With Bidirectional Communication:**
```
ChatGPT: [calls get_scene_state]
ChatGPT: [compares with default values]
ChatGPT: "The current scene differs from neutral in:
- Model color: #ff0000 (default: #808080)
- Key light intensity: 2.0 (default: 1.0)
- Background: #000000 (default: #ffffff)"
```
---
## Summary
### Key Points
1. **Current Limitation**: The system is one-way (chat → app), preventing state queries
2. **Need**: Bidirectional communication enables state awareness and better UX
3. **Challenge**: MCP protocol is synchronous, but state queries require async WebSocket communication
4. **Solutions**: Multiple approaches exist, each with trade-offs:
- On-demand queries (accurate but slow)
- State caching (fast but potentially stale)
- Hybrid approach (best of both)
5. **Implementation**: Requires WebSocket protocol extensions and state serialization
### Recommended Approach
For this application, a **Hybrid Approach (Approach 3)** is recommended:
- **Default**: Use cached state for fast queries
- **Option**: Allow `forceRefresh` for accurate queries when needed
- **Updates**: Browser sends state updates after each change
- **Fallback**: Handle disconnected browsers gracefully
This provides:
- Fast responses for most queries
- Accuracy when needed
- Graceful error handling
- Good user experience
### Next Steps
If implementing bidirectional communication:
1. **Phase 1**: Add state retrieval methods to SceneManager
2. **Phase 2**: Implement WebSocket state query protocol
3. **Phase 3**: Add state caching on server
4. **Phase 4**: Create MCP tools for state queries
5. **Phase 5**: Add error handling and edge cases
6. **Phase 6**: Test with various scenarios
---
**Document Version**: 1.0
**Last Updated**: 2024
**Related Documents**:
- `conversation-context-multi-user-implementation.md`
- `tool-routing-explanation.md`
- `high-level-command-language-notes.md`