migration-to-bidi.md•9.48 kB
# Migration from RDP to WebDriver BiDi
## Summary
This document describes the migration from Firefox's legacy Remote Debugging Protocol (RDP) to the modern WebDriver BiDi protocol using Selenium WebDriver.
## Background
### The Problem with RDP
The initial implementation used Firefox's native Remote Debugging Protocol (RDP):
- **Complex custom implementation** - Required custom TCP transport with length-prefixed JSON framing
- **Actor-based architecture** - Complex state management with actor IDs
- **Limited documentation** - Firefox RDP is poorly documented and Firefox-specific
- **Maintenance burden** - Custom protocol parsing, connection handling, and error recovery
- **Deprecated protocol** - Mozilla is moving away from RDP in favor of WebDriver BiDi
### Why WebDriver BiDi?
WebDriver BiDi is the modern, cross-browser automation protocol:
- **W3C Standard** - Well-documented, standardized protocol
- **Future-proof** - Actively developed and maintained by browser vendors
- **Better DevTools access** - Improved console, network, and performance monitoring
- **Simplified implementation** - Use battle-tested `selenium-webdriver` library
- **Cross-browser potential** - Same API works with Chrome, Edge, Safari (future)
## Migration Approach
### What Changed
**Before (RDP):**
```
/src/firefox/
├── rdp-client.ts ❌ Custom RDP implementation (~500 lines)
├── transport.ts ❌ Custom TCP framing (~200 lines)
├── bidi-client.ts ❌ Custom BiDi WebSocket client
├── launcher.ts ❌ Custom Firefox launcher
├── devtools.ts ❌ Wrapper combining RDP + BiDi
└── types.ts RDP-specific types
```
**After (Selenium BiDi):**
```
/src/firefox/
├── devtools.ts ✅ Selenium WebDriver wrapper (~200 lines)
└── types.ts ✅ BiDi types
```
**Reduction:** ~1000 lines of custom code → ~200 lines of wrapper code
### Implementation Strategy
Instead of writing a custom BiDi client, we use `selenium-webdriver`:
```typescript
import { Builder, Browser, WebDriver } from 'selenium-webdriver';
import firefox from 'selenium-webdriver/firefox.js';
// Enable BiDi
const firefoxOptions = new firefox.Options();
firefoxOptions.enableBidi();
// Launch Firefox
const driver = await new Builder()
.forBrowser(Browser.FIREFOX)
.setFirefoxOptions(firefoxOptions)
.build();
// Get browsing context
const contextId = await driver.getWindowHandle();
// Subscribe to console events (BiDi)
const bidi = await driver.getBidi();
await bidi.subscribe('log.entryAdded', contextId, (event) => {
console.log(event.params);
});
```
## Key Technical Decisions
### 1. Direct Proxy Pattern
The `FirefoxDevTools` class is a **thin wrapper** around Selenium's `WebDriver`:
```typescript
class FirefoxDevTools {
private driver: WebDriver | null = null;
async evaluate(script: string): Promise<unknown> {
// Direct passthrough - no "smart" modifications
return await this.driver.executeScript(script);
}
}
```
**Lesson learned:** Don't try to be clever. Selenium knows what it's doing.
### 2. The `evaluate()` Anti-Pattern
**❌ What DIDN'T work:**
```typescript
async evaluate(script: string): Promise<unknown> {
// Trying to be "helpful" by auto-adding return
let finalScript = script.trim();
if (!finalScript.startsWith('return ')) {
finalScript = `return ${finalScript}`; // ❌ BREAKS multi-line scripts!
}
return await this.driver.executeScript(finalScript);
}
```
**Problem:** When user passes:
```javascript
evaluate(`
return {
timing: {...}
};
`)
```
It becomes:
```javascript
return return { // ❌ Syntax error!
timing: {...}
};
```
**✅ What WORKS:**
```typescript
async evaluate(script: string): Promise<unknown> {
// Just pass it through. Selenium handles it.
return await this.driver.executeScript(script);
}
```
**Key insight:** Selenium already handles single-line expressions vs. multi-line scripts correctly. Don't second-guess it.
### 3. Console Listener Setup
**Critical order of operations:**
```typescript
async connect(): Promise<void> {
// 1. Launch Firefox
this.driver = await new Builder()
.forBrowser(Browser.FIREFOX)
.setFirefoxOptions(firefoxOptions)
.build();
// 2. Get browsing context ID
this.currentContextId = await this.driver.getWindowHandle();
// 3. Subscribe to console BEFORE navigation
const bidi = await this.driver.getBidi();
await bidi.subscribe('log.entryAdded', this.currentContextId, callback);
// 4. Navigate (console events will now be captured)
if (this.options.startUrl) {
await this.driver.get(this.options.startUrl);
}
}
```
**Why this order matters:**
- Console listener must be attached to the browsing context
- Context ID is obtained from `getWindowHandle()`
- Listener must be active BEFORE navigation to capture early console messages
## API Compatibility
The high-level API remains unchanged for MCP tools:
```typescript
// Same API as before
await firefox.connect();
await firefox.navigate('https://example.com');
const title = await firefox.evaluate('return document.title');
const messages = await firefox.getConsoleMessages();
await firefox.close();
```
**MCP tools work without changes** - the migration is transparent to end users.
## Dependencies
### Added
```json
{
"dependencies": {
"selenium-webdriver": "^4.36.0"
},
"devDependencies": {
"geckodriver": "^6.0.2"
}
}
```
### Removed
- Custom RDP transport code
- Custom BiDi WebSocket client
- Custom Firefox launcher
- `ws` package (no longer needed)
## Build Configuration
Added `selenium-webdriver` as external dependency in `tsup.config.ts`:
```typescript
export default defineConfig({
external: [
'selenium-webdriver' // Don't bundle, load from node_modules
],
});
```
**Why external:** Selenium uses dynamic requires for browser drivers, which don't work when bundled.
## Testing
### Test Script Evolution
**Original working script** (`test-firefox-bidi.js`):
```javascript
const driver = await new Builder()
.forBrowser(Browser.FIREFOX)
.setFirefoxOptions(firefoxOptions)
.build();
const bidi = await driver.getBidi();
await bidi.subscribe('log.entryAdded', windowHandle, callback);
```
**Final implementation** (`devtools.ts`):
```typescript
// EXACT copy of working script logic
this.driver = await new Builder()
.forBrowser(Browser.FIREFOX)
.setFirefoxOptions(firefoxOptions)
.build();
const bidi = await this.driver.getBidi();
await bidi.subscribe('log.entryAdded', this.currentContextId, callback);
```
**Key principle:** When you have working code, **don't modify it**. Wrap it.
## Debugging Journey
### Issue: `evaluate()` returning `null`
**Symptoms:**
```
✅ Navigation complete - Title: Centrum.cz
❌ evaluate('document.title') returns: null
```
**Root cause:** Auto-adding `return` broke multi-line scripts
**Investigation:**
```javascript
await firefox.evaluate('return 1 + 1'); // ✅ 2
await firefox.evaluate('return document.title'); // ✅ "Centrum.cz"
await firefox.evaluate('document.title'); // ❌ null
```
**Solution:** Remove all "smart" logic. Just proxy to `executeScript()`.
### Issue: Console events not captured
**Symptoms:**
```
✅ Console listener active
❌ Found 0 console messages
```
**Root cause:** Testing showed navigation worked but context was wrong
**Investigation:**
- Verified listener was subscribed
- Verified navigation completed
- Found that context ID needs to be obtained BEFORE subscribing
**Solution:** Get `windowHandle` before `bidi.subscribe()`, not after.
## Performance Impact
### Bundle Size
| Metric | RDP | BiDi (Selenium) | Change |
|--------|-----|-----------------|--------|
| Source code | ~1200 lines | ~200 lines | **-83%** |
| Bundle size | 525 KB | 524 KB | -0.2% |
| Dependencies | ws | selenium-webdriver | N/A |
### Runtime Performance
- **Startup:** Slightly slower (geckodriver initialization)
- **Execution:** Comparable (same Firefox backend)
- **Console capture:** Faster (native BiDi events vs RDP polling)
## Future Improvements
### Short-term
- [ ] Network monitoring via BiDi (`network.beforeRequestSent`, `network.responseCompleted`)
- [ ] Better error messages from Selenium
- [ ] Connection retry logic
### Long-term
- [ ] Multi-browser support (Chrome, Edge)
- [ ] Advanced BiDi features (request interception, authentication)
- [ ] Session persistence and reconnection
## Lessons Learned
1. **Use existing libraries** - Don't reinvent the wheel. `selenium-webdriver` is battle-tested.
2. **Direct proxy pattern** - When wrapping a library, don't add "helpful" logic. Just pass through.
3. **Copy working code** - If you have a working test script, make your implementation **identical** to it.
4. **Test incrementally** - Small, focused tests (`return 1+1`, `return document.title`) reveal issues quickly.
5. **Trust the library** - Selenium knows how to handle scripts. Don't second-guess it.
## Conclusion
The migration from custom RDP to Selenium WebDriver BiDi was successful:
- ✅ **83% less code** to maintain
- ✅ **Simpler architecture** - standard library vs custom protocol
- ✅ **Better future** - W3C standard, actively developed
- ✅ **Same API** - transparent to MCP tool users
- ✅ **Working console capture** - BiDi events in real-time
**Final takeaway:** Sometimes the best code is the code you don't write. Use proven libraries and keep it simple.