BROWSER_AUTOMATION_PLAN.md•9.41 kB
# Browser Automation Implementation Plan
## Architecture Overview
All three placeholder tools will use **Playwright** for headless browser automation:
1. **WebPageTest** - Form submission + result scraping
2. **Lighthouse** - Chrome DevTools Protocol + lighthouse npm package
3. **WebCheck** - Web scraping (or self-hosted API calls)
## Shared Browser Utilities
Create `src/shared/browser-utils.ts`:
```typescript
import { chromium, Browser, Page, BrowserContext } from 'playwright';
export interface BrowserOptions {
headless?: boolean;
timeout?: number;
userAgent?: string;
}
export class BrowserManager {
private browser: Browser | null = null;
private context: BrowserContext | null = null;
async launch(options: BrowserOptions = {}): Promise<void> {
this.browser = await chromium.launch({
headless: options.headless !== false,
args: ['--no-sandbox', '--disable-setuid-sandbox'],
});
this.context = await this.browser.newContext({
userAgent: options.userAgent,
});
}
async newPage(): Promise<Page> {
if (!this.context) throw new Error('Browser not launched');
return await this.context.newPage();
}
async close(): Promise<void> {
if (this.context) await this.context.close();
if (this.browser) await this.browser.close();
this.browser = null;
this.context = null;
}
isLaunched(): boolean {
return this.browser !== null;
}
}
// Singleton instance for reuse across tests
let globalBrowserManager: BrowserManager | null = null;
export async function getBrowserManager(): Promise<BrowserManager> {
if (!globalBrowserManager) {
globalBrowserManager = new BrowserManager();
await globalBrowserManager.launch();
}
return globalBrowserManager;
}
export async function closeBrowserManager(): Promise<void> {
if (globalBrowserManager) {
await globalBrowserManager.close();
globalBrowserManager = null;
}
}
```
## Tool-Specific Implementations
### 1. WebPageTest (Form Automation + Scraping)
**Strategy**: Navigate to webpagetest.org, fill form, submit, poll for results, scrape results page
**Implementation** (`src/performance/webpagetest.ts`):
```typescript
async function submitWebPageTest(url: string): Promise<string> {
const browser = await getBrowserManager();
const page = await browser.newPage();
try {
// Navigate to WebPageTest
await page.goto('https://www.webpagetest.org/');
// Fill form
await page.fill('input[name="url"]', url);
await page.selectOption('select[name="location"]', 'Dulles:Chrome');
// Submit
await page.click('input[type="submit"]');
// Wait for results page
await page.waitForURL(/.*\/result\/.*/);
// Extract test ID from URL
const testId = page.url().match(/\/result\/([^\/]+)/)?.[1];
return testId;
} finally {
await page.close();
}
}
async function scrapeWebPageTestResults(testId: string): Promise<WebPageTestResult> {
const browser = await getBrowserManager();
const page = await browser.newPage();
try {
await page.goto(`https://www.webpagetest.org/result/${testId}/`);
// Wait for results to load
await page.waitForSelector('.result-summary', { timeout: 300000 }); // 5 min max
// Scrape results
const grade = await page.textContent('.grade');
const loadTime = await page.textContent('.loadTime');
const fcp = await page.textContent('[data-metric="firstContentfulPaint"]');
return {
tool: 'webpagetest',
success: true,
test_id: testId,
results_url: page.url(),
grade,
summary: { loadTime, firstContentfulPaint: fcp },
};
} finally {
await page.close();
}
}
```
**Pros**: Free 300 tests/month, no API key needed
**Cons**: Slower than API (form submission + polling), fragile to UI changes
**Alternative**: Detect if user has API key, use API instead
### 2. Lighthouse (Chrome DevTools Protocol)
**Strategy**: Use official `lighthouse` npm package with Playwright's Chrome instance
**Add dependency**:
```bash
npm install lighthouse chrome-launcher
```
**Implementation** (`src/shared/lighthouse.ts`):
```typescript
import lighthouse from 'lighthouse';
import { launch } from 'chrome-launcher';
async function analyzeLighthouse(url: string, options: LighthouseOptions = {}) {
const chrome = await launch({
chromeFlags: ['--headless', '--no-sandbox'],
});
try {
const runnerResult = await lighthouse(url, {
port: chrome.port,
output: 'json',
onlyCategories: options.categories || ['performance', 'accessibility', 'seo', 'best-practices'],
formFactor: options.formFactor || 'mobile',
});
const results = runnerResult.lhr;
return {
tool: 'lighthouse',
success: true,
url,
formFactor: options.formFactor || 'mobile',
performance_score: results.categories.performance?.score * 100,
accessibility_score: results.categories.accessibility?.score * 100,
seo_score: results.categories.seo?.score * 100,
best_practices_score: results.categories['best-practices']?.score * 100,
metrics: {
firstContentfulPaint: results.audits['first-contentful-paint'].numericValue,
largestContentfulPaint: results.audits['largest-contentful-paint'].numericValue,
// ... other metrics
},
};
} finally {
await chrome.kill();
}
}
```
**Pros**: Official tool, comprehensive results, runs locally
**Cons**: Slower than PageSpeed API (no caching), more CPU intensive
**Note**: PageSpeed Insights already runs Lighthouse, so this is redundant unless you need offline testing
### 3. WebCheck (Self-Hosted or Web Scraping)
**Option A: Self-Hosted WebCheck API**
If user self-hosts WebCheck, we can call their API directly:
```typescript
async function analyzeWebCheck(url: string, options: WebCheckOptions = {}) {
const instanceUrl = options.instanceUrl || 'http://localhost:3000';
// Call self-hosted WebCheck API
const response = await fetch(`${instanceUrl}/api/analyze`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ url }),
});
return await response.json();
}
```
**Option B: Scrape Public WebCheck Instance**
```typescript
async function analyzeWebCheckScrape(url: string): Promise<WebCheckResult> {
const browser = await getBrowserManager();
const page = await browser.newPage();
try {
// Navigate to web-check.xyz
await page.goto('https://web-check.xyz/');
// Enter URL
await page.fill('input[type="url"]', url);
await page.click('button[type="submit"]');
// Wait for results
await page.waitForSelector('.results-container', { timeout: 60000 });
// Scrape results
const sitemapFound = await page.isVisible('.sitemap-result .success');
const robotsFound = await page.isVisible('.robots-result .success');
return {
tool: 'webcheck',
success: true,
url,
seo: {
sitemap_found: sitemapFound,
robots_txt_found: robotsFound,
},
};
} finally {
await page.close();
}
}
```
**Pros (Self-Hosted)**: Full control, comprehensive results
**Cons (Self-Hosted)**: Requires self-hosting
**Pros (Scraping)**: No setup needed
**Cons (Scraping)**: Fragile to UI changes, slower
## Configuration Options
Add to `package.json` or environment variables:
```json
{
"webby": {
"browser": {
"headless": true,
"timeout": 120000,
"reuseInstance": true
},
"webpagetest": {
"useAPI": false, // Set to true if user has API key
"apiKey": null
},
"lighthouse": {
"enabled": true, // Set to false to use PageSpeed Insights instead
"chromePath": null // Auto-detect if null
},
"webcheck": {
"mode": "disabled", // "self-hosted" | "scrape" | "disabled"
"instanceUrl": "http://localhost:3000"
}
}
}
```
## Recommended Implementation Order
1. **✅ Keep PageSpeed Insights** as primary (fast, reliable, free API)
2. **Implement WebPageTest** via browser automation (adds value - different testing locations)
3. **Skip Lighthouse** (redundant with PageSpeed Insights unless offline testing needed)
4. **Skip WebCheck** (minimal SEO value, better covered by Lighthouse SEO)
## Performance Considerations
**Browser Instance Management**:
- ✅ Reuse browser instance across tests (singleton pattern)
- ✅ Configure timeouts appropriately (WebPageTest can take 2-5 minutes)
- ✅ Clean up browser on MCP server shutdown
**Parallelization**:
- ⚠️ Limit concurrent browser tests (memory intensive)
- ✅ Run non-browser tests (PageSpeed, Mozilla Observatory, SSL Labs) in parallel
- ✅ Queue browser tests if multiple requested simultaneously
## Error Handling
All browser automation should handle:
- Network timeouts
- Element not found errors
- UI changes (selector updates)
- Browser crashes
- Graceful degradation (fall back to PageSpeed if Lighthouse fails)
## Testing Strategy
Create integration tests:
```typescript
// __tests__/webpagetest.test.ts
test('WebPageTest automation', async () => {
const result = await analyzeWebPageTest('https://example.com');
expect(result.success).toBe(true);
expect(result.test_id).toBeDefined();
});
```
Mock browser for unit tests:
```typescript
jest.mock('playwright', () => ({
chromium: {
launch: jest.fn(() => mockBrowser),
},
}));
```