# Skills Direct Execution Implementation
## Summary
Implemented direct skill execution architecture where skills execute fetch() from within the browser (via CDP) instead of having the agent navigate. This should reduce execution time from ~60-120s to ~1-3s. Implementation is complete but has a navigation issue that needs fixing.
## 1. Primary Request and Intent
User wanted **fast direct execution for Skills**:
1. **Learning Mode**: Agent browses, finds THE single request that:
- Takes user input (parameters)
- Returns the desired output (JSON or HTML)
- Document everything about this call
2. **Execution Mode** (the key change):
- Execute request DIRECTLY from inside browser via `fetch()` (not external HTTP)
- This preserves cookies/session/auth state
- ~1-3 seconds instead of ~60-120 seconds
- If 401/403: navigate to auth recovery page, agent re-auths, retry
3. **Why from browser**: Execute fetch() via CDP `Runtime.evaluate` so browser's cookie jar and session state is used automatically. No CORS issues since request comes from page context.
## 2. Key Technical Concepts
- **SkillRequest**: Full URL with `{param}` placeholders, method, headers, body template, response_type (json/html), extract_path
- **AuthRecovery**: Trigger on status codes (401/403), recovery_page URL for re-auth
- **SkillRunner**: Navigates to domain, executes `fetch()` via CDP, parses response
- **CDP Runtime.evaluate**: Execute JavaScript in browser context
- **Direct execution flow**: Check `skill.supports_direct_execution`, try SkillRunner, fall back to agent on failure
## 3. Files and Code Sections
### `src/mcp_server_browser_use/skills/models.py`
- **Why important**: Core data models for direct execution
- **Changes made**: Added `SkillRequest`, `AuthRecovery`, updated `Skill` class
```python
@dataclass
class SkillRequest:
"""Complete request specification for direct browser execution."""
url: str # Full URL with {param} placeholders
method: str = "GET"
headers: dict[str, str] = field(default_factory=dict)
body_template: Optional[str] = None
response_type: Literal["json", "html", "text"] = "json"
extract_path: Optional[str] = None # JSONPath like "objects[*].package"
html_selectors: Optional[dict[str, str]] = None
def build_url(self, params: dict[str, Any]) -> str:
url = self.url
for key, value in params.items():
url = url.replace(f"{{{key}}}", str(value))
return url
@dataclass
class AuthRecovery:
"""Configuration for handling authentication failures."""
trigger_on_status: list[int] = field(default_factory=lambda: [401, 403])
trigger_on_body: Optional[str] = None
recovery_page: str = ""
success_indicator: Optional[str] = None
max_retries: int = 1
@dataclass
class Skill:
# NEW fields
request: Optional[SkillRequest] = None # If set, use direct execution
auth_recovery: Optional[AuthRecovery] = None
@property
def supports_direct_execution(self) -> bool:
return self.request is not None
```
### `src/mcp_server_browser_use/skills/runner.py` (NEW FILE)
- **Why important**: Core execution engine for direct fetch via CDP
- **Changes made**: Created entire file (~280 lines)
```python
class SkillRunner:
"""Executes skills directly via browser fetch()."""
async def run(
self,
skill: Skill,
params: dict[str, Any],
browser_session: "BrowserSession",
) -> SkillRunResult:
request = skill.request
url = request.build_url(params)
# Navigate to domain first (for cookie context)
await self._navigate_to_domain(browser_session, base_url)
# Execute fetch
return await self._execute_fetch(request, params, browser_session)
async def _execute_fetch(self, request, params, browser_session) -> SkillRunResult:
url = request.build_url(params)
options = request.to_fetch_options(params)
js_code = f"""
(async () => {{
try {{
const response = await fetch({json.dumps(url)}, {json.dumps(options)});
const body = await response.json();
return {{
ok: response.ok,
status: response.status,
body: typeof body === 'string' ? body : JSON.stringify(body),
}};
}} catch (error) {{
return {{ ok: false, status: 0, error: error.message }};
}}
}})()
"""
result = await browser_session.cdp_client.send.Runtime.evaluate(
params={
"expression": js_code,
"awaitPromise": True,
"returnByValue": True,
"timeout": int(self.timeout * 1000),
}
)
# ... parse result ...
```
### `src/mcp_server_browser_use/skills/prompts.py`
- **Why important**: LLM prompt that extracts skill from recording
- **Changes made**: Updated to request full URL and direct execution fields
```python
ANALYSIS_SYSTEM_PROMPT = """...
Output a JSON object with:
{
"success": true/false,
"request": {
"url": "Full URL with {param} placeholders, e.g., https://npmjs.com/search?q={query}",
"method": "GET or POST",
"headers": {"Content-Type": "application/json"},
"body_template": "Request body with {param} placeholders (for POST)",
"response_type": "json or html or text",
"extract_path": "JSONPath to extract data, e.g., objects[*].package.name"
},
"auth_recovery": {
"trigger_on_status": [401, 403],
"recovery_page": "URL to navigate if auth fails"
},
...
}
"""
```
### `src/mcp_server_browser_use/server.py`
- **Why important**: Integration point - tries direct execution first
- **Changes made**: Added SkillRunner flow before agent execution
```python
if skill.supports_direct_execution:
await ctx.info(f"Direct execution: {skill.name}")
browser_session = BrowserSession(browser_profile=profile)
await browser_session.start()
try:
runner = SkillRunner()
run_result = await runner.run(skill, params_dict, browser_session)
if run_result.success:
skill_store.record_usage(skill.name, success=True)
return json.dumps(run_result.data, indent=2)
elif run_result.auth_recovery_triggered:
await ctx.info("Auth required, falling back to agent...")
# Continue to agent execution
else:
await ctx.info(f"Direct failed: {run_result.error}, trying agent...")
# Continue to agent execution
finally:
await browser_session.stop()
```
### `src/mcp_server_browser_use/cli.py`
- **Why important**: Simplified CLI architecture
- **Changes made**: Removed `connect`, `run`, `research` commands (~200 lines removed)
## 4. Problem Solving
### Solved:
- CLI simplification (removed `connect`, `run`, `research` commands)
- Models for direct execution (`SkillRequest`, `AuthRecovery`)
- LLM analyzer now produces `request` field with full URL
- Skills are learned correctly with direct execution config
- All 70 tests pass
### Current Problem:
**Navigation to establish cookie context fails.** The `SkillRunner._navigate_to_domain()` method needs to navigate to the target domain before executing fetch() so cookies are in scope.
Attempts:
1. Direct CDP `Page.navigate` → `Page.navigate wasn't found` (domain not enabled)
2. `browser_session.navigate_to()` → browser-use's watchdog system causes hangs
The issue is browser-use's complex event/watchdog system interferes with simple navigation.
## 5. Learned Skill Example (Working)
```yaml
name: npm-search
description: Search for npm packages and get their names and descriptions
original_task: Go to npmjs.com and search for 'express'. Get the names and descriptions of the top 3 packages.
request:
url: https://www.npmjs.com/search?q={search_term}
method: GET
headers: {}
response_type: json
extract_path: objects[*].package
auth_recovery:
trigger_on_status: [401, 403]
recovery_page: https://www.npmjs.com/login
parameters:
- name: search_term
type: string
required: true
source: query
```
## 6. COMPLETED: Navigation Fix
**Fixed** `SkillRunner._navigate_to_domain()` by using session-scoped CDP commands:
1. Added `_get_cdp_session()` method that:
- Gets CDP session via `browser_session.get_or_create_cdp_session()`
- Enables Page and Runtime domains with `session_id`
2. Updated `_navigate_to_domain()` to use `Page.navigate` with `session_id` instead of `browser_session.navigate_to()`
3. Updated `_get_current_url()` to use `Page.getFrameTree` with `session_id`
4. Updated `_execute_fetch()` to use `Runtime.evaluate` with `session_id`
**Key insight**: Using `session_id` parameter on CDP commands bypasses browser-use's watchdog system and avoids hangs.
**Tests added**: 22 new tests in `tests/test_skills.py` covering:
- SkillRequest URL/body building
- Skill.supports_direct_execution property
- SkillRunner session initialization
- Navigation with session_id
- Fetch execution with session_id
- Auth recovery triggering
- Same-domain navigation skip
- JSON path extraction
## 7. Test Commands
```bash
# Start HTTP server (foreground)
mcp-server-browser-use server -f
# Start HTTP server (background daemon)
mcp-server-browser-use server
# List skills
mcp-server-browser-use skill list
# Get skill details
mcp-server-browser-use skill get npm-search
# Run tests
uv run --python 3.11 pytest tests/ -v
```
## 8. ADDITIONAL: stdio Deprecation (December 16, 2025)
**Breaking change**: stdio transport is now deprecated and will exit with migration instructions.
### Why
Browser automation tasks take 60-120+ seconds. stdio transport has timeout limitations that cause tasks to fail mid-execution.
### Changes Made
1. `server.py`: `main()` now exits with `STDIO_DEPRECATION_MESSAGE` when transport is stdio
2. `cli.py`: Default behavior (no subcommand) shows deprecation message instead of running stdio server
3. `README.md`: Updated with HTTP-first setup instructions
### Migration Path for Users
Users running `uvx mcp-server-browser-use` or `uvx mcp-server-browser-use@latest` will see:
```
╔══════════════════════════════════════════════════════════════════════════════╗
║ STDIO TRANSPORT DEPRECATED ║
╠══════════════════════════════════════════════════════════════════════════════╣
║ 1. START THE HTTP SERVER: ║
║ uvx mcp-server-browser-use server ║
║ ║
║ 2. UPDATE YOUR CLAUDE DESKTOP CONFIG: ║
║ Option A - Native HTTP: ║
║ {"type": "streamable-http", "url": "http://localhost:8000/mcp"} ║
║ ║
║ Option B - mcp-remote bridge: ║
║ {"command": "npx", "args": ["mcp-remote", "http://localhost:8000/mcp"]} ║
╚══════════════════════════════════════════════════════════════════════════════╝
```
### Test Results
- 71/71 tests passing
- All linting/type checks pass
- Real browser integration verified (~2s direct execution)