Skip to main content
Glama
2025-12-16-skills-direct-execution.md11.9 kB
# Skills Direct Execution Implementation ## Summary Implemented direct skill execution architecture where skills execute fetch() from within the browser (via CDP) instead of having the agent navigate. This should reduce execution time from ~60-120s to ~1-3s. Implementation is complete but has a navigation issue that needs fixing. ## 1. Primary Request and Intent User wanted **fast direct execution for Skills**: 1. **Learning Mode**: Agent browses, finds THE single request that: - Takes user input (parameters) - Returns the desired output (JSON or HTML) - Document everything about this call 2. **Execution Mode** (the key change): - Execute request DIRECTLY from inside browser via `fetch()` (not external HTTP) - This preserves cookies/session/auth state - ~1-3 seconds instead of ~60-120 seconds - If 401/403: navigate to auth recovery page, agent re-auths, retry 3. **Why from browser**: Execute fetch() via CDP `Runtime.evaluate` so browser's cookie jar and session state is used automatically. No CORS issues since request comes from page context. ## 2. Key Technical Concepts - **SkillRequest**: Full URL with `{param}` placeholders, method, headers, body template, response_type (json/html), extract_path - **AuthRecovery**: Trigger on status codes (401/403), recovery_page URL for re-auth - **SkillRunner**: Navigates to domain, executes `fetch()` via CDP, parses response - **CDP Runtime.evaluate**: Execute JavaScript in browser context - **Direct execution flow**: Check `skill.supports_direct_execution`, try SkillRunner, fall back to agent on failure ## 3. Files and Code Sections ### `src/mcp_server_browser_use/skills/models.py` - **Why important**: Core data models for direct execution - **Changes made**: Added `SkillRequest`, `AuthRecovery`, updated `Skill` class ```python @dataclass class SkillRequest: """Complete request specification for direct browser execution.""" url: str # Full URL with {param} placeholders method: str = "GET" headers: dict[str, str] = field(default_factory=dict) body_template: Optional[str] = None response_type: Literal["json", "html", "text"] = "json" extract_path: Optional[str] = None # JSONPath like "objects[*].package" html_selectors: Optional[dict[str, str]] = None def build_url(self, params: dict[str, Any]) -> str: url = self.url for key, value in params.items(): url = url.replace(f"{{{key}}}", str(value)) return url @dataclass class AuthRecovery: """Configuration for handling authentication failures.""" trigger_on_status: list[int] = field(default_factory=lambda: [401, 403]) trigger_on_body: Optional[str] = None recovery_page: str = "" success_indicator: Optional[str] = None max_retries: int = 1 @dataclass class Skill: # NEW fields request: Optional[SkillRequest] = None # If set, use direct execution auth_recovery: Optional[AuthRecovery] = None @property def supports_direct_execution(self) -> bool: return self.request is not None ``` ### `src/mcp_server_browser_use/skills/runner.py` (NEW FILE) - **Why important**: Core execution engine for direct fetch via CDP - **Changes made**: Created entire file (~280 lines) ```python class SkillRunner: """Executes skills directly via browser fetch().""" async def run( self, skill: Skill, params: dict[str, Any], browser_session: "BrowserSession", ) -> SkillRunResult: request = skill.request url = request.build_url(params) # Navigate to domain first (for cookie context) await self._navigate_to_domain(browser_session, base_url) # Execute fetch return await self._execute_fetch(request, params, browser_session) async def _execute_fetch(self, request, params, browser_session) -> SkillRunResult: url = request.build_url(params) options = request.to_fetch_options(params) js_code = f""" (async () => {{ try {{ const response = await fetch({json.dumps(url)}, {json.dumps(options)}); const body = await response.json(); return {{ ok: response.ok, status: response.status, body: typeof body === 'string' ? body : JSON.stringify(body), }}; }} catch (error) {{ return {{ ok: false, status: 0, error: error.message }}; }} }})() """ result = await browser_session.cdp_client.send.Runtime.evaluate( params={ "expression": js_code, "awaitPromise": True, "returnByValue": True, "timeout": int(self.timeout * 1000), } ) # ... parse result ... ``` ### `src/mcp_server_browser_use/skills/prompts.py` - **Why important**: LLM prompt that extracts skill from recording - **Changes made**: Updated to request full URL and direct execution fields ```python ANALYSIS_SYSTEM_PROMPT = """... Output a JSON object with: { "success": true/false, "request": { "url": "Full URL with {param} placeholders, e.g., https://npmjs.com/search?q={query}", "method": "GET or POST", "headers": {"Content-Type": "application/json"}, "body_template": "Request body with {param} placeholders (for POST)", "response_type": "json or html or text", "extract_path": "JSONPath to extract data, e.g., objects[*].package.name" }, "auth_recovery": { "trigger_on_status": [401, 403], "recovery_page": "URL to navigate if auth fails" }, ... } """ ``` ### `src/mcp_server_browser_use/server.py` - **Why important**: Integration point - tries direct execution first - **Changes made**: Added SkillRunner flow before agent execution ```python if skill.supports_direct_execution: await ctx.info(f"Direct execution: {skill.name}") browser_session = BrowserSession(browser_profile=profile) await browser_session.start() try: runner = SkillRunner() run_result = await runner.run(skill, params_dict, browser_session) if run_result.success: skill_store.record_usage(skill.name, success=True) return json.dumps(run_result.data, indent=2) elif run_result.auth_recovery_triggered: await ctx.info("Auth required, falling back to agent...") # Continue to agent execution else: await ctx.info(f"Direct failed: {run_result.error}, trying agent...") # Continue to agent execution finally: await browser_session.stop() ``` ### `src/mcp_server_browser_use/cli.py` - **Why important**: Simplified CLI architecture - **Changes made**: Removed `connect`, `run`, `research` commands (~200 lines removed) ## 4. Problem Solving ### Solved: - CLI simplification (removed `connect`, `run`, `research` commands) - Models for direct execution (`SkillRequest`, `AuthRecovery`) - LLM analyzer now produces `request` field with full URL - Skills are learned correctly with direct execution config - All 70 tests pass ### Current Problem: **Navigation to establish cookie context fails.** The `SkillRunner._navigate_to_domain()` method needs to navigate to the target domain before executing fetch() so cookies are in scope. Attempts: 1. Direct CDP `Page.navigate` → `Page.navigate wasn't found` (domain not enabled) 2. `browser_session.navigate_to()` → browser-use's watchdog system causes hangs The issue is browser-use's complex event/watchdog system interferes with simple navigation. ## 5. Learned Skill Example (Working) ```yaml name: npm-search description: Search for npm packages and get their names and descriptions original_task: Go to npmjs.com and search for 'express'. Get the names and descriptions of the top 3 packages. request: url: https://www.npmjs.com/search?q={search_term} method: GET headers: {} response_type: json extract_path: objects[*].package auth_recovery: trigger_on_status: [401, 403] recovery_page: https://www.npmjs.com/login parameters: - name: search_term type: string required: true source: query ``` ## 6. COMPLETED: Navigation Fix **Fixed** `SkillRunner._navigate_to_domain()` by using session-scoped CDP commands: 1. Added `_get_cdp_session()` method that: - Gets CDP session via `browser_session.get_or_create_cdp_session()` - Enables Page and Runtime domains with `session_id` 2. Updated `_navigate_to_domain()` to use `Page.navigate` with `session_id` instead of `browser_session.navigate_to()` 3. Updated `_get_current_url()` to use `Page.getFrameTree` with `session_id` 4. Updated `_execute_fetch()` to use `Runtime.evaluate` with `session_id` **Key insight**: Using `session_id` parameter on CDP commands bypasses browser-use's watchdog system and avoids hangs. **Tests added**: 22 new tests in `tests/test_skills.py` covering: - SkillRequest URL/body building - Skill.supports_direct_execution property - SkillRunner session initialization - Navigation with session_id - Fetch execution with session_id - Auth recovery triggering - Same-domain navigation skip - JSON path extraction ## 7. Test Commands ```bash # Start HTTP server (foreground) mcp-server-browser-use server -f # Start HTTP server (background daemon) mcp-server-browser-use server # List skills mcp-server-browser-use skill list # Get skill details mcp-server-browser-use skill get npm-search # Run tests uv run --python 3.11 pytest tests/ -v ``` ## 8. ADDITIONAL: stdio Deprecation (December 16, 2025) **Breaking change**: stdio transport is now deprecated and will exit with migration instructions. ### Why Browser automation tasks take 60-120+ seconds. stdio transport has timeout limitations that cause tasks to fail mid-execution. ### Changes Made 1. `server.py`: `main()` now exits with `STDIO_DEPRECATION_MESSAGE` when transport is stdio 2. `cli.py`: Default behavior (no subcommand) shows deprecation message instead of running stdio server 3. `README.md`: Updated with HTTP-first setup instructions ### Migration Path for Users Users running `uvx mcp-server-browser-use` or `uvx mcp-server-browser-use@latest` will see: ``` ╔══════════════════════════════════════════════════════════════════════════════╗ ║ STDIO TRANSPORT DEPRECATED ║ ╠══════════════════════════════════════════════════════════════════════════════╣ ║ 1. START THE HTTP SERVER: ║ ║ uvx mcp-server-browser-use server ║ ║ ║ ║ 2. UPDATE YOUR CLAUDE DESKTOP CONFIG: ║ ║ Option A - Native HTTP: ║ ║ {"type": "streamable-http", "url": "http://localhost:8000/mcp"} ║ ║ ║ ║ Option B - mcp-remote bridge: ║ ║ {"command": "npx", "args": ["mcp-remote", "http://localhost:8000/mcp"]} ║ ╚══════════════════════════════════════════════════════════════════════════════╝ ``` ### Test Results - 71/71 tests passing - All linting/type checks pass - Real browser integration verified (~2s direct execution)

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Saik0s/mcp-browser-use'

If you have feedback or need assistance with the MCP directory API, please join our Discord server