Skip to main content
Glama
skills-design.md7.83 kB
# Skills Feature Design ## Overview The Skills feature enables mcp-browser-use to **learn browser tasks once and replay them efficiently with hints**. Skills are **machine-generated** from successful learning sessions - NOT manually authored. **Core Principle:** Skills are API extraction recipes, not DOM scrapers. The agent discovers API endpoints during learning mode. ## Architecture ``` ┌─────────────────────────────────────────────────────────────┐ │ Browser Skills Engine │ ├─────────────────────────────────────────────────────────────┤ │ │ │ LEARNING MODE (learn=True): │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │ Agent + │───▶│ Analyzer │───▶│ Store │ │ │ │ API Focus │ │ (LLM finds │ │ (YAML skill │ │ │ │ Instructions│ │ money req) │ │ files) │ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │ │ EXECUTION MODE (skill_name provided): │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │ Store │───▶│ Executor │───▶│ Agent + │ │ │ │ (load skill)│ │ (inject hints)│ │ Hints │ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │ └─────────────────────────────────────────────────────────────┘ ``` ## File Structure ``` src/mcp_server_browser_use/skills/ ├── __init__.py # Public API exports ├── models.py # Skill, MoneyRequest, SessionRecording dataclasses ├── store.py # SkillStore - YAML persistence ├── executor.py # SkillExecutor - hint injection + learning mode ├── analyzer.py # SkillAnalyzer - LLM extraction of money request ├── recorder.py # SkillRecorder - CDP network event capture └── prompts.py # API discovery and analysis prompts ``` ## Usage ### Learning Mode ```python # Learn a new skill result = await run_browser_agent( task="Find new iOS developer jobs on Upwork", learn=True, # Enable API discovery mode save_skill_as="upwork-ios-jobs" # Save extracted skill ) ``` The agent executes with modified instructions: 1. Navigate to relevant pages 2. **Discover the API endpoint** that returns the data 3. Report the endpoint, parameters, and response structure If successful, the analyzer: 1. Identifies the "money request" (API call that returned the data) 2. Extracts parameters that can be templated 3. Saves as a machine-generated skill file ### Execution Mode ```python # Use learned skill result = await run_browser_agent( task="Find new Python developer jobs", skill_name="upwork-ios-jobs", skill_params='{"keywords": "Python"}' ) ``` The agent receives hints: - Navigation steps to reach the right state - Target API endpoint to call - Expected data location in response ## Key Concepts ### Money Request The **money request** is THE API call that returns the data the user asked for: ```yaml money_request: endpoint: "/api/graphql/v1" method: POST identifies_by: "operationName: searchJobs" response_path: "data.searchJobs.edges" ``` ### Learning Mode Instructions When `learn=True`, the agent receives API discovery instructions: ``` Your goal is to complete this task BY DISCOVERING AND USING THE UNDERLYING API. Instructions: 1. Navigate to the relevant page(s) 2. OBSERVE the network requests being made (XHR/Fetch calls) 3. IDENTIFY the API endpoint that returns the data you need 4. The data comes from an API response, NOT from DOM scraping What NOT to do: - Do NOT extract data by reading DOM elements - The page DOM is just for navigation, not data extraction ``` ### Skill File Structure (Machine-Generated) ```yaml name: upwork-job-search description: Search for jobs on Upwork original_task: "Find new iOS developer jobs on Upwork" version: 1 created: 2025-12-16T10:30:00 hints: navigation: - url_pattern: "upwork.com/nx/search/jobs" description: Job search results page money_request: endpoint: "/api/graphql/v1" method: POST content_type: "application/json" identifies_by: "operationName: searchJobs" response_path: "data.searchJobs.edges" parameters: - name: keywords type: string source: body fallback: strategy: explore_full max_retries: 2 ``` ## MCP Tools ### run_browser_agent (Modified) New parameters: - `learn: bool = False` - Enable learning mode - `save_skill_as: Optional[str]` - Name to save learned skill ### skill_list List all available skills (machine-generated). ### skill_get Get full details of a specific skill. ### skill_delete Delete a skill by name. ## Why This Design? ### Why Browser + Hints (not Pure API)? 1. **Avoids detection** - Still looks like normal browsing 2. **Handles auth** - Uses browser's existing session/cookies 3. **Adapts to changes** - Falls back to exploration if API changes 4. **Respects ToS** - Not reverse-engineering private APIs ### Why Machine-Generated Skills? 1. **No human expertise needed** - Agent discovers the API 2. **Accurate extraction** - LLM identifies relevant parameters 3. **Complex structure** - Skills capture API details humans wouldn't write 4. **Self-correcting** - Re-learn if API changes ### Why API Focus (not DOM)? 1. **Faster execution** - API call vs. DOM navigation 2. **Structured data** - JSON response vs. parsing HTML 3. **More reliable** - APIs change less than UI 4. **Cleaner results** - Direct data access ## Limitations ### Learning Fails If: - Task is too vague ("extract all data from website") - No API exists (server-side rendering only) - Task requires multiple unrelated API calls - Authentication prevents API access ### Learning Succeeds If: - Agent finds a single API endpoint returning the data - Parameters can be extracted from URL/body - Response contains structured JSON data ## Implementation Status ### Phase 1: MVP ✅ - Learning mode with API discovery instructions - Skill extraction via LLM analysis - YAML persistence and hint injection - Basic fallback handling ### Phase 2: Full CDP Recording ✅ (Completed 2025-12-16) - CDP network event capture via browser-use's `cdp_client` - Handlers for `Network.requestWillBeSent`, `Network.responseReceived`, `Network.loadingFailed` - Response body capture via `Network.getResponseBody` for JSON APIs - Async task tracking with concurrency limits - Header redaction for security (cookies, auth tokens) ### Phase 3: Advanced Features (Future) - Skill validation against expected response schema - Skill versioning and migration - Skill sharing/marketplace - Confidence scoring - Multi-step skill chains

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Saik0s/mcp-browser-use'

If you have feedback or need assistance with the MCP directory API, please join our Discord server