AgentBrowser

A real, visible-cursor browser for AI agents. Real Chromium. Real humanlike physics. Real audit trail. Agents drive it the way a human would: drag, click, type, scroll - while every action is recorded, replayable, and verified.

Works with every major LLM: Anthropic Claude, OpenAI GPT, Google Gemini, Groq, Together, Fireworks, DeepInfra, Mistral, Cohere, xAI Grok, OpenRouter, Perplexity, Ollama (local), Ollama Cloud (hosted), vLLM, LM Studio, llama.cpp - or anything OpenAI-compatible. Zero lock-in.

┌──────────────────────────────────────────────────────────────────┐
│                          Agent (your code)                       │
│   "submit the payment form"  ──┐                                 │
└────────────────────────────────┼─────────────────────────────────┘
                                 │  HTTP POST /sessions/:id/plan
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│                      AgentBrowser runtime                        │
│  Planner  ──▶  findAndClick (DOM ▶ vision-LLM)  ──▶  cursor      │
│     │              │                                  │          │
│     │              ▼                                  ▼          │
│     │     verifier (DOM diff)               Bezier trajectory    │
│     │              │                          + CDP raw events   │
│     ▼              ▼                                  │          │
│  action      action.completed event ──▶  recorder (JSONL)        │
│  memory                                              │           │
│  (skip LLM                                           ▼           │
│   on visit                                  WebSocket / SSE      │
│   #2+)                                       to operator UI      │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
                     Real visible cursor moves on real Chromium

Why this exists

Most browser-automation tools were built for humans first and retrofitted for agents. They speak DOM operations. They produce 8000-token HTML dumps. They re-learn each site every run. They have no audit trail. They get blocked by cookie banners.

AgentBrowser inverts this. The cursor is real and visible. All input goes through CDP raw mouse events. The API speaks in goals, not selectors. Failed actions auto-recover. Every action gets verified. The system learns each site permanently and shares knowledge across domains.

Other tools                       │   AgentBrowser
─────────────────────────────────│──────────────────────────────────
8000 tokens of HTML              │   50 tokens of structured meaning
agent guesses #submit-btn-v2     │   { goal: "submit the form" }
no replay, no audit              │   JSONL trace, deterministic replay
re-learns every visit            │   action memory, 7x faster on visit 3
blocked by every cookie wall     │   auto-dismiss + force-removal
no captcha story                 │   hCaptcha / reCAPTCHA v2 / Turnstile (via 2Captcha)
no fingerprint defenses          │   per-context WebGL/canvas/audio noise
single integration: library      │   library + MCP + HTTP + WS + SSE + replay
locked to one LLM vendor         │   17 providers, one env var swap (Claude/GPT/
                                 │   Gemini/Ollama/vLLM/Groq/Together/Fireworks/...)

Five-minute demo

git clone https://github.com/AshtonVaughan/agentbrowser
cd agentbrowser
npm install && npx playwright install chromium && npm run build

# Pick ANY LLM provider:
ANTHROPIC_API_KEY=sk-ant-...   npm run http   # Claude (default)
# or
OPENAI_API_KEY=sk-...          npm run http   # GPT
# or
GOOGLE_API_KEY=...             npm run http   # Gemini
# or
GROQ_API_KEY=gsk_...           npm run http   # Groq (Llama on LPUs)
# or run fully local with Ollama:
ollama serve &
AGENTBROWSER_LLM_PROVIDER=ollama OLLAMA_MODEL=llama3.2-vision npm run http
# or use Ollama Cloud:
OLLAMA_CLOUD_API_KEY=... npm run http

In another terminal:

# Create a session
curl -X POST localhost:3100/api/v1/sessions
# → { "session_id": "abc123..." }

# Plan + execute a goal end-to-end
curl -X POST localhost:3100/api/v1/sessions/abc123/plan \
  -H 'content-type: application/json' \
  -d '{"goal":"go to news.ycombinator.com and click the top story"}'
# → { success: true, steps: [...], duration_ms: 4200 }

# Watch it live
open ui/operator/index.html   # macOS; on Linux use xdg-open

The agent's cursor moves across the screen with humanlike motion. Every cursor.move, click, and page change streams to the operator UI in real time. Every action is recorded to ~/.agentbrowser/traces/ for replay.


What you get

Visible humanlike cursor

  • SVG cursor sprite injected via context init script

  • Bezier-curve trajectories with jitter, ease-in-out, optional overshoot

  • All input via CDP Input.dispatchMouseEvent (not Playwright locators)

  • Click ripple animation, fading 14-point cursor trail

  • Per-trajectory deterministic seed for replay reproducibility
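
The trajectory bullets above can be illustrated with a short sketch. This is not the project's trajectory.ts: the PRNG (mulberry32), the control-point placement, the 60-step default, and the jitter scale are all assumptions chosen to show how a seeded Bezier path with ease-in-out and jitter stays reproducible for replay.

```typescript
// Sketch only: names and constants here are illustrative assumptions.
interface Point { x: number; y: number }

// Tiny seeded PRNG so one seed always produces the same path.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Ease-in-out: slow start, fast middle, slow arrival.
const easeInOut = (t: number): number =>
  t < 0.5 ? 2 * t * t : 1 - Math.pow(2 - 2 * t, 2) / 2;

export function bezierPath(from: Point, to: Point, seed: number, steps = 60, jitterPx = 1.5): Point[] {
  const rand = mulberry32(seed);
  const dx = to.x - from.x;
  const dy = to.y - from.y;
  // Two control points near 1/3 and 2/3 of the way, pushed sideways by a seeded bend.
  const b1 = (rand() - 0.5) * 0.4;
  const b2 = (rand() - 0.5) * 0.4;
  const c1 = { x: from.x + dx / 3 - dy * b1, y: from.y + dy / 3 + dx * b1 };
  const c2 = { x: from.x + (2 * dx) / 3 - dy * b2, y: from.y + (2 * dy) / 3 + dx * b2 };

  const path: Point[] = [];
  for (let i = 0; i <= steps; i++) {
    const t = easeInOut(i / steps);
    const u = 1 - t;
    const x = u * u * u * from.x + 3 * u * u * t * c1.x + 3 * u * t * t * c2.x + t * t * t * to.x;
    const y = u * u * u * from.y + 3 * u * u * t * c1.y + 3 * u * t * t * c2.y + t * t * t * to.y;
    // Endpoints stay exact; interior points get a small random offset.
    const jitter = i === 0 || i === steps ? 0 : jitterPx;
    path.push({ x: x + (rand() - 0.5) * jitter, y: y + (rand() - 0.5) * jitter });
  }
  return path;
}
```

Calling bezierPath twice with the same seed yields the same point list, which is the property a replay engine needs; each point would then be sent as one mouse-move event.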

Hybrid action layer

Layer                             │ What it does
──────────────────────────────────│──────────────────────────────────────────────────────────────
cursor.click(x, y)                │ Direct viewport click via CDP
cursor.clickBySelector(sel)       │ Bbox-resolve, scroll-into-view, humanlike click. Stale-element auto-recovery via accessible-name lookup.
cursor.clickByText(text)          │ Text disambiguation across visually similar elements
cursor.clickByRole(role, {name})  │ ARIA-driven targeting
findAndClick({goal, ...})         │ DOM selector → text → role → vision-LLM, every step verified
executor.executeAction(name)      │ Action-memory fast path → fallback to find-and-click
planner.planAndExecute(goal)      │ LLM goal decomposition → multi-step run with retry budget
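
The "DOM selector → text → role → vision-LLM, every step verified" ordering can be pictured as a plain fallback loop. The sketch below is an assumption about the shape of such a chain, not the code in find.ts: the Strategy and Attempt types and the function name are hypothetical, and each strategy is injected so the loop itself needs no browser.

```typescript
// Sketch only: a generic ordered fallback with per-step verification.
interface Attempt { clicked: boolean; verified: boolean }
interface Strategy { name: string; run: (goal: string) => Promise<Attempt> }
interface ChainResult { success: boolean; via: string | null; tried: string[] }

export async function findAndClickSketch(goal: string, strategies: Strategy[]): Promise<ChainResult> {
  const tried: string[] = [];
  for (const strategy of strategies) {
    tried.push(strategy.name);
    try {
      const { clicked, verified } = await strategy.run(goal);
      // A click only counts when the verifier saw the page change.
      if (clicked && verified) return { success: true, via: strategy.name, tried };
    } catch {
      // A thrown error (for example, selector not found) falls through to the next strategy.
    }
  }
  return { success: false, via: null, tried };
}
```

Putting the cheapest strategies first means the vision model is only consulted when the DOM-based attempts have all failed or gone unverified.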

Vision pipeline

  • extractElementBoxes(page) returns rich element catalog: id / role / tag / accessible name / value / bbox / selector / disabled

  • bboxScreenshot(sessionId) returns a viewport PNG with numbered cyan boxes drawn on every interactive target + the element list

  • VisionLLM.decide(goal, screenshot, elements) sends the goal, screenshot, and element list to Claude Sonnet and parses the {element_id, action, rationale} reply

  • cursor.clickByBox(bbox) clicks vision-derived coordinates with the visible cursor

Self-healing

  • Action verifier snapshots ElementBoxes before every action, diffs after settle, declares verified=true on URL change / added / removed / textChanged / moved elements

  • Stale-element recovery in cursor.clickBySelector falls back to getByText(originalText) on selector failure

  • Modal interrupter detects fixed/absolute high-z-index dialogs at viewport center, classifies as blocker (cookie/consent/subscribe keywords) or user-relevant (login dialogs), auto-dismisses blockers and retries

  • Selector library learns from verified outcomes only - no entry in memory unless the click actually changed the page
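
As a rough illustration of the verifier rule above (verified=true on URL change or on added / removed / textChanged / moved elements), here is a minimal snapshot diff. Keying elements by selector, the reduced ElementBoxLike shape, and the 2px move tolerance are assumptions for the sketch, not the behavior of verifier.ts or diff.ts.

```typescript
// Sketch only: a before/after element-snapshot diff.
interface Box { x: number; y: number; width: number; height: number }
interface ElementBoxLike { selector: string; name: string; bbox: Box }
interface PageSnapshot { url: string; elements: ElementBoxLike[] }
interface DiffResult {
  urlChanged: boolean;
  added: string[];
  removed: string[];
  textChanged: string[];
  moved: string[];
  verified: boolean;
}

export function diffSnapshots(before: PageSnapshot, after: PageSnapshot, tolerancePx = 2): DiffResult {
  const prev = new Map<string, ElementBoxLike>();
  before.elements.forEach((e) => prev.set(e.selector, e));
  const next = new Map<string, ElementBoxLike>();
  after.elements.forEach((e) => next.set(e.selector, e));

  const added = Array.from(next.keys()).filter((k) => !prev.has(k));
  const removed = Array.from(prev.keys()).filter((k) => !next.has(k));
  const textChanged: string[] = [];
  const moved: string[] = [];
  next.forEach((now, key) => {
    const was = prev.get(key);
    if (!was) return;
    if (was.name !== now.name) textChanged.push(key);
    const shift = Math.max(Math.abs(was.bbox.x - now.bbox.x), Math.abs(was.bbox.y - now.bbox.y));
    if (shift > tolerancePx) moved.push(key);
  });

  const urlChanged = before.url !== after.url;
  // Any observable change counts as evidence that the action did something.
  const verified = urlChanged || added.length + removed.length + textChanged.length + moved.length > 0;
  return { urlChanged, added, removed, textChanged, moved, verified };
}
```

An action whose before and after snapshots are identical comes back with verified=false, which is the signal that would keep it out of the selector library.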

Site + action memory

  • SQLite WAL for concurrent reads. Per-domain selector library + page-model cache.

  • ActionMemory - SHA-1 page signature × goal hash → selector × success/fail counters. Visit #2 to a known page costs zero LLM calls.

  • Cross-domain transfer: recallByGoal(goal, excludeDomain) returns winning selectors from OTHER domains for the same logical goal. The system learned "submit payment → button#pay-btn" on stripe.com; it tries the same selector on paddle.com as a hypothesis.

  • decay(unusedSinceMs) halves stale entry counts so the library stays healthy as sites change.
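
To make the "page signature × goal hash → selector × counters" idea concrete, here is an in-memory sketch. It is not the SQLite-backed ActionMemory: the class name, the way the page signature is built (domain plus sorted element keys), and reading unusedSinceMs as a cutoff timestamp are all assumptions.

```typescript
// Sketch only: an in-memory stand-in for a per-(page, goal) selector cache.
import { createHash } from 'node:crypto';

const sha1 = (s: string): string => createHash('sha1').update(s).digest('hex');

interface MemoryEntry { selector: string; success: number; fail: number; lastUsedMs: number }

export class ActionMemorySketch {
  private entries = new Map<string, MemoryEntry>();

  // Order-independent signature of a page: domain plus sorted element keys.
  static pageSignature(domain: string, elementKeys: string[]): string {
    return sha1([domain, ...[...elementKeys].sort()].join('\n'));
  }

  private key(pageSig: string, goal: string): string {
    return `${pageSig}:${sha1(goal.trim().toLowerCase())}`;
  }

  record(pageSig: string, goal: string, selector: string, verified: boolean, nowMs: number): void {
    const k = this.key(pageSig, goal);
    const cur = this.entries.get(k);
    if (!cur || cur.selector !== selector) {
      // Only a verified outcome may create (or replace) an entry.
      if (verified) this.entries.set(k, { selector, success: 1, fail: 0, lastUsedMs: nowMs });
      return;
    }
    if (verified) cur.success += 1; else cur.fail += 1;
    cur.lastUsedMs = nowMs;
  }

  // Returns a selector only when it has succeeded more often than it failed.
  recall(pageSig: string, goal: string): string | null {
    const e = this.entries.get(this.key(pageSig, goal));
    return e && e.success > e.fail ? e.selector : null;
  }

  stats(pageSig: string, goal: string): MemoryEntry | undefined {
    const e = this.entries.get(this.key(pageSig, goal));
    return e ? { ...e } : undefined;
  }

  // Halve the counters of every entry last used before the cutoff timestamp.
  decay(unusedSinceMs: number): number {
    let touched = 0;
    this.entries.forEach((e) => {
      if (e.lastUsedMs < unusedSinceMs) {
        e.success = Math.floor(e.success / 2);
        e.fail = Math.floor(e.fail / 2);
        touched += 1;
      }
    });
    return touched;
  }
}
```

A recall hit is what lets a repeat visit skip the LLM: the executor can try the remembered selector first and only fall back to find-and-click if verification fails.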

Recorder + replay

  • Every action streamed as JSONL to ~/.agentbrowser/traces/<session-id>.jsonl

  • ReplayEngine reads a trace, dispatches events to a fresh session at configurable speed

  • compactTrace() collapses 60-event cursor.move trajectories into 1, merges consecutive cursor.type events, drops micro-waits

  • Plan audit captures screenshot + element list at every step boundary - compliance-grade replay primitive
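
The compaction step described above can be sketched in a few lines. The event shapes, the 50 ms micro-wait threshold, and the choice to keep the last position of a cursor.move run are assumptions for illustration; the project's compactor.ts may differ.

```typescript
// Sketch only: collapse cursor.move runs, merge cursor.type runs, drop micro-waits.
type TraceEventSketch =
  | { type: 'cursor.move'; x: number; y: number }
  | { type: 'cursor.click'; x: number; y: number }
  | { type: 'cursor.type'; text: string }
  | { type: 'wait'; ms: number };

export function compactTraceSketch(events: TraceEventSketch[], minWaitMs = 50): TraceEventSketch[] {
  const out: TraceEventSketch[] = [];
  for (const ev of events) {
    // Waits shorter than the threshold carry no useful timing information.
    if (ev.type === 'wait' && ev.ms < minWaitMs) continue;
    const last = out[out.length - 1];
    if (last && last.type === 'cursor.move' && ev.type === 'cursor.move') {
      // Keep only the final position of a run of moves.
      out[out.length - 1] = { ...ev };
    } else if (last && last.type === 'cursor.type' && ev.type === 'cursor.type') {
      // Concatenate consecutive typing events.
      out[out.length - 1] = { type: 'cursor.type', text: last.text + ev.text };
    } else {
      out.push({ ...ev });
    }
  }
  return out;
}
```

Because trajectories are generated from a deterministic seed, keeping only the endpoint of a move run need not lose the path, provided the seed is stored alongside the compacted event; that storage detail is not shown here.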

Skill library

  • A "skill" is a recorded trace with named slot tokens ($email, $password)

  • SkillLibrary.parameterize(events, slots) replaces literal values with tokens (longest-first to avoid partial-match bugs)

  • Save / load / list / delete via JSON files at ~/.agentbrowser/skills/

  • Portable .skill.json packages with format magic + version + metadata (author, license, tags) - bundle a skill with your agent code or publish to a registry; users import + bind their own credentials
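
The "longest-first" note on parameterize matters when one literal is a prefix of another. The sketch below shows the idea under assumptions: the event shape (an optional text field) and the slots map of slot name to recorded literal are hypothetical, and the real SkillLibrary.parameterize signature may differ.

```typescript
// Sketch only: replace recorded literals with $slot tokens, longest literal first.
export function parameterizeSketch<T extends { text?: string }>(
  events: T[],
  slots: Record<string, string>,
): T[] {
  // Sorting by literal length (descending) stops "ann" from clobbering "ann@example.com".
  const ordered = Object.entries(slots)
    .filter(([, literal]) => literal.length > 0)
    .sort((a, b) => b[1].length - a[1].length);

  return events.map((ev) => {
    if (typeof ev.text !== 'string') return ev;
    let text = ev.text;
    for (const [name, literal] of ordered) {
      text = text.split(literal).join(`$${name}`);
    }
    return { ...ev, text };
  });
}
```

With slots { user: 'ann', email: 'ann@example.com' }, a typed value of 'ann@example.com' becomes '$email'; replacing in the other order would have produced '$user@example.com'.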

CAPTCHA

  • TwoCaptchaSolver for hCaptcha + reCAPTCHA v2 + Cloudflare Turnstile via 2Captcha API

  • Page-side DETECT_CAPTCHA_SCRIPT finds sitekeys for all three types

  • solveCaptchaIfPresent(page, solver) chain: detect → solve → inject token → fire change/input events → invoke data-callback

  • Pluggable via CaptchaSolver interface (drop in AntiCaptcha, CapMonster, etc.)

Anti-fingerprinting

  • applyFingerprintShield(context) per-context init script

  • Spoofs WebGL UNMASKED_VENDOR/RENDERER (5 GPU profiles), navigator.hardwareConcurrency, navigator.deviceMemory, AudioContext (1e-7 noise), Canvas toDataURL (~0.08% pixel jitter), navigator.plugins

  • Per-context deterministic seed - fingerprint stays stable within a session, differs across sessions

Multi-tab

  • engine.newTab(sessionId, url?) opens a tab in the same context (shares cookies/auth)

  • Each tab has its own HumanCursor. switchTab / closeTab / listTabs.

  • HTTP: GET/POST/DELETE /sessions/:id/tabs, POST /tabs/:tab/switch

Universal LLM provider support (zero lock-in)

  • 17 providers wired - swap any of them in by setting one env var

  • LLMProvider interface (complete() + completeWithImage()); analyzer + vision-LLM + planner all use this abstraction, never SDKs

  • Auto-detection at startup picks the right provider from env vars

Provider                   │ Set                                          │ Notes
───────────────────────────│──────────────────────────────────────────────│─────────────────────────────────────────
Anthropic Claude           │ ANTHROPIC_API_KEY                            │ default if nothing else set
OpenAI                     │ OPENAI_API_KEY                               │ gpt-4o-mini default
Google Gemini              │ GOOGLE_API_KEY or GEMINI_API_KEY             │ gemini-2.5-flash, vision native
Groq                       │ GROQ_API_KEY                                 │ super-fast Llama/Mixtral on LPUs
Together AI                │ TOGETHER_API_KEY                             │ open-source models
Fireworks                  │ FIREWORKS_API_KEY                            │ open-source models
DeepInfra                  │ DEEPINFRA_API_KEY                            │ open-source models
Mistral                    │ MISTRAL_API_KEY                              │ mistral-large-latest
Cohere                     │ COHERE_API_KEY                               │ command-r-plus via /compatibility
xAI Grok                   │ XAI_API_KEY                                  │ grok-2 with vision
OpenRouter                 │ OPENROUTER_API_KEY                           │ 300+ models behind one API
Perplexity                 │ PERPLEXITY_API_KEY                           │ online-search models
Azure OpenAI               │ AZURE_OPENAI_API_KEY + AZURE_OPENAI_BASE_URL │ enterprise tenant
Ollama (local)             │ OLLAMA_BASE_URL (default localhost:11434)    │ llama3.2, llama3.2-vision, qwen2.5vl, etc
Ollama Cloud               │ OLLAMA_CLOUD_API_KEY                         │ hosted Ollama with turbo models
vLLM                       │ VLLM_BASE_URL (default localhost:8000)       │ self-hosted production inference
LM Studio                  │ LMSTUDIO_BASE_URL (default localhost:1234)   │ desktop GUI
llama.cpp server           │ LLAMACPP_BASE_URL (default localhost:8080)   │ tiny self-hosted
Anything OpenAI-compatible │ presets.openaiCompatible(url)                │ drop in your URL

// Pick a provider explicitly (any of the 17):
import { AgentBrowserHttpServer, presets } from 'agentbrowser';

const server = new AgentBrowserHttpServer({
  llm_provider: presets.ollamaCloud(),   // or .openai() or .groq() etc
  // ...
});

Or just set AGENTBROWSER_LLM_PROVIDER=ollama and the server auto-wires. Override the model with <PROVIDER>_MODEL=<model-id>.
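
One way env-var auto-detection like this could work is sketched below. The precedence order, the provider ids, and the function name are assumptions (the README only states that AGENTBROWSER_LLM_PROVIDER selects a provider, that <PROVIDER>_MODEL overrides the model, and that Anthropic is the default if nothing else is set); only a handful of the providers are listed to keep the example short.

```typescript
// Sketch only: not the project's autoDetectProvider.
type Env = Record<string, string | undefined>;

// Checked in order; the first provider with a non-empty key wins (assumed precedence).
const KEYED: Array<[string, string[]]> = [
  ['anthropic', ['ANTHROPIC_API_KEY']],
  ['openai', ['OPENAI_API_KEY']],
  ['gemini', ['GOOGLE_API_KEY', 'GEMINI_API_KEY']],
  ['groq', ['GROQ_API_KEY']],
  ['ollama-cloud', ['OLLAMA_CLOUD_API_KEY']],
];

export function detectProviderSketch(env: Env): { provider: string; model?: string } {
  const explicit = env['AGENTBROWSER_LLM_PROVIDER'];
  const byKey = KEYED.find(([, vars]) => vars.some((v) => Boolean(env[v])));
  // An explicit choice beats key sniffing; Anthropic is the fallback default.
  const provider = explicit ?? (byKey ? byKey[0] : 'anthropic');
  // <PROVIDER>_MODEL override, e.g. OLLAMA_MODEL or OLLAMA_CLOUD_MODEL.
  const model = env[`${provider.toUpperCase().replace(/-/g, '_')}_MODEL`];
  return model ? { provider, model } : { provider };
}
```

For example, passing { AGENTBROWSER_LLM_PROVIDER: 'ollama', OLLAMA_MODEL: 'llama3.2-vision' } selects the ollama provider with that model even if other API keys are also present.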

Chrome extension (drive YOUR Chrome with YOUR cookies)

  • Install extensions/chrome/ in dev mode (chrome://extensions → Load unpacked)

  • Click the AgentBrowser icon, paste server URL + API key, click Connect

  • Now an agent calling POST /api/v1/agents/<your-id>/cmd drives YOUR Chrome with YOUR cookies and login state

  • Manifest v3 + chrome.debugger for real CDP mouse events + chrome.scripting for vision/extract

  • See extensions/chrome/README.md for the full security model

Operator + recorder + memory + skills UIs

  • ui/operator/index.html - live screenshot + cursor trail overlay + event timeline + quick actions panel

  • ui/recorder/index.html - real-time WebSocket event stream + multi-lane timeline canvas + replay scrubber + JSONL export

  • ui/memory/index.html - paginated action memory browser per domain + decay control + JSON export

  • docs/pricing.html - 4-tier pricing page wired to /api/v1/billing/checkout for self-serve Stripe payments

  • docs/skills.html - skills marketplace landing with 8 curated skills (login-stripe, login-google, amazon-add-to-cart, github-create-issue, etc.)

  • All are single-file vanilla HTML/CSS/JS. No build step.

Transcription helpers (agents that "watch" videos)

  • findCaptionTracks(page) detects HTML5 <track> + YouTube playerCaptionsTracklistRenderer + custom player markup

  • parseVTT(text) / parseJSON3(json) convert standard caption formats to typed TranscriptSegment[]

  • transcribeFromCaptions(page) one-shot: detect → fetch → parse

  • transcriptToText(segments) concatenate for LLM consumption
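
For readers unfamiliar with caption parsing, here is a minimal WebVTT parser in the spirit of parseVTT. It is a sketch: the TranscriptSegment field names (startMs, endMs, text) are assumptions, and it handles only the common case of timing lines plus cue text, stripping inline tags and ignoring cue settings.

```typescript
// Sketch only: a small subset of WebVTT, not the project's parseVTT.
interface TranscriptSegment { startMs: number; endMs: number; text: string }

// Accepts "mm:ss.mmm" or "hh:mm:ss.mmm".
function toMs(stamp: string): number {
  const parts = stamp.split(':').map(Number);
  const [h = 0, m = 0, s = 0] = parts.length === 3 ? parts : [0, parts[0], parts[1]];
  return Math.round(((h * 60 + m) * 60 + s) * 1000);
}

export function parseVttSketch(text: string): TranscriptSegment[] {
  const segments: TranscriptSegment[] = [];
  // Cues are separated by blank lines; normalize line endings first.
  const blocks = text.replace(/\r\n?/g, '\n').split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split('\n').filter((l) => l.trim() !== '');
    const timingIdx = lines.findIndex((l) => l.includes('-->'));
    if (timingIdx === -1) continue; // WEBVTT header, NOTE, or STYLE block
    const timing = lines[timingIdx] ?? '';
    const [startRaw = '', rest = ''] = timing.split('-->');
    const endRaw = rest.trim().split(/\s+/)[0] ?? ''; // drop cue settings such as "align:start"
    const cueText = lines.slice(timingIdx + 1).join(' ').replace(/<[^>]+>/g, '').trim();
    if (cueText) {
      segments.push({ startMs: toMs(startRaw.trim()), endMs: toMs(endRaw), text: cueText });
    }
  }
  return segments;
}
```

Joining the text fields of the returned segments with spaces gives the kind of flat transcript an LLM can consume.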

Production runtime

  • HTTP REST API + WebSocket + SSE on Fastify with bearer-token auth + per-key rate limiting

  • 30+ endpoints, OpenAPI 3.1 spec at /api/v1/openapi.json

  • Stripe billing wired (Checkout + webhook + signature verification + license issuance)

  • Prometheus /metrics + readiness probe + dashboard summary endpoint

  • 4-tier license scaffold (free / pro / team / enterprise) with feature gates and quota tracking

  • Multi-stage Dockerfile, docker-compose with persistent volume + 1GB shm_size for Chromium

  • Python + TypeScript SDK clients


Architecture

┌─────────────────────────────────────────────────────────────────┐
│  Operator UI  (browser-based dashboard)                         │
│   - live screenshot stream         - cursor trail viz           │
│   - action log + reasoning         - manual takeover            │
│  Recorder UI                                                    │
│   - timeline scrubber              - replay export              │
└─────────────────────────────────────────────────────────────────┘
                          ▲
                          │  WebSocket events + SSE frames + REST
┌─────────────────────────┴───────────────────────────────────────┐
│                   AgentBrowser HTTP Control Plane               │
│  ┌──────────────────┐  ┌──────────────────┐  ┌────────────────┐ │
│  │  REST + WS + SSE │  │  Bearer auth +   │  │  License +     │ │
│  │  Fastify         │  │  per-key rate    │  │  quota system  │ │
│  └──────────────────┘  └──────────────────┘  └────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
                          ▲
                          │
┌─────────────────────────┴───────────────────────────────────────┐
│                    AgentBrowser Core Runtime                    │
│  ┌──────────────────┐  ┌──────────────────┐  ┌────────────────┐ │
│  │  Planner         │  │  findAndClick    │  │  Recorder +    │ │
│  │  (goal → steps)  │  │  hybrid action   │  │  Replay engine │ │
│  └──────────────────┘  └──────────────────┘  └────────────────┘ │
│  ┌──────────────────┐  ┌──────────────────┐  ┌────────────────┐ │
│  │  Verifier        │  │  Modal           │  │  Action memory │ │
│  │  (diff snapshots)│  │  interrupter     │  │  (skip LLM)    │ │
│  └──────────────────┘  └──────────────────┘  └────────────────┘ │
│  ┌──────────────────┐  ┌──────────────────┐  ┌────────────────┐ │
│  │  Vision pipeline │  │  HumanCursor     │  │  Site memory   │ │
│  │  bbox + annotate │  │  Bezier + CDP    │  │  WAL SQLite    │ │
│  │  + VisionLLM     │  │  trail + ripple  │  │  + selectors   │ │
│  └──────────────────┘  └──────────────────┘  └────────────────┘ │
│  ┌──────────────────┐  ┌──────────────────┐  ┌────────────────┐ │
│  │  Anti-fingerprint│  │  Captcha solver  │  │  LLM provider  │ │
│  │  (canvas/WebGL)  │  │  + auto-inject   │  │  (Anthropic /  │ │
│  │                  │  │                  │  │   OpenAI / etc)│ │
│  └──────────────────┘  └──────────────────┘  └────────────────┘ │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  Browser Engine - Playwright + stealth + multi-tab     │    │
│  │  ↓                                                      │    │
│  │  Chromium (the real browser, with the visible cursor)  │    │
│  └─────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘

Quick start - 4 ways

As a TypeScript library

import { AgentBrowser } from 'agentbrowser';

const browser = new AgentBrowser({
  anthropic_api_key: process.env.ANTHROPIC_API_KEY,
  headless: false,    // watch the cursor move
  stealth: true,
});
await browser.launch();

const state = await browser.navigate('https://news.ycombinator.com');
console.log(state.page_type);          // 'listing'
console.log(state.available_actions);  // [{ name: 'navigate_to_new', ... }, ...]

await browser.action('navigate_to_new');

const data = await browser.extract({
  top_story: 'title of the top story',
  points: 'upvote count',
  author: 'submitter username',
});

await browser.close();

Via HTTP

# $ID is the session_id returned by the first call
curl -X POST http://localhost:3100/api/v1/sessions
curl -X POST http://localhost:3100/api/v1/sessions/$ID/navigate -H 'content-type: application/json' -d '{"url":"https://example.com"}'
curl -X POST http://localhost:3100/api/v1/sessions/$ID/find_and_click -H 'content-type: application/json' -d '{"goal":"submit the form"}'
curl http://localhost:3100/api/v1/sessions/$ID/screenshot/bbox

Python SDK

from agentbrowser import AgentBrowserClient

client = AgentBrowserClient("http://localhost:3100", api_key="...")
with client.create_session() as s:
    s.navigate("https://example.com")
    s.click(selector="a")
    png, elements = s.annotated_screenshot()

MCP server (Claude Code, etc.)

Add to ~/.claude.json:

{
  "mcpServers": {
    "agentbrowser": {
      "command": "node",
      "args": ["/path/to/agentbrowser/dist/server/mcp.js"],
      "env": { "ANTHROPIC_API_KEY": "sk-..." }
    }
  }
}

Repository structure

src/
├── engine/             Browser lifecycle, sessions, tabs
│   ├── browser.ts      Playwright wrapper, cursor wiring, popup handling
│   └── tabs.ts         Multi-tab manager
├── input/              Cursor + physics + fingerprint shield
│   ├── cursor.ts       HumanCursor: SVG overlay, CDP input, click/drag/type
│   ├── trajectory.ts   Bezier path generator with jitter + overshoot
│   └── fingerprint.ts  WebGL/canvas/audio anti-fingerprint patches
├── vision/             Vision pipeline
│   ├── bbox.ts         Element catalog extraction
│   ├── annotate.ts     Numbered-box screenshot annotator
│   ├── diff.ts         Snapshot diff (added/removed/moved/textChanged)
│   └── llm.ts          Claude vision integration
├── runtime/            Action execution + autonomy
│   ├── executor.ts     Action runner with action-memory hot path
│   ├── find.ts         Hybrid DOM-then-vision findAndClick
│   ├── verifier.ts     Action verification via snapshot diff
│   ├── modal-interrupter.ts  Cookie/consent/popup detection
│   ├── planner.ts      LLM goal decomposition + multi-step execution
│   ├── plan-audit.ts   Per-step screenshot + element capture
│   ├── recorder.ts     JSONL action stream
│   ├── replay.ts       Deterministic trace replay
│   ├── compactor.ts    Trace compaction (collapse cursor.move runs)
│   ├── events.ts       In-process event broker (pub/sub)
│   ├── captcha.ts      2Captcha API integration
│   └── captcha-solver.ts  Detect → solve → inject pipeline
├── memory/             Persistent storage
│   ├── store.ts        Site memory (page model cache + selector library)
│   └── action-memory.ts  Per-(page,goal) selector cache + cross-domain transfer
├── llm/                Provider abstraction
│   ├── provider.ts     LLMProvider interface
│   ├── anthropic.ts    Anthropic SDK wrapper
│   ├── openai.ts       OpenAI-compatible (works with vLLM/Ollama too)
│   └── index.ts        autoDetectProvider
├── semantic/           Page analysis
│   └── analyzer.ts     Page-to-SemanticPageModel via LLM
├── skills/             Skill library
│   ├── skills.ts       Parameterize/save/load/run
│   └── package.ts      .skill.json export/import format
├── server/             Production server surfaces
│   ├── http.ts         Fastify + REST + WS + SSE + auth + rate limit
│   ├── openapi.ts      OpenAPI 3.1 spec
│   ├── license.ts      Tier system + quota tracking
│   └── mcp.ts          MCP server for Claude Code et al.
├── bin/
│   └── http.ts         HTTP server entry point
├── util/
│   └── coords.ts       Viewport ↔ document ↔ screenshot coord conversions
├── types.ts            Shared types (SemanticPageModel, ActionDefinition, ...)
└── index.ts            Public API barrel

tests/                  Vitest suite (119 tests, 21 files)
clients/
├── python/             Python SDK (zero deps, stdlib urllib)
└── typescript/         TypeScript SDK (browser + Node compatible)
ui/
├── operator/           Live dashboard (single HTML file)
└── recorder/           Real-time trace timeline (single HTML file)
docs/                   Public docs site (single HTML file)
examples/               Working agent demos
.agentbrowser-meta/     Internal build planning + iteration log
Dockerfile              Multi-stage slim image (~700MB with Chromium)
docker-compose.yml      Local stack with persistent volume

API surface

The table below lists the HTTP endpoints. All are bearer-authenticated when API keys are configured, except those marked "no auth". Full OpenAPI 3.1 spec at /api/v1/openapi.json.

Method          │ Path │ Purpose
────────────────│──────────────────────────────────────────│──────────────────────────────────────────
GET             │ /health │ Liveness (no auth)
POST            │ /api/v1/sessions │ Create session
DELETE          │ /api/v1/sessions/:id │ Destroy session
POST            │ /api/v1/sessions/:id/navigate │ Go to URL
POST            │ /api/v1/sessions/:id/cursor/{move,click,drag,scroll,type,press} │ Cursor primitives
POST            │ /api/v1/sessions/:id/find_and_click │ Hybrid DOM → vision-LLM action
POST            │ /api/v1/sessions/:id/plan │ LLM goal decomposition + multi-step execute
POST            │ /api/v1/sessions/:id/extract │ Schema-driven LLM extraction
POST            │ /api/v1/sessions/:id/fill │ Fill a named form
GET             │ /api/v1/sessions/:id/screenshot │ Viewport PNG
GET             │ /api/v1/sessions/:id/screenshot/bbox │ Annotated PNG + element list
GET             │ /api/v1/sessions/:id/screenshot/stream │ SSE PNG frames
WS              │ /api/v1/sessions/:id/screenshot/ws │ Binary PNG frames
GET             │ /api/v1/sessions/:id/state │ Current SemanticPageModel
GET             │ /api/v1/sessions/:id/elements │ ElementBox[]
GET/POST/DELETE │ /api/v1/sessions/:id/tabs │ Multi-tab control
POST            │ /api/v1/sessions/:id/solve_captcha │ Detect + solve + inject
WS              │ /api/v1/sessions/:id/events │ Live event stream
POST            │ /api/v1/sessions/:id/back │ History back
POST            │ /api/v1/sessions/:id/forward │ History forward
POST            │ /api/v1/sessions/:id/reload │ Reload current page
POST            │ /api/v1/sessions/:id/dialog │ Accept/dismiss next native alert/confirm/prompt
GET/POST/DELETE │ /api/v1/sessions/:id/cookies │ Read/write/clear browser cookies
GET/POST/DELETE │ /api/v1/sessions/:id/storage │ Read/write/clear localStorage or sessionStorage (?kind=local|session)
POST            │ /api/v1/sessions/:id/upload │ Set files on a file input ({selector, paths})
POST            │ /api/v1/sessions/:id/print │ Render current page to PDF (returns application/pdf)
POST            │ /api/v1/sessions/:id/route/block │ Block all requests matching a glob pattern
POST            │ /api/v1/sessions/:id/route/headers │ Inject headers into requests matching a pattern
POST            │ /api/v1/sessions/:id/route/mock │ Mock response body for matching requests
DELETE          │ /api/v1/sessions/:id/route │ Remove all route handlers (passthrough)
POST            │ /api/v1/sessions/:id/geolocation │ Override reported coords (or null to clear)
POST            │ /api/v1/sessions/:id/viewport │ Resize viewport mid-session
POST            │ /api/v1/sessions/:id/headers │ Set extra HTTP headers for all requests
POST            │ /api/v1/sessions/:id/har/start │ Start HAR network capture
GET             │ /api/v1/sessions/:id/har/peek │ Get current entries without stopping
POST            │ /api/v1/sessions/:id/har/stop │ Stop and return all captured entries
POST            │ /api/v1/sessions/:id/console/start|peek|stop │ Capture console messages + uncaught errors
POST            │ /api/v1/sessions/:id/throttle/network │ Throttle network (downloadThroughput/uploadThroughput/latencyMs/offline)
POST            │ /api/v1/sessions/:id/throttle/cpu │ CPU slowdown multiplier (1=native, 4=4x slower)
POST            │ /api/v1/sessions/:id/locale │ Override navigator.language + Accept-Language
POST            │ /api/v1/sessions/:id/timezone │ Override page timezone (e.g. "Asia/Tokyo")
POST/DELETE     │ /api/v1/sessions/:id/permissions │ Grant/clear browser permissions (clipboard, notifications, etc.)
POST            │ /api/v1/sessions/:id/record/start │ Begin in-memory recording for skill creation
GET             │ /api/v1/sessions/:id/record/peek │ Live event count while recording
POST            │ /api/v1/sessions/:id/record/stop │ Stop and (optionally) save events as a skill ({name, slots, description})
POST            │ /api/v1/sessions/:id/har/replay │ Re-execute HAR entries and compare statuses
GET             │ /api/v1/sessions/:id/service-workers │ List active service workers
GET/POST        │ /api/v1/sessions/:id/snapshot │ Export full session state (cookies + storage + IDB); POST /snapshot/restore to import
POST/GET        │ /api/v1/sessions/:id/downloads/start|stop │ Auto-capture all downloads to a session-tagged dir
GET/POST        │ /api/v1/sessions/:id/clipboard │ Read/write the page's clipboard via navigator.clipboard
POST            │ /api/v1/sessions/:id/wait/selector|text|network-idle|function │ Smart waiters with timeouts
GET             │ /api/v1/sessions/:id/markdown │ Extract clean RAG-friendly markdown from current page
GET             │ /api/v1/skills/:name/versions │ List archived versions of a skill
POST            │ /api/v1/skills/:name/rollback │ Restore a previous version ({version: N})
GET             │ /api/v1/sessions/:id/activity │ Idle time in ms for a session
POST            │ /api/v1/sessions/:id/touch │ Reset idle counter
GET             │ /api/v1/sessions/expired │ List sessions past idle timeout
POST            │ /api/v1/snapshot/diff │ Diff two snapshots ({a, b}); returns added/removed/changed
POST            │ /api/v1/sessions/:id/click_by_description │ Vision-only click ("the blue Submit button")
POST            │ /api/v1/pool/warmup │ Pre-create N empty contexts for sub-100ms session creation
GET             │ /api/v1/pool/status │ Warm pool size + oldest entry age
POST            │ /api/v1/pool/drain │ Close all warm contexts
POST            │ /api/v1/sessions/:id/copilot/install │ Inject highlight overlay for "AI is here" hints
POST            │ /api/v1/sessions/:id/copilot/highlight │ Highlight a bbox with optional label
POST            │ /api/v1/pool/auto_refill/start|stop │ Background job that keeps pool topped up
POST            │ /api/v1/action_memory/predict │ Predict next action from memory ({url, elements, goal?})
GET             │ /api/v1/skills/marketplace │ Static skill catalog (slug, tags, quality, runs, success_rate) for GitHub Pages hosting
GET             │ /api/v1/vision/cache/stats │ Vision LLM cache hits/misses + persistent_size if SQLite-backed
POST            │ /api/v1/vision/cache/clear │ Empty in-memory + on-disk vision cache
GET             │ /api/v1/sessions/:id/har/export │ Captured HAR exported as standard HAR 1.2 (Chrome DevTools-importable)
POST            │ /api/v1/vision/cache/prune │ Evict oldest entries beyond cache_max_disk_entries + VACUUM
POST            │ /api/v1/traces/diff │ Diff two trace event arrays (regression testing)
POST            │ /api/v1/sessions/:id/shortcut │ Execute named keyboard shortcut (newTab/copy/find/etc)
GET             │ /api/v1/shortcuts │ List available named shortcuts
GET             │ /api/v1/dump │ Single-call snapshot of all server state
POST/GET/DELETE │ /api/v1/plans[/:slug] │ Save / list / load / delete reusable Plan blueprints
POST            │ /api/v1/sessions/:id/shortcut/chain │ Execute multiple named shortcuts in sequence
GET             │ /api/v1/metrics/summary │ Per-histogram p50/p95/p99/mean/count percentiles
GET/POST/DELETE │ /api/v1/rate_limits │ Per-domain RPS limit (token-bucket throttle on navigate)
POST            │ /api/v1/sessions/:id/plans/:slug/run │ Load + execute a saved Plan blueprint
POST            │ /api/v1/skills/:name/to_plan │ Convert a recorded skill into a Plan ({save?, slug?})
GET             │ /api/v1/action_memory/search?pattern=... │ Search action memory by selector substring
POST            │ /api/v1/parallel/extract │ Spawn N parallel sessions, extract markdown from each URL
POST            │ /api/v1/plans/compose │ Chain N saved plans into a super-plan
POST            │ /api/v1/sessions/:id/form/autofill │ Auto-fill form inputs by name/label/placeholder match
POST            │ /api/v1/skills/diff │ Compare two skills' events ({a, b})
GET/POST/DELETE │ /api/v1/webhooks[/:id] │ Subscribe to events, POST to external URL (HMAC-signed when secret set)
POST            │ /api/v1/webhooks/:id/test │ Fire a test delivery to verify connectivity
POST            │ /api/v1/batch/csv │ Process a CSV: navigate per row, extract markdown, return enriched results
GET             │ /api/v1/webhooks/queue │ Pending webhook retry count
POST/GET        │ /api/v1/skills/:name/cost │ Record / read per-skill LLM cost (token usage × rates)
GET             │ /api/v1/skills/cost/leaderboard │ Top-N most expensive skills by total cost
POST            │ /api/v1/skills/validate │ Validate a skill's structure before save ({skill})
POST            │ /api/v1/skills/:name/auto_tag │ LLM-suggest 3-5 tags from skill description + events
POST            │ /api/v1/traces/render │ Render TraceEvent[] as a self-contained HTML timeline page
POST            │ /api/v1/skills/:name/auto_describe │ LLM-write a one-line description from skill events
GET             │ /api/v1/skills/:name/suggested_selectors │ Cross-skill: selectors that worked for the same goal on other domains
GET             │ /api/v1/action_memory/export.csv │ Download action memory as CSV (?domain=&limit=)
POST            │ /api/v1/action_memory/query │ Composite filter+sort query (domain, goal substr, selector substr, min_success_rate, min_runs, sort_by)
GET             │ /api/v1/sessions/:id/a11y │ Accessibility audit (missing alt/label, heading skips, empty links, missing lang)
GET             │ /api/v1/skills/:name/bundle │ Export skill as .agbpkg (skill + plan + stats + readme)
POST            │ /api/v1/skills/bundle/import │ Import an .agbpkg bundle
GET             │ /api/v1/analytics/domain/:domain │ Per-domain action memory analytics + top selectors
GET/POST/DELETE │ /api/v1/schedules[/:id] │ Recurring skill execution ({skill_name, spec: "every 5m", bindings})
POST            │ /api/v1/skills/recommend │ Recommend skills matching a goal text ({goal, limit?, min_score?})
POST            │ /api/v1/plan_templates/:name/to_skill │ Convert a built-in plan template into a runnable skill
POST/GET        │ /api/v1/sessions/:id/network/start|peek|stop │ Per-session network bytes tracking
GET             │ /api/v1/sessions/:id/contrast │ WCAG color contrast audit
GET             │ /api/v1/fingerprints │ List browser fingerprint presets (mac-chrome, iphone-15-pro, tokyo-iphone, etc.)
POST            │ /api/v1/sessions/:id/fingerprint │ Apply a preset ({preset_id}) - viewport + UA + locale + timezone
GET             │ /api/v1/sessions/:id/memory │ Per-session heap/rss/external delta from session creation
POST            │ /api/v1/sessions/:id/memory/snapshot │ Re-baseline the session memory snapshot
GET             │ /api/v1/health/full │ Detailed system health: process, engine, scheduler, billing, auth posture
GET             │ /api/v1/sessions/:id/cpu │ Per-session CPU delta (user/system/wall ms + cpu_percent)
POST            │ /api/v1/sessions/:id/cpu/snapshot │ Re-baseline the session CPU snapshot
POST            │ /api/v1/action_memory/distill │ Top selectors across distinct domains (ship as starter packs)
POST            │ /api/v1/action_memory/import_patterns │ Seed memory with distilled patterns from another deployment
POST            │ /api/v1/sessions/:id/skills/auto_run │ Match goal to best skill and execute (no LLM round-trip)
GET/POST/DELETE │ /api/v1/skills/ab[/:key] │ Register weighted A/B routes between skill versions
POST            │ /api/v1/sessions/:id/skills/ab/:key/run │ Run an A/B-routed skill (weighted variant pick)
WS              │ /api/v1/skills/events/ws │ Live skill outcome firehose (?skill=<name> filter)
GET/DELETE      │ /api/v1/skills/ab/:key/stats │ Aggregated success/failure stats per variant
POST            │ /api/v1/skills/ab/:key/promote │ Auto-promote winning variant (z-test gated)
GET/DELETE      │ /api/v1/skills/percentiles[?skill=] │ Per-skill p50/p95/p99 latency histograms
GET/POST/DELETE │ /api/v1/skills │ Skill library CRUD
GET             │ /api/v1/skills/:name/export │ Download .skill.json
POST            │ /api/v1/skills/import │ Import .skill.json
POST            │ /api/v1/sessions/:id/skills/:name/run │ Replay skill with bindings
GET             │ /api/v1/action_memory/stats │ Memory stats
GET             │ /api/v1/action_memory/by_domain/:domain │ What the system knows about :domain
GET             │ /api/v1/action_memory/selectors/:domain │ Top selectors per domain with success/fail stats
POST            │ /api/v1/action_memory/recall_by_goal │ Cross-domain selector hypotheses
POST            │ /api/v1/action_memory/similar │ TF-IDF / embedding action memory search
POST            │ /api/v1/action_memory/similar_embedded │ Force embedding-only search
POST            │ /api/v1/action_memory/decay │ Halve stale entry counts
POST            │ /api/v1/page_similarity │ Score similarity between two page snapshots
POST            │ /api/v1/skills/discover │ Suggest skills relevant for the current page
POST            │ /api/v1/skills/compose │ Run multiple skills in sequence
GET             │ /api/v1/skills/stats │ Per-skill success/fail history (filter by ?skill= for per-domain rows)
GET             │ /api/v1/skills/leaderboard │ Top-N performing skills by success_count
GET             │ /api/v1/skills/hot │ High-confidence skills (success_rate ≥ 0.9, ≥ 10 runs by default)
POST            │ /api/v1/skills/prune │ Remove skills below a success threshold (dry-run by default)
DELETE          │ /api/v1/skills/:name │ Delete a skill by name
GET             │ /api/v1/plan_templates │ List built-in plan templates
GET             │ /api/v1/diagnose │ Remote health diagnostic (mirrors agb-doctor)
POST            │ /api/v1/traces/compact │ Compact a trace file
GET             │ /api/v1/openapi.json │ Full OpenAPI 3.1 spec
GET             │ /api/v1/dashboard │ Aggregated stats: site memory + action memory + skills + provider
GET             │ /api/v1/billing/pricing │ Tier list with prices + limits (powers pricing page)
POST            │ /api/v1/billing/checkout │ Stripe Checkout Session for {tier, email}
POST            │ /api/v1/billing/webhook │ Stripe webhook receiver (signature-verified)
GET             │ /metrics │ Prometheus scrape (no auth)
GET             │ /ready │ Readiness probe (503 until engine launched)
GET             │ /api/v1/agents │ List currently-connected Chrome extension agents
GET             │ /api/v1/agents/:id/poll │ Long-poll for next command (used by Chrome extension)
POST            │ /api/v1/agents/:id/cmd │ Send a command to a connected Chrome extension
POST            │ /api/v1/agents/:id/result │ Extension posts back command results
DELETE          │ /api/v1/agents/:id │ Drop agent state


Production deployment

Docker

docker compose up -d
# AgentBrowser on http://localhost:3100
# Memory + traces persist in named volume

Auth + rate limit

AGENTBROWSER_API_KEYS=key1,key2 \
ANTHROPIC_API_KEY=sk-ant-... \
node dist/bin/http.js

Rate limit defaults to 600 req/min/key. Override with AGENTBROWSER_RATE_LIMIT_PER_MINUTE.
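
A per-key limit of this kind can be pictured as a fixed one-minute window per API key. The sketch below is an assumption about the mechanism, not the server's implementation: the class name is hypothetical, and a real deployment might use a sliding window or token bucket instead.

```typescript
// Sketch only: fixed-window request counter keyed by API key.
export class PerKeyRateLimiterSketch {
  private windows = new Map<string, { windowStart: number; count: number }>();
  private readonly limitPerMinute: number;

  // 600 mirrors the documented default; pass another value to override.
  constructor(limitPerMinute = 600) {
    this.limitPerMinute = limitPerMinute;
  }

  // Returns true if the request may proceed, false if the key is over its limit.
  allow(apiKey: string, nowMs: number): boolean {
    const w = this.windows.get(apiKey);
    if (!w || nowMs - w.windowStart >= 60_000) {
      // First request for this key, or the previous window has expired.
      this.windows.set(apiKey, { windowStart: nowMs, count: 1 });
      return true;
    }
    if (w.count >= this.limitPerMinute) return false;
    w.count += 1;
    return true;
  }
}
```

Each key gets its own counter, so one noisy client hitting its limit does not affect requests made with a different key.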

Observability

  • Every action emits a typed SessionEvent to the in-process broker

  • WebSocket clients subscribe at /api/v1/sessions/:id/events (with 200-event replay buffer for reconnect)

  • JSONL traces at ~/.agentbrowser/traces/ are the audit log

  • compactTrace() keeps long traces small without losing fidelity


Tested at

Capability                │ Status
──────────────────────────│──────────────────────────────────────────────────────────────
Type check (tsc)          │ clean
Test suite                │ 119 / 119 passing across 21 files
Build (npm run build)     │ clean
Real Chromium navigation  │ verified
Bot detection bypass      │ passes Cloudflare interstitial, OneTrust, Cookiebot, Funding Choices, Reddit GDPR, Stack Overflow signup wall
Cursor click → DOM event  │ verified end-to-end
Action verifier           │ verified on URL change + element add/remove/text change
Vision pipeline           │ annotated PNG verified by magic-byte + element list
HTTP API                  │ 13 integration tests against full Fastify stack
Replay determinism        │ trajectory generator deterministic per seed


License

MIT. Commercial use encouraged.

For acquisition or partnership inquiries: ashtonluca@gmail.com.
