AgentBrowser
Automatically bypasses Cloudflare challenges and bot detection mechanisms to ensure AI agents can access protected websites without manual intervention.
Enables AI agents to navigate GitHub pages and execute actions using semantic commands, abstracting away the need for raw DOM selector management.
Facilitates automated interaction with Reddit by bypassing bot detection systems and automatically dismissing GDPR gates.
Optimizes automated browsing of Stack Overflow by stripping signup modals and dismissing site-specific notice bars before analysis.
Supports semantic automation of Stripe's login and dashboard flows, utilizing site memory to learn and cache selectors for faster, more accurate interaction.
Allows agents to efficiently browse and extract data from Wikipedia by automatically dismissing cookie prompts and simplifying page models.
A real, visible-cursor browser for AI agents. Real Chromium. Real humanlike physics. Real audit trail. Agents drive it the way a human would: drag, click, type, scroll - while every action is recorded, replayable, and verified.
Works with every major LLM: Anthropic Claude, OpenAI GPT, Google Gemini, Groq, Together, Fireworks, DeepInfra, Mistral, Cohere, xAI Grok, OpenRouter, Perplexity, Ollama (local), Ollama Cloud (hosted), vLLM, LM Studio, llama.cpp - or anything OpenAI-compatible. Zero lock-in.
┌──────────────────────────────────────────────────────────────────┐
│  Agent (your code)                                               │
│    "submit the payment form" ──┐                                 │
└────────────────────────────────┼─────────────────────────────────┘
                                 │  HTTP POST /sessions/:id/plan
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│  AgentBrowser runtime                                            │
│  Planner ──▶ findAndClick (DOM ▶ vision-LLM) ──▶ cursor          │
│     │              │                               │             │
│     │              ▼                               ▼             │
│     │     verifier (DOM diff)              Bezier trajectory     │
│     │              │                       + CDP raw events      │
│     ▼              ▼                               │             │
│  action   action.completed event ──▶ recorder (JSONL)            │
│  memory            │                                             │
│  (skip LLM         ▼                                             │
│   on visit  WebSocket / SSE                                      │
│   #2+)      to operator UI                                       │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
            Real visible cursor moves on real Chromium
Why this exists
Every existing browser-automation tool was built for humans first and retrofitted for agents. They speak DOM operations. They produce 8000-token HTML dumps. They re-learn each site every run. They have no audit trail. They get blocked by every cookie banner.
AgentBrowser inverts this. The cursor is real and visible. All input goes through CDP raw mouse events. The API speaks in goals, not selectors. Failed actions auto-recover. Every action gets verified. The system learns each site permanently and shares knowledge across domains.
Other tools                      │ AgentBrowser
─────────────────────────────────│──────────────────────────────────
8000 tokens of HTML              │ 50 tokens of structured meaning
agent guesses #submit-btn-v2     │ { goal: "submit the form" }
no replay, no audit              │ JSONL trace, deterministic replay
re-learns every visit            │ action memory, 7x faster on visit 3
blocked by every cookie wall     │ auto-dismiss + force-removal
no captcha story                 │ 2Captcha / hCaptcha / Turnstile
no fingerprint defenses          │ per-context WebGL/canvas/audio noise
single integration: library      │ library + MCP + HTTP + WS + SSE + replay
locked to one LLM vendor         │ 17 providers, one env var swap (Claude/GPT/
                                 │ Gemini/Ollama/vLLM/Groq/Together/Fireworks/...)
Five-minute demo
git clone https://github.com/AshtonVaughan/agentbrowser
cd agentbrowser
npm install && npx playwright install chromium && npm run build
# Pick ANY LLM provider:
ANTHROPIC_API_KEY=sk-ant-... npm run http # Claude (default)
# or
OPENAI_API_KEY=sk-... npm run http # GPT
# or
GOOGLE_API_KEY=... npm run http # Gemini
# or
GROQ_API_KEY=gsk_... npm run http # Groq (Llama on LPUs)
# or run fully local with Ollama:
ollama serve &
AGENTBROWSER_LLM_PROVIDER=ollama OLLAMA_MODEL=llama3.2-vision npm run http
# or use Ollama Cloud:
OLLAMA_CLOUD_API_KEY=... npm run http
In another terminal:
# Create a session
curl -X POST localhost:3100/api/v1/sessions
# → { "session_id": "abc123..." }
# Plan + execute a goal end-to-end
curl -X POST localhost:3100/api/v1/sessions/abc123/plan \
-H 'content-type: application/json' \
-d '{"goal":"go to news.ycombinator.com and click the top story"}'
# → { success: true, steps: [...], duration_ms: 4200 }
# Watch it live
open ui/operator/index.html
The agent's cursor moves humanly across the screen. Every cursor.move, click, page change streams to the operator UI in real time. Every action is recorded to ~/.agentbrowser/traces/ for replay.
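For agents written in TypeScript, the two curl calls above can be expressed as plain request descriptions. This is a minimal sketch: the paths and port come from the demo, while `HttpRequestSpec`, `createSessionRequest` and `planRequest` are hypothetical helper names, not part of the published SDK.

```typescript
// Sketch only: paths mirror the curl demo; helper and type names are hypothetical.
interface HttpRequestSpec {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

const DEFAULT_BASE_URL = "http://localhost:3100";

// POST /api/v1/sessions: the response is expected to carry a session_id.
function createSessionRequest(baseUrl: string = DEFAULT_BASE_URL): HttpRequestSpec {
  return { method: "POST", url: `${baseUrl}/api/v1/sessions`, headers: {} };
}

// POST /api/v1/sessions/:id/plan with a natural-language goal as JSON.
function planRequest(
  sessionId: string,
  goal: string,
  baseUrl: string = DEFAULT_BASE_URL,
): HttpRequestSpec {
  return {
    method: "POST",
    url: `${baseUrl}/api/v1/sessions/${encodeURIComponent(sessionId)}/plan`,
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ goal }),
  };
}
```

Each spec can be handed to any HTTP client, for example `fetch(spec.url, { method: spec.method, headers: spec.headers, body: spec.body })` on Node 18+ with the server running.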
What you get
Visible humanlike cursor
SVG cursor sprite injected via context init script
Bezier-curve trajectories with jitter, ease-in-out, optional overshoot
All input via CDP Input.dispatchMouseEvent (not Playwright locators)
Click ripple animation, fading 14-point cursor trail
Per-trajectory deterministic seed for replay reproducibility
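To illustrate how a seeded Bezier trajectory can be reproducible, here is a minimal sketch. It is not the project's trajectory.ts: the function names, the mulberry32 PRNG, the control-point placement and the default step count and jitter size are all assumptions made for the example.

```typescript
interface CursorPoint {
  x: number;
  y: number;
}

// mulberry32: a small seeded PRNG, so one seed always produces the same path.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Quadratic ease-in-out: slow start, fast middle, slow finish.
function easeInOut(t: number): number {
  return t < 0.5 ? 2 * t * t : 1 - 2 * (1 - t) * (1 - t);
}

// One coordinate of a cubic Bezier curve at parameter t.
function cubicBezier(p0: number, c1: number, c2: number, p1: number, t: number): number {
  const u = 1 - t;
  return u * u * u * p0 + 3 * u * u * t * c1 + 3 * u * t * t * c2 + t * t * t * p1;
}

function buildTrajectory(
  from: CursorPoint,
  to: CursorPoint,
  seed: number,
  steps = 30,
  jitterPx = 1.5,
): CursorPoint[] {
  const rand = mulberry32(seed);
  const dx = to.x - from.x;
  const dy = to.y - from.y;
  // Control points are pushed sideways from the straight line to bend the path.
  const bend = (rand() - 0.5) * 0.4;
  const c1 = { x: from.x + dx * 0.3 - dy * bend, y: from.y + dy * 0.3 + dx * bend };
  const c2 = { x: from.x + dx * 0.7 - dy * bend, y: from.y + dy * 0.7 + dx * bend };
  const path: CursorPoint[] = [];
  for (let i = 0; i <= steps; i++) {
    const t = easeInOut(i / steps);
    let x = cubicBezier(from.x, c1.x, c2.x, to.x, t);
    let y = cubicBezier(from.y, c1.y, c2.y, to.y, t);
    // Jitter interior points only, so the path starts and ends exactly on target.
    if (i > 0 && i < steps) {
      x += (rand() - 0.5) * 2 * jitterPx;
      y += (rand() - 0.5) * 2 * jitterPx;
    }
    path.push({ x, y });
  }
  return path;
}
```

Because every random draw comes from the seeded generator, replaying a trace with the stored seed regenerates the identical point list.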
Hybrid action layer
Layer | What it does |
| Direct viewport click via CDP |
| Bbox-resolve, scroll-into-view, humanlike click. Stale-element auto-recovery via accessible-name lookup. |
| Text disambiguation across visually similar elements |
| ARIA-driven targeting |
| DOM selector → text → role → vision-LLM, every step verified |
| Action-memory fast path → fallback to find-and-click |
| LLM goal decomposition → multi-step run with retry budget |
Vision pipeline
extractElementBoxes(page) returns rich element catalog: id / role / tag / accessible name / value / bbox / selector / disabled
bboxScreenshot(sessionId) returns a viewport PNG with numbered cyan boxes drawn on every interactive target + the element list
VisionLLM.decide(goal, screenshot, elements) sends to Claude Sonnet, parses {element_id, action, rationale}
cursor.clickByBox(bbox) clicks vision-derived coordinates with the visible cursor
Self-healing
Action verifier snapshots ElementBoxes before every action, diffs after settle, declares verified=true on URL change / added / removed / textChanged / moved elements
Stale-element recovery in cursor.clickBySelector falls back to getByText(originalText) on selector failure
Modal interrupter detects fixed/absolute high-z-index dialogs at viewport center, classifies as blocker (cookie/consent/subscribe keywords) or user-relevant (login dialogs), auto-dismisses blockers and retries
Selector library learns from verified outcomes only - no entry in memory unless the click actually changed the page
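The verification rule described above (snapshot before, diff after, verified on any observable change) could look roughly like the following. This is a sketch under assumptions: the `SnapshotElement`, `PageSnapshot` and `VerifyResult` shapes are simplified stand-ins, not the project's real ElementBox type.

```typescript
// Simplified stand-in for an element box: id, visible text, and position only.
interface SnapshotElement {
  id: string;
  text: string;
  x: number;
  y: number;
}

interface PageSnapshot {
  url: string;
  elements: SnapshotElement[];
}

interface VerifyResult {
  verified: boolean;
  urlChanged: boolean;
  added: string[];
  removed: string[];
  textChanged: string[];
  moved: string[];
}

function diffSnapshots(before: PageSnapshot, after: PageSnapshot): VerifyResult {
  const prev = new Map<string, SnapshotElement>();
  before.elements.forEach((e) => prev.set(e.id, e));
  const next = new Map<string, SnapshotElement>();
  after.elements.forEach((e) => next.set(e.id, e));

  const added = after.elements.filter((e) => !prev.has(e.id)).map((e) => e.id);
  const removed = before.elements.filter((e) => !next.has(e.id)).map((e) => e.id);
  const textChanged: string[] = [];
  const moved: string[] = [];
  for (const e of after.elements) {
    const old = prev.get(e.id);
    if (!old) continue; // new element, already counted in `added`
    if (old.text !== e.text) textChanged.push(e.id);
    if (old.x !== e.x || old.y !== e.y) moved.push(e.id);
  }

  const urlChanged = before.url !== after.url;
  // An action counts as verified only if something observable changed.
  const verified =
    urlChanged ||
    added.length > 0 ||
    removed.length > 0 ||
    textChanged.length > 0 ||
    moved.length > 0;
  return { verified, urlChanged, added, removed, textChanged, moved };
}
```

A click that leaves the URL and every element untouched yields `verified: false`, which is the case where nothing should be written to the selector library.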
Site + action memory
SQLite WAL for concurrent reads. Per-domain selector library + page-model cache.
ActionMemory - SHA-1 page signature × goal hash → selector × success/fail counters. Visit #2 to a known page costs zero LLM calls.
Cross-domain transfer: recallByGoal(goal, excludeDomain) returns winning selectors from OTHER domains for the same logical goal. The system learned "submit payment → button#pay-btn" on stripe.com; it tries the same selector on paddle.com as a hypothesis.
decay(unusedSinceMs) halves stale entry counts so the library stays healthy as sites change.
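The following in-memory sketch shows how the three operations above might fit together. It is illustrative only: the real store is SQLite-backed and keys on SHA-1 signatures, whereas here the page signature is an opaque string, the class name `ActionMemorySketch` is hypothetical, and `decay` takes an explicit `nowMs` argument (an addition for testability).

```typescript
interface MemoryEntry {
  domain: string;
  pageSignature: string;
  goal: string;
  selector: string;
  success: number;
  fail: number;
  lastUsedMs: number;
}

class ActionMemorySketch {
  private entries: MemoryEntry[] = [];

  // Record the verified outcome of trying `selector` for `goal` on a page.
  record(domain: string, pageSignature: string, goal: string, selector: string, ok: boolean, nowMs: number): void {
    let entry = this.entries.find(
      (e) => e.pageSignature === pageSignature && e.goal === goal && e.selector === selector,
    );
    if (!entry) {
      entry = { domain, pageSignature, goal, selector, success: 0, fail: 0, lastUsedMs: nowMs };
      this.entries.push(entry);
    }
    if (ok) entry.success += 1;
    else entry.fail += 1;
    entry.lastUsedMs = nowMs;
  }

  // Fast path: best-known selector for this exact page and goal (no LLM call needed).
  recall(pageSignature: string, goal: string): string | undefined {
    const hits = this.entries
      .filter((e) => e.pageSignature === pageSignature && e.goal === goal && e.success > e.fail)
      .sort((a, b) => b.success - a.success);
    return hits[0]?.selector;
  }

  // Cross-domain transfer: selectors that worked for the same goal elsewhere.
  recallByGoal(goal: string, excludeDomain: string): string[] {
    const selectors = this.entries
      .filter((e) => e.goal === goal && e.domain !== excludeDomain && e.success > e.fail)
      .sort((a, b) => b.success - a.success)
      .map((e) => e.selector);
    return selectors.filter((s, i) => selectors.indexOf(s) === i); // de-duplicate
  }

  // Halve the counters of entries not used within the last `unusedSinceMs`.
  decay(unusedSinceMs: number, nowMs: number): void {
    for (const e of this.entries) {
      if (nowMs - e.lastUsedMs > unusedSinceMs) {
        e.success = Math.floor(e.success / 2);
        e.fail = Math.floor(e.fail / 2);
      }
    }
  }

  list(): MemoryEntry[] {
    return this.entries.map((e) => ({ ...e }));
  }
}
```

Selectors returned by `recallByGoal` are hypotheses, so they would still need to pass the verifier on the new domain before being recorded there.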
Recorder + replay
Every action streamed as JSONL to ~/.agentbrowser/traces/<session-id>.jsonl
ReplayEngine reads a trace, dispatches events to a fresh session at configurable speed
compactTrace() collapses 60-event cursor.move trajectories into 1, merges consecutive cursor.type events, drops micro-waits
Plan audit captures screenshot + element list at every step boundary - compliance-grade replay primitive
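As a rough illustration of the compaction rules listed above, the sketch below collapses runs of cursor.move, merges adjacent cursor.type events and drops short waits. The event shape, the `wait` event name, the 50 ms threshold and the choice to keep only the final position of a move run are assumptions, not the behavior of the real compactTrace().

```typescript
// Assumed minimal event shape; real trace events likely carry more fields.
interface TraceEvent {
  type: string;
  x?: number;
  y?: number;
  text?: string;
  ms?: number;
}

function compactTraceSketch(events: TraceEvent[], minWaitMs = 50): TraceEvent[] {
  const out: TraceEvent[] = [];
  for (const ev of events) {
    // Drop micro-waits entirely.
    if (ev.type === "wait" && (ev.ms ?? 0) < minWaitMs) continue;

    const last = out[out.length - 1];
    if (last && last.type === "cursor.move" && ev.type === "cursor.move") {
      // Collapse a run of moves into one event holding the final position.
      out[out.length - 1] = { ...ev };
      continue;
    }
    if (last && last.type === "cursor.type" && ev.type === "cursor.type") {
      // Merge consecutive typing into a single event.
      out[out.length - 1] = { ...last, text: (last.text ?? "") + (ev.text ?? "") };
      continue;
    }
    out.push({ ...ev });
  }
  return out;
}
```

Since a seeded trajectory generator can rebuild the intermediate points at replay time, storing only the endpoint of each move run is one way to keep the compacted trace replayable.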
Skill library
A "skill" is a recorded trace with named slot tokens ($email, $password)
SkillLibrary.parameterize(events, slots) replaces literal values with tokens (longest-first to avoid partial-match bugs)
Save / load / list / delete via JSON files at ~/.agentbrowser/skills/
Portable .skill.json packages with format magic + version + metadata (author, license, tags) - bundle a skill with your agent code or publish to a registry; users import + bind their own credentials
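The longest-first rule matters when one slot value is a substring of another. The sketch below shows the idea with a hypothetical `parameterizeSketch` function and a simplified event shape; the real SkillLibrary.parameterize signature and event format may differ.

```typescript
// Assumed minimal event shape: only `text` is parameterized here.
interface SkillEvent {
  type: string;
  text?: string;
}

// `slots` maps slot name -> literal value seen in the recording,
// e.g. { email: "ann@example.com" } turns that literal into "$email".
function parameterizeSketch(events: SkillEvent[], slots: Record<string, string>): SkillEvent[] {
  const ordered = Object.keys(slots)
    .map((name) => ({ name, value: slots[name] ?? "" }))
    .filter((slot) => slot.value.length > 0)
    // Longest value first, so "ann@example.com" is replaced before "ann".
    .sort((a, b) => b.value.length - a.value.length);

  return events.map((ev) => {
    if (ev.text === undefined) return { ...ev };
    let text = ev.text;
    for (const slot of ordered) {
      // split/join replaces every occurrence without regex escaping issues.
      text = text.split(slot.value).join("$" + slot.name);
    }
    return { ...ev, text };
  });
}
```

With slots `{ user: "ann", email: "ann@example.com" }`, shortest-first replacement would produce `$user@example.com`; replacing the longer value first yields `$email` as intended.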
CAPTCHA
TwoCaptchaSolver for hCaptcha + reCAPTCHA v2 + Cloudflare Turnstile via 2Captcha API
Page-side DETECT_CAPTCHA_SCRIPT finds sitekeys for all three types
solveCaptchaIfPresent(page, solver) chain: detect → solve → inject token → fire change/input events → invoke data-callback
Pluggable via CaptchaSolver interface (drop in AntiCaptcha, CapMonster, etc.)
Anti-fingerprinting
applyFingerprintShield(context) per-context init script
Spoofs WebGL UNMASKED_VENDOR/RENDERER (5 GPU profiles), navigator.hardwareConcurrency, navigator.deviceMemory, AudioContext (1e-7 noise), Canvas toDataURL (~0.08% pixel jitter), navigator.plugins
Per-context deterministic seed - fingerprint stays stable within a session, differs across sessions
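A per-context deterministic seed can be sketched as a hash of the session identifier that drives every spoofed value. Everything named here is an assumption for illustration: the profile labels are placeholders (the real GPU profile names are not listed above), and FNV-1a is a stand-in for whatever seeding scheme the shield actually uses.

```typescript
// Placeholder labels standing in for the 5 GPU profiles.
const GPU_PROFILE_LABELS: string[] = ["profile-0", "profile-1", "profile-2", "profile-3", "profile-4"];

// FNV-1a 32-bit hash: small, deterministic, no dependencies.
function fnv1a32(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Same session seed -> same profile for the whole session.
function pickGpuProfile(sessionSeed: string): string {
  const index = fnv1a32(sessionSeed) % GPU_PROFILE_LABELS.length;
  return GPU_PROFILE_LABELS[index] ?? "profile-0";
}

// Deterministic noise in [-amplitude, amplitude) for the n-th sample of a session,
// e.g. amplitude 1e-7 for audio samples.
function seededNoise(sessionSeed: string, n: number, amplitude: number): number {
  const h = fnv1a32(`${sessionSeed}:${n}`);
  return (h / 4294967296 - 0.5) * 2 * amplitude;
}
```

Deriving all values from one seed keeps the fingerprint internally consistent within a session while a new session identifier produces a different one.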
Multi-tab
engine.newTab(sessionId, url?) opens a tab in the same context (shares cookies/auth)
Each tab has its own HumanCursor. switchTab / closeTab / listTabs.
HTTP: GET/POST/DELETE /sessions/:id/tabs, POST /tabs/:tab/switch
Universal LLM provider support (zero lock-in)
17 providers wired - swap any of them in by setting one env var
LLMProvider interface (complete() + completeWithImage()); analyzer + vision-LLM + planner all use this abstraction, never SDKs
Auto-detection at startup picks the right provider from env vars
Provider | Set | Notes |
Anthropic Claude |
| default if nothing else set |
OpenAI |
| gpt-4o-mini default |
Google Gemini |
| gemini-2.5-flash, vision native |
Groq |
| super-fast Llama/Mixtral on LPUs |
Together AI |
| open-source models |
Fireworks |
| open-source models |
DeepInfra |
| open-source models |
Mistral |
| mistral-large-latest |
Cohere |
| command-r-plus via /compatibility |
xAI Grok |
| grok-2 with vision |
OpenRouter |
| 300+ models behind one API |
Perplexity |
| online-search models |
Azure OpenAI |
| enterprise tenant |
Ollama (local) |
| llama3.2, llama3.2-vision, qwen2.5vl, etc |
Ollama Cloud |
| hosted Ollama with turbo models |
vLLM |
| self-hosted production inference |
LM Studio |
| desktop GUI |
llama.cpp server |
| tiny self-hosted |
Anything OpenAI-compatible |
| drop in your URL |
// Pick a provider explicitly (any of the 17):
import { AgentBrowserHttpServer, presets } from 'agentbrowser';

const server = new AgentBrowserHttpServer({
  llm_provider: presets.ollamaCloud(), // or .openai() or .groq() etc
  // ...
});
Or just set AGENTBROWSER_LLM_PROVIDER=ollama and the server auto-wires. Override the model with <PROVIDER>_MODEL=<model-id>.
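Env-var auto-detection of this kind can be sketched as a small pure function. The variable names below are the ones used in the five-minute demo; the provider ids, the precedence order and the `detectProviderSketch` name are assumptions, and only a handful of providers are covered.

```typescript
type EnvVars = Record<string, string | undefined>;

interface ProviderChoice {
  provider: string;
  model?: string;
}

// Env var -> provider id. The ids and the check order are assumptions.
const KEY_TO_PROVIDER: Array<[string, string]> = [
  ["ANTHROPIC_API_KEY", "anthropic"],
  ["OPENAI_API_KEY", "openai"],
  ["GOOGLE_API_KEY", "gemini"],
  ["GROQ_API_KEY", "groq"],
  ["OLLAMA_CLOUD_API_KEY", "ollama-cloud"],
];

function detectProviderSketch(env: EnvVars): ProviderChoice {
  // An explicit AGENTBROWSER_LLM_PROVIDER wins over key sniffing.
  const explicit = env["AGENTBROWSER_LLM_PROVIDER"];
  let provider = "anthropic"; // documented default when nothing else is set
  if (explicit) {
    provider = explicit.toLowerCase();
  } else {
    for (const [key, id] of KEY_TO_PROVIDER) {
      if (env[key]) {
        provider = id;
        break;
      }
    }
  }
  // <PROVIDER>_MODEL override, e.g. OLLAMA_MODEL=llama3.2-vision.
  const model = env[`${provider.toUpperCase().replace(/-/g, "_")}_MODEL`];
  return model ? { provider, model } : { provider };
}
```

Calling it with `process.env` at startup would give the choice for the current shell; passing a plain object keeps the logic easy to test.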
Chrome extension (drive YOUR Chrome with YOUR cookies)
Install extensions/chrome/ in dev mode (chrome://extensions → Load unpacked)
Click the AgentBrowser icon, paste server URL + API key, click Connect
Now an agent calling POST /api/v1/agents/<your-id>/cmd drives YOUR Chrome with YOUR cookies and login state
Manifest v3 + chrome.debugger for real CDP mouse events + chrome.scripting for vision/extract
See extensions/chrome/README.md for the full security model
Operator + recorder + memory + skills UIs
ui/operator/index.html - live screenshot + cursor trail overlay + event timeline + quick actions panel
ui/recorder/index.html - real-time WebSocket event stream + multi-lane timeline canvas + replay scrubber + JSONL export
ui/memory/index.html - paginated action memory browser per domain + decay control + JSON export
docs/pricing.html - 4-tier pricing page wired to /api/v1/billing/checkout for self-serve Stripe payments
docs/skills.html - skills marketplace landing with 8 curated skills (login-stripe, login-google, amazon-add-to-cart, github-create-issue, etc.)
All are single-file vanilla HTML/CSS/JS. No build step.
Transcription helpers (agents that "watch" videos)
findCaptionTracks(page) detects HTML5 <track> + YouTube playerCaptionsTracklistRenderer + custom player markup
parseVTT(text) / parseJSON3(json) convert standard caption formats to typed TranscriptSegment[]
transcribeFromCaptions(page) one-shot: detect → fetch → parse
transcriptToText(segments) concatenate for LLM consumption
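To make the parse step concrete, here is a minimal WebVTT parser sketch. It handles the common cue layout (optional identifier, timing line, text lines) and strips inline tags; the `CaptionSegment` field names and the `*Sketch` function names are assumptions, and the real parseVTT may cover more of the format.

```typescript
// Assumed segment shape; the real TranscriptSegment fields are not shown in this README.
interface CaptionSegment {
  startSec: number;
  endSec: number;
  text: string;
}

// Accepts "hh:mm:ss.ttt" or "mm:ss.ttt" and returns seconds.
function parseVttTimestamp(ts: string): number {
  let seconds = 0;
  for (const part of ts.trim().split(":")) {
    seconds = seconds * 60 + Number(part);
  }
  return seconds;
}

function parseVttSketch(input: string): CaptionSegment[] {
  const segments: CaptionSegment[] = [];
  // Cues are separated by blank lines; normalize line endings first.
  const blocks = input.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n").filter((line) => line.trim().length > 0);
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // WEBVTT header, NOTE or STYLE block

    const timing = lines[timingIndex] ?? "";
    const [startRaw = "", rest = ""] = timing.split("-->");
    // Cue settings (e.g. "align:start") may follow the end timestamp.
    const endRaw = rest.trim().split(/\s+/)[0] ?? "";
    const text = lines
      .slice(timingIndex + 1)
      .join(" ")
      .replace(/<[^>]+>/g, "") // strip inline tags such as <b> or <c.class>
      .trim();
    if (text.length === 0) continue;

    segments.push({
      startSec: parseVttTimestamp(startRaw),
      endSec: parseVttTimestamp(endRaw),
      text,
    });
  }
  return segments;
}

// Concatenate cue text for LLM consumption.
function transcriptToTextSketch(segments: CaptionSegment[]): string {
  return segments.map((s) => s.text).join(" ");
}
```

Keeping timestamps on each segment lets an agent cite where in the video a statement occurs, while the flattened text is what goes into the prompt.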
Production runtime
HTTP REST API + WebSocket + SSE on Fastify with bearer-token auth + per-key rate limiting
30+ endpoints, OpenAPI 3.1 spec at /api/v1/openapi.json
Stripe billing wired (Checkout + webhook + signature verification + license issuance)
Prometheus /metrics + readiness probe + dashboard summary endpoint
4-tier license scaffold (free / pro / team / enterprise) with feature gates and quota tracking
Multi-stage Dockerfile, docker-compose with persistent volume + 1GB shm_size for Chromium
Python + TypeScript SDK clients
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Operator UI (browser-based dashboard) │
│ - live screenshot stream - cursor trail viz │
│ - action log + reasoning - manual takeover │
│ Recorder UI │
│ - timeline scrubber - replay export │
└─────────────────────────────────────────────────────────────────┘
▲
│ WebSocket events + SSE frames + REST
┌─────────────────────────┴───────────────────────────────────────┐
│ AgentBrowser HTTP Control Plane │
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────┐ │
│ │ REST + WS + SSE │ │ Bearer auth + │ │ License + │ │
│ │ Fastify │ │ per-key rate │ │ quota system │ │
│ └──────────────────┘ └──────────────────┘ └────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
▲
│
┌─────────────────────────┴───────────────────────────────────────┐
│ AgentBrowser Core Runtime │
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────┐ │
│ │ Planner │ │ findAndClick │ │ Recorder + │ │
│ │ (goal → steps) │ │ hybrid action │ │ Replay engine │ │
│ └──────────────────┘ └──────────────────┘ └────────────────┘ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────┐ │
│ │ Verifier │ │ Modal │ │ Action memory │ │
│ │ (diff snapshots)│ │ interrupter │ │ (skip LLM) │ │
│ └──────────────────┘ └──────────────────┘ └────────────────┘ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────┐ │
│ │ Vision pipeline │ │ HumanCursor │ │ Site memory │ │
│ │ bbox + annotate │ │ Bezier + CDP │ │ WAL SQLite │ │
│ │ + VisionLLM │ │ trail + ripple │ │ + selectors │ │
│ └──────────────────┘ └──────────────────┘ └────────────────┘ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────┐ │
│ │ Anti-fingerprint│ │ Captcha solver │ │ LLM provider │ │
│ │ (canvas/WebGL) │ │ + auto-inject │ │ (Anthropic / │ │
│ │ │ │ │ │ OpenAI / etc)│ │
│ └──────────────────┘ └──────────────────┘ └────────────────┘ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Browser Engine - Playwright + stealth + multi-tab │ │
│ │ ↓ │ │
│ │ Chromium (the real browser, with the visible cursor) │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Quick start - 4 ways
As a TypeScript library
import { AgentBrowser } from 'agentbrowser';

const browser = new AgentBrowser({
  anthropic_api_key: process.env.ANTHROPIC_API_KEY,
  headless: false, // watch the cursor move
  stealth: true,
});
await browser.launch();

const state = await browser.navigate('https://news.ycombinator.com');
console.log(state.page_type); // 'listing'
console.log(state.available_actions); // [{ name: 'navigate_to_new', ... }, ...]

await browser.action('navigate_to_new');
const data = await browser.extract({
  top_story: 'title of the top story',
  points: 'upvote count',
  author: 'submitter username',
});
await browser.close();
Via HTTP
# $ID is the session_id returned by the first call
curl -X POST http://localhost:3100/api/v1/sessions
curl -X POST http://localhost:3100/api/v1/sessions/$ID/navigate -H 'content-type: application/json' -d '{"url":"https://example.com"}'
curl -X POST http://localhost:3100/api/v1/sessions/$ID/find_and_click -H 'content-type: application/json' -d '{"goal":"submit the form"}'
curl http://localhost:3100/api/v1/sessions/$ID/screenshot/bbox
Python SDK
from agentbrowser import AgentBrowserClient

client = AgentBrowserClient("http://localhost:3100", api_key="...")
with client.create_session() as s:
    s.navigate("https://example.com")
    s.click(selector="a")
    png, elements = s.annotated_screenshot()
MCP server (Claude Code, etc.)
Add to ~/.claude.json:
{
  "mcpServers": {
    "agentbrowser": {
      "command": "node",
      "args": ["/path/to/agentbrowser/dist/server/mcp.js"],
      "env": { "ANTHROPIC_API_KEY": "sk-..." }
    }
  }
}
Repository structure
src/
├── engine/ Browser lifecycle, sessions, tabs
│ ├── browser.ts Playwright wrapper, cursor wiring, popup handling
│ └── tabs.ts Multi-tab manager
├── input/ Cursor + physics + fingerprint shield
│ ├── cursor.ts HumanCursor: SVG overlay, CDP input, click/drag/type
│ ├── trajectory.ts Bezier path generator with jitter + overshoot
│ └── fingerprint.ts WebGL/canvas/audio anti-fingerprint patches
├── vision/ Vision pipeline
│ ├── bbox.ts Element catalog extraction
│ ├── annotate.ts Numbered-box screenshot annotator
│ ├── diff.ts Snapshot diff (added/removed/moved/textChanged)
│ └── llm.ts Claude vision integration
├── runtime/ Action execution + autonomy
│ ├── executor.ts Action runner with action-memory hot path
│ ├── find.ts Hybrid DOM-then-vision findAndClick
│ ├── verifier.ts Action verification via snapshot diff
│ ├── modal-interrupter.ts Cookie/consent/popup detection
│ ├── planner.ts LLM goal decomposition + multi-step execution
│ ├── plan-audit.ts Per-step screenshot + element capture
│ ├── recorder.ts JSONL action stream
│ ├── replay.ts Deterministic trace replay
│ ├── compactor.ts Trace compaction (collapse cursor.move runs)
│ ├── events.ts In-process event broker (pub/sub)
│ ├── captcha.ts 2Captcha API integration
│ └── captcha-solver.ts Detect → solve → inject pipeline
├── memory/ Persistent storage
│ ├── store.ts Site memory (page model cache + selector library)
│ └── action-memory.ts Per-(page,goal) selector cache + cross-domain transfer
├── llm/ Provider abstraction
│ ├── provider.ts LLMProvider interface
│ ├── anthropic.ts Anthropic SDK wrapper
│ ├── openai.ts OpenAI-compatible (works with vLLM/Ollama too)
│ └── index.ts autoDetectProvider
├── semantic/ Page analysis
│ └── analyzer.ts Page-to-SemanticPageModel via LLM
├── skills/ Skill library
│ ├── skills.ts Parameterize/save/load/run
│ └── package.ts .skill.json export/import format
├── server/ Production server surfaces
│ ├── http.ts Fastify + REST + WS + SSE + auth + rate limit
│ ├── openapi.ts OpenAPI 3.1 spec
│ ├── license.ts Tier system + quota tracking
│ └── mcp.ts MCP server for Claude Code et al.
├── bin/
│ └── http.ts HTTP server entry point
├── util/
│ └── coords.ts Viewport ↔ document ↔ screenshot coord conversions
├── types.ts Shared types (SemanticPageModel, ActionDefinition, ...)
└── index.ts Public API barrel
tests/ Vitest suite (119 tests, 21 files)
clients/
├── python/ Python SDK (zero deps, stdlib urllib)
└── typescript/ TypeScript SDK (browser + Node compatible)
ui/
├── operator/ Live dashboard (single HTML file)
└── recorder/ Real-time trace timeline (single HTML file)
docs/ Public docs site (single HTML file)
examples/ Working agent demos
.agentbrowser-meta/ Internal build planning + iteration log
Dockerfile Multi-stage slim image (~700MB with Chromium)
docker-compose.yml Local stack with persistent volume
API surface
22 HTTP endpoints, all bearer-authenticated when API keys are configured. Full OpenAPI 3.1 at /api/v1/openapi.json.
Method | Path | Purpose |
GET |
| Liveness (no auth) |
POST |
| Create session |
DELETE |
| Destroy session |
POST |
| Go to URL |
POST |
| Cursor primitives |
POST |
| Hybrid DOM → vision-LLM action |
POST |
| LLM goal decomposition + multi-step execute |
POST |
| Schema-driven LLM extraction |
POST |
| Fill a named form |
GET |
| Viewport PNG |
GET |
| Annotated PNG + element list |
GET |
| SSE PNG frames |
WS |
| Binary PNG frames |
GET |
| Current SemanticPageModel |
GET |
| ElementBox[] |
GET/POST/DELETE |
| Multi-tab control |
POST |
| Detect + solve + inject |
WS |
| Live event stream |
POST |
| History back |
POST |
| History forward |
POST |
| Reload current page |
POST |
| Accept/dismiss next native alert/confirm/prompt |
GET/POST/DELETE |
| Read/write/clear browser cookies |
GET/POST/DELETE |
| Read/write/clear localStorage or sessionStorage (?kind=local|session) |
POST |
| Set files on a file input ( |
POST |
| Render current page to PDF (returns application/pdf) |
POST |
| Block all requests matching a glob pattern |
POST |
| Inject headers into requests matching a pattern |
POST |
| Mock response body for matching requests |
DELETE |
| Remove all route handlers (passthrough) |
POST |
| Override reported coords (or null to clear) |
POST |
| Resize viewport mid-session |
POST |
| Set extra HTTP headers for all requests |
POST |
| Start HAR network capture |
GET |
| Get current entries without stopping |
POST |
| Stop and return all captured entries |
POST |
| Capture console messages + uncaught errors |
POST |
| Throttle network (downloadThroughput/uploadThroughput/latencyMs/offline) |
POST |
| CPU slowdown multiplier (1=native, 4=4x slower) |
POST |
| Override navigator.language + Accept-Language |
POST |
| Override page timezone (e.g. "Asia/Tokyo") |
POST/DELETE |
| Grant/clear browser permissions (clipboard, notifications, etc.) |
POST |
| Begin in-memory recording for skill creation |
GET |
| Live event count while recording |
POST |
| Stop and (optionally) save events as a skill ( |
POST |
| Re-execute HAR entries and compare statuses |
GET |
| List active service workers |
GET/POST |
| Export full session state (cookies + storage + IDB); POST |
POST/GET |
| Auto-capture all downloads to a session-tagged dir |
GET/POST |
| Read/write the page's clipboard via navigator.clipboard |
POST |
| Smart waiters with timeouts |
GET |
| Extract clean RAG-friendly markdown from current page |
GET |
| List archived versions of a skill |
POST |
| Restore a previous version ( |
GET |
| Idle time in ms for a session |
POST |
| Reset idle counter |
GET |
| List sessions past idle timeout |
POST |
| Diff two snapshots ( |
POST |
| Vision-only click ("the blue Submit button") |
POST |
| Pre-create N empty contexts for sub-100ms session creation |
GET |
| Warm pool size + oldest entry age |
POST |
| Close all warm contexts |
POST |
| Inject highlight overlay for "AI is here" hints |
POST |
| Highlight a bbox with optional label |
POST |
| Background job that keeps pool topped up |
POST |
| Predict next action from memory ( |
GET |
| Static skill catalog (slug, tags, quality, runs, success_rate) for GitHub Pages hosting |
GET |
| Vision LLM cache hits/misses + persistent_size if SQLite-backed |
POST |
| Empty in-memory + on-disk vision cache |
GET |
| Captured HAR exported as standard HAR 1.2 (Chrome DevTools-importable) |
POST |
| Evict oldest entries beyond |
POST |
| Diff two trace event arrays (regression testing) |
POST |
| Execute named keyboard shortcut (newTab/copy/find/etc) |
GET |
| List available named shortcuts |
GET |
| Single-call snapshot of all server state |
POST/GET/DELETE |
| Save / list / load / delete reusable Plan blueprints |
POST |
| Execute multiple named shortcuts in sequence |
GET |
| Per-histogram p50/p95/p99/mean/count percentiles |
GET/POST/DELETE |
| Per-domain RPS limit (token-bucket throttle on navigate) |
POST |
| Load + execute a saved Plan blueprint |
POST |
| Convert a recorded skill into a Plan ( |
GET |
| Search action memory by selector substring |
POST |
| Spawn N parallel sessions, extract markdown from each URL |
POST |
| Chain N saved plans into a super-plan |
POST |
| Auto-fill form inputs by name/label/placeholder match |
POST |
| Compare two skills' events ( |
GET/POST/DELETE |
| Subscribe to events, POST to external URL (HMAC-signed when |
POST |
| Fire a test delivery to verify connectivity |
POST |
| Process a CSV: navigate per row, extract markdown, return enriched results |
GET |
| Pending webhook retry count |
POST/GET |
| Record / read per-skill LLM cost (token usage × rates) |
GET |
| Top-N most expensive skills by total cost |
POST |
| Validate a skill's structure before save ( |
POST |
| LLM-suggest 3-5 tags from skill description + events |
POST |
| Render TraceEvent[] as a self-contained HTML timeline page |
POST |
| LLM-write a one-line description from skill events |
GET |
| Cross-skill: selectors that worked for the same goal on other domains |
GET |
| Download action memory as CSV ( |
POST |
| Composite filter+sort query (domain, goal substr, selector substr, min_success_rate, min_runs, sort_by) |
GET |
| Accessibility audit (missing alt/label, heading skips, empty links, missing lang) |
GET |
| Export skill as .agbpkg (skill + plan + stats + readme) |
POST |
| Import an .agbpkg bundle |
GET |
| Per-domain action memory analytics + top selectors |
GET/POST/DELETE |
| Recurring skill execution ( |
POST |
| Recommend skills matching a goal text ( |
POST |
| Convert a built-in plan template into a runnable skill |
POST/GET |
| Per-session network bytes tracking |
GET |
| WCAG color contrast audit |
GET |
| List browser fingerprint presets (mac-chrome, iphone-15-pro, tokyo-iphone, etc.) |
POST |
| Apply a preset ( |
GET |
| Per-session heap/rss/external delta from session creation |
POST |
| Re-baseline the session memory snapshot |
GET |
| Detailed system health: process, engine, scheduler, billing, auth posture |
GET |
| Per-session CPU delta (user/system/wall ms + cpu_percent) |
POST |
| Re-baseline the session CPU snapshot |
POST |
| Top selectors across distinct domains (ship as starter packs) |
POST |
| Seed memory with distilled patterns from another deployment |
POST |
| Match goal to best skill and execute (no LLM round-trip) |
GET/POST/DELETE |
| Register weighted A/B routes between skill versions |
POST |
| Run an A/B-routed skill (weighted variant pick) |
WS |
| Live skill outcome firehose ( |
GET/DELETE |
| Aggregated success/failure stats per variant |
POST |
| Auto-promote winning variant (z-test gated) |
GET/DELETE |
| Per-skill p50/p95/p99 latency histograms |
GET/POST/DELETE |
| Skill library CRUD |
GET |
| Download .skill.json |
POST |
| Import .skill.json |
POST |
| Replay skill with bindings |
GET |
| Memory stats |
GET |
| What does the system know about :domain |
GET |
| Top selectors per domain with success/fail stats |
POST |
| Cross-domain selector hypotheses |
POST |
| TF-IDF / embedding action memory search |
POST |
| Force embedding-only search |
POST |
| Halve stale entry counts |
POST |
| Score similarity between two page snapshots |
POST |
| Suggest skills relevant for the current page |
POST |
| Run multiple skills in sequence |
GET |
| Per-skill success/fail history (filter by |
GET |
| Top-N performing skills by success_count |
GET |
| High-confidence skills (success_rate ≥ 0.9, ≥ 10 runs by default) |
POST |
| Remove skills below a success threshold (dry-run by default) |
DELETE |
| Delete a skill by name |
GET |
| List built-in plan templates |
GET |
| Remote health diagnostic (mirrors |
POST |
| Compact a trace file |
GET |
| Full OpenAPI 3.1 spec |
GET |
| Aggregated stats: site memory + action memory + skills + provider |
GET |
| Tier list with prices + limits (powers pricing page) |
POST |
| Stripe Checkout Session for |
POST |
| Stripe webhook receiver (signature-verified) |
GET |
| Prometheus scrape (no auth) |
GET |
| Readiness probe (503 until engine launched) |
GET |
| List currently-connected Chrome extension agents |
GET |
| Long-poll for next command (used by Chrome extension) |
POST |
| Send a command to a connected Chrome extension |
POST |
| Extension posts back command results |
DELETE |
| Drop agent state |
Production deployment
Docker
docker compose up -d
# AgentBrowser on http://localhost:3100
# Memory + traces persist in named volume
Auth + rate limit
AGENTBROWSER_API_KEYS=key1,key2 \
ANTHROPIC_API_KEY=sk-ant-... \
node dist/bin/http.js
Rate limit defaults to 600 req/min/key. Override with AGENTBROWSER_RATE_LIMIT_PER_MINUTE.
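For readers wondering what a per-key limit of this kind involves, here is a minimal fixed-window sketch. It is an assumption-laden illustration: the server's actual algorithm is not described here, the class name is hypothetical, and the clock is passed in explicitly to keep the example testable.

```typescript
interface RateWindow {
  windowStartMs: number;
  count: number;
}

// Fixed-window counter per API key; a stand-in for whatever the server really uses.
class PerKeyRateLimiterSketch {
  private windows = new Map<string, RateWindow>();
  private limitPerMinute: number;

  constructor(limitPerMinute = 600) {
    this.limitPerMinute = limitPerMinute;
  }

  // Returns true if the request is allowed and counts it against the key.
  allow(apiKey: string, nowMs: number): boolean {
    const current = this.windows.get(apiKey);
    if (!current || nowMs - current.windowStartMs >= 60000) {
      // First request for this key, or the previous minute has elapsed.
      this.windows.set(apiKey, { windowStartMs: nowMs, count: 1 });
      return true;
    }
    if (current.count >= this.limitPerMinute) return false;
    current.count += 1;
    return true;
  }
}
```

In a server this would typically run in a pre-handler that reads the bearer token and answers with HTTP 429 when `allow` returns false.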
Observability
Every action emits a typed SessionEvent to the in-process broker
WebSocket clients subscribe at /api/v1/sessions/:id/events (with 200-event replay buffer for reconnect)
JSONL traces at ~/.agentbrowser/traces/ are the audit log
compactTrace() keeps long traces small without losing fidelity
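The 200-event replay buffer mentioned above can be pictured as a bounded queue that a reconnecting client drains before receiving live events. The class below is a sketch with a hypothetical name, not the broker's actual implementation.

```typescript
// Bounded buffer that keeps only the most recent `capacity` events.
class ReplayBufferSketch<T> {
  private items: T[] = [];
  private capacity: number;

  constructor(capacity = 200) {
    this.capacity = capacity;
  }

  push(event: T): void {
    this.items.push(event);
    // Evict the oldest event once the buffer is over capacity.
    if (this.items.length > this.capacity) this.items.shift();
  }

  // Copy of the buffered events, oldest first, for a reconnecting client.
  snapshot(): T[] {
    return this.items.slice();
  }
}
```

On reconnect the server would send `snapshot()` first and then continue with the live stream, so a brief disconnect does not leave gaps in the operator timeline.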
Tested at
Capability | Status |
Type check ( | clean |
Test suite | 119 / 119 passing across 21 files |
Build ( | clean |
Real Chromium navigation | verified |
Bot detection bypass | passes Cloudflare interstitial, OneTrust, Cookiebot, Funding Choices, Reddit GDPR, Stack Overflow signup wall |
Cursor click → DOM event | verified end-to-end |
Action verifier | verified on URL change + element add/remove/text change |
Vision pipeline | annotated PNG verified by magic-byte + element list |
HTTP API | 13 integration tests against full Fastify stack |
Replay determinism | trajectory generator deterministic per seed |
License
MIT. Commercial use encouraged.
For acquisition or partnership inquiries: ashtonluca@gmail.com.