Browser Automation MCP
Uses Google's Gemini model for AI-powered browser actions in hybrid mode, enabling natural language control of the browser.
Provides tunneling for cloud browser sessions, allowing localhost URLs to be accessible from the cloud.
Leverages OpenAI's TTS model (gpt-4o-mini-tts) for generating narrated demo videos of browser scripts.
Automatically injects the x-vercel-protection-bypass header when accessing Vercel preview deployments, enabling seamless automation.
MCP server for AI browser automation. Alpha software - expect bugs and rough edges.
Attribution
This is a fork of @browserbasehq/mcp-server-browserbase by Browserbase, Inc., licensed under Apache 2.0.
Modifications from Original
Default to LOCAL - Uses local Playwright by default instead of requiring Browserbase cloud. Pass cloud: true to session create for cloud execution.
Hybrid mode agent - Agent tool uses hybrid mode (DOM + coordinate-based actions) with google/gemini-3-flash-preview instead of CUA mode.
Vercel header injection - Automatically injects the x-vercel-protection-bypass header when the VERCEL_AUTOMATION_BYPASS_SECRET env var is set.
Renamed tools - All tools renamed from browserbase_* to stagehand_*.
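The Vercel header injection above amounts to a small conditional. The following is a minimal sketch, not the fork's actual code: the helper name bypassHeaders is hypothetical, and it assumes only what the text states, namely that the header is added when VERCEL_AUTOMATION_BYPASS_SECRET is set.

```typescript
// Hypothetical helper: builds extra HTTP headers from an env-like map.
type Env = Record<string, string | undefined>;

function bypassHeaders(env: Env): Record<string, string> {
  const secret = env.VERCEL_AUTOMATION_BYPASS_SECRET;
  // Only inject the header when the secret is set and non-empty.
  return secret ? { "x-vercel-protection-bypass": secret } : {};
}
```

In a Node process the caller would pass process.env; taking the map as a parameter keeps the sketch easy to test.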
Tools
Tool | Description |
| Create browser session. |
| Close the current session |
| Navigate to a URL |
stagehand_act | Perform an action on the page (natural language) |
stagehand_extract | Extract structured data from the page |
| Observe and find actionable elements |
| Capture a screenshot |
| Get current page URL |
stagehand_agent | Autonomous multi-step execution (hybrid mode) |
stagehand_run_script | Load a committed Stagehand script file (default export from |
stagehand_demo_video | Record a narrated mp4 of a known-good Stagehand script. See Demo videos. |
| Show help for agent-browser, a low-level CLI for precise browser control |
agent_browser_run | Run a low-level browser command (snapshot, click by ref, network, JS eval, etc.) |
Stagehand vs agent-browser
Stagehand tools (stagehand_act, stagehand_extract, etc.) provide high-level, AI-powered browser control — good for acceptance testing and exploratory flows where natural language actions are convenient.
agent-browser tools (agent_browser_run) provide low-level, deterministic control — good for precise element interactions by ref, DOM inspection, network debugging, JS evaluation, and situations where Stagehand's abstractions are too coarse. agent-browser shares the same browser session as Stagehand via CDP, so you can freely mix both.
agent-browser is resolved via npx automatically — no global install required.
Environment Variables
MODEL_API_KEY=... # API key for the configured model provider (works with any provider)
GEMINI_API_KEY=... # alternative to MODEL_API_KEY for Gemini (the default model)
BROWSERBASE_API_KEY=... # only needed for cloud: true
BROWSERBASE_PROJECT_ID=... # only needed for cloud: true
NGROK_AUTHTOKEN=... # only needed for cloud: true with localhost URLs
VERCEL_AUTOMATION_BYPASS_SECRET=... # optional, for Vercel preview deployments
STAGEHAND_VARIABLES=... # optional, JSON map of variables auto-injected into stagehand_act, stagehand_agent, and stagehand_scenario (see Variables below)
OPENAI_API_KEY=... # only needed for stagehand_demo_video (TTS via gpt-4o-mini-tts)
Variables
Stagehand supports templated variables in instructions so sensitive values (passwords, API keys, personal info) can be kept out of the text sent to the LLM. Reference them in any stagehand_act, stagehand_agent, or stagehand_scenario instruction as %varName% and Stagehand substitutes the value client-side just before the action runs.
There are three ways to supply variables. Later sources override earlier ones on key conflict:
Global: set the STAGEHAND_VARIABLES env var to a JSON object. Applies to every tool call and every CLI scenario run.
Scenario-scoped: add a top-level variables field to a scenario object (MCP tool or CLI --scenario JSON). Applies to the agent call that runs the scenario.
Per-call: pass variables as a parameter to stagehand_act or stagehand_agent.
All three use the same shape:
{
"password": { "value": "hunter2" },
"username": { "value": "user@example.com", "description": "login email" }
}
description is optional. For agent calls it helps the model understand when to use each variable; for act calls it's ignored.
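The precedence and substitution rules described above can be sketched as follows. This is an illustration under stated assumptions rather than Stagehand's implementation: mergeVariables and substitute are hypothetical names, and leaving unknown %name% placeholders untouched is an assumption, since the text does not say what happens to them.

```typescript
// Shape taken from the text: each variable has a value and an optional description.
interface Variable { value: string; description?: string }
type Variables = Record<string, Variable>;

// Later sources win on key conflict: global < scenario-scoped < per-call.
function mergeVariables(...sources: Array<Variables | undefined>): Variables {
  return Object.assign({}, ...sources.filter((s): s is Variables => s !== undefined));
}

// Replace each %name% with its value. Unknown names are left as-is (an assumption).
function substitute(instruction: string, vars: Variables): string {
  return instruction.replace(/%(\w+)%/g, (match: string, name: string) => {
    const v = vars[name];
    return v && typeof v.value === "string" ? v.value : match;
  });
}

// Global value as it would arrive from the STAGEHAND_VARIABLES JSON string.
const globalVars: Variables = JSON.parse('{"password":{"value":"hunter2"}}');
const perCall: Variables = { password: { value: "s3cret" } };
const merged = mergeVariables(globalVars, undefined, perCall);

console.log(substitute("Type %password% into the password field", merged));
// prints "Type s3cret into the password field"
```

The per-call value wins here because it is passed last, matching the stated override order.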
Example MCP client config with a global:
"env": {
"MODEL_API_KEY": "sk-ant-...",
"STAGEHAND_VARIABLES": "{\"password\":{\"value\":\"hunter2\",\"description\":\"login password\"}}"
}
Example CLI scenario with a scenario-scoped variable:
browser-automation test --scenario '{"baseUrl":"https://example.com/login","variables":{"password":{"value":"hunter2"}},"steps":[{"step":"act","description":"Type %password% into the password field"},{"step":"assert","description":"Login succeeds"}]}'
Caveat: screenshot leakage in hybrid mode
Stagehand guarantees that raw values never appear in the instructions sent to the LLM. But the agent tool runs in hybrid mode, which takes screenshots between steps, and any value typed into a non-masked input (search box, plain text field) will be rendered on the page and captured by the next screenshot. A vision model looking at that screenshot can read the value and echo it in its reasoning or final message. Password fields are safe because browsers mask them to dots; everything else is not. Variables protect the instruction channel, not the visible page.
Scripts
Scenarios (above) and the Stagehand agent are great for exploration but expensive to re-run: the agent re-plans every step and takes screenshots between actions, which is exactly what you want when figuring out a flow for the first time and exactly what you don't want on every CI build.
Scripts are the cheap, committed counterpart. A script is a TypeScript file whose default export is a function produced by defineScript(...). It calls Stagehand primitives (stagehand.act, stagehand.extract, stagehand.observe) directly — one LLM call per step, no planning, no screenshot recaps — while still surviving small UI drift because the instructions stay in natural language ("click the login button" keeps working if the button moves or gets restyled).
The intended workflow:
Walk through the test case once with the agent / primitives to figure out what instructions work.
Commit a script that replays those same instructions.
Run it as many times as you like (in CI, from npm run e2e, from your test runner) at one-LLM-call-per-step cost.
Authoring a script
In Stagehand v3, act, extract, and observe are methods on the Stagehand instance — not on the page. page is the raw Playwright Page, used for goto and other navigation-level calls.
// tests/signup.stagehand.ts
import { defineScript } from "@popoverai/browser-automation/script";
import { z } from "zod";
import assert from "node:assert/strict";
export default defineScript(async ({ stagehand, page, ctx }) => {
await page.goto(ctx.baseUrl ?? "https://example.com/signup");
await stagehand.act(`type ${ctx.username ?? "test@example.com"} into the email field`);
await stagehand.act(`type ${ctx.password ?? "hunter2"} into the password field`);
await stagehand.act("click the sign up button");
const { heading } = await stagehand.extract(
"the main heading on the landing page",
z.object({ heading: z.string() }),
);
assert.match(heading, /welcome/i);
});
The default ctx shape (BaseCtx) accepts baseUrl, username, password, and any other string field without extra declaration. If you need non-string fields, pass your own generic:
interface Ctx { productId: string; quantity: number }
export default defineScript<Ctx>(async ({ stagehand, page, ctx }) => { ... });
Scripts throw to signal failure and return to signal success. They do not construct or close a Stagehand session; the caller owns the lifecycle, which lets a single session be reused across many scripts.
Running a script via the MCP tool
Pass either a committed file path or inline source (exactly one):
stagehand_run_script({ path: "tests/signup.stagehand.ts", ctx: { baseUrl: "https://staging.example.com" } })
stagehand_run_script({ source: "import { defineScript } from '@popoverai/browser-automation/script';\nexport default defineScript(async ({ page, ctx }) => { /* ... */ });", ctx: { ... } })
Returns {"status": "passed", "durationMs": <n>} or {"status": "failed", "durationMs": <n>, "error": "...", "stack": "..."}.
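The throw-to-fail, return-to-pass contract maps naturally onto the result shape above. The following is a sketch of how a caller could produce that shape; runAndReport is a hypothetical name, and this is not claimed to be how stagehand_run_script is implemented.

```typescript
// Result shape mirrors the JSON documented above.
type ScriptResult =
  | { status: "passed"; durationMs: number }
  | { status: "failed"; durationMs: number; error: string; stack?: string };

// Hypothetical wrapper: a script passes by returning and fails by throwing.
async function runAndReport(script: () => Promise<void>): Promise<ScriptResult> {
  const start = Date.now();
  try {
    await script();
    return { status: "passed", durationMs: Date.now() - start };
  } catch (err) {
    // Normalize non-Error throws so error and stack are always readable.
    const e = err instanceof Error ? err : new Error(String(err));
    return {
      status: "failed",
      durationMs: Date.now() - start,
      error: e.message,
      stack: e.stack,
    };
  }
}
```

Because the wrapper never rethrows, a runner can collect one result per script and decide afterwards whether the overall run failed.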
Imports behave differently between the two modes:
path mode: bare imports (defineScript, zod, etc.) resolve from the script's own node_modules tree. The script's project must have the needed deps installed.
source mode: bare imports resolve against the MCP's own node_modules. No install required in the caller's workspace; the script can be run from anywhere, including callers that have no filesystem (inline string only).
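The "exactly one of path or source" rule can be expressed both as a type and as a runtime check. This is a sketch only: RunScriptInput and scriptMode are hypothetical names, and the error message is illustrative rather than the tool's actual wording.

```typescript
// Hypothetical input type: exactly one of `path` or `source` may be set.
type RunScriptInput =
  | { path: string; source?: never; ctx?: Record<string, unknown> }
  | { source: string; path?: never; ctx?: Record<string, unknown> };

// Runtime guard for untyped callers (for example JSON arriving over MCP).
function scriptMode(input: { path?: unknown; source?: unknown }): "path" | "source" {
  const hasPath = typeof input.path === "string";
  const hasSource = typeof input.source === "string";
  if (hasPath === hasSource) {
    // Both set or neither set: reject before trying to resolve any imports.
    throw new Error("Pass exactly one of `path` or `source`");
  }
  return hasPath ? "path" : "source";
}

const input: RunScriptInput = { path: "tests/signup.stagehand.ts" };
console.log(scriptMode(input));
```

Knowing the mode up front matters because it decides which node_modules tree bare imports resolve against.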
Running a script from your own runner
For CI, npm run e2e, or a test framework:
import { Stagehand } from "@browserbasehq/stagehand";
import runSignup from "./tests/signup.stagehand.ts";
const stagehand = new Stagehand({ env: "LOCAL", model: "google/gemini-3-flash-preview" });
await stagehand.init();
try {
const page = stagehand.context.pages()[0];
await runSignup({ stagehand, page, ctx: { baseUrl: process.env.APP_URL } });
} finally {
await stagehand.close();
}
Multiple scripts can share one session: init once, call each script's function in turn, close once. This path doesn't go through stagehand_run_script, so imports resolve normally against your project's node_modules.
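Sharing one session across several scripts comes down to a sequential loop over script functions. A minimal sketch, assuming each script takes the same argument object; runAll is a hypothetical helper and the types are deliberately generic so the block runs without Stagehand installed.

```typescript
// A script is any async function of one argument object that throws on failure.
type Script<A> = (args: A) => Promise<void>;

// Run scripts in insertion order against the same args; stop at the first failure.
async function runAll<A>(args: A, scripts: Record<string, Script<A>>): Promise<string[]> {
  const passed: string[] = [];
  for (const [name, script] of Object.entries(scripts)) {
    await script(args); // a throw here propagates to the caller
    passed.push(name);
  }
  return passed;
}
```

Inside the try block of the runner above, this could be called as runAll({ stagehand, page, ctx }, { signup: runSignup }), with further scripts added as extra keys, while init and close still happen once around the whole batch.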
What not to write in a script
Don't use stagehand.agent(): that reintroduces the per-run planning cost scripts exist to avoid. Call the primitives directly.
Don't lower to Playwright selectors (page.locator("button[aria-label='Sign in']").click()). The natural-language stagehand.act phrasing is what buys you resilience; CSS/ARIA selectors break on the next deploy.
Don't hard-code credentials. Route them through ctx so the caller controls them.
Demo videos
Generate a narrated mp4 walkthrough of a Stagehand flow. Each action runs through stagehand.act with a CDP screencast attached, narration is generated per-action via OpenAI TTS, and per-segment mp4s are concatenated into one final video.
The flow is meant for known-good scripts: explore with the regular tools to figure out what works, then call this once with the locked-in sequence and the narration you want spoken over each step.
Via the MCP tool
Make sure the active session is at the desired starting state (the tool reuses the active Stagehand session — it does not create one). Requires OPENAI_API_KEY.
stagehand_demo_video({
actions: [
{ instruction: "go to the login page", narrate: "navigating to the login page" },
{ instruction: "type the email and password", narrate: "entering credentials" },
{ instruction: "click the sign in button", narrate: "logging in" }
]
})
→ { videoPath: "/tmp/browser-automation-demos/<id>/final.mp4", outputDir, segments: [...] }
Optional inputs: outputDir, voice (OpenAI voice id, default "alloy"), keepIntermediates (keep per-segment audio + mp4 + frame PNGs alongside final.mp4), trailingDelay (ms after each action before recording its end timestamp; default 1000ms), maxWidth / maxHeight (screencast capture size; default 1280x720).
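The optional inputs and their defaults can be summarized as a type plus a defaulting function. This is a sketch: the option names and most defaults come from the paragraph above, while DemoVideoOptions, withDefaults, and the false default for keepIntermediates are assumptions.

```typescript
// Option names come from the text; the interface name is hypothetical.
interface DemoVideoOptions {
  outputDir?: string;
  voice?: string; // OpenAI voice id
  keepIntermediates?: boolean;
  trailingDelay?: number; // milliseconds
  maxWidth?: number;
  maxHeight?: number;
}

function withDefaults(opts: DemoVideoOptions = {}) {
  return {
    outputDir: opts.outputDir, // no default documented, left undefined here
    voice: opts.voice ?? "alloy",
    keepIntermediates: opts.keepIntermediates ?? false, // assumption: off by default
    trailingDelay: opts.trailingDelay ?? 1000,
    maxWidth: opts.maxWidth ?? 1280,
    maxHeight: opts.maxHeight ?? 720,
  };
}

console.log(withDefaults({ maxWidth: 640 }));
```

Using ?? rather than || means an explicit trailingDelay of 0 is respected instead of being replaced by the default.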
Programmatic API
For programmatic narration, loops over data, conditional steps, or bundling into your own runner:
import { Stagehand } from "@browserbasehq/stagehand";
import { attachDemoRecorder } from "@popoverai/browser-automation/demo";
const stagehand = new Stagehand({ /* ... */ });
await stagehand.init();
const demo = await attachDemoRecorder(stagehand);
try {
await demo.act("go to the login page", "navigating to the login page");
await stagehand.extract({ /* ... */ }); // bare stagehand calls are ignored at render
await demo.act("type credentials", "entering credentials");
await demo.agent("complete the checkout", "the agent completes the checkout");
const { videoPath } = await demo.render({ outputDir: "./out", voice: "alloy" });
} finally {
// Idempotent — safe to call before, after, or instead of render(). Use when
// you want to abort cleanup without producing an mp4.
await demo.stop();
}
attachDemoRecorder is additive: it starts a CDP screencast and adds demo.act / demo.agent / demo.render / demo.stop, but the Stagehand instance keeps its full surface for everything else (extract, observe, navigate, etc.). Frames captured during un-narrated time are simply not selected at render.
The full surface:
Method | Purpose |
| Run a |
| Run a |
| Read the captured |
| Stop the screencast, run TTS + ffmpeg, return |
| Stop the screencast and detach without rendering. Idempotent. Use in |
Caveats
Native ffmpeg binary. Pulls in ffmpeg-static (~44MB downloaded postinstall). Edge runtimes (Cloudflare Workers, Vercel Edge) can't run native binaries; Node serverless (Vercel Fluid Compute, Lambda) is fine.
Single TTS provider in v1. OpenAI gpt-4o-mini-tts via OPENAI_API_KEY. createOpenAITTS throws at construction time if no key is available, so missing-key errors surface clearly. Pluggable via the tts option to renderTimeline if you need a different backend.
Failure semantics. If any action throws inside the MCP tool, demo.stop() runs as cleanup and the original error propagates; no partial video is produced. If stop() itself fails, the cleanup error is logged to stderr and attached as cause on the wrapped error.
Stagehand v3 internal API. The recorder reads CDP via stagehand.context.activePage().getSessionForFrame(...), Stagehand v3's documented (but not stability-guaranteed) path. A future Stagehand upgrade that moves these methods will surface a clear "v3 internal API may have changed" error at attach time.
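The failure semantics in the caveats above follow a common cleanup pattern. A minimal sketch of that pattern, assuming nothing about the real tool beyond the described behavior; withCleanup is a hypothetical helper and the log prefix is illustrative.

```typescript
// Hypothetical helper: on failure, run cleanup and rethrow the original error.
// If cleanup also fails, log it and attach it as `cause` on a wrapped error.
async function withCleanup<T>(
  work: () => Promise<T>,
  stop: () => Promise<void>,
): Promise<T> {
  try {
    return await work();
  } catch (original) {
    try {
      await stop();
    } catch (cleanupError) {
      console.error("cleanup failed:", cleanupError);
      const message = original instanceof Error ? original.message : String(original);
      const wrapped: Error & { cause?: unknown } = new Error(message);
      wrapped.cause = cleanupError;
      throw wrapped;
    }
    throw original; // cleanup succeeded, so the original error propagates unchanged
  }
}
```

The sketch calls stop only on the failure path; on success the caller is expected to render, which the text says stops the screencast itself.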
Localhost Tunneling (Cloud Mode)
When using cloud mode (cloud: true), the browser runs on Browserbase's infrastructure and can't directly access your localhost. If you navigate to a localhost URL, the server automatically creates an ngrok tunnel to expose your local service to the cloud browser.
Requires the NGROK_AUTHTOKEN environment variable
Tunnels are session-scoped and cleaned up automatically
Each tunnel gets randomly generated basic auth credentials for security
Only triggered when navigating to localhost URLs in cloud mode
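The trigger condition and per-tunnel credentials listed above can be sketched without touching ngrok itself. Everything named here is hypothetical: needsTunnel, randomBasicAuth, the set of hostnames treated as localhost, and the credential lengths are all assumptions, and the actual tunnel creation is left out.

```typescript
import { randomBytes } from "node:crypto";

// Assumption: these hostnames count as "localhost"; the server's real list may differ.
const LOCAL_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

// A tunnel is only needed in cloud mode, and only for localhost URLs.
function needsTunnel(url: string, cloud: boolean): boolean {
  return cloud && LOCAL_HOSTS.has(new URL(url).hostname);
}

// Random per-tunnel basic auth credentials (hex strings; lengths are illustrative).
function randomBasicAuth(): { username: string; password: string } {
  return {
    username: randomBytes(8).toString("hex"),
    password: randomBytes(16).toString("hex"),
  };
}

console.log(needsTunnel("http://localhost:3000/login", true));
```

Generating fresh credentials per tunnel means a leaked tunnel URL alone is not enough to reach the local service.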
CLI
Test Command
Run browser-based assertions from the command line using the Stagehand agent. Each invocation runs a single browser session where all assertions are checked:
browser-automation test <url> <assertions...> [options]
browser-automation test --scenario <json-or-file> [options]
Examples:
# Simple assertions
browser-automation test "https://example.com" "The page has a heading"
# Multiple assertions (same browser session)
browser-automation test "https://example.com" \
"The page has a heading" \
"There is a link on the page" \
"The title contains 'Example'"
# Using a custom model
browser-automation test --modelName "anthropic/claude-haiku-4-5" \
--modelApiKey "sk-ant-..." \
"https://example.com" "The page has a heading"
# Multi-step scenario (arrange/act/assert)
browser-automation test --scenario '{"baseUrl":"https://example.com","steps":[{"step":"act","description":"Click the More information link"},{"step":"assert","description":"Page navigated away from example.com"}]}'
Scenarios can also reference templated variables (see Variables), either from the STAGEHAND_VARIABLES env var or from a top-level variables field on the scenario object itself.
Returns JSON results (one per assertion):
{"results":[{"status":"passed","notes":"The page has a heading 'Example Domain'"}]}
Each result contains:
status: "passed" | "failed" | "blocked"
notes: explanation of the result
Exit codes: 0 if all assertions pass, 1 otherwise.
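A caller that consumes the JSON output can apply the same exit code rule itself. A small sketch, with AssertionResult and exitCodeFor as hypothetical names; the result shape is taken from the output documented above.

```typescript
// Result shape taken from the JSON output documented above.
interface AssertionResult {
  status: "passed" | "failed" | "blocked";
  notes: string;
}

// Exit code rule from the text: 0 if every assertion passed, 1 otherwise.
function exitCodeFor(output: { results: AssertionResult[] }): 0 | 1 {
  return output.results.every((r) => r.status === "passed") ? 0 : 1;
}

const output = JSON.parse(
  '{"results":[{"status":"passed","notes":"The page has a heading"}]}',
);
console.log(exitCodeFor(output)); // prints 0
```

Note that a "blocked" result counts as a non-pass here, so it yields exit code 1 just like "failed".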
Options:
Option | Description |
--scenario | JSON scenario string or file path (mutually exclusive with positional url/assertions) |
--usage | Include token usage data in the JSON output |
--modelName | Model to use (default: |
--modelApiKey | API key for the model provider |
| Use Browserbase cloud browser instead of local Playwright |
When --usage is passed, a usage field is added to the JSON output alongside results:
{
"results": [{"status": "passed", "notes": "The page title is 'Example Domain'"}],
"usage": {
"model": "google/gemini-3-flash-preview",
"input_tokens": 16223,
"output_tokens": 47,
"reasoning_tokens": 474,
"cached_input_tokens": 7990,
"inference_time_ms": 10336
}
}
MCP Usage
Basic (Stagehand tools only):
{
"mcpServers": {
"browser": {
"command": "npx",
"args": ["@popoverai/browser-automation"],
"env": {
"MODEL_API_KEY": "your-api-key"
}
}
}
}
With a custom model and Playwright federation:
{
"mcpServers": {
"browser": {
"command": "npx",
"args": ["@popoverai/browser-automation", "--enable-playwright", "--modelName", "anthropic/claude-haiku-4-5"],
"env": {
"MODEL_API_KEY": "sk-ant-..."
}
}
}
}
The --enable-playwright flag spawns a Playwright MCP subprocess and federates its tools (click, fill, type, etc.) alongside the Stagehand AI tools.
License
Apache-2.0 (same as original)