Scout
Supports CSS selectors for element extraction and interaction.
Provides a Gin-like API for building browser automation workflows.
Execute JavaScript expressions on web pages.
Converts web pages to compact Markdown format.
Integration with Ollama for local LLM-powered browser automation.
Integration with OpenAI API for AI-driven browser automation.
Detects React framework and reads component state.
A single statically-linked scout binary gives you a CLI, an 87-tool MCP server (so any MCP-aware agent — Claude Desktop, Cursor, Cline, custom — has a browser), a conversational chat UI, and a Go library with Gin-like middleware composition. Same engine, four access points.
brew install klarlabs-studio/tap/scout
vs. Playwright
Feature | Scout | Playwright |
Install | one ~15 MB binary | npm + ~600 MB browser cache |
Runtime dep | none (static) | Node.js always; Python/Java/.NET as wrappers |
Drive from | Go, any shell, MCP, chat UI | TS/JS first-class; others lag |
AI-agent native | built-in | separate |
Token-aware extraction | DOM diff, distillation, observation budgets (50–80% fewer tokens) | not provided |
Action playbooks | record & replay deterministic JSON | codegen produces a script you maintain |
Container deploy | drop into | carry Node + browser binaries |
CDP access | direct WebSocket, zero abstraction | internal protocol over CDP |
Quick Start
# CLI — visible browser, one-shot commands
scout observe https://example.com # structured page snapshot
scout markdown https://news.ycombinator.com # page as compact markdown
scout screenshot https://github.com # save screenshot
scout extract https://example.com h1 # extract element text
scout frameworks https://react.dev # detect React, Vue, etc.
# MCP Server — give AI agents browser superpowers
claude mcp add scout -- scout mcp serve
# Browser UI — conversational browser automation
scout ui serve --provider=ollama --model=mistral
cd ui && npm install && npm run dev # open http://localhost:3000
Install
# Homebrew
brew install klarlabs-studio/tap/scout
# Direct binary
curl -fsSL https://raw.githubusercontent.com/klarlabs-studio/scout/main/install.sh | bash
# Go
go install go.klarlabs.de/scout/cmd/scout@latest
# As a library
go get go.klarlabs.de/scout
MCP Server: 87 Tools
Run scout mcp serve and any MCP-aware agent has a browser. No second project to install, no Node runtime, no Python interpreter — the binary is the server. Configure in any MCP client:
claude mcp add scout -- scout mcp serve # Claude Code
{"mcpServers": {"scout": {"command": "scout", "args": ["mcp", "serve"]}}}
Tool Categories
Category | Tools |
Navigation | |
Interaction | |
Forms | |
Extraction | |
Capture | |
Network | |
Tabs | |
Frameworks | |
Playback | |
Video | |
Smart Helpers | |
Vision | |
Batch | |
Iframe | |
Trace | |
Cookies | |
Diagnostics | |
Utility | |
All tools have MCP annotations (ReadOnly, OpenWorld, ClosedWorld, Idempotent) for smart auto-approval. Read-only tools like observe, extract, and screenshot run without permission prompts.
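One way to picture how a client might act on these annotations is a small approval policy. The sketch below is an illustration only: `ToolAnnotations`, `Tool`, and `needsPrompt` are hypothetical names, not scout's or any MCP SDK's types, and the annotation values assigned to `click` are an assumption.

```go
package main

import "fmt"

// ToolAnnotations is a hypothetical client-side view of the hints a tool advertises.
type ToolAnnotations struct {
	ReadOnly   bool
	Idempotent bool
	OpenWorld  bool
}

// Tool pairs a tool name with its (assumed) annotations.
type Tool struct {
	Name        string
	Annotations ToolAnnotations
}

// needsPrompt applies an assumed policy: anything that is not
// marked read-only asks the user before it runs.
func needsPrompt(t Tool) bool {
	return !t.Annotations.ReadOnly
}

func main() {
	tools := []Tool{
		{Name: "observe", Annotations: ToolAnnotations{ReadOnly: true, Idempotent: true, OpenWorld: true}},
		{Name: "screenshot", Annotations: ToolAnnotations{ReadOnly: true, Idempotent: true, OpenWorld: true}},
		// Assumption: click mutates page state, so it is not read-only.
		{Name: "click", Annotations: ToolAnnotations{OpenWorld: true}},
	}
	for _, t := range tools {
		fmt.Printf("%-10s prompt=%v\n", t.Name, needsPrompt(t))
	}
}
```

A real client may weigh the other hints as well; this policy looks only at the read-only flag.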
Runtime Configuration
Switch between headless and visible browser without restarting, and opt into local-dev workflows (loopback, private IPs):
Agent: configure(headless: false) → browser window appears
Agent: navigate("https://...") → watch it work
Agent: configure(headless: true) → back to headless
Agent: configure(allow_private_ips: true) → unlock localhost / 192.168.* / 10.*
Agent: navigate("http://127.0.0.1:4173/") → drive your local dev server
The MCP server also reads SCOUT_ALLOW_PRIVATE_IPS=1 at startup as a one-shot toggle for trusted environments.
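To make the loopback and private-IP gate concrete, here is a minimal sketch of how such a check could be written with Go's standard `net/netip` package. It is not scout's implementation: `isLocalTarget` and `allowNavigation` are hypothetical helpers, and the assumption is that the gate applies to literal IP hosts (hostnames would need to be resolved first).

```go
package main

import (
	"fmt"
	"net/netip"
)

// isLocalTarget reports whether host is a literal IP in a loopback,
// private (e.g. 10.*, 192.168.*), or link-local range.
func isLocalTarget(host string) bool {
	addr, err := netip.ParseAddr(host)
	if err != nil {
		return false // not an IP literal; a hostname would need DNS resolution first
	}
	return addr.IsLoopback() || addr.IsPrivate() || addr.IsLinkLocalUnicast()
}

// allowNavigation applies the assumed policy: local targets are
// refused unless the allow-private-IPs toggle is on.
func allowNavigation(host string, allowPrivateIPs bool) bool {
	return allowPrivateIPs || !isLocalTarget(host)
}

func main() {
	hosts := []string{"127.0.0.1", "192.168.1.20", "10.0.0.5", "93.184.216.34"}
	for _, h := range hosts {
		fmt.Printf("%-15s allowed by default: %v\n", h, allowNavigation(h, false))
	}
}
```

Under this sketch, turning the toggle on lets every host through, which is why the text above reserves it for trusted environments.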
Screen Recording (video)
Record the active page as a video. Pure CDP — works in headless, no Playwright needed. Recording survives navigate, open_tab, and switch_tab calls in between, so a multi-page demo lands as one continuous clip:
Agent: start_screen_recording({ width: 1280, height: 800, fps: 15, format: "webm" })
Agent: navigate("https://example.com")
Agent: click("#cta")
Agent: navigate("https://example.com/dashboard") # recording continues across pages
Agent: stop_screen_recording()
→ { path: "/tmp/scout-rec-XXX.webm", format: "webm", encoder: "ffmpeg",
frame_count: N, duration_ms: N }
If ffmpeg is on PATH, the result is encoded to WebM (libvpx-vp9) or MP4 (libx264). If not, scout returns the raw JPEG frames directory plus an ffmpeg concat list so you can encode offline. The result is always a file path, never base64, so it never enters your LLM token budget.
Realistic FPS: ~10–15 on typical pages, capped at 30. The implementation polls Page.captureScreenshot because CDP Page.startScreencast events are silently dropped when Chrome runs with --headless=new.
Browser UI
A conversational browser automation interface. Type natural language and watch the browser respond in real time.
# Start the AG-UI server (Go backend)
scout ui serve --provider=ollama --model=mistral # local, no API key
scout ui serve --provider=claude # needs ANTHROPIC_API_KEY
scout ui serve --provider=openai --model=gpt-4o # needs OPENAI_API_KEY
scout ui serve --provider=groq --base-url=https://api.groq.com/openai --model=llama-3.3-70b-versatile
# Start the Vue frontend
cd ui && npm install && npm run dev # http://localhost:3000
The UI streams AG-UI protocol events over SSE:
Chat panel with markdown rendering and quick-action pills
Live browser viewport with screenshot streaming and URL bar
Activity timeline showing tool calls in real-time
Stop button to cancel mid-stream
The Go server runs the agentic loop: the LLM decides which scout tools to call, and the server executes them and streams browser state deltas back to the frontend. Any OpenAI-compatible endpoint is supported via --base-url.
Agent Package (Go)
High-level Go API for callers that want to embed scout in a program. Structured output, auto-wait, goroutine-safe. Most users reach scout through the CLI or MCP server above — this section is for the Go-library path.
session, _ := agent.NewSession(agent.SessionConfig{Headless: true})
defer session.Close()
// Navigate and observe
session.Navigate("https://example.com")
obs, _ := session.Observe() // links, inputs, buttons, text + action costs
// DOM diff — only what changed (saves 50-80% tokens)
session.Click("#submit")
_, diff, _ := session.ObserveDiff()
// diff.Classification: "modal_appeared"
// diff.Summary: "Modal/dialog appeared: Login required"
// Semantic form filling — no CSS selectors
session.FillFormSemantic(map[string]string{
"Email": "user-example", "Password": "secret",
})
// Visual grounding — click by number, not selector
result, _ := session.AnnotatedScreenshot() // numbered labels on elements
session.ClickLabel(7) // click element [7]
// Multi-tab coordination
session.OpenTab("pricing", "https://example.com/pricing")
session.SwitchTab("default")
// Framework detection (19 frameworks)
frameworks, _ := session.DetectedFrameworks() // ["react", "nextjs"]
state, _ := session.ComponentState("#app") // read React/Vue state
// Network capture — read API responses directly
session.EnableNetworkCapture("/api/")
captured := session.CapturedRequests("/api/users")
// Action replay — record once, replay without LLM
session.StartRecordingPlaybook("login-flow")
// ... do stuff ...
pb, _ := session.StopRecordingPlaybook()
agent.SavePlaybook(pb, "login.json")
// Later: session.ReplayPlaybook(pb) // 100x cheaper
// Persistent profiles
session.SaveProfile("session.json") // cookies + localStorage
session.LoadProfile("session.json")
// Content distillation (5 levels)
session.Markdown() // ~2-8KB compact markdown
session.ReadableText() // ~1-4KB main content only
session.AccessibilityTree() // ~1-4KB semantic tree
session.ObserveWithBudget(500) // fit in ~500 tokens
Core Library (Go)
Gin-like Engine/Context/Group/HandlerFunc with middleware composition. The lowest-level Go API — use it when you want full control of task lifecycle, named groups, and middleware chains:
engine := browse.Default(browse.WithHeadless(true))
engine.MustLaunch()
defer engine.Close()
engine.Use(middleware.Stealth())
engine.Use(middleware.Retry(middleware.RetryConfig{MaxAttempts: 3}))
engine.Use(middleware.Timeout(30 * time.Second))
admin := engine.Group("admin", middleware.BasicAuth("admin", "secret"))
admin.Task("export", func(c *browse.Context) {
c.MustNavigate("https://app.example.com/admin")
table, _ := c.ExtractTable("#users")
c.Set("data", table)
})
engine.RunGroup("admin")
Middleware
Category | Middleware |
Resilience | |
Auth | |
Anti-detection | |
Network | |
Utilities | |
CLI
The CLI is a deliberate one-shot tier: each command launches a browser,
navigates, runs one read/diagnostic action, prints the result, and exits.
Stateful interactive flows (click→type→submit, multi-tab, live cookie/network
manipulation) live in the MCP server (scout mcp serve) and chat UI
(scout ui serve), which keep a session alive across actions.
CLI defaults to visible browser (--headless to hide):
scout navigate <url> # page info as JSON
scout observe <url> # structured observation
scout markdown <url> # compact markdown
scout readable <url> # main readable content (boilerplate stripped)
scout accessibility <url> # semantic accessibility tree
scout screenshot <url> [--output f] # save screenshot
scout pdf <url> [--output f] # save PDF
scout extract <url> <selector> # extract text
scout table <url> <selector> # extract a table as structured rows
scout auto-extract <url> # auto-detect dominant data pattern
scout eval <url> <expression> # run JavaScript
scout form discover <url> # discover form fields
scout frameworks <url> # detect frameworks
scout app-state <url> # global app/store state (JSON)
scout aria <url> # ARIA violation report (JSON)
scout vitals <url> # Core Web Vitals (JSON)
scout console <url> # console errors + network 4xx/5xx (JSON)
scout dialog <url> # detect visible modal/dialog (JSON)
scout auth-wall <url> # detect login/auth wall (JSON)
scout cookies <url> # list cookies (values redacted)
scout watch <url> [--interval=5s] # live-watch page changes
scout pipe <command> [selector] # batch process URLs from stdin
scout record <url> [--output f] # interactive recording → playbook
scout mcp serve # start MCP server
scout ui serve [flags] # start chat UI
scout version # print version
Architecture
scout/
├── browse.go, engine.go, context.go # Gin-like API
├── page.go, selection.go # CDP page & element interaction
├── recorder.go # Action playbook recording (Navigate/Click/Type → JSON)
├── middleware/ # stealth, resilience, auth, network
├── agent/ # AI agent API (50+ methods)
│ ├── session.go # Session lifecycle, Navigate, Click, Type
│ ├── observe.go, diff.go # Observe, ObserveDiff, cost estimation
│ ├── content.go # Markdown, ReadableText, AccessibilityTree
│ ├── form.go # DiscoverForm, FillFormSemantic, MatchFormField
│ ├── annotate.go # AnnotatedScreenshot, ClickLabel
│ ├── network.go # EnableNetworkCapture, CapturedRequests
│ ├── spa.go # DetectedFrameworks, ComponentState, GetAppState
│ ├── tabs.go # OpenTab, SwitchTab, CloseTab, ListTabs
│ ├── playbook.go # StartRecording, ReplayPlaybook, SavePlaybook
│ ├── interact.go # Hover, DragDrop, SelectOption, ScrollTo
│ ├── profile.go # CaptureProfile, ApplyProfile, SaveProfile
│ ├── selector.go # Playwright :text() selector translation
│ ├── budget.go # ObserveWithBudget, EstimateTokens
│ ├── nlselect.go # SelectByPrompt, fuzzy NL element matching
│ ├── batch.go # ExecuteBatch, sequential multi-action
│ ├── vision.go # HybridObserve, FindByCoordinates
│ ├── trace.go # StartTrace, StopTrace, action tracing
│ ├── screencast.go # StartScreenRecording / StopScreenRecording — video via captureScreenshot polling + ffmpeg encode
│ ├── iframe.go # SwitchToFrame, SwitchToMainFrame
│ └── vitals.go # WebVitals (LCP/CLS/INP)
├── internal/cdp/ # WebSocket CDP client (context-aware)
├── internal/launcher/ # Chrome process management
├── cmd/scout/ # CLI + MCP server (87 tools)
└── docs/ # Landing page (GitHub Pages)
Security
Vulnerability scanning runs on every push and PR via nox. Findings are uploaded to GitHub code scanning, annotated inline on PRs, and gated against .nox/baseline.json so regressions block merges. The status badge in the header reflects the latest main-branch scan.
nox also drives dependency remediation in place of Dependabot — the Nox Remediate workflow runs weekly (Monday 06:00 UTC) and on demand, executing nox fix against fresh OSV.dev findings and opening a single PR with the verified upgrades.
# Local scan
nox scan -severity-threshold high .
# Local fix
nox fix -input findings.json
License
MIT