agent-eyes
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@agent-eyes search for latest AI news and summarize the top article".
That's it! The server will respond to your query, and you can continue using it as needed.
Agent Eye
Give AI agents eyes on the web. Agent Eye is an MCP server that wraps Playwright for local browser search, navigation, extraction, and interaction. It features intelligent LLM-based DOM extraction with multi-pass reconciliation, consensus voting, DOM caching, and field-specific confidence thresholds.
What Is Implemented
Browser navigation with persistent or incognito modes
Search, interact, screenshot, content extraction, JS evaluation tools
5-Phase LLM-based DOM extraction (chunking, reconciliation, consensus voting, caching, field thresholds)
AI workflow orchestration tool powered by Ollama
Session persistence with cookie save/load per profile
Domain allowlist/denylist checks
Memory pressure protection
.env support via dotenv bootstrap at startup
Pre-commit git hooks (Husky) for test validation
AGPL-3.0 license (free for non-commercial/personal use)
Requirements
Node >= 22.14.0
Playwright Chromium installed
Setup
Install dependencies
npm install
Install Chromium
npx playwright install chromium
Configure environment
cp .env.example .env
Run
Development:
npm run dev
Build + start:
npm run build
npm run start
Type check:
npm run typecheck
Tests:
npm run test
Environment Variables
Core:
AGENT_EYE_HEADLESS=true|false
AGENT_EYE_NAV_TIMEOUT_MS=30000
AGENT_EYE_MAX_CHARS=50000
AGENT_EYE_PROFILE_ROOT=~/.agent-eye/profiles
AGENT_EYE_MODEL=deep-seek|llama2|default (content-budget profile for browser_get_content, not Ollama model name)
AGENT_EYE_SKILL_STALE_FOCUSED_CHARS=400 (refresh skill if cached focused content is too small)
AGENT_EYE_SKILL_STALE_SELECTOR_MATCHES=1 (refresh skill if cached selectors barely match DOM)
AGENT_EYE_SKILL_STALE_FAILURE_THRESHOLD=2 (refresh skill after repeated selector failures)
Ollama:
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=gemma4:e4b-it-q4_K_M
Recommended with your local model:
Keep AGENT_EYE_MODEL=default
Set OLLAMA_MODEL to your installed model tag (for example gemma4:e4b-it-q4_K_M)
Note: OLLAMA_MODEL from .env is loaded automatically because src/index.ts imports dotenv/config before server startup.
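As a rough illustration of how these variables could be read, here is a minimal sketch. The `AgentEyeConfig` shape and the `loadConfig` and `intFromEnv` helpers are hypothetical names, not Agent Eye's actual internals, and treating the values listed above as fallbacks (including `headless` defaulting to true) is an assumption.

```typescript
// Hypothetical config shape; field names are illustrative, not Agent Eye's internals.
interface AgentEyeConfig {
  headless: boolean;
  navTimeoutMs: number;
  maxChars: number;
  profileRoot: string;
  contentBudgetProfile: string; // AGENT_EYE_MODEL: a content-budget profile, not an Ollama tag
  ollamaBaseUrl: string;
  ollamaModel: string;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer, falling back when the value is missing or malformed.
function intFromEnv(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? '', 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Assumption: the values shown in the README act as defaults.
function loadConfig(env: Env): AgentEyeConfig {
  return {
    headless: (env.AGENT_EYE_HEADLESS ?? 'true').toLowerCase() !== 'false',
    navTimeoutMs: intFromEnv(env.AGENT_EYE_NAV_TIMEOUT_MS, 30000),
    maxChars: intFromEnv(env.AGENT_EYE_MAX_CHARS, 50000),
    profileRoot: env.AGENT_EYE_PROFILE_ROOT ?? '~/.agent-eye/profiles',
    contentBudgetProfile: env.AGENT_EYE_MODEL ?? 'default',
    ollamaBaseUrl: env.OLLAMA_BASE_URL ?? 'http://localhost:11434',
    ollamaModel: env.OLLAMA_MODEL ?? 'gemma4:e4b-it-q4_K_M',
  };
}

// A malformed timeout falls back to 30000; headless is switched off explicitly.
const config = loadConfig({ AGENT_EYE_HEADLESS: 'false', AGENT_EYE_NAV_TIMEOUT_MS: 'abc' });
console.log(config.headless, config.navTimeoutMs); // false 30000
```

In a real service the argument would typically be `process.env` after dotenv has loaded the .env file.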
MCP Tools
browser_navigate
browser_get_content
browser_search
browser_interact
browser_screenshot (with intelligent SPA hydration waits)
browser_evaluate_js
browser_analyze_page (with skill cache + stale-skill auto-refresh)
browser_extract_section (token-efficient targeted extraction)
browser_ocr_chunk (visual fallback when selector extraction is weak)
ai_orchestrate_workflow
AI Orchestration
ai_orchestrate_workflow accepts a high-level goal and lets the orchestrator decide the next action step by step. Built-in extraction uses a 5-phase LLM system:
Phase 1: Chunked per-field extraction with confidence scoring
Phase 2: Global reconciliation pass for conflict resolution
Phase 3: Consensus voting across multiple runs for reliability
Phase 4: DOM fingerprint caching to skip redundant extraction
Phase 5: Field-specific confidence thresholds for domain-aware validation
See EXTRACTION_PHASES.md for detailed architecture.
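To make Phases 3 and 5 more concrete, the sketch below shows one way consensus voting and per-field thresholds could fit together. It is not taken from the project: `FieldCandidate`, `voteOnField`, `reconcile`, and the threshold values are all hypothetical, and the real behavior is described in EXTRACTION_PHASES.md.

```typescript
// One extracted value for a field, with a self-reported confidence in [0, 1].
interface FieldCandidate {
  value: string;
  confidence: number;
}

// Majority vote across runs; ties go to the value with the higher mean confidence.
function voteOnField(candidates: FieldCandidate[]): FieldCandidate | undefined {
  const buckets = new Map<string, number[]>();
  for (const c of candidates) {
    const key = c.value.trim();
    buckets.set(key, [...(buckets.get(key) ?? []), c.confidence]);
  }
  let best: { value: string; confidence: number; votes: number } | undefined;
  for (const [value, confidences] of Array.from(buckets.entries())) {
    const votes = confidences.length;
    const mean = confidences.reduce((a, b) => a + b, 0) / votes;
    if (!best || votes > best.votes || (votes === best.votes && mean > best.confidence)) {
      best = { value, confidence: mean, votes };
    }
  }
  return best ? { value: best.value, confidence: best.confidence } : undefined;
}

// Hypothetical per-field thresholds (Phase 5); stricter for numeric fields.
const FIELD_THRESHOLDS: Record<string, number> = { title: 0.6, views: 0.8 };
const DEFAULT_THRESHOLD = 0.7;

// Combine several extraction runs; fields below their threshold become null.
function reconcile(runs: Record<string, FieldCandidate>[]): Record<string, string | null> {
  const fields = new Set<string>();
  for (const run of runs) for (const f of Object.keys(run)) fields.add(f);
  const result: Record<string, string | null> = {};
  for (const field of Array.from(fields)) {
    const candidates: FieldCandidate[] = [];
    for (const run of runs) {
      const c = run[field];
      if (c) candidates.push(c);
    }
    const winner = voteOnField(candidates);
    const threshold = FIELD_THRESHOLDS[field] ?? DEFAULT_THRESHOLD;
    result[field] = winner && winner.confidence >= threshold ? winner.value : null;
  }
  return result;
}

const runs: Record<string, FieldCandidate>[] = [
  { title: { value: 'Golden', confidence: 0.9 }, views: { value: '1.5B', confidence: 0.9 } },
  { title: { value: 'Golden', confidence: 0.8 }, views: { value: '1.4B', confidence: 0.6 } },
  { title: { value: 'Gold', confidence: 0.95 }, views: { value: '1.5B', confidence: 0.5 } },
];
// title wins 2 of 3 votes; views also wins 2 of 3, but its mean confidence
// (about 0.7) is below the 0.8 threshold, so it is rejected.
console.log(reconcile(runs));
```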
Input:
goal (required)
context (optional)
maxSteps (optional)
model (optional; defaults to OLLAMA_MODEL then gemma4:e4b-it-q4_K_M)
Output:
success
steps_executed
history of chosen actions
result object
error (when present)
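The input and output fields above can be written down as types. This is a sketch under assumptions: the interface names, the `history` and `result` property names, and the `resolveModel` and `toToolCall` helpers are hypothetical; only the listed field names and the model fallback order come from the text above.

```typescript
// Input fields as listed above.
interface OrchestrateInput {
  goal: string;
  context?: string;
  maxSteps?: number;
  model?: string;
}

// Output fields as listed above; `history` and `result` are assumed property names.
interface OrchestrateOutput {
  success: boolean;
  steps_executed: number;
  history: unknown[];
  result?: unknown;
  error?: string;
}

const FALLBACK_MODEL = 'gemma4:e4b-it-q4_K_M';

// Fallback order from the README: explicit model, then OLLAMA_MODEL, then the built-in tag.
function resolveModel(input: OrchestrateInput, env: Record<string, string | undefined>): string {
  return input.model || env.OLLAMA_MODEL || FALLBACK_MODEL;
}

// Build the request object an MCP client would pass to callTool.
function toToolCall(input: OrchestrateInput): { name: string; arguments: Record<string, unknown> } {
  if (input.goal.trim() === '') throw new Error('goal is required');
  return { name: 'ai_orchestrate_workflow', arguments: { ...input } };
}

const call = toToolCall({ goal: 'Find the view count of a video', maxSteps: 8 });
console.log(call.name, resolveModel({ goal: 'x' }, {}));
```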
Use cases and integration guidance:
Session Persistence
Persistent profiles are stored under AGENT_EYE_PROFILE_ROOT.
Cookies are loaded when a persistent profile starts
Cookies are saved when the session manager closes
Each profile name keeps a separate session history, so you can use one profile per workflow or site
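The save/load cycle can be pictured with a small sketch. The `cookies.json` file name, the per-profile directory layout, and the helper names are assumptions for illustration; the `StoredCookie` fields mirror a subset of Playwright's cookie shape, where `expires` is Unix time in seconds and -1 marks a session cookie.

```typescript
// Subset of Playwright's cookie shape.
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number; // Unix seconds, or -1 for session cookies
}

// Hypothetical layout: <profileRoot>/<profile>/cookies.json
function cookieFileFor(profileRoot: string, profile: string): string {
  // Replace anything outside a safe character set so a profile name cannot escape the root.
  const safe = profile.replace(/[^a-zA-Z0-9_-]/g, '_');
  return `${profileRoot.replace(/\/+$/, '')}/${safe}/cookies.json`;
}

// Drop cookies that have already expired; keeping session cookies is an assumption.
function dropExpired(cookies: StoredCookie[], nowSeconds: number): StoredCookie[] {
  return cookies.filter((c) => c.expires === -1 || c.expires > nowSeconds);
}

const cookies: StoredCookie[] = [
  { name: 'sid', value: 'abc', domain: 'example.com', path: '/', expires: -1 },
  { name: 'old', value: 'x', domain: 'example.com', path: '/', expires: 999 },
  { name: 'cf', value: 'y', domain: 'example.com', path: '/', expires: 1001 },
];

// On close: serialize what is still valid. On start: parse it back.
const serialized = JSON.stringify(dropExpired(cookies, 1000));
const restored = JSON.parse(serialized) as StoredCookie[];
console.log(cookieFileFor('~/.agent-eye/profiles', 'youtube'), restored.map((c) => c.name));
```

With Playwright, the array would come from `context.cookies()` on close and be handed to `context.addCookies()` on start.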
Avoiding Bot Detection & Human Verification
Some websites employ aggressive bot detection (such as CAPTCHAs, Cloudflare walls, or login requirements). To prevent your agent from being blocked:
Warm the session: Run the warming script to manually open a non-headless browser window:
npm run warm-session
Solve challenges manually: Interact with the page (e.g. solve the CAPTCHA, log in, or accept cookies) in the opened browser window. This saves the verified session cookies to your persistent profile.
Run your agent: The agent will automatically reuse these saved cookies on subsequent runs to bypass the verification check. Note that cookies and sessions expire over time, so you may need to run npm run warm-session again if the agent starts encountering bot blocks.
Skill Management
Agent Eye features a project-local skill caching system that maps CSS selectors to page structures to avoid redundant LLM page-analysis queries.
1. How Skills Are Learned & Saved
When the agent visits a page and calls the browser_analyze_page tool, the system checks if a matching skill pattern already exists.
If no skill exists, the tool splits the page into chunks, summarizes them using the local LLM, and builds a structural PageMap (identifying main content vs noise selectors).
This PageMap is automatically saved as a JSON skill file at ./agent-skills/<domain>.json.
Consecutive page loads on the same domain increment the successCount of the skill.
Each pattern also tracks success/failure metadata so stale selectors can be detected and refreshed automatically.
2. Matching and URL Routing
Skill matching is entirely URL-driven (not prompt-driven).
When a page is analyzed, the system extracts the domain and matches specific path patterns using the first two directory segments (e.g., vnexpress.net/the-thao).
If no exact path-level pattern matches the current URL, the system falls back to the top-level domain pattern (vnexpress.net).
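The routing rule above can be sketched as a lookup over candidate keys, most specific first. The key format, the stripping of a leading `www.`, and the function names are assumptions; the sketch tries up to two path segments, then one, then the bare domain.

```typescript
// Build lookup keys from most specific (domain + up to two path segments) to the bare domain.
function candidateSkillKeys(rawUrl: string): string[] {
  const url = new URL(rawUrl);
  const domain = url.hostname.replace(/^www\./, ''); // assumption: www is ignored
  const segments = url.pathname.split('/').filter((s) => s.length > 0).slice(0, 2);
  const keys: string[] = [];
  for (let n = segments.length; n >= 1; n--) {
    keys.push(`${domain}/${segments.slice(0, n).join('/')}`);
  }
  keys.push(domain);
  return keys;
}

// Return the first stored pattern that matches, or undefined if the domain is unknown.
function matchSkill<T>(rawUrl: string, skills: Map<string, T>): T | undefined {
  for (const key of candidateSkillKeys(rawUrl)) {
    const skill = skills.get(key);
    if (skill !== undefined) return skill;
  }
  return undefined;
}

const skills = new Map<string, string>([
  ['vnexpress.net', 'domain-level pattern'],
  ['vnexpress.net/the-thao', 'sports pattern'],
]);
console.log(matchSkill('https://vnexpress.net/the-thao/bong-da/tin.html', skills)); // sports pattern
console.log(matchSkill('https://vnexpress.net/kinh-doanh', skills)); // domain-level pattern
```

Because the lookup depends only on the URL, the same page resolves to the same skill regardless of the prompt.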
3. Managing Skills via CLI
You can inspect, refresh, or remove learned domain skills using the Skill CLI:
# List all learned domain skills
npx tsx examples/skill-cli.ts list
# View the full JSON pattern for a specific domain
npx tsx examples/skill-cli.ts show vnexpress.net
# Delete a skill (forces the agent to re-analyze page structure on next visit)
npx tsx examples/skill-cli.ts delete vnexpress.net
Alternatively, you can force the agent to overwrite an existing skill dynamically by calling browser_analyze_page with the "forceReanalyze": true argument.
4. Stale Skill Auto-Refresh
When a cached pattern returns very low focused content and poor selector matches, the system treats the skill as stale.
The stale pattern is marked with a failure increment, then browser_analyze_page immediately falls back to fresh analysis in the same call.
For general-agent loops, repeated low-quality browser_extract_section results trigger a deterministic escalation:
Force one browser_analyze_page refresh
If still failing, escalate to browser_ocr_chunk
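A minimal sketch of the staleness check and the escalation ladder follows. The threshold values mirror the AGENT_EYE_SKILL_STALE_* settings, but how they combine (weak content and weak selectors together, or repeated failures alone) and what counts as "repeated" (two consecutive low-quality results here) are assumptions, as are all the names.

```typescript
interface StaleThresholds {
  focusedChars: number;     // AGENT_EYE_SKILL_STALE_FOCUSED_CHARS
  selectorMatches: number;  // AGENT_EYE_SKILL_STALE_SELECTOR_MATCHES
  failureThreshold: number; // AGENT_EYE_SKILL_STALE_FAILURE_THRESHOLD
}

const DEFAULT_THRESHOLDS: StaleThresholds = { focusedChars: 400, selectorMatches: 1, failureThreshold: 2 };

// What one use of a cached skill observed on the live page.
interface SkillProbe {
  focusedChars: number;
  selectorMatches: number;
  failureCount: number;
}

// Assumed rule: stale when content is small AND selectors barely match, or after repeated failures.
function isStale(probe: SkillProbe, t: StaleThresholds = DEFAULT_THRESHOLDS): boolean {
  const weak = probe.focusedChars < t.focusedChars && probe.selectorMatches <= t.selectorMatches;
  return weak || probe.failureCount >= t.failureThreshold;
}

type NextTool = 'browser_extract_section' | 'browser_analyze_page' | 'browser_ocr_chunk';

// Deterministic ladder: keep extracting, then force one re-analysis, then fall back to OCR.
function escalate(lowQualityStreak: number, refreshedOnce: boolean): NextTool {
  if (lowQualityStreak < 2) return 'browser_extract_section';
  return refreshedOnce ? 'browser_ocr_chunk' : 'browser_analyze_page';
}

console.log(isStale({ focusedChars: 120, selectorMatches: 0, failureCount: 0 })); // true
console.log(escalate(2, false)); // browser_analyze_page
console.log(escalate(3, true)); // browser_ocr_chunk
```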
Security & Quality
Code Quality:
Pre-commit git hooks (Husky) validate all tests before commit
Enable with npm run prepare
Hook lives at .husky/pre-commit and runs npm test
License:
AGPL-3.0: Free for non-commercial and personal use
Commercial use requires explicit permission
Extraction Safety:
LLM confidence scoring prevents low-reliability extractions
Field-specific thresholds apply per-field validation
Evidence tracking for debugging extraction failures
browser_evaluate_js restricted mode blocks dangerous API patterns unless explicitly unsafe
Optional domain security blocks navigation to disallowed domains
Memory guard blocks new navigations when critical threshold is reached
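As an illustration of the optional domain check, the sketch below evaluates a URL against an allowlist and a denylist. The `DomainPolicy` shape, subdomain matching, deny-takes-precedence ordering, and the "empty allowlist means allow all" rule are assumptions, not the project's documented behavior.

```typescript
interface DomainPolicy {
  allow: string[]; // empty list is treated as "allow everything not denied"
  deny: string[];
}

// A rule matches the exact host or any of its subdomains.
function hostMatches(host: string, rule: string): boolean {
  return host === rule || host.endsWith(`.${rule}`);
}

function isNavigationAllowed(rawUrl: string, policy: DomainPolicy): boolean {
  let host: string;
  try {
    host = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return false; // unparseable URLs are rejected outright
  }
  // Deny rules win over allow rules.
  if (policy.deny.some((rule) => hostMatches(host, rule))) return false;
  return policy.allow.length === 0 || policy.allow.some((rule) => hostMatches(host, rule));
}

const policy: DomainPolicy = { allow: ['cnn.com'], deny: ['ads.cnn.com'] };
console.log(isNavigationAllowed('https://edition.cnn.com/', policy)); // true
console.log(isNavigationAllowed('https://ads.cnn.com/banner', policy)); // false
console.log(isNavigationAllowed('https://evilcnn.com/', policy)); // false
```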
Integration Tests
Relevant integration coverage:
tests/integration/browser-tools.test.ts
tests/integration/huntrix-golden-e2e.test.ts
tests/integration/ai-orchestrator-live.test.ts
tests/integration/ai-orchestrate-mcp-client.test.ts
tests/integration/youtube-search.test.ts
The integration tests validate:
ai-orchestrator-live: High-level prompt execution with JSON extraction (requires Ollama; opt in via RUN_LIVE_AI_ORCHESTRATOR_TEST=true)
browser-tools: Screenshot waits for dynamic content (YouTube SPA hydration)
dom extraction: 5-phase system (chunking, reconciliation, consensus, caching, thresholds)
Client Integration & Production Simulation
For general browser automation goals, you should drive the browser step-by-step from your own orchestration service (as ai_orchestrate_workflow is optimized specifically for YouTube metadata extraction).
1. General Agent Loop Client Script
We provide a production-ready simulation client script in examples/general-agent-client.ts that demonstrates how to:
Connect to the agent-eyes MCP server over stdio transport.
Query a local Ollama model (e.g. gemma4:e4b-it-q4_K_M) using simplified HTML context and action history.
Run actions step-by-step using MCP tools.
2. Running Simulations
You can evaluate the system on any prompt goal using:
# Run with default goal (VNExpress news summary)
npx tsx examples/general-agent-client.ts
# Run with a custom prompt
npx tsx examples/general-agent-client.ts "search what is the best trend in github.com"
3. Capturing Debug Screenshots
For easier debugging, you can enable step-by-step browser screenshots by appending the --screenshot CLI option. This will capture viewports at each step using the browser_screenshot tool and write them to a local debug-screenshots/ directory:
npx tsx examples/general-agent-client.ts "search what is the current stock for VCB (HOSE vn)" --screenshot
3.1 Example: CNN News Summary (10 items)
Example command:
npx tsx examples/general-agent-client.ts "go to cnn.com and summary for me list 10 of news" --screenshot
Expected behavior:
Agent navigates to https://edition.cnn.com/
Runs browser_analyze_page and extracts structured items[] with source URLs/images
Returns exactly 10 news summaries with source links
Goal Reached!
Based on the visible content of CNN.com from June 28, 2026, here are 10 news summaries visible on CNN.com:
1. Germany's new Nazi party databases are challenging decades-held sanitized family narratives
Source: https://edition.cnn.com/world/germans-nazi-past-far-right-intl
2. Israel's military and tech industry race to counter Hezbollah's latest threat
Source: https://edition.cnn.com/world/middleeast/israel-tech-hezbollah-drone-threat-intl
3. The US and Iran have a deal on paper. At sea, the Strait of Hormuz is 'chaotic'
Source: https://edition.cnn.com/world/iran-us-hormuz-agreement-mou-intl
4. Live updates: Death toll climbs to over 1,400 in Venezuela quakes
Source: https://edition.cnn.com/world/live-news/venezuela-earthquake-hnk
3.2 Example: YouTube Video Metadata (likes, views, release date)
Example command:
npx tsx examples/general-agent-client.ts "search Golden Official, to see that video has how many like, view and release date" --screenshot
Expected behavior:
Agent runs YouTube-focused search via browser_search with site="youtube"
Navigates to the selected watch URL
Uses browser_analyze_page on the watch page to extract metadata
Returns video title, views, release date, and likes (if visible)
Sample output excerpt:
Goal Reached!
Based on the provided YouTube page content for "Golden" Official Lyric Video | KPop Demon Hunters | Sony Animation:
- Video Title: “Golden” Official Lyric Video | KPop Demon Hunters | Sony Animation
- Views: 1,531,178,811 views
- Release Date: Jun 23, 2025
- Likes: The explicit number of likes is not visible in the provided focused content.
3.3 Example: OpenAI vs Claude News + Benchmark Table
Example command:
npx tsx examples/general-agent-client.ts "list me 5 news about AI models from OpenAI and Claude, which one is better, I need some benkmark or even the table for easy compare features" --screenshot
Expected behavior:
Agent starts from Google search and gathers relevant comparison sources
Navigates into a detailed article and runs browser_analyze_page
Returns a structured comparison with benchmark values and an easy-to-read table
Includes a practical recommendation on “which one is better” by use case
Sample output excerpt:
Goal Reached!
Comparative Overview (Benchmarks Table)
| Category | GPT-5.4 (OpenAI) | Claude Opus 4.6 (Anthropic) | Gemini 3.1 Pro | Grok 4 |
| :--- | :--- | :--- | :--- | :--- |
| Coding | Strong (74.9% SWE-bench) | Strong (74%+; powers Cursor) | Good (63.8%, 1M context) | Leader (75%) |
| Reasoning | 92.8% GPQA | 91.3% GPQA | Leader (94.3% GPQA) | Competitive |
| Writing | Good (Canvas editor) | Leader (128K output, natural prose) | Good (Docs integration) | Uncensored style |
| API Price (in/out per 1M) | $2.50/$15 | $15/$75 (Opus), $3/$15 (Sonnet) | $2/$12 | $2/$15 |
Summary on which is better:
- Claude Opus 4.6 is stronger for long-form writing and natural prose.
- OpenAI GPT-5.4 remains a strong all-rounder with broad ecosystem support.
3.4 Example: Company Information Discovery (Alphaus Cloud)
Example command:
npx tsx examples/general-agent-client.ts "help me to find information about Alphaus Cloud company" --screenshot
Expected behavior:
Agent starts with browser_search on Google to gather initial company sources
Navigates to the official website (https://www.alphaus.cloud/en)
Runs browser_analyze_page and refreshes stale skill selectors when needed
Returns a structured company profile: mission, credentials, products, and services
Sample output excerpt:
Goal Reached!
Based on the Alphaus Cloud landing page, here is detailed information about the company:
Company overview and mission:
- Alphaus Cloud focuses on cloud cost intelligence and FinOps-driven optimization.
Key credentials:
- Claims No. 1 position in Japan with over $200M+ cloud resources managed annually.
- Preferred FinOps solution by 25% of AWS Premier Partners in Japan.
- Reports 3000+ monthly active users on Ripple.
Core products and services:
- Octo: cost visibility and optimization for cloud users.
- Ripple: billing automation and reseller-focused cloud cost management.
- Professional Services: FinOps implementation and cross-team enablement.
3.5 Example: Localized Product + Price Discovery (Lynk & Co 08 in HCMC)
Example command:
npx tsx examples/general-agent-client.ts "find information about Lynk & co 08 in Vietnam? and current price for that car in Ho Chi Minh?" --screenshot
Expected behavior:
Agent runs localized search with Vietnamese keywords via browser_search
Uses focused content to extract product profile and local on-road pricing
Returns model details and city-specific price estimates for Ho Chi Minh City
Sample output excerpt:
Goal Reached!
General information on Lynk & Co 08:
- D-segment SUV, PHEV (1.5 Turbo + electric motor)
- EV range up to 200 km, total range over 1,400 km
Current price in Ho Chi Minh City (HCMC):
- Lynk & Co 08 Pro:
- Factory list price: 1,299,000,000 VND
- Estimated on-road price (HCMC): 1.43-1.45 billion VND
- Lynk & Co 08 Halo:
- Factory list price: 1,389,000,000 VND
- Estimated on-road price (HCMC): 1.53-1.55 billion VND
4. Integration Code Pattern
To call this MCP server from your own Node.js service, use the following code pattern:
import * as fs from 'node:fs';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Spawn the server over stdio (run this file as an ES module so top-level await works)
const transport = new StdioClientTransport({
  command: 'npx',
  args: ['tsx', 'src/index.ts'],
  cwd: '/path/to/agent-eyes',
});
const client = new Client({ name: 'my-orchestrator', version: '1.0.0' }, { capabilities: {} });
await client.connect(transport);

// 1. Navigate
await client.callTool({
  name: 'browser_navigate',
  arguments: { url: 'https://vnexpress.net' },
});

// 2. Get simplified page content for LLM context
const contentResult = await client.callTool({
  name: 'browser_get_content',
  arguments: { format: 'simplified-html', includeInteractiveMap: true },
});
const textContent = (contentResult as any).content?.find((c: any) => c.type === 'text')?.text;
if (!textContent) throw new Error('browser_get_content returned no text content');
const payload = JSON.parse(textContent);
console.log(payload.content); // Simplified HTML content
console.log(payload.interactiveMap); // Actionable ID mapping

// 3. Interact (e.g. click element with data-agent-id="34")
await client.callTool({
  name: 'browser_interact',
  arguments: { action: 'click', target: '34' },
});

// 4. Capture screenshot for debugging
const screenshotResult = await client.callTool({
  name: 'browser_screenshot',
  arguments: { fullPage: false },
});
const imageItem = (screenshotResult as any).content?.find((c: any) => c.type === 'image');
if (imageItem?.data) {
  // imageItem.data contains raw base64 PNG data
  fs.writeFileSync('debug-step.png', Buffer.from(imageItem.data, 'base64'));
}

await client.close();
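The pattern above unwraps tool results with `as any` casts. A small typed helper can make that step safer. This is a sketch: `ToolContentItem`, `ToolResult`, and `parseJsonPayload` are hypothetical names that model only the `content` and `isError` fields of an MCP tool result, and it assumes the tool returns its payload as JSON text, as browser_get_content does above.

```typescript
// Minimal view of an MCP tool result: a list of typed content items plus an error flag.
interface ToolContentItem {
  type: string;
  text?: string;
  data?: string;
}

interface ToolResult {
  content?: ToolContentItem[];
  isError?: boolean;
}

// Find the first text item and parse it as JSON, failing loudly instead of returning undefined.
function parseJsonPayload<T>(result: ToolResult): T {
  if (result.isError) throw new Error('tool call reported an error');
  const text = result.content?.find((item) => item.type === 'text')?.text;
  if (text === undefined) throw new Error('tool result has no text content');
  return JSON.parse(text) as T;
}

// Example with a hand-built result shaped like the browser_get_content response.
interface PagePayload {
  content: string;
  interactiveMap: Record<string, string>;
}

const fakeResult: ToolResult = {
  content: [{ type: 'text', text: '{"content":"<main>Hello</main>","interactiveMap":{"34":"a.more"}}' }],
};
const page = parseJsonPayload<PagePayload>(fakeResult);
console.log(page.content, page.interactiveMap['34']);
```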