
Agent Eye

Give AI agents eyes on the web. Agent Eye is an MCP server that wraps Playwright for local browser search, navigation, extraction, and interaction. It features intelligent LLM-based DOM extraction with multi-pass reconciliation, consensus voting, DOM caching, and field-specific confidence thresholds.

What Is Implemented

  • Browser navigation with persistent or incognito modes

  • Search, interact, screenshot, content extraction, JS evaluation tools

  • 5-Phase LLM-based DOM extraction (chunking, reconciliation, consensus voting, caching, field thresholds)

  • AI workflow orchestration tool powered by Ollama

  • Session persistence with cookie save/load per profile

  • Domain allowlist/denylist checks

  • Memory pressure protection

  • .env support via dotenv bootstrap at startup

  • Pre-commit git hooks (Husky) for test validation

  • AGPL-3.0 license (free for non-commercial/personal use)


Requirements

  • Node >= 22.14.0

  • Playwright Chromium installed

Setup

  1. Install dependencies

npm install
  2. Install Chromium

npx playwright install chromium
  3. Configure environment

cp .env.example .env

Run

Development:

npm run dev

Build + start:

npm run build
npm run start

Type check:

npm run typecheck

Tests:

npm run test

Environment Variables

Core:

  • AGENT_EYE_HEADLESS=true|false

  • AGENT_EYE_NAV_TIMEOUT_MS=30000

  • AGENT_EYE_MAX_CHARS=50000

  • AGENT_EYE_PROFILE_ROOT=~/.agent-eye/profiles

  • AGENT_EYE_MODEL=deep-seek|llama2|default (selects the content-budget profile for browser_get_content; this is not the Ollama model name)

  • AGENT_EYE_SKILL_STALE_FOCUSED_CHARS=400 (refresh skill if cached focused content is too small)

  • AGENT_EYE_SKILL_STALE_SELECTOR_MATCHES=1 (refresh skill if cached selectors barely match DOM)

  • AGENT_EYE_SKILL_STALE_FAILURE_THRESHOLD=2 (refresh skill after repeated selector failures)
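The core variables above could be read into a typed configuration roughly as follows. This is a minimal sketch, not Agent Eye's actual loader: the function and field names are hypothetical, the values listed above are assumed to be the defaults, and treating any AGENT_EYE_HEADLESS value other than "false" as headless is an assumption.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical typed view of the AGENT_EYE_* variables.
interface AgentEyeConfig {
  headless: boolean;
  navTimeoutMs: number;
  maxChars: number;
  profileRoot: string;
  contentProfile: "deep-seek" | "llama2" | "default";
  staleFocusedChars: number;
  staleSelectorMatches: number;
  staleFailureThreshold: number;
}

type Env = Record<string, string | undefined>;

// Parse an integer variable, falling back when it is unset or malformed.
function intVar(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number.parseInt(raw, 10);
  return Number.isFinite(n) ? n : fallback;
}

// Expand a leading "~" to the current user's home directory.
function expandHome(p: string): string {
  return p === "~" || p.startsWith("~/") ? path.join(os.homedir(), p.slice(1)) : p;
}

function loadConfig(env: Env = process.env): AgentEyeConfig {
  const profile = env.AGENT_EYE_MODEL ?? "default";
  return {
    // Assumption: only the literal string "false" turns headless mode off.
    headless: env.AGENT_EYE_HEADLESS !== "false",
    navTimeoutMs: intVar(env, "AGENT_EYE_NAV_TIMEOUT_MS", 30000),
    maxChars: intVar(env, "AGENT_EYE_MAX_CHARS", 50000),
    profileRoot: expandHome(env.AGENT_EYE_PROFILE_ROOT ?? "~/.agent-eye/profiles"),
    // Unknown profile names fall back to the "default" content budget.
    contentProfile: profile === "deep-seek" || profile === "llama2" ? profile : "default",
    staleFocusedChars: intVar(env, "AGENT_EYE_SKILL_STALE_FOCUSED_CHARS", 400),
    staleSelectorMatches: intVar(env, "AGENT_EYE_SKILL_STALE_SELECTOR_MATCHES", 1),
    staleFailureThreshold: intVar(env, "AGENT_EYE_SKILL_STALE_FAILURE_THRESHOLD", 2),
  };
}
```

For example, loadConfig({ AGENT_EYE_HEADLESS: "false" }) yields a headed-browser config with every other value at its fallback.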

Ollama:

Recommended setup for a local model:

  • Keep AGENT_EYE_MODEL=default

  • Set OLLAMA_MODEL to your installed model tag (for example gemma4:e4b-it-q4_K_M)

Note: OLLAMA_MODEL from .env is loaded automatically because src/index.ts imports dotenv/config before server startup.

MCP Tools

  • browser_navigate

  • browser_get_content

  • browser_search

  • browser_interact

  • browser_screenshot (with intelligent SPA hydration waits)

  • browser_evaluate_js

  • browser_analyze_page (with skill cache + stale-skill auto-refresh)

  • browser_extract_section (token-efficient targeted extraction)

  • browser_ocr_chunk (visual fallback when selector extraction is weak)

  • ai_orchestrate_workflow

AI Orchestration

ai_orchestrate_workflow accepts a high-level goal and lets the orchestrator decide the next action step by step. Built-in extraction uses a 5-phase LLM system:

  • Phase 1: Chunked per-field extraction with confidence scoring

  • Phase 2: Global reconciliation pass for conflict resolution

  • Phase 3: Consensus voting across multiple runs for reliability

  • Phase 4: DOM fingerprint caching to skip redundant extraction

  • Phase 5: Field-specific confidence thresholds for domain-aware validation

See EXTRACTION_PHASES.md for detailed architecture.
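Phases 3 and 5 can be illustrated together: vote on each field's value across several extraction runs, then accept the winner only if it clears that field's confidence threshold. This is a minimal sketch under stated assumptions, not the logic in EXTRACTION_PHASES.md: the type names, the case-insensitive value normalization, the strict-majority rule, and the threshold numbers in FIELD_THRESHOLDS are all hypothetical.

```typescript
// One candidate value for a field from a single extraction run.
interface FieldCandidate {
  value: string;
  confidence: number; // 0..1, as reported by the LLM
}

type RunResult = Record<string, FieldCandidate | undefined>;

interface FieldDecision {
  value: string | null;
  confidence: number; // mean confidence of the runs that agreed
  votes: number;
  accepted: boolean;
}

type Tally = { value: string; votes: number; confSum: number };

// Hypothetical per-field thresholds (Phase 5); other fields use the default.
const FIELD_THRESHOLDS: Record<string, number> = { price: 0.9, title: 0.6 };
const DEFAULT_THRESHOLD = 0.7;

function consensus(runs: RunResult[]): Record<string, FieldDecision> {
  const fields = new Set(runs.flatMap((r) => Object.keys(r)));
  const out: Record<string, FieldDecision> = {};
  for (const field of fields) {
    // Phase 3: group candidates by normalized value, counting votes and confidence.
    const tally = new Map<string, Tally>();
    for (const run of runs) {
      const c = run[field];
      if (!c) continue;
      const key = c.value.trim().toLowerCase();
      const t = tally.get(key) ?? { value: c.value.trim(), votes: 0, confSum: 0 };
      t.votes += 1;
      t.confSum += c.confidence;
      tally.set(key, t);
    }
    // Most votes wins; ties are broken by total confidence.
    let best: Tally | undefined;
    for (const t of tally.values()) {
      if (!best || t.votes > best.votes || (t.votes === best.votes && t.confSum > best.confSum)) {
        best = t;
      }
    }
    if (!best) {
      out[field] = { value: null, confidence: 0, votes: 0, accepted: false };
      continue;
    }
    // Phase 5: require a strict majority and the field-specific confidence bar.
    const confidence = best.confSum / best.votes;
    const threshold = FIELD_THRESHOLDS[field] ?? DEFAULT_THRESHOLD;
    const majority = best.votes > runs.length / 2;
    out[field] = { value: best.value, confidence, votes: best.votes, accepted: majority && confidence >= threshold };
  }
  return out;
}
```

The per-field thresholds let a high-stakes field such as a price demand more certainty than a title, so two agreeing runs at roughly 0.8 confidence accept a title but reject a price.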

Input:

  • goal (required)

  • context (optional)

  • maxSteps (optional)

  • model (optional; defaults to OLLAMA_MODEL, falling back to gemma4:e4b-it-q4_K_M)

Output:

  • success

  • steps_executed

  • history of chosen actions

  • result object

  • error (when present)
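The documented input and output can be written down as TypeScript types, together with the model-resolution order from the model bullet. This is a sketch: the field names history and result, and the types chosen for context and history, are assumptions inferred from the bullet descriptions rather than confirmed by the server's schema.

```typescript
// Hypothetical typed view of the ai_orchestrate_workflow arguments.
interface OrchestrateInput {
  goal: string;
  context?: string;
  maxSteps?: number;
  model?: string;
}

// Hypothetical typed view of the tool's result payload.
interface OrchestrateOutput {
  success: boolean;
  steps_executed: number;
  history: unknown[]; // chosen actions, in order
  result: Record<string, unknown>;
  error?: string;
}

const FALLBACK_MODEL = "gemma4:e4b-it-q4_K_M";

// Explicit argument first, then OLLAMA_MODEL, then the built-in fallback.
function resolveModel(input: OrchestrateInput, env: Record<string, string | undefined> = process.env): string {
  return input.model?.trim() || env.OLLAMA_MODEL?.trim() || FALLBACK_MODEL;
}

// Runtime check for a parsed tool result before trusting its fields.
function isOrchestrateOutput(x: unknown): x is OrchestrateOutput {
  if (typeof x !== "object" || x === null) return false;
  const o = x as Record<string, unknown>;
  return (
    typeof o.success === "boolean" &&
    typeof o.steps_executed === "number" &&
    (o.error === undefined || typeof o.error === "string")
  );
}
```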


Session Persistence

Persistent profiles are stored under AGENT_EYE_PROFILE_ROOT.

  • Cookies are loaded when a persistent profile starts

  • Cookies are saved when the session manager closes

  • The profile name keeps session histories separate per workflow or site
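Per-profile cookie persistence could be sketched as below. This is an illustration, not Agent Eye's session manager: the cookies.json file name, the profile-name rule, and the function names are hypothetical. StoredCookie mirrors the fields Playwright's BrowserContext.cookies() returns, where expires is a Unix time in seconds or -1 for a session cookie.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Mirrors the shape of Playwright's Cookie objects.
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number; // Unix seconds, or -1 for session cookies
  httpOnly: boolean;
  secure: boolean;
  sameSite: "Strict" | "Lax" | "None";
}

// Hypothetical default, matching the AGENT_EYE_PROFILE_ROOT example value.
const DEFAULT_PROFILE_ROOT = path.join(os.homedir(), ".agent-eye", "profiles");

// Hypothetical layout: <profileRoot>/<profile>/cookies.json
function cookieFile(profileRoot: string, profile: string): string {
  // Reject names that could escape the profile root (e.g. "../x").
  if (!/^[A-Za-z0-9_-]+$/.test(profile)) throw new Error(`invalid profile name: ${profile}`);
  return path.join(profileRoot, profile, "cookies.json");
}

function saveCookies(profileRoot: string, profile: string, cookies: StoredCookie[]): void {
  const file = cookieFile(profileRoot, profile);
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, JSON.stringify(cookies, null, 2));
}

// Load a profile's cookies, dropping any that have already expired.
function loadCookies(profileRoot: string, profile: string, nowSec = Date.now() / 1000): StoredCookie[] {
  const file = cookieFile(profileRoot, profile);
  if (!fs.existsSync(file)) return [];
  const all = JSON.parse(fs.readFileSync(file, "utf8")) as StoredCookie[];
  return all.filter((c) => c.expires === -1 || c.expires > nowSec);
}
```

With Playwright, the save side could be fed from await context.cookies() when the session closes, and the loaded array passed to await context.addCookies(...) when a persistent profile starts.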

Avoiding Bot Detection & Human Verification

Some websites employ aggressive bot detection (such as CAPTCHAs, Cloudflare walls, or login requirements). To prevent your agent from being blocked:

  1. Warm the session: Run the warming script to open a visible (non-headless) browser window:

    npm run warm-session
  2. Solve challenges manually: Interact with the page (e.g. solve the CAPTCHA, log in, or accept cookies) in the opened browser window. This saves the verified session cookies to your persistent profile.

  3. Run your agent: The agent will automatically reuse these saved cookies on subsequent runs to bypass the verification check. Note that cookies and sessions expire over time, so you may need to run npm run warm-session again if the agent starts encountering bot blocks.
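To know when re-warming is due, an agent loop could flag pages that look like verification walls. This is a heuristic sketch, not Agent Eye's detection logic: the marker list and the treatment of HTTP 403/429 as blocks are assumptions, though "Just a moment..." is the title commonly seen on Cloudflare challenge pages and the "unusual traffic" phrase appears on Google's block page.

```typescript
// Illustrative phrases seen on bot-verification interstitials (assumed list).
const BLOCK_MARKERS: RegExp[] = [
  /just a moment\.\.\./i, // Cloudflare challenge page title
  /verify (that )?you are (a )?human/i,
  /captcha/i,
  /unusual traffic from your computer network/i,
];

interface PageSnapshot {
  title: string;
  text: string;
  status?: number; // HTTP status of the main document, if known
}

// Returns a short reason when the page looks like a bot block, else null.
function detectBotBlock(page: PageSnapshot): string | null {
  if (page.status === 403 || page.status === 429) return `HTTP ${page.status}`;
  // Only scan the start of the page; challenge text appears near the top.
  const haystack = `${page.title}\n${page.text.slice(0, 5000)}`;
  for (const re of BLOCK_MARKERS) {
    const m = haystack.match(re);
    if (m) return m[0];
  }
  return null;
}
```

When detectBotBlock returns a reason, the orchestrator can stop and ask the operator to run npm run warm-session again instead of retrying blindly.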

Skill Management

Agent Eye features a project-local skill caching system that maps CSS selectors to page structures to avoid redundant LLM page-analysis queries.

1. How Skills Are Learned & Saved

  • When the agent visits a page and calls the browser_analyze_page tool, the system checks if a matching skill pattern already exists.

  • If no skill exists, the tool splits the page into chunks, summarizes them using the local LLM, and builds a structural PageMap (identifying main content vs noise selectors).

  • This PageMap is automatically saved as a JSON skill file at ./agent-skills/<domain>.json in the project directory.

  • Consecutive page loads on the same domain increment the skill's successCount.

  • Each pattern also tracks success/failure metadata so stale selectors can be detected and refreshed automatically.
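A skill file and its success/failure bookkeeping might look like the sketch below. The field names other than successCount, the PageMap shape, and the choice to reset a consecutive-failure streak on success are assumptions for illustration; the default failure threshold of 2 is the AGENT_EYE_SKILL_STALE_FAILURE_THRESHOLD value shown earlier.

```typescript
import * as path from "node:path";

// Hypothetical PageMap: which selectors hold main content and which are noise.
interface PageMap {
  mainContentSelectors: string[];
  noiseSelectors: string[];
}

interface SkillPattern {
  key: string; // e.g. "vnexpress.net/the-thao" or "vnexpress.net"
  pageMap: PageMap;
  successCount: number;
  failureCount: number;
  consecutiveFailures: number;
  updatedAt: string; // ISO-8601 timestamp
}

interface SkillFile {
  domain: string;
  patterns: Record<string, SkillPattern>;
}

// One JSON file per domain under ./agent-skills/.
function skillPath(domain: string, root = "agent-skills"): string {
  if (!/^[a-z0-9.-]+$/i.test(domain)) throw new Error(`unexpected domain: ${domain}`);
  return path.join(root, `${domain}.json`);
}

// Store a freshly analyzed PageMap, replacing any previous pattern for the key.
function savePattern(file: SkillFile, key: string, pageMap: PageMap, now = new Date()): SkillPattern {
  const pattern: SkillPattern = {
    key, pageMap, successCount: 0, failureCount: 0, consecutiveFailures: 0, updatedAt: now.toISOString(),
  };
  file.patterns[key] = pattern;
  return pattern;
}

// Record whether the cached selectors worked on this visit.
function recordOutcome(p: SkillPattern, ok: boolean, now = new Date()): void {
  p.updatedAt = now.toISOString();
  if (ok) {
    p.successCount += 1;
    p.consecutiveFailures = 0;
  } else {
    p.failureCount += 1;
    p.consecutiveFailures += 1;
  }
}

// Refresh once selector failures repeat past the threshold.
function shouldRefresh(p: SkillPattern, failureThreshold = 2): boolean {
  return p.consecutiveFailures >= failureThreshold;
}
```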

2. Matching and URL Routing

  • Skill matching is entirely URL-driven (not prompt-driven).

  • When a page is analyzed, the system extracts the domain and matches specific path patterns using the first two directory segments (e.g., vnexpress.net/the-thao).

  • If no exact path-level pattern matches the current URL, the system falls back to the domain-level pattern (vnexpress.net).
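The URL-driven lookup can be sketched as a two-step match: try the path-level key, then the domain key. This assumes, following the vnexpress.net/the-thao example, that the path-level key is the hostname plus the first path segment; the function names are hypothetical.

```typescript
// Derive the path-level and domain-level skill keys from a URL.
function skillKeys(rawUrl: string): { pathKey: string | null; domainKey: string } {
  const url = new URL(rawUrl);
  const domainKey = url.hostname.toLowerCase();
  const first = url.pathname.split("/").filter(Boolean)[0];
  return { pathKey: first ? `${domainKey}/${first}` : null, domainKey };
}

const has = (o: object, k: string): boolean => Object.prototype.hasOwnProperty.call(o, k);

// Path-level match first, then fall back to the domain-level pattern.
function matchSkill<T>(rawUrl: string, patterns: Record<string, T>): { key: string; pattern: T } | null {
  const { pathKey, domainKey } = skillKeys(rawUrl);
  if (pathKey && has(patterns, pathKey)) return { key: pathKey, pattern: patterns[pathKey] };
  if (has(patterns, domainKey)) return { key: domainKey, pattern: patterns[domainKey] };
  return null;
}
```

Because matching depends only on the URL, the same article URL always routes to the same skill regardless of what the user asked the agent to do.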

3. Managing Skills via CLI

You can inspect, refresh, or remove learned domain skills using the Skill CLI:

# List all learned domain skills
npx tsx examples/skill-cli.ts list

# View the full JSON pattern for a specific domain
npx tsx examples/skill-cli.ts show vnexpress.net

# Delete a skill (forces the agent to re-analyze page structure on next visit)
npx tsx examples/skill-cli.ts delete vnexpress.net

Alternatively, you can force the agent to overwrite an existing skill dynamically by calling browser_analyze_page with the "forceReanalyze": true argument.

4. Stale Skill Auto-Refresh

  • When a cached pattern returns very low focused content and poor selector matches, the system treats the skill as stale.

  • The stale pattern is marked with a failure increment, then browser_analyze_page immediately falls back to fresh analysis in the same call.

  • For general-agent loops, repeated low-quality browser_extract_section results trigger a deterministic escalation:

    1. Force one browser_analyze_page refresh

    2. If still failing, escalate to browser_ocr_chunk
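The stale check and the escalation ladder can be sketched as follows. Assumptions: "stale" means focused content below AGENT_EYE_SKILL_STALE_FOCUSED_CHARS and selector matches at or below AGENT_EYE_SKILL_STALE_SELECTOR_MATCHES (using the values shown earlier), and "repeated" means two consecutive low-quality results; the class and type names are hypothetical.

```typescript
interface StaleThresholds {
  focusedChars: number; // AGENT_EYE_SKILL_STALE_FOCUSED_CHARS
  selectorMatches: number; // AGENT_EYE_SKILL_STALE_SELECTOR_MATCHES
}

interface CachedQuality {
  focusedChars: number; // length of content extracted with the cached selectors
  selectorMatches: number; // how many cached selectors matched the live DOM
}

// Stale = too little focused content AND too few selector matches.
function isStale(q: CachedQuality, t: StaleThresholds = { focusedChars: 400, selectorMatches: 1 }): boolean {
  return q.focusedChars < t.focusedChars && q.selectorMatches <= t.selectorMatches;
}

type Step =
  | { tool: "browser_extract_section" }
  | { tool: "browser_analyze_page"; args: { forceReanalyze: true } }
  | { tool: "browser_ocr_chunk" };

// Deterministic ladder: retry extraction, then one forced re-analysis, then OCR.
class EscalationLadder {
  private lowStreak = 0;
  private refreshed = false;
  private readonly repeatThreshold: number;

  constructor(repeatThreshold = 2) {
    this.repeatThreshold = repeatThreshold;
  }

  // Feed in whether the latest browser_extract_section result was low quality.
  next(lowQuality: boolean): Step {
    if (!lowQuality) {
      this.lowStreak = 0;
      return { tool: "browser_extract_section" };
    }
    this.lowStreak += 1;
    if (this.lowStreak < this.repeatThreshold) return { tool: "browser_extract_section" };
    if (!this.refreshed) {
      this.refreshed = true;
      return { tool: "browser_analyze_page", args: { forceReanalyze: true } };
    }
    return { tool: "browser_ocr_chunk" };
  }
}
```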

Security & Quality

Code Quality:

  • Pre-commit git hooks (Husky) validate all tests before commit

  • Enable with npm run prepare

  • Hook lives at .husky/pre-commit and runs npm test

License:

  • AGPL-3.0: Free for non-commercial and personal use

  • Commercial use requires explicit permission

Extraction Safety:

  • LLM confidence scoring filters out low-reliability extractions

  • Field-specific thresholds validate each field against its own minimum confidence

  • Evidence tracking for debugging extraction failures

  • browser_evaluate_js runs in a restricted mode that blocks dangerous API patterns unless unsafe mode is explicitly enabled

  • Optional domain security blocks navigation to disallowed domains

  • Memory guard blocks new navigations when critical threshold is reached
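The three guards above (domain policy, restricted JavaScript, memory pressure) could be combined into checks that run before each action. This is a sketch, not the server's implementation: subdomain matching, "denylist wins", the regex blocklist, and the 0.9 critical ratio are all assumptions.

```typescript
import * as os from "node:os";

// An entry matches the host itself and any subdomain; "*.x.com" is treated as "x.com".
function hostMatches(host: string, entry: string): boolean {
  const e = entry.toLowerCase().replace(/^\*\./, "");
  return host === e || host.endsWith(`.${e}`);
}

// Denylist wins; an empty allowlist allows every non-denied host.
function isNavigationAllowed(rawUrl: string, allow: string[], deny: string[]): boolean {
  let host: string;
  try {
    host = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return false; // unparseable URLs are rejected
  }
  if (deny.some((d) => hostMatches(host, d))) return false;
  return allow.length === 0 || allow.some((a) => hostMatches(host, a));
}

// Hypothetical blocklist for restricted browser_evaluate_js scripts.
const RESTRICTED_PATTERNS: Array<[RegExp, string]> = [
  [/\bfetch\s*\(/, "network access (fetch)"],
  [/\bXMLHttpRequest\b/, "network access (XMLHttpRequest)"],
  [/\bdocument\.cookie\b/, "cookie access"],
  [/\b(localStorage|sessionStorage)\b/, "web storage access"],
  [/\beval\s*\(|\bnew\s+Function\s*\(/, "dynamic code evaluation"],
];

function checkScript(source: string, unsafe = false): { ok: true } | { ok: false; reason: string } {
  if (unsafe) return { ok: true };
  for (const [re, reason] of RESTRICTED_PATTERNS) {
    if (re.test(source)) return { ok: false, reason };
  }
  return { ok: true };
}

interface MemorySample {
  usedBytes: number;
  totalBytes: number;
}

// System-wide sample; a real guard might also watch Chromium process memory.
function systemMemory(): MemorySample {
  return { usedBytes: os.totalmem() - os.freemem(), totalBytes: os.totalmem() };
}

// Block new navigations once usage crosses the critical ratio.
function memoryCritical(sample: MemorySample, criticalRatio = 0.9): boolean {
  return sample.usedBytes / sample.totalBytes >= criticalRatio;
}
```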

Integration Tests

Relevant integration coverage:

  • tests/integration/browser-tools.test.ts

  • tests/integration/huntrix-golden-e2e.test.ts

  • tests/integration/ai-orchestrator-live.test.ts

  • tests/integration/ai-orchestrate-mcp-client.test.ts

  • tests/integration/youtube-search.test.ts

The integration tests validate:

  • ai-orchestrator-live: High-level prompt execution with JSON extraction (requires Ollama; opt in via RUN_LIVE_AI_ORCHESTRATOR_TEST=true)

  • browser-tools: Screenshot waits for dynamic content (YouTube SPA hydration)

  • DOM extraction: 5-phase system (chunking, reconciliation, consensus, caching, thresholds)

Client Integration & Production Simulation

For general browser automation goals, drive the browser step-by-step from your own orchestration service; ai_orchestrate_workflow is optimized specifically for YouTube metadata extraction.

1. General Agent Loop Client Script

We provide a production-ready simulation client script in examples/general-agent-client.ts that demonstrates how to:

  • Connect to the agent-eyes MCP server over stdio transport.

  • Query a local Ollama model (e.g. gemma4:e4b-it-q4_K_M) using simplified HTML context and action history.

  • Run actions step-by-step using MCP tools.
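The loop described above can be sketched with the MCP call and the model call injected as functions, so it is testable without a browser. This is not the contents of examples/general-agent-client.ts: the prompt wording, the JSON action schema, and the 8,000-character page budget are hypothetical. ollamaAsk uses Ollama's REST endpoint POST /api/generate with stream: false and format: "json", which returns the completion in the response field.

```typescript
// One decision from the model: call a tool, or finish with an answer.
type AgentAction =
  | { type: "tool"; name: string; arguments: Record<string, unknown> }
  | { type: "done"; answer: string };

type CallTool = (name: string, args: Record<string, unknown>) => Promise<string>;
type AskModel = (prompt: string) => Promise<string>;

// Ask a local Ollama model for a single non-streamed JSON reply.
function ollamaAsk(model: string, baseUrl = "http://localhost:11434"): AskModel {
  return async (prompt) => {
    const res = await fetch(`${baseUrl}/api/generate`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt, stream: false, format: "json" }),
    });
    if (!res.ok) throw new Error(`Ollama HTTP ${res.status}`);
    const body = (await res.json()) as { response: string };
    return body.response;
  };
}

const CONTENT_ARGS = { format: "simplified-html", includeInteractiveMap: true };

async function runAgentLoop(
  goal: string,
  callTool: CallTool,
  ask: AskModel,
  maxSteps = 10,
): Promise<{ answer: string | null; history: string[] }> {
  const history: string[] = [];
  let page = await callTool("browser_get_content", CONTENT_ARGS);
  for (let step = 0; step < maxSteps; step++) {
    const prompt = [
      `Goal: ${goal}`,
      `History:\n${history.join("\n") || "(none)"}`,
      `Page (truncated):\n${page.slice(0, 8000)}`, // hypothetical context budget
      'Reply with JSON: {"type":"tool","name":...,"arguments":{...}} or {"type":"done","answer":...}',
    ].join("\n\n");
    const action = JSON.parse(await ask(prompt)) as AgentAction;
    if (action.type === "done") return { answer: action.answer, history };
    const args = action.arguments ?? {};
    history.push(`${action.name} ${JSON.stringify(args)}`);
    await callTool(action.name, args);
    // Refresh the simplified page after every action so the model sees the new state.
    page = await callTool("browser_get_content", CONTENT_ARGS);
  }
  return { answer: null, history };
}
```

In a real client, callTool would wrap client.callTool from the MCP SDK and return the text item of the result, and ask would be ollamaAsk(process.env.OLLAMA_MODEL ?? "gemma4:e4b-it-q4_K_M").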

2. Running Simulations

You can evaluate the system on any prompt goal using:

# Run with default goal (VNExpress news summary)
npx tsx examples/general-agent-client.ts

# Run with a custom prompt
npx tsx examples/general-agent-client.ts "search what is the best trend in github.com"

3. Capturing Debug Screenshots

For easier debugging, you can enable step-by-step browser screenshots by appending the --screenshot CLI option. This will capture viewports at each step using the browser_screenshot tool and write them to a local debug-screenshots/ directory:

npx tsx examples/general-agent-client.ts "search what is the current stock for VCB (HOSE vn)" --screenshot
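Saving one image per step could be done as below. The step-NN-tool.png naming scheme and the function name are hypothetical; the check relies on the fixed 8-byte PNG signature (89 50 4E 47 0D 0A 1A 0A) to reject data that is not a PNG before writing it.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Write a base64 PNG as e.g. debug-screenshots/step-03-browser_navigate.png.
function saveStepScreenshot(dir: string, step: number, toolName: string, base64Png: string): string {
  const bytes = Buffer.from(base64Png, "base64");
  if (bytes.length < 8 || PNG_SIGNATURE.some((b, i) => bytes[i] !== b)) {
    throw new Error("screenshot data is not a PNG");
  }
  fs.mkdirSync(dir, { recursive: true });
  // Keep file names portable by replacing unusual characters in the tool name.
  const safeTool = toolName.replace(/[^a-z0-9_-]/gi, "_");
  const file = path.join(dir, `step-${String(step).padStart(2, "0")}-${safeTool}.png`);
  fs.writeFileSync(file, bytes);
  return file;
}

// Only capture when the --screenshot flag was passed on the command line.
const wantScreenshots = process.argv.includes("--screenshot");
```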

3.1 Example: CNN News Summary (10 items)

Example command:

npx tsx examples/general-agent-client.ts "go to cnn.com and summary for me list 10 of news" --screenshot

Expected behavior:

  • Agent navigates to https://edition.cnn.com/

  • Runs browser_analyze_page and extracts structured items[] with source URLs/images

  • Returns exactly 10 news summaries with source links

Sample output excerpt:

🎉 Goal Reached!

Based on the visible content of CNN.com from June 28, 2026, here are 10 news summaries visible on CNN.com:

1. Germany's new Nazi party databases are challenging decades-held sanitized family narratives
  Source: https://edition.cnn.com/world/germans-nazi-past-far-right-intl

2. Israel's military and tech industry race to counter Hezbollah's latest threat
  Source: https://edition.cnn.com/world/middleeast/israel-tech-hezbollah-drone-threat-intl

3. The US and Iran have a deal on paper. At sea, the Strait of Hormuz is 'chaotic'
  Source: https://edition.cnn.com/world/iran-us-hormuz-agreement-mou-intl

4. Live updates: Death toll climbs to over 1,400 in Venezuela quakes
  Source: https://edition.cnn.com/world/live-news/venezuela-earthquake-hnk

3.2 Example: YouTube Video Metadata (likes, views, release date)

Example command:

npx tsx examples/general-agent-client.ts "search Golden Official, to see that video has how many like, view and release date" --screenshot

Expected behavior:

  • Agent runs YouTube-focused search via browser_search with site="youtube"

  • Navigates to the selected watch URL

  • Uses browser_analyze_page on the watch page to extract metadata

  • Returns video title, views, release date, and likes (if visible)

Sample output excerpt:

🎉 Goal Reached!

Based on the provided YouTube page content for "Golden" Official Lyric Video | KPop Demon Hunters | Sony Animation:

- Video Title: “Golden” Official Lyric Video | KPop Demon Hunters | Sony Animation
- Views: 1,531,178,811 views
- Release Date: Jun 23, 2025
- Likes: The explicit number of likes is not visible in the provided focused content.

3.3 Example: OpenAI vs Claude News + Benchmark Table

Example command:

npx tsx examples/general-agent-client.ts "list me 5 news about AI models from OpenAI and Claude, which one is better, I need some benkmark or even the table for easy compare features" --screenshot

Expected behavior:

  • Agent starts from Google search and gathers relevant comparison sources

  • Navigates into a detailed article and runs browser_analyze_page

  • Returns a structured comparison with benchmark values and an easy-to-read table

  • Includes a practical recommendation on “which one is better” by use case

Sample output excerpt:

🎉 Goal Reached!

Comparative Overview (Benchmarks Table)

| Category | GPT-5.4 (OpenAI) | Claude Opus 4.6 (Anthropic) | Gemini 3.1 Pro | Grok 4 |
| :--- | :--- | :--- | :--- | :--- |
| Coding | Strong (74.9% SWE-bench) | Strong (74%+; powers Cursor) | Good (63.8%, 1M context) | Leader (75%) |
| Reasoning | 92.8% GPQA | 91.3% GPQA | Leader (94.3% GPQA) | Competitive |
| Writing | Good (Canvas editor) | Leader (128K output, natural prose) | Good (Docs integration) | Uncensored style |
| API Price (in/out per 1M) | $2.50/$15 | $15/$75 (Opus), $3/$15 (Sonnet) | $2/$12 | $2/$15 |

Summary on which is better:
- Claude Opus 4.6 is stronger for long-form writing and natural prose.
- OpenAI GPT-5.4 remains a strong all-rounder with broad ecosystem support.

3.4 Example: Company Information Discovery (Alphaus Cloud)

Example command:

npx tsx examples/general-agent-client.ts "help me to find information about Alphaus Cloud company" --screenshot

Expected behavior:

  • Agent starts with browser_search on Google to gather initial company sources

  • Navigates to the official website (https://www.alphaus.cloud/en)

  • Runs browser_analyze_page and refreshes stale skill selectors when needed

  • Returns a structured company profile: mission, credentials, products, and services

Sample output excerpt:

🎉 Goal Reached!

Based on the Alphaus Cloud landing page, here is detailed information about the company:

Company overview and mission:
- Alphaus Cloud focuses on cloud cost intelligence and FinOps-driven optimization.

Key credentials:
- Claims No. 1 position in Japan with over $200M+ cloud resources managed annually.
- Preferred FinOps solution by 25% of AWS Premier Partners in Japan.
- Reports 3000+ monthly active users on Ripple.

Core products and services:
- Octo: cost visibility and optimization for cloud users.
- Ripple: billing automation and reseller-focused cloud cost management.
- Professional Services: FinOps implementation and cross-team enablement.

3.5 Example: Localized Product + Price Discovery (Lynk & Co 08 in HCMC)

Example command:

npx tsx examples/general-agent-client.ts "find information about Lynk & co 08 in Vietnam? and current price for that car in Ho Chi Minh?" --screenshot

Expected behavior:

  • Agent runs localized search with Vietnamese keywords via browser_search

  • Uses focused content to extract product profile and local on-road pricing

  • Returns model details and city-specific price estimates for Ho Chi Minh City

Sample output excerpt:

🎉 Goal Reached!

General information on Lynk & Co 08:
- D-segment SUV, PHEV (1.5 Turbo + electric motor)
- EV range up to 200 km, total range over 1,400 km

Current price in Ho Chi Minh City (HCMC):
- Lynk & Co 08 Pro:
  - Factory list price: 1,299,000,000 VND
  - Estimated on-road price (HCMC): 1.43-1.45 billion VND
- Lynk & Co 08 Halo:
  - Factory list price: 1,389,000,000 VND
  - Estimated on-road price (HCMC): 1.53-1.55 billion VND

4. Integration Code Pattern

To call this MCP server from your own Node.js service, use the following code pattern:

import * as fs from 'node:fs';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Launch the Agent Eye server as a child process and communicate over stdio
const transport = new StdioClientTransport({
  command: 'npx',
  args: ['tsx', 'src/index.ts'],
  cwd: '/path/to/agent-eyes',
});

const client = new Client({ name: 'my-orchestrator', version: '1.0.0' }, { capabilities: {} });

await client.connect(transport);

// 1. Navigate
await client.callTool({
  name: 'browser_navigate',
  arguments: { url: 'https://vnexpress.net' },
});

// 2. Get simplified page content for LLM context
const contentResult = await client.callTool({
  name: 'browser_get_content',
  arguments: { format: 'simplified-html', includeInteractiveMap: true },
});
const textContent = (contentResult as any).content?.find((c: any) => c.type === 'text')?.text;
if (typeof textContent !== 'string') {
  throw new Error('browser_get_content returned no text content');
}
const payload = JSON.parse(textContent);
console.log(payload.content); // Simplified HTML content
console.log(payload.interactiveMap); // Actionable ID mapping

// 3. Interact (e.g. click element with data-agent-id="34")
await client.callTool({
  name: 'browser_interact',
  arguments: { action: 'click', target: '34' },
});

// 4. Capture screenshot for debugging
const screenshotResult = await client.callTool({
  name: 'browser_screenshot',
  arguments: { fullPage: false },
});
const imageItem = (screenshotResult as any).content?.find((c: any) => c.type === 'image');
if (imageItem?.data) {
  // imageItem.data contains raw base64 PNG data
  fs.writeFileSync('debug-step.png', Buffer.from(imageItem.data, 'base64'));
}

await client.close();
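The (x as any) casts above can be replaced with small typed helpers. The local types mirror the MCP result shape (a content array of text or image items, plus an optional isError flag); they are written out here for illustration rather than imported from the SDK, and the helper names are hypothetical.

```typescript
// Local mirrors of the MCP content item kinds used in this README.
type TextContent = { type: "text"; text: string };
type ImageContent = { type: "image"; data: string; mimeType: string };
type ContentItem = TextContent | ImageContent | { type: string };

interface ToolResult {
  content?: ContentItem[];
  isError?: boolean;
}

function firstText(result: ToolResult): string | undefined {
  return result.content?.find((c): c is TextContent => c.type === "text")?.text;
}

function firstImage(result: ToolResult): ImageContent | undefined {
  return result.content?.find((c): c is ImageContent => c.type === "image");
}

// Parse a JSON text payload, failing loudly on tool errors or missing text.
function parseJsonResult<T>(result: ToolResult): T {
  const text = firstText(result);
  if (result.isError) throw new Error(`tool error: ${text ?? "(no message)"}`);
  if (text === undefined) throw new Error("tool returned no text content");
  return JSON.parse(text) as T;
}
```

With the client above, step 2 could then read: const payload = parseJsonResult<{ content: string; interactiveMap: unknown }>(contentResult as ToolResult);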