Provides AI orchestration and LLM-based DOM extraction using a local Ollama model. Optimized for extracting metadata from YouTube pages, with integration tests and a specialized workflow.

agent-eyes

by henry0hai

TypeScript

Local

Agent Eye

Give AI agents eyes on the web. Agent Eye is an MCP server that wraps Playwright for local browser search, navigation, extraction, and interaction. It features intelligent LLM-based DOM extraction with multi-pass reconciliation, consensus voting, DOM caching, and field-specific confidence thresholds.

What Is Implemented

Browser navigation with persistent or incognito modes
Search, interact, screenshot, content extraction, JS evaluation tools
5-Phase LLM-based DOM extraction (chunking, reconciliation, consensus voting, caching, field thresholds)
Goal Planner preamble — sanitizes the user goal, extracts entities, selects a browsing strategy, and decomposes into atomic sub-tasks before the step loop starts
News-gathering hardening — quality gates block premature completion, URL-confidence filtering prefers real article links, and deterministic recovery (scroll + listing fallback) gathers missing items
AI workflow orchestration tool powered by Ollama
Session persistence with cookie save/load per profile
Domain allowlist/denylist checks
Memory pressure protection
.env support via dotenv bootstrap at startup
Pre-commit git hooks (Husky) for test validation
AGPL-3.0 license (free for non-commercial/personal use)

Requirements

Node >= 22.14.0
Playwright Chromium installed

Setup

Install dependencies

npm install

Install Chromium

npx playwright install chromium

Configure environment

cp .env.sample .env
# or: cp .env.example .env

Run

Development:

npm run dev

Build + start:

npm run build
npm run start

Type check:

npm run typecheck

Tests:

npm run test

Environment Variables

Core:

AGENT_EYE_HEADLESS=true|false
AGENT_EYE_NAV_TIMEOUT_MS=30000
AGENT_EYE_MAX_CHARS=50000
AGENT_EYE_PROFILE_ROOT=~/.agent-eye/profiles
AGENT_EYE_MODEL=deep-seek|llama2|default (content-budget profile for browser_get_content, not Ollama model name)
AGENT_EYE_SKILL_STALE_FOCUSED_CHARS=400 (refresh skill if cached focused content is too small)
AGENT_EYE_SKILL_STALE_SELECTOR_MATCHES=1 (refresh skill if cached selectors barely match DOM)
AGENT_EYE_SKILL_STALE_FAILURE_THRESHOLD=2 (refresh skill after repeated selector failures)

Ollama:

OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=gemma4:e4b-it-q4_K_M
OLLAMA_NUM_CTX=16384 (shared across all calls)
OLLAMA_KEEP_ALIVE=10m

Recommended with your local model:

Keep AGENT_EYE_MODEL=default
Set OLLAMA_MODEL to your installed model tag (for example gemma4:e4b-it-q4_K_M)
Keep OLLAMA_NUM_CTX stable for all calls to prevent duplicate model instances in memory
Use OLLAMA_KEEP_ALIVE=10m (or higher) for smoother multi-step and chunked-answer runs

Note: OLLAMA_MODEL from .env is loaded automatically because src/index.ts imports dotenv/config before server startup.
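The variables above could be read into a typed config object along these lines. This is a minimal sketch, not the project's actual loader: `AgentEyeConfig`, `intFromEnv`, and `loadAgentEyeConfig` are hypothetical names, and the `headless` default of `true` is an assumption. The other defaults are the values listed above.

```typescript
// Hypothetical typed view of the documented environment variables.
interface AgentEyeConfig {
  headless: boolean;
  navTimeoutMs: number;
  maxChars: number;
  ollamaBaseUrl: string;
  ollamaModel: string;
  ollamaNumCtx: number;
  ollamaKeepAlive: string;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer variable, falling back when it is unset or empty.
function intFromEnv(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === '') return fallback;
  const parsed = Number.parseInt(raw, 10);
  if (!Number.isFinite(parsed) || parsed <= 0) {
    throw new Error(`${key} must be a positive integer, got "${raw}"`);
  }
  return parsed;
}

function loadAgentEyeConfig(env: Env): AgentEyeConfig {
  return {
    // Assumption: headless unless explicitly set to "false".
    headless: (env['AGENT_EYE_HEADLESS'] ?? 'true').toLowerCase() !== 'false',
    navTimeoutMs: intFromEnv(env, 'AGENT_EYE_NAV_TIMEOUT_MS', 30000),
    maxChars: intFromEnv(env, 'AGENT_EYE_MAX_CHARS', 50000),
    ollamaBaseUrl: env['OLLAMA_BASE_URL'] ?? 'http://localhost:11434',
    ollamaModel: env['OLLAMA_MODEL'] ?? 'gemma4:e4b-it-q4_K_M',
    // One shared context size for every call, as recommended above.
    ollamaNumCtx: intFromEnv(env, 'OLLAMA_NUM_CTX', 16384),
    ollamaKeepAlive: env['OLLAMA_KEEP_ALIVE'] ?? '10m',
  };
}
```

In a Node process this would typically be called once as `loadAgentEyeConfig(process.env)` after dotenv has run, so that every Ollama call shares the same `ollamaNumCtx`.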

MCP Tools

browser_navigate
browser_get_content
browser_search
browser_interact
browser_screenshot (with intelligent SPA hydration waits)
browser_evaluate_js
browser_analyze_page (with skill cache + stale-skill auto-refresh)
browser_extract_section (token-efficient targeted extraction)
browser_extract_structured (LLM-based structured extraction with templates: showtime, news_article, product, search_result, generic)
browser_ocr_chunk (visual fallback when selector extraction is weak)
browser_get_network_logs
ai_orchestrate_workflow
ai_plan_goal — pre-flight goal planner: sanitize → extract entities → select strategy → decompose into sub-tasks

AI Orchestration

ai_orchestrate_workflow

Accepts a high-level goal and drives a deterministic YouTube-metadata extraction workflow. Built-in 5-phase LLM extraction:

Phase 1: Chunked per-field extraction with confidence scoring
Phase 2: Global reconciliation pass for conflict resolution
Phase 3: Consensus voting across multiple runs for reliability
Phase 4: DOM fingerprint caching to skip redundant extraction
Phase 5: Field-specific confidence thresholds for domain-aware validation

See EXTRACTION_PHASES.md for detailed architecture.
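To make Phase 3 and Phase 5 concrete, here is a rough sketch of majority voting across runs followed by a per-field confidence threshold. It is not taken from the project's extractor: the types, function names, tie-breaking rule (higher mean confidence wins), and default threshold of 0.5 are all assumptions for illustration.

```typescript
// Hypothetical shape of one extracted field in a single run.
interface FieldCandidate {
  value: string;
  confidence: number; // 0..1
}
type ExtractionRun = Record<string, FieldCandidate>;

// Consensus voting: most frequent value wins; ties go to higher mean confidence.
function voteOnField(candidates: FieldCandidate[]): FieldCandidate | undefined {
  const tally = new Map<string, { votes: number; totalConfidence: number }>();
  for (const candidate of candidates) {
    const entry = tally.get(candidate.value) ?? { votes: 0, totalConfidence: 0 };
    entry.votes += 1;
    entry.totalConfidence += candidate.confidence;
    tally.set(candidate.value, entry);
  }
  let best: FieldCandidate | undefined;
  let bestVotes = 0;
  for (const [value, entry] of Array.from(tally.entries())) {
    const confidence = entry.totalConfidence / entry.votes;
    const winsTie =
      entry.votes === bestVotes && best !== undefined && confidence > best.confidence;
    if (entry.votes > bestVotes || winsTie) {
      best = { value, confidence };
      bestVotes = entry.votes;
    }
  }
  return best;
}

// Field thresholds: keep a winning value only if it clears that field's bar.
function reconcileRuns(
  runs: ExtractionRun[],
  thresholds: Record<string, number>,
  defaultThreshold = 0.5,
): Record<string, string> {
  const fields = new Set<string>();
  for (const run of runs) Object.keys(run).forEach((field) => fields.add(field));

  const accepted: Record<string, string> = {};
  for (const field of Array.from(fields)) {
    const candidates = runs
      .map((run) => run[field])
      .filter((c): c is FieldCandidate => c !== undefined);
    const winner = voteOnField(candidates);
    const threshold = thresholds[field] ?? defaultThreshold;
    if (winner !== undefined && winner.confidence >= threshold) {
      accepted[field] = winner.value;
    }
  }
  return accepted;
}
```

A field that never clears its threshold is simply absent from the result, which mirrors the "likes not visible" behavior in the YouTube example later in this README.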

Input: goal, context, maxSteps, model
Output: success, steps_executed, history, result, error

Use cases: AI_ORCHESTRATION_USE_CASES.md

ai_plan_goal — Goal Planner

Runs before the browser step loop to reduce per-step LLM pressure on local 9B-class models. Makes three focused LLM calls:

Sanitize — correct spelling/grammar in the raw goal; extract entities (movies, cinemas, cities, dateRange)
Strategy — choose one of four strategies: movie-first, location-first, search-first, direct-navigate
Decompose — emit ordered SubTask[] with stepBudget, successCriteria, suggestedTools

Step budget scale:

| Budget | Meaning |
| :--- | :--- |
| 1 | Compile / summarize data already in memory |
| 2 | Navigate to a known URL or single click |
| 3 | Navigate + analyze page structure |
| 4 | Interact with a dynamic widget + read AJAX result |
| 5 | Iterate over 3–5 pages or a complex widget |
| 6 | Iterate over 6–10 pages (e.g., 10 article visits) |
| 7 | Complex multi-step workflow with retries |
| 8 | Long iterative data collection (>10 pages) |

Input: goal, model, allowPreflightSearch, replanCount, completedSubTaskIds
Output: GoalPlan — sanitizedGoal, entities, strategy, strategyRationale, subTasks[]
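The planner output can be pictured with the following types. The field names come from the lists above; the concrete TypeScript types, the `id` and `description` fields on `SubTask`, and the two helper functions are assumptions made for this sketch, and the example plan is illustrative rather than real planner output.

```typescript
type Strategy = 'movie-first' | 'location-first' | 'search-first' | 'direct-navigate';

interface SubTask {
  id: string;           // assumed; the input accepts completedSubTaskIds
  description: string;  // assumed free-text description of the sub-task
  stepBudget: number;   // 1-8, per the step budget scale
  successCriteria: string;
  suggestedTools: string[];
}

interface GoalPlan {
  sanitizedGoal: string;
  entities: {
    movies: string[];
    cinemas: string[];
    cities: string[];
    dateRange: string | null; // assumed representation
  };
  strategy: Strategy;
  strategyRationale: string;
  subTasks: SubTask[];
}

// First sub-task that has not been completed yet, in plan order.
function nextSubTask(plan: GoalPlan, completedSubTaskIds: string[]): SubTask | undefined {
  const done = new Set(completedSubTaskIds);
  return plan.subTasks.find((task) => !done.has(task.id));
}

// Total steps still available across unfinished sub-tasks.
function remainingStepBudget(plan: GoalPlan, completedSubTaskIds: string[]): number {
  const done = new Set(completedSubTaskIds);
  return plan.subTasks
    .filter((task) => !done.has(task.id))
    .reduce((sum, task) => sum + task.stepBudget, 0);
}

// Illustrative plan for a "summarize 10 news items" goal.
const examplePlan: GoalPlan = {
  sanitizedGoal: 'Go to vnexpress.net and summarize 10 news items',
  entities: { movies: [], cinemas: [], cities: [], dateRange: null },
  strategy: 'direct-navigate',
  strategyRationale: 'The goal names a specific site.',
  subTasks: [
    { id: 't1', description: 'Open the home page', stepBudget: 2,
      successCriteria: 'Home page loaded', suggestedTools: ['browser_navigate'] },
    { id: 't2', description: 'Collect 10 article links', stepBudget: 6,
      successCriteria: '10 items collected', suggestedTools: ['browser_analyze_page'] },
    { id: 't3', description: 'Summarize collected items', stepBudget: 1,
      successCriteria: 'Summary written', suggestedTools: [] },
  ],
};
```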

General Agent Hardening (News Workflows)

The example general agent loop (agents/general-agent-client.ts) now includes practical safeguards for news-list goals (for example, "summarize 10 news items"):

Completeness gate: blocks complete/advance_subtask when collected items are below an 80% threshold of requested count.
Smart URL quality filter: for news goals, extracted items are scored (high vs low article confidence). If enough high-confidence items exist, low-confidence hub/category URLs are excluded.
Deterministic recovery path: when shortfall is detected, the agent tries infinite scroll re-analysis first, then navigates to fallback listing URLs from learned skills/fixed paths.
Prompt smart-chunking for focused content: prompt context prefers semantic section boundaries over blind character slicing, improving downstream synthesis quality.
Post-run scoring: heuristic evaluator is the default production gate; optional LLM-judge exists for deeper semantic audits.
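The completeness gate and URL quality filter described above might look roughly like this. It is a sketch under assumptions, not the code in agents/general-agent-client.ts: the function names and the `NewsItem` shape are hypothetical, and "enough high-confidence items" is interpreted here as "at least as many as were requested".

```typescript
// Hypothetical item shape; the confidence label is assumed to be assigned upstream.
interface NewsItem {
  title: string;
  url: string;
  articleConfidence: 'high' | 'low';
}

// Completeness gate: allow completion only at or above 80% of the requested count.
function passesCompletenessGate(collected: number, requested: number, ratio = 0.8): boolean {
  if (requested <= 0) return true; // nothing specific was requested
  return collected >= requested * ratio;
}

// URL quality filter: drop low-confidence hub/category links only when the
// high-confidence items alone can satisfy the request.
function filterByUrlConfidence(items: NewsItem[], requested: number): NewsItem[] {
  const high = items.filter((item) => item.articleConfidence === 'high');
  return high.length >= requested ? high : items;
}
```

For a request of 10 items, `passesCompletenessGate(7, 10)` blocks completion while `passesCompletenessGate(8, 10)` allows it. Keeping low-confidence items when there are too few high-confidence ones leaves the recovery path something to work with.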

Important runtime behavior:

MCP tool timeouts in the general-agent loop are now handled as recoverable tool errors in the shared tool wrapper, so recovery flows do not crash the full session on a single timeout.

Session Persistence

Persistent profiles are stored under AGENT_EYE_PROFILE_ROOT.

Cookies are loaded when a persistent profile starts
Cookies are saved when the session manager closes
The profile name keeps separate session histories per workflow/site

Avoiding Bot Detection & Human Verification

Some websites employ aggressive bot detection (such as CAPTCHAs, Cloudflare walls, or login requirements). To prevent your agent from being blocked:

Warm the session: Run the warming script to manually open a non-headless browser window:
```
npm run warm-session
```
Solve challenges manually: Interact with the page (e.g. solve the CAPTCHA, log in, or accept cookies) in the opened browser window. This saves the verified session cookies to your persistent profile.
Run your agent: The agent will automatically reuse these saved cookies on subsequent runs to bypass the verification check. Note that cookies and sessions expire over time, so you may need to run npm run warm-session again if the agent starts encountering bot blocks.

Skill Management

Agent Eye features a project-local skill caching system that maps CSS selectors to page structures to avoid redundant LLM page-analysis queries.

1. How Skills Are Learned & Saved

When the agent visits a page and calls the browser_analyze_page tool, the system checks if a matching skill pattern already exists.
If no skill exists, the tool splits the page into chunks, summarizes them using the local LLM, and builds a structural PageMap (identifying main content vs noise selectors).
This PageMap is automatically saved as a JSON skill file at ./agent-skills/<domain>.json.
Consecutive page loads on the same domain increment the successCount of the skill.
Each pattern also tracks success/failure metadata so stale selectors can be detected and refreshed automatically.

2. Matching and URL Routing

Skill matching is entirely URL-driven (not prompt-driven).
When a page is analyzed, the system extracts the domain and matches specific path patterns using the first two directory segments (e.g., vnexpress.net/the-thao).
If no exact path-level pattern matches the current URL, the system falls back to the domain-level pattern (vnexpress.net).
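The lookup order can be sketched as a list of candidate keys, tried from most to least specific. This is an illustration only: `skillPatternKeys` and `matchSkill` are hypothetical names, the `www.` stripping is an assumption, and "first two directory segments" is interpreted here as up to two path segments after the domain, which may differ from the real matcher.

```typescript
// Candidate pattern keys for a URL, most specific first.
function skillPatternKeys(pageUrl: string): string[] {
  const url = new URL(pageUrl);
  const domain = url.hostname.replace(/^www\./, ''); // assumption: www is ignored
  const segments = url.pathname
    .split('/')
    .filter((segment) => segment.length > 0)
    .slice(0, 2);

  const keys: string[] = [];
  for (let depth = segments.length; depth >= 1; depth--) {
    keys.push(`${domain}/${segments.slice(0, depth).join('/')}`);
  }
  keys.push(domain); // domain-level fallback
  return keys;
}

// Return the first stored skill whose key matches, or undefined if none does.
function matchSkill<T>(pageUrl: string, skills: Map<string, T>): T | undefined {
  for (const key of skillPatternKeys(pageUrl)) {
    const skill = skills.get(key);
    if (skill !== undefined) return skill;
  }
  return undefined;
}
```

Because the key depends only on the URL, two different prompts that land on the same page resolve to the same skill.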

3. Managing Skills via CLI

You can inspect, refresh, or remove learned domain skills using the Skill CLI:

# List all learned domain skills
npx tsx agents/skill-cli.ts list

# View the full JSON pattern for a specific domain
npx tsx agents/skill-cli.ts show vnexpress.net

# Delete a skill (forces the agent to re-analyze page structure on next visit)
npx tsx agents/skill-cli.ts delete vnexpress.net

Alternatively, you can force the agent to overwrite an existing skill dynamically by calling browser_analyze_page with the "forceReanalyze": true argument.

4. Stale Skill Auto-Refresh

When a cached pattern returns very low focused content and poor selector matches, the system treats the skill as stale.
The stale pattern is marked with a failure increment, then browser_analyze_page immediately falls back to fresh analysis in the same call.
For general-agent loops, repeated low-quality browser_extract_section results trigger a deterministic escalation:
1. Force one browser_analyze_page refresh
2. If still failing, escalate to browser_ocr_chunk
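A compact way to picture the staleness check and the escalation ladder is shown below. The default numbers are the AGENT_EYE_SKILL_STALE_* values from the Environment Variables section; the comparison operators, the type and function names, and the choice of two consecutive low-quality results as "repeated" are assumptions for this sketch.

```typescript
interface StaleThresholds {
  focusedChars: number;     // AGENT_EYE_SKILL_STALE_FOCUSED_CHARS
  selectorMatches: number;  // AGENT_EYE_SKILL_STALE_SELECTOR_MATCHES
  failureThreshold: number; // AGENT_EYE_SKILL_STALE_FAILURE_THRESHOLD
}

const DEFAULT_STALE_THRESHOLDS: StaleThresholds = {
  focusedChars: 400,
  selectorMatches: 1,
  failureThreshold: 2,
};

// Hypothetical measurements taken when a cached skill is applied to a page.
interface SkillProbe {
  focusedContentLength: number;
  selectorMatchCount: number;
  failureCount: number;
}

// Stale when content is tiny AND selectors barely match, or after repeated failures.
function looksStale(probe: SkillProbe, t: StaleThresholds = DEFAULT_STALE_THRESHOLDS): boolean {
  const weakResult =
    probe.focusedContentLength < t.focusedChars &&
    probe.selectorMatchCount <= t.selectorMatches;
  return weakResult || probe.failureCount >= t.failureThreshold;
}

type Escalation = 'none' | 'reanalyze' | 'ocr';

// Escalation ladder: one forced re-analysis first, then the OCR fallback.
function nextEscalation(consecutiveLowQuality: number, alreadyReanalyzed: boolean): Escalation {
  if (consecutiveLowQuality < 2) return 'none'; // assumption: "repeated" means >= 2
  return alreadyReanalyzed ? 'ocr' : 'reanalyze';
}
```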

Security & Quality

Code Quality:

Pre-commit git hooks (Husky) validate all tests before commit
Enable with npm run prepare
Hook lives at .husky/pre-commit and runs npm test

License:

AGPL-3.0: Free for non-commercial and personal use
Commercial use requires explicit permission

Extraction Safety:

LLM confidence scoring prevents low-reliability extractions
Field-specific thresholds apply per-field validation
Evidence tracking for debugging extraction failures
browser_evaluate_js restricted mode blocks dangerous API patterns unless explicitly unsafe
Optional domain security blocks navigation to disallowed domains
Memory guard blocks new navigations when critical threshold is reached
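As an illustration of the domain security check, the sketch below tests a URL against an allowlist and denylist. The `DomainPolicy` shape, the function names, the subdomain matching rule, and the choice that deny rules override allow rules are all assumptions, not the server's documented configuration format.

```typescript
interface DomainPolicy {
  allow?: string[]; // empty or missing means "allow everything not denied"
  deny?: string[];
}

// A rule matches the exact host or any subdomain of it.
function hostMatchesRule(host: string, rule: string): boolean {
  const normalized = rule.toLowerCase();
  return host === normalized || host.endsWith(`.${normalized}`);
}

function isNavigationAllowed(targetUrl: string, policy: DomainPolicy): boolean {
  let host: string;
  try {
    host = new URL(targetUrl).hostname.toLowerCase();
  } catch {
    return false; // unparseable URLs are rejected
  }
  // Deny rules take precedence over allow rules.
  if ((policy.deny ?? []).some((rule) => hostMatchesRule(host, rule))) return false;
  const allow = policy.allow ?? [];
  return allow.length === 0 || allow.some((rule) => hostMatchesRule(host, rule));
}
```

Matching on a leading dot (`.cnn.com`) rather than a plain suffix keeps a look-alike host such as `evilcnn.com` from passing a `cnn.com` rule.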

Tests

Unit & Integration Tests

npm run test

Relevant test files:

tests/ai/goal-planner.test.ts — GoalPlanner unit tests (12 tests, all mocked, no Ollama required)
tests/ai/skill-manager-matching.test.ts — URL-aware skill matching
tests/ai/page-analyzer-dom.test.ts — DOM candidate selector extraction
tests/ai/structured-items.test.ts — structured item extraction with lazy-load images
tests/dom/llm-dom-extractor-phases.test.ts — 5-phase DOM extraction (16 tests)
tests/integration/browser-tools.test.ts — screenshot SPA hydration
tests/integration/huntrix-golden-e2e.test.ts — YouTube metadata end-to-end
tests/integration/ai-orchestrator-live.test.ts — live Ollama orchestration (opt in: RUN_LIVE_AI_ORCHESTRATOR_TEST=true)

Eval Tests (README 3.1–3.5)

End-to-end behavioral tests that spawn the full general-agent-client.ts loop and assert the output matches the README examples. Require Ollama + network access.

# Run all eval scenarios (CNN, YouTube, OpenAI/Claude, Alphaus, Lynk & Co)
RUN_EVAL_TESTS=true npx vitest run tests/eval/agent-eval.test.ts

# Run a single scenario
RUN_EVAL_TESTS=true npx vitest run tests/eval/agent-eval.test.ts -t "3.1"

# Use a different model
RUN_EVAL_TESTS=true OLLAMA_MODEL=llama3:8b npx vitest run tests/eval/agent-eval.test.ts

Covered scenarios:

| Test | Goal | Pass criteria |
| :--- | :--- | :--- |
| 3.1 | CNN 10 news | ≥10 numbered items, contains `cnn.com` URL |
| 3.2 | YouTube Golden metadata | Answer mentions views and/or release date |
| 3.3 | OpenAI vs Claude | Mentions OpenAI/GPT and Claude/Anthropic |
| 3.4 | Alphaus Cloud info | Contains company name + a product name |
| 3.5 | Lynk & Co 08 HCMC price | Contains car name + a VND/price value |
| Planner | VnExpress 10 news | Planner ran, ≥2 sub-tasks, answer >80 chars, no hallucinated selectors |
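Pass criteria like these are plain text heuristics. As a hedged illustration of scenario 3.1 only, a check could be written as follows; the function names and the regular expression are assumptions and are not copied from tests/eval/agent-eval.test.ts.

```typescript
// Count lines that start with "<number>. " followed by some text.
function countNumberedItems(answer: string): number {
  const matches = answer.match(/^\s*\d+\.\s+\S/gm);
  return matches ? matches.length : 0;
}

// Scenario 3.1: at least 10 numbered items and at least one cnn.com URL.
function passesCnnNewsCriteria(answer: string): boolean {
  return countNumberedItems(answer) >= 10 && /cnn\.com/i.test(answer);
}
```

A heuristic of this kind checks the shape of the answer, not whether the summaries are accurate, which is presumably why the optional LLM judge exists for deeper audits.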

Client Integration & Production Simulation

For general browser automation goals, you should drive the browser step-by-step from your own orchestration service (as ai_orchestrate_workflow is optimized specifically for YouTube metadata extraction).

1. General Agent Loop Client Script

We provide a production-ready simulation client script in agents/general-agent-client.ts that demonstrates how to:

Connect to the agent-eyes MCP server over stdio transport.
Run the GoalPlanner preamble once before the step loop: sanitize the goal, select a strategy, and emit ordered sub-tasks with step budgets.
Query a local Ollama model (e.g. gemma4:e4b-it-q4_K_M) per step — each step prompt shows only the active sub-task instead of the full raw goal, keeping the context small.
Advance sub-tasks via advance_subtask: true in the action JSON; re-plan automatically when a sub-task exceeds its budget.
Run actions step-by-step using MCP tools.

2. Running Simulations

You can evaluate the system on any prompt goal using:

# Run with default goal (VnExpress news summary)
npx tsx agents/general-agent-client.ts

# Run with a custom prompt
npx tsx agents/general-agent-client.ts "search what is the best trend in github.com"

3. Capturing Debug Screenshots

For easier debugging, you can enable step-by-step browser screenshots by appending the --screenshot CLI option. This will capture viewports at each step using the browser_screenshot tool and write them to a local debug-screenshots/ directory:

npx tsx agents/general-agent-client.ts "search what is the current stock for VCB (HOSE vn)" --screenshot

3.1 Example: CNN News Summary (10 items)

Example command:

npx tsx agents/general-agent-client.ts "go to cnn.com and summary for me list 10 of news" --screenshot

Expected behavior:

Agent navigates to https://edition.cnn.com/
Runs browser_analyze_page and extracts structured items[] with source URLs/images
Returns exactly 10 news summaries with source links

Sample output excerpt:

🎉 Goal Reached!

Based on the visible content of CNN.com from June 28, 2026, here are 10 news summaries visible on CNN.com:

1. Germany's new Nazi party databases are challenging decades-held sanitized family narratives
  Source: https://edition.cnn.com/world/germans-nazi-past-far-right-intl

2. Israel's military and tech industry race to counter Hezbollah's latest threat
  Source: https://edition.cnn.com/world/middleeast/israel-tech-hezbollah-drone-threat-intl

3. The US and Iran have a deal on paper. At sea, the Strait of Hormuz is 'chaotic'
  Source: https://edition.cnn.com/world/iran-us-hormuz-agreement-mou-intl

4. Live updates: Death toll climbs to over 1,400 in Venezuela quakes
  Source: https://edition.cnn.com/world/live-news/venezuela-earthquake-hnk

3.2 Example: YouTube Video Metadata (likes, views, release date)

Example command:

npx tsx agents/general-agent-client.ts "search Golden Official, to see that video has how many like, view and release date" --screenshot

Expected behavior:

Agent runs YouTube-focused search via browser_search with site="youtube"
Navigates to the selected watch URL
Uses browser_analyze_page on the watch page to extract metadata
Returns video title, views, release date, and likes (if visible)

Sample output excerpt:

🎉 Goal Reached!

Based on the provided YouTube page content for "Golden" Official Lyric Video | KPop Demon Hunters | Sony Animation:

- Video Title: “Golden” Official Lyric Video | KPop Demon Hunters | Sony Animation
- Views: 1,531,178,811 views
- Release Date: Jun 23, 2025
- Likes: The explicit number of likes is not visible in the provided focused content.

3.3 Example: OpenAI vs Claude News + Benchmark Table

Example command:

npx tsx agents/general-agent-client.ts "list me 5 news about AI models from OpenAI and Claude, which one is better, I need some benkmark or even the table for easy compare features" --screenshot

Expected behavior:

Agent starts from Google search and gathers relevant comparison sources
Navigates into a detailed article and runs browser_analyze_page
Returns a structured comparison with benchmark values and an easy-to-read table
Includes a practical recommendation on “which one is better” by use case

Sample output excerpt:

🎉 Goal Reached!

Comparative Overview (Benchmarks Table)

| Category | GPT-5.4 (OpenAI) | Claude Opus 4.6 (Anthropic) | Gemini 3.1 Pro | Grok 4 |
| :--- | :--- | :--- | :--- | :--- |
| Coding | Strong (74.9% SWE-bench) | Strong (74%+; powers Cursor) | Good (63.8%, 1M context) | Leader (75%) |
| Reasoning | 92.8% GPQA | 91.3% GPQA | Leader (94.3% GPQA) | Competitive |
| Writing | Good (Canvas editor) | Leader (128K output, natural prose) | Good (Docs integration) | Uncensored style |
| API Price (in/out per 1M) | $2.50/$15 | $15/$75 (Opus), $3/$15 (Sonnet) | $2/$12 | $2/$15 |

Summary on which is better:
- Claude Opus 4.6 is stronger for long-form writing and natural prose.
- OpenAI GPT-5.4 remains a strong all-rounder with broad ecosystem support.

3.4 Example: Company Information Discovery (Alphaus Cloud)

Example command:

npx tsx agents/general-agent-client.ts "help me to find information about Alphaus Cloud company" --screenshot

Expected behavior:

Agent starts with browser_search on Google to gather initial company sources
Navigates to the official website (https://www.alphaus.cloud/en)
Runs browser_analyze_page and refreshes stale skill selectors when needed
Returns a structured company profile: mission, credentials, products, and services

Sample output excerpt:

🎉 Goal Reached!

Based on the Alphaus Cloud landing page, here is detailed information about the company:

Company overview and mission:
- Alphaus Cloud focuses on cloud cost intelligence and FinOps-driven optimization.

Key credentials:
- Claims No. 1 position in Japan with over $200M+ cloud resources managed annually.
- Preferred FinOps solution by 25% of AWS Premier Partners in Japan.
- Reports 3000+ monthly active users on Ripple.

Core products and services:
- Octo: cost visibility and optimization for cloud users.
- Ripple: billing automation and reseller-focused cloud cost management.
- Professional Services: FinOps implementation and cross-team enablement.

3.5 Example: Localized Product + Price Discovery (Lynk & Co 08 in HCMC)

Example command:

npx tsx agents/general-agent-client.ts "find information about Lynk & co 08 in Vietnam? and current price for that car in Ho Chi Minh?" --screenshot

Expected behavior:

Agent runs localized search with Vietnamese keywords via browser_search
Uses focused content to extract product profile and local on-road pricing
Returns model details and city-specific price estimates for Ho Chi Minh City

Sample output excerpt:

🎉 Goal Reached!

General information on Lynk & Co 08:
- D-segment SUV, PHEV (1.5 Turbo + electric motor)
- EV range up to 200 km, total range over 1,400 km

Current price in Ho Chi Minh City (HCMC):
- Lynk & Co 08 Pro:
  - Factory list price: 1,299,000,000 VND
  - Estimated on-road price (HCMC): 1.43-1.45 billion VND
- Lynk & Co 08 Halo:
  - Factory list price: 1,389,000,000 VND
  - Estimated on-road price (HCMC): 1.53-1.55 billion VND

4. Integration Code Pattern

To call this MCP server from your own Node.js service, use the following code pattern:

import * as fs from 'node:fs'; // needed for the screenshot write in step 4
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const transport = new StdioClientTransport({
  command: 'npx',
  args: ['tsx', 'src/index.ts'],
  cwd: '/path/to/agent-eyes',
});

const client = new Client({ name: 'my-orchestrator', version: '1.0.0' }, { capabilities: {} });

await client.connect(transport);

// 1. Navigate
await client.callTool({
  name: 'browser_navigate',
  arguments: { url: 'https://vnexpress.net' },
});

// 2. Get simplified page content for LLM context
const contentResult = await client.callTool({
  name: 'browser_get_content',
  arguments: { format: 'simplified-html', includeInteractiveMap: true },
});
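const textContent = (contentResult as any).content?.find((c: any) => c.type === 'text')?.text;
if (!textContent) {
  // Fail loudly instead of letting JSON.parse(undefined) throw a confusing error
  throw new Error('browser_get_content returned no text content');
}
const payload = JSON.parse(textContent);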
console.log(payload.content); // Simplified HTML content
console.log(payload.interactiveMap); // Actionable ID mapping

// 3. Interact (e.g. click element with data-agent-id="34")
await client.callTool({
  name: 'browser_interact',
  arguments: { action: 'click', target: '34' },
});

// 4. Capture screenshot for debugging
const screenshotResult = await client.callTool({
  name: 'browser_screenshot',
  arguments: { fullPage: false },
});
const imageItem = (screenshotResult as any).content?.find((c: any) => c.type === 'image');
if (imageItem?.data) {
  // imageItem.data contains raw base64 PNG data
  fs.writeFileSync('debug-step.png', Buffer.from(imageItem.data, 'base64'));
}

await client.close();
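The pattern above reads the text item out of a tool result with `as any` casts. A small typed helper can centralize that step. This is a sketch: `ToolContentItem`, `ToolResultLike`, and `parseJsonPayload` are hypothetical names that model only the parts of an MCP tool result used here (a `content` array of typed items and an optional `isError` flag), and it assumes the tool returns JSON in its first text item.

```typescript
// Minimal structural view of an MCP tool result, limited to what this helper reads.
interface ToolContentItem {
  type: string;
  text?: string;
  data?: string;
}

interface ToolResultLike {
  content?: ToolContentItem[];
  isError?: boolean;
}

// Find the first text item and parse it as JSON, failing with a clear message otherwise.
function parseJsonPayload<T>(result: ToolResultLike): T {
  if (result.isError) {
    throw new Error('Tool call reported an error');
  }
  const text = result.content?.find((item) => item.type === 'text')?.text;
  if (text === undefined) {
    throw new Error('Tool result has no text content item');
  }
  return JSON.parse(text) as T;
}
```

With this in place, step 2 could read `parseJsonPayload<{ content: string; interactiveMap: unknown }>(contentResult as ToolResultLike)`, where the payload type is again an assumption based on the two fields logged above.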
