What can you do with this server?

This server enables a local LLM agent to read and interact with web pages through a persistent browser session, combining URL-first reading tools with tab lifecycle management and action tools.

URL-First Reading Tools (one-shot, no persistent tab needed)

* fetch_snippet: Quickly fetch and preview the top portion of a webpage.
* fetch_urls: Extract all links as {text, url} pairs (deduplicated, absolute URLs).
* fetch_structure: Retrieve a page's heading outline to understand its organization.
* extract: Extract specific named fields as structured JSON, guided by a JSON Schema.
* summarize: Generate a prose summary of a webpage (supports a focus question).

Persistent Tab Lifecycle Tools (for interaction)

* open_tab: Open a URL in a persistent browser tab; returns a tab_id handle.
* read_tab: Read the current tab state in three modes: snippet (with [eN] element refs), urls, or structure.
* summarize_tab: Summarize the live current page of an open tab.
* extract_tab: Extract structured JSON from the live current page using a JSON Schema.
* close_tab: Close a specific persistent tab.
* close_all_tabs: Close all open persistent tabs at once.

Action Tools (interact with elements in a persistent tab)

* click: Click a link, button, or other element by its [eN] ref.
* type_into: Type text into a form field by its [eN] ref; optionally submit immediately.
* press_key: Simulate a key press (e.g., Enter, Tab, Escape) in the active tab.
* select_option: Choose an option from a dropdown by its ref and option label/value.

Note: Element refs ([eN]) are renumbered after every action, so read_tab must be called again before the next action. Persistent tabs are essential for pages not re-fetchable by URL (e.g., POST form results).

Which integrations are available for this server?

SearXNG: provides web search, so the agent can search the web and retrieve results.

How do I use camofox-browser MCP server (stage2)?

1. Click on "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@camofox-browser MCP server (stage2) search for 'what is MCP' on Wikipedia and read the introduction".

That's it! The server will respond to your query, and you can continue using it as needed.

camofox-browser MCP server (stage2)

by jfjensen

Python

Local

Local LLM agent that reads AND acts on web pages (camofox-browser + MCP)

Code for Part 6 of the Build Your Own Claude Code series: Acting on Web Pages with a Small Local LLM — Click, Type, Submit.

The series so far:

Part 1: the agent (CLI, tools, skills, history, compaction)
Part 2: the browser UI (FastAPI + WebSockets)
Part 3: web search (SearXNG via MCP)
Part 4: web browsing (camofox-browser via MCP) + composing with Part 3's search
Part 5: reading whole pages without truncation — chunked extract and summarize, plus a family of small single-purpose reader tools (fetch_snippet, fetch_urls, fetch_structure)
Part 6 (this repo): interaction — a persistent-tab lifecycle (open_tab / read_tab / close_tab) plus action tools (click, type_into, press_key, select_option, or a single interact tool, depending on config) so the agent can fill in a search box, click Next on a paginated list, or submit a form, not just read what is already on the page.

The problem this part solves: Part 5's reader tools are URL-first — each one opens a fresh tab, snapshots it, and closes it, so there is nothing to act on. Real tasks (search a site, page through results, fill out and submit a form) need a tab that stays open across calls and element references that survive long enough to click or type into. camofox keeps a tab alive until told to close it; this part adds the MCP plumbing on top — open a persistent tab, read its current state to find an element's [eN] ref, act on that ref, then re-read before the next action because every action renumbers the refs.

Install

```
git clone https://github.com/jfjensen/local-LLM-agent-mcp-interaction.git
cd local-LLM-agent-mcp-interaction
python -m venv .venv
# Linux / macOS:
source .venv/bin/activate
# Windows PowerShell:
.\.venv\Scripts\Activate.ps1
pip install -e .
```

You will also need:

Docker to run camofox-browser and SearXNG (Stage 1 brings up both).
Ollama with a tool-capable model. The default is qwen3.5:9b:
```
ollama pull qwen3.5:9b
```

SearXNG comes pre-configured (the settings file is mounted into the container), so the JSON API works without editing anything.

Configuration

All tunable settings live in config.toml at the repo root:

model name, temperature, thinking;
SearXNG and camofox URLs;
chunking and reading budgets (the page snapshot size is capped by camofox itself, not here — see the note in config.toml's [browser]);
[acting] — the new section for Part 6: action_tool_style picks between four intent-named action tools (click, type_into, press_key, select_option) and a single interact(action=...) tool; wait_timeout_ms / wait_for_network control how long the server waits for a page to settle after an action; action_result_snippet optionally attaches a head-snippet of the post-action page to the action's result;
log verbosity (logging.level = "DEBUG" shows each tool call, its arguments, and a preview of what came back).

To see what is currently active:

```
mcp-config-show
```

The loader looks for config.toml in the current working directory first, then falls back to the one bundled with the repo.
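
The lookup order described above can be sketched as follows. This is a minimal illustration, not the repo's loader: the function name find_config and the use of temporary directories to stand in for the working directory and the repo are assumptions.

```python
from pathlib import Path
import tempfile


def find_config(cwd: Path, bundled: Path) -> Path:
    """Return cwd/config.toml if it exists, otherwise the bundled file."""
    local = cwd / "config.toml"
    return local if local.is_file() else bundled


# Demonstration with temporary directories standing in for the real locations.
with tempfile.TemporaryDirectory() as work, tempfile.TemporaryDirectory() as repo:
    bundled = Path(repo) / "config.toml"
    bundled.write_text("# bundled defaults\n")

    first = find_config(Path(work), bundled)    # nothing local yet
    (Path(work) / "config.toml").write_text("# local override\n")
    second = find_config(Path(work), bundled)   # the local file now wins

    used_bundled_first = first == bundled
    used_local_second = second.parent == Path(work)
```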

How to run

| Stage | What it is | How to run |
| --- | --- | --- |
| 1 | Standing up camofox-browser and SearXNG via Docker | `cd stage1 && docker compose up -d` (see `stage1/README.md` for the camofox image build step) |
| 2 | The full agent: search plus the browser reader and action tools | `mcp-agent-stage2` |

The browser MCP server (mcp-browser-stage2) exposes three groups of tools.

URL-first readers (from Part 5; each opens a one-shot tab, reads it, closes it):

fetch_snippet — the head of a page, for a quick look.
fetch_urls — the page's links as {text, url} pairs, absolute.
fetch_structure — the heading outline.
extract — named fields as JSON, via chunk-and-merge over the whole page (no truncation).
summarize — a prose summary, built by combining per-chunk summaries (map-reduce by default, or refine).
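
To make the chunk-and-merge and map-reduce ideas above concrete, here is a small sketch under stated assumptions. The helper names (chunk_text, merge_fields, map_reduce_summary), the fixed-width character slicing, and the "first non-null value wins" merge rule are all hypothetical; the actual server may split on different boundaries and merge differently. The model call is replaced by a plain callable so the example runs without a model.

```python
from typing import Callable, Optional


def chunk_text(text: str, chunk_chars: int) -> list[str]:
    """Split text into consecutive pieces of at most chunk_chars characters."""
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]


def merge_fields(partials: list[dict[str, Optional[str]]]) -> dict[str, Optional[str]]:
    """Chunk-and-merge: keep the first non-null value seen for each field."""
    merged: dict[str, Optional[str]] = {}
    for partial in partials:
        for key, value in partial.items():
            if merged.get(key) is None:
                merged[key] = value
    return merged


def map_reduce_summary(chunks: list[str], summarize_one: Callable[[str], str]) -> str:
    """Map: summarize each chunk. Reduce: summarize the joined summaries."""
    partial_summaries = [summarize_one(chunk) for chunk in chunks]
    return summarize_one("\n".join(partial_summaries))


# Placeholder per-chunk results, as if two chunks each found one field.
chunks = chunk_text("abcdefghij", 4)
fields = merge_fields([
    {"mayor": None, "population": "1234"},
    {"mayor": "A. Example", "population": None},
])
summary = map_reduce_summary(["aaaa", "bbbb"], lambda text: text.upper())
```

Because no chunk is dropped, a field that appears only late in a long page can still be found, at the cost of one model call per chunk.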

Persistent-tab lifecycle and readers (new in Part 6; for when you need to act on a page, not just read it):

open_tab — opens a page in a tab that stays open and returns a tab_id.
read_tab — reads the CURRENT state of an open tab: mode="snippet" (default, shows the [eN] element refs you need for an action), "urls", or "structure".
summarize_tab / extract_tab — the same whole-page, no-truncation summarize / extract logic, applied to the live tab instead of a fresh URL fetch (use these after navigating or acting your way to a page that may not be re-fetchable by URL, such as a form's POST target).
close_tab — closes one tab; close_all_tabs — tears down every tab in the session (the agent calls this automatically on shutdown).

Action tools (new in Part 6; act on an [eN] ref from the latest read_tab). The surface shown to the model is picked by config.toml's [acting].action_tool_style:

"separate" (default) — four intent-named tools: click, type_into (optionally submit=True to press Enter afterwards), press_key, select_option (for native <select> dropdowns).
"interact" — one interact(tab_id, action, ...) tool with an action enum (click / type / press / select), mirroring camofox's own /act dispatcher.
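
The correspondence between the two styles can be pictured as a small lookup table. The sketch below is illustrative only: the tool names and action values come from the list above, while the helper to_interact_call and the parameter names ref and text are assumptions.

```python
# Mapping from the four intent-named tools to the interact action enum.
ACTION_FOR_TOOL = {
    "click": "click",
    "type_into": "type",
    "press_key": "press",
    "select_option": "select",
}


def to_interact_call(tool_name: str, tab_id: str, **params) -> dict:
    """Rewrite a "separate"-style call as arguments for the interact tool."""
    if tool_name not in ACTION_FOR_TOOL:
        raise ValueError(f"not an action tool: {tool_name}")
    return {"tab_id": tab_id, "action": ACTION_FOR_TOOL[tool_name], **params}


# Hypothetical argument names; only tab_id, action and submit appear in the text.
call = to_interact_call("type_into", "tab-1", ref="e4", text="what is MCP", submit=True)
```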

Every action re-snapshots the tab and reports the resulting url and a fresh refsCount: refs are renumbered after every action, so the rule is always read_tab → act → read_tab again before the next action.
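
The read_tab → act → read_tab rule can be demonstrated without a browser. In the sketch below, FakeTab is a stand-in that renumbers its refs after every click, and find_ref pulls an [eN] ref out of a snippet line. Both names, and the snippet layout they assume, are hypothetical; the real snippet format may differ.

```python
import re

REF_PATTERN = re.compile(r"\[(e\d+)\]")


def find_ref(snippet: str, label: str) -> str:
    """Return the [eN] ref on the first snippet line that mentions label."""
    for line in snippet.splitlines():
        match = REF_PATTERN.search(line)
        if match and label.lower() in line.lower():
            return match.group(1)
    raise LookupError(f"no ref found for {label!r}")


class FakeTab:
    """Stand-in for a persistent tab: each action renumbers the refs."""

    def __init__(self) -> None:
        self.offset = 1

    def read_tab(self) -> str:
        return f"[e{self.offset}] textbox Search\n[e{self.offset + 1}] button Go"

    def click(self, ref: str) -> None:
        if f"[{ref}]" not in self.read_tab():
            raise RuntimeError("stale ref")
        self.offset += 10  # simulate renumbering after the action


tab = FakeTab()
ref = find_ref(tab.read_tab(), "Go")    # read, then act on the fresh ref
tab.click(ref)
fresh = find_ref(tab.read_tab(), "Go")  # re-read: the same button has a new ref

try:
    tab.click(ref)                      # reusing the old ref fails
    stale = False
except RuntimeError:
    stale = True
```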

Plus the search server (mcp-search-part3), a copy of Part 3's SearXNG MCP server, so the repo is self-contained.

Probing MCP servers with `inspect_any.py`

inspect_any.py makes one tool call per process, which is fine for the URL-first readers but not for interaction (each process would get its own tab). For the lifecycle/action tools, use test_act_flow.py instead (see below).

```
# List a server's tools:
python inspect_any.py mcp_browser_02.main

# Call a reader tool:
python inspect_any.py mcp_browser_02.main fetch_snippet --kv url=https://example.com
python inspect_any.py mcp_browser_02.main fetch_urls --kv url=https://example.com

# Call extract with a JSON Schema (use --args or --args-file for nested args):
python inspect_any.py mcp_browser_02.main extract --args-file extract_args.json
```

Where extract_args.json might look like:

```
{
  "url": "https://en.wikipedia.org/wiki/Vleteren",
  "schema": {
    "type": "object",
    "properties": {
      "mayor": {"type": "string", "description": "The current mayor, from the infobox"},
      "postal_code": {"type": "string", "description": "The postal code"},
      "population": {"type": "string", "description": "The total population"}
    }
  }
}
```

Exercising the action tools with `test_act_flow.py`

Unlike inspect_any.py, this opens a single MCP stdio session and runs the whole interaction loop in one process, so the persistent tab stays warm across calls:

```
python test_act_flow.py
# or against a different target:
python test_act_flow.py https://en.wikipedia.org/wiki/Main_Page
```

It runs open_tab → read_tab → type_into(..., submit=True) → read_tab → a deliberate reuse of the now-stale ref (to demonstrate why refs must be re-read after every action) → close_tab. The default target is DuckDuckGo Lite, a tiny HTML page whose search box makes a clean before/after-submit comparison.

Probing camofox directly with `probe_camofox.py`

A lower-level probe that talks to camofox's HTTP API directly (no MCP layer), useful when something in the action tools misbehaves and you need to rule out the MCP server as the cause. It first checks which "open a tab" endpoint your camofox build actually answers to, then drives open → snapshot → type → submit → snapshot → a stale-ref check → close against a real public form:

```
python probe_camofox.py
# or against a different form:
python probe_camofox.py https://html.duckduckgo.com/html/
```
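
The endpoint check can be thought of as a simple fallback loop over the two routes named in Troubleshooting (POST /tabs, then POST /tabs/open). The sketch below is not the script's code: find_open_route is a hypothetical helper, the HTTP call is injected as a plain callable so the example runs offline, and the request payload shape is an assumption.

```python
from typing import Callable, Optional

# Routes named in the Troubleshooting section, tried in this order.
OPEN_ROUTES = ["/tabs", "/tabs/open"]


def find_open_route(
    post: Callable[[str, dict], dict], payload: dict
) -> Optional[tuple[str, str]]:
    """Return (route, tabId) for the first route whose reply carries a tabId."""
    for route in OPEN_ROUTES:
        try:
            reply = post(route, payload)
        except OSError:
            continue  # connection or HTTP error: try the next route
        tab_id = reply.get("tabId")
        if tab_id:
            return route, tab_id
    return None


def fake_post(route: str, payload: dict) -> dict:
    """Simulate a build that only answers on the older shim route."""
    if route == "/tabs":
        raise OSError("404 Not Found")
    return {"tabId": "abc"}


# The payload shape here is assumed, not taken from camofox's API.
result = find_open_route(fake_post, {"url": "https://example.com"})
nothing = find_open_route(lambda route, payload: {}, {})
```

Catching OSError covers urllib's URLError and HTTPError, which both derive from it, should the injected callable be built on urllib.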

Notes

The agent creates a history/ folder in the current working directory on first run.
The repo ships a copy of Part 3's SearXNG MCP server as mcp-search-part3, byte-for-byte the same, so you do not need to install Part 3.
The agent calls close_all_tabs automatically on shutdown, since the model does not reliably close tabs itself; persistent tabs would otherwise accumulate in camofox across sessions.

License

Troubleshooting

The Docker build fails with "dist not found". Use Dockerfile.ci instead of the default Dockerfile. See stage1/README.md.
FileNotFoundError: [WinError 2] when the agent spawns an MCP server. Your venv is not activated. Activate it so console scripts are on PATH.
Script exits silently on Windows. On Windows, asyncio can spawn subprocesses only on the Proactor event loop; the selector event loop does not support them. The agent sets WindowsProactorEventLoopPolicy at startup; do the same if you copy the code elsewhere.
extract returns nulls on a page you know has the data. With the chunked extract this should be rare, but a very large page makes many model calls. Lower chunking.chunk_chars for smaller, more numerous chunks, or raise it for fewer, larger ones.
fetch_structure returns few or no headings. Some pages (short stubs, pages whose content lives in tables or infoboxes) have a thin heading outline. Use summarize or extract for those.
camofox returns a small or empty snapshot. Some pages need more than the default 1.5-second settle. Bump browser.settle_seconds.
An action fails with a stale-ref or "page may have changed" error. Every action renumbers the tab's [eN] refs. Call read_tab again to get fresh refs before retrying; never reuse a ref from before an action.
An action on a result page seems to silently target the wrong element, or the page doesn't look like what you expected. Don't re-fetch the result by URL — use read_tab / summarize_tab / extract_tab on the tab instead. Some result pages (a form's POST target, for example) only exist inside the tab and are not independently fetchable.
select_option fails even though the dropdown is visible. camofox only exposes a ref for <select>/combobox elements that its accessibility snapshot assigns one to; if read_tab shows no [eN] ref on the dropdown, it cannot be selected on that build. Use click on the visible options instead if the site renders them as a custom (non-<select>) widget.
probe_camofox.py can't find an open endpoint. It tries the canonical POST /tabs route first, falling back to the older POST /tabs/open shim. If neither returns a tabId, your camofox build's API has likely changed; check the script's output to see which step failed.
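
For the Windows event-loop item above, a minimal sketch of setting the policy at startup might look like this. The function name is hypothetical, and where exactly the agent makes this call is not shown in this README; the two asyncio calls are standard-library APIs, and the policy class only exists on Windows, hence the platform guard.

```python
import asyncio
import sys


def use_proactor_loop_on_windows() -> bool:
    """Select the Proactor event loop policy on Windows; do nothing elsewhere."""
    if sys.platform != "win32":
        return False
    # Must run before the event loop is created, e.g. at the top of main().
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
    return True


applied = use_proactor_loop_on_windows()
```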

license - permissive license
