
camofox-browser MCP server (stage2)

by jfjensen

Local LLM agent that reads AND acts on web pages (camofox-browser + MCP)

Code for Part 6 of the Build Your Own Claude Code series: Acting on Web Pages with a Small Local LLM — Click, Type, Submit.

The series so far:

  • Part 1: the agent (CLI, tools, skills, history, compaction)

  • Part 2: the browser UI (FastAPI + WebSockets)

  • Part 3: web search (SearXNG via MCP)

  • Part 4: web browsing (camofox-browser via MCP) + composing with Part 3's search

  • Part 5: reading whole pages without truncation — chunked extract and summarize, plus a family of small single-purpose reader tools (fetch_snippet, fetch_urls, fetch_structure)

  • Part 6 (this repo): interaction — a persistent-tab lifecycle (open_tab / read_tab / close_tab) plus action tools (click, type_into, press_key, select_option, or a single interact tool, depending on config) so the agent can fill in a search box, click Next on a paginated list, or submit a form, not just read what is already on the page.

The problem this part solves: Part 5's reader tools are URL-first — each one opens a fresh tab, snapshots it, and closes it, so there is nothing to act on. Real tasks (search a site, page through results, fill out and submit a form) need a tab that stays open across calls and element references that survive long enough to click or type into. camofox keeps a tab alive until told to close it; this part adds the MCP plumbing on top — open a persistent tab, read its current state to find an element's [eN] ref, act on that ref, then re-read before the next action because every action renumbers the refs.

Install

git clone https://github.com/jfjensen/local-LLM-agent-mcp-interaction.git
cd local-LLM-agent-mcp-interaction
python -m venv .venv
# Linux / macOS:
source .venv/bin/activate
# Windows PowerShell:
.\.venv\Scripts\Activate.ps1
pip install -e .

You will also need:

  • Docker to run camofox-browser and SearXNG (Stage 1 brings up both).

  • Ollama with a tool-capable model. The default is qwen3.5:9b:

    ollama pull qwen3.5:9b

SearXNG comes pre-configured (the settings file is mounted into the container), so the JSON API works without editing anything.

Configuration

All tunable settings live in config.toml at the repo root:

  • model name, temperature, thinking;

  • SearXNG and camofox URLs;

  • snapshot, chunking, and reading budgets;

  • [acting], the new section for Part 6: action_tool_style picks between four intent-named action tools (click, type_into, press_key, select_option) and a single interact(action=...) tool; wait_timeout_ms / wait_for_network control how long the server waits for a page to settle after an action; action_result_snippet optionally attaches a head-snippet of the post-action page to the action's result;

  • log verbosity (logging.level = "DEBUG" shows each tool call, its arguments, and a preview of what came back).

To see what is currently active:

mcp-config-show

The loader looks for config.toml in the current working directory first, then falls back to the one bundled with the repo.

How to run

| Stage | What it is | How to run |
| --- | --- | --- |
| 1 | Standing up camofox-browser and SearXNG via Docker | cd stage1 && docker compose up -d (see stage1/README.md for the camofox image build step) |
| 2 | The full agent: search plus the browser reader and action tools | mcp-agent-stage2 |

The browser MCP server (mcp-browser-stage2) exposes three groups of tools.

URL-first readers (from Part 5; each opens a one-shot tab, reads it, closes it):

  • fetch_snippet — the head of a page, for a quick look.

  • fetch_urls — the page's links as {text, url} pairs, absolute.

  • fetch_structure — the heading outline.

  • extract — named fields as JSON, via chunk-and-merge over the whole page (no truncation).

  • summarize — a prose summary, built by combining per-chunk summaries (map-reduce by default, or refine).
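
The two summarize strategies named in the list above (map-reduce and refine) can be sketched as below. This is an illustration only: the summarize and refine callables stand in for model calls, the function names are hypothetical, and splitting on a fixed character count is an assumption (the real server may split on paragraph or sentence boundaries).

```python
from typing import Callable, List


def split_chunks(text: str, chunk_chars: int) -> List[str]:
    """Cut text into consecutive pieces of at most chunk_chars characters."""
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]


def summarize_map_reduce(
    text: str, chunk_chars: int, summarize: Callable[[str], str]
) -> str:
    """Map: summarize each chunk. Reduce: summarize the joined partials."""
    partials = [summarize(chunk) for chunk in split_chunks(text, chunk_chars)]
    if len(partials) <= 1:
        # Nothing to combine: empty input or a single chunk.
        return partials[0] if partials else ""
    return summarize("\n".join(partials))


def summarize_refine(
    text: str, chunk_chars: int, refine: Callable[[str, str], str]
) -> str:
    """Refine: carry a running summary forward, updating it chunk by chunk."""
    running = ""
    for chunk in split_chunks(text, chunk_chars):
        running = refine(running, chunk)
    return running
```

Map-reduce makes one model call per chunk plus one to combine them, and the per-chunk calls are independent; refine is strictly sequential because each step needs the previous running summary.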

Persistent-tab lifecycle and readers (new in Part 6; for when you need to act on a page, not just read it):

  • open_tab — opens a page in a tab that stays open and returns a tab_id.

  • read_tab — reads the CURRENT state of an open tab: mode="snippet" (default, shows the [eN] element refs you need for an action), "urls", or "structure".

  • summarize_tab / extract_tab — the same whole-page, no-truncation summarize / extract logic, applied to the live tab instead of a fresh URL fetch (use these after navigating or acting your way to a page that may not be re-fetchable by URL, such as a form's POST target).

  • close_tab — closes one tab; close_all_tabs — tears down every tab in the session (the agent calls this automatically on shutdown).

Action tools (new in Part 6; act on an [eN] ref from the latest read_tab). The surface shown to the model is picked by config.toml's [acting].action_tool_style:

  • "separate" (default) — four intent-named tools: click, type_into (optionally submit=True to press Enter afterwards), press_key, select_option (for native <select> dropdowns).

  • "interact" — one interact(tab_id, action, ...) tool with an action enum (click / type / press / select), mirroring camofox's own /act dispatcher.
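
The single-tool style can be pictured as a thin dispatcher over the four intent-named tools. The sketch below is hypothetical: the handlers only return descriptive strings, and the parameter names (ref, text, key, value) are assumptions rather than the server's real signatures.

```python
from enum import Enum


class Action(str, Enum):
    CLICK = "click"
    TYPE = "type"
    PRESS = "press"
    SELECT = "select"


# Stand-in handlers; a real server would forward these to camofox.
def click(tab_id: str, ref: str) -> str:
    return f"click {ref} in {tab_id}"


def type_into(tab_id: str, ref: str, text: str, submit: bool = False) -> str:
    suffix = " then Enter" if submit else ""
    return f"type {text!r} into {ref} in {tab_id}{suffix}"


def press_key(tab_id: str, key: str) -> str:
    return f"press {key} in {tab_id}"


def select_option(tab_id: str, ref: str, value: str) -> str:
    return f"select {value!r} on {ref} in {tab_id}"


_HANDLERS = {
    Action.CLICK: click,
    Action.TYPE: type_into,
    Action.PRESS: press_key,
    Action.SELECT: select_option,
}


def interact(tab_id: str, action: str, **kwargs) -> str:
    """Route one action name to its handler; reject anything outside the enum."""
    try:
        handler = _HANDLERS[Action(action)]
    except ValueError:
        allowed = ", ".join(a.value for a in Action)
        raise ValueError(f"unknown action {action!r}; expected one of: {allowed}") from None
    return handler(tab_id, **kwargs)
```

The trade-off is between a smaller tool list for the model (one interact tool) and tool names that state their intent directly (four separate tools).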

Every action re-snapshots the tab and reports the resulting url and a fresh refsCount: refs are renumbered after every action, so the rule is always read_tab → act → read_tab again before the next action.
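
The read_tab → act → read_tab rule can be demonstrated with a small in-memory stand-in. FakeTab below is hypothetical and deliberately simplified: it hands out ever-increasing ref numbers so that a stale ref is always detectable, which is not necessarily how camofox numbers its refs.

```python
from typing import Dict, Iterable


class StaleRefError(Exception):
    """Raised when a ref from an earlier snapshot is used after an action."""


class FakeTab:
    """In-memory stand-in for a persistent tab (not camofox's real behaviour)."""

    def __init__(self, labels: Iterable[str]) -> None:
        self._labels = list(labels)
        self._next = 1                      # next ref number to hand out
        self._refs: Dict[str, str] = {}     # current ref -> element label

    def read(self) -> Dict[str, str]:
        """Assign fresh refs to every element, replacing all previous ones."""
        self._refs = {}
        for label in self._labels:
            self._refs[f"e{self._next}"] = label
            self._next += 1
        return dict(self._refs)

    def click(self, ref: str) -> str:
        """Act on a current ref; the action then invalidates every ref."""
        if ref not in self._refs:
            raise StaleRefError(f"{ref} is stale; read the tab again")
        label = self._refs[ref]
        self._refs = {}
        return label


def find_ref(snapshot: Dict[str, str], label: str) -> str:
    """Look up the ref currently assigned to the element with this label."""
    for ref, name in snapshot.items():
        if name == label:
            return ref
    raise KeyError(label)
```

In use, the agent reads, finds the ref for "Next", clicks it, and must read again before the next click; reusing the earlier ref raises StaleRefError in this sketch.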

Plus the search server (mcp-search-part3), a copy of Part 3's SearXNG MCP server, so the repo is self-contained.

Probing MCP servers with inspect_any.py

inspect_any.py makes one tool call per process, which is fine for the URL-first readers but not for interaction (each process would get its own tab). For the lifecycle/action tools, use test_act_flow.py instead (see below).

# List a server's tools:
python inspect_any.py mcp_browser_02.main

# Call a reader tool:
python inspect_any.py mcp_browser_02.main fetch_snippet --kv url=https://example.com
python inspect_any.py mcp_browser_02.main fetch_urls --kv url=https://example.com

# Call extract with a JSON Schema (use --args or --args-file for nested args):
python inspect_any.py mcp_browser_02.main extract --args-file extract_args.json

Where extract_args.json might look like:

{
  "url": "https://en.wikipedia.org/wiki/Vleteren",
  "schema": {
    "type": "object",
    "properties": {
      "mayor": {"type": "string", "description": "The current mayor, from the infobox"},
      "postal_code": {"type": "string", "description": "The postal code"},
      "population": {"type": "string", "description": "The total population"}
    }
  }
}
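
Given a schema like the one above, chunk-and-merge extraction can be sketched as follows. This is a hedged illustration: extract_chunk stands in for a per-chunk model call, the function names are hypothetical, and the "first non-null value wins" merge policy is an assumption, not necessarily what the server does.

```python
from typing import Callable, Dict, List, Optional

Fields = Dict[str, Optional[str]]


def merge_fields(partials: List[Fields], names: List[str]) -> Fields:
    """Keep the first non-null value seen for each field, in chunk order."""
    merged: Fields = {name: None for name in names}
    for partial in partials:
        for name in names:
            if merged[name] is None and partial.get(name) is not None:
                merged[name] = partial[name]
    return merged


def extract_chunked(
    text: str,
    schema: dict,
    chunk_chars: int,
    extract_chunk: Callable[[str, List[str]], Fields],
) -> Fields:
    """Run the per-chunk extractor over every chunk, then merge the results."""
    names = list(schema.get("properties", {}))
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    partials = [extract_chunk(chunk, names) for chunk in chunks]
    return merge_fields(partials, names)
```

A field that no chunk yields stays null in the merged result, and a value that straddles a chunk boundary can be missed, which is one reason the chunk size is worth tuning.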

Exercising the action tools with test_act_flow.py

Unlike inspect_any.py, this opens a single MCP stdio session and runs the whole interaction loop in one process, so the persistent tab stays warm across calls:

python test_act_flow.py
# or against a different target:
python test_act_flow.py https://en.wikipedia.org/wiki/Main_Page

It runs open_tab → read_tab → type_into(..., submit=True) → read_tab → a deliberate reuse of the now-stale ref (to demonstrate why refs must be re-read after every action) → close_tab. The default target is DuckDuckGo Lite, a tiny HTML page whose search box makes a clean before/after-submit comparison.

Probing camofox directly with probe_camofox.py

A lower-level probe that talks to camofox's HTTP API directly (no MCP layer), useful when something in the action tools misbehaves and you need to rule out the MCP server as the cause. It first checks which "open a tab" endpoint your camofox build actually answers to, then drives open → snapshot → type → submit → snapshot → a stale-ref check → close against a real public form:

python probe_camofox.py
# or against a different form:
python probe_camofox.py https://html.duckduckgo.com/html/
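
The endpoint check at the start of the probe can be pictured like this. It is a sketch, not the script's code: the post callable is injected so the logic can be exercised without a running camofox, the function name is hypothetical, and the {"url": ...} request body is an assumption. The two routes and the tabId reply key are the ones named in the troubleshooting notes.

```python
from typing import Callable, Optional, Sequence, Tuple

# Canonical route first, then the older shim.
CANDIDATE_ROUTES = ("/tabs", "/tabs/open")


def find_open_endpoint(
    post: Callable[[str, dict], dict],
    url: str,
    routes: Sequence[str] = CANDIDATE_ROUTES,
) -> Optional[Tuple[str, str]]:
    """Return (route, tab_id) for the first route whose reply carries a tabId."""
    for route in routes:
        try:
            # Assumed request body; adjust to whatever your camofox build expects.
            reply = post(route, {"url": url})
        except (OSError, ValueError):
            # Connection/HTTP errors or an unparseable reply: try the next route.
            continue
        tab_id = reply.get("tabId") if isinstance(reply, dict) else None
        if tab_id:
            return route, tab_id
    return None
```

A None result corresponds to the "can't find an open endpoint" case: neither route answered with a tabId.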

Notes

  • The agent creates a history/ folder in the current working directory on first run.

  • The repo ships a copy of Part 3's SearXNG MCP server as mcp-search-part3, byte-for-byte the same, so you do not need to install Part 3.

  • The agent calls close_all_tabs automatically on shutdown, since the model does not reliably close tabs itself; persistent tabs would otherwise accumulate in camofox across sessions.
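
The shutdown behaviour in the last note amounts to a try/finally around the agent's work. The sketch below uses a hypothetical FakeSession in place of a real MCP client session; only the tool names open_tab and close_all_tabs come from this README.

```python
import asyncio
from typing import Awaitable, Callable, Set


class FakeSession:
    """Stand-in for an MCP client session; tracks which tabs are open."""

    def __init__(self) -> None:
        self.open_tabs: Set[str] = set()

    async def call_tool(self, name: str, arguments: dict):
        if name == "open_tab":
            tab_id = f"tab{len(self.open_tabs) + 1}"
            self.open_tabs.add(tab_id)
            return tab_id
        if name == "close_all_tabs":
            closed = len(self.open_tabs)
            self.open_tabs.clear()
            return closed
        raise ValueError(f"unknown tool {name!r}")


async def run_agent(
    session: FakeSession, work: Callable[[FakeSession], Awaitable[object]]
):
    """Run the agent's work, then close every tab even if the work raised."""
    try:
        return await work(session)
    finally:
        await session.call_tool("close_all_tabs", {})
```

Because the cleanup sits in finally, tabs are torn down on a normal exit and on an exception alike, so nothing depends on the model remembering to call close_tab.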

License

MIT © 2026 Jes Fink-Jensen. See LICENSE for details.

Troubleshooting

  • The Docker build fails with "dist not found". Use Dockerfile.ci instead of the default Dockerfile. See stage1/README.md.

  • FileNotFoundError: [WinError 2] when the agent spawns an MCP server. Your venv is not activated. Activate it so console scripts are on PATH.

  • Script exits silently on Windows. asyncio's selector event loop on Windows cannot spawn subprocesses; only the Proactor loop can. The agent sets WindowsProactorEventLoopPolicy at startup; do the same if you copy the code elsewhere.

  • extract returns nulls on a page you know has the data. With the chunked extract this should be rare, but a very large page makes many model calls. Lower chunking.chunk_chars for smaller, more numerous chunks, or raise it for fewer, larger ones.

  • fetch_structure returns few or no headings. Some pages (short stubs, pages whose content lives in tables or infoboxes) have a thin heading outline. Use summarize or extract for those.

  • camofox returns a small or empty snapshot. Some pages need more than the default 1.5-second settle. Bump browser.settle_seconds.

  • An action fails with a stale-ref or "page may have changed" error. Every action renumbers the tab's [eN] refs. Call read_tab again to get fresh refs before retrying; never reuse a ref from before an action.

  • An action on a result page seems to silently target the wrong element, or the page doesn't look like what you expected. Don't re-fetch the result by URL — use read_tab / summarize_tab / extract_tab on the tab instead. Some result pages (a form's POST target, for example) only exist inside the tab and are not independently fetchable.

  • select_option fails even though the dropdown is visible. camofox only exposes a ref for <select>/combobox elements that its accessibility snapshot assigns one to; if read_tab shows no [eN] ref on the dropdown, it cannot be selected on that build. Use click on the visible options instead if the site renders them as a custom (non-<select>) widget.

  • probe_camofox.py can't find an open endpoint. It tries the canonical POST /tabs route first, then falls back to the older POST /tabs/open shim. If neither returns a tabId, your camofox build's API has likely changed; check the script's output to see which step failed.
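
For the Windows item in the list above, setting the event loop policy at startup might look like the sketch below. The helper name is hypothetical; the asyncio calls are standard-library APIs, and the function is a no-op on other platforms.

```python
import asyncio
import sys


def use_proactor_loop_on_windows() -> bool:
    """Select the Proactor event loop policy on Windows; do nothing elsewhere."""
    if sys.platform != "win32":
        return False
    # Must run before the event loop is created (e.g. before asyncio.run).
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
    return True
```

Call it once at the top of the entry point, before any asyncio.run(...), so that subprocess-based MCP servers can be spawned.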
