camofox-browser MCP server (stage2)
Provides web search functionality via SearXNG, allowing the agent to search the web and retrieve results.
Local LLM agent that reads AND acts on web pages (camofox-browser + MCP)
Code for Part 6 of the Build Your Own Claude Code series: Acting on Web Pages with a Small Local LLM — Click, Type, Submit.
The series so far:
Part 1: the agent (CLI, tools, skills, history, compaction)
Part 2: the browser UI (FastAPI + WebSockets)
Part 3: web search (SearXNG via MCP)
Part 4: web browsing (camofox-browser via MCP) + composing with Part 3's search
Part 5: reading whole pages without truncation, using chunked extract and summarize, plus a family of small single-purpose reader tools (fetch_snippet, fetch_urls, fetch_structure)
Part 6 (this repo): interaction, built on a persistent-tab lifecycle (open_tab / read_tab / close_tab) plus action tools (click, type_into, press_key, select_option, or a single interact tool, depending on config) so the agent can fill in a search box, click Next on a paginated list, or submit a form, not just read what is already on the page.
The problem this part solves: Part 5's reader tools are URL-first —
each one opens a fresh tab, snapshots it, and closes it, so there is
nothing to act on. Real tasks (search a site, page through results,
fill out and submit a form) need a tab that stays open across calls
and element references that survive long enough to click or type into.
camofox keeps a tab alive until told to close it; this part adds the
MCP plumbing on top — open a persistent tab, read its current state to
find an element's [eN] ref, act on that ref, then re-read before the
next action because every action renumbers the refs.
Install
git clone https://github.com/jfjensen/local-LLM-agent-mcp-interaction.git
cd local-LLM-agent-mcp-interaction
python -m venv .venv
# Linux / macOS:
source .venv/bin/activate
# Windows PowerShell:
.\.venv\Scripts\Activate.ps1
pip install -e .
You will also need:
Docker to run camofox-browser and SearXNG (Stage 1 brings up both).
Ollama with a tool-capable model. The default is qwen3.5:9b:
ollama pull qwen3.5:9b
SearXNG comes pre-configured (the settings file is mounted into the container), so the JSON API works without editing anything.
Configuration
All tunable settings live in config.toml at the repo root:
model name, temperature, thinking;
SearXNG and camofox URLs;
snapshot, chunking, and reading budgets;
[acting], the new section for Part 6:
action_tool_style picks between four intent-named action tools (click, type_into, press_key, select_option) and a single interact(action=...) tool;
wait_timeout_ms / wait_for_network control how long the server waits for a page to settle after an action;
action_result_snippet optionally attaches a head-snippet of the post-action page to the action's result;
log verbosity (logging.level = "DEBUG" shows each tool call, its arguments, and a preview of what came back).
To see what is currently active:
mcp-config-show
The loader looks for config.toml in the current working directory
first, then falls back to the one bundled with the repo.
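The lookup order just described (working directory first, bundled copy second) can be sketched with pathlib. The find_config function and its parameters are hypothetical names for this example; the repo's real loader may be organised differently.

```python
import tempfile
from pathlib import Path


def find_config(cwd: Path, bundled_dir: Path, name: str = "config.toml") -> Path:
    """Return the config in cwd if present, else the bundled one."""
    local = cwd / name
    if local.is_file():
        return local
    fallback = bundled_dir / name
    if fallback.is_file():
        return fallback
    raise FileNotFoundError(f"no {name} in {cwd} or {bundled_dir}")


# Demonstrate both branches inside a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "work"
    repo = Path(tmp) / "repo"
    work.mkdir()
    repo.mkdir()
    (repo / "config.toml").write_text("# bundled\n")

    # Only the bundled file exists, so the fallback is used.
    first_hit = find_config(work, repo).parent.name

    # Once a local file exists, it takes precedence.
    (work / "config.toml").write_text("# local\n")
    second_hit = find_config(work, repo).parent.name

print(first_hit, second_hit)
```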
How to run
Stage | What it is | How to run
1 | Standing up camofox-browser and SearXNG via Docker |
2 | The full agent: search plus the browser reader and action tools |
The browser MCP server (mcp-browser-stage2) exposes three groups of tools.
URL-first readers (from Part 5; each opens a one-shot tab, reads it, closes it):
fetch_snippet: the head of a page, for a quick look.
fetch_urls: the page's links as {text, url} pairs with absolute URLs.
fetch_structure: the heading outline.
extract: named fields as JSON, via chunk-and-merge over the whole page (no truncation).
summarize: a prose summary, built by combining per-chunk summaries (map-reduce by default, or refine).
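To make the chunk-and-merge idea behind extract concrete, here is a minimal sketch: split the page text into fixed-size chunks, run a per-chunk extractor, and merge the results by keeping the first non-null value for each field. The merge rule, the function names, and the toy extractor (which stands in for the model call) are all assumptions for this example, not the repo's implementation.

```python
import re
from typing import Callable, Optional

Fields = dict[str, Optional[str]]


def chunk_text(text: str, chunk_chars: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_chars characters."""
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]


def merge_fields(results: list[Fields], names: list[str]) -> Fields:
    """Keep the first non-null value seen for each field (assumed merge rule)."""
    merged: Fields = {name: None for name in names}
    for result in results:
        for name in names:
            if merged[name] is None and result.get(name) is not None:
                merged[name] = result[name]
    return merged


def extract_whole_page(
    text: str,
    names: list[str],
    chunk_chars: int,
    extract_chunk: Callable[[str, list[str]], Fields],
) -> Fields:
    """Run the extractor over every chunk, then merge the per-chunk results."""
    per_chunk = [extract_chunk(c, names) for c in chunk_text(text, chunk_chars)]
    return merge_fields(per_chunk, names)


def toy_extract_chunk(chunk: str, names: list[str]) -> Fields:
    """Stand-in for the model call: finds `name=value;` tokens in a chunk."""
    found: Fields = {}
    for name in names:
        match = re.search(rf"{re.escape(name)}=(\w+);", chunk)
        found[name] = match.group(1) if match else None
    return found


# A 120-character "page" whose two facts land in different 40-character chunks.
page = "a" * 40 + "title=Example;" + "b" * 26 + "year=2024;" + "c" * 30
fields = extract_whole_page(page, ["title", "year", "missing"], 40, toy_extract_chunk)
print(fields)
```

Because every chunk is visited, a field near the end of a long page is found just as reliably as one near the top. This sketch does not overlap chunks, so a value that straddles a chunk boundary would be missed.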
Persistent-tab lifecycle and readers (new in Part 6; for when you need to act on a page, not just read it):
open_tab: opens a page in a tab that stays open and returns a tab_id.
read_tab: reads the CURRENT state of an open tab, with mode="snippet" (default; shows the [eN] element refs you need for an action), "urls", or "structure".
summarize_tab / extract_tab: the same whole-page, no-truncation summarize / extract logic, applied to the live tab instead of a fresh URL fetch (use these after navigating or acting your way to a page that may not be re-fetchable by URL, such as a form's POST target).
close_tab: closes one tab.
close_all_tabs: tears down every tab in the session (the agent calls this automatically on shutdown).
Action tools (new in Part 6; act on an [eN] ref from the latest
read_tab). The surface shown to the model is picked by
config.toml's [acting].action_tool_style:
"separate" (default) gives four intent-named tools: click, type_into (optionally submit=True to press Enter afterwards), press_key, select_option (for native <select> dropdowns).
"interact" gives one interact(tab_id, action, ...) tool with an action enum (click / type / press / select), mirroring camofox's own /act dispatcher.
Every action re-snapshots the tab and reports the resulting url and a
fresh refsCount: refs are renumbered after every action, so the rule
is always read_tab → act → read_tab again before the next action.
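The single-tool style described above can be pictured as a small dispatcher that routes an action name to the matching intent-named handler. In this sketch the handlers are stubs that only return a description string; tab_id, action, and submit come from the text above, while the other argument names (ref, text, key, value) are assumptions, not the server's confirmed signatures.

```python
from typing import Callable


# Stub handlers. They describe the action instead of driving a browser.
def click(tab_id: str, ref: str) -> str:
    return f"[{tab_id}] click {ref}"


def type_into(tab_id: str, ref: str, text: str, submit: bool = False) -> str:
    suffix = " + Enter" if submit else ""
    return f"[{tab_id}] type {text!r} into {ref}{suffix}"


def press_key(tab_id: str, key: str) -> str:
    return f"[{tab_id}] press {key}"


def select_option(tab_id: str, ref: str, value: str) -> str:
    return f"[{tab_id}] select {value!r} in {ref}"


# The action enum (click / type / press / select) mapped onto the four tools.
ACTIONS: dict[str, Callable[..., str]] = {
    "click": click,
    "type": type_into,
    "press": press_key,
    "select": select_option,
}


def interact(tab_id: str, action: str, **kwargs) -> str:
    """Route one action name to its handler, rejecting unknown actions."""
    handler = ACTIONS.get(action)
    if handler is None:
        raise ValueError(f"unknown action {action!r}; expected one of {sorted(ACTIONS)}")
    return handler(tab_id, **kwargs)


result = interact("tab-1", "type", ref="e4", text="what is MCP", submit=True)
print(result)
```

Either surface ends up in the same handlers; the choice only changes how many tool definitions the model sees.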
Plus the search server (mcp-search-part3), a copy of Part 3's SearXNG
MCP server, so the repo is self-contained.
Probing MCP servers with inspect_any.py
inspect_any.py makes one tool call per process, which is fine for the
URL-first readers but not for interaction (each process would get its
own tab). For the lifecycle/action tools, use test_act_flow.py
instead (see below).
# List a server's tools:
python inspect_any.py mcp_browser_02.main
# Call a reader tool:
python inspect_any.py mcp_browser_02.main fetch_snippet --kv url=https://example.com
python inspect_any.py mcp_browser_02.main fetch_urls --kv url=https://example.com
# Call extract with a JSON Schema (use --args or --args-file for nested args):
python inspect_any.py mcp_browser_02.main extract --args-file extract_args.json
Where extract_args.json might look like:
{
"url": "https://en.wikipedia.org/wiki/Vleteren",
"schema": {
"type": "object",
"properties": {
"mayor": {"type": "string", "description": "The current mayor, from the infobox"},
"postal_code": {"type": "string", "description": "The postal code"},
"population": {"type": "string", "description": "The total population"}
}
}
}
Exercising the action tools with test_act_flow.py
Unlike inspect_any.py, this opens a single MCP stdio session and runs
the whole interaction loop in one process, so the persistent tab stays
warm across calls:
python test_act_flow.py
# or against a different target:
python test_act_flow.py https://en.wikipedia.org/wiki/Main_Page
It runs open_tab → read_tab → type_into(..., submit=True) →
read_tab → a deliberate reuse of the now-stale ref (to demonstrate why
refs must be re-read after every action) → close_tab. The default
target is DuckDuckGo Lite, a tiny HTML page whose search box makes a
clean before/after-submit comparison.
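The flow above, including the deliberate stale-ref reuse, can be mimicked without a browser. The ToyTab class below is an in-memory stand-in invented for this example: it renumbers its refs after every action and rejects any ref issued before that action. It is a simplified model of the rule, not camofox's or the MCP server's real behaviour.

```python
class StaleRefError(Exception):
    """Raised when an action uses a ref from before the previous action."""


class ToyTab:
    """In-memory stand-in for a persistent tab (not camofox's real API)."""

    def __init__(self, elements: list[str]) -> None:
        self._elements = elements
        self._generation = 0  # bumped by every action
        self._live_refs: dict[str, str] = {}

    def read_tab(self) -> dict[str, str]:
        """Number the elements afresh and return {ref: element name}."""
        base = self._generation * 100
        self._live_refs = {
            f"e{base + i}": name for i, name in enumerate(self._elements, start=1)
        }
        return dict(self._live_refs)

    def click(self, ref: str) -> str:
        """Act on a ref; afterwards every previously issued ref is stale."""
        if ref not in self._live_refs:
            raise StaleRefError(f"{ref} is stale; call read_tab() again")
        target = self._live_refs[ref]
        self._generation += 1
        self._live_refs = {}
        return target


def ref_for(refs: dict[str, str], name: str) -> str:
    """Find the ref currently assigned to the element called `name`."""
    return next(ref for ref, element in refs.items() if element == name)


tab = ToyTab(["search box", "submit button"])
first_ref = ref_for(tab.read_tab(), "search box")
tab.click(first_ref)  # succeeds, and invalidates first_ref

try:
    tab.click(first_ref)  # deliberate reuse of the old ref
    stale_detected = False
except StaleRefError:
    stale_detected = True

fresh_ref = ref_for(tab.read_tab(), "search box")  # re-read before acting again
clicked = tab.click(fresh_ref)
print(first_ref, fresh_ref, stale_detected, clicked)
```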
Probing camofox directly with probe_camofox.py
A lower-level probe that talks to camofox's HTTP API directly (no MCP layer), useful when something in the action tools misbehaves and you need to rule out the MCP server as the cause. It first checks which "open a tab" endpoint your camofox build actually answers to, then drives open → snapshot → type → submit → snapshot → a stale-ref check → close against a real public form:
python probe_camofox.py
# or against a different form:
python probe_camofox.py https://html.duckduckgo.com/html/
Notes
The agent creates a history/ folder in the current working directory on first run.
The repo ships a copy of Part 3's SearXNG MCP server as mcp-search-part3, byte-for-byte the same, so you do not need to install Part 3.
The agent calls close_all_tabs automatically on shutdown, since the model does not reliably close tabs itself; persistent tabs would otherwise accumulate in camofox across sessions.
License
MIT © 2026 Jes Fink-Jensen. See LICENSE for details.
Troubleshooting
The Docker build fails with "dist not found". Use Dockerfile.ci instead of the default Dockerfile. See stage1/README.md.
FileNotFoundError: [WinError 2] when the agent spawns an MCP server. Your venv is not activated. Activate it so console scripts are on PATH.
Script exits silently on Windows. The default asyncio event loop on Windows cannot spawn subprocesses. The agent sets WindowsProactorEventLoopPolicy at startup; do the same if you copy the code elsewhere.
extract returns nulls on a page you know has the data. With the chunked extract this should be rare, but a very large page makes many model calls. Lower chunking.chunk_chars for smaller, more numerous chunks, or raise it for fewer, larger ones.
fetch_structure returns few or no headings. Some pages (short stubs, pages whose content lives in tables or infoboxes) have a thin heading outline. Use summarize or extract for those.
camofox returns a small or empty snapshot. Some pages need more than the default 1.5-second settle. Increase browser.settle_seconds.
An action fails with a stale-ref or "page may have changed" error. Every action renumbers the tab's [eN] refs. Call read_tab again to get fresh refs before retrying; never reuse a ref from before an action.
An action on a result page seems to silently target the wrong element, or the page doesn't look like what you expected. Don't re-fetch the result by URL; use read_tab / summarize_tab / extract_tab on the tab instead. Some result pages (a form's POST target, for example) only exist inside the tab and are not independently fetchable.
select_option fails even though the dropdown is visible. camofox only exposes a ref for <select> / combobox elements that its accessibility snapshot assigns one to; if read_tab shows no [eN] ref on the dropdown, it cannot be selected on that build. Use click on the visible options instead if the site renders them as a custom (non-<select>) widget.
probe_camofox.py can't find an open endpoint. It tries the canonical POST /tabs route first, falling back to the older POST /tabs/open shim. If neither returns a tabId, your camofox build's API has likely changed; check the script's output to see which step failed.
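For the Windows item above, a minimal sketch of setting the proactor policy before spawning a subprocess is shown below. The helper names are hypothetical, and the child process here is just the Python interpreter printing a word rather than an MCP server. Note that on Python 3.8 and later the proactor loop is already the Windows default, so the explicit policy mainly matters if something else has switched the process to the selector loop.

```python
import asyncio
import sys


def use_proactor_on_windows() -> bool:
    """Select the proactor event loop policy on Windows; no-op elsewhere."""
    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
        return True
    return False


async def run_child() -> tuple[int, str]:
    """Spawn a trivial child process and capture its output."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "print('ok')",
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    return proc.returncode, out.decode().strip()


# Set the policy first, then start the loop that spawns the subprocess.
policy_changed = use_proactor_on_windows()
exit_code, output = asyncio.run(run_child())
print(policy_changed, exit_code, output)
```

The policy has to be set before asyncio.run creates the loop; changing it afterwards does not affect a loop that is already running.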