by ptrken01

web-browser-mcp

Query-driven web search + page fetch for AI agents. The agent gives it a natural-language question, gets back ranked results with snippets (and optionally full page content) ready to cite.

A small MCP server with two tools that compose cleanly:

| Tool | Takes | Returns |
| --- | --- | --- |
| web_search(query, ...) | A search query in plain English | Ranked results from real search engines, with title, URL, snippet, and (optionally) full extracted page content |
| get_page_content(url, ...) | A URL the agent already knows | Cleaned main text of that page |

Multi-engine fallback: Bing via headless Chromium → DuckDuckGo HTML via httpx. No API keys required.

This is the shape you want when the agent is answering a question like "what's the latest model context protocol spec?" or "find a good tutorial on asyncio". The agent doesn't need to know which engine to use, what URL to fetch, or how to render the SERP — it just calls web_search and gets back ready-to-inject context.

Install

git clone https://github.com/ptrken01/web-browser-mcp
cd web-browser-mcp
uv venv .venv --python 3.11
uv pip install --python .venv/bin/python -e ".[dev]"
.venv/bin/playwright install chromium

(If you already have a venv from elsewhere, run uv pip install -e . inside it, then run playwright install chromium once. The .[dev] extra adds pytest, mypy, and ruff.)

Run

stdio (Claude Desktop, Cursor)

.venv/bin/web-browser-mcp

In your client's MCP config:

{
  "mcpServers": {
    "web-search": {
      "command": "/absolute/path/to/web-browser-mcp/.venv/bin/web-browser-mcp"
    }
  }
}

streamable-http (llama-ui, Open WebUI, browser clients)

.venv/bin/web-browser-mcp --transport streamable-http
# Default: http://127.0.0.1:8766/mcp

In llama-ui's MCP server settings, add an HTTP transport pointing at http://127.0.0.1:8766/mcp. CORS is open for localhost / 127.0.0.1 by default.

Tools

web_search(query, limit=5, include_content=False, ...)

Give it a natural-language question or topic. Returns ranked results with title, URL, and snippet, ready to be cited in your answer.

{
  "name": "web_search",
  "arguments": {
    "query": "model context protocol specification",
    "limit": 5,
    "include_content": false
  }
}

Parameters:

| Param | Type | Default | Notes |
| --- | --- | --- | --- |
| query | str | | Required. Natural-language search query (1-2000 chars). |
| limit | int | 5 | Max results (1-10). |
| include_content | bool | False | When True, follows each result URL and extracts the main text of the page. Adds latency. |
| engine_order | list[str] | ["bing", "duckduckgo"] | Override the engine priority. Subset of ["bing", "duckduckgo"]. |
| timeout_s | float | 15 | Per-engine timeout in seconds. |
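
The limits above can be checked on the client before a call is sent. The following is a minimal sketch, not part of this repo: build_web_search_call is a hypothetical helper that wraps the arguments in the JSON-RPC tools/call envelope MCP uses and rejects values outside the documented ranges. How the server itself validates input may differ.

```python
import json

# Engine names documented for engine_order.
ALLOWED_ENGINES = ("bing", "duckduckgo")


def build_web_search_call(query, limit=5, include_content=False,
                          engine_order=None, timeout_s=15.0, request_id=1):
    """Build a JSON-RPC `tools/call` request for web_search.

    Hypothetical client-side helper: it mirrors the documented limits so
    obviously bad arguments fail locally instead of on the server.
    """
    if not 1 <= len(query) <= 2000:
        raise ValueError("query must be 1-2000 characters")
    if not 1 <= limit <= 10:
        raise ValueError("limit must be between 1 and 10")
    engines = list(engine_order) if engine_order is not None else list(ALLOWED_ENGINES)
    unknown = [name for name in engines if name not in ALLOWED_ENGINES]
    if unknown:
        raise ValueError(f"unknown engines: {unknown}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "web_search",
            "arguments": {
                "query": query,
                "limit": limit,
                "include_content": include_content,
                "engine_order": engines,
                "timeout_s": timeout_s,
            },
        },
    }


if __name__ == "__main__":
    request = build_web_search_call("model context protocol specification")
    print(json.dumps(request, indent=2))
```

Most MCP client libraries build this envelope for you; the sketch is mainly useful for seeing which arguments are legal before wiring up a real client.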

Response:

{
  "query": "model context protocol specification",
  "engine": "duckduckgo",
  "count": 5,
  "results": [
    {
      "title": "Official site",
      "url": "https://modelcontextprotocol.io",
      "snippet": "Model Context Protocol",
      "engine": "duckduckgo"
    },
    {
      "title": "What is the Model Context Protocol (MCP)?",
      "url": "https://modelcontextprotocol.io/docs/getting-started/intro",
      "snippet": "MCP (Model Context Protocol) is an open-source standard for connecting AI applications to external systems...",
      "engine": "duckduckgo"
    }
  ],
  "duration_s": 1.234
}

When include_content=True, each result additionally has:

{
  "content": "Main text extracted from the page...",
  "content_chars": 3421
}
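
To make "ready to cite" concrete, here is a small sketch of how a client might turn the response shape shown above into a numbered context block for a prompt. format_results_for_prompt and its max_content_chars parameter are illustrative names, not part of this server.

```python
def format_results_for_prompt(response, max_content_chars=500):
    """Turn a web_search response dict into a numbered, citable text block.

    Hypothetical helper: assumes the response fields shown in this README
    (results, title, url, snippet, optional content, optional error).
    """
    if "error" in response:
        return f"Search failed: {response['error']}"
    lines = []
    for index, result in enumerate(response.get("results", []), start=1):
        lines.append(f"[{index}] {result['title']} ({result['url']})")
        lines.append(f"    {result['snippet']}")
        # `content` is only present when include_content=True was requested.
        content = result.get("content")
        if content:
            lines.append(f"    {content[:max_content_chars]}")
    return "\n".join(lines)
```

The agent can then refer to sources as [1], [2], and so on in its answer.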

get_page_content(url, max_chars=10000)

Use this when the agent already has a specific URL and wants that page's content.

{
  "name": "get_page_content",
  "arguments": {
    "url": "https://example.com/article",
    "max_chars": 5000
  }
}

Response:

{
  "url": "https://example.com/article",
  "final_url": "https://example.com/article",
  "title": "Example Article",
  "text": "Main content of the page...",
  "text_chars": 3421,
  "duration_s": 0.5
}
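
The Error handling section lists invalid_url for non-http URLs and URLs with no host. A client can catch those cases before calling the tool. This is a sketch using only the standard library; is_fetchable_url is a hypothetical name, and the server's own check may be stricter or looser.

```python
from urllib.parse import urlparse


def is_fetchable_url(url):
    """Return True for http(s) URLs that include a host.

    Hypothetical pre-check that approximates the documented invalid_url rule.
    """
    try:
        parsed = urlparse(url)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs,
        # for example an unterminated IPv6 literal.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.hostname)
```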

Engines and fallback

The tool tries engines in engine_order. The first one to return ≥1 result wins. If all engines fail, the response has an error field with a stable string code.

| Engine | How | Strengths | Weaknesses |
| --- | --- | --- | --- |
| Bing | Playwright headless Chromium | Full SERP, rich snippets, related questions | Can hit captcha on shared IPs; slower |
| DuckDuckGo HTML | httpx (no JS) | Reliable, fast, no browser overhead | Sometimes rate-limited under heavy use |

In practice, DDG HTML is the workhorse — it's the engine that succeeds most often in test runs. Bing is the upgrade path for richer SERP data when the agent has a fresh IP and the captcha doesn't trip.
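
The "first engine with at least one result wins" rule can be sketched as a short loop. This is an illustration of the described behavior, not the server's actual code: the engines mapping holds hypothetical stand-in callables, and which error code is reported when every engine fails is an assumption.

```python
def search_with_fallback(query, engines, engine_order=("bing", "duckduckgo")):
    """Try each engine in order; the first to return >= 1 result wins.

    `engines` maps an engine name to a callable that takes the query and
    returns a list of result dicts. The callables are hypothetical stand-ins
    for the real Bing and DuckDuckGo backends.
    """
    last_error = "search_engine_error"
    for name in engine_order:
        try:
            results = engines[name](query)
        except TimeoutError:
            last_error = "search_timeout"
            continue
        except Exception:
            last_error = "search_engine_error"
            continue
        if results:
            return {
                "query": query,
                "engine": name,
                "count": len(results),
                "results": results,
            }
        # An empty result list is not a win; fall through to the next engine.
    # Assumption: report the most recent failure as the error code.
    return {"query": query, "error": last_error}
```

Passing the engines in as callables keeps the loop easy to test with fakes, without a browser or network access.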

Error handling

All errors come back as structured error fields, not exceptions:

| Error code | When |
| --- | --- |
| invalid_query | Empty query or query too long (>2000 chars). |
| invalid_url | get_page_content got a non-http URL or URL with no host. |
| search_engine_error | The active engine returned an error. Try a different engine_order. |
| search_timeout | The active engine timed out. |
| browser_not_initialized | An engine that needs Playwright was called without a browser. |
| fetch_failed | get_page_content got a non-2xx response or connection error. |
| extraction_failed | get_page_content got a response but trafilatura couldn't extract text. |
| lifespan did not initialize | Server lifespan never ran; see the searxng-mcp-scraper pitfall. Should not happen with this server. |
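
Because errors arrive as a field rather than as exceptions, the caller has to branch on the code. One possible client-side policy is sketched below. The action names and the grouping of codes are assumptions for illustration; only the retry hint for search_engine_error comes from the list above.

```python
# Codes where trying again with a different engine_order may help.
# Treating search_timeout this way is an assumption.
RETRY_WITH_OTHER_ENGINE = {"search_engine_error", "search_timeout"}

# Codes that indicate the caller passed bad arguments.
CALLER_ERRORS = {"invalid_query", "invalid_url"}


def next_action(response):
    """Map a tool response dict to a coarse action for the calling agent."""
    code = response.get("error")
    if code is None:
        return "use_results"
    if code in RETRY_WITH_OTHER_ENGINE:
        return "retry_other_engine"
    if code in CALLER_ERRORS:
        return "fix_arguments"
    # fetch_failed, extraction_failed, and anything unrecognized.
    return "give_up"
```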

Configuration (env vars)

| Env var | Default | Notes |
| --- | --- | --- |
| HEADLESS | True | Set to False to see the browser. |
| BROWSER | chromium | chromium / firefox / webkit. |
| BROWSER_TIMEOUT_S | 30.0 | Default per-step timeout. |
| NAVIGATION_TIMEOUT_S | 30.0 | Default web_search engine timeout. |
| USER_AGENT | (Chrome UA) | Override if you get blocked. |
| LOG_LEVEL | INFO | DEBUG / INFO / WARNING / ERROR. |
| MCP_HOST | 127.0.0.1 | Bind host (streamable-http transport). |
| MCP_PORT | 8766 | Bind port (streamable-http transport). |
| MCP_CORS_ORIGINS | localhost,127.0.0.1 | Comma-separated CORS allow-list. |
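
For stdio clients, these variables usually have to be set in the client's MCP config rather than in your shell. Many clients, Claude Desktop among them, accept an env object next to command. The sketch below generates such an entry; mcp_config_entry is a hypothetical helper, and you should confirm that your client supports the env key.

```python
import json


def mcp_config_entry(command, env_overrides=None, name="web-search"):
    """Build an mcpServers config entry, passing settings via the `env` key.

    Hypothetical helper; assumes the client honors an `env` object.
    """
    entry = {"command": command}
    if env_overrides:
        # Environment variable values must be strings.
        entry["env"] = {key: str(value) for key, value in env_overrides.items()}
    return {"mcpServers": {name: entry}}


if __name__ == "__main__":
    config = mcp_config_entry(
        "/absolute/path/to/web-browser-mcp/.venv/bin/web-browser-mcp",
        {"LOG_LEVEL": "DEBUG", "HEADLESS": False},
    )
    print(json.dumps(config, indent=2))
```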

Test

.venv/bin/python -m pytest

14 tests pass. The end-to-end test (tests/test_server_e2e.py) boots real uvicorn + real FastMCP + a real Playwright browser, performs the MCP initialize handshake, and calls web_search against real search engines. This is the test that catches lifespan-bug classes from searxng-mcp-scraper — unit tests that call tool functions directly with hand-built state would miss them.

How it fits with the rest of the toolkit

| Tool | Source | When to use |
| --- | --- | --- |
| web-browser-mcp.web_search | Bing / DDG (this repo) | Agent has a question, needs relevant web results. |
| web-browser-mcp.get_page_content | httpx + trafilatura | Agent has a specific URL, wants its content. |
| searxng-mcp-scraper.search | SearXNG metasearch | When you want to control the engines, categories, language. |
| searxng-mcp-scraper.fetch | trafilatura over HTTP | Fast static-HTML extraction. |
| searxng-mcp-scraper.scrape_blog | RSS + parallel fetch | Whole blog → one .md. |
| searxng-mcp-scraper.deep_scrape | scrape_blog + docs | Blog + linked PDFs / docs. |

Use web_search first for general questions. Drop to get_page_content or searxng-mcp-scraper.fetch when you have a URL. Use scrape_blog for "read this whole blog" use cases.

License

MIT
