MCP Web Scraper
A standalone Model Context Protocol (MCP) server that scrapes static HTML with BeautifulSoup. LLM analysis is handled by your MCP client (e.g. Odysseus with Ollama, or Open WebUI); this server only provides scraping tools.
Designed for container deployment over Streamable HTTP.
What it does
Exposes two MCP tools:
| Tool | Description |
| --- | --- |
| | Fetch a URL and return its title, text, optional CSS-selector matches, and links |
| | Parse existing HTML with a CSS selector |
Limitations: static HTML only (no JavaScript rendering), no anti-bot bypass. Respect site terms and rate limits.
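To make the fetch tool's output concrete, here is a standard-library approximation of the title/text/links part. It is not this server's code: the server uses BeautifulSoup and also supports CSS selectors, which this sketch skips. The `PageSummary` shape and the function names are hypothetical.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from urllib.parse import urljoin


@dataclass
class PageSummary:
    # Hypothetical result shape; the real tool's field names may differ.
    title: str = ""
    text: str = ""
    links: list[str] = field(default_factory=list)


class _SummaryParser(HTMLParser):
    """Collects <title>, visible text, and absolute <a href> links."""

    _SKIP = {"script", "style", "noscript"}

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.title_parts: list[str] = []
        self.text_parts: list[str] = []
        self.links: list[str] = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script/style contents
        if self._in_title:
            self.title_parts.append(data)
        elif data.strip():
            self.text_parts.append(data.strip())


def summarize_html(html: str, base_url: str) -> PageSummary:
    parser = _SummaryParser(base_url)
    parser.feed(html)
    parser.close()
    return PageSummary(
        title="".join(parser.title_parts).strip(),
        text=" ".join(parser.text_parts),
        links=parser.links,
    )
```

Like the server, this only sees the HTML as delivered; anything a page builds with JavaScript is invisible to it.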
Prerequisites
Docker
Quick start (GHCR image)
After your first GitHub Release, pull the published image:
docker pull ghcr.io/<owner>/mcp-web-scraper:latest
docker run -d \
  --name mcp-web-scraper \
  -p 127.0.0.1:8000:8000 \
  ghcr.io/<owner>/mcp-web-scraper:latest
Verify health:
curl http://127.0.0.1:8000/health
MCP endpoint: http://127.0.0.1:8000/mcp
Make the GHCR package public
Go to your GitHub profile → Packages → mcp-web-scraper → Package settings → Change visibility → Public
Local build
cp .env.example .env
docker compose up --build -d
Connect to Odysseus
Odysseus uses native Streamable HTTP MCP (transport: http) via the Python SDK's streamablehttp_client. No MCPO bridge is required.
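To show what a Streamable HTTP client such as streamablehttp_client sends, here is a hand-rolled sketch of the opening exchange, based on the MCP Streamable HTTP transport (spec revision 2025-03-26). JSON-RPC messages are POSTed to /mcp with an Accept header listing both application/json and text/event-stream, and any Mcp-Session-Id header returned by initialize is echoed on later requests. The helper names and clientInfo values are illustrative; in practice use the SDK.

```python
import json
import urllib.request
from typing import Optional

MCP_URL = "http://127.0.0.1:8000/mcp"
PROTOCOL_VERSION = "2025-03-26"  # first MCP spec revision with Streamable HTTP


def jsonrpc(method: str, params: Optional[dict] = None, msg_id: Optional[int] = None) -> dict:
    """Build a JSON-RPC 2.0 message; notifications carry no id."""
    msg: dict = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        msg["params"] = params
    if msg_id is not None:
        msg["id"] = msg_id
    return msg


def initialize_request(msg_id: int = 1) -> dict:
    return jsonrpc(
        "initialize",
        {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            # Illustrative client identity; any name/version works.
            "clientInfo": {"name": "example-client", "version": "0.0.1"},
        },
        msg_id,
    )


def build_post(message: dict, session_id: Optional[str] = None) -> urllib.request.Request:
    """Wrap one JSON-RPC message as a POST to the /mcp endpoint."""
    headers = {
        "Content-Type": "application/json",
        # The server may reply with plain JSON or with an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    if session_id:
        # Echo the Mcp-Session-Id the server returned from initialize.
        headers["Mcp-Session-Id"] = session_id
    return urllib.request.Request(
        MCP_URL, data=json.dumps(message).encode("utf-8"), headers=headers, method="POST"
    )


# Message order for a first connection:
handshake = [
    initialize_request(1),
    jsonrpc("notifications/initialized"),
    jsonrpc("tools/list", msg_id=2),
]
```

Sending the requests and parsing the replies is left out because a reply may arrive as an SSE stream; the SDK handles both cases.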
Odysseus requirements this server meets
| Requirement | How this server satisfies it |
| --- | --- |
| Transport | Streamable HTTP at |
| MCP URL | Use |
| No OAuth | Server does not return |
| Tool discovery | |
| Tool calls | Agent invokes |
| Fast connect | Startup completes within Odysseus's 8-second HTTP connect window |
Important: Leave MCP_API_KEY empty when using Odysseus. Odysseus does not send a Bearer token for HTTP MCP servers, and a 401 response would trigger its OAuth flow.
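The rule above can be sketched as a request gate. With an empty key every request passes, so Odysseus never receives a 401 and never starts OAuth. With a key set, a missing or wrong Bearer token gets a 401. This is a hypothetical illustration of Bearer checking (the function name and return convention are assumptions), not this server's actual middleware.

```python
import hmac
from typing import Optional


def auth_status(api_key: str, authorization: Optional[str]) -> int:
    """Return 200 if the request may proceed, 401 otherwise.

    api_key: value of MCP_API_KEY ("" means auth is disabled).
    authorization: raw Authorization header, or None if absent.
    """
    if not api_key:
        return 200  # auth disabled: nothing ever triggers a client's OAuth flow
    scheme, _, token = (authorization or "").partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return 401
    # Constant-time comparison avoids leaking key prefixes through timing.
    if hmac.compare_digest(token.strip().encode(), api_key.encode()):
        return 200
    return 401
```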
Run this MCP server (publish port 8000, or attach it to the same Docker network as Odysseus).
In Odysseus Settings → MCP (admin), add a server:
| Field | Value |
| --- | --- |
| Name | |
| Transport | |
| URL | |
Configure your LLM in Odysseus separately: the agent calls this server's scraping tools and uses its own model for the analysis.
Compose overlay with Odysseus
Add to your Odysseus docker-compose.yml or an override file:
services:
  mcp-web-scraper:
    image: ghcr.io/<owner>/mcp-web-scraper:latest
    ports:
      - "127.0.0.1:8000:8000"
    restart: unless-stopped
  odysseus:
    depends_on:
      - mcp-web-scraper
Use http://mcp-web-scraper:8000/mcp as the MCP URL inside Odysseus.
Shared network overlay (separate compose projects)
If Odysseus and this server run in different docker compose projects, attach this service to the Odysseus network:
# Confirm the Odysseus network name (usually odysseus_default)
docker network ls | rg odysseus
docker compose -f docker-compose.yml -f docker-compose.odysseus.yml up -d
Register URL: http://mcp-web-scraper:8000/mcp
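If you would rather script the network lookup than eyeball the rg output, this sketch picks the Odysseus network from `docker network ls --format '{{.Name}}'`, preferring odysseus_default as the comment above suggests. The function names and the preference order are assumptions.

```python
import subprocess
from typing import Optional


def pick_odysseus_network(names: list[str]) -> Optional[str]:
    """Choose the Odysseus network from a list of Docker network names."""
    candidates = [n for n in names if "odysseus" in n]
    if "odysseus_default" in candidates:
        return "odysseus_default"  # usual name of the default compose network
    return candidates[0] if candidates else None


def list_networks() -> list[str]:
    """Return Docker network names, one per line of CLI output (needs the docker CLI)."""
    out = subprocess.run(
        ["docker", "network", "ls", "--format", "{{.Name}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]
```

Usage on a Docker host: `print(pick_odysseus_network(list_networks()))`.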
Odysseus in Docker (most common)
Odysseus's backend resolves the MCP URL inside its container. The hostname mcp-web-scraper only works when both containers share a Docker network; otherwise you get Temporary failure in name resolution and Odysseus may return HTTP 500.
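You can check name resolution directly from where Odysseus runs. A minimal sketch, assuming Python is available inside the Odysseus container (e.g. via `docker exec -it odysseus-odysseus-1 python`) so it uses the same DNS as the backend. `socket.getaddrinfo` raises `socket.gaierror`, the error family behind "Temporary failure in name resolution", when a name cannot be resolved. The function name is hypothetical.

```python
import socket


def resolves(host: str, port: int = 8000) -> bool:
    """True if host resolves in this network namespace (no connection is made)."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return False
    return True
```

Inside the container, `resolves("mcp-web-scraper")` returning False means the two containers do not share a network; try `resolves("host.docker.internal")` for the host-published setup.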
Quickest fix, with MCP published on the host (MCP_BIND=0.0.0.0, the default):
| Field | Value |
| --- | --- |
| Transport | |
| URL | |
Odysseus's docker-compose.yml already sets extra_hosts: host.docker.internal:host-gateway. Verify from the Odysseus container:
docker exec -it odysseus-odysseus-1 curl -sf http://host.docker.internal:8000/health
Shared-network fix (use the service name instead of the host):
docker network ls | rg odysseus # e.g. odysseus_default
docker compose -f docker-compose.yml -f docker-compose.odysseus.yml up -d
Then register http://mcp-web-scraper:8000/mcp.
Which MCP URL to use
| Odysseus runs… | MCP runs… | URL in Odysseus |
| --- | --- | --- |
| In Docker (same compose/network) | In Docker (same network) | |
| In Docker (separate compose) | In Docker | |
| In Docker | On Docker host ( | |
| On host | In Docker ( | |
| On host | In Docker ( | |
Do not use your host LAN IP unless MCP_BIND publishes port 8000 on 0.0.0.0. The default compose file uses 0.0.0.0; set MCP_BIND=127.0.0.1 only if you want host-local access.
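The choice reduces to the rules stated in this section: a shared Docker network uses the service name, Odysseus in Docker reaching MCP published on the host uses host.docker.internal (which needs MCP_BIND=0.0.0.0), and Odysseus on the host reaches the published port on 127.0.0.1. This sketch encodes only those rules; the function and parameter names are hypothetical, and the table above remains the reference.

```python
def mcp_url(odysseus_in_docker: bool, shared_network: bool, mcp_bind: str = "0.0.0.0") -> str:
    """Pick the MCP URL to register in Odysseus (port 8000 assumed throughout)."""
    if odysseus_in_docker:
        if shared_network:
            # Docker's embedded DNS resolves the compose service name.
            return "http://mcp-web-scraper:8000/mcp"
        if mcp_bind == "127.0.0.1":
            # A loopback-only publish is unreachable via host.docker.internal.
            raise ValueError("set MCP_BIND=0.0.0.0 or join the Odysseus network")
        return "http://host.docker.internal:8000/mcp"
    # Odysseus on the host: the published port is reachable on loopback.
    return "http://127.0.0.1:8000/mcp"
```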
Troubleshooting Odysseus POST /api/mcp/servers → 500
The browser error is reported by Odysseus (localhost:7000), not by this MCP server. A failed MCP connection normally returns HTTP 200 with "status": "error" in the JSON body; a 500 means Odysseus raised an unhandled exception.
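The distinction above can be sketched as a small triage helper for the reply to POST /api/mcp/servers. Only the "status" field and the 200-versus-500 behavior come from this section; the helper name and its messages are hypothetical.

```python
import json


def triage_save_response(http_status: int, body: str) -> str:
    """Classify Odysseus's reply when saving an MCP server."""
    if http_status >= 500:
        return "odysseus-exception: read the Odysseus logs for the traceback"
    try:
        payload = json.loads(body) if body else {}
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    if http_status == 200 and payload.get("status") == "error":
        return "mcp-connection-failed: check the URL, networking, and MCP_API_KEY"
    if http_status == 200:
        return "ok"
    return f"unexpected-http-{http_status}"
```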
Read Odysseus logs while saving the server (replace the container name if different):
docker logs -f odysseus-odysseus-1 2>&1 | rg -i "mcp|error|traceback"
Or, if Odysseus runs directly on the host, check the terminal where app.py is running.
Test reachability from Odysseus (run inside the Odysseus container):
docker exec -it odysseus-odysseus-1 curl -sf http://mcp-web-scraper:8000/health
# or, MCP on host:
docker exec -it odysseus-odysseus-1 curl -sf http://host.docker.internal:8000/health
Expect {"status":"ok"}. If this fails, fix networking before re-saving in the UI.
Confirm this server is up on the host:
curl -sf http://127.0.0.1:8000/health
Leave MCP_API_KEY empty for Odysseus (see .env.example). Odysseus does not send Bearer tokens.
Use the exact path /mcp, not / or /sse.
Pull or rebuild the latest image after compatibility fixes:
docker compose pull && docker compose up -d --build
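The curl health checks above can also be scripted with the standard library, for example in a CI job. This sketch assumes only what this section states, namely that /health answers with {"status":"ok"}; the function names and the 5-second timeout are illustrative.

```python
import json
import urllib.request


def is_healthy_body(body: bytes) -> bool:
    """True if a /health response body is {"status": "ok"}."""
    try:
        data = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def check_health(base_url: str = "http://127.0.0.1:8000", timeout: float = 5.0) -> bool:
    """Rough equivalent of `curl -sf <base_url>/health` plus a body check."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and is_healthy_body(resp.read())
    except OSError:
        # URLError/HTTPError (both OSError subclasses): refused, DNS failure,
        # timeout, or a 4xx/5xx status, which curl -f would also treat as failure.
        return False
```

From inside the Odysseus container (if it has Python), `check_health("http://mcp-web-scraper:8000")` mirrors the first docker exec check.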
Common log messages:
| Odysseus / client error | Fix |
| --- | --- |
| | Shared Docker network missing; use |
| | Wrong URL, or MCP published only on |
| | Run Odysseus once so DB migrations apply, or upgrade Odysseus |
| | Unset |
Connect to Open WebUI
Open WebUI v0.6.31+ supports MCP Streamable HTTP natively.
Admin Settings → External Tools → Add Server
Type: MCP (Streamable HTTP)
URL: http://host.docker.internal:8000/mcp
Auth: None (or Bearer if MCP_API_KEY is set)
Enable Function Calling: Native on your model
Configuration
Copy .env.example to .env:
| Variable | Default | Description |
| --- | --- | --- |
| MCP_BIND | 0.0.0.0 | Bind address |
| | | HTTP port |
| | | HTTP fetch timeout (seconds) |
| | | Max download size (2 MB) |
| MCP_API_KEY | (empty) | Optional Bearer token for |
| | | Origin allowlist; |
| | | HTTP User-Agent header |
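The timeout and maximum-download-size settings bound what a single fetch can consume. A common way to enforce a size cap is to read the body in chunks and stop once the limit is exceeded, rather than trusting Content-Length. Below is a sketch of that technique, assuming "2 MB" means 2 × 1024 × 1024 bytes. It is not taken from this server's source, and `read_capped` and `TooLarge` are hypothetical names.

```python
import io
from typing import BinaryIO

MAX_BYTES = 2 * 1024 * 1024  # assumed meaning of the "2 MB" default


class TooLarge(Exception):
    """Raised when a response body exceeds the configured cap."""


def read_capped(stream: BinaryIO, max_bytes: int = MAX_BYTES, chunk_size: int = 64 * 1024) -> bytes:
    """Read a body of at most max_bytes; raise TooLarge if the stream holds more."""
    buf = bytearray()
    while True:
        # Never request more than one byte past the cap.
        chunk = stream.read(min(chunk_size, max_bytes + 1 - len(buf)))
        if not chunk:
            return bytes(buf)
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise TooLarge(f"body exceeds {max_bytes} bytes")


# Works with any binary file-like object, e.g. an urlopen() response.
demo = read_capped(io.BytesIO(b"<html></html>"), max_bytes=1024)
```

Passing the response from `urllib.request.urlopen(url, timeout=...)` straight to `read_capped` applies both limits.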
Security
Do not expose this server unauthenticated on the public internet.
Set MCP_API_KEY for Open WebUI or other clients that support Bearer auth. Do not enable it for Odysseus.
Scraping is unrestricted by default; only scrape sites you are permitted to access.
Creating a release
CI publishes to GHCR when a GitHub Release is created:
git tag v0.1.0
git push origin v0.1.0
Create a release from tag v0.1.0 on GitHub. The release workflow pushes:
ghcr.io/<owner>/mcp-web-scraper:0.1.0
ghcr.io/<owner>/mcp-web-scraper:0.1
ghcr.io/<owner>/mcp-web-scraper:0
ghcr.io/<owner>/mcp-web-scraper:latest
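The published tags follow a semantic-versioning pattern: the full version, then major.minor, then major, then latest. The sketch below derives that set from a Git tag so you can check what a release should publish. It handles only plain vMAJOR.MINOR.PATCH tags and is illustrative; it does not reproduce the release workflow itself.

```python
import re

_SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def image_tags(git_tag: str, image: str = "ghcr.io/<owner>/mcp-web-scraper") -> list[str]:
    """Map a release tag such as v0.1.0 to its GHCR image tags."""
    m = _SEMVER.match(git_tag)
    if not m:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH tag: {git_tag!r}")
    major, minor, patch = m.groups()
    versions = [f"{major}.{minor}.{patch}", f"{major}.{minor}", major, "latest"]
    return [f"{image}:{v}" for v in versions]
```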
Development
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
ruff check src tests
pytest
mcp-web-scraper
License
MIT