The WebSurfer MCP server enables AI assistants to securely fetch and extract clean, readable text from web pages through a standardized interface.
Fetch web content: Retrieve plain-text content from any publicly accessible HTTP/HTTPS URL using the search_url tool
Extract clean text: Automatically remove boilerplate elements (navigation, headers, scripts) using trafilatura and BeautifulSoup4 to provide high-quality, readable content
Enhance security: Prevent SSRF attacks by blocking access to private IPs, loopback addresses, reserved IP ranges, and non-HTTP/HTTPS schemes
Manage resources: Enforce content size limits (default 10MB), configurable request timeouts (1-60 seconds, default 10 seconds), and built-in rate limiting
Handle errors gracefully: Receive detailed feedback for network issues, HTTP errors, and content parsing failures
Customize behavior: Configure timeout, user agent, and content limits via environment variables
Integrate seamlessly: Work with MCP-compliant clients like Claude Desktop to enable web browsing capabilities for AI assistants
Makes asynchronous HTTP requests to web pages, enabling efficient fetching of web content
Blocks javascript: URL schemes as part of the security features to prevent potential security vulnerabilities
Built with modern Python async patterns for high performance, requiring Python 3.12 or higher to run
Supports processing XML content types, allowing extraction of text from XML-based web pages
WebSurfer is a Model Context Protocol (MCP) server designed to provide Large Language Models (LLMs) with secure and efficient access to web content.
Core Features
Advanced URL Validation: Implements strict security controls using the ipaddress module to block private, loopback, link-local, and reserved destinations before any fetch occurs.
Optimized Content Extraction: Utilizes trafilatura and BeautifulSoup4 to extract high-quality, readable text from HTML, effectively removing boilerplate such as navigation, headers, and scripts.
Resource Management: Enforces strict content size limits and request timeouts to ensure system stability and performance.
Redirect Safety: Validates every redirect hop and refuses redirects to blocked schemes, localhost, private IP literals, or unsafe DNS targets.
Rate Limiting: Built-in request throttling to prevent service abuse and manage resource consumption.
Robust Error Handling: Provides granular feedback for network issues, HTTP errors, and content parsing failures.
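The URL validation feature can be illustrated with a minimal sketch built on the standard-library ipaddress module. This is not the project's URLValidator; the function name is_url_allowed and the exact set of checks are assumptions made for illustration. It only inspects the scheme and IP-literal hosts, so hostnames still need a separate DNS-answer check.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_url_allowed(url: str) -> bool:
    """Return False for non-HTTP(S) schemes, localhost, and blocked IP literals.

    Hypothetical helper for illustration only.
    """
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        # Malformed URL (for example, an unterminated IPv6 bracket).
        return False

    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    if not host or host == "localhost":
        return False

    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: a hostname whose DNS answers must be checked separately.
        return True

    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )
```

A call such as is_url_allowed("http://169.254.169.254/") returns False because the address is link-local, while is_url_allowed("https://example.com/") passes this stage and moves on to DNS checks.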
Project Layout
websurfer-mcp/
├── src/websurfer_mcp/
│ ├── cli.py
│ ├── config.py
│ ├── extractor.py
│ ├── networking.py
│ ├── server.py
│ └── url_validation.py
├── tests/
├── docs/images/
├── pyproject.toml
└── run_tests.py
Key runtime components:
WebSurferServer: MCP transport and tool registration.
TextExtractor: asynchronous HTTP fetching and readable-text extraction.
SafeResolver: DNS resolution guard that rejects private and reserved IP answers.
URLValidator: URL normalization and SSRF-focused validation.
Config: environment-driven runtime configuration.
Installation
Prerequisites
Python 3.12 or higher
uv package manager
Setup
Clone the repository:
git clone https://github.com/crybo-rybo/websurfer-mcp
cd websurfer-mcp
Install runtime dependencies:
uv sync
Install development tooling:
uv sync --group dev
Usage
Server Execution
The server communicates via standard I/O (stdio) and is compatible with any MCP-compliant client.
Use either the console script or the package module:
uv run websurfer-mcp serve
uv run python -m websurfer_mcp serve
Manual Testing
You can verify the extraction functionality directly from the command line:
uv run websurfer-mcp test --url "https://example.com"
Desktop Client Integration
Claude Desktop
To use WebSurfer MCP with Claude Desktop, add the following configuration to your claude_desktop_config.json file.
Path locations:
macOS:
~/Library/Application Support/Claude/claude_desktop_config.json
Windows:
%APPDATA%\Claude\claude_desktop_config.json
Configuration:
Replace /path/to/websurfer-mcp with the absolute path to your cloned repository.
After updating the configuration, restart Claude Desktop to enable the search_url tool.
{
"mcpServers": {
"websurfer": {
"command": "uv",
"args": [
"--directory",
"/path/to/websurfer-mcp",
"run",
"python",
"-m",
"websurfer_mcp",
"serve"
]
}
}
}
Configuration
The server can be configured using the following environment variables:
Variable | Default | Description |
|
| Default request timeout in seconds. |
|
| Maximum allowed timeout in seconds. |
|
| Maximum number of redirect hops to follow. |
|
| User-Agent string for outgoing requests. |
|
| Maximum content size in bytes (default 10MB). |
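As a rough sketch of environment-driven configuration, the snippet below reads three of the settings described above and clamps the default timeout to the allowed range. The environment variable names (EXAMPLE_DEFAULT_TIMEOUT, EXAMPLE_MAX_TIMEOUT, EXAMPLE_MAX_CONTENT_LENGTH) are placeholders invented for this example, not the project's actual names, and treating 10MB as 10 * 1024 * 1024 bytes is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Config:
    default_timeout: int = 10                    # seconds
    max_timeout: int = 60                        # seconds
    max_content_length: int = 10 * 1024 * 1024   # bytes (assumed meaning of 10MB)


def _int_env(env: Mapping[str, str], name: str, fallback: int) -> int:
    """Read an integer setting, falling back when it is unset or not a number."""
    raw = env.get(name)
    if raw is None:
        return fallback
    try:
        return int(raw)
    except ValueError:
        return fallback


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    # Placeholder variable names: substitute the names documented by the project.
    max_timeout = max(1, _int_env(env, "EXAMPLE_MAX_TIMEOUT", 60))
    default_timeout = _int_env(env, "EXAMPLE_DEFAULT_TIMEOUT", 10)
    # Keep the default timeout within 1..max_timeout.
    default_timeout = min(max(1, default_timeout), max_timeout)
    max_content_length = max(
        1, _int_env(env, "EXAMPLE_MAX_CONTENT_LENGTH", 10 * 1024 * 1024)
    )
    return Config(default_timeout, max_timeout, max_content_length)
```

Passing a plain dictionary instead of os.environ keeps the loader easy to test, and invalid values fall back to the defaults rather than aborting startup.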
Development
Run the test suite:
uv run pytest
Run quality checks:
uv run ruff check .
uv run ruff format .
Run a focused module:
uv run python run_tests.py --module test_server
Security
WebSurfer MCP is designed with security as a primary concern. It explicitly blocks:
Private IP ranges (e.g., 10.0.0.0/8, 192.168.0.0/16)
Loopback addresses (e.g., 127.0.0.1, ::1)
Link-local and reserved addresses
Non-HTTP/HTTPS schemes (e.g., file://, ftp://, javascript:)
Redirect hops that resolve to blocked destinations
DNS answers that resolve public-looking hostnames to private or reserved IPs
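The last two items, redirect hops and DNS answers, can be sketched together. The code below is an illustration under stated assumptions, not the project's SafeResolver: the names is_public_ip, hop_is_safe, first_unsafe_hop, and system_resolver are hypothetical, and the resolver is injectable so the logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Optional
from urllib.parse import urljoin, urlsplit

Resolver = Callable[[str], list[str]]


def is_public_ip(address: str) -> bool:
    ip = ipaddress.ip_address(address)
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


def system_resolver(host: str) -> list[str]:
    """Resolve a hostname to every address the system resolver returns."""
    infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def hop_is_safe(url: str, resolve: Resolver = system_resolver) -> bool:
    """A hop is safe only if it is HTTP(S) and every resolved address is public."""
    parts = urlsplit(url)
    host = parts.hostname
    if parts.scheme not in {"http", "https"} or not host:
        return False
    try:
        ipaddress.ip_address(host)
        addresses = [host]  # IP literal: nothing to resolve
    except ValueError:
        try:
            addresses = resolve(host)
        except OSError:
            return False  # resolution failed; refuse rather than guess
    return bool(addresses) and all(is_public_ip(a) for a in addresses)


def first_unsafe_hop(
    start_url: str,
    locations: Iterable[str],
    resolve: Resolver = system_resolver,
) -> Optional[str]:
    """Walk a chain of Location header values; return the first unsafe URL, if any."""
    current = start_url
    if not hop_is_safe(current, resolve):
        return current
    for location in locations:
        current = urljoin(current, location)  # Location may be relative
        if not hop_is_safe(current, resolve):
            return current
    return None
```

Requiring every DNS answer to be public, rather than any one of them, is the conservative choice here: a hostname that returns one public and one private address is rejected.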
Developed with the Model Context Protocol.