
WebSurfer MCP

by crybo-rybo

🌐 WebSurfer MCP

A Model Context Protocol (MCP) server that lets Large Language Models (LLMs) fetch web pages and extract their readable text. It validates URLs, rate-limits requests, and enforces size and timeout limits, so AI assistants can access web content through a standardized interface.

✨ Features

  • 🔒 Secure URL Validation: Blocks dangerous schemes, private IPs, and localhost domains

  • 📄 Smart Content Extraction: Extracts clean, readable text from HTML pages using advanced parsing

  • ⚡ Rate Limiting: Built-in rate limiting to prevent abuse (60 requests/minute)

  • 🛡️ Content Type Filtering: Only processes supported content types (HTML, plain text, XML)

  • 📏 Size Limits: Configurable content size limits (default: 10MB)

  • ⏱️ Timeout Management: Configurable request timeouts with validation

  • 🔧 Comprehensive Error Handling: Detailed error messages for various failure scenarios

  • 🧪 Full Test Coverage: 45+ unit tests covering all functionality
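
The rate limit listed above (60 requests/minute) can be pictured as a sliding-window counter. The following is a minimal sketch, not the project's actual implementation: the class name `SlidingWindowRateLimiter`, its parameters, and the injectable `clock` are assumptions made for illustration.

```python
import time
from collections import deque


class SlidingWindowRateLimiter:
    """Hypothetical limiter: at most `max_requests` calls per `window_seconds`."""

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock          # injectable so the limiter can be tested without sleeping
        self._timestamps = deque()   # times of the requests still inside the window

    def allow(self) -> bool:
        """Return True and record the request if it fits in the current window."""
        now = self._clock()
        # Forget requests that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            return False
        self._timestamps.append(now)
        return True
```

A caller would check `limiter.allow()` before each fetch and return a rate-limit error when it is `False`.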

🏗️ Architecture

The project consists of several key components:

Core Components

  • MCPURLSearchServer: Main MCP server implementation

  • TextExtractor: Handles web content fetching and text extraction

  • URLValidator: Validates and sanitizes URLs for security

  • Config: Centralized configuration management
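
To give a feel for what a component like TextExtractor does after fetching a page, here is a minimal sketch of readable-text extraction using only the standard library's `html.parser`. The README does not say which parsing library the project uses, so `ReadableTextParser` and `extract_text` are hypothetical names and the approach is an assumption.

```python
from html.parser import HTMLParser


class ReadableTextParser(HTMLParser):
    """Hypothetical parser: collects the title and visible text, skipping scripts and styles."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self._chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside <script>/<style>: not readable content
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text:
            return
        if self._in_title:
            self.title = text
        else:
            self._chunks.append(text)

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_text(html: str) -> tuple[str, str]:
    """Return (title, readable_text) for an HTML document."""
    parser = ReadableTextParser()
    parser.feed(html)
    parser.close()
    return parser.title, parser.text()
```

Calling `extract_text(page_html)` returns the page title and the body text joined with single spaces; a production extractor would likely also drop navigation and boilerplate elements.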

Key Features

  • Async/Await: Built with modern Python async patterns for high performance

  • Resource Management: Proper cleanup of network connections and resources

  • Context Managers: Safe resource handling with automatic cleanup

  • Logging: Comprehensive logging for debugging and monitoring
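
The async and context-manager points above combine naturally into an async context manager that opens a connection on entry and always closes it on exit. This is a sketch under stated assumptions: `FakeConnection` stands in for a real HTTP session, and `Fetcher` is a hypothetical name, not a class from the project.

```python
import asyncio


class FakeConnection:
    """Stand-in for a real HTTP session (hypothetical, no network access)."""

    def __init__(self):
        self.closed = False

    async def get(self, url: str) -> str:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return f"<html>{url}</html>"

    async def close(self) -> None:
        self.closed = True


class Fetcher:
    """Hypothetical fetcher that owns a connection for the duration of an `async with` block."""

    async def __aenter__(self):
        self.connection = FakeConnection()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.connection.close()  # runs even if the block raised
        return False                   # do not swallow exceptions

    async def fetch(self, url: str) -> str:
        return await self.connection.get(url)


async def main():
    async with Fetcher() as fetcher:
        body = await fetcher.fetch("https://example.com")
    # The connection is closed once the block exits.
    return body, fetcher.connection.closed
```

Running `asyncio.run(main())` returns the fetched body together with a flag confirming that cleanup happened.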

🚀 Installation

Prerequisites

  • Python 3.12 or higher

  • uv package manager (recommended)

Quick Start

  1. Clone the repository:

    git clone https://github.com/crybo-rybo/websurfer-mcp
    cd websurfer-mcp
  2. Install dependencies:

    uv sync
  3. Verify installation:

    uv run python -c "import mcp_url_search_server; print('Installation successful!')"

🎯 Usage

Starting the MCP Server

The server communicates via stdio (standard input/output) and can be integrated with any MCP-compatible client.

    # Start the server
    uv run run_server.py serve

    # Start with custom log level
    uv run run_server.py serve --log-level DEBUG

Testing URL Search Functionality

Test the URL search functionality directly:

    # Test with a simple URL
    uv run run_server.py test --url "https://example.com"

    # Test with custom timeout
    uv run run_server.py test --url "https://httpbin.org/html" --timeout 15

Example Test Output

    {
      "success": true,
      "url": "https://example.com",
      "title": "Example Domain",
      "content_type": "text/html",
      "status_code": 200,
      "text_length": 1250,
      "text_preview": "Example Domain This domain is for use in illustrative examples in documents..."
    }

🛠️ Configuration

The server can be configured using environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| MCP_DEFAULT_TIMEOUT | 10 | Default request timeout in seconds |
| MCP_MAX_TIMEOUT | 60 | Maximum allowed timeout in seconds |
| MCP_USER_AGENT | MCP-URL-Search-Server/1.0.0 | User agent string for requests |
| MCP_MAX_CONTENT_LENGTH | 10485760 | Maximum content size in bytes (10MB) |

Example Configuration

    export MCP_DEFAULT_TIMEOUT=15
    export MCP_MAX_CONTENT_LENGTH=5242880  # 5MB
    uv run run_server.py serve
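
As an illustration of how these environment variables might be read, the sketch below loads them into an immutable settings object using the documented defaults. The names `Settings`, `load_settings`, and `resolve_timeout` are hypothetical, and clamping an oversized timeout to the maximum (rather than rejecting it) is an assumption; the project's Config class may behave differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented environment variables."""
    default_timeout: int
    max_timeout: int
    user_agent: str
    max_content_length: int


def load_settings(env=None) -> Settings:
    """Read settings from `env` (defaults to os.environ), falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        default_timeout=int(env.get("MCP_DEFAULT_TIMEOUT", "10")),
        max_timeout=int(env.get("MCP_MAX_TIMEOUT", "60")),
        user_agent=env.get("MCP_USER_AGENT", "MCP-URL-Search-Server/1.0.0"),
        max_content_length=int(env.get("MCP_MAX_CONTENT_LENGTH", "10485760")),
    )


def resolve_timeout(requested, settings: Settings) -> int:
    """Pick the effective timeout. Clamping to max_timeout is an assumption of this sketch."""
    if requested is None:
        return settings.default_timeout
    if requested <= 0:
        raise ValueError("timeout must be positive")
    return min(requested, settings.max_timeout)
```

Passing a plain dictionary as `env` makes the loader easy to exercise in tests without touching the real environment.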

🧪 Testing

Running All Tests

    # Run all tests with verbose output
    uv run python -m unittest discover tests -v

    # Run tests with coverage (if coverage is installed)
    uv run coverage run -m unittest discover tests
    uv run coverage report

Running Specific Test Files

    # Run only integration tests
    uv run python -m unittest tests.test_integration -v

    # Run only text extraction tests
    uv run python -m unittest tests.test_text_extractor -v

    # Run only URL validation tests
    uv run python -m unittest tests.test_url_validator -v

Test Results

All 45 tests should pass successfully:

    test_content_types_immutable (test_config.TestConfig.test_content_types_immutable) ... ok
    test_default_configuration_values (test_config.TestConfig.test_default_configuration_values) ... ok
    test_404_error_handling (test_integration.TestMCPURLSearchIntegration.test_404_error_handling) ... ok
    ...
    ----------------------------------------------------------------------
    Ran 45 tests in 1.827s

    OK

🔧 Development

Project Structure

    websurfer-mcp/
    ├── mcp_url_search_server.py     # Main MCP server implementation
    ├── text_extractor.py            # Web content extraction logic
    ├── url_validator.py             # URL validation and security
    ├── config.py                    # Configuration management
    ├── run_server.py                # Command-line interface
    ├── run_tests.py                 # Test runner utilities
    ├── tests/                       # Test suite
    │   ├── test_integration.py      # Integration tests
    │   ├── test_text_extractor.py   # Text extraction tests
    │   ├── test_url_validator.py    # URL validation tests
    │   └── test_config.py           # Configuration tests
    ├── pyproject.toml               # Project configuration
    └── README.md                    # This file

🔒 Security Features

URL Validation

  • Scheme Blocking: Blocks file://, javascript:, ftp:// schemes

  • Private IP Protection: Blocks access to private IP ranges (10.x.x.x, 192.168.x.x, etc.)

  • Localhost Protection: Blocks localhost and local domain access

  • URL Length Limits: Prevents extremely long URLs

  • Format Validation: Ensures proper URL structure
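
The checks listed above can be sketched with the standard library's `urllib.parse` and `ipaddress` modules. This is an illustration, not the project's URLValidator: the function name `validate_url`, the http/https allowlist, the 2048-character limit, and the `.local`/`.localhost` suffix rule are all assumptions.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumption: an allowlist, which also rejects file, javascript, ftp
MAX_URL_LENGTH = 2048                # assumption: the README does not state the actual limit


def validate_url(url: str) -> tuple[bool, str]:
    """Return (is_allowed, reason) for a candidate URL."""
    if len(url) > MAX_URL_LENGTH:
        return False, "URL too long"
    try:
        parts = urlsplit(url.strip())
        host = parts.hostname
    except ValueError:
        return False, "malformed URL"
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        return False, f"scheme not allowed: {parts.scheme or '(none)'}"
    if not host:
        return False, "missing host"
    if host == "localhost" or host.endswith((".localhost", ".local")):
        return False, "local hostname"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True, "ok"  # a DNS name; this sketch does not resolve it
    if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_unspecified:
        return False, "non-public IP address"
    return True, "ok"
```

Note that this sketch only inspects the literal host. A DNS name that resolves to a private address would pass, so a stricter validator would also check the resolved IP.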

Content Safety

  • Content Type Filtering: Only processes supported text-based content types

  • Size Limits: Configurable maximum content size (default: 10MB)

  • Rate Limiting: Prevents abuse with configurable limits

  • Timeout Protection: Configurable request timeouts
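
Content type filtering and the size limit might look roughly like the sketch below. The exact set of accepted MIME types is an assumption (the README only says HTML, plain text, and XML), and `is_supported_content_type` and `read_capped` are hypothetical helper names.

```python
# Assumed MIME types for "HTML, plain text, XML"; the project's actual list may differ.
SUPPORTED_CONTENT_TYPES = frozenset({
    "text/html",
    "application/xhtml+xml",
    "text/plain",
    "text/xml",
    "application/xml",
})
MAX_CONTENT_LENGTH = 10 * 1024 * 1024  # 10MB, the documented default


def is_supported_content_type(header_value: str) -> bool:
    """Check a Content-Type header, ignoring parameters such as charset."""
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in SUPPORTED_CONTENT_TYPES


def read_capped(chunks, limit: int = MAX_CONTENT_LENGTH) -> bytes:
    """Accumulate body chunks, aborting as soon as the total exceeds `limit` bytes."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) > limit:
            raise ValueError(f"response body exceeds {limit} bytes")
    return bytes(buffer)
```

Checking the size while streaming, rather than trusting a Content-Length header, means an oversized or mislabeled response is cut off early instead of being read fully into memory.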

📊 Performance

  • Async Processing: Non-blocking I/O for high concurrency

  • Connection Pooling: Efficient HTTP connection reuse

  • DNS Caching: Reduces DNS lookup overhead

  • Resource Cleanup: Automatic cleanup prevents memory leaks
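
The async processing point can be illustrated with `asyncio.gather` and a semaphore that bounds how many fetches run at once. This sketch simulates I/O with `asyncio.sleep` instead of real HTTP requests; `fake_fetch`, `fetch_all`, and the concurrency limit of 5 are assumptions, and connection pooling and DNS caching are not shown.

```python
import asyncio


async def fake_fetch(url: str, delay: float = 0.05) -> str:
    """Simulated non-blocking fetch (hypothetical, no network access)."""
    await asyncio.sleep(delay)
    return f"fetched {url}"


async def fetch_all(urls, max_concurrency: int = 5):
    """Fetch all URLs concurrently, with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with semaphore:
            return await fake_fetch(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))
```

Because the waits overlap, fetching several URLs this way takes roughly as long as the slowest one rather than the sum of all of them.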

🙏 Acknowledgments


Happy web surfing with your AI assistant! 🚀


