Webpage MCP Server

by brian-bfz

A Model Context Protocol (MCP) server for querying webpages and page contents from a specific website.

Overview

This MCP server provides tools that list a website's pages by parsing a local sitemap.xml file and retrieve a page's HTML by fetching it from the configured website. It includes built-in rate limiting to protect against abuse.

Features

  • List Pages: Parse sitemap.xml to get all available webpage paths

  • Get Page Content: Fetch the HTML content of any page on the configured website

  • Sitemap Resource: Access the raw sitemap.xml file

  • Rate Limiting: 10 requests per minute per user to prevent abuse

Installation

# Install dependencies using uv
uv sync

# Or using pip
pip install -e .

Configuration

The server uses environment variables for configuration:

| Variable | Description | Default |
|----------|-------------|---------|
| BASE_URL | The base URL of the website to query | https://example.com |
| HOST | Server host address | 0.0.0.0 |
| PORT | Server port number | 8080 |

Environment Setup

Create a .env.local file in the project root:

BASE_URL=https://your-website.com
HOST=0.0.0.0
PORT=8080
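A minimal sketch of how these settings could be resolved, assuming .env.local holds plain KEY=VALUE lines and that real environment variables take precedence over the file (the usual dotenv convention). The helpers `parse_env_file` and `load_config` are hypothetical; the server may instead rely on a library such as python-dotenv.

```python
import os
from pathlib import Path

# Defaults taken from the configuration table.
DEFAULTS = {"BASE_URL": "https://example.com", "HOST": "0.0.0.0", "PORT": "8080"}


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments (hypothetical helper)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_config(env_file: str = ".env.local", environ=None) -> dict:
    """Merge defaults, then the .env file, then the process environment (highest precedence)."""
    environ = os.environ if environ is None else environ
    config = dict(DEFAULTS)
    path = Path(env_file)
    if path.is_file():
        config.update(parse_env_file(path.read_text(encoding="utf-8")))
    for key in DEFAULTS:
        if key in environ:
            config[key] = environ[key]
    config["PORT"] = int(config["PORT"])  # PORT is used as a number
    return config
```

Letting the process environment win means a deployment platform can override the file without editing it; reverse the last merge step if your setup expects the opposite.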

Sitemap Configuration

Place your sitemap.xml file in the assets/ directory. The server will automatically read from:

assets/sitemap.xml
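As an illustration, reading that file could look like the sketch below, which also turns a missing sitemap into an error value instead of a crash. The `read_sitemap` name and the shape of the returned dictionaries are assumptions, not the server's confirmed format.

```python
from pathlib import Path

# Default location described above, relative to the project root.
SITEMAP_PATH = Path("assets") / "sitemap.xml"


def read_sitemap(path: Path = SITEMAP_PATH) -> dict:
    """Return the raw sitemap XML, or an error dict if the file is absent (hypothetical)."""
    if not path.is_file():
        return {"error": "Sitemap not found", "message": f"{path} does not exist"}
    return {"content": path.read_text(encoding="utf-8")}
```

Because the path is relative, this resolves against the current working directory, so the server would need to be started from the project root.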

Usage

Running the Server

STDIO Mode (for MCP clients):

uv run python src/main.py --stdio

HTTP Mode:

uv run python src/main.py --port 8080
# Server will be available at http://localhost:8080/mcp

Test Mode:

uv run python src/main.py --test

Running Tests

uv run python tests/test_server.py

Available Tools

1. list_pages()

Lists all webpage paths from the sitemap.

Parameters: None

Returns: List of page paths

Example:

list_pages() # Returns: ["/", "/blog", "/blog/post-1", "/marketplace", "/pricing"]
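The path extraction behind list_pages() can be sketched with the standard library, assuming the sitemap uses the sitemaps.org 0.9 namespace (as in the resource example) and that only the path part of each <loc> URL is reported, as the example output suggests. The `list_paths` name, de-duplication, and sorting are assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace of the sitemaps.org 0.9 schema.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_paths(sitemap_xml: str) -> list[str]:
    """Return the unique path of every <url><loc> entry, sorted (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    paths = set()
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if loc.text and loc.text.strip():
            # "https://example.com/blog" -> "/blog"; a bare host maps to "/".
            paths.add(urlparse(loc.text.strip()).path or "/")
    return sorted(paths)
```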

2. get_page(path, user_id=None)

Fetches HTML content from a webpage.

Parameters:

  • path (str): The webpage path (e.g., "/blog/post-1")

  • user_id (str, optional): User identifier for rate limiting

Returns: Dictionary with HTML content and metadata

Example:

get_page("/blog/post-1")
# Returns:
# {
#   "path": "/blog/post-1",
#   "url": "https://example.com/blog/post-1",
#   "html": "<html>...</html>",
#   "status_code": 200,
#   "content_type": "text/html"
# }
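A rough sketch of how a call like this could be assembled: join the path onto BASE_URL, fetch it, and return the fields shown in the example. The `fetcher` parameter is a hypothetical hook so the function can be exercised without network access; the 10-second timeout, the path checks, and the omission of user_id and rate limiting are all simplifying assumptions.

```python
import urllib.request
from typing import Callable

BASE_URL = "https://example.com"  # assumed to come from the BASE_URL setting


def build_url(base_url: str, path: str) -> str:
    """Join a site-relative path onto base_url, rejecting malformed paths."""
    # "//host" would be a protocol-relative URL pointing at another site.
    if not path.startswith("/") or path.startswith("//"):
        raise ValueError(f"Invalid path: {path!r}")
    return base_url.rstrip("/") + path


def fetch(url: str) -> tuple[int, str, str]:
    """Default fetcher: return (status_code, content_type, body) via the standard library."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        content_type = resp.headers.get("Content-Type", "")
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.status, content_type, resp.read().decode(charset, errors="replace")


def get_page(path: str,
             fetcher: Callable[[str], tuple[int, str, str]] = fetch,
             base_url: str = BASE_URL) -> dict:
    """Fetch one page and return it in the shape shown in the example."""
    url = build_url(base_url, path)
    status, content_type, html = fetcher(url)
    return {"path": path, "url": url, "html": html,
            "status_code": status, "content_type": content_type}
```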

Rate Limit Response:

# When the rate limit is exceeded:
{
  "error": "Rate limit exceeded",
  "message": "Too many requests. Please wait 45 seconds before trying again.",
  "reset_in_seconds": 45,
  "limit": "10 requests per minute"
}
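A caller that receives this response can wait reset_in_seconds and try again. The sketch below assumes the tool is reachable as a plain Python callable that returns the dictionaries shown above; `call_with_retry` and its `sleep` parameter are hypothetical, and the 60-second fallback is an assumption for responses that lack reset_in_seconds.

```python
import time
from typing import Callable


def call_with_retry(call: Callable[[], dict], max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Invoke `call`, waiting out rate-limit errors, for at most max_attempts calls."""
    result: dict = {}
    for attempt in range(max_attempts):
        result = call()
        if result.get("error") != "Rate limit exceeded":
            return result
        if attempt < max_attempts - 1:
            # Wait the number of seconds the server reported before retrying.
            sleep(float(result.get("reset_in_seconds", 60)))
    return result  # still rate limited after the final attempt
```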

Available Resources

sitemap://sitemap.xml

Access the raw sitemap.xml content.

Example:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-10-08T20:10:25.773Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1</priority>
  </url>
  ...
</urlset>

Deployment

Deploy to Dedalus

dedalus deploy ./src/main.py --name 'webpage-server'

Use with Dedalus SDK

from dedalus import Dedalus

client = Dedalus(
    mcp_servers=['your-org/webpage-server']
)

Rate Limiting

The server implements rate limiting to protect against abuse:

  • Limit: 10 requests per minute per user

  • Window: 60-second rolling window

  • Identifier: Uses the user_id parameter, or 'default' if none is provided, so all calls without a user_id share a single limit

When the rate limit is exceeded, the server returns an error response with:

  • Time until the limit resets (reset_in_seconds)

  • Total number of allowed requests (limit)
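A rolling-window limiter matching the rules above can be sketched by keeping a queue of recent call timestamps per identifier. The `RateLimiter` class, its injectable `clock`, and the choice not to count rejected calls are assumptions, not the server's confirmed implementation; the error dictionary mirrors the rate-limit response shown under get_page.

```python
import math
import time
from collections import defaultdict, deque
from typing import Callable, Optional


class RateLimiter:
    """Rolling-window limiter: at most `limit` calls per `window` seconds per identifier."""

    def __init__(self, limit: int = 10, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._calls = defaultdict(deque)  # identifier -> timestamps of accepted calls

    def check(self, user_id: Optional[str] = None) -> Optional[dict]:
        """Record a call; return None if allowed, or an error dict if the limit is hit."""
        key = user_id or "default"
        now = self.clock()
        calls = self._calls[key]
        # Drop timestamps that have fallen out of the rolling window.
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        if len(calls) >= self.limit:
            # The oldest call in the window decides when a slot frees up.
            reset_in = math.ceil(self.window - (now - calls[0]))
            return {
                "error": "Rate limit exceeded",
                "message": f"Too many requests. Please wait {reset_in} seconds before trying again.",
                "reset_in_seconds": reset_in,
                "limit": f"{self.limit} requests per minute",  # wording assumes a 60 s window
            }
        calls.append(now)
        return None
```

Keeping exact timestamps gives a true rolling window at the cost of storing up to `limit` entries per identifier. State lives in process memory, so multiple server replicas would each enforce their own separate limit.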

Architecture

mcp-server-example-python/
├── src/
│   └── main.py          # Main server implementation
├── assets/
│   └── sitemap.xml      # Sitemap data source
├── tests/
│   └── test_server.py   # Test suite
├── docs/
│   └── README.md        # This documentation
└── pyproject.toml       # Project configuration

Error Handling

The server handles the following error conditions:

  1. Missing Sitemap: Returns an error if assets/sitemap.xml doesn't exist

  2. Invalid Path: Returns an error for malformed paths

  3. Failed HTTP Request: Returns an error with details when a webpage fetch fails

  4. Rate Limit: Returns a structured error with the reset time
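One way to get this behavior is to convert exceptions into error dictionaries at the tool boundary. The decorator below is a hypothetical sketch: the exception-to-condition mapping and the "error" strings are assumptions, not the server's documented messages. If combined with @mcp.tool(), it would be placed underneath so the registered function is the wrapped one; `functools.wraps` preserves the original name, docstring, and signature metadata.

```python
import functools
import urllib.error
from typing import Any, Callable


def structured_errors(func: Callable[..., dict]) -> Callable[..., dict]:
    """Wrap a tool so common failures come back as error dicts instead of exceptions."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> dict:
        try:
            return func(*args, **kwargs)
        except FileNotFoundError as exc:  # e.g. assets/sitemap.xml is missing
            return {"error": "Sitemap not found", "message": str(exc)}
        except ValueError as exc:  # e.g. a malformed path
            return {"error": "Invalid path", "message": str(exc)}
        except urllib.error.URLError as exc:  # network failure; HTTPError is a subclass
            return {"error": "Request failed", "message": str(exc)}
    return wrapper
```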

Development

Adding New Tools

To add a new tool, use the @mcp.tool() decorator:

# Assumes `mcp` is the server instance created in src/main.py
@mcp.tool()
def your_tool_name(param: str) -> dict:
    """Tool description"""
    # Implementation
    return {"result": "value"}

Adding New Resources

To add a new resource, use the @mcp.resource() decorator:

# Assumes `mcp` is the server instance created in src/main.py
@mcp.resource('resource://resource-name')
def get_resource() -> str:
    """Resource description"""
    return "resource content"

License

MIT License - See LICENSE file for details
