Webpage MCP Server

A Model Context Protocol (MCP) server for querying webpages and page contents from a specific website.

Overview

This MCP server provides tools to list and retrieve webpage content by parsing a sitemap.xml file and fetching HTML content from specified URLs. It includes built-in rate limiting to protect against abuse.

Features

  • List Pages: Parse sitemap.xml to get all available webpage paths

  • Get Page Content: Fetch HTML content from any webpage

  • Sitemap Resource: Access the raw sitemap.xml file

  • Rate Limiting: 10 requests per minute per user to prevent abuse

Installation

# Install dependencies using uv
uv sync

# Or using pip
pip install -e .

Configuration

The server uses environment variables for configuration:

Variable   Description                            Default
BASE_URL   The base URL of the website to query   https://example.com
HOST       Server host address                    0.0.0.0
PORT       Server port number                     8080

Environment Setup

Create a .env.local file in the project root:

BASE_URL=https://your-website.com
HOST=0.0.0.0
PORT=8080
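As a rough illustration of how these three variables and their documented defaults might be read, here is a minimal sketch. The names ServerConfig and load_config are hypothetical and do not come from src/main.py. The sketch reads only from the process environment; how .env.local gets loaded into that environment is not covered here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical container for the three documented settings."""
    base_url: str
    host: str
    port: int


def load_config(env=None) -> ServerConfig:
    """Read BASE_URL, HOST and PORT, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return ServerConfig(
        # Strip a trailing slash so paths like "/blog" can be appended directly.
        base_url=env.get("BASE_URL", "https://example.com").rstrip("/"),
        host=env.get("HOST", "0.0.0.0"),
        # Environment values are strings, so the port is converted explicitly.
        port=int(env.get("PORT", "8080")),
    )
```

Passing a plain dict instead of os.environ makes the function easy to exercise in tests.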

Sitemap Configuration

Place your sitemap.xml file in the assets/ directory. The server will automatically read from:

assets/sitemap.xml

Usage

Running the Server

STDIO Mode (for MCP clients):

uv run python src/main.py --stdio

HTTP Mode:

uv run python src/main.py --port 8080
# Server will be available at http://localhost:8080/mcp

Test Mode:

uv run python src/main.py --test

Running Tests

uv run python tests/test_server.py

Available Tools

1. list_pages()

Lists all webpage paths from the sitemap.

Parameters: None

Returns: List of page paths

Example:

list_pages()
# Returns: ["/", "/blog", "/blog/post-1", "/marketplace", "/pricing"]
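One plausible way to turn sitemap.xml into a list of paths like the one above is to read each <loc> element and keep only the path component of its URL. This is a sketch using the standard library, not the actual code in src/main.py. The function name paths_from_sitemap is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def paths_from_sitemap(xml_text: str) -> list[str]:
    """Return the URL path of every <loc> entry, in document order."""
    root = ET.fromstring(xml_text)
    paths = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if loc.text:
            # A bare host such as "https://example.com" has an empty path,
            # which is treated as the site root here.
            paths.append(urlparse(loc.text.strip()).path or "/")
    return paths
```

A real implementation would also need to handle a missing or unparsable file, which this sketch leaves to the caller.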

2. get_page(path, user_id=None)

Fetches HTML content from a webpage.

Parameters:

  • path (str): The webpage path (e.g., "/blog/post-1")

  • user_id (str, optional): User identifier for rate limiting

Returns: Dictionary with HTML content and metadata

Example:

get_page("/blog/post-1")
# Returns:
# {
#     "path": "/blog/post-1",
#     "url": "https://example.com/blog/post-1",
#     "html": "<html>...</html>",
#     "status_code": 200,
#     "content_type": "text/html"
# }

Rate Limit Response:

# When rate limit is exceeded:
{
    "error": "Rate limit exceeded",
    "message": "Too many requests. Please wait 45 seconds before trying again.",
    "reset_in_seconds": 45,
    "limit": "10 requests per minute"
}
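To make the shape of a successful get_page result concrete, the sketch below builds the same five keys from an HTTP response. It assumes the page is fetched with urllib.request.urlopen, which the source does not confirm. The function name get_page_sketch and the opener parameter are hypothetical and exist only so the fetch can be swapped out.

```python
from urllib.request import urlopen


def get_page_sketch(path, base_url="https://example.com", opener=urlopen):
    """Fetch base_url + path and return a dict shaped like the documented result."""
    url = base_url.rstrip("/") + path
    # Note: urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    # so a real tool would catch that and return an error dict instead.
    with opener(url, timeout=10) as resp:
        body = resp.read()
        charset = resp.headers.get_content_charset() or "utf-8"
        return {
            "path": path,
            "url": url,
            "html": body.decode(charset, errors="replace"),
            "status_code": resp.status,
            "content_type": resp.headers.get_content_type(),
        }
```

Rate limiting is deliberately left out of this sketch; it would run before the fetch and return the rate limit response shown above instead.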

Available Resources

sitemap://sitemap.xml

Access the raw sitemap.xml content.

Example:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-10-08T20:10:25.773Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1</priority>
  </url>
  ...
</urlset>

Deployment

Deploy to Dedalus

dedalus deploy ./src/main.py --name 'webpage-server'

Use with Dedalus SDK

from dedalus import Dedalus

client = Dedalus(
    mcp_servers=['your-org/webpage-server']
)

Rate Limiting

The server implements rate limiting to protect against abuse:

  • Limit: 10 requests per minute per user

  • Window: 60-second rolling window

  • Identifier: The user_id parameter, or 'default' if none is provided, so callers that omit user_id share a single quota

When the rate limit is exceeded, the server returns an error response with:

  • Time until the limit resets

  • Total number of allowed requests
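The behaviour described above can be sketched as a per-user rolling window that remembers the timestamps of recent requests. This is an illustrative implementation under stated assumptions, not the one in src/main.py. The class name RollingWindowLimiter and the injectable clock are hypothetical, and the "per minute" wording in the error assumes the default 60-second window.

```python
import math
import time
from collections import defaultdict, deque


class RollingWindowLimiter:
    """Allow at most `limit` requests per user within the last `window_seconds`."""

    def __init__(self, limit=10, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # user key -> timestamps of recent requests

    def check(self, user_id=None):
        """Record a request. Return None if allowed, else an error dict."""
        key = user_id or "default"
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the rolling window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            # The oldest remaining request determines when a slot frees up.
            reset_in = max(1, math.ceil(self.window - (now - hits[0])))
            return {
                "error": "Rate limit exceeded",
                "message": f"Too many requests. Please wait {reset_in} "
                           "seconds before trying again.",
                "reset_in_seconds": reset_in,
                "limit": f"{self.limit} requests per minute",
            }
        hits.append(now)
        return None
```

Storing timestamps in a deque keeps each check cheap, because expired entries are only ever removed from the front.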

Architecture

mcp-server-example-python/
├── src/
│   └── main.py          # Main server implementation
├── assets/
│   └── sitemap.xml      # Sitemap data source
├── tests/
│   └── test_server.py   # Test suite
├── docs/
│   └── README.md        # This documentation
└── pyproject.toml       # Project configuration

Error Handling

The server handles various error conditions gracefully:

  1. Missing Sitemap: Returns error if assets/sitemap.xml doesn't exist

  2. Invalid Path: Returns error for malformed paths

  3. Failed HTTP Request: Returns error with details when webpage fetch fails

  4. Rate Limit: Returns structured error with reset time
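As an example of the "return an error instead of raising" style listed above, here is a small sketch of path validation. The source does not say what counts as a malformed path, so the rules below (leading slash, no whitespace, no embedded scheme or host) are assumptions, and validate_path is a hypothetical helper name.

```python
def validate_path(path):
    """Return an error dict for a malformed path, or None if it looks usable."""
    if not isinstance(path, str) or not path.startswith("/"):
        return {
            "error": "Invalid path",
            "message": "Path must be a string starting with '/'.",
        }
    # Assumed rules: reject anything that could point at a different host,
    # and anything containing whitespace.
    if path.startswith("//") or "://" in path or any(c.isspace() for c in path):
        return {
            "error": "Invalid path",
            "message": "Path must not contain a host, scheme, or whitespace.",
        }
    return None
```

A tool could call this first and return the dict to the client unchanged when it is not None.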

Development

Adding New Tools

To add a new tool, use the @mcp.tool() decorator:

@mcp.tool()
def your_tool_name(param: str) -> dict:
    """Tool description"""
    # Implementation
    return {"result": "value"}

Adding New Resources

To add a new resource, use the @mcp.resource() decorator:

@mcp.resource('resource://resource-name')
def get_resource() -> str:
    """Resource description"""
    return "resource content"

License

MIT License - See LICENSE file for details
