Webpage MCP Server

A Model Context Protocol (MCP) server for querying webpages and page contents from a specific website.

Overview

This MCP server provides tools to list a website's pages by parsing a sitemap.xml file and to retrieve a page's HTML by fetching it from the configured site. It includes built-in rate limiting to protect against abuse.

Features

List Pages: Parse sitemap.xml to get all available webpage paths
Get Page Content: Fetch HTML content from any webpage
Sitemap Resource: Access the raw sitemap.xml file
Rate Limiting: 10 requests per minute per user to prevent abuse

Installation

```bash
# Install dependencies using uv
uv sync

# Or using pip
pip install -e .
```

Configuration

The server uses environment variables for configuration:

Variable	Description	Default
`BASE_URL`	The base URL of the website to query	`https://example.com`
`HOST`	Server host address	`0.0.0.0`
`PORT`	Server port number	`8080`
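Here is a minimal sketch of how these variables could be read at startup. It assumes the server falls back to the defaults in the table when a variable is unset. `Config` and `load_config` are hypothetical names and are not taken from src/main.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    """Hypothetical container for the settings listed in the table."""
    base_url: str
    host: str
    port: int


def load_config(env: Optional[Mapping[str, str]] = None) -> Config:
    """Read BASE_URL, HOST and PORT, using the documented defaults when unset.

    Sketch only: the real server may read its configuration differently.
    """
    env = os.environ if env is None else env
    return Config(
        base_url=env.get("BASE_URL", "https://example.com"),
        host=env.get("HOST", "0.0.0.0"),
        # Environment values are always strings, so PORT is converted to int.
        port=int(env.get("PORT", "8080")),
    )
```

You can pass an explicit mapping, such as `load_config({"PORT": "9000"})`. This lets tests check the defaults without touching the real environment.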

Environment Setup

Create a .env.local file in the project root:

```bash
BASE_URL=https://your-website.com
HOST=0.0.0.0
PORT=8080
```
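This README does not say how src/main.py loads .env.local. A library such as python-dotenv is a common choice. As a dependency-free illustration only, the hypothetical `parse_env_text` and `load_env_file` helpers read simple KEY=VALUE lines and copy them into os.environ. Variables that are already set are not overridden.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching quotes around the value, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env.local") -> None:
    """Hypothetical loader: copy file values into os.environ, keeping existing ones."""
    env_path = Path(path)
    if not env_path.is_file():
        return  # No file: rely on real environment variables and the defaults.
    for key, value in parse_env_text(env_path.read_text(encoding="utf-8")).items():
        os.environ.setdefault(key, value)
```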

Sitemap Configuration

Place your sitemap.xml file in the assets/ directory. The server will automatically read from:

assets/sitemap.xml

Usage

Running the Server

STDIO Mode (for MCP clients):

```bash
uv run python src/main.py --stdio
```

HTTP Mode:

```bash
uv run python src/main.py --port 8080
# Server will be available at http://localhost:8080/mcp
```

Test Mode:

```bash
uv run python src/main.py --test
```
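The three modes map onto command-line flags. The following sketch parses them with argparse. Only the flag names `--stdio`, `--port` and `--test` come from this README. The `parse_args` function and the fallback from `--port` to the PORT variable are assumptions.

```python
import argparse
import os
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the run-mode flags used above.

    Hypothetical sketch: only the flag names come from this README; the
    PORT-variable fallback is an assumption.
    """
    parser = argparse.ArgumentParser(description="Webpage MCP Server")
    parser.add_argument("--stdio", action="store_true",
                        help="serve over STDIO for MCP clients")
    parser.add_argument("--test", action="store_true",
                        help="run in test mode")
    parser.add_argument("--port", type=int,
                        default=int(os.environ.get("PORT", "8080")),
                        help="HTTP port; the MCP endpoint is served at /mcp")
    return parser.parse_args(argv)
```

Under this sketch, a run with neither `--stdio` nor `--test` would start the HTTP server on `args.port`.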

Running Tests

```bash
uv run python tests/test_server.py
```

Available Tools

1. list_pages()

Lists all webpage paths from the sitemap.

Parameters: None

Returns: List of page paths

Example:

```python
list_pages()
# Returns: ["/", "/blog", "/blog/post-1", "/marketplace", "/pricing"]
```
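list_pages reads assets/sitemap.xml. The sketch below shows one way to turn that file into the path list using the standard library's xml.etree.ElementTree. The helper names are hypothetical. Sorting and de-duplicating the result is an assumption based on the example output, not confirmed behavior.

```python
import xml.etree.ElementTree as ET
from typing import List, Union
from urllib.parse import urlparse

# Default namespace declared by sitemaps that follow the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def paths_from_sitemap(xml_data: Union[str, bytes]) -> List[str]:
    """Return the sorted, de-duplicated URL paths found in <url><loc> entries."""
    root = ET.fromstring(xml_data)
    paths = set()
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if loc.text:
            # Keep only the path component; an empty path means the site root.
            paths.add(urlparse(loc.text.strip()).path or "/")
    return sorted(paths)


def list_pages_from_file(sitemap_path: str = "assets/sitemap.xml") -> List[str]:
    """Hypothetical helper: read the sitemap as bytes and parse it."""
    with open(sitemap_path, "rb") as f:
        return paths_from_sitemap(f.read())
```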

2. get_page(path, user_id=None)

Fetches HTML content from a webpage.

Parameters:

path (str): The webpage path (e.g., "/blog/post-1")
user_id (str, optional): User identifier for rate limiting

Returns: Dictionary with HTML content and metadata

Example:

```python
get_page("/blog/post-1")
# Returns:
# {
#     "path": "/blog/post-1",
#     "url": "https://example.com/blog/post-1",
#     "html": "<html>...</html>",
#     "status_code": 200,
#     "content_type": "text/html"
# }
```

Rate Limit Response (returned when the rate limit is exceeded):

```json
{
  "error": "Rate limit exceeded",
  "message": "Too many requests. Please wait 45 seconds before trying again.",
  "reset_in_seconds": 45,
  "limit": "10 requests per minute"
}
```
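Here is a sketch of the fetch behind get_page, using only urllib from the standard library. The real server may use a different HTTP client. `build_page_url` and `fetch_page` are hypothetical names, the error dictionaries are assumed shapes, and rate limiting is left out. A successful call returns the same keys as the get_page example.

```python
from typing import Any, Dict
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


def build_page_url(base_url: str, path: str) -> str:
    """Join BASE_URL and a site-relative path such as '/blog/post-1'."""
    relative = path.lstrip("/")
    if urlparse(relative).scheme:
        # Refuse absolute URLs so callers cannot make the server fetch other hosts.
        raise ValueError("path must be relative to BASE_URL")
    return urljoin(base_url.rstrip("/") + "/", relative)


def fetch_page(base_url: str, path: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Fetch one page and return a dict shaped like get_page's documented result."""
    try:
        url = build_page_url(base_url, path)
    except ValueError as exc:
        return {"error": "Invalid path", "path": path, "message": str(exc)}
    request = Request(url, headers={"User-Agent": "webpage-mcp-server-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return {
                "path": path,
                "url": url,
                "html": response.read().decode(charset, errors="replace"),
                "status_code": response.status,
                "content_type": response.headers.get("Content-Type", ""),
            }
    except HTTPError as exc:  # The site answered with a 4xx or 5xx status.
        return {"error": "HTTP error", "path": path, "url": url, "status_code": exc.code}
    except URLError as exc:  # Network failure such as DNS errors or refused connections.
        return {"error": "Request failed", "path": path, "url": url, "message": str(exc.reason)}
```

`HTTPError` is caught before `URLError` because it is a subclass of `URLError`.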

Available Resources

sitemap://sitemap.xml

Access the raw sitemap.xml content.

Example:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-10-08T20:10:25.773Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1</priority>
  </url>
  ...
</urlset>
```
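Here is a sketch of what the resource handler might do, assuming it returns the file's text unchanged. `read_sitemap` is a hypothetical name. The wording of the missing-file error is also an assumption, because the Error Handling section only says that an error is returned.

```python
from pathlib import Path


def read_sitemap(sitemap_path: str = "assets/sitemap.xml") -> str:
    """Return the raw sitemap XML, or an error message if the file is missing."""
    path = Path(sitemap_path)
    if not path.is_file():
        # Assumed error wording; the server's actual message may differ.
        return f"Error: sitemap not found at {sitemap_path}"
    return path.read_text(encoding="utf-8")
```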

Deployment

Deploy to Dedalus

```bash
dedalus deploy ./src/main.py --name 'webpage-server'
```

Use with Dedalus SDK

```python
from dedalus import Dedalus

client = Dedalus(
    mcp_servers=['your-org/webpage-server']
)
```

Rate Limiting

The server implements rate limiting to protect against abuse:

Limit: 10 requests per minute per user
Window: 60-second rolling window
Identifier: the user_id parameter, or 'default' if it is not provided (so all callers that omit user_id share a single limit)

When the rate limit is exceeded, the server returns an error response that includes:

Time until the limit resets (reset_in_seconds)
Total number of allowed requests (limit)
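One way to implement this behavior is a sliding-window log. The limiter keeps each user's recent request timestamps and rejects a request once `limit` of them fall inside the last 60 seconds. The sketch below assumes that approach. `RateLimiter` and `check` are hypothetical names. Only the limit, the window, the 'default' fallback, and the response fields come from this README.

```python
import math
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional


class RateLimiter:
    """Hypothetical limiter: at most `limit` calls per user in any `window` seconds."""

    def __init__(self, limit: int = 10, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # Injectable so tests can control time.
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, user_id: Optional[str] = None) -> Optional[dict]:
        """Record a call; return None if allowed, or an error dict if limited."""
        key = user_id or "default"
        now = self.clock()
        calls = self._calls[key]
        # Drop timestamps that have slid out of the rolling window.
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        if len(calls) >= self.limit:
            # The oldest call leaves the window first, freeing one slot.
            reset_in = math.ceil(self.window - (now - calls[0]))
            return {
                "error": "Rate limit exceeded",
                "message": f"Too many requests. Please wait {reset_in} seconds before trying again.",
                "reset_in_seconds": reset_in,
                # Wording assumes the default 60-second window.
                "limit": f"{self.limit} requests per minute",
            }
        calls.append(now)
        return None
```

A get_page handler could call `limiter.check(user_id)` first and return the result whenever it is not None. That would produce the documented rate-limit response. Rejected calls are not recorded, so retrying early does not push the reset time further out. State lives in process memory, so limits reset when the server restarts.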

Architecture

```
mcp-server-example-python/
├── src/
│   └── main.py          # Main server implementation
├── assets/
│   └── sitemap.xml      # Sitemap data source
├── tests/
│   └── test_server.py   # Test suite
├── docs/
│   └── README.md        # This documentation
└── pyproject.toml       # Project configuration
```

Error Handling

The server returns an error response for the following conditions:

Missing Sitemap: Returns an error if assets/sitemap.xml doesn't exist
Invalid Path: Returns an error for malformed paths
Failed HTTP Request: Returns an error with details when the webpage fetch fails
Rate Limit: Returns a structured error with the reset time
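The README does not define what makes a path malformed. The hypothetical `validate_path` function is one possible rule set, offered as an assumption rather than the server's documented behavior. It accepts only site-relative paths that begin with a single '/' and rejects whitespace and '..' segments. Errors are returned as a dict in the same style as the rate-limit response.

```python
from typing import Optional


def validate_path(path: object) -> Optional[dict]:
    """Return None for an acceptable path, or an error dict describing the problem.

    Hypothetical rules chosen for illustration; src/main.py may differ.
    """
    if not isinstance(path, str) or not path:
        return {"error": "Invalid path", "message": "Path must be a non-empty string."}
    if not path.startswith("/") or path.startswith("//"):
        # Rejects relative paths, full URLs, and scheme-relative URLs (//host/...).
        return {"error": "Invalid path", "message": "Path must start with a single '/'."}
    if any(ch.isspace() for ch in path):
        return {"error": "Invalid path", "message": "Path must not contain whitespace."}
    if ".." in path.split("/"):
        return {"error": "Invalid path", "message": "Path must not contain '..' segments."}
    return None
```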

Development

Adding New Tools

To add a new tool, use the @mcp.tool() decorator:

```python
@mcp.tool()
def your_tool_name(param: str) -> dict:
    """Tool description"""
    # Implementation
    return {"result": "value"}
```
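For example, a hypothetical `search_pages` tool could filter the paths returned by list_pages. It is not part of this server. The body is written as a plain function so it can be tested without starting the server. The registration shown in the comments is an illustrative assumption about how it would be wired into src/main.py.

```python
from typing import Dict, List


def search_pages(query: str, paths: List[str]) -> Dict[str, object]:
    """Hypothetical tool body: return the paths containing `query` (case-insensitive)."""
    needle = query.strip().lower()
    if not needle:
        return {"error": "Empty query", "matches": []}
    matches = [p for p in paths if needle in p.lower()]
    return {"query": query, "count": len(matches), "matches": matches}


# Registration in src/main.py might look like this (commented out because
# `mcp` and `list_pages` are defined by the server, not by this snippet):
#
# @mcp.tool()
# def search(query: str) -> dict:
#     """Find page paths that contain the query text."""
#     return search_pages(query, list_pages())
```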

Adding New Resources

To add a new resource, use the @mcp.resource() decorator:

```python
@mcp.resource('resource://resource-name')
def get_resource() -> str:
    """Resource description"""
    return "resource content"
```

License

MIT License - See LICENSE file for details
