Uses environment variables for server configuration including base URL, host, and port settings
Implemented as a Python-based MCP server with tools for parsing sitemaps and fetching webpage content
Uses TOML format for project configuration through pyproject.toml file
Parses sitemap.xml files to extract webpage paths and provides access to raw XML sitemap content as a resource
Webpage MCP Server
A Model Context Protocol (MCP) server for querying webpages and page contents from a specific website.
Overview
This MCP server provides tools to list and retrieve webpage content by parsing a sitemap.xml file and fetching HTML content from specified URLs. It includes built-in rate limiting to protect against abuse.
Features
List Pages: Parse sitemap.xml to get all available webpage paths
Get Page Content: Fetch HTML content from any webpage
Sitemap Resource: Access the raw sitemap.xml file
Rate Limiting: 10 requests per minute per user to prevent abuse
Installation
Configuration
The server uses environment variables for configuration:
| Variable | Description | Default |
|---|---|---|
| | The base URL of the website to query | |
| | Server host address | |
| | Server port number | |
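As a rough sketch of how settings like these might be read in Python: the variable names BASE_URL, HOST, and PORT and all three default values below are hypothetical placeholders chosen for illustration, not the names or defaults this server actually uses.

```python
import os


def load_settings(env=os.environ):
    """Read server settings from environment variables.

    All variable names and defaults here are illustrative assumptions.
    """
    return {
        "base_url": env.get("BASE_URL", "https://example.com"),  # hypothetical name
        "host": env.get("HOST", "127.0.0.1"),                    # hypothetical name
        "port": int(env.get("PORT", "8000")),                    # hypothetical name
    }
```

Passing the mapping in as a parameter keeps the function easy to exercise with a plain dict instead of the real process environment.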
Environment Setup
Create a .env.local file in the project root:
Sitemap Configuration
Place your sitemap.xml file in the assets/ directory. The server reads it automatically from assets/sitemap.xml.
Usage
Running the Server
STDIO Mode (for MCP clients):
HTTP Mode:
Test Mode:
Running Tests
Available Tools
1. list_pages()
Lists all webpage paths from the sitemap.
Parameters: None
Returns: List of page paths
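A minimal sketch of how paths could be extracted from a sitemap, assuming a standard sitemaps.org urlset document. The function name extract_paths is hypothetical, and this is not necessarily how list_pages() is implemented.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_paths(sitemap_xml: str) -> list[str]:
    """Return the path component of every <loc> entry in a sitemap."""
    root = ET.fromstring(sitemap_xml)
    paths = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if loc.text:
            # Keep only the path; fall back to "/" for a bare domain.
            paths.append(urlparse(loc.text.strip()).path or "/")
    return paths
```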
Example:
2. get_page(path, user_id=None)
Fetches HTML content from a webpage.
Parameters:
path (str): The webpage path (e.g., "/blog/post-1")
user_id (str, optional): User identifier for rate limiting
Returns: Dictionary with HTML content and metadata
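A hedged sketch of a fetch helper of this kind, using only the standard library. The function names, the path check, and the dictionary keys (path, url, status, html, error) are assumptions for illustration; the real tool's response shape may differ.

```python
import urllib.request


def build_url(base_url: str, path: str) -> str:
    """Join a base URL and a site-relative path, rejecting malformed paths."""
    if not path.startswith("/") or "://" in path:
        raise ValueError(f"Invalid path: {path!r} (expected something like '/blog/post-1')")
    return base_url.rstrip("/") + path


def fetch_page(base_url: str, path: str, timeout: float = 10.0) -> dict:
    """Fetch a page and return its HTML plus metadata, or an error dictionary."""
    try:
        url = build_url(base_url, path)
    except ValueError as exc:
        return {"error": str(exc), "path": path}
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            html = resp.read().decode(charset, errors="replace")
            return {"path": path, "url": url, "status": resp.status, "html": html}
    except OSError as exc:  # URLError, HTTPError, and timeouts are all OSError subclasses
        return {"error": f"Fetch failed: {exc}", "path": path, "url": url}
```

Returning an error dictionary rather than raising keeps the tool's output uniform for the calling client.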
Example:
Rate Limit Response:
Available Resources
sitemap://sitemap.xml
Access the raw sitemap.xml content.
Example:
Deployment
Deploy to Dedalus
Use with Dedalus SDK
Rate Limiting
The server implements rate limiting to protect against abuse:
Limit: 10 requests per minute per user
Window: 60-second rolling window
Identifier: Uses the user_id parameter, or 'default' if not provided
When the rate limit is exceeded, the server returns an error response with:
Time until the limit resets
Total number of allowed requests
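The behavior described above can be sketched as a rolling-window limiter. This is an illustrative implementation, not the server's actual code; the class name, the injectable clock, and the error keys (error, limit, reset_in_seconds) are assumptions.

```python
import time
from collections import defaultdict, deque


class RollingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each user."""

    def __init__(self, limit=10, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # user key -> timestamps of recent requests

    def check(self, user_id=None):
        """Record a request; return None if allowed, else an error dictionary."""
        key = user_id or "default"
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the rolling window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return {
                "error": "rate_limit_exceeded",
                "limit": self.limit,
                "reset_in_seconds": self.window - (now - hits[0]),
            }
        hits.append(now)
        return None
```

A deque per user keeps the check cheap: expired timestamps are popped from the left, and the oldest remaining timestamp determines when the next slot frees up.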
Architecture
Error Handling
The server handles various error conditions gracefully:
Missing Sitemap: Returns error if assets/sitemap.xml doesn't exist
Invalid Path: Returns error for malformed paths
Failed HTTP Request: Returns error with details when webpage fetch fails
Rate Limit: Returns structured error with reset time
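As one illustration of this style of error handling, the missing-sitemap case might look like the sketch below. The function name and the dictionary keys are hypothetical; only the assets/sitemap.xml location comes from this document.

```python
from pathlib import Path


def read_sitemap(sitemap_path="assets/sitemap.xml"):
    """Return the sitemap text, or an error dictionary if the file is missing."""
    path = Path(sitemap_path)
    if not path.is_file():
        # Report the problem as data instead of raising, so the caller gets a clear message.
        return {"error": f"Sitemap not found: {path}"}
    return {"content": path.read_text(encoding="utf-8")}
```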
Development
Adding New Tools
To add a new tool, use the @mcp.tool() decorator:
Adding New Resources
To add a new resource, use the @mcp.resource() decorator:
License
MIT License - See LICENSE file for details
Enables querying and retrieving webpage content from websites by parsing sitemap.xml files and fetching HTML content from specified URLs. Includes rate limiting protection and supports listing all available pages from a sitemap.