# Webpage MCP Server
A Model Context Protocol (MCP) server for listing the pages of a single website and retrieving their contents.
## Overview
This MCP server provides tools to list and retrieve webpage content by parsing a sitemap.xml file and fetching HTML content from specified URLs. It includes built-in rate limiting to protect against abuse.
## Features
- **List Pages**: Parse sitemap.xml to get all available webpage paths
- **Get Page Content**: Fetch HTML content from any webpage
- **Sitemap Resource**: Access the raw sitemap.xml file
- **Rate Limiting**: 10 requests per minute per user to prevent abuse
## Installation
```bash
# Install dependencies using uv
uv sync
# Or using pip
pip install -e .
```
## Configuration
The server uses environment variables for configuration:
| Variable | Description | Default |
|----------|-------------|---------|
| `BASE_URL` | The base URL of the website to query | `https://example.com` |
| `HOST` | Server host address | `0.0.0.0` |
| `PORT` | Server port number | `8080` |
### Environment Setup
Create a `.env.local` file in the project root:
```bash
BASE_URL=https://your-website.com
HOST=0.0.0.0
PORT=8080
```
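As a rough illustration of how the variables in the table could be read with their documented defaults, here is a minimal sketch. The `load_config` helper and its return shape are hypothetical and are not taken from `src/main.py`; stripping the trailing slash from `BASE_URL` is an assumption, and loading of `.env.local` itself is not shown.

```python
import os


def load_config(env=None):
    """Read BASE_URL, HOST and PORT, falling back to the documented defaults.

    Hypothetical helper: the real server may load its settings differently.
    """
    env = os.environ if env is None else env
    return {
        # Assumption: drop a trailing slash so paths like "/blog" join cleanly.
        "base_url": env.get("BASE_URL", "https://example.com").rstrip("/"),
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "8080")),
    }


if __name__ == "__main__":
    print(load_config())
```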
### Sitemap Configuration
Place your sitemap.xml file in the `assets/` directory. The server will automatically read from:
```
assets/sitemap.xml
```
## Usage
### Running the Server
**STDIO Mode (for MCP clients):**
```bash
uv run python src/main.py --stdio
```
**HTTP Mode:**
```bash
uv run python src/main.py --port 8080
# Server will be available at http://localhost:8080/mcp
```
**Test Mode:**
```bash
uv run python src/main.py --test
```
### Running Tests
```bash
uv run python tests/test_server.py
```
## Available Tools
### 1. list_pages()
Lists all webpage paths from the sitemap.
**Parameters:** None
**Returns:** List of page paths
**Example:**
```python
list_pages()
# Returns: ["/", "/blog", "/blog/post-1", "/marketplace", "/pricing"]
```
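To make the sitemap-to-path mapping concrete, the sketch below shows one way such a list could be derived from sitemap XML using only the standard library. The function name `paths_from_sitemap` is hypothetical, and the actual implementation in `src/main.py` may differ (for example in how it orders or de-duplicates paths).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def paths_from_sitemap(xml_text):
    """Return the path component of every <loc> entry, in document order."""
    root = ET.fromstring(xml_text)
    paths = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if loc.text:
            # A bare host such as "https://example.com" has an empty path; map it to "/".
            paths.append(urlparse(loc.text.strip()).path or "/")
    return paths
```

With the sitemap stored as described under Sitemap Configuration, this could be called as `paths_from_sitemap(open("assets/sitemap.xml", encoding="utf-8").read())`.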
### 2. get_page(path, user_id=None)
Fetches HTML content from a webpage.
**Parameters:**
- `path` (str): The webpage path (e.g., "/blog/post-1")
- `user_id` (str, optional): User identifier for rate limiting
**Returns:** Dictionary with HTML content and metadata
**Example:**
```python
get_page("/blog/post-1")
# Returns:
# {
#   "path": "/blog/post-1",
#   "url": "https://example.com/blog/post-1",
#   "html": "<html>...</html>",
#   "status_code": 200,
#   "content_type": "text/html"
# }
```
**Rate Limit Response:**
```python
# When rate limit is exceeded:
{
    "error": "Rate limit exceeded",
    "message": "Too many requests. Please wait 45 seconds before trying again.",
    "reset_in_seconds": 45,
    "limit": "10 requests per minute"
}
```
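A caller can tell the two response shapes apart by checking for the `error` key. The helper below is a small client-side sketch, not part of the server; `seconds_to_wait` is a hypothetical name, and it relies only on the fields shown in the examples above.

```python
def seconds_to_wait(response):
    """Return how long to wait before retrying, or 0 if the call succeeded."""
    if response.get("error") == "Rate limit exceeded":
        return int(response.get("reset_in_seconds", 0))
    return 0


# Example with the rate-limit response shown above.
limited = {
    "error": "Rate limit exceeded",
    "message": "Too many requests. Please wait 45 seconds before trying again.",
    "reset_in_seconds": 45,
    "limit": "10 requests per minute",
}
print(seconds_to_wait(limited))
```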
## Available Resources
### sitemap://sitemap.xml
Access the raw sitemap.xml content.
**Example:**
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-10-08T20:10:25.773Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1</priority>
  </url>
  ...
</urlset>
```
## Deployment
### Deploy to Dedalus
```bash
dedalus deploy ./src/main.py --name 'webpage-server'
```
### Use with Dedalus SDK
```python
from dedalus import Dedalus
client = Dedalus(
    mcp_servers=['your-org/webpage-server']
)
```
## Rate Limiting
The server implements rate limiting to protect against abuse:
- **Limit:** 10 requests per minute per user
- **Window:** 60-second rolling window
- **Identifier:** The `user_id` parameter, or `'default'` if none is provided
When the rate limit is exceeded, the server returns an error response with:
- Time until the limit resets (`reset_in_seconds`)
- The configured limit (`limit`, for example "10 requests per minute")
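The rolling-window behaviour described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code in `src/main.py`: the class name `RollingWindowLimiter`, the injectable `clock`, and the rounding of the reset time are all hypothetical choices.

```python
import math
import time
from collections import defaultdict, deque


class RollingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each user."""

    def __init__(self, limit=10, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # user key -> timestamps of accepted requests

    def check(self, user_id=None):
        """Record a request; return None if allowed, else an error dict."""
        key = user_id or "default"
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the rolling window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            # The oldest remaining hit determines when a slot frees up.
            reset_in = math.ceil(self.window - (now - hits[0]))
            return {
                "error": "Rate limit exceeded",
                "message": f"Too many requests. Please wait {reset_in} seconds before trying again.",
                "reset_in_seconds": reset_in,
                # Wording assumes the default 60-second window.
                "limit": f"{self.limit} requests per minute",
            }
        hits.append(now)
        return None
```

Storing one timestamp per accepted request keeps the window truly rolling, at the cost of up to `limit` entries of memory per user.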
## Architecture
```
mcp-server-example-python/
├── src/
│   └── main.py           # Main server implementation
├── assets/
│   └── sitemap.xml       # Sitemap data source
├── tests/
│   └── test_server.py    # Test suite
├── docs/
│   └── README.md         # This documentation
└── pyproject.toml        # Project configuration
```
## Error Handling
The server handles the following error conditions gracefully:
1. **Missing Sitemap:** Returns an error if `assets/sitemap.xml` doesn't exist
2. **Invalid Path:** Returns an error for malformed paths
3. **Failed HTTP Request:** Returns an error with details when the webpage fetch fails
4. **Rate Limit:** Returns a structured error with the reset time
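As an illustration of the invalid-path case, the sketch below validates a path before joining it to the base URL. What the server actually treats as "malformed" is not documented here, so the rules used (leading slash required, no embedded scheme, no `..` segments), the `build_url` name, and the error dictionary shape are all assumptions.

```python
def build_url(base_url, path):
    """Return {"url": ...} for an acceptable path, or {"error": ...} otherwise."""
    malformed = (
        not path.startswith("/")      # assumed rule: paths are site-relative
        or "://" in path              # assumed rule: no embedded absolute URLs
        or ".." in path.split("/")    # assumed rule: no parent-directory segments
    )
    if malformed:
        return {"error": "Invalid path", "path": path}
    return {"url": base_url.rstrip("/") + path}
```

Returning a dictionary in both cases mirrors the structured responses shown for `get_page`, so callers can branch on the presence of the `error` key.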
## Development
### Adding New Tools
To add a new tool, use the `@mcp.tool()` decorator:
```python
# `mcp` is assumed to be the server instance created in src/main.py
@mcp.tool()
def your_tool_name(param: str) -> dict:
    """Tool description"""
    # Implementation
    return {"result": "value"}
```
### Adding New Resources
To add a new resource, use the `@mcp.resource()` decorator:
```python
# `mcp` is assumed to be the server instance created in src/main.py
@mcp.resource('resource://resource-name')
def get_resource() -> str:
    """Resource description"""
    return "resource content"
```
## License
MIT License - See LICENSE file for details