# Webpage MCP Server

A Model Context Protocol (MCP) server for querying webpages and page contents from a specific website.

## Overview

This MCP server provides tools to list and retrieve webpage content by parsing a sitemap.xml file and fetching HTML content from specified URLs. It includes built-in rate limiting to protect against abuse.

## Features

- **List Pages**: Parse sitemap.xml to get all available webpage paths
- **Get Page Content**: Fetch HTML content from any webpage
- **Sitemap Resource**: Access the raw sitemap.xml file
- **Rate Limiting**: 10 requests per minute per user to prevent abuse

## Installation

```bash
# Install dependencies using uv
uv sync

# Or using pip
pip install -e .
```

## Configuration

The server uses environment variables for configuration:

| Variable | Description | Default |
|----------|-------------|---------|
| `BASE_URL` | The base URL of the website to query | `https://example.com` |
| `HOST` | Server host address | `0.0.0.0` |
| `PORT` | Server port number | `8080` |

### Environment Setup

Create a `.env.local` file in the project root:

```bash
BASE_URL=https://your-website.com
HOST=0.0.0.0
PORT=8080
```

### Sitemap Configuration

Place your sitemap.xml file in the `assets/` directory. The server will automatically read from:

```
assets/sitemap.xml
```

## Usage

### Running the Server

**STDIO Mode (for MCP clients):**

```bash
uv run python src/main.py --stdio
```

**HTTP Mode:**

```bash
uv run python src/main.py --port 8080
# Server will be available at http://localhost:8080/mcp
```

**Test Mode:**

```bash
uv run python src/main.py --test
```

### Running Tests

```bash
uv run python tests/test_server.py
```

## Available Tools

### 1. list_pages()

Lists all webpage paths from the sitemap.

**Parameters:** None

**Returns:** List of page paths

**Example:**

```python
list_pages()
# Returns: ["/", "/blog", "/blog/post-1", "/marketplace", "/pricing"]
```

### 2. get_page(path, user_id=None)

Fetches HTML content from a webpage.

**Parameters:**

- `path` (str): The webpage path (e.g., "/blog/post-1")
- `user_id` (str, optional): User identifier for rate limiting

**Returns:** Dictionary with HTML content and metadata

**Example:**

```python
get_page("/blog/post-1")
# Returns:
# {
#     "path": "/blog/post-1",
#     "url": "https://example.com/blog/post-1",
#     "html": "<html>...</html>",
#     "status_code": 200,
#     "content_type": "text/html"
# }
```

**Rate Limit Response:**

```python
# When rate limit is exceeded:
{
    "error": "Rate limit exceeded",
    "message": "Too many requests. Please wait 45 seconds before trying again.",
    "reset_in_seconds": 45,
    "limit": "10 requests per minute"
}
```

## Available Resources

### sitemap://sitemap.xml

Access the raw sitemap.xml content.

**Example:**

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-10-08T20:10:25.773Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1</priority>
  </url>
  ...
</urlset>
```

## Deployment

### Deploy to Dedalus

```bash
dedalus deploy ./src/main.py --name 'webpage-server'
```

### Use with Dedalus SDK

```python
from dedalus import Dedalus

client = Dedalus(
    mcp_servers=['your-org/webpage-server']
)
```

## Rate Limiting

The server implements rate limiting to protect against abuse:

- **Limit:** 10 requests per minute per user
- **Window:** 60 seconds rolling window
- **Identifier:** Uses `user_id` parameter or 'default' if not provided

When the rate limit is exceeded, the server returns an error response with:

- Time until the limit resets
- Total number of allowed requests

## Architecture

```
mcp-server-example-python/
├── src/
│   └── main.py              # Main server implementation
├── assets/
│   └── sitemap.xml          # Sitemap data source
├── tests/
│   └── test_server.py       # Test suite
├── docs/
│   └── README.md            # This documentation
└── pyproject.toml           # Project configuration
```

## Error Handling

The server handles various error conditions gracefully:

1. **Missing Sitemap:** Returns error if `assets/sitemap.xml` doesn't exist
2. **Invalid Path:** Returns error for malformed paths
3. **Failed HTTP Request:** Returns error with details when webpage fetch fails
4. **Rate Limit:** Returns structured error with reset time

## Development

### Adding New Tools

To add a new tool, use the `@mcp.tool()` decorator:

```python
@mcp.tool()
def your_tool_name(param: str) -> dict:
    """Tool description"""
    # Implementation
    return {"result": "value"}
```

### Adding New Resources

To add a new resource, use the `@mcp.resource()` decorator:

```python
@mcp.resource('resource://resource-name')
def get_resource() -> str:
    """Resource description"""
    return "resource content"
```

## License

MIT License - See LICENSE file for details
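
The rolling-window limit described under Rate Limiting (10 requests per 60 seconds, keyed by `user_id` or `'default'`) could be implemented roughly as follows. This is a minimal sketch and not the server's actual code: the `RollingWindowLimiter` class, its `check` method, and the injectable `clock` argument are hypothetical names introduced for illustration. Only the limit, the window, the identifier fallback, and the response fields come from this README.

```python
import math
import time
from collections import defaultdict, deque


class RollingWindowLimiter:
    """Hypothetical rolling-window limiter; an illustration, not the server's code."""

    def __init__(self, limit=10, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so behaviour can be checked without sleeping
        self._hits = defaultdict(deque)  # identifier -> timestamps of accepted requests

    def check(self, user_id=None):
        """Return None if the request is allowed, otherwise a rate-limit error dict."""
        key = user_id or "default"  # fall back to 'default' when no user_id is given
        now = self.clock()
        hits = self._hits[key]

        # Discard timestamps that have fallen out of the rolling window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()

        if len(hits) >= self.limit:
            # A slot frees up once the oldest accepted request leaves the window.
            reset_in = math.ceil(self.window_seconds - (now - hits[0]))
            return {
                "error": "Rate limit exceeded",
                "message": (
                    f"Too many requests. Please wait {reset_in} seconds "
                    "before trying again."
                ),
                "reset_in_seconds": reset_in,
                # Wording assumes the default 60-second window.
                "limit": f"{self.limit} requests per minute",
            }

        hits.append(now)
        return None
```

Under this sketch, a tool such as `get_page` would call `limiter.check(user_id)` before fetching and return the error dictionary unchanged when the result is not `None`. Keeping one deque of timestamps per identifier makes the window truly rolling, at the cost of storing up to `limit` timestamps per user.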
