Webpage MCP Server


# API Reference

Complete API documentation for the Webpage MCP Server.

## Tools

### list_pages()

Lists all webpage paths available in the sitemap.

**Signature:**
```python
def list_pages() -> List[str]
```

**Parameters:** None

**Returns:**
- `List[str]`: Sorted list of unique webpage paths

**Example Request:**
```python
list_pages()
```

**Example Response:**
```python
[
    "/",
    "/blog",
    "/blog/post-1",
    "/blog/yc-ankit-gupta-interview",
    "/marketplace",
    "/pricing"
]
```

**Implementation Details:**
- Reads sitemap from `assets/sitemap.xml`
- Parses XML with namespace support
- Extracts URLs and converts to paths
- Removes duplicates and sorts alphabetically

**Error Cases:**
- Raises `ValueError` if sitemap file not found
- Raises `ValueError` if sitemap XML is malformed

---

### get_page(path, user_id=None)

Fetches HTML content from a specific webpage.

**Signature:**
```python
def get_page(path: str, user_id: Optional[str] = None) -> Dict[str, Any]
```

**Parameters:**
- `path` (str, required): Webpage path (e.g., "/blog/post-1")
  - If the path doesn't start with "/", one is prepended automatically
- `user_id` (str, optional): User identifier for rate limiting
  - Default: `None`; requests without a `user_id` are rate-limited under the shared "default" identifier

**Returns:**
- `Dict[str, Any]`: Response dictionary with the following structure:

**Success Response:**
```python
{
    "path": str,          # The requested path
    "url": str,           # Full URL that was fetched
    "html": str,          # HTML content of the page
    "status_code": int,   # HTTP status code (e.g., 200)
    "content_type": str   # Content-Type header value
}
```

**Error Response (Rate Limit):**
```python
{
    "error": "Rate limit exceeded",
    "message": "Too many requests. Please wait X seconds before trying again.",
    "reset_in_seconds": int,
    "limit": "10 requests per minute"
}
```

**Error Response (Fetch Failed):**
```python
{
    "error": "Failed to fetch page",
    "path": str,
    "url": str,
    "message": str  # Error details
}
```

**Example Usage:**

Simple request:
```python
result = get_page("/blog/post-1")
print(result["html"])  # Prints HTML content
```

With user identification:
```python
result = get_page("/pricing", user_id="user123")
```

Path without leading slash:
```python
result = get_page("marketplace")  # Automatically becomes "/marketplace"
```

**Rate Limiting:**
- 10 requests per minute per `user_id`
- Uses rolling window of 60 seconds
- Different `user_id` values have separate rate limits
- All requests without `user_id` share the "default" limit

**Implementation Details:**
- Constructs full URL using `BASE_URL` environment variable
- Uses `requests` library with 10-second timeout
- Rate limit is checked before making HTTP request
- Handles HTTP errors gracefully

---

## Resources

### sitemap://sitemap.xml

Provides access to the raw sitemap.xml content.

**Signature:**
```python
@mcp.resource('sitemap://sitemap.xml')
def get_sitemap() -> str
```

**Returns:**
- `str`: Raw XML content of the sitemap file

**Example Response:**
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://dedaluslabs.ai/</loc>
    <lastmod>2025-10-08T20:10:25.773Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1</priority>
  </url>
  <url>
    <loc>https://dedaluslabs.ai/blog</loc>
    <lastmod>2025-10-08T20:10:25.773Z</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```

**Error Cases:**
- Raises `ValueError` if sitemap file not found at `assets/sitemap.xml`

---

## Rate Limiter Class

The server uses a `RateLimiter` class to manage request throttling.

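The rolling-window behavior described under Rate Limiting above (10 requests per 60 seconds, tracked separately per identifier) can be sketched as follows. This is a minimal illustration, not the server's actual source: the class name `SlidingWindowLimiter`, the deque-based storage, the rounding in `get_reset_time`, and the injectable `now` argument (added only to make the behavior easy to test) are all assumptions.

```python
import math
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Hypothetical rolling-window limiter; not the server's RateLimiter source."""

    def __init__(self, max_requests: int = 10, window_seconds: int = 60):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # identifier -> timestamps of accepted requests, oldest first
        self._requests: Dict[str, Deque[float]] = defaultdict(deque)

    def _prune(self, identifier: str, now: float) -> Deque[float]:
        """Drop timestamps that have fallen outside the rolling window."""
        timestamps = self._requests[identifier]
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        return timestamps

    def is_allowed(self, identifier: str, now: Optional[float] = None) -> bool:
        """Record the request and return True if the identifier is under its limit."""
        now = time.monotonic() if now is None else now
        timestamps = self._prune(identifier, now)
        if len(timestamps) >= self.max_requests:
            return False
        timestamps.append(now)
        return True

    def get_reset_time(self, identifier: str, now: Optional[float] = None) -> int:
        """Seconds until the oldest recorded request expires (0 if none)."""
        now = time.monotonic() if now is None else now
        timestamps = self._prune(identifier, now)
        if not timestamps:
            return 0
        return max(0, math.ceil(self.window_seconds - (now - timestamps[0])))
```

Because each identifier has its own queue of timestamps, one busy `user_id` cannot exhaust another's allowance, which matches the per-user limits described above.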

### RateLimiter

**Initialization:**
```python
RateLimiter(max_requests: int = 10, window_seconds: int = 60)
```

**Parameters:**
- `max_requests`: Maximum number of requests allowed in the time window
- `window_seconds`: Time window in seconds

**Methods:**

#### is_allowed(identifier: str) -> bool

Check if a request is allowed for the given identifier.

**Parameters:**
- `identifier`: Unique identifier for the requester (e.g., user_id)

**Returns:**
- `bool`: True if request is allowed, False if rate limit exceeded

**Behavior:**
- Automatically cleans up old requests outside the time window
- Adds current request timestamp if under limit
- Returns False if limit is exceeded

#### get_reset_time(identifier: str) -> int

Get the number of seconds until the rate limit resets.

**Parameters:**
- `identifier`: Unique identifier for the requester

**Returns:**
- `int`: Seconds until the oldest request expires (0 if no requests)

---

## Environment Variables

### BASE_URL

The base URL of the website to query.

**Type:** `str`

**Default:** `"https://example.com"`

**Example:** `"https://dedaluslabs.ai"`

Used to construct full URLs when fetching pages:
```python
full_url = BASE_URL + path
# e.g., "https://dedaluslabs.ai" + "/blog" = "https://dedaluslabs.ai/blog"
```

### HOST

Server host address for HTTP transport.

**Type:** `str`

**Default:** `"0.0.0.0"`

**Example:** `"127.0.0.1"`

### PORT

Server port number for HTTP transport.

**Type:** `int`

**Default:** `8080`

**Example:** `3000`

---

## Transport Modes

### STDIO Transport

Default mode for MCP clients:
```bash
python src/main.py --stdio
```

Communication via standard input/output.


### HTTP Transport

For network-based access:
```bash
python src/main.py --port 8080
```

MCP endpoint available at: `http://localhost:8080/mcp`

### Test Mode

For verification without starting the server:
```bash
python src/main.py --test
```

---

## Error Codes and Messages

### ValueError: "Sitemap file not found"

**Cause:** The `assets/sitemap.xml` file doesn't exist

**Solution:** Ensure sitemap.xml is present in the assets directory

### ValueError: "Failed to parse sitemap"

**Cause:** The sitemap XML is malformed or invalid

**Solution:** Validate your sitemap.xml against the sitemap schema

### Rate Limit Error

**Response:**
```python
{
    "error": "Rate limit exceeded",
    "message": "Too many requests. Please wait X seconds before trying again.",
    "reset_in_seconds": X,
    "limit": "10 requests per minute"
}
```

**Cause:** More than 10 requests in 60 seconds for the same user_id

**Solution:** Wait for the specified time before retrying

### HTTP Request Error

**Response:**
```python
{
    "error": "Failed to fetch page",
    "path": "/example",
    "url": "https://example.com/example",
    "message": "Connection timeout"
}
```

**Common Causes:**
- Network connectivity issues
- Invalid BASE_URL configuration
- Page doesn't exist (404)
- Request timeout (>10 seconds)
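
On the caller's side, the two error shapes above can be told apart by the `"error"` field, and the rate-limit case can be retried after `reset_in_seconds`. The helper below is a hedged sketch of that pattern: `fetch_with_retry`, its `max_attempts` parameter, and the injectable `sleep` argument are hypothetical names introduced for illustration, and `get_page` is passed in as a callable because how a client invokes the tool depends on the MCP client in use.

```python
import time
from typing import Any, Callable, Dict


def fetch_with_retry(
    get_page: Callable[..., Dict[str, Any]],
    path: str,
    user_id: str = "default",
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call get_page, waiting out rate limits (hypothetical helper)."""
    result: Dict[str, Any] = {}
    for attempt in range(max_attempts):
        result = get_page(path, user_id=user_id)
        # Only the rate-limit error is retried; a success response or a
        # "Failed to fetch page" error is returned to the caller as-is.
        if result.get("error") != "Rate limit exceeded":
            return result
        if attempt < max_attempts - 1:
            # Wait for the server-reported reset time before trying again.
            sleep(result.get("reset_in_seconds", 1))
    return result
```

Fetch failures are deliberately not retried here, since causes such as a 404 or a misconfigured `BASE_URL` will not resolve by waiting.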
