# API Reference
Complete API documentation for the Webpage MCP Server.
## Tools
### list_pages()
Lists all webpage paths available in the sitemap.
**Signature:**
```python
def list_pages() -> List[str]
```
**Parameters:** None
**Returns:**
- `List[str]`: Sorted list of unique webpage paths
**Example Request:**
```python
list_pages()
```
**Example Response:**
```python
[
    "/",
    "/blog",
    "/blog/post-1",
    "/blog/yc-ankit-gupta-interview",
    "/marketplace",
    "/pricing"
]
```
**Implementation Details:**
- Reads sitemap from `assets/sitemap.xml`
- Parses XML with namespace support
- Extracts URLs and converts to paths
- Removes duplicates and sorts alphabetically
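The steps above can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: the helper name `paths_from_sitemap` is hypothetical, and it takes the XML as a string rather than reading `assets/sitemap.xml` itself.

```python
from typing import List
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

# Standard sitemap namespace, as it appears in the sitemap example in this document.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def paths_from_sitemap(xml_text: str) -> List[str]:
    """Hypothetical helper: extract sorted, unique paths from sitemap XML."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Mirror the documented behavior of surfacing parse problems as ValueError.
        raise ValueError(f"Failed to parse sitemap: {exc}") from exc

    paths = set()
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        if loc.text:
            # Keep only the path component; an empty path means the site root.
            paths.add(urlparse(loc.text.strip()).path or "/")
    return sorted(paths)
```

Collecting into a `set` before sorting handles the de-duplication and the alphabetical ordering in one pass.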
**Error Cases:**
- Raises `ValueError` if sitemap file not found
- Raises `ValueError` if sitemap XML is malformed
---
### get_page(path, user_id=None)
Fetches HTML content from a specific webpage.
**Signature:**
```python
def get_page(path: str, user_id: Optional[str] = None) -> Dict[str, Any]
```
**Parameters:**
- `path` (str, required): Webpage path (e.g., "/blog/post-1")
  - If the path doesn't start with "/", one is prepended automatically
- `user_id` (str, optional): User identifier for rate limiting
  - Default: `None`, in which case the request counts against the shared "default" identifier
**Returns:**
- `Dict[str, Any]`: Response dictionary with the following structure:
**Success Response:**
```python
{
    "path": str,          # The requested path
    "url": str,           # Full URL that was fetched
    "html": str,          # HTML content of the page
    "status_code": int,   # HTTP status code (e.g., 200)
    "content_type": str   # Content-Type header value
}
```
**Error Response (Rate Limit):**
```python
{
    "error": "Rate limit exceeded",
    "message": "Too many requests. Please wait X seconds before trying again.",
    "reset_in_seconds": int,
    "limit": "10 requests per minute"
}
```
**Error Response (Fetch Failed):**
```python
{
    "error": "Failed to fetch page",
    "path": str,
    "url": str,
    "message": str  # Error details
}
```
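Because all three response shapes are plain dictionaries, a caller can tell them apart by checking for the `error` key. The sketch below assumes only the keys documented above; the helper name `describe_result` is hypothetical and not part of the server.

```python
from typing import Any, Dict


def describe_result(result: Dict[str, Any]) -> str:
    """Hypothetical helper: summarize a get_page() response dictionary."""
    if "error" not in result:
        # Success shape: path, url, html, status_code, content_type.
        return (
            f"OK {result['status_code']}: "
            f"{len(result['html'])} characters from {result['url']}"
        )
    if "reset_in_seconds" in result:
        # Rate-limit shape carries the wait time in seconds.
        return f"Rate limited: retry in {result['reset_in_seconds']}s"
    # Remaining documented shape: fetch failure with path, url, message.
    return f"Fetch failed for {result['url']}: {result['message']}"
```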
**Example Usage:**
Simple request:
```python
result = get_page("/blog/post-1")
print(result["html"]) # Prints HTML content
```
With user identification:
```python
result = get_page("/pricing", user_id="user123")
```
Path without leading slash:
```python
result = get_page("marketplace") # Automatically becomes "/marketplace"
```
**Rate Limiting:**
- 10 requests per minute per `user_id`
- Uses a rolling window of 60 seconds
- Different `user_id` values have separate rate limits
- All requests without `user_id` share the "default" limit
**Implementation Details:**
- Constructs the full URL using the `BASE_URL` environment variable
- Uses the `requests` library with a 10-second timeout
- Checks the rate limit before making the HTTP request
- Reports request failures as a "Failed to fetch page" error dictionary rather than raising an exception
---
## Resources
### sitemap://sitemap.xml
Provides access to the raw sitemap.xml content.
**Signature:**
```python
@mcp.resource('sitemap://sitemap.xml')
def get_sitemap() -> str
```
**Returns:**
- `str`: Raw XML content of the sitemap file
**Example Response:**
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://dedaluslabs.ai/</loc>
    <lastmod>2025-10-08T20:10:25.773Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1</priority>
  </url>
  <url>
    <loc>https://dedaluslabs.ai/blog</loc>
    <lastmod>2025-10-08T20:10:25.773Z</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```
**Error Cases:**
- Raises `ValueError` if sitemap file not found at `assets/sitemap.xml`
---
## Rate Limiter Class
The server uses a `RateLimiter` class to manage request throttling.
### RateLimiter
**Initialization:**
```python
RateLimiter(max_requests: int = 10, window_seconds: int = 60)
```
**Parameters:**
- `max_requests`: Maximum number of requests allowed in the time window
- `window_seconds`: Time window in seconds
**Methods:**
#### is_allowed(identifier: str) -> bool
Check if a request is allowed for the given identifier.
**Parameters:**
- `identifier`: Unique identifier for the requester (e.g., user_id)
**Returns:**
- `bool`: True if request is allowed, False if rate limit exceeded
**Behavior:**
- Automatically cleans up old requests outside the time window
- Adds current request timestamp if under limit
- Returns False if limit is exceeded
#### get_reset_time(identifier: str) -> int
Get the number of seconds until the rate limit resets.
**Parameters:**
- `identifier`: Unique identifier for the requester
**Returns:**
- `int`: Seconds until the oldest request expires (0 if no requests)
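To make the rolling-window behavior concrete, here is a minimal sketch of a limiter with the same two methods. It is an illustration under assumptions, not the server's `RateLimiter`: the class name `RollingWindowLimiter`, the `clock` parameter (added so the example can be exercised without waiting), and the rounding up in `get_reset_time` are all choices made for this example.

```python
import math
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class RollingWindowLimiter:
    """Sketch of a per-identifier rolling-window limiter (hypothetical class)."""

    def __init__(
        self,
        max_requests: int = 10,
        window_seconds: int = 60,
        clock: Callable[[], float] = time.monotonic,  # assumption: injectable clock
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._requests: Dict[str, Deque[float]] = defaultdict(deque)

    def _prune(self, identifier: str, now: float) -> Deque[float]:
        # Drop timestamps that have fallen out of the window.
        timestamps = self._requests[identifier]
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        return timestamps

    def is_allowed(self, identifier: str) -> bool:
        now = self._clock()
        timestamps = self._prune(identifier, now)
        if len(timestamps) >= self.max_requests:
            return False
        # Only allowed requests are recorded.
        timestamps.append(now)
        return True

    def get_reset_time(self, identifier: str) -> int:
        now = self._clock()
        timestamps = self._prune(identifier, now)
        if not timestamps:
            return 0
        # Seconds until the oldest recorded request leaves the window.
        return max(0, math.ceil(timestamps[0] + self.window_seconds - now))
```

Each identifier gets its own queue of timestamps, which is why different `user_id` values never affect one another's limits.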
---
## Environment Variables
### BASE_URL
The base URL of the website to query.
**Type:** `str`
**Default:** `"https://example.com"`
**Example:** `"https://dedaluslabs.ai"`
Used to construct full URLs when fetching pages:
```python
full_url = BASE_URL + path
# e.g., "https://dedaluslabs.ai" + "/blog" = "https://dedaluslabs.ai/blog"
```
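A slightly more defensive version of this concatenation might look like the sketch below. The helper name `build_url` is hypothetical, and stripping a trailing slash from the base URL is an assumption made here to avoid a doubled `//`; the source only documents plain concatenation and the leading-slash normalization of `path`.

```python
import os
from typing import Optional


def build_url(path: str, base_url: Optional[str] = None) -> str:
    """Hypothetical helper: join BASE_URL and a page path."""
    if base_url is None:
        # Fall back to the documented default when BASE_URL is unset.
        base_url = os.environ.get("BASE_URL", "https://example.com")
    if not path.startswith("/"):
        path = "/" + path  # same normalization get_page documents for `path`
    # Assumption: drop a trailing slash on the base to avoid "//" in the result.
    return base_url.rstrip("/") + path
```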
### HOST
Server host address for HTTP transport.
**Type:** `str`
**Default:** `"0.0.0.0"`
**Example:** `"127.0.0.1"`
### PORT
Server port number for HTTP transport.
**Type:** `int`
**Default:** `8080`
**Example:** `3000`
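As an illustration of how these two variables could be read with the defaults listed above, consider the following sketch. The helper name `load_transport_settings` is hypothetical; the server's own startup code may read them differently.

```python
import os
from typing import Mapping, Tuple


def load_transport_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Hypothetical helper: read HOST and PORT with the documented defaults."""
    host = env.get("HOST", "0.0.0.0")
    # Environment values are strings, so PORT is converted to int explicitly.
    port = int(env.get("PORT", "8080"))
    return host, port
```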
---
## Transport Modes
### STDIO Transport
Default mode for MCP clients:
```bash
python src/main.py --stdio
```
Communication via standard input/output.
### HTTP Transport
For network-based access:
```bash
python src/main.py --port 8080
```
MCP endpoint available at: `http://localhost:8080/mcp`
### Test Mode
For verification without starting the server:
```bash
python src/main.py --test
```
---
## Error Codes and Messages
### ValueError: "Sitemap file not found"
**Cause:** The `assets/sitemap.xml` file doesn't exist
**Solution:** Ensure sitemap.xml is present in the assets directory
### ValueError: "Failed to parse sitemap"
**Cause:** The sitemap XML is malformed or invalid
**Solution:** Validate your sitemap.xml against the sitemap schema
### Rate Limit Error
**Response:**
```python
{
    "error": "Rate limit exceeded",
    "message": "Too many requests. Please wait X seconds before trying again.",
    "reset_in_seconds": X,  # X is an int: seconds until the limit resets
    "limit": "10 requests per minute"
}
```
**Cause:** More than 10 requests in 60 seconds for the same user_id
**Solution:** Wait for the specified time before retrying
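One way a client could apply this is to wait for `reset_in_seconds` and try again, as in the sketch below. The wrapper name `get_page_with_retry`, the `max_attempts` cap, and the injectable `sleep` are assumptions for illustration; `get_page` is passed in as a callable so the example does not depend on how the tool is invoked.

```python
import time
from typing import Any, Callable, Dict


def get_page_with_retry(
    get_page: Callable[[str], Dict[str, Any]],
    path: str,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Hypothetical wrapper: wait out rate limits, up to max_attempts calls."""
    result: Dict[str, Any] = {}
    for attempt in range(max_attempts):
        result = get_page(path)
        if result.get("error") != "Rate limit exceeded":
            # Success, or a different error that waiting will not fix.
            return result
        if attempt < max_attempts - 1:
            # Wait for the server-reported reset time before the next attempt.
            sleep(result.get("reset_in_seconds", 1))
    return result
```

If every attempt is rate limited, the last rate-limit response is returned so the caller can still inspect it.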
### HTTP Request Error
**Response:**
```python
{
    "error": "Failed to fetch page",
    "path": "/example",
    "url": "https://example.com/example",
    "message": "Connection timeout"
}
```
**Common Causes:**
- Network connectivity issues
- Invalid BASE_URL configuration
- Page doesn't exist (404)
- Request timeout (>10 seconds)