get_sitemap_tree
Fetch and parse the sitemap tree of any website by providing its homepage URL. Retrieve structured sitemap data and optionally include detailed page information.
Instructions
Fetch and parse the sitemap tree from a website URL
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| include_pages | No | Whether to include page details in the response | `false` |
| url | Yes | The URL of the website homepage (e.g., https://example.com) | |
Input Schema (JSON Schema)
{
  "properties": {
    "include_pages": {
      "default": false,
      "description": "Whether to include page details in the response",
      "title": "Include Pages",
      "type": "boolean"
    },
    "url": {
      "description": "The URL of the website homepage (e.g., https://example.com)",
      "title": "Url",
      "type": "string"
    }
  },
  "required": [
    "url"
  ],
  "title": "get_sitemap_treeArguments",
  "type": "object"
}
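As a rough illustration of how a client might prepare arguments for this tool, the sketch below checks a dictionary against the schema above and fills in the `include_pages` default. The function name `validate_arguments` and the type mapping are assumptions made for this example; they are not part of the server, which relies on Pydantic for validation.

```python
import json

# Schema copied from the "Input Schema (JSON Schema)" section above.
SCHEMA = {
    "properties": {
        "include_pages": {"default": False, "type": "boolean"},
        "url": {"type": "string"},
    },
    "required": ["url"],
    "type": "object",
}

# Hypothetical mapping from JSON Schema type names to Python types.
_PY_TYPES = {"boolean": bool, "string": str}


def validate_arguments(args: dict) -> dict:
    """Check required keys and types, then apply schema defaults.

    Illustrative helper only; the real server validates with Pydantic.
    """
    for name in SCHEMA["required"]:
        if name not in args:
            raise ValueError(f"missing required argument: {name}")

    result = {}
    for name, spec in SCHEMA["properties"].items():
        if name in args:
            value = args[name]
            if not isinstance(value, _PY_TYPES[spec["type"]]):
                raise TypeError(f"{name} must be of type {spec['type']}")
            result[name] = value
        elif "default" in spec:
            # Optional argument omitted: fall back to the schema default.
            result[name] = spec["default"]
    return result


if __name__ == "__main__":
    print(json.dumps(validate_arguments({"url": "https://example.com"})))
```

Omitting `include_pages` yields a payload with `include_pages` set to `false`, matching the default declared in the schema.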
Implementation Reference
- src/sitemap_mcp_server/server.py:171-213 (handler) The primary handler implementing the get_sitemap_tree tool. It validates and normalizes the URL, retrieves the sitemap tree from the cache (fetching it on a miss), counts pages and sitemaps for logging, and returns the tree serialized as JSON. Errors are returned as a JSON object rather than raised.

      async def get_sitemap_tree(
          ctx: Context,
          url: str = Field(
              ..., description="The URL of the website homepage (e.g., https://example.com)"
          ),
          include_pages: bool = Field(
              False, description="Whether to include page details in the response"
          ),
      ) -> str:
          try:
              normalized_url = normalize_and_validate_url(url)
              if not normalized_url:
                  return safe_json_dumps(
                      {
                          "error": "Invalid URL provided. Please provide a valid HTTP or HTTPS URL.",
                          "type": "ValidationError",
                      }
                  )
              url = normalized_url
              tree = ctx.request_context.lifespan_context.get_sitemap(url)

              page_count = 0
              sitemap_count = 0
              if hasattr(tree, "all_pages"):
                  try:
                      page_count = sum(1 for _ in tree.all_pages())
                  except Exception as e:
                      logger.debug(f"Error counting pages: {str(e)}")
              if hasattr(tree, "all_sitemaps"):
                  try:
                      sitemap_count = sum(1 for _ in tree.all_sitemaps())
                  except Exception as e:
                      logger.debug(f"Error counting sitemaps: {str(e)}")
              logger.info(f"Found {page_count} pages and {sitemap_count} sitemaps for {url}.")

              sitemap_dict = tree.to_dict(with_pages=include_pages)
              return safe_json_dumps(sitemap_dict)
          except Exception as e:
              error_msg = f"Error fetching sitemap tree for {url}: {str(e)}"
              logger.error(error_msg)
              logger.exception(f"Detailed exception while fetching sitemap for {url}:")
              return safe_json_dumps(
                  {"error": error_msg, "type": e.__class__.__name__, "details": str(e)}
              )
- src/sitemap_mcp_server/server.py:168-170 (registration) MCP tool registration decorator for get_sitemap_tree.

      @mcp.tool(
          description="Fetch and parse the sitemap tree from a website URL",
      )
- Core helper method in SitemapContext for retrieving sitemap trees with intelligent caching based on homepage normalization.

      def get_sitemap(self, url: str) -> AbstractSitemap:
          """Get a sitemap tree for a homepage URL with caching.

          This method first normalizes the URL to its homepage using strip_url_to_homepage
          before checking the cache or fetching a new sitemap. This ensures that different URLs
          pointing to the same website (e.g., https://example.com and https://example.com/blog)
          will use the same cached sitemap data.

          Args:
              url: The URL of the website (will be normalized to homepage)

          Returns:
              The sitemap tree object
          """
          # Try to get from cache first
          cached_tree = self.get_cached_sitemap(url)
          if cached_tree:
              return cached_tree

          logger.info(f"Fetching sitemap tree for {url}")
          start_time = time.time()

          # We still use the original URL for fetching, as sitemap_tree_for_homepage
          # will handle the normalization internally
          tree = sitemap_tree_for_homepage(url)

          # Cache using the normalized URL
          self.cache_sitemap(url, tree)

          elapsed_time = time.time() - start_time
          logger.info(f"Fetched sitemap tree for {url} in {elapsed_time:.2f} seconds")
          return tree
- Pydantic Field definitions providing the input schema for the tool parameters.

      url: str = Field(
          ..., description="The URL of the website homepage (e.g., https://example.com)"
      ),
      include_pages: bool = Field(
          False, description="Whether to include page details in the response"
      ),
      ) -> str:
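The get_sitemap docstring says that URLs on the same site share one cached tree because the cache key is the homepage. A minimal sketch of that idea follows, using only the standard library. `homepage_key` is a hypothetical stand-in for strip_url_to_homepage, and `HomepageCache` is an illustrative class, not the server's SitemapContext; the `fetch` callable stands in for the real sitemap fetch.

```python
from typing import Any, Callable, Dict
from urllib.parse import urlsplit


def homepage_key(url: str) -> str:
    """Hypothetical stand-in for strip_url_to_homepage: keep scheme and host only."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


class HomepageCache:
    """Illustrative cache keyed by homepage, not the server's actual class."""

    def __init__(self, fetch: Callable[[str], Any]) -> None:
        self._fetch = fetch                  # stand-in for the real sitemap fetch
        self._trees: Dict[str, Any] = {}

    def get(self, url: str) -> Any:
        key = homepage_key(url)
        if key not in self._trees:
            # Cache miss: fetch once per homepage and remember the result.
            self._trees[key] = self._fetch(key)
        return self._trees[key]


if __name__ == "__main__":
    fetched = []
    cache = HomepageCache(lambda u: fetched.append(u) or {"url": u})
    cache.get("https://example.com")
    cache.get("https://example.com/blog")
    print(len(fetched))  # number of fetches performed for the two lookups
```

Both lookups resolve to the same key, so the second call is served from the cache instead of triggering another fetch.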