get_sitemap_tree

Fetch and parse the sitemap tree of any website by providing its homepage URL. Retrieve structured sitemap data and optionally include detailed page information.

Instructions

Fetch and parse the sitemap tree from a website URL

Input Schema


Name	Required	Description	Default
`include_pages`	No	Whether to include page details in the response	`False`
`url`	Yes	The URL of the website homepage (e.g., https://example.com)
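
As a rough illustration of how these two parameters travel over the wire, the sketch below builds an MCP `tools/call` JSON-RPC request for this tool. The helper name `build_tool_call` and the request `id` are invented for the example; only the tool name and the two parameter names come from the schema above.

```python
import json


def build_tool_call(url: str, include_pages: bool = False, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request for get_sitemap_tree.

    Hypothetical helper for illustration; it is not part of the server.
    """
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_sitemap_tree",
            # `url` is required; `include_pages` is optional and defaults to False.
            "arguments": {"url": url, "include_pages": include_pages},
        },
    }
    return json.dumps(request)


if __name__ == "__main__":
    print(build_tool_call("https://example.com", include_pages=True))
```

In practice an MCP client library would normally assemble this request for you; the sketch only shows which fields carry the tool name and its arguments.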

Implementation Reference

src/sitemap_mcp_server/server.py:171-213 (handler)
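The primary handler function implementing the get_sitemap_tree tool logic. It validates the URL, retrieves the sitemap tree from the cache or fetches a fresh one, counts pages and sitemaps for logging, and serializes the tree to JSON. Failures are returned as a JSON error object rather than raised.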
async def get_sitemap_tree(
    ctx: Context,
    url: str = Field(
        ..., description="The URL of the website homepage (e.g., https://example.com)"
    ),
    include_pages: bool = Field(
        False, description="Whether to include page details in the response"
    ),
) -> str:
    try:
        normalized_url = normalize_and_validate_url(url)
        if not normalized_url:
            return safe_json_dumps(
                {
                    "error": "Invalid URL provided. Please provide a valid HTTP or HTTPS URL.",
                    "type": "ValidationError",
                }
            )
        url = normalized_url
        tree = ctx.request_context.lifespan_context.get_sitemap(url)

        page_count = 0
        sitemap_count = 0
        if hasattr(tree, "all_pages"):
            try:
                page_count = sum(1 for _ in tree.all_pages())
            except Exception as e:
                logger.debug(f"Error counting pages: {str(e)}")
        if hasattr(tree, "all_sitemaps"):
            try:
                sitemap_count = sum(1 for _ in tree.all_sitemaps())
            except Exception as e:
                logger.debug(f"Error counting sitemaps: {str(e)}")

        logger.info(f"Found {page_count} pages and {sitemap_count} sitemaps for {url}.")

        sitemap_dict = tree.to_dict(with_pages=include_pages)
        return safe_json_dumps(sitemap_dict)
    except Exception as e:
        error_msg = f"Error fetching sitemap tree for {url}: {str(e)}"
        logger.error(error_msg)
        logger.exception(f"Detailed exception while fetching sitemap for {url}:")
        return safe_json_dumps(
            {"error": error_msg, "type": e.__class__.__name__, "details": str(e)}
        )
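
Because the handler always returns a JSON string, even on failure, a caller has to inspect the payload to tell a sitemap tree from an error. A minimal sketch of that check follows. It assumes `safe_json_dumps` emits standard JSON and that a successful tree never carries both an `error` and a `type` key; `SitemapToolError` and `parse_sitemap_result` are hypothetical names, not part of the server.

```python
import json
from typing import Any


class SitemapToolError(Exception):
    """Raised when the tool's JSON payload describes a failure (hypothetical name)."""


def parse_sitemap_result(raw: str) -> dict[str, Any]:
    """Decode the JSON string returned by the tool and surface error payloads."""
    payload = json.loads(raw)
    # The handler reports failures as an object with "error" and "type" keys.
    if isinstance(payload, dict) and "error" in payload and "type" in payload:
        raise SitemapToolError(f'{payload["type"]}: {payload["error"]}')
    return payload
```

Raising on the error object keeps the success path free of key checks; a caller that prefers not to use exceptions could return the payload unchanged and branch on the `error` key instead.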
src/sitemap_mcp_server/server.py:168-170 (registration)
MCP tool registration decorator for get_sitemap_tree.
@mcp.tool(
    description="Fetch and parse the sitemap tree from a website URL",
)
src/sitemap_mcp_server/server.py:100-133 (helper)
Core helper method in SitemapContext for retrieving sitemap trees with intelligent caching based on homepage normalization.
def get_sitemap(self, url: str) -> AbstractSitemap:
    """Get a sitemap tree for a homepage URL with caching.

    This method first normalizes the URL to its homepage using strip_url_to_homepage
    before checking the cache or fetching a new sitemap. This ensures that different URLs
    pointing to the same website (e.g., https://example.com and https://example.com/blog)
    will use the same cached sitemap data.

    Args:
        url: The URL of the website (will be normalized to homepage)

    Returns:
        The sitemap tree object
    """
    # Try to get from cache first
    cached_tree = self.get_cached_sitemap(url)
    if cached_tree:
        return cached_tree

    logger.info(f"Fetching sitemap tree for {url}")
    start_time = time.time()

    # We still use the original URL for fetching, as sitemap_tree_for_homepage
    # will handle the normalization internally
    tree = sitemap_tree_for_homepage(url)

    # Cache using the normalized URL
    self.cache_sitemap(url, tree)

    elapsed_time = time.time() - start_time
    logger.info(f"Fetched sitemap tree for {url} in {elapsed_time:.2f} seconds")

    return tree
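
The docstring's claim that different URLs on the same site share one cache entry can be sketched in isolation. The example below is not the server's code: `strip_to_homepage` is a simplified stand-in for `strip_url_to_homepage`, and `HomepageCache` with its injected `fetch` callable is a hypothetical class that takes the place of the `get_cached_sitemap` and `cache_sitemap` methods, whose bodies are not shown here.

```python
from typing import Callable, Dict
from urllib.parse import urlsplit


def strip_to_homepage(url: str) -> str:
    """Reduce a URL to scheme://host/ (simplified stand-in, assumed behavior)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


class HomepageCache:
    """Cache fetched values under their normalized homepage URL."""

    def __init__(self, fetch: Callable[[str], object]) -> None:
        self._fetch = fetch
        self._store: Dict[str, object] = {}

    def get(self, url: str) -> object:
        key = strip_to_homepage(url)
        if key not in self._store:
            # Cache miss: fetch once and remember the result for the whole site.
            self._store[key] = self._fetch(url)
        return self._store[key]


if __name__ == "__main__":
    calls = []

    def fake_fetch(url: str) -> dict:
        calls.append(url)
        return {"fetched_from": url}

    cache = HomepageCache(fake_fetch)
    cache.get("https://example.com")
    cache.get("https://example.com/blog")  # same homepage key, served from cache
    print(len(calls))
```

Keying on the homepage rather than the raw URL means a deep link never triggers a second crawl of a site whose sitemap is already cached.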
src/sitemap_mcp_server/server.py:173-179 (schema)
Pydantic Field definitions providing the input schema for the tool parameters.
    url: str = Field(
        ..., description="The URL of the website homepage (e.g., https://example.com)"
    ),
    include_pages: bool = Field(
        False, description="Whether to include page details in the response"
    ),
) -> str:
