Glama
mugoosse

Sitemap MCP Server

get_sitemap_tree

Fetch and parse the sitemap tree of any website by providing its homepage URL. Retrieve structured sitemap data and optionally include detailed page information.

Instructions

Fetch and parse the sitemap tree from a website URL

Input Schema

Name           Required  Description                                                  Default
include_pages  No        Whether to include page details in the response
url            Yes       The URL of the website homepage (e.g., https://example.com)
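As a rough illustration of how the two parameters above travel to the server, the sketch below builds the JSON-RPC 2.0 `tools/call` request that an MCP client would send for this tool. The helper name `build_tool_call` and the `request_id` value are assumptions made for this example; in practice an MCP client library normally constructs this message for you.

```python
import json


def build_tool_call(url: str, include_pages: bool = False, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for get_sitemap_tree.

    Hypothetical helper for illustration only; not part of the server.
    """
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_sitemap_tree",
            # Argument names match the input schema: url (required),
            # include_pages (optional).
            "arguments": {"url": url, "include_pages": include_pages},
        },
    }
    return json.dumps(request)


if __name__ == "__main__":
    print(build_tool_call("https://example.com", include_pages=True))
```

Leaving `include_pages` unset keeps the response limited to the sitemap structure; setting it to true asks the server to include page details as well.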

Implementation Reference

  • The primary handler function implementing the get_sitemap_tree tool logic. It validates the URL, retrieves the sitemap tree from the cache (or fetches it on a miss), counts pages and sitemaps for logging, and serializes the tree to JSON.
    async def get_sitemap_tree(
        ctx: Context,
        url: str = Field(
            ..., description="The URL of the website homepage (e.g., https://example.com)"
        ),
        include_pages: bool = Field(
            False, description="Whether to include page details in the response"
        ),
    ) -> str:
        try:
            normalized_url = normalize_and_validate_url(url)
            if not normalized_url:
                return safe_json_dumps(
                    {
                        "error": "Invalid URL provided. Please provide a valid HTTP or HTTPS URL.",
                        "type": "ValidationError",
                    }
                )
            url = normalized_url
            tree = ctx.request_context.lifespan_context.get_sitemap(url)
            page_count = 0
            sitemap_count = 0
            if hasattr(tree, "all_pages"):
                try:
                    page_count = sum(1 for _ in tree.all_pages())
                except Exception as e:
                    logger.debug(f"Error counting pages: {str(e)}")
            if hasattr(tree, "all_sitemaps"):
                try:
                    sitemap_count = sum(1 for _ in tree.all_sitemaps())
                except Exception as e:
                    logger.debug(f"Error counting sitemaps: {str(e)}")
            logger.info(f"Found {page_count} pages and {sitemap_count} sitemaps for {url}.")
            sitemap_dict = tree.to_dict(with_pages=include_pages)
            return safe_json_dumps(sitemap_dict)
        except Exception as e:
            error_msg = f"Error fetching sitemap tree for {url}: {str(e)}"
            logger.error(error_msg)
            logger.exception(f"Detailed exception while fetching sitemap for {url}:")
            return safe_json_dumps(
                {"error": error_msg, "type": e.__class__.__name__, "details": str(e)}
            )
  • MCP tool registration decorator for get_sitemap_tree.
    @mcp.tool(
        description="Fetch and parse the sitemap tree from a website URL",
    )
  • Core helper method in SitemapContext for retrieving sitemap trees with intelligent caching based on homepage normalization.
    def get_sitemap(self, url: str) -> AbstractSitemap:
        """Get a sitemap tree for a homepage URL with caching.
    
        This method first normalizes the URL to its homepage using strip_url_to_homepage
        before checking the cache or fetching a new sitemap. This ensures that different URLs
        pointing to the same website (e.g., https://example.com and https://example.com/blog)
        will use the same cached sitemap data.
    
        Args:
            url: The URL of the website (will be normalized to homepage)
    
        Returns:
            The sitemap tree object
        """
        # Try to get from cache first
        cached_tree = self.get_cached_sitemap(url)
        if cached_tree:
            return cached_tree
    
        logger.info(f"Fetching sitemap tree for {url}")
        start_time = time.time()
    
        # We still use the original URL for fetching, as sitemap_tree_for_homepage
        # will handle the normalization internally
        tree = sitemap_tree_for_homepage(url)
    
        # Cache using the normalized URL
        self.cache_sitemap(url, tree)
    
        elapsed_time = time.time() - start_time
        logger.info(f"Fetched sitemap tree for {url} in {elapsed_time:.2f} seconds")
    
        return tree
  • Pydantic Field definitions providing the input schema for the tool parameters.
        url: str = Field(
            ..., description="The URL of the website homepage (e.g., https://example.com)"
        ),
        include_pages: bool = Field(
            False, description="Whether to include page details in the response"
        ),
    ) -> str:
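The caching behavior described in the `get_sitemap` docstring (different URLs on the same site sharing one cached tree) can be sketched as follows. This is a minimal illustration, not the server's code: `strip_to_homepage` is a hypothetical stand-in for `strip_url_to_homepage`, `HomepageCache` is an invented class, and the sketch assumes a plain in-memory dictionary with no expiry, which the real `SitemapContext` may not match.

```python
from urllib.parse import urlsplit


def strip_to_homepage(url: str) -> str:
    """Hypothetical stand-in: keep scheme and host, drop path, query, fragment."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


class HomepageCache:
    """Toy cache keyed by homepage, so all URLs on one site share an entry."""

    def __init__(self):
        self._trees = {}

    def get(self, url):
        # Returns None on a cache miss.
        return self._trees.get(strip_to_homepage(url))

    def put(self, url, tree):
        self._trees[strip_to_homepage(url)] = tree


if __name__ == "__main__":
    cache = HomepageCache()
    cache.put("https://example.com/blog", {"pages": 3})
    # The bare homepage resolves to the same cache key as /blog.
    print(cache.get("https://example.com"))
```

Keying on the homepage means a request for a deep link does not trigger a second crawl of a site that has already been fetched.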
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions 'fetch and parse,' implying network interaction and data processing, but lacks details on error handling, rate limits, authentication needs, or what the parsed tree structure looks like. For a tool that interacts with external websites, this omission is significant and leaves key behavioral aspects unclear.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's function without unnecessary words. It is front-loaded with the core action ('fetch and parse') and resource ('sitemap tree'), making it easy to grasp quickly. Every part of the sentence contributes to understanding the tool's purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (involving network fetching and parsing), lack of annotations, and absence of an output schema, the description is incomplete. It doesn't address what the parsed tree output entails, potential errors (e.g., invalid URLs or sitemap formats), or performance considerations. For a tool that likely returns structured data, more context is needed to guide effective use.
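Although the description does not document errors, the handler in the Implementation Reference shows that failures are returned as a JSON object with `error` and `type` keys rather than raised. A caller could separate the two cases along these lines. The function name `parse_tool_result` is an assumption for this sketch, and treating any object that carries both keys as an error is a heuristic, since the shape of a successful tree is not specified on this page.

```python
import json


def parse_tool_result(raw: str) -> dict:
    """Decode the tool's JSON string and raise if it is an error payload.

    Hypothetical client-side helper; the key names come from the handler's
    error dictionaries ("error", "type", and optionally "details").
    """
    data = json.loads(raw)
    if isinstance(data, dict) and "error" in data and "type" in data:
        raise RuntimeError(f"{data['type']}: {data['error']}")
    return data


if __name__ == "__main__":
    bad = '{"error": "Invalid URL provided.", "type": "ValidationError"}'
    try:
        parse_tool_result(bad)
    except RuntimeError as exc:
        print(f"tool call failed: {exc}")
```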

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with clear documentation for both parameters ('url' and 'include_pages'). The description adds no additional parameter semantics beyond what the schema provides, such as explaining how 'include_pages' affects the parsed tree or providing examples of valid URL formats. Given the high schema coverage, a baseline score of 3 is appropriate, as the description doesn't compensate but also doesn't detract.
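On the question of valid URL formats: the handler's error message asks for "a valid HTTP or HTTPS URL", so a client can pre-check input before calling the tool. The sketch below is one plausible check, not the server's `normalize_and_validate_url`, whose exact rules are not shown on this page; the function name `validate_homepage_url` and the decision to reject scheme-less input are assumptions.

```python
from typing import Optional
from urllib.parse import urlsplit


def validate_homepage_url(url: str) -> Optional[str]:
    """Return the trimmed URL if it is http(s) with a host, else None.

    Hypothetical pre-check; the real server-side validation may differ.
    """
    candidate = url.strip()
    try:
        parts = urlsplit(candidate)
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs.
        return None
    if parts.scheme.lower() not in ("http", "https") or not parts.netloc:
        return None
    return candidate


if __name__ == "__main__":
    for value in ("https://example.com", "ftp://example.com", "example.com"):
        print(value, "->", validate_homepage_url(value))
```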

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('fetch and parse') and resource ('sitemap tree from a website URL'), making the tool's purpose understandable. However, it doesn't explicitly differentiate from sibling tools like 'get_sitemap_pages' or 'parse_sitemap_content', which likely handle similar sitemap-related tasks, leaving some ambiguity about when to choose this specific tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It mentions fetching and parsing a sitemap tree, but doesn't specify scenarios where this is preferred over siblings like 'get_sitemap_pages' (which might retrieve individual pages) or 'parse_sitemap_content' (which might handle raw sitemap data). Without such context, users must infer usage from tool names alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mugoosse/sitemap-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.