
Sitemap MCP Server

get_sitemap_stats

Analyze a website's sitemap structure by providing the homepage URL. Retrieve detailed statistics to understand sitemap organization and content distribution.

Instructions

Get comprehensive statistics about a website's sitemap structure

Input Schema

| Name | Required | Description | Default |
| ---- | -------- | ----------- | ------- |
| url | Yes | The URL of the website homepage (e.g., https://example.com) | |

Implementation Reference

  • The primary handler function for the 'get_sitemap_stats' tool. It is registered via the @mcp.tool decorator. The function fetches the sitemap tree from context, iterates over sitemaps and pages to compute comprehensive statistics including page counts, sitemap types, priority stats, and lastmod counts, then returns formatted JSON.
    @mcp.tool(
        description="Get comprehensive statistics about a website's sitemap structure"
    )
    async def get_sitemap_stats(
        ctx: Context,
        url: str = Field(
            ..., description="The URL of the website homepage (e.g., https://example.com)"
        ),
    ) -> str:
        """Get statistics about a website's sitemap.
    
        This tool analyzes a website's sitemap and returns statistics such as:
        - Total number of pages
        - Number of subsitemaps
        - Types of sitemaps found
        - Last modification dates (min, max, average)
        - Priority statistics
        - Detailed statistics for each subsitemap
        """
        try:
            # Validate URL and normalize it if needed
            normalized_url = normalize_and_validate_url(url)
            if not normalized_url:
                return safe_json_dumps(
                    {
                        "error": "Invalid URL provided. Please provide a valid HTTP or HTTPS URL.",
                        "type": "ValidationError",
                    }
                )
            url = normalized_url
            # Log the operation start
            logger.info(f"Analyzing sitemap statistics for {url}")
            start_time = time.time()
    
            # Get the sitemap tree with caching directly from the context
            tree = ctx.request_context.lifespan_context.get_sitemap(url)
    
            # Collect total statistics
            total_stats = {
                "url": url,
                "page_count": 0,
                "sitemap_count": 0,
                "sitemap_types": set(),
                "last_modified_dates": [],
                "priorities": [],
            }
    
            # List to store stats for each subsitemap
            subsitemap_stats = []
    
            # Process each sitemap and collect stats
            for sitemap in tree.all_sitemaps():
                # Update total stats
                total_stats["sitemap_count"] += 1
                total_stats["sitemap_types"].add(sitemap.__class__.__name__)
    
                # Create individual sitemap stats
                sitemap_url = getattr(sitemap, "url", None)
                if not sitemap_url:
                    continue
    
                # Initialize stats for this subsitemap
                current_sitemap_stats = {
                    "url": sitemap_url,
                    "type": sitemap.__class__.__name__,
                    "page_count": 0,
                    "priorities": [],
                    "last_modified_dates": [],
                }
    
                # Count pages in this sitemap
                if hasattr(sitemap, "pages"):
                    for page in sitemap.pages:
                        # Update subsitemap stats
                        current_sitemap_stats["page_count"] += 1
    
                        # Collect priority if available
                        if hasattr(page, "priority") and page.priority is not None:
                            try:
                                priority_value = float(page.priority)
                                current_sitemap_stats["priorities"].append(priority_value)
                            except (ValueError, TypeError):
                                pass
    
                        # Collect last modified date if available
                        if (
                            hasattr(page, "last_modified")
                            and page.last_modified is not None
                        ):
                            current_sitemap_stats["last_modified_dates"].append(
                                page.last_modified.isoformat()
                            )
    
                # Calculate priority statistics for this sitemap if we have any pages
                if current_sitemap_stats["priorities"]:
                    current_sitemap_stats["priority_stats"] = {
                        "min": min(current_sitemap_stats["priorities"]),
                        "max": max(current_sitemap_stats["priorities"]),
                        "avg": sum(current_sitemap_stats["priorities"])
                        / len(current_sitemap_stats["priorities"]),
                    }
    
                # Calculate last modified stats if available
                if current_sitemap_stats["last_modified_dates"]:
                    current_sitemap_stats["last_modified_count"] = len(
                        current_sitemap_stats["last_modified_dates"]
                    )
    
                # Remove raw data lists to keep response size reasonable
                del current_sitemap_stats["priorities"]
                del current_sitemap_stats["last_modified_dates"]
    
                # Add to the list of subsitemap stats
                subsitemap_stats.append(current_sitemap_stats)
    
            # Collect page statistics for total stats
            for page in tree.all_pages():
                total_stats["page_count"] += 1
    
                if hasattr(page, "last_modified") and page.last_modified is not None:
                    total_stats["last_modified_dates"].append(
                        page.last_modified.isoformat()
                    )
    
                if hasattr(page, "priority") and page.priority is not None:
                    try:
                        total_stats["priorities"].append(float(page.priority))
                    except (ValueError, TypeError):
                        pass
    
            # Calculate priority statistics for total stats if we have any pages
            if total_stats["priorities"]:
                total_stats["priority_stats"] = {
                    "min": min(total_stats["priorities"]),
                    "max": max(total_stats["priorities"]),
                    "avg": sum(total_stats["priorities"]) / len(total_stats["priorities"]),
                }
    
            # Calculate last modified stats for total if available
            if total_stats["last_modified_dates"]:
                total_stats["last_modified_count"] = len(total_stats["last_modified_dates"])
    
            # Convert set to list for JSON serialization
            total_stats["sitemap_types"] = list(total_stats["sitemap_types"])
    
            # Remove the raw data lists to keep response size reasonable
            del total_stats["last_modified_dates"]
            del total_stats["priorities"]
    
            # Combine total and subsitemap stats
            result = {"total": total_stats, "subsitemaps": subsitemap_stats}
    
            # Log the operation completion
            elapsed_time = time.time() - start_time
            logger.info(f"Analyzed sitemap stats for {url} in {elapsed_time:.2f} seconds")
    
            # Return as JSON
            return safe_json_dumps(result)
        except Exception as e:
            error_msg = f"Error analyzing sitemap for {url}: {str(e)}"
            logger.error(error_msg)
            logger.exception(f"Detailed exception while analyzing sitemap for {url}:")
            return safe_json_dumps({"error": error_msg})
  • The @mcp.tool decorator registers the get_sitemap_stats tool with its description.
    @mcp.tool(
        description="Get comprehensive statistics about a website's sitemap structure"
    )
  • Pydantic Field definition for the 'url' input parameter, defining the schema for the tool.
    url: str = Field(
        ..., description="The URL of the website homepage (e.g., https://example.com)"
    ),
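The handler also depends on a `normalize_and_validate_url` helper that is not included in this reference. Its actual implementation is not shown; the following is a plausible sketch under the assumption that it accepts only HTTP/HTTPS URLs, defaults a bare hostname to HTTPS, and returns `None` for invalid input:

```python
from urllib.parse import urlparse


def normalize_and_validate_url(url: str):
    """Hypothetical sketch: normalize a homepage URL, or return None if invalid."""
    if not url or not isinstance(url, str):
        return None
    url = url.strip()
    # Assumption: a bare hostname should default to the https scheme.
    if "://" not in url:
        url = f"https://{url}"
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None
    return url
```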
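At its core, the aggregation in the handler reduces to a min/max/avg summary over the priorities collected from each page. A minimal, self-contained sketch of that logic and the resulting response shape (the field names mirror the handler above; the sample page data is invented):

```python
import json


def priority_stats(priorities):
    """Compute the same min/max/avg summary the handler builds."""
    if not priorities:
        return None
    return {
        "min": min(priorities),
        "max": max(priorities),
        "avg": sum(priorities) / len(priorities),
    }


# Invented sample pages illustrating the aggregation.
pages = [{"priority": "0.8"}, {"priority": "0.5"}, {"priority": None}]
collected = []
for page in pages:
    if page["priority"] is not None:
        try:
            collected.append(float(page["priority"]))
        except (ValueError, TypeError):
            pass

result = {
    "total": {
        "page_count": len(pages),
        "priority_stats": priority_stats(collected),
    }
}
print(json.dumps(result))
```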
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. While 'Get comprehensive statistics' implies a read-only operation, the description doesn't specify what those statistics include, whether authentication is required, whether rate limits apply, what the error conditions are, or how the tool interacts with the sitemap (e.g., fetching it live versus analyzing a cached copy). This leaves significant gaps for a tool with no annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It is appropriately sized and front-loaded, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete for a tool that presumably returns statistical data. It doesn't explain what 'comprehensive statistics' entails (e.g., counts, sizes, formats), how results are structured, or any behavioral traits. This leaves the agent with insufficient context to use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, clearly documenting the single 'url' parameter. The description adds no additional parameter semantics beyond what the schema provides, such as format examples or constraints. With high schema coverage, the baseline score of 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Get comprehensive statistics') and resource ('about a website's sitemap structure'), making the purpose immediately understandable. However, it doesn't explicitly differentiate this tool from its siblings (get_sitemap_pages, get_sitemap_tree, parse_sitemap_content), which all relate to sitemaps but likely serve different purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus its siblings or alternatives. It doesn't mention prerequisites, constraints, or scenarios where this tool is preferred over others, leaving the agent to infer usage from the tool name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
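As an illustration of that pattern, a registration following the "use X instead of Y when Z" guidance might look like this (the wording and behavior claims here are invented for the example, not taken from the server):

```python
# Hypothetical: a tool description that front-loads behavior and usage guidance.
description = (
    "Get summary statistics (page counts, sitemap types, priority min/max/avg, "
    "lastmod coverage) for a website's sitemap. Read-only; fetches the sitemap "
    "over HTTP and caches it. Use get_sitemap_pages instead when you need the "
    "individual page URLs rather than aggregate numbers."
)
print(description)
```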
