
Sitemap MCP Server

get_sitemap_stats

Analyze a website's sitemap structure by providing the homepage URL. Retrieve detailed statistics to understand sitemap organization and content distribution.

Instructions

Get comprehensive statistics about a website's sitemap structure

Input Schema

| Name | Required | Description | Default |
| ---- | -------- | ----------- | ------- |
| url | Yes | The URL of the website homepage (e.g., https://example.com) | |

Implementation Reference

  • The primary handler function for the 'get_sitemap_stats' tool. It is registered via the @mcp.tool decorator. The function fetches the sitemap tree from context, iterates over sitemaps and pages to compute comprehensive statistics including page counts, sitemap types, priority stats, and lastmod counts, then returns formatted JSON.
    @mcp.tool(
        description="Get comprehensive statistics about a website's sitemap structure"
    )
    async def get_sitemap_stats(
        ctx: Context,
        url: str = Field(
            ..., description="The URL of the website homepage (e.g., https://example.com)"
        ),
    ) -> str:
        """Get statistics about a website's sitemap.
    
        This tool analyzes a website's sitemap and returns statistics such as:
        - Total number of pages
        - Number of subsitemaps
        - Types of sitemaps found
        - Last modification dates (min, max, average)
        - Priority statistics
        - Detailed statistics for each subsitemap
        """
        try:
            # Validate URL and normalize it if needed
            normalized_url = normalize_and_validate_url(url)
            if not normalized_url:
                return safe_json_dumps(
                    {
                        "error": "Invalid URL provided. Please provide a valid HTTP or HTTPS URL.",
                        "type": "ValidationError",
                    }
                )
            url = normalized_url
            # Log the operation start
            logger.info(f"Analyzing sitemap statistics for {url}")
            start_time = time.time()
    
            # Get the sitemap tree with caching directly from the context
            tree = ctx.request_context.lifespan_context.get_sitemap(url)
    
            # Collect total statistics
            total_stats = {
                "url": url,
                "page_count": 0,
                "sitemap_count": 0,
                "sitemap_types": set(),
                "last_modified_dates": [],
                "priorities": [],
            }
    
            # List to store stats for each subsitemap
            subsitemap_stats = []
    
            # Process each sitemap and collect stats
            for sitemap in tree.all_sitemaps():
                # Update total stats
                total_stats["sitemap_count"] += 1
                total_stats["sitemap_types"].add(sitemap.__class__.__name__)
    
                # Create individual sitemap stats
                sitemap_url = getattr(sitemap, "url", None)
                if not sitemap_url:
                    continue
    
                # Initialize stats for this subsitemap
                current_sitemap_stats = {
                    "url": sitemap_url,
                    "type": sitemap.__class__.__name__,
                    "page_count": 0,
                    "priorities": [],
                    "last_modified_dates": [],
                }
    
                # Count pages in this sitemap
                if hasattr(sitemap, "pages"):
                    for page in sitemap.pages:
                        # Update subsitemap stats
                        current_sitemap_stats["page_count"] += 1
    
                        # Collect priority if available
                        if hasattr(page, "priority") and page.priority is not None:
                            try:
                                priority_value = float(page.priority)
                                current_sitemap_stats["priorities"].append(priority_value)
                            except (ValueError, TypeError):
                                pass
    
                        # Collect last modified date if available
                        if (
                            hasattr(page, "last_modified")
                            and page.last_modified is not None
                        ):
                            current_sitemap_stats["last_modified_dates"].append(
                                page.last_modified.isoformat()
                            )
    
                # Calculate priority statistics for this sitemap if we have any pages
                if current_sitemap_stats["priorities"]:
                    current_sitemap_stats["priority_stats"] = {
                        "min": min(current_sitemap_stats["priorities"]),
                        "max": max(current_sitemap_stats["priorities"]),
                        "avg": sum(current_sitemap_stats["priorities"])
                        / len(current_sitemap_stats["priorities"]),
                    }
    
                # Calculate last modified stats if available
                if current_sitemap_stats["last_modified_dates"]:
                    current_sitemap_stats["last_modified_count"] = len(
                        current_sitemap_stats["last_modified_dates"]
                    )
    
                # Remove raw data lists to keep response size reasonable
                del current_sitemap_stats["priorities"]
                del current_sitemap_stats["last_modified_dates"]
    
                # Add to the list of subsitemap stats
                subsitemap_stats.append(current_sitemap_stats)
    
            # Collect page statistics for total stats
            for page in tree.all_pages():
                total_stats["page_count"] += 1
    
                if hasattr(page, "last_modified") and page.last_modified is not None:
                    total_stats["last_modified_dates"].append(
                        page.last_modified.isoformat()
                    )
    
                if hasattr(page, "priority") and page.priority is not None:
                    try:
                        total_stats["priorities"].append(float(page.priority))
                    except (ValueError, TypeError):
                        pass
    
            # Calculate priority statistics for total stats if we have any pages
            if total_stats["priorities"]:
                total_stats["priority_stats"] = {
                    "min": min(total_stats["priorities"]),
                    "max": max(total_stats["priorities"]),
                    "avg": sum(total_stats["priorities"]) / len(total_stats["priorities"]),
                }
    
            # Calculate last modified stats for total if available
            if total_stats["last_modified_dates"]:
                total_stats["last_modified_count"] = len(total_stats["last_modified_dates"])
    
            # Convert set to list for JSON serialization
            total_stats["sitemap_types"] = list(total_stats["sitemap_types"])
    
            # Remove the raw data lists to keep response size reasonable
            del total_stats["last_modified_dates"]
            del total_stats["priorities"]
    
            # Combine total and subsitemap stats
            result = {"total": total_stats, "subsitemaps": subsitemap_stats}
    
            # Log the operation completion
            elapsed_time = time.time() - start_time
            logger.info(f"Analyzed sitemap stats for {url} in {elapsed_time:.2f} seconds")
    
            # Return as JSON
            return safe_json_dumps(result)
        except Exception as e:
            error_msg = f"Error analyzing sitemap for {url}: {str(e)}"
            logger.error(error_msg)
            logger.exception(f"Detailed exception while analyzing sitemap for {url}:")
            return safe_json_dumps({"error": error_msg})
  • The @mcp.tool decorator registers the get_sitemap_stats tool with its description.
    @mcp.tool(
        description="Get comprehensive statistics about a website's sitemap structure"
    )
  • Pydantic Field definition for the 'url' input parameter, defining the schema for the tool.
    url: str = Field(
        ..., description="The URL of the website homepage (e.g., https://example.com)"
    ),
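The handler also depends on a `normalize_and_validate_url` helper that is not included in this reference. Its actual implementation is not shown; the following is a plausible sketch under the assumption that it accepts only HTTP/HTTPS URLs, defaults a bare hostname to HTTPS, and returns `None` for invalid input:

```python
from urllib.parse import urlparse


def normalize_and_validate_url(url: str):
    """Hypothetical sketch: normalize a homepage URL, or return None if invalid."""
    if not url or not isinstance(url, str):
        return None
    url = url.strip()
    # Assumption: a bare hostname should default to the https scheme.
    if "://" not in url:
        url = f"https://{url}"
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None
    return url
```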
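At its core, the aggregation in the handler reduces to a min/max/avg summary over the priorities collected from each page. A minimal, self-contained sketch of that logic and the resulting response shape (the field names mirror the handler above; the sample page data is invented):

```python
import json


def priority_stats(priorities):
    """Compute the same min/max/avg summary the handler builds."""
    if not priorities:
        return None
    return {
        "min": min(priorities),
        "max": max(priorities),
        "avg": sum(priorities) / len(priorities),
    }


# Invented sample pages illustrating the aggregation.
pages = [{"priority": "0.8"}, {"priority": "0.5"}, {"priority": None}]
collected = []
for page in pages:
    if page["priority"] is not None:
        try:
            collected.append(float(page["priority"]))
        except (ValueError, TypeError):
            pass

result = {
    "total": {
        "page_count": len(pages),
        "priority_stats": priority_stats(collected),
    }
}
print(json.dumps(result))
```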
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. While 'Get comprehensive statistics' implies a read-only operation, the description doesn't specify what those statistics include, whether authentication is required, whether rate limits apply, what the error conditions are, or how the tool interacts with the sitemap (e.g., fetching it live versus analyzing a cached copy). This leaves significant gaps for a tool with no annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It is appropriately sized and front-loaded, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete for a tool that presumably returns statistical data. It doesn't explain what 'comprehensive statistics' entails (e.g., counts, sizes, formats), how results are structured, or any behavioral traits. This leaves the agent with insufficient context to use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, clearly documenting the single 'url' parameter. The description adds no additional parameter semantics beyond what the schema provides, such as format examples or constraints. With high schema coverage, the baseline score of 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Get comprehensive statistics') and resource ('about a website's sitemap structure'), making the purpose immediately understandable. However, it doesn't explicitly differentiate this tool from its siblings (get_sitemap_pages, get_sitemap_tree, parse_sitemap_content), which all relate to sitemaps but likely serve different purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus its siblings or alternatives. It doesn't mention prerequisites, constraints, or scenarios where this tool is preferred over others, leaving the agent to infer usage from the tool name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
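As an illustration of that pattern, a registration following the "use X instead of Y when Z" guidance might look like this (the wording and behavior claims here are invented for the example, not taken from the server):

```python
# Hypothetical: a tool description that front-loads behavior and usage guidance.
description = (
    "Get summary statistics (page counts, sitemap types, priority min/max/avg, "
    "lastmod coverage) for a website's sitemap. Read-only; fetches the sitemap "
    "over HTTP and caches it. Use get_sitemap_pages instead when you need the "
    "individual page URLs rather than aggregate numbers."
)
print(description)
```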
