
get_sitemap_stats

Analyze a website's sitemap structure by providing the homepage URL. Retrieve detailed statistics to understand sitemap organization and content distribution.

Instructions

Get comprehensive statistics about a website's sitemap structure

Input Schema

Name    Required    Description                                                     Default
url     Yes         The URL of the website homepage (e.g., https://example.com)    —

Input Schema (JSON Schema)

{ "properties": { "url": { "description": "The URL of the website homepage (e.g., https://example.com)", "title": "Url", "type": "string" } }, "required": [ "url" ], "title": "get_sitemap_statsArguments", "type": "object" }

Implementation Reference

  • The primary handler function for the 'get_sitemap_stats' tool, registered via the @mcp.tool decorator. It fetches the sitemap tree from the lifespan context, iterates over all sitemaps and pages to compute statistics (page counts, sitemap types, priority min/max/average, and lastmod counts), and returns the result as formatted JSON.
    @mcp.tool(
        description="Get comprehensive statistics about a website's sitemap structure"
    )
    async def get_sitemap_stats(
        ctx: Context,
        url: str = Field(
            ..., description="The URL of the website homepage (e.g., https://example.com)"
        ),
    ) -> str:
        """Get statistics about a website's sitemap.

        This tool analyzes a website's sitemap and returns statistics such as:
        - Total number of pages
        - Number of subsitemaps
        - Types of sitemaps found
        - Last modification dates (min, max, average)
        - Priority statistics
        - Detailed statistics for each subsitemap
        """
        try:
            # Validate URL and normalize it if needed
            normalized_url = normalize_and_validate_url(url)
            if not normalized_url:
                return safe_json_dumps(
                    {
                        "error": "Invalid URL provided. Please provide a valid HTTP or HTTPS URL.",
                        "type": "ValidationError",
                    }
                )
            url = normalized_url

            # Log the operation start
            logger.info(f"Analyzing sitemap statistics for {url}")
            start_time = time.time()

            # Get the sitemap tree with caching directly from the context
            tree = ctx.request_context.lifespan_context.get_sitemap(url)

            # Collect total statistics
            total_stats = {
                "url": url,
                "page_count": 0,
                "sitemap_count": 0,
                "sitemap_types": set(),
                "last_modified_dates": [],
                "priorities": [],
            }

            # Dictionary to store stats for each subsitemap
            subsitemap_stats = []

            # Process each sitemap and collect stats
            for sitemap in tree.all_sitemaps():
                # Update total stats
                total_stats["sitemap_count"] += 1
                total_stats["sitemap_types"].add(sitemap.__class__.__name__)

                # Create individual sitemap stats
                sitemap_url = getattr(sitemap, "url", None)
                if not sitemap_url:
                    continue

                # Initialize stats for this subsitemap
                current_sitemap_stats = {
                    "url": sitemap_url,
                    "type": sitemap.__class__.__name__,
                    "page_count": 0,
                    "priorities": [],
                    "last_modified_dates": [],
                }

                # Count pages in this sitemap
                if hasattr(sitemap, "pages"):
                    for page in sitemap.pages:
                        # Update subsitemap stats
                        current_sitemap_stats["page_count"] += 1

                        # Collect priority if available
                        if hasattr(page, "priority") and page.priority is not None:
                            try:
                                priority_value = float(page.priority)
                                current_sitemap_stats["priorities"].append(priority_value)
                            except (ValueError, TypeError):
                                pass

                        # Collect last modified date if available
                        if (
                            hasattr(page, "last_modified")
                            and page.last_modified is not None
                        ):
                            current_sitemap_stats["last_modified_dates"].append(
                                page.last_modified.isoformat()
                            )

                # Calculate priority statistics for this sitemap if we have any pages
                if current_sitemap_stats["priorities"]:
                    current_sitemap_stats["priority_stats"] = {
                        "min": min(current_sitemap_stats["priorities"]),
                        "max": max(current_sitemap_stats["priorities"]),
                        "avg": sum(current_sitemap_stats["priorities"])
                        / len(current_sitemap_stats["priorities"]),
                    }

                # Calculate last modified stats if available
                if current_sitemap_stats["last_modified_dates"]:
                    current_sitemap_stats["last_modified_count"] = len(
                        current_sitemap_stats["last_modified_dates"]
                    )

                # Remove raw data lists to keep response size reasonable
                del current_sitemap_stats["priorities"]
                del current_sitemap_stats["last_modified_dates"]

                # Add to the list of subsitemap stats
                subsitemap_stats.append(current_sitemap_stats)

            # Collect page statistics for total stats
            for page in tree.all_pages():
                total_stats["page_count"] += 1
                if hasattr(page, "last_modified") and page.last_modified is not None:
                    total_stats["last_modified_dates"].append(
                        page.last_modified.isoformat()
                    )
                if hasattr(page, "priority") and page.priority is not None:
                    try:
                        total_stats["priorities"].append(float(page.priority))
                    except (ValueError, TypeError):
                        pass

            # Calculate priority statistics for total stats if we have any pages
            if total_stats["priorities"]:
                total_stats["priority_stats"] = {
                    "min": min(total_stats["priorities"]),
                    "max": max(total_stats["priorities"]),
                    "avg": sum(total_stats["priorities"]) / len(total_stats["priorities"]),
                }

            # Calculate last modified stats for total if available
            if total_stats["last_modified_dates"]:
                total_stats["last_modified_count"] = len(total_stats["last_modified_dates"])

            # Convert set to list for JSON serialization
            total_stats["sitemap_types"] = list(total_stats["sitemap_types"])

            # Remove the raw data lists to keep response size reasonable
            del total_stats["last_modified_dates"]
            del total_stats["priorities"]

            # Combine total and subsitemap stats
            result = {"total": total_stats, "subsitemaps": subsitemap_stats}

            # Log the operation completion
            elapsed_time = time.time() - start_time
            logger.info(f"Analyzed sitemap stats for {url} in {elapsed_time:.2f} seconds")

            # Return as JSON
            return safe_json_dumps(result)
        except Exception as e:
            error_msg = f"Error analyzing sitemap for {url}: {str(e)}"
            logger.error(error_msg)
            logger.exception(f"Detailed exception while analyzing sitemap for {url}:")
            return safe_json_dumps({"error": error_msg})
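
    To see the tool in action, the snippet below shows how a client might invoke it over stdio with the official MCP Python SDK. This is a sketch: the launch command ("uvx sitemap-mcp-server") is an assumption and may differ for your installation.

    import asyncio

    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client


    async def main() -> None:
        # Assumed launch command for the server; adjust to your installation.
        params = StdioServerParameters(command="uvx", args=["sitemap-mcp-server"])
        async with stdio_client(params) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()
                result = await session.call_tool(
                    "get_sitemap_stats", {"url": "https://example.com"}
                )
                # The handler returns its statistics as JSON text content.
                print(result.content[0].text)


    asyncio.run(main())

    On success, the returned JSON has a "total" object (url, page_count, sitemap_count, sitemap_types, plus priority_stats and last_modified_count when available) and a "subsitemaps" list with one entry per subsitemap, as built in the handler above.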
  • The @mcp.tool decorator registers the get_sitemap_stats tool with its description.
    @mcp.tool(
        description="Get comprehensive statistics about a website's sitemap structure"
    )
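
    For context, this is the standard FastMCP registration pattern from the MCP Python SDK; the server name below is illustrative and the handler body is elided:

    from mcp.server.fastmcp import FastMCP

    # Illustrative server name; the project's actual FastMCP instance may differ.
    mcp = FastMCP("sitemap")


    @mcp.tool(
        description="Get comprehensive statistics about a website's sitemap structure"
    )
    async def get_sitemap_stats(url: str) -> str:
        ...  # full handler body as shown above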
  • Pydantic Field definition for the 'url' input parameter, from which the tool's input schema is derived.
    url: str = Field(
        ..., description="The URL of the website homepage (e.g., https://example.com)"
    ),
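
    The published Input Schema follows from this annotation. A minimal standalone sketch of the same derivation using plain Pydantic (the model name here is chosen to mirror the schema title; it is not taken from the project source):

    from pydantic import BaseModel, Field


    class GetSitemapStatsArguments(BaseModel):
        url: str = Field(
            ...,
            description="The URL of the website homepage (e.g., https://example.com)",
        )


    # Emits a schema with the same "url" property, description, and
    # required list as the Input Schema shown above.
    print(GetSitemapStatsArguments.model_json_schema())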


MCP directory API

We provide all information about MCP servers via our MCP directory API. For example, to fetch this server's entry:

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mugoosse/sitemap-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.