Dataset Viewer MCP Server

by privetin

get_statistics

Retrieve statistical insights from Hugging Face datasets to analyze data distribution, identify patterns, and assess dataset quality for machine learning projects.

Instructions

Get statistics about a Hugging Face dataset

Input Schema

Name        Required  Description                                                                 Default
dataset     Yes       Hugging Face dataset identifier in the format owner/dataset
config      Yes       Dataset configuration/subset name. Use get_info to list available configs
split       Yes       Dataset split name. Splits partition the data for training/evaluation
auth_token  No        Hugging Face auth token for private/gated datasets
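
To illustrate how these parameters fit together, the sketch below checks a set of arguments against the required fields and the owner/dataset format, then wraps them in a JSON-RPC 2.0 tools/call request of the kind MCP clients send. The helper names validate_arguments and build_tool_call are hypothetical and not part of this server, and the config value is only illustrative; an MCP client library would normally build the request for you.

```python
import json
import re

# Required parameters, per the input schema for get_statistics.
REQUIRED = ("dataset", "config", "split")
# owner/dataset: exactly one slash with non-empty parts on both sides.
DATASET_PATTERN = re.compile(r"^[^/]+/[^/]+$")


def validate_arguments(arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    errors = [f"missing required parameter: {name}"
              for name in REQUIRED if name not in arguments]
    dataset = arguments.get("dataset")
    if isinstance(dataset, str) and not DATASET_PATTERN.match(dataset):
        errors.append("dataset must have the form owner/dataset")
    return errors


def build_tool_call(arguments: dict, request_id: int = 1) -> str:
    """Wrap the arguments in a JSON-RPC 2.0 tools/call request (hypothetical helper)."""
    errors = validate_arguments(arguments)
    if errors:
        raise ValueError("; ".join(errors))
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "get_statistics", "arguments": arguments},
    })


# Illustrative values; use get_info to list the configs a dataset really has.
args = {"dataset": "stanfordnlp/imdb", "config": "plain_text", "split": "train"}
print(build_tool_call(args))
```

Because auth_token is absent from the required list, omitting it is valid; it only needs to be supplied for private or gated datasets.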

Implementation Reference

  • MCP tool handler in @server.call_tool() that extracts arguments (dataset, config, split), instantiates DatasetViewerAPI with auth_token, calls its get_statistics method, formats the result as JSON text content, and returns it.
    elif name == "get_statistics":
        dataset = arguments["dataset"]
        config = arguments["config"]
        split = arguments["split"]
        stats = await DatasetViewerAPI(auth_token=auth_token).get_statistics(dataset, config=config, split=split)
        return [
            types.TextContent(
                type="text",
                text=json.dumps(stats, indent=2)
            )
        ]
  • Core implementation in the DatasetViewerAPI class that builds the query parameters and makes an asynchronous HTTP GET request to the Hugging Face dataset viewer /statistics endpoint, returning the JSON response.
    async def get_statistics(self, dataset: str, config: str, split: str) -> dict:
        """Get statistics about a dataset"""
        params = {
            "dataset": dataset,
            "config": config,
            "split": split
        }
        response = await self.client.get("/statistics", params=params)
        response.raise_for_status()
        return response.json()
  • Input schema definition for the get_statistics tool, specifying required parameters (dataset, config, split) with types, descriptions, patterns, examples, and optional auth_token.
    inputSchema={
        "type": "object",
        "properties": {
            "dataset": {
                "type": "string",
                "description": "Hugging Face dataset identifier in the format owner/dataset",
                "pattern": "^[^/]+/[^/]+$",
                "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
            },
            "config": {
                "type": "string",
                "description": "Dataset configuration/subset name. Use get_info to list available configs",
                "examples": ["default", "en", "es"]
            },
            "split": {
                "type": "string",
                "description": "Dataset split name. Splits partition the data for training/evaluation",
                "examples": ["train", "validation", "test"]
            },
            "auth_token": {
                "type": "string",
                "description": "Hugging Face auth token for private/gated datasets",
                "optional": True
            }
        },
        "required": ["dataset", "config", "split"],
    }
  • Registration of the get_statistics tool in the @server.list_tools() handler, including name, description, and full input schema.
    types.Tool(
        name="get_statistics",
        description="Get statistics about a Hugging Face dataset",
        inputSchema={
            "type": "object",
            "properties": {
                "dataset": {
                    "type": "string",
                    "description": "Hugging Face dataset identifier in the format owner/dataset",
                    "pattern": "^[^/]+/[^/]+$",
                    "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                },
                "config": {
                    "type": "string",
                    "description": "Dataset configuration/subset name. Use get_info to list available configs",
                    "examples": ["default", "en", "es"]
                },
                "split": {
                    "type": "string",
                    "description": "Dataset split name. Splits partition the data for training/evaluation",
                    "examples": ["train", "validation", "test"]
                },
                "auth_token": {
                    "type": "string",
                    "description": "Hugging Face auth token for private/gated datasets",
                    "optional": True
                }
            },
            "required": ["dataset", "config", "split"],
        }
    ),
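
The request that the core implementation sends can be sketched without any network access by building the URL and headers by hand. This is a minimal sketch, not the server's code: the base URL is assumed to be the public Hugging Face dataset viewer API, the Bearer header is an assumption about how auth_token is passed, and build_statistics_request is a hypothetical helper name. Note also that "optional" is not a standard JSON Schema keyword; in the schema above, auth_token is optional simply because it is not listed under "required".

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: public base URL of the Hugging Face dataset viewer API.
BASE_URL = "https://datasets-server.huggingface.co"


def build_statistics_request(dataset: str, config: str, split: str,
                             auth_token: Optional[str] = None):
    """Return the (url, headers) pair for a GET /statistics request."""
    params = {"dataset": dataset, "config": config, "split": split}
    # urlencode percent-encodes the slash in owner/dataset.
    url = f"{BASE_URL}/statistics?{urlencode(params)}"
    # Assumption: the token, when given, is sent as a Bearer token.
    headers = {"Authorization": f"Bearer {auth_token}"} if auth_token else {}
    return url, headers


url, headers = build_statistics_request("ylecun/mnist", "mnist", "train")
print(url)
print(headers)
```

An HTTP client configured with the same base URL, as self.client appears to be, would produce an equivalent request from the params dictionary alone.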

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/privetin/dataset-viewer'
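
The same request can be made from Python using only the standard library. This is a sketch under the assumption that the endpoint returns a JSON body; the function names build_request and fetch_server_info are hypothetical, and the shape of the response is not described here, so the result is returned as decoded without further interpretation.

```python
import json
import urllib.request

SERVER_URL = "https://glama.ai/api/mcp/v1/servers/privetin/dataset-viewer"


def build_request(url: str = SERVER_URL) -> urllib.request.Request:
    """Build a GET request for the URL used in the curl command."""
    return urllib.request.Request(url, method="GET")


def fetch_server_info(url: str = SERVER_URL):
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(url), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_server_info() performs the network request; build_request() on its own only constructs it.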

If you have feedback or need assistance with the MCP directory API, please join our Discord server.