Dataset Viewer MCP Server

by privetin

get_statistics

Retrieve statistical insights from Hugging Face datasets to analyze data distribution, identify patterns, and assess dataset quality for machine learning projects.

Instructions

Get statistics about a Hugging Face dataset

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| dataset | Yes | Hugging Face dataset identifier in the format owner/dataset | |
| config | Yes | Dataset configuration/subset name. Use get_info to list available configs | |
| split | Yes | Dataset split name. Splits partition the data for training/evaluation | |
| auth_token | No | Hugging Face auth token for private/gated datasets | |
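
The parameters above can be checked before the tool is called. The following is a minimal sketch, not part of the server: check_arguments is a hypothetical helper, the owner/dataset pattern is copied from the tool's input schema, and the example values come from the schema's examples without being verified against the Hub.

```python
import re

# Pattern, property names, and required list copied from the tool's input schema.
DATASET_PATTERN = re.compile(r"^[^/]+/[^/]+$")
PROPERTIES = ("dataset", "config", "split", "auth_token")
REQUIRED = ("dataset", "config", "split")


def check_arguments(arguments: dict) -> list[str]:
    """Hypothetical helper: return a list of problems (empty means valid)."""
    problems = []
    # Every required parameter must be present.
    for name in REQUIRED:
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    # All four schema properties are declared as strings.
    for name in PROPERTIES:
        if name in arguments and not isinstance(arguments[name], str):
            problems.append(f"{name} must be a string")
    # The dataset identifier must have exactly one slash: owner/dataset.
    dataset = arguments.get("dataset")
    if isinstance(dataset, str) and not DATASET_PATTERN.match(dataset):
        problems.append("dataset must look like owner/dataset")
    return problems


example = {"dataset": "ylecun/mnist", "config": "default", "split": "train"}
print(check_arguments(example))
```

An MCP client would normally validate against the JSON Schema itself; this sketch only mirrors the two constraints the schema states (required fields and the dataset pattern).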

Implementation Reference

  • MCP tool handler in @server.call_tool() that extracts arguments (dataset, config, split), instantiates DatasetViewerAPI with auth_token, calls its get_statistics method, formats the result as JSON text content, and returns it.
    elif name == "get_statistics":
        dataset = arguments["dataset"]
        config = arguments["config"]
        split = arguments["split"]
        # auth_token is not assigned in this excerpt; it is presumably resolved earlier in the handler.
        stats = await DatasetViewerAPI(auth_token=auth_token).get_statistics(dataset, config=config, split=split)
        return [
            types.TextContent(
                type="text",
                text=json.dumps(stats, indent=2)
            )
        ]
  • Core implementation in DatasetViewerAPI class that constructs parameters and makes asynchronous HTTP GET request to the Hugging Face dataset viewer /statistics endpoint, returning the JSON response.
    async def get_statistics(self, dataset: str, config: str, split: str) -> dict:
        """Get statistics about a dataset"""
        params = {
            "dataset": dataset,
            "config": config,
            "split": split
        }
        response = await self.client.get("/statistics", params=params)
        response.raise_for_status()
        return response.json()
  • Input schema definition for the get_statistics tool, specifying required parameters (dataset, config, split) with types, descriptions, patterns, examples, and optional auth_token.
    inputSchema={
        "type": "object",
        "properties": {
            "dataset": {
                "type": "string",
                "description": "Hugging Face dataset identifier in the format owner/dataset",
                "pattern": "^[^/]+/[^/]+$",
                "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
            },
            "config": {
                "type": "string",
                "description": "Dataset configuration/subset name. Use get_info to list available configs",
                "examples": ["default", "en", "es"]
            },
            "split": {
                "type": "string",
                "description": "Dataset split name. Splits partition the data for training/evaluation",
                "examples": ["train", "validation", "test"]
            },
            "auth_token": {
                "type": "string",
                "description": "Hugging Face auth token for private/gated datasets",
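                "optional": True  # not a keyword in current JSON Schema drafts; the field is optional because it is absent from "required"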
            }
        },
        "required": ["dataset", "config", "split"],
    }
  • Registration of the get_statistics tool in the @server.list_tools() handler, including name, description, and full input schema.
    types.Tool(
        name="get_statistics",
        description="Get statistics about a Hugging Face dataset",
        inputSchema={
            "type": "object",
            "properties": {
                "dataset": {
                    "type": "string",
                    "description": "Hugging Face dataset identifier in the format owner/dataset",
                    "pattern": "^[^/]+/[^/]+$",
                    "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                },
                "config": {
                    "type": "string",
                    "description": "Dataset configuration/subset name. Use get_info to list available configs",
                    "examples": ["default", "en", "es"]
                },
                "split": {
                    "type": "string",
                    "description": "Dataset split name. Splits partition the data for training/evaluation",
                    "examples": ["train", "validation", "test"]
                },
                "auth_token": {
                    "type": "string",
                    "description": "Hugging Face auth token for private/gated datasets",
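                    "optional": True  # not a keyword in current JSON Schema drafts; the field is optional because it is absent from "required"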
                }
            },
            "required": ["dataset", "config", "split"],
        }
    ),
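
The core implementation above sends a GET request to /statistics with three query parameters. As a rough sketch of what that request looks like on the wire, the snippet below builds the URL and headers without sending anything. BASE_URL and the Bearer header are assumptions (the excerpt shows neither the client's base URL nor how auth_token is applied), and build_statistics_request is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: the public base URL of the Hugging Face dataset viewer API.
BASE_URL = "https://datasets-server.huggingface.co"


def build_statistics_request(dataset, config, split, auth_token=None):
    """Hypothetical helper: return (url, headers) for a /statistics request."""
    # Same three parameters that get_statistics passes to the HTTP client.
    params = {"dataset": dataset, "config": config, "split": split}
    url = f"{BASE_URL}/statistics?{urlencode(params)}"
    # Assumption: the token is sent as a standard Bearer authorization header.
    headers = {"Authorization": f"Bearer {auth_token}"} if auth_token else {}
    return url, headers


url, headers = build_statistics_request("ylecun/mnist", "default", "train")
print(url)
print(headers)
```

Note that urlencode percent-encodes the slash in the dataset identifier, so owner/dataset travels as a single query value.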

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/privetin/dataset-viewer'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.