HF Dataset MCP

by cfahlgren1

get_statistics

Generate descriptive statistics for each column in a Hugging Face dataset split to analyze data distribution and characteristics.

Instructions

Get descriptive statistics for each column in a dataset split

Input Schema

Name    | Required | Description                           | Default
dataset | Yes      | Dataset ID (e.g., 'stanfordnlp/imdb') | (none)
config  | Yes      | Configuration name                    | (none)
split   | Yes      | Split name (train, test, validation)  | (none)
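
For illustration, an MCP client invokes this tool with a JSON-RPC `tools/call` request that carries the three arguments listed above. The sketch below builds such a request. The helper name `buildGetStatisticsCall`, the request `id`, and the `plain_text` config value are assumptions made for the example and do not come from this server's source.

```typescript
// Arguments accepted by get_statistics, mirroring the input schema above.
interface GetStatisticsArgs {
  dataset: string;
  config: string;
  split: string;
}

// Hypothetical helper: wraps the arguments in a JSON-RPC 2.0 "tools/call" request.
function buildGetStatisticsCall(id: number, args: GetStatisticsArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "get_statistics", arguments: args },
  };
}

// Example values only; "plain_text" is an assumed config name.
const exampleRequest = buildGetStatisticsCall(1, {
  dataset: "stanfordnlp/imdb",
  config: "plain_text",
  split: "train",
});
console.log(JSON.stringify(exampleRequest, null, 2));
```

All three arguments are required strings, so a client should supply each one explicitly rather than rely on a default.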

Implementation Reference

  • The handler function that executes the 'get_statistics' tool logic by calling fetchDatasetViewer.
    async ({ dataset, config, split }) => {
      const data = await fetchDatasetViewer<StatisticsResponse>("/statistics", {
        dataset,
        config,
        split,
      });
    
      return {
        content: [
          {
            type: "text" as const,
            text: JSON.stringify(data, null, 2),
          },
        ],
      };
    }
  • Input schema definition for the 'get_statistics' tool using Zod.
    {
      dataset: z.string().describe("Dataset ID (e.g., 'stanfordnlp/imdb')"),
      config: z.string().describe("Configuration name"),
      split: z.string().describe("Split name (train, test, validation)"),
    },
  • Registration function that registers the 'get_statistics' tool with the McpServer.
    export function registerGetStatistics(server: McpServer) {
      server.tool(
        "get_statistics",
        "Get descriptive statistics for each column in a dataset split",
        {
          dataset: z.string().describe("Dataset ID (e.g., 'stanfordnlp/imdb')"),
          config: z.string().describe("Configuration name"),
          split: z.string().describe("Split name (train, test, validation)"),
        },
        async ({ dataset, config, split }) => {
          const data = await fetchDatasetViewer<StatisticsResponse>("/statistics", {
            dataset,
            config,
            split,
          });
    
          return {
            content: [
              {
                type: "text" as const,
                text: JSON.stringify(data, null, 2),
              },
            ],
          };
        }
      );
    }
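
The handler above delegates to `fetchDatasetViewer`, whose implementation is not shown on this page. As a rough sketch of what such a helper could look like, the code below assumes it targets the public Hugging Face dataset viewer API at `https://datasets-server.huggingface.co` and passes the arguments as query parameters. The names `DATASET_VIEWER_BASE`, `buildViewerUrl`, and `fetchDatasetViewerSketch` are hypothetical, and the real helper may differ (for example, in authentication or error handling).

```typescript
// Assumption: base URL of the Hugging Face dataset viewer API.
const DATASET_VIEWER_BASE = "https://datasets-server.huggingface.co";

// Builds the request URL from an endpoint path and its query parameters.
function buildViewerUrl(path: string, params: Record<string, string>): string {
  const url = new URL(path, DATASET_VIEWER_BASE);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

// Hypothetical stand-in for fetchDatasetViewer: GETs the endpoint and parses JSON.
async function fetchDatasetViewerSketch<T>(
  path: string,
  params: Record<string, string>
): Promise<T> {
  const response = await fetch(buildViewerUrl(path, params));
  if (!response.ok) {
    throw new Error(
      `Dataset viewer request failed: ${response.status} ${response.statusText}`
    );
  }
  return (await response.json()) as T;
}

// Shows the URL that a get_statistics call would request under these assumptions.
console.log(
  buildViewerUrl("/statistics", {
    dataset: "stanfordnlp/imdb",
    config: "plain_text", // assumed config name
    split: "train",
  })
);
```

Keeping URL construction in its own function makes the query encoding easy to check without a network call.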

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/cfahlgren1/hf-dataset-mcp'
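
The same request can be made from TypeScript. This is a minimal sketch that uses the global `fetch` available in recent Node.js versions and browsers. The helper names `glamaServerUrl` and `getGlamaServerInfo` are hypothetical, and the response is left untyped because its shape is not described here.

```typescript
// Base path taken from the curl example above.
const GLAMA_MCP_API = "https://glama.ai/api/mcp/v1/servers";

// Hypothetical helper: builds the directory URL for a given owner and server name.
function glamaServerUrl(owner: string, name: string): string {
  return `${GLAMA_MCP_API}/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Hypothetical helper: fetches the server entry and returns the parsed JSON as-is.
async function getGlamaServerInfo(owner: string, name: string): Promise<unknown> {
  const response = await fetch(glamaServerUrl(owner, name));
  if (!response.ok) {
    throw new Error(`MCP directory request failed: ${response.status}`);
  }
  return response.json();
}

console.log(glamaServerUrl("cfahlgren1", "hf-dataset-mcp"));
```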

If you have feedback or need assistance with the MCP directory API, please join our Discord server.