get_statistics
Generate descriptive statistics for each column in a Hugging Face dataset split to analyze data distribution and characteristics.
Instructions
Get descriptive statistics for each column in a dataset split
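The sketch below shows the JSON-RPC `tools/call` request that an MCP client might send to invoke this tool. The names `GetStatisticsArgs`, `ToolsCallRequest`, and `buildGetStatisticsCall` are hypothetical. The `plain_text` config for `stanfordnlp/imdb` is an assumed example value, not something this page specifies.

```typescript
// Hypothetical type mirroring the tool's three required inputs.
interface GetStatisticsArgs {
  dataset: string;
  config: string;
  split: string;
}

// A JSON-RPC 2.0 request using MCP's "tools/call" method.
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: "get_statistics"; arguments: GetStatisticsArgs };
}

// Hypothetical helper: builds the request a client would send to invoke get_statistics.
function buildGetStatisticsCall(id: number, args: GetStatisticsArgs): ToolsCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "get_statistics", arguments: args },
  };
}

// Assumed example values; "plain_text" is used here as the IMDB config name.
const request = buildGetStatisticsCall(1, {
  dataset: "stanfordnlp/imdb",
  config: "plain_text",
  split: "train",
});
console.log(JSON.stringify(request));
```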
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| dataset | Yes | Dataset ID (e.g., 'stanfordnlp/imdb') | |
| config | Yes | Configuration name | |
| split | Yes | Split name (train, test, validation) | |
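The three inputs likely become query parameters on a `/statistics` request. The sketch below assumes that `fetchDatasetViewer` targets the public Hugging Face Dataset Viewer API at `https://datasets-server.huggingface.co`, but the helper's internals are not shown on this page. `buildStatisticsUrl` is a hypothetical name used only for illustration.

```typescript
// Assumption: the Dataset Viewer base URL used by fetchDatasetViewer (not shown in the source).
const DATASET_VIEWER_BASE = "https://datasets-server.huggingface.co";

// Hypothetical helper: builds the /statistics URL from the three required inputs.
function buildStatisticsUrl(dataset: string, config: string, split: string): string {
  const url = new URL("/statistics", DATASET_VIEWER_BASE);
  // searchParams percent-encodes values, so the "/" in "stanfordnlp/imdb" becomes "%2F".
  url.searchParams.set("dataset", dataset);
  url.searchParams.set("config", config);
  url.searchParams.set("split", split);
  return url.toString();
}

console.log(buildStatisticsUrl("stanfordnlp/imdb", "plain_text", "train"));
```

All three parameters are required, which matches the table above. Omitting `config` or `split` therefore leaves the request underspecified for a single split.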
Implementation Reference
- src/tools/get-statistics.ts:24-39 (handler): The handler function that executes the `get_statistics` tool logic by calling `fetchDatasetViewer`.

  ```typescript
  async ({ dataset, config, split }) => {
    // Query the Dataset Viewer's /statistics endpoint for the requested split.
    const data = await fetchDatasetViewer<StatisticsResponse>("/statistics", {
      dataset,
      config,
      split,
    });
    // Return the raw response as a single pretty-printed JSON text item.
    return {
      content: [
        {
          type: "text" as const,
          text: JSON.stringify(data, null, 2),
        },
      ],
    };
  }
  ```

- src/tools/get-statistics.ts:19-23 (schema): Input schema definition for the `get_statistics` tool using Zod.
  ```typescript
  {
    dataset: z.string().describe("Dataset ID (e.g., 'stanfordnlp/imdb')"),
    config: z.string().describe("Configuration name"),
    split: z.string().describe("Split name (train, test, validation)"),
  },
  ```

- src/tools/get-statistics.ts:15-40 (registration): Registration function that registers the `get_statistics` tool with the `McpServer`.
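  ```typescript
  import { z } from "zod";
  import type { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
  // fetchDatasetViewer and StatisticsResponse are defined elsewhere in this project (import path not shown).

  export function registerGetStatistics(server: McpServer) {
    server.tool(
      "get_statistics",
      "Get descriptive statistics for each column in a dataset split",
      // Zod shape: all three inputs are required strings.
      {
        dataset: z.string().describe("Dataset ID (e.g., 'stanfordnlp/imdb')"),
        config: z.string().describe("Configuration name"),
        split: z.string().describe("Split name (train, test, validation)"),
      },
      // Fetch the split's statistics and return them as pretty-printed JSON text.
      async ({ dataset, config, split }) => {
        const data = await fetchDatasetViewer<StatisticsResponse>("/statistics", {
          dataset,
          config,
          split,
        });
        return {
          content: [
            {
              type: "text" as const,
              text: JSON.stringify(data, null, 2),
            },
          ],
        };
      }
    );
  }
  ```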
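The handler returns the statistics as one text item that holds pretty-printed JSON. A client must therefore parse that text before it can use the data. The sketch below assumes that the payload follows the Dataset Viewer's `/statistics` response shape: a `statistics` array whose entries carry `column_name`, `column_type`, and `column_statistics`. Check this assumption against the API documentation. `summarizeColumns` and the example values are hypothetical and serve only as illustration.

```typescript
// Assumed subset of the Dataset Viewer /statistics response (not defined in this source).
interface ColumnStatistics {
  column_name: string;
  column_type: string;
  column_statistics: Record<string, unknown>;
}
interface StatisticsPayload {
  num_examples: number;
  statistics: ColumnStatistics[];
  partial?: boolean;
}

// Matches the result shape the handler builds: a list of text content items.
interface ToolResult {
  content: { type: "text"; text: string }[];
}

// Hypothetical client-side helper: parses the JSON text and lists each column with its type.
function summarizeColumns(result: ToolResult): string[] {
  const first = result.content[0];
  if (!first || first.type !== "text") {
    throw new Error("expected a text content item");
  }
  const payload = JSON.parse(first.text) as StatisticsPayload;
  return payload.statistics.map((c) => `${c.column_name}: ${c.column_type}`);
}

// Made-up illustrative payload, wrapped the same way the handler wraps real data.
const example: ToolResult = {
  content: [
    {
      type: "text",
      text: JSON.stringify(
        {
          num_examples: 3,
          statistics: [
            { column_name: "label", column_type: "class_label", column_statistics: {} },
            { column_name: "text", column_type: "string_text", column_statistics: {} },
          ],
          partial: false,
        },
        null,
        2,
      ),
    },
  ],
};

console.log(summarizeColumns(example));
```

Because the tool passes the response through `JSON.stringify` unchanged, parsing the text gives the client exactly the object that `fetchDatasetViewer` returned.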