get_dataset_size
Retrieve row counts and byte sizes for all configurations and splits of a Hugging Face dataset to analyze its structure and storage requirements.
Instructions
Get row counts and byte sizes for all configs and splits
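The handler's `fetchDatasetViewer("/size", ...)` call suggests that the tool wraps the `/size` endpoint of the Hugging Face Dataset Viewer API. The sketch below queries that endpoint directly. It assumes the public base URL `https://datasets-server.huggingface.co`. `buildSizeUrl` and `fetchDatasetSize` are hypothetical names, not the repository's own helper, whose implementation isn't shown.

```typescript
// Hypothetical standalone equivalent of the request the handler makes.
// Assumption: the Dataset Viewer API is served from this base URL and
// exposes GET /size?dataset=<id>.
const DATASET_VIEWER_BASE = "https://datasets-server.huggingface.co";

// Build the /size URL; URLSearchParams percent-encodes the "/" in the ID.
function buildSizeUrl(dataset: string): string {
  const url = new URL("/size", DATASET_VIEWER_BASE);
  url.searchParams.set("dataset", dataset);
  return url.toString();
}

// Fetch the size report (needs a runtime with global fetch, e.g. Node 18+).
async function fetchDatasetSize(dataset: string): Promise<unknown> {
  const res = await fetch(buildSizeUrl(dataset));
  if (!res.ok) {
    throw new Error(`Dataset Viewer returned HTTP ${res.status} for ${dataset}`);
  }
  return res.json();
}
```

For example, `await fetchDatasetSize("stanfordnlp/imdb")` would return the parsed JSON body. That body should have the same shape as the `SizeResponse` interface listed under Implementation Reference.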
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| dataset | Yes | Dataset ID (e.g., 'stanfordnlp/imdb') | |
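An MCP client invokes this tool with a JSON-RPC 2.0 `tools/call` request. Its `params.arguments` object carries the single required `dataset` field. The following is a minimal sketch of that message. `makeGetDatasetSizeCall` is a hypothetical helper, and the request `id` is arbitrary.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: build the call for get_dataset_size.
function makeGetDatasetSizeCall(id: number, dataset: string): ToolsCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "get_dataset_size", arguments: { dataset } },
  };
}

console.log(JSON.stringify(makeGetDatasetSizeCall(1, "stanfordnlp/imdb")));
// prints {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"get_dataset_size","arguments":{"dataset":"stanfordnlp/imdb"}}}
```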
Implementation Reference
- src/tools/get-dataset-size.ts:43-56 (handler): The handler function that executes the `get_dataset_size` tool logic.

```typescript
async ({ dataset }) => {
  // Query the Dataset Viewer "/size" endpoint for the requested dataset.
  const data = await fetchDatasetViewer<SizeResponse>("/size", {
    dataset,
  });
  // Return the raw response to the MCP client as pretty-printed JSON text.
  return {
    content: [
      {
        type: "text" as const,
        text: JSON.stringify(data, null, 2),
      },
    ],
  };
}
```

- src/tools/get-dataset-size.ts:5-34 (schema): Interface defining the structure of the dataset size response.

```typescript
interface SizeResponse {
  size: {
    // Totals for the whole dataset.
    dataset: {
      num_bytes_original_files?: number;
      num_bytes_parquet_files?: number;
      num_bytes_memory?: number;
      num_rows?: number;
    };
    // One entry per configuration.
    configs: Array<{
      dataset: string;
      config: string;
      num_bytes_original_files?: number;
      num_bytes_parquet_files?: number;
      num_bytes_memory?: number;
      num_rows?: number;
    }>;
    // One entry per (config, split) pair.
    splits: Array<{
      dataset: string;
      config: string;
      split: string;
      num_bytes_original_files?: number;
      num_bytes_parquet_files?: number;
      num_bytes_memory?: number;
      num_rows?: number;
    }>;
  };
  pending: unknown[];
  failed: unknown[];
  partial: boolean;
}
```

- src/tools/get-dataset-size.ts:36-58 (registration): Registration function for the `get_dataset_size` tool.

```typescript
export function registerGetDatasetSize(server: McpServer) {
  // Register the tool with its name, description, zod input schema, and handler.
  server.tool(
    "get_dataset_size",
    "Get row counts and byte sizes for all configs and splits",
    {
      dataset: z.string().describe("Dataset ID (e.g., 'stanfordnlp/imdb')"),
    },
    async ({ dataset }) => {
      const data = await fetchDatasetViewer<SizeResponse>("/size", {
        dataset,
      });
      return {
        content: [
          {
            type: "text" as const,
            text: JSON.stringify(data, null, 2),
          },
        ],
      };
    }
  );
}
```
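The handler returns the API response verbatim as JSON text, so a client must parse it before using the numbers. The following sketch turns that text into one row per split. `summarizeSplits`, `SplitSummary`, and `SizeResponseSubset` are hypothetical names. The types declare only the subset of `SizeResponse` fields the sketch reads.

```typescript
// Subset of the SizeResponse interface that this sketch reads.
interface SplitSize {
  dataset: string;
  config: string;
  split: string;
  num_bytes_parquet_files?: number;
  num_rows?: number;
}

interface SizeResponseSubset {
  size: { splits: SplitSize[] };
  partial: boolean;
}

// Hypothetical per-split summary row.
interface SplitSummary {
  id: string; // "config/split"
  rows: number | undefined;
  parquetMiB: number | undefined;
}

// Parse the tool's JSON text and summarize each (config, split) entry.
function summarizeSplits(text: string): { partial: boolean; splits: SplitSummary[] } {
  const data = JSON.parse(text) as SizeResponseSubset;
  const splits = data.size.splits.map((s): SplitSummary => ({
    id: `${s.config}/${s.split}`,
    rows: s.num_rows,
    // Optional fields stay undefined instead of being coerced to 0.
    parquetMiB:
      s.num_bytes_parquet_files === undefined
        ? undefined
        : s.num_bytes_parquet_files / (1024 * 1024),
  }));
  return { partial: data.partial, splits };
}
```

Missing sizes are kept as `undefined` because every size field in `SizeResponse` is optional. Reporting 0 would misstate a split as empty. The `partial` flag is passed through unchanged so that the caller can decide how to present results the API marks as partial.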