get_dataset_info
Retrieve dataset schema, metadata, and row counts to understand structure and content before analysis or processing.
Instructions
Get the schema, metadata, and row counts for a dataset configuration
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| dataset | Yes | Dataset ID (e.g., 'stanfordnlp/imdb') | |
| config | Yes | Configuration name (from list_splits) | |
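For illustration, an MCP client invokes this tool with a `tools/call` request whose `arguments` object carries the two required inputs. The sketch below builds such a request in TypeScript. `makeGetDatasetInfoCall` is a hypothetical helper, and the config name `plain_text` is assumed for `stanfordnlp/imdb`. In practice, take the config name from `list_splits`.

```typescript
// Shape of an MCP JSON-RPC "tools/call" request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: { dataset: string; config: string };
  };
}

// Hypothetical helper: build a get_dataset_info call; both arguments are required.
function makeGetDatasetInfoCall(
  id: number,
  dataset: string,
  config: string
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "get_dataset_info", arguments: { dataset, config } },
  };
}

// "plain_text" is an assumed config name for stanfordnlp/imdb.
const request = makeGetDatasetInfoCall(1, "stanfordnlp/imdb", "plain_text");
console.log(JSON.stringify(request));
```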
Implementation Reference
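- `src/tools/get-dataset-info.ts:37-51` (handler): the handler function for the `get_dataset_info` tool. It requests the `/info` endpoint through `fetchDatasetViewer`, passing `dataset` and `config`, and returns the response as pretty-printed JSON text.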
```typescript
async ({ dataset, config }) => {
  const data = await fetchDatasetViewer<InfoResponse>("/info", {
    dataset,
    config,
  });
  return {
    content: [
      {
        type: "text" as const,
        text: JSON.stringify(data, null, 2),
      },
    ],
  };
}
```

- `src/tools/get-dataset-info.ts:5-27` (schema): the interface defining the structure of the dataset info response.
```typescript
interface InfoResponse {
  dataset_info: {
    description?: string;
    citation?: string;
    homepage?: string;
    license?: string;
    features?: Record<string, unknown>;
    builder_name?: string;
    dataset_name?: string;
    config_name?: string;
    version?: Record<string, unknown>;
    splits?: Record<
      string,
      {
        num_examples: number;
        num_bytes: number;
      }
    >;
    download_size?: number;
    dataset_size?: number;
  };
  partial: boolean;
}
```

- `src/tools/get-dataset-info.ts:29-53` (registration): the function that registers the `get_dataset_info` tool with the `McpServer` instance.
```typescript
export function registerGetDatasetInfo(server: McpServer) {
  server.tool(
    "get_dataset_info",
    "Get the schema, metadata, and row counts for a dataset configuration",
    {
      dataset: z.string().describe("Dataset ID (e.g., 'stanfordnlp/imdb')"),
      config: z.string().describe("Configuration name (from list_splits)"),
    },
    async ({ dataset, config }) => {
      const data = await fetchDatasetViewer<InfoResponse>("/info", {
        dataset,
        config,
      });
      return {
        content: [
          {
            type: "text" as const,
            text: JSON.stringify(data, null, 2),
          },
        ],
      };
    }
  );
}
```
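The handler delegates the HTTP call to `fetchDatasetViewer`, whose implementation is not shown here. The sketch below makes three assumptions. First, it assumes the helper issues a GET request to the public Hugging Face Dataset Viewer API at `https://datasets-server.huggingface.co`. Second, it assumes `dataset` and `config` are sent as query parameters. Third, it shows how a caller might reduce the returned `splits` map to row counts. `buildInfoUrl`, `summarizeRows`, and `DATASET_VIEWER_BASE` are hypothetical names, not part of this project.

```typescript
// Minimal subset of InfoResponse, repeated so this sketch is self-contained.
interface SplitInfo {
  num_examples: number;
  num_bytes: number;
}

interface InfoResponseLike {
  dataset_info: { splits?: Record<string, SplitInfo> };
  partial: boolean;
}

// Assumption: fetchDatasetViewer targets the public Hugging Face Dataset Viewer API.
const DATASET_VIEWER_BASE = "https://datasets-server.huggingface.co";

// Hypothetical helper: the /info URL for one dataset configuration.
function buildInfoUrl(dataset: string, config: string): string {
  // encodeURIComponent escapes the "/" in IDs such as "stanfordnlp/imdb".
  return (
    `${DATASET_VIEWER_BASE}/info` +
    `?dataset=${encodeURIComponent(dataset)}` +
    `&config=${encodeURIComponent(config)}`
  );
}

// Hypothetical helper: per-split and total row counts from an /info response.
function summarizeRows(info: InfoResponseLike): {
  perSplit: Record<string, number>;
  total: number;
  partial: boolean;
} {
  const perSplit: Record<string, number> = {};
  let total = 0;
  // splits is optional in InfoResponse, so fall back to an empty map.
  for (const [name, split] of Object.entries(info.dataset_info.splits ?? {})) {
    perSplit[name] = split.num_examples;
    total += split.num_examples;
  }
  // partial is passed through unchanged so callers can inspect it.
  return { perSplit, total, partial: info.partial };
}
```

The tool returns `JSON.stringify(data, null, 2)` as text. A client would therefore need to `JSON.parse` the `text` field before passing the result to a function like `summarizeRows`.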