HF Dataset MCP

by cfahlgren1

get_dataset_info

Retrieve dataset schema, metadata, and row counts to understand structure and content before analysis or processing.

Instructions

Get the schema, metadata, and row counts for a dataset configuration
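As a rough illustration of what a caller can do with the result, the sketch below parses the JSON text the tool returns and extracts column names and per-split row counts. It assumes the response follows the InfoResponse shape listed under Implementation Reference. The summarizeInfo helper and the sample payload, including its numbers, are hypothetical and not part of the server.

```typescript
// Subset of the InfoResponse shape from the Implementation Reference.
interface SplitInfo {
  num_examples: number;
  num_bytes: number;
}

interface InfoLike {
  dataset_info: {
    features?: Record<string, unknown>;
    splits?: Record<string, SplitInfo>;
  };
  partial: boolean;
}

interface InfoSummary {
  columns: string[];
  rowsBySplit: Record<string, number>;
  totalRows: number;
  partial: boolean;
}

// Hypothetical helper: condense the tool's JSON text into a small summary.
function summarizeInfo(jsonText: string): InfoSummary {
  const info = JSON.parse(jsonText) as InfoLike;
  const rowsBySplit: Record<string, number> = {};
  let totalRows = 0;
  // Both features and splits are optional, so fall back to empty objects.
  for (const [name, split] of Object.entries(info.dataset_info.splits ?? {})) {
    rowsBySplit[name] = split.num_examples;
    totalRows += split.num_examples;
  }
  return {
    columns: Object.keys(info.dataset_info.features ?? {}),
    rowsBySplit,
    totalRows,
    partial: info.partial,
  };
}

// Made-up payload for illustration only; real values come from the tool.
const sampleText = JSON.stringify({
  dataset_info: {
    features: { text: { dtype: "string" }, label: { dtype: "int64" } },
    splits: {
      train: { num_examples: 100, num_bytes: 2048 },
      test: { num_examples: 50, num_bytes: 1024 },
    },
  },
  partial: false,
});

const summary = summarizeInfo(sampleText);
console.log(summary.columns, summary.rowsBySplit, summary.totalRows);
```

The partial flag is passed through unchanged so a caller can tell when the reported counts may not cover the whole dataset.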

Input Schema

Name    | Required | Description                           | Default
dataset | Yes      | Dataset ID (e.g., 'stanfordnlp/imdb') | (none)
config  | Yes      | Configuration name (from list_splits) | (none)
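To show how these two arguments travel over the wire, here is a minimal sketch of an MCP "tools/call" JSON-RPC request for this tool. The buildGetDatasetInfoCall helper is hypothetical, and "plain_text" is an assumed configuration name; in practice the value should come from list_splits.

```typescript
// Arguments accepted by get_dataset_info, per the input schema above.
interface GetDatasetInfoArgs {
  dataset: string;
  config: string;
}

// Shape of an MCP "tools/call" JSON-RPC request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: GetDatasetInfoArgs };
}

// Hypothetical helper: reject empty arguments, then build the request object.
function buildGetDatasetInfoCall(
  id: number,
  args: GetDatasetInfoArgs
): ToolCallRequest {
  if (args.dataset.trim() === "" || args.config.trim() === "") {
    throw new Error("Both 'dataset' and 'config' are required");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "get_dataset_info", arguments: args },
  };
}

// "plain_text" is an assumption; take the real config name from list_splits.
const request = buildGetDatasetInfoCall(1, {
  dataset: "stanfordnlp/imdb",
  config: "plain_text",
});
console.log(JSON.stringify(request, null, 2));
```

Most MCP client libraries build this envelope for you, so application code usually supplies only the tool name and the arguments object.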

Implementation Reference

  • The handler for the 'get_dataset_info' tool. It queries the dataset viewer's "/info" endpoint and returns the response as pretty-printed JSON text.
    async ({ dataset, config }) => {
      const data = await fetchDatasetViewer<InfoResponse>("/info", {
        dataset,
        config,
      });
    
      return {
        content: [
          {
            type: "text" as const,
            text: JSON.stringify(data, null, 2),
          },
        ],
      };
    }
  • The interface defining the structure of the dataset info response.
    interface InfoResponse {
      dataset_info: {
        description?: string;
        citation?: string;
        homepage?: string;
        license?: string;
        features?: Record<string, unknown>;
        builder_name?: string;
        dataset_name?: string;
        config_name?: string;
        version?: Record<string, unknown>;
        splits?: Record<
          string,
          {
            num_examples: number;
            num_bytes: number;
          }
        >;
        download_size?: number;
        dataset_size?: number;
      };
      partial: boolean;
    }
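  • The function that registers the 'get_dataset_info' tool with the McpServer instance.
    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { z } from "zod";
    // fetchDatasetViewer and InfoResponse are defined elsewhere in the project.
    
    export function registerGetDatasetInfo(server: McpServer) {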
      server.tool(
        "get_dataset_info",
        "Get the schema, metadata, and row counts for a dataset configuration",
        {
          dataset: z.string().describe("Dataset ID (e.g., 'stanfordnlp/imdb')"),
          config: z.string().describe("Configuration name (from list_splits)"),
        },
        async ({ dataset, config }) => {
          const data = await fetchDatasetViewer<InfoResponse>("/info", {
            dataset,
            config,
          });
    
          return {
            content: [
              {
                type: "text" as const,
                text: JSON.stringify(data, null, 2),
              },
            ],
          };
        }
      );
    }
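The excerpts above call fetchDatasetViewer without showing it. The following is a minimal sketch of what such a helper might look like, not the project's actual implementation. It assumes the helper targets the public Hugging Face dataset viewer API at https://datasets-server.huggingface.co and passes its arguments as query parameters; the names VIEWER_BASE_URL, buildViewerUrl, and fetchDatasetViewerSketch are hypothetical.

```typescript
// Assumed base URL of the Hugging Face dataset viewer API.
const VIEWER_BASE_URL = "https://datasets-server.huggingface.co";

// Build the request URL for an endpoint path such as "/info".
function buildViewerUrl(path: string, params: Record<string, string>): string {
  const url = new URL(path, VIEWER_BASE_URL);
  for (const [key, value] of Object.entries(params)) {
    // searchParams percent-encodes values, e.g. the "/" in a dataset ID.
    url.searchParams.set(key, value);
  }
  return url.toString();
}

// Hypothetical stand-in for the project's fetchDatasetViewer helper.
async function fetchDatasetViewerSketch<T>(
  path: string,
  params: Record<string, string>
): Promise<T> {
  const response = await fetch(buildViewerUrl(path, params));
  if (!response.ok) {
    throw new Error(
      `Dataset viewer request failed: ${response.status} ${response.statusText}`
    );
  }
  // The caller chooses T; the body is not validated at runtime here.
  return (await response.json()) as T;
}

// "plain_text" is an assumed config name used only for illustration.
const infoUrl = buildViewerUrl("/info", {
  dataset: "stanfordnlp/imdb",
  config: "plain_text",
});
console.log(infoUrl);
```

Keeping URL construction in its own function makes the request easy to inspect without touching the network. A real helper might also add authentication headers or retries, which this sketch leaves out.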

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/cfahlgren1/hf-dataset-mcp'
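The same request can be sketched in TypeScript with the built-in fetch. The serverUrl and getServerEntry names are hypothetical, the owner/name path layout is inferred from the URL in the curl command, and the response is typed as unknown because its shape is not described here.

```typescript
// Base path taken from the curl example above.
const GLAMA_SERVERS_URL = "https://glama.ai/api/mcp/v1/servers";

// Assumes directory entries are addressed as <owner>/<name>, as in the curl URL.
function serverUrl(owner: string, name: string): string {
  return `${GLAMA_SERVERS_URL}/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Hypothetical helper: fetch a directory entry and return the parsed JSON.
async function getServerEntry(owner: string, name: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, name), { method: "GET" });
  if (!response.ok) {
    throw new Error(`Directory request failed: ${response.status}`);
  }
  return response.json();
}

const entryUrl = serverUrl("cfahlgren1", "hf-dataset-mcp");
console.log(entryUrl);
```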

If you have feedback or need assistance with the MCP directory API, please join our Discord server.