Glama

HF Dataset MCP

by cfahlgren1

get_rows

Fetch specific rows from Hugging Face dataset splits to analyze data samples or extract subsets for processing.

Instructions

Fetch a slice of rows from a dataset split
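A single call returns at most 100 rows, which is the limit in the input schema. To read a longer range, you issue several calls with increasing offsets. The sketch below plans those calls for the half-open range [start, start + count). `planSlices` and `RowSlice` are hypothetical names and are not part of this server.

```typescript
// Hypothetical helper (not part of the server): splits the row range
// [start, start + count) into get_rows-sized slices, because a single
// call returns at most 100 rows.
interface RowSlice {
  offset: number;
  length: number;
}

const MAX_ROWS_PER_CALL = 100;

function planSlices(start: number, count: number, maxPerCall = MAX_ROWS_PER_CALL): RowSlice[] {
  if (!Number.isInteger(start) || start < 0) {
    throw new RangeError("start must be a non-negative integer");
  }
  if (!Number.isInteger(count) || count < 0) {
    throw new RangeError("count must be a non-negative integer");
  }
  if (!Number.isInteger(maxPerCall) || maxPerCall < 1) {
    throw new RangeError("maxPerCall must be a positive integer");
  }
  const end = start + count;
  const slices: RowSlice[] = [];
  for (let offset = start; offset < end; offset += maxPerCall) {
    // The final slice may be shorter than maxPerCall.
    slices.push({ offset, length: Math.min(maxPerCall, end - offset) });
  }
  return slices;
}

console.log(planSlices(50, 250));
// → [ { offset: 50, length: 100 }, { offset: 150, length: 100 }, { offset: 250, length: 50 } ]
```

The caller passes each slice's offset and length to get_rows in turn. A caller may also choose to stop early once a call returns fewer rows than it requested.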

Input Schema

Name      Required  Description                              Default
dataset   Yes       Dataset ID (e.g., 'stanfordnlp/imdb')
config    Yes       Configuration name (from list_splits)
split     Yes       Split name (train, test, validation)
offset    No        Row index to start from                  0
length    No        Number of rows to fetch (max: 100)       100
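The defaults and bounds in the table can be written as a small pure function. This helper is hypothetical and does not come from the server. It mirrors the zod constraints listed under Implementation Reference: offset must be an integer of at least 0, and length must be an integer from 1 to 100. It also fills in the same defaults the handler applies.

```typescript
// Hypothetical helper (not part of the server): applies the documented
// defaults and enforces the same bounds as the get_rows zod schema.
interface GetRowsArgs {
  dataset: string;
  config: string;
  split: string;
  offset?: number;
  length?: number;
}

interface ResolvedGetRowsArgs {
  dataset: string;
  config: string;
  split: string;
  offset: number;
  length: number;
}

const DEFAULT_OFFSET = 0;
const DEFAULT_LENGTH = 100;
const MAX_LENGTH = 100;

function resolveGetRowsArgs(args: GetRowsArgs): ResolvedGetRowsArgs {
  const offset = args.offset ?? DEFAULT_OFFSET;
  const length = args.length ?? DEFAULT_LENGTH;
  // Mirrors z.number().int().min(0)
  if (!Number.isInteger(offset) || offset < 0) {
    throw new RangeError(`offset must be a non-negative integer, got ${offset}`);
  }
  // Mirrors z.number().int().min(1).max(100)
  if (!Number.isInteger(length) || length < 1 || length > MAX_LENGTH) {
    throw new RangeError(`length must be an integer from 1 to ${MAX_LENGTH}, got ${length}`);
  }
  return { dataset: args.dataset, config: args.config, split: args.split, offset, length };
}
```

For example, `resolveGetRowsArgs({ dataset: "stanfordnlp/imdb", config: "plain_text", split: "train" })` yields offset 0 and length 100. Passing `length: 101` throws a RangeError. The config name `plain_text` is used only for illustration.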

Implementation Reference

  • Registration function for the get_rows tool. It declares the input validation schema and the handler that executes the request to fetch rows.
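    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { z } from "zod";
    // fetchDatasetViewer and RowsResponse are defined elsewhere in this server;
    // their module path is not shown on this page.

    // Registers the get_rows tool on an MCP server instance.
    export function registerGetRows(server: McpServer) {
      server.tool(
        "get_rows",
        "Fetch a slice of rows from a dataset split",
        // Input schema: arguments are validated against these zod types
        // before the handler runs.
        {
          dataset: z.string().describe("Dataset ID (e.g., 'stanfordnlp/imdb')"),
          config: z.string().describe("Configuration name (from list_splits)"),
          split: z.string().describe("Split name (train, test, validation)"),
          offset: z
            .number()
            .int()
            .min(0)
            .optional()
            .describe("Row index to start from (default: 0)"),
          length: z
            .number()
            .int()
            .min(1)
            .max(100)
            .optional()
            .describe("Number of rows to fetch (default: 100, max: 100)"),
        },
        // Handler: requests the dataset viewer's /rows endpoint, filling in
        // the documented defaults for omitted offset and length.
        async ({ dataset, config, split, offset, length }) => {
          const data = await fetchDatasetViewer<RowsResponse>("/rows", {
            dataset,
            config,
            split,
            offset: offset ?? 0,
            length: length ?? 100,
          });

          // Return the raw response as pretty-printed JSON in one text item.
          return {
            content: [
              {
                type: "text" as const,
                text: JSON.stringify(data, null, 2),
              },
            ],
          };
        }
      );
    }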
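The handler delegates the HTTP call to `fetchDatasetViewer`, whose implementation is not shown on this page. The sketch below shows what such a helper could look like. It assumes the helper targets the public Hugging Face Dataset Viewer API at `https://datasets-server.huggingface.co` and sends the arguments as query parameters. `buildDatasetViewerUrl` and `fetchDatasetViewerSketch` are hypothetical names.

```typescript
// Assumption: the public Hugging Face Dataset Viewer API base URL.
// This page does not show how fetchDatasetViewer actually builds requests.
const DATASET_VIEWER_BASE = "https://datasets-server.huggingface.co";

type QueryValue = string | number;

// Hypothetical: builds e.g. /rows?dataset=...&config=...&split=...&offset=...&length=...
function buildDatasetViewerUrl(path: string, params: Record<string, QueryValue>): string {
  const url = new URL(path, DATASET_VIEWER_BASE);
  for (const [key, value] of Object.entries(params)) {
    // URLSearchParams percent-encodes values (e.g. "/" becomes "%2F").
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Hypothetical fetch wrapper with the same call shape as fetchDatasetViewer<T>(path, params).
async function fetchDatasetViewerSketch<T>(
  path: string,
  params: Record<string, QueryValue>,
): Promise<T> {
  const res = await fetch(buildDatasetViewerUrl(path, params));
  if (!res.ok) {
    throw new Error(`Dataset Viewer request failed: ${res.status} ${res.statusText}`);
  }
  return (await res.json()) as T;
}
```

For example, `buildDatasetViewerUrl("/rows", { dataset: "stanfordnlp/imdb", config: "plain_text", split: "train", offset: 0, length: 100 })` produces a `/rows` URL in which the dataset ID is percent-encoded as `stanfordnlp%2Fimdb`. The config name is illustrative.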


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/cfahlgren1/hf-dataset-mcp'
