HF Dataset MCP

by cfahlgren1

get_rows

Fetch specific rows from Hugging Face dataset splits to analyze data samples or extract subsets for processing.

Instructions

Fetch a slice of rows from a dataset split

Input Schema

Name      Required  Description                              Default
dataset   Yes       Dataset ID (e.g., 'stanfordnlp/imdb')    -
config    Yes       Configuration name (from list_splits)    -
split     Yes       Split name (train, test, validation)     -
offset    No        Row index to start from                  0
length    No        Number of rows to fetch (max: 100)       100
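For illustration, a complete argument object for a get_rows call might look like the following (the config value 'plain_text' is an assumption for this dataset, not taken from this page):

```json
{
  "dataset": "stanfordnlp/imdb",
  "config": "plain_text",
  "split": "train",
  "offset": 0,
  "length": 50
}
```

Omitting offset and length falls back to the defaults of 0 and 100.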

Implementation Reference

  • The handler function for the get_rows tool, which executes the request to fetch rows.
    async ({ dataset, config, split, offset, length }) => {
      const data = await fetchDatasetViewer<RowsResponse>("/rows", {
        dataset,
        config,
        split,
        offset: offset ?? 0,
        length: length ?? 100,
      });
    
      return {
        content: [
          {
            type: "text" as const,
            text: JSON.stringify(data, null, 2),
          },
        ],
      };
    }
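The fetchDatasetViewer helper the handler calls is not shown in the excerpt. A minimal sketch of what it might look like, assuming the server wraps the public Hugging Face dataset-viewer API at datasets-server.huggingface.co (an assumption; the real base URL and error handling may differ):

```typescript
// Hypothetical sketch of fetchDatasetViewer; the real helper is not shown
// in the excerpt above. BASE_URL is an assumption based on the public
// Hugging Face dataset-viewer API.
const BASE_URL = "https://datasets-server.huggingface.co";

// Build the query URL for an endpoint such as "/rows".
function buildViewerUrl(
  endpoint: string,
  params: Record<string, string | number>
): string {
  const query = new URLSearchParams(
    Object.entries(params).map(([k, v]) => [k, String(v)])
  );
  return `${BASE_URL}${endpoint}?${query.toString()}`;
}

// Fetch the endpoint and decode the JSON body as the caller's expected shape.
async function fetchDatasetViewer<T>(
  endpoint: string,
  params: Record<string, string | number>
): Promise<T> {
  const res = await fetch(buildViewerUrl(endpoint, params));
  if (!res.ok) {
    throw new Error(`Dataset viewer request failed: ${res.status}`);
  }
  return (await res.json()) as T;
}
```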
  • Input validation schema for the get_rows tool.
    {
      dataset: z.string().describe("Dataset ID (e.g., 'stanfordnlp/imdb')"),
      config: z.string().describe("Configuration name (from list_splits)"),
      split: z.string().describe("Split name (train, test, validation)"),
      offset: z
        .number()
        .int()
        .min(0)
        .optional()
        .describe("Row index to start from (default: 0)"),
      length: z
        .number()
        .int()
        .min(1)
        .max(100)
        .optional()
        .describe("Number of rows to fetch (default: 100, max: 100)"),
    },
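Taken on its own, the schema encodes the defaults and bounds that the handler later applies with `??`. The same normalization can be sketched as a plain function (a hypothetical helper for illustration, not part of the server):

```typescript
// Shape of the validated tool arguments.
interface RowArgs {
  dataset: string;
  config: string;
  split: string;
  offset?: number;
  length?: number;
}

// Apply the documented defaults (offset: 0, length: 100) and enforce the
// documented bounds (offset >= 0, 1 <= length <= 100).
function normalizeRowArgs(args: RowArgs): Required<RowArgs> {
  const offset = args.offset ?? 0;
  const length = args.length ?? 100;
  if (!Number.isInteger(offset) || offset < 0) {
    throw new Error("offset must be a non-negative integer");
  }
  if (!Number.isInteger(length) || length < 1 || length > 100) {
    throw new Error("length must be an integer between 1 and 100");
  }
  return { ...args, offset, length };
}
```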
  • Registration function for the get_rows tool.
    import { z } from "zod";
    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    // fetchDatasetViewer and RowsResponse are defined elsewhere in the server.
    
    export function registerGetRows(server: McpServer) {
      server.tool(
        "get_rows",
        "Fetch a slice of rows from a dataset split",
        {
          dataset: z.string().describe("Dataset ID (e.g., 'stanfordnlp/imdb')"),
          config: z.string().describe("Configuration name (from list_splits)"),
          split: z.string().describe("Split name (train, test, validation)"),
          offset: z
            .number()
            .int()
            .min(0)
            .optional()
            .describe("Row index to start from (default: 0)"),
          length: z
            .number()
            .int()
            .min(1)
            .max(100)
            .optional()
            .describe("Number of rows to fetch (default: 100, max: 100)"),
        },
        async ({ dataset, config, split, offset, length }) => {
          const data = await fetchDatasetViewer<RowsResponse>("/rows", {
            dataset,
            config,
            split,
            offset: offset ?? 0,
            length: length ?? 100,
          });
    
          return {
            content: [
              {
                type: "text" as const,
                text: JSON.stringify(data, null, 2),
              },
            ],
          };
        }
      );
    }
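Because length is capped at 100, reading a larger slice means paging through the split with successive offsets. One way an agent-side caller might compute the page parameters (illustrative only, not part of the server):

```typescript
// Yield { offset, length } pairs that cover `total` rows in pages of
// at most `pageSize` rows (the get_rows tool caps length at 100).
function* pageParams(
  total: number,
  pageSize = 100
): Generator<{ offset: number; length: number }> {
  for (let offset = 0; offset < total; offset += pageSize) {
    yield { offset, length: Math.min(pageSize, total - offset) };
  }
}
```

Each yielded pair can be passed directly as the offset/length arguments of a get_rows call.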
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions 'slice', implying pagination, but fails to disclose the return format/structure, the behavior when offset exceeds the dataset bounds, the performance implications of large offsets, or whether the operation is read-only.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single nine-word sentence is efficiently structured and front-loaded with the verb, with no filler content. However, the extreme brevity leaves a five-parameter tool with pagination logic under-specified.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 100% schema coverage, the input parameters are adequately documented. However, no output schema exists, and the description fails to specify the row format, return structure, or error conditions. The relationship to the prerequisite tool (list_splits) is unexplained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, providing complete documentation for all five parameters. The description adds minimal semantic value beyond the schema, merely echoing 'slice', which aligns with the offset/length parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The verb 'Fetch' and the resource 'rows from a dataset split' are clear. 'Slice' implies contiguous pagination, which subtly distinguishes this tool from its sibling 'filter_rows'. However, the description lacks an explicit contrast with alternatives like 'search_dataset' or 'filter_rows'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus its siblings (filter_rows, search_dataset) or on its prerequisites (e.g., that config/split values likely come from list_splits). No error handling or typical use-case patterns are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
