
HF Dataset MCP

by cfahlgren1

filter_rows

Filter Hugging Face dataset rows using SQL-like WHERE conditions to extract specific data based on criteria like age, location, or other column values.

Instructions

Filter dataset rows using SQL-like WHERE conditions

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| dataset | Yes | Dataset ID (e.g., 'stanfordnlp/imdb') | |
| config | Yes | Configuration name | |
| split | Yes | Split name (train, test, validation) | |
| where | Yes | Filter condition (e.g., "age">30 AND "city"='Paris'). Column names in double quotes, strings in single quotes. | |
| orderby | No | Sort column and direction (e.g., "score" DESC) | |
| offset | No | Result offset (default: 0) | |
| length | No | Number of results (default: 100, max: 100) | |
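
The quoting rule for `where` (column names in double quotes, strings in single quotes) is easy to get wrong when a condition is assembled from variables. The following is a minimal sketch of one way to build such a string. The helpers `quoteColumn`, `quoteValue`, and `condition` are hypothetical and not part of HF Dataset MCP, and doubling embedded quotes is the usual SQL escaping convention, assumed here rather than confirmed for this endpoint. The dataset ID, config, and split in `args` are placeholders.

```typescript
// Hypothetical helpers for building a `where` string; not part of HF Dataset MCP.

/** Wrap a column name in double quotes, doubling any embedded double quote. */
function quoteColumn(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

/** Render a literal: numbers as-is, strings in single quotes with ' doubled. */
function quoteValue(value: string | number): string {
  if (typeof value === "number") {
    if (!Number.isFinite(value)) {
      throw new Error("Non-finite numbers cannot be used in a filter");
    }
    return String(value);
  }
  return `'${value.replace(/'/g, "''")}'`;
}

// Assumed set of comparison operators; check the endpoint docs before relying on it.
type Comparison = "=" | "<>" | "<" | "<=" | ">" | ">=";

/** Build one comparison such as "age">30. */
function condition(
  column: string,
  op: Comparison,
  value: string | number
): string {
  return `${quoteColumn(column)}${op}${quoteValue(value)}`;
}

// Reproduces the example from the schema description: "age">30 AND "city"='Paris'
const where = [
  condition("age", ">", 30),
  condition("city", "=", "Paris"),
].join(" AND ");

// Arguments for a filter_rows call; dataset, config, and split are placeholders.
const args = {
  dataset: "your-org/your-dataset",
  config: "default",
  split: "train",
  where,
  orderby: `${quoteColumn("score")} DESC`,
  length: 10,
};
```

Building the condition from small quoting functions keeps user-supplied values from breaking out of their quotes, which matters because the tool forwards `where` unchanged.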

Implementation Reference

  • The tool handler, which passes the provided parameters to fetchDatasetViewer (defaulting offset to 0 and length to 100) and returns the filtered rows as pretty-printed JSON text.
    async ({ dataset, config, split, where, orderby, offset, length }) => {
      const data = await fetchDatasetViewer<FilterResponse>("/filter", {
        dataset,
        config,
        split,
        where,
        orderby,
        offset: offset ?? 0,
        length: length ?? 100,
      });
    
      return {
        content: [
          {
            type: "text" as const,
            text: JSON.stringify(data, null, 2),
          },
        ],
      };
    }
  • Registration of the 'filter_rows' tool within the McpServer, defining input parameters using Zod.
    server.tool(
      "filter_rows",
      "Filter dataset rows using SQL-like WHERE conditions",
      {
        dataset: z.string().describe("Dataset ID (e.g., 'stanfordnlp/imdb')"),
        config: z.string().describe("Configuration name"),
        split: z.string().describe("Split name (train, test, validation)"),
        where: z
          .string()
          .describe(
            'Filter condition (e.g., "age">30 AND "city"=\'Paris\'). Column names in double quotes, strings in single quotes.'
          ),
        orderby: z
          .string()
          .optional()
          .describe('Sort column and direction (e.g., "score" DESC)'),
        offset: z
          .number()
          .int()
          .min(0)
          .optional()
          .describe("Result offset (default: 0)"),
        length: z
          .number()
          .int()
          .min(1)
          .max(100)
          .optional()
          .describe("Number of results (default: 100, max: 100)"),
      },
      async ({ dataset, config, split, where, orderby, offset, length }) => {
        const data = await fetchDatasetViewer<FilterResponse>("/filter", {
          dataset,
          config,
          split,
          where,
          orderby,
          offset: offset ?? 0,
          length: length ?? 100,
        });
    
        return {
          content: [
            {
              type: "text" as const,
              text: JSON.stringify(data, null, 2),
            },
          ],
        };
      }
    );
  • TypeScript interface describing the response data returned by the filter_rows tool.
    interface FilterResponse {
      features: Array<{
        feature_idx: number;
        name: string;
        type: Record<string, unknown>;
      }>;
      rows: Array<{
        row_idx: number;
        row: Record<string, unknown>;
        truncated_cells: string[];
      }>;
      num_rows_total: number;
      num_rows_per_page: number;
      partial: boolean;
    }
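
The handler above relies on `fetchDatasetViewer`, whose body is not shown, and `length` is capped at 100, so larger result sets need several calls. The sketch below illustrates both points under stated assumptions: `buildFilterUrl` and `nextOffset` are hypothetical names, the base URL is assumed to be the public Hugging Face dataset viewer host, and `num_rows_total` is assumed to count the rows matching the filter. The server's real helper may build its request differently.

```typescript
// Assumption: public dataset viewer host; the server's fetchDatasetViewer may differ.
const DATASET_VIEWER_BASE = "https://datasets-server.huggingface.co";

interface FilterParams {
  dataset: string;
  config: string;
  split: string;
  where: string;
  orderby?: string;
  offset?: number;
  length?: number;
}

/** Build the GET URL for /filter, skipping optional parameters that are unset. */
function buildFilterUrl(params: FilterParams): string {
  const url = new URL("/filter", DATASET_VIEWER_BASE);
  url.searchParams.set("dataset", params.dataset);
  url.searchParams.set("config", params.config);
  url.searchParams.set("split", params.split);
  url.searchParams.set("where", params.where);
  if (params.orderby !== undefined) {
    url.searchParams.set("orderby", params.orderby);
  }
  // Same defaults as the tool handler: offset 0, length 100.
  url.searchParams.set("offset", String(params.offset ?? 0));
  url.searchParams.set("length", String(params.length ?? 100));
  return url.toString();
}

/**
 * Given the offset used for a request and the page it returned, compute the
 * offset of the next page, or undefined when there is nothing left to fetch.
 */
function nextOffset(
  offset: number,
  page: { rows: unknown[]; num_rows_total: number }
): number | undefined {
  const next = offset + page.rows.length;
  return page.rows.length > 0 && next < page.num_rows_total ? next : undefined;
}
```

A caller could loop by passing the value from `nextOffset` as the `offset` argument of the next filter_rows call until it returns undefined.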

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/cfahlgren1/hf-dataset-mcp'
