filter

Filter rows in a Hugging Face dataset using SQL-like conditions

Input Schema

Name	Required	Description
`auth_token`	No	Hugging Face auth token for private/gated datasets
`config`	Yes	Dataset configuration/subset name. Use get_info to list available configs
`dataset`	Yes	Hugging Face dataset identifier in the format owner/dataset
`orderby`	No	SQL-like ORDER BY clause to sort results
`page`	No	Zero-based page number for paginated results (100 rows per page). Defaults to 0
`split`	Yes	Dataset split name. Splits partition the data for training/evaluation
`where`	Yes	SQL-like WHERE clause to filter rows

Input Schema (JSON Schema)

{
  "properties": {
    "auth_token": {
      "description": "Hugging Face auth token for private/gated datasets",
      "optional": true,
      "type": "string"
    },
    "config": {
      "description": "Dataset configuration/subset name. Use get_info to list available configs",
      "examples": [
        "default",
        "en",
        "es"
      ],
      "type": "string"
    },
    "dataset": {
      "description": "Hugging Face dataset identifier in the format owner/dataset",
      "examples": [
        "ylecun/mnist",
        "stanfordnlp/imdb"
      ],
      "pattern": "^[^/]+/[^/]+$",
      "type": "string"
    },
    "orderby": {
      "description": "SQL-like ORDER BY clause to sort results",
      "examples": [
        "column ASC",
        "score DESC",
        "name ASC, id DESC"
      ],
      "optional": true,
      "type": "string"
    },
    "page": {
      "default": 0,
      "description": "Page number for paginated results (100 rows per page)",
      "minimum": 0,
      "type": "integer"
    },
    "split": {
      "description": "Dataset split name. Splits partition the data for training/evaluation",
      "examples": [
        "train",
        "validation",
        "test"
      ],
      "type": "string"
    },
    "where": {
      "description": "SQL-like WHERE clause to filter rows",
      "examples": [
        "column = \"value\"",
        "score > 0.5",
        "text LIKE \"%query%\""
      ],
      "type": "string"
    }
  },
  "required": [
    "dataset",
    "config",
    "split",
    "where"
  ],
  "type": "object"
}
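
As a rough illustration of how the schema above translates into a call, the following Python sketch assembles an arguments object and checks it against the constraints the schema states (the four required fields, the owner/dataset pattern, and the non-negative page number). The helper name `build_filter_args` is hypothetical and not part of the tool; how the resulting object is actually sent depends on the client in use. The column name in the `where` clause is a placeholder taken from the schema examples.

```python
import re

# Same pattern the schema declares for the "dataset" field.
DATASET_PATTERN = re.compile(r"^[^/]+/[^/]+$")


def build_filter_args(dataset, config, split, where,
                      orderby=None, page=0, auth_token=None):
    """Hypothetical helper: assemble arguments for the filter tool.

    Only mirrors the constraints written in the schema; it does not
    contact Hugging Face or check that the columns exist.
    """
    if not DATASET_PATTERN.match(dataset):
        raise ValueError("dataset must look like 'owner/dataset'")
    if not isinstance(page, int) or page < 0:
        raise ValueError("page must be an integer >= 0")

    # Required fields, plus page with its schema default of 0.
    args = {
        "dataset": dataset,
        "config": config,
        "split": split,
        "where": where,
        "page": page,
    }
    # Optional fields are included only when supplied.
    if orderby is not None:
        args["orderby"] = orderby
    if auth_token is not None:
        args["auth_token"] = auth_token
    return args


# Placeholder column name; replace with a column from the actual dataset.
example = build_filter_args(
    dataset="stanfordnlp/imdb",
    config="default",
    split="train",
    where='text LIKE "%query%"',
)
```

Leaving `orderby` and `auth_token` out of the object when they are not supplied keeps the request limited to the fields the schema marks as required plus `page`.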
