filter
Filter rows in Hugging Face datasets by applying SQL-like conditions, enabling precise data extraction based on specific column values or patterns.
Instructions
Filter rows in a Hugging Face dataset with a SQL-like WHERE clause, optionally sorted with an ORDER BY clause. Results are returned 100 rows per page.
Input Schema
Name | Required | Description | Default |
---|---|---|---|
auth_token | No | Hugging Face auth token for private/gated datasets | |
config | Yes | Dataset configuration/subset name. Use get_info to list available configs | |
dataset | Yes | Hugging Face dataset identifier in the format owner/dataset | |
orderby | No | SQL-like ORDER BY clause to sort results | |
page | No | Page number for paginated results (100 rows per page) | 0 |
split | Yes | Dataset split name. Splits partition the data for training/evaluation | |
where | Yes | SQL-like WHERE clause to filter rows | |
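As an illustration of how the parameters in the table fit together, the following minimal sketch assembles an arguments object for the tool. The helper names `build_filter_args` and `page_row_range` are hypothetical and not part of the tool. The sketch also assumes that pages are zero-based, which the default of 0 suggests but the description does not state outright. The `config` value and the `text` column are placeholders; use get_info to find the real names for a given dataset.

```python
from typing import Optional

ROWS_PER_PAGE = 100  # taken from the `page` description


def build_filter_args(
    dataset: str,
    config: str,
    split: str,
    where: str,
    orderby: Optional[str] = None,
    page: int = 0,
    auth_token: Optional[str] = None,
) -> dict:
    """Assemble the arguments object for the filter tool (hypothetical helper)."""
    args = {
        "dataset": dataset,
        "config": config,
        "split": split,
        "where": where,
        "page": page,
    }
    # Optional string fields are left out entirely when they are not set.
    if orderby is not None:
        args["orderby"] = orderby
    if auth_token is not None:
        args["auth_token"] = auth_token
    return args


def page_row_range(page: int) -> range:
    """Positions in the filtered result covered by a page, assuming zero-based pages."""
    start = page * ROWS_PER_PAGE
    return range(start, start + ROWS_PER_PAGE)


args = build_filter_args(
    dataset="stanfordnlp/imdb",
    config="default",            # placeholder; list real configs with get_info
    split="train",
    where='text LIKE "%query%"',  # column name is illustrative
    page=2,
)
rows = page_row_range(args["page"])
```

Under the zero-based assumption, `page=2` corresponds to positions 200 through 299 of the filtered result.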
Input Schema (JSON Schema)
{
  "properties": {
    "auth_token": {
      "description": "Hugging Face auth token for private/gated datasets",
      "optional": true,
      "type": "string"
    },
    "config": {
      "description": "Dataset configuration/subset name. Use get_info to list available configs",
      "examples": [
        "default",
        "en",
        "es"
      ],
      "type": "string"
    },
    "dataset": {
      "description": "Hugging Face dataset identifier in the format owner/dataset",
      "examples": [
        "ylecun/mnist",
        "stanfordnlp/imdb"
      ],
      "pattern": "^[^/]+/[^/]+$",
      "type": "string"
    },
    "orderby": {
      "description": "SQL-like ORDER BY clause to sort results",
      "examples": [
        "column ASC",
        "score DESC",
        "name ASC, id DESC"
      ],
      "optional": true,
      "type": "string"
    },
    "page": {
      "default": 0,
      "description": "Page number for paginated results (100 rows per page)",
      "minimum": 0,
      "type": "integer"
    },
    "split": {
      "description": "Dataset split name. Splits partition the data for training/evaluation",
      "examples": [
        "train",
        "validation",
        "test"
      ],
      "type": "string"
    },
    "where": {
      "description": "SQL-like WHERE clause to filter rows",
      "examples": [
        "column = \"value\"",
        "score > 0.5",
        "text LIKE \"%query%\""
      ],
      "type": "string"
    }
  },
  "required": [
    "dataset",
    "config",
    "split",
    "where"
  ],
  "type": "object"
}
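The constraints in the schema (the `required` list, the `pattern` on `dataset`, and the `minimum` on `page`) can be checked on the client side before calling the tool. The sketch below is one way to do that with only the Python standard library. The function name `validate_filter_args` is hypothetical, and the checks mirror only the constraints listed above, not whatever additional validation the server may perform.

```python
import re

# Constraints copied from the schema above.
REQUIRED = ("dataset", "config", "split", "where")
STRING_FIELDS = ("auth_token", "config", "dataset", "orderby", "split", "where")
DATASET_PATTERN = re.compile(r"^[^/]+/[^/]+$")


def validate_filter_args(args: dict) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    for name in REQUIRED:
        if name not in args:
            problems.append(f"missing required field: {name}")
    for name in STRING_FIELDS:
        if name in args and not isinstance(args[name], str):
            problems.append(f"{name} must be a string")
    dataset = args.get("dataset")
    if isinstance(dataset, str) and not DATASET_PATTERN.search(dataset):
        problems.append("dataset must look like owner/dataset")
    page = args.get("page", 0)  # the schema default is 0
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(page, bool) or not isinstance(page, int) or page < 0:
        problems.append("page must be an integer >= 0")
    return problems


ok = validate_filter_args(
    {"dataset": "ylecun/mnist", "config": "default", "split": "train", "where": "score > 0.5"}
)
bad = validate_filter_args({"dataset": "mnist", "where": "score > 0.5", "page": -1})
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.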