hf_get_dataset_parquet
Retrieve auto-converted Parquet files for a specific dataset, subset, or split from the Hugging Face Hub, giving machine learning workflows direct access to structured data files.
Instructions
Get the list of auto-converted Parquet files for a dataset. Optionally specify a subset (config) and a split to narrow the list to specific files, and a shard number `n` to get the nth Parquet file.
Input Schema
Name | Required | Description | Default |
---|---|---|---|
n | No | Optional shard number to get the nth parquet file | |
repo_id | Yes | Dataset repository ID | |
split | No | Optional dataset split (train, test, validation, etc.) | |
subset | No | Optional dataset subset/config name | |
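As a minimal sketch of how the parameters above combine, the Python helper below assembles an arguments object that contains only the fields a caller sets. The helper name `build_arguments` and the repository ID `user/dataset` are hypothetical, and how the arguments are sent to the tool depends on the client in use.

```python
from typing import Any, Dict, Optional


def build_arguments(
    repo_id: str,
    subset: Optional[str] = None,
    split: Optional[str] = None,
    n: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble arguments for hf_get_dataset_parquet (hypothetical helper)."""
    if not repo_id:
        raise ValueError("repo_id is required")
    arguments: Dict[str, Any] = {"repo_id": repo_id}
    # Optional fields are left out entirely when unset, since only
    # repo_id is marked as required.
    for key, value in (("subset", subset), ("split", split), ("n", n)):
        if value is not None:
            arguments[key] = value
    return arguments


# "user/dataset" is a placeholder, not a real dataset repository.
print(build_arguments("user/dataset", split="train"))
```

Omitting unset fields, rather than sending them as null, keeps every value in the object a string or number as the parameter descriptions suggest.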
Input Schema (JSON Schema)
{
  "properties": {
    "n": {
      "description": "Optional shard number to get the nth parquet file",
      "type": "number"
    },
    "repo_id": {
      "description": "Dataset repository ID",
      "type": "string"
    },
    "split": {
      "description": "Optional dataset split (train, test, validation, etc.)",
      "type": "string"
    },
    "subset": {
      "description": "Optional dataset subset/config name",
      "type": "string"
    }
  },
  "required": [
    "repo_id"
  ],
  "type": "object"
}
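To illustrate what the schema above accepts and rejects, here is a small standard-library check in Python. It is a sketch that covers only the `required` and `type` keywords used in this schema, not a general JSON Schema validator; the function name `validate_arguments` is hypothetical, and the descriptions are dropped from the embedded schema for brevity.

```python
# The schema above, with descriptions omitted.
SCHEMA = {
    "properties": {
        "n": {"type": "number"},
        "repo_id": {"type": "string"},
        "split": {"type": "string"},
        "subset": {"type": "string"},
    },
    "required": ["repo_id"],
    "type": "object",
}

# Map the JSON Schema type names used here to Python types.
_TYPES = {"string": str, "number": (int, float)}


def validate_arguments(arguments: dict) -> list:
    """Return a list of problems; an empty list means the arguments pass."""
    problems = []
    for name in SCHEMA["required"]:
        if name not in arguments:
            problems.append(f"missing required field: {name}")
    for name, value in arguments.items():
        spec = SCHEMA["properties"].get(name)
        if spec is None:
            continue  # this schema does not forbid additional properties
        expected = _TYPES[spec["type"]]
        # bool is a subclass of int in Python but is not a JSON number.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{name} must be of type {spec['type']}")
    return problems


# "user/dataset" is a placeholder repository ID.
print(validate_arguments({"repo_id": "user/dataset", "n": 0}))  # prints []
print(validate_arguments({"split": 3}))
```

Because `n` is typed as `number` rather than `integer`, a value such as `1.5` passes this check even though a fractional shard number is unlikely to be meaningful; callers may want to pass whole numbers only.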