get_parquet
Export Hugging Face dataset splits as Parquet files for efficient storage and analysis. Specify the dataset identifier and optional auth token for private datasets.
Instructions
Export Hugging Face dataset split as Parquet file
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| auth_token | No | Hugging Face auth token for private/gated datasets | |
| dataset | Yes | Hugging Face dataset identifier in the format owner/dataset | |
Input Schema (JSON Schema)
```json
{
  "properties": {
    "auth_token": {
      "description": "Hugging Face auth token for private/gated datasets",
      "optional": true,
      "type": "string"
    },
    "dataset": {
      "description": "Hugging Face dataset identifier in the format owner/dataset",
      "examples": [
        "ylecun/mnist",
        "stanfordnlp/imdb"
      ],
      "pattern": "^[^/]+/[^/]+$",
      "type": "string"
    }
  },
  "required": [
    "dataset"
  ],
  "type": "object"
}
```
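The `pattern` constraint on `dataset` can be checked client-side before invoking the tool. A minimal sketch using Python's standard `re` module; the pattern and example identifiers are taken from the schema above, while `is_valid_dataset_id` is a helper name introduced here for illustration:

```python
import re

# Pattern from the tool's input schema: a non-empty owner, one "/",
# and a non-empty dataset name, with no further slashes.
DATASET_PATTERN = re.compile(r"^[^/]+/[^/]+$")

def is_valid_dataset_id(dataset: str) -> bool:
    """Return True if the identifier matches the owner/dataset format."""
    return DATASET_PATTERN.fullmatch(dataset) is not None

print(is_valid_dataset_id("ylecun/mnist"))  # True
print(is_valid_dataset_id("mnist"))         # False: no owner segment
print(is_valid_dataset_id("a/b/c"))         # False: extra slash
```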
Implementation Reference
- `src/dataset_viewer/server.py:559-574` (handler) — Executes the `get_parquet` tool: fetches Parquet data via `DatasetViewerAPI` and saves it to a local file, returning the file path.

```python
elif name == "get_parquet":
    dataset = arguments["dataset"]
    parquet_data = await DatasetViewerAPI(auth_token=auth_token).get_parquet(dataset)

    # Save to a temporary file with .parquet extension
    filename = f"{dataset.replace('/', '_')}.parquet"
    filepath = os.path.join(os.getcwd(), filename)
    with open(filepath, "wb") as f:
        f.write(parquet_data)

    return [
        types.TextContent(
            type="text",
            text=f"Dataset exported to: {filepath}"
        )
    ]
```
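The handler above derives the output filename by replacing the `/` in the dataset identifier with `_` and writing into the current working directory. A standalone sketch of that mapping, using only the standard library; `export_path` is a name introduced here for illustration, not part of the source:

```python
import os

def export_path(dataset: str, base_dir: str = ".") -> str:
    """Mirror the handler's filename logic: owner/dataset -> owner_dataset.parquet."""
    filename = f"{dataset.replace('/', '_')}.parquet"
    return os.path.join(base_dir, filename)

print(export_path("ylecun/mnist"))  # e.g. ./ylecun_mnist.parquet on POSIX
```

Flattening the identifier this way keeps the export in a single directory and avoids creating an `owner/` subdirectory per dataset.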
- `src/dataset_viewer/server.py:153-157` (helper) — Method on the `DatasetViewerAPI` class that retrieves the entire dataset in Parquet format from the Hugging Face datasets-server API.

```python
async def get_parquet(self, dataset: str) -> bytes:
    """Get entire dataset in Parquet format"""
    response = await self.client.get("/parquet", params={"dataset": dataset})
    response.raise_for_status()
    return response.content
```
- `src/dataset_viewer/server.py:416-436` (registration) — Registers the `get_parquet` tool with the MCP server in `list_tools()`, including name, description, and input schema.

```python
types.Tool(
    name="get_parquet",
    description="Export Hugging Face dataset split as Parquet file",
    inputSchema={
        "type": "object",
        "properties": {
            "dataset": {
                "type": "string",
                "description": "Hugging Face dataset identifier in the format owner/dataset",
                "pattern": "^[^/]+/[^/]+$",
                "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
            },
            "auth_token": {
                "type": "string",
                "description": "Hugging Face auth token for private/gated datasets",
                "optional": True
            }
        },
        "required": ["dataset"],
    }
),
```
- `src/dataset_viewer/server.py:419-435` (schema) — Defines the input schema for the `get_parquet` tool, specifying `dataset` (required) and optional `auth_token`.

```python
inputSchema={
    "type": "object",
    "properties": {
        "dataset": {
            "type": "string",
            "description": "Hugging Face dataset identifier in the format owner/dataset",
            "pattern": "^[^/]+/[^/]+$",
            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
        },
        "auth_token": {
            "type": "string",
            "description": "Hugging Face auth token for private/gated datasets",
            "optional": True
        }
    },
    "required": ["dataset"],
}
```