
Dataset Viewer MCP Server

by privetin

get_parquet

Export Hugging Face dataset splits as Parquet files for efficient storage and analysis. Specify the dataset identifier and optional auth token for private datasets.

Instructions

Export Hugging Face dataset split as Parquet file

Input Schema

Name       | Required | Description                                                 | Default
-----------|----------|-------------------------------------------------------------|--------
auth_token | No       | Hugging Face auth token for private/gated datasets          | —
dataset    | Yes      | Hugging Face dataset identifier in the format owner/dataset | —

Input Schema (JSON Schema)

{
  "properties": {
    "auth_token": {
      "description": "Hugging Face auth token for private/gated datasets",
      "optional": true,
      "type": "string"
    },
    "dataset": {
      "description": "Hugging Face dataset identifier in the format owner/dataset",
      "examples": ["ylecun/mnist", "stanfordnlp/imdb"],
      "pattern": "^[^/]+/[^/]+$",
      "type": "string"
    }
  },
  "required": ["dataset"],
  "type": "object"
}
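As a rough illustration of what this schema enforces, the following stdlib-only Python sketch checks an arguments payload the way a client might before calling the tool. The `validate_arguments` helper is hypothetical and not part of the server; the pattern and required fields are copied from the schema above.

```python
import re

# Pattern and required field copied from the tool's input schema
DATASET_PATTERN = re.compile(r"^[^/]+/[^/]+$")

def validate_arguments(arguments: dict) -> list:
    """Return a list of schema violations (empty list means valid).

    Hypothetical client-side helper; the server relies on the
    published JSON Schema rather than this function.
    """
    errors = []
    dataset = arguments.get("dataset")
    if dataset is None:
        errors.append("'dataset' is required")
    elif not isinstance(dataset, str) or not DATASET_PATTERN.match(dataset):
        errors.append("'dataset' must match owner/dataset, e.g. 'ylecun/mnist'")
    auth_token = arguments.get("auth_token")
    if auth_token is not None and not isinstance(auth_token, str):
        errors.append("'auth_token' must be a string when provided")
    return errors

print(validate_arguments({"dataset": "ylecun/mnist"}))   # []
print(validate_arguments({"dataset": "not-a-valid-id"}))
```

Note that `^[^/]+/[^/]+$` rejects both identifiers with no slash and identifiers with more than one, so only the two-part `owner/dataset` form passes.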

Implementation Reference

  • Executes the get_parquet tool: fetches parquet data via DatasetViewerAPI and saves it to a local file, returning the file path.
    elif name == "get_parquet":
        dataset = arguments["dataset"]
        parquet_data = await DatasetViewerAPI(auth_token=auth_token).get_parquet(dataset)
        # Save the Parquet bytes to a file in the current working directory
        filename = f"{dataset.replace('/', '_')}.parquet"
        filepath = os.path.join(os.getcwd(), filename)
        with open(filepath, "wb") as f:
            f.write(parquet_data)
        return [
            types.TextContent(
                type="text",
                text=f"Dataset exported to: {filepath}"
            )
        ]
  • Helper method in DatasetViewerAPI class that retrieves the entire dataset in Parquet format from the Hugging Face datasets-server API.
    async def get_parquet(self, dataset: str) -> bytes:
        """Get entire dataset in Parquet format"""
        response = await self.client.get("/parquet", params={"dataset": dataset})
        response.raise_for_status()
        return response.content
  • Registers the get_parquet tool with the MCP server in list_tools(), including name, description, and input schema.
    types.Tool(
        name="get_parquet",
        description="Export Hugging Face dataset split as Parquet file",
        inputSchema={
            "type": "object",
            "properties": {
                "dataset": {
                    "type": "string",
                    "description": "Hugging Face dataset identifier in the format owner/dataset",
                    "pattern": "^[^/]+/[^/]+$",
                    "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                },
                "auth_token": {
                    "type": "string",
                    "description": "Hugging Face auth token for private/gated datasets",
                    "optional": True
                }
            },
            "required": ["dataset"],
        }
    ),
  • Defines the input schema for the get_parquet tool, specifying dataset (required) and optional auth_token.
    inputSchema={
        "type": "object",
        "properties": {
            "dataset": {
                "type": "string",
                "description": "Hugging Face dataset identifier in the format owner/dataset",
                "pattern": "^[^/]+/[^/]+$",
                "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
            },
            "auth_token": {
                "type": "string",
                "description": "Hugging Face auth token for private/gated datasets",
                "optional": True
            }
        },
        "required": ["dataset"],
    }
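The tool handler above derives the output filename by replacing the slash in the dataset identifier with an underscore and writing into the current working directory. A minimal sketch of that derivation, where `export_path` is our hypothetical helper name rather than anything in the server:

```python
import os

def export_path(dataset: str, base_dir: str = "") -> str:
    """Mirror the handler's naming logic: owner/dataset -> owner_dataset.parquet."""
    base_dir = base_dir or os.getcwd()
    filename = f"{dataset.replace('/', '_')}.parquet"
    return os.path.join(base_dir, filename)

print(export_path("ylecun/mnist", "/tmp"))  # /tmp/ylecun_mnist.parquet
```

Flattening `owner/dataset` into a single path component keeps the export in one directory instead of creating a per-owner subdirectory.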
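The `DatasetViewerAPI.get_parquet` helper delegates the base URL to `self.client`, whose configuration is not shown in the snippets above. Assuming it points at the Hugging Face datasets-server (an assumption on our part), the GET request it issues can be reconstructed with the stdlib like this:

```python
from urllib.parse import urlencode

# Assumed base URL; the actual value lives in the DatasetViewerAPI
# client configuration, which is not shown in the reference snippets.
BASE_URL = "https://datasets-server.huggingface.co"

def parquet_request_url(dataset: str) -> str:
    """Reconstruct the GET URL for the /parquet endpoint."""
    return f"{BASE_URL}/parquet?{urlencode({'dataset': dataset})}"

print(parquet_request_url("stanfordnlp/imdb"))
# https://datasets-server.huggingface.co/parquet?dataset=stanfordnlp%2Fimdb
```

Note that `urlencode` percent-encodes the slash in the identifier (`%2F`), which is why the helper passes the dataset as a query parameter rather than splicing it into the path.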


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/privetin/dataset-viewer'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.