Dataset Viewer MCP Server

by privetin

get_parquet

Export Hugging Face dataset splits as Parquet files for data analysis and processing workflows.

Instructions

Export Hugging Face dataset split as Parquet file

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| dataset | Yes | Hugging Face dataset identifier in the format owner/dataset | |
| auth_token | No | Hugging Face auth token for private/gated datasets | |
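
The schema above can be checked on the client side before calling the tool. The following is a minimal sketch, not part of the server: `validate_arguments` is a hypothetical helper, and the regular expression is the `pattern` that the tool registration declares for `dataset`.

```python
import re

# Pattern taken from the tool's input schema for "dataset".
DATASET_PATTERN = re.compile(r"^[^/]+/[^/]+$")


def validate_arguments(arguments: dict) -> dict:
    """Hypothetical helper: check get_parquet arguments against the schema."""
    dataset = arguments.get("dataset")
    if not isinstance(dataset, str) or not DATASET_PATTERN.match(dataset):
        raise ValueError("dataset must look like 'owner/dataset'")

    # auth_token is optional; when present it must be a string.
    auth_token = arguments.get("auth_token")
    if auth_token is not None and not isinstance(auth_token, str):
        raise ValueError("auth_token must be a string when provided")

    return {"dataset": dataset, "auth_token": auth_token}


print(validate_arguments({"dataset": "ylecun/mnist"}))
```

A value such as `mnist` (no owner) or `a/b/c` (extra slash) raises `ValueError` here, which mirrors what the declared pattern rejects.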

Implementation Reference

  • Executes the get_parquet tool: fetches parquet data from HF Dataset Viewer API using DatasetViewerAPI and saves it to a local .parquet file, returning the file path.
    elif name == "get_parquet":
        dataset = arguments["dataset"]
        parquet_data = await DatasetViewerAPI(auth_token=auth_token).get_parquet(dataset)

        # Save to the current working directory with a .parquet extension
        filename = f"{dataset.replace('/', '_')}.parquet"
        filepath = os.path.join(os.getcwd(), filename)
        with open(filepath, "wb") as f:
            f.write(parquet_data)

        return [
            types.TextContent(
                type="text",
                text=f"Dataset exported to: {filepath}"
            )
        ]
  • Registers the get_parquet tool in the MCP server's list_tools() handler, defining its name, description, and input schema.
    types.Tool(
        name="get_parquet",
        description="Export Hugging Face dataset split as Parquet file",
        inputSchema={
            "type": "object",
            "properties": {
                "dataset": {
                    "type": "string",
                    "description": "Hugging Face dataset identifier in the format owner/dataset",
                    "pattern": "^[^/]+/[^/]+$",
                    "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                },
                "auth_token": {
                    "type": "string",
                    "description": "Hugging Face auth token for private/gated datasets",
                    "optional": True
                }
            },
            "required": ["dataset"],
        }
    ),
  • DatasetViewerAPI helper method that performs the HTTP request to retrieve the full dataset as parquet bytes from the HF datasets-server.
    async def get_parquet(self, dataset: str) -> bytes:
        """Get entire dataset in Parquet format"""
        response = await self.client.get("/parquet", params={"dataset": dataset})
        response.raise_for_status()
        return response.content
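
One caveat when consuming the result: as far as the public Dataset Viewer documentation describes it, the `/parquet` endpoint answers with a JSON listing of Parquet file URLs (one entry per config and split) rather than raw Parquet bytes, so the saved file may contain JSON. The sketch below shows how such a listing could be turned into download URLs. `parquet_file_urls` is a hypothetical helper, the `parquet_files`, `split`, and `url` keys are assumptions based on that documentation, and the sample payload uses placeholder URLs.

```python
import json
from typing import Optional


def parquet_file_urls(payload: bytes, split: Optional[str] = None) -> list[str]:
    """Hypothetical helper: extract file URLs from a /parquet JSON listing."""
    listing = json.loads(payload)
    return [
        entry["url"]
        for entry in listing.get("parquet_files", [])
        if split is None or entry.get("split") == split
    ]


# Illustrative payload only: the URLs are placeholders, not real files.
sample_payload = json.dumps({
    "parquet_files": [
        {"dataset": "ylecun/mnist", "config": "mnist", "split": "train",
         "url": "https://example.invalid/train/0000.parquet"},
        {"dataset": "ylecun/mnist", "config": "mnist", "split": "test",
         "url": "https://example.invalid/test/0000.parquet"},
    ]
}).encode("utf-8")

print(parquet_file_urls(sample_payload, split="test"))
```

Each returned URL could then be downloaded separately, which also keeps large datasets from being held in memory as a single response body.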

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/privetin/dataset-viewer'
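
The same request can be issued from Python using only the standard library. This is a rough sketch: `server_url` and `fetch_server_info` are hypothetical helper names, and the response is returned as raw text because its format is not described here.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one MCP server."""
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(name, safe='')}"


def fetch_server_info(owner: str, name: str, timeout: float = 10.0) -> str:
    """Perform the GET request and return the response body as text."""
    request = Request(server_url(owner, name), method="GET")
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


print(server_url("privetin", "dataset-viewer"))
```

Calling `fetch_server_info("privetin", "dataset-viewer")` requires network access and is therefore left out of the example run.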

If you have feedback or need assistance with the MCP directory API, please join our Discord server.