get_first_rows
Retrieve initial rows from a Hugging Face dataset split to preview data structure and content for analysis or validation.
Instructions
Get first rows from a Hugging Face dataset split
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| dataset | Yes | Hugging Face dataset identifier in the format owner/dataset | |
| config | Yes | Dataset configuration/subset name. Use get_info to list available configs | |
| split | Yes | Dataset split name. Splits partition the data for training/evaluation | |
| auth_token | No | Hugging Face auth token for private/gated datasets | |
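As a rough illustration of the schema above, the following sketch checks an arguments dictionary before it is sent to the tool. The `owner/dataset` pattern is taken from the tool's registered input schema; the function name `validate_arguments` and the error messages are hypothetical and not part of the server.

```python
import re

# Pattern copied from the tool's input schema: exactly one "/" separating owner and name.
DATASET_PATTERN = re.compile(r"^[^/]+/[^/]+$")
REQUIRED = ("dataset", "config", "split")


def validate_arguments(arguments: dict) -> list[str]:
    """Hypothetical helper: return a list of problems found in the arguments."""
    errors = [f"missing required argument: {key}" for key in REQUIRED if key not in arguments]
    dataset = arguments.get("dataset")
    if isinstance(dataset, str) and not DATASET_PATTERN.match(dataset):
        errors.append("dataset must look like owner/dataset")
    return errors


# Example arguments for a call to get_first_rows (values are illustrative).
example = {"dataset": "stanfordnlp/imdb", "config": "default", "split": "train"}
print(validate_arguments(example))
```

An empty list means the arguments satisfy the required fields and the identifier format; `auth_token` is optional and is therefore not checked here.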
Implementation Reference
- src/dataset_viewer/server.py:507-517 (handler) MCP tool handler execution logic for 'get_first_rows': extracts arguments, instantiates DatasetViewerAPI, calls get_first_rows method, and returns JSON-formatted result as TextContent.

      elif name == "get_first_rows":
          dataset = arguments["dataset"]
          config = arguments["config"]
          split = arguments["split"]
          first_rows = await DatasetViewerAPI(auth_token=auth_token).get_first_rows(dataset, config=config, split=split)
          return [
              types.TextContent(
                  type="text",
                  text=json.dumps(first_rows, indent=2)
              )
          ]

- src/dataset_viewer/server.py:274-304 (registration) Tool registration in @server.list_tools(), defining name, description, and input schema for 'get_first_rows'.

      types.Tool(
          name="get_first_rows",
          description="Get first rows from a Hugging Face dataset split",
          inputSchema={
              "type": "object",
              "properties": {
                  "dataset": {
                      "type": "string",
                      "description": "Hugging Face dataset identifier in the format owner/dataset",
                      "pattern": "^[^/]+/[^/]+$",
                      "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                  },
                  "config": {
                      "type": "string",
                      "description": "Dataset configuration/subset name. Use get_info to list available configs",
                      "examples": ["default", "en", "es"]
                  },
                  "split": {
                      "type": "string",
                      "description": "Dataset split name. Splits partition the data for training/evaluation",
                      "examples": ["train", "validation", "test"]
                  },
                  "auth_token": {
                      "type": "string",
                      "description": "Hugging Face auth token for private/gated datasets",
                      "optional": True
                  }
              },
              "required": ["dataset", "config", "split"],
          }
      ),

- src/dataset_viewer/server.py:93-102 (helper) Core helper function in DatasetViewerAPI class that makes HTTP GET request to '/first-rows' endpoint of Hugging Face datasets-server to retrieve first rows of specified dataset/config/split.

      async def get_first_rows(self, dataset: str, config: str, split: str) -> dict:
          """Get first few rows of a dataset split"""
          params = {
              "dataset": dataset,
              "config": config,
              "split": split
          }
          response = await self.client.get("/first-rows", params=params)
          response.raise_for_status()
          return response.json()
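
To make the request issued by the helper more concrete, here is a minimal sketch of how the same query could be assembled with the standard library only. The base URL and the `Authorization: Bearer` header are assumptions (the excerpts above show only the `/first-rows` path and do not show how `auth_token` is applied), and `build_first_rows_request` is a hypothetical name, not a function from the server.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: public base URL of the Hugging Face dataset viewer API.
BASE_URL = "https://datasets-server.huggingface.co"


def build_first_rows_request(
    dataset: str, config: str, split: str, auth_token: Optional[str] = None
) -> tuple[str, dict]:
    """Hypothetical helper: return the URL and headers for a /first-rows query."""
    # Same three query parameters the helper passes to client.get().
    params = {"dataset": dataset, "config": config, "split": split}
    url = f"{BASE_URL}/first-rows?{urlencode(params)}"
    # Assumption: the token is sent as a bearer token when one is provided.
    headers = {"Authorization": f"Bearer {auth_token}"} if auth_token else {}
    return url, headers


url, headers = build_first_rows_request("ylecun/mnist", "mnist", "train")
print(url)
```

The resulting URL and headers can be handed to any HTTP client; the server itself performs the equivalent call asynchronously and raises on non-2xx responses via `raise_for_status()`.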