Glama

Dataset Viewer MCP Server

by privetin

get_rows

Retrieve paginated data rows from a Hugging Face dataset by specifying its identifier, configuration, and split, for browsing or analysis.

Instructions

Get paginated rows from a Hugging Face dataset
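MCP clients invoke this tool through a JSON-RPC 2.0 `tools/call` request whose `arguments` object carries the parameters listed in the input schema. The sketch below builds such a request by hand; in practice an MCP client library usually constructs it for you. The helper name `make_get_rows_call` is hypothetical, and the dataset, config, and split values are only illustrative.

```python
import json
from typing import Optional


def make_get_rows_call(request_id: int, dataset: str, config: str, split: str,
                       page: int = 0, auth_token: Optional[str] = None) -> str:
    """Build a JSON-RPC 2.0 tools/call request that invokes get_rows (hypothetical helper)."""
    arguments = {"dataset": dataset, "config": config, "split": split, "page": page}
    if auth_token is not None:
        # Only send a token when one is needed (private/gated datasets).
        arguments["auth_token"] = auth_token
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "get_rows", "arguments": arguments},
    }
    return json.dumps(request)


# Request the second page (rows 100-199) of an illustrative training split.
print(make_get_rows_call(1, "stanfordnlp/imdb", "plain_text", "train", page=1))
```

The server's reply carries a single text content item holding the JSON-encoded /rows response.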

Input Schema

Name        Required  Description                                                                Default
dataset     Yes       Hugging Face dataset identifier in the format owner/dataset
config      Yes       Dataset configuration/subset name. Use get_info to list available configs
split       Yes       Dataset split name. Splits partition the data for training/evaluation
page        No        Page number (0-based), returns 100 rows per page                           0
auth_token  No        Hugging Face auth token for private/gated datasets
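A client can catch malformed calls before they reach the server by checking arguments against this schema. A minimal sketch, assuming only the keywords this tool's schema uses (`required`, `type`, `pattern`); the helper `check_arguments` and the trimmed `GET_ROWS_SCHEMA` copy are hypothetical. The tool's schema definition marks `auth_token` with `"optional": True`, which is not a JSON Schema keyword: `auth_token` and `page` are optional simply because they are absent from `required`.

```python
import re

# Hypothetical trimmed copy of the get_rows inputSchema (descriptions omitted).
GET_ROWS_SCHEMA = {
    "type": "object",
    "properties": {
        "dataset": {"type": "string", "pattern": "^[^/]+/[^/]+$"},
        "config": {"type": "string"},
        "split": {"type": "string"},
        "page": {"type": "integer", "default": 0},
        "auth_token": {"type": "string", "optional": True},  # ignored by the checker
    },
    "required": ["dataset", "config", "split"],
}

_TYPES = {"string": str, "integer": int}


def check_arguments(schema: dict, arguments: dict) -> list:
    """Return a list of problems; an empty list means the arguments pass.

    Checks only `required`, `type` (string/integer) and `pattern`; it does not
    apply defaults, and unknown keywords such as "optional" are skipped.
    """
    problems = [f"missing required argument: {name}"
                for name in schema.get("required", []) if name not in arguments]
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            continue  # the schema does not set additionalProperties: false
        expected = _TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected and (not isinstance(value, expected) or isinstance(value, bool)):
            problems.append(f"{name} should be of type {spec['type']}")
        elif "pattern" in spec and not re.search(spec["pattern"], value):
            problems.append(f"{name} does not match {spec['pattern']}")
    return problems


print(check_arguments(GET_ROWS_SCHEMA, {"dataset": "mnist", "split": "train"}))
# → ['missing required argument: config', 'dataset does not match ^[^/]+/[^/]+$']
```

A full validator such as the `jsonschema` package would also cover keywords this sketch skips, such as `minimum`.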

Implementation Reference

  • Tool handler dispatch logic that extracts parameters, calls the DatasetViewerAPI.get_rows method, formats the result as JSON, and returns it as TextContent.
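    # Excerpt from the tool-call handler's if/elif chain over tool names.
    # auth_token is not extracted in this excerpt; it is presumably read from
    # arguments earlier in the handler (e.g. arguments.get("auth_token")).
    elif name == "get_rows":
        dataset = arguments["dataset"]
        config = arguments["config"]
        split = arguments["split"]
        page = arguments.get("page", 0)  # 0-based; defaults to the first page
        rows = await DatasetViewerAPI(auth_token=auth_token).get_rows(
            dataset, config=config, split=split, page=page
        )
        # Return the raw /rows response as pretty-printed JSON text
        return [
            types.TextContent(
                type="text",
                text=json.dumps(rows, indent=2)
            )
        ]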
  • Core helper method that fetches one page of dataset rows with an HTTP GET request to the dataset viewer API's /rows endpoint.
    async def get_rows(self, dataset: str, config: str, split: str, page: int = 0) -> dict:
        """Get paginated rows of a dataset"""
        params = {
            "dataset": dataset,
            "config": config,
            "split": split,
            "offset": page * 100,  # 100 rows per page
            "length": 100
        }
        response = await self.client.get("/rows", params=params)
        response.raise_for_status()
        return response.json()
  • Tool schema definition including name, description, and input schema for parameter validation.
    types.Tool(
        name="get_rows",
        description="Get paginated rows from a Hugging Face dataset",
        inputSchema={
            "type": "object",
            "properties": {
                "dataset": {
                    "type": "string",
                    "description": "Hugging Face dataset identifier in the format owner/dataset",
                    "pattern": "^[^/]+/[^/]+$",
                    "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                },
                "config": {
                    "type": "string",
                    "description": "Dataset configuration/subset name. Use get_info to list available configs",
                    "examples": ["default", "en", "es"]
                },
                "split": {
                    "type": "string",
                    "description": "Dataset split name. Splits partition the data for training/evaluation",
                    "examples": ["train", "validation", "test"]
                },
                "page": {
                    "type": "integer",
                    "description": "Page number (0-based), returns 100 rows per page",
                    "default": 0
                },
                "auth_token": {
                    "type": "string",
                    "description": "Hugging Face auth token for private/gated datasets",
                    "optional": True
                }
            },
            "required": ["dataset", "config", "split"],
        }
    ),
  • Registration of all tools including get_rows via the list_tools handler that returns the list of Tool objects.
    @server.list_tools()
    async def handle_list_tools() -> list[types.Tool]:
        """List available dataset tools for Hugging Face datasets"""
        return [
            types.Tool(
                name="get_info",
                description="Get detailed information about a Hugging Face dataset including description, features, splits, and statistics. Run validate first to check if the dataset exists and is accessible.",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset"],
                }
            ),
            types.Tool(
                name="get_rows",
                description="Get paginated rows from a Hugging Face dataset",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "config": {
                            "type": "string",
                            "description": "Dataset configuration/subset name. Use get_info to list available configs",
                            "examples": ["default", "en", "es"]
                        },
                        "split": {
                            "type": "string",
                            "description": "Dataset split name. Splits partition the data for training/evaluation",
                            "examples": ["train", "validation", "test"]
                        },
                        "page": {
                            "type": "integer",
                            "description": "Page number (0-based), returns 100 rows per page",
                            "default": 0
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset", "config", "split"],
                }
            ),
            types.Tool(
                name="get_first_rows",
                description="Get first rows from a Hugging Face dataset split",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "config": {
                            "type": "string",
                            "description": "Dataset configuration/subset name. Use get_info to list available configs",
                            "examples": ["default", "en", "es"]
                        },
                        "split": {
                            "type": "string",
                            "description": "Dataset split name. Splits partition the data for training/evaluation",
                            "examples": ["train", "validation", "test"]
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset", "config", "split"],
                }
            ),
            types.Tool(
                name="search_dataset",
                description="Search for text within a Hugging Face dataset",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "config": {
                            "type": "string",
                            "description": "Dataset configuration/subset name. Use get_info to list available configs",
                            "examples": ["default", "en", "es"]
                        },
                        "split": {
                            "type": "string",
                            "description": "Dataset split name. Splits partition the data for training/evaluation",
                            "examples": ["train", "validation", "test"]
                        },
                        "query": {
                            "type": "string",
                            "description": "Text to search for in the dataset"
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset", "config", "split", "query"],
                }
            ),
            types.Tool(
                name="filter",
                description="Filter rows in a Hugging Face dataset using SQL-like conditions",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "config": {
                            "type": "string",
                            "description": "Dataset configuration/subset name. Use get_info to list available configs",
                            "examples": ["default", "en", "es"]
                        },
                        "split": {
                            "type": "string",
                            "description": "Dataset split name. Splits partition the data for training/evaluation",
                            "examples": ["train", "validation", "test"]
                        },
                        "where": {
                            "type": "string",
                            "description": "SQL-like WHERE clause to filter rows",
                            "examples": ["column = \"value\"", "score > 0.5", "text LIKE \"%query%\""]
                        },
                        "orderby": {
                            "type": "string",
                            "description": "SQL-like ORDER BY clause to sort results",
                            "optional": True,
                            "examples": ["column ASC", "score DESC", "name ASC, id DESC"]
                        },
                        "page": {
                            "type": "integer",
                            "description": "Page number for paginated results (100 rows per page)",
                            "default": 0,
                            "minimum": 0
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset", "config", "split", "where"],
                }
            ),
            types.Tool(
                name="get_statistics",
                description="Get statistics about a Hugging Face dataset",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "config": {
                            "type": "string",
                            "description": "Dataset configuration/subset name. Use get_info to list available configs",
                            "examples": ["default", "en", "es"]
                        },
                        "split": {
                            "type": "string",
                            "description": "Dataset split name. Splits partition the data for training/evaluation",
                            "examples": ["train", "validation", "test"]
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset", "config", "split"],
                }
            ),
            types.Tool(
                name="get_parquet",
                description="Export Hugging Face dataset split as Parquet file",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset"],
                }
            ),
            types.Tool(
                name="validate",
                description="Check if a Hugging Face dataset exists and is accessible",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset"],
                }
            ),
        ]

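The get_rows helper requests offset page × 100 with length 100, so page p covers rows 100p through 100p + 99, and the final page of a split may be shorter. To read a whole split, a caller can loop over pages until it has collected `num_rows_total` rows. A minimal sketch, assuming the /rows response body exposes `rows` and `num_rows_total` fields (as the Hugging Face dataset viewer's /rows responses do; verify against its API reference); `fetch_all_rows` and its `max_pages` cap are hypothetical, not part of this server.

```python
import asyncio
from typing import Awaitable, Callable

ROWS_PER_PAGE = 100  # matches the offset/length used by get_rows


async def fetch_all_rows(
    fetch_page: Callable[[int], Awaitable[dict]],
    max_pages: int = 50,
) -> list:
    """Collect rows page by page until num_rows_total is reached.

    fetch_page(page) is assumed to behave like DatasetViewerAPI.get_rows with
    dataset/config/split already bound, returning the /rows JSON body.
    """
    collected = []
    for page in range(max_pages):  # hard cap guards against huge splits
        body = await fetch_page(page)
        rows = body.get("rows", [])
        collected.extend(rows)
        total = body.get("num_rows_total")  # None if the field is absent
        # Stop on an empty page, or once every reported row has been seen.
        if not rows or (total is not None and len(collected) >= total):
            break
    return collected


# Usage with a fake fetcher that simulates a 250-row split.
async def fake_fetch(page: int) -> dict:
    start = page * ROWS_PER_PAGE
    stop = min(start + ROWS_PER_PAGE, 250)
    return {
        "rows": [{"row_idx": i, "row": {}} for i in range(start, stop)],
        "num_rows_total": 250,
    }


rows = asyncio.run(fetch_all_rows(fake_fetch))
print(len(rows))  # 250
```

With the real helper, bind the dataset, config, and split first, for example `fetch_all_rows(lambda p: api.get_rows("stanfordnlp/imdb", "plain_text", "train", page=p))`, where `api` is a `DatasetViewerAPI` instance. The page cap keeps a very large split from triggering thousands of requests; for whole-split access, the get_parquet tool may be a better fit.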

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/privetin/dataset-viewer'
