Dataset Viewer MCP Server

by privetin

get_rows

Retrieves rows from a Hugging Face dataset in pages of 100, given the dataset identifier, configuration, and split. Useful for browsing or analyzing a dataset.

Instructions

Get paginated rows from a Hugging Face dataset

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| dataset | Yes | Hugging Face dataset identifier in the format owner/dataset | |
| config | Yes | Dataset configuration/subset name. Use get_info to list available configs | |
| split | Yes | Dataset split name. Splits partition the data for training/evaluation | |
| page | No | Page number (0-based), returns 100 rows per page | 0 |
| auth_token | No | Hugging Face auth token for private/gated datasets | |
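
As a rough illustration of the schema above, the sketch below checks an arguments dictionary before it is passed to get_rows. The function name validate_get_rows_arguments is hypothetical and not part of the server. The required fields, the owner/dataset pattern, and the page default of 0 come from the tool's input schema; rejecting negative page numbers is an assumption of this sketch, since the get_rows schema sets no minimum.

```python
import re

# Values taken from the get_rows input schema.
REQUIRED_FIELDS = ("dataset", "config", "split")
DATASET_PATTERN = re.compile(r"^[^/]+/[^/]+$")


def validate_get_rows_arguments(arguments: dict) -> dict:
    """Hypothetical helper: check get_rows arguments and apply the page default."""
    missing = [name for name in REQUIRED_FIELDS if name not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {', '.join(missing)}")

    if not DATASET_PATTERN.match(arguments["dataset"]):
        raise ValueError("dataset must be in the format owner/dataset")

    # The schema declares page as an integer with default 0.
    page = arguments.get("page", 0)
    if isinstance(page, bool) or not isinstance(page, int):
        raise ValueError("page must be an integer")
    # Assumption: pages are 0-based, so negative values are rejected here.
    if page < 0:
        raise ValueError("page must be 0 or greater")

    return {**arguments, "page": page}
```

Calling it with only the three required fields returns the same dictionary with page set to 0, which mirrors what the handler does with arguments.get("page", 0).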

Implementation Reference

  • Tool handler dispatch logic that extracts parameters, calls the DatasetViewerAPI.get_rows method, formats the result as JSON, and returns it as TextContent.
    elif name == "get_rows":
        dataset = arguments["dataset"]
        config = arguments["config"]
        split = arguments["split"]
        page = arguments.get("page", 0)
        rows = await DatasetViewerAPI(auth_token=auth_token).get_rows(dataset, config=config, split=split, page=page)
        return [
            types.TextContent(
                type="text",
                text=json.dumps(rows, indent=2)
            )
        ]
  • Core helper function implementing the logic to fetch paginated dataset rows via HTTP request to the dataset viewer API endpoint /rows.
    async def get_rows(self, dataset: str, config: str, split: str, page: int = 0) -> dict:
        """Get paginated rows of a dataset"""
        params = {
            "dataset": dataset,
            "config": config,
            "split": split,
            "offset": page * 100,  # 100 rows per page
            "length": 100
        }
        response = await self.client.get("/rows", params=params)
        response.raise_for_status()
        return response.json()
  • Tool schema definition including name, description, and input schema for parameter validation.
    types.Tool(
        name="get_rows",
        description="Get paginated rows from a Hugging Face dataset",
        inputSchema={
            "type": "object",
            "properties": {
                "dataset": {
                    "type": "string",
                    "description": "Hugging Face dataset identifier in the format owner/dataset",
                    "pattern": "^[^/]+/[^/]+$",
                    "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                },
                "config": {
                    "type": "string",
                    "description": "Dataset configuration/subset name. Use get_info to list available configs",
                    "examples": ["default", "en", "es"]
                },
                "split": {
                    "type": "string",
                    "description": "Dataset split name. Splits partition the data for training/evaluation",
                    "examples": ["train", "validation", "test"]
                },
                "page": {"type": "integer", "description": "Page number (0-based), returns 100 rows per page", "default": 0},
                "auth_token": {
                    "type": "string",
                    "description": "Hugging Face auth token for private/gated datasets",
                    "optional": True
                }
            },
            "required": ["dataset", "config", "split"],
        }
    ),
  • Registration of all tools including get_rows via the list_tools handler that returns the list of Tool objects.
    @server.list_tools()
    async def handle_list_tools() -> list[types.Tool]:
        """List available dataset tools for Hugging Face datasets"""
        return [
            types.Tool(
                name="get_info",
                description="Get detailed information about a Hugging Face dataset including description, features, splits, and statistics. Run validate first to check if the dataset exists and is accessible.",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset"],
                }
            ),
            types.Tool(
                name="get_rows",
                description="Get paginated rows from a Hugging Face dataset",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "config": {
                            "type": "string",
                            "description": "Dataset configuration/subset name. Use get_info to list available configs",
                            "examples": ["default", "en", "es"]
                        },
                        "split": {
                            "type": "string",
                            "description": "Dataset split name. Splits partition the data for training/evaluation",
                            "examples": ["train", "validation", "test"]
                        },
                        "page": {"type": "integer", "description": "Page number (0-based), returns 100 rows per page", "default": 0},
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset", "config", "split"],
                }
            ),
            types.Tool(
                name="get_first_rows",
                description="Get first rows from a Hugging Face dataset split",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "config": {
                            "type": "string",
                            "description": "Dataset configuration/subset name. Use get_info to list available configs",
                            "examples": ["default", "en", "es"]
                        },
                        "split": {
                            "type": "string",
                            "description": "Dataset split name. Splits partition the data for training/evaluation",
                            "examples": ["train", "validation", "test"]
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset", "config", "split"],
                }
            ),
            types.Tool(
                name="search_dataset",
                description="Search for text within a Hugging Face dataset",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "config": {
                            "type": "string",
                            "description": "Dataset configuration/subset name. Use get_info to list available configs",
                            "examples": ["default", "en", "es"]
                        },
                        "split": {
                            "type": "string",
                            "description": "Dataset split name. Splits partition the data for training/evaluation",
                            "examples": ["train", "validation", "test"]
                        },
                        "query": {"type": "string", "description": "Text to search for in the dataset"},
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset", "config", "split", "query"],
                }
            ),
            types.Tool(
                name="filter",
                description="Filter rows in a Hugging Face dataset using SQL-like conditions",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "config": {
                            "type": "string",
                            "description": "Dataset configuration/subset name. Use get_info to list available configs",
                            "examples": ["default", "en", "es"]
                        },
                        "split": {
                            "type": "string",
                            "description": "Dataset split name. Splits partition the data for training/evaluation",
                            "examples": ["train", "validation", "test"]
                        },
                        "where": {
                            "type": "string",
                            "description": "SQL-like WHERE clause to filter rows",
                            "examples": ["column = \"value\"", "score > 0.5", "text LIKE \"%query%\""]
                        },
                        "orderby": {
                            "type": "string",
                            "description": "SQL-like ORDER BY clause to sort results",
                            "optional": True,
                            "examples": ["column ASC", "score DESC", "name ASC, id DESC"]
                        },
                        "page": {
                            "type": "integer",
                            "description": "Page number for paginated results (100 rows per page)",
                            "default": 0,
                            "minimum": 0
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset", "config", "split", "where"],
                }
            ),
            types.Tool(
                name="get_statistics",
                description="Get statistics about a Hugging Face dataset",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "config": {
                            "type": "string",
                            "description": "Dataset configuration/subset name. Use get_info to list available configs",
                            "examples": ["default", "en", "es"]
                        },
                        "split": {
                            "type": "string",
                            "description": "Dataset split name. Splits partition the data for training/evaluation",
                            "examples": ["train", "validation", "test"]
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset", "config", "split"],
                }
            ),
            types.Tool(
                name="get_parquet",
                description="Export Hugging Face dataset split as Parquet file",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset"],
                }
            ),
            types.Tool(
                name="validate",
                description="Check if a Hugging Face dataset exists and is accessible",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string", 
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset"],
                }
            ),
        ]
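
The page-to-offset mapping used by the get_rows helper above can be sketched in isolation. This is a minimal standard-library sketch, not the server's code: rows_params and build_rows_url are hypothetical names, and the base URL is an assumption, since the excerpt does not show how self.client is configured.

```python
from urllib.parse import urlencode

ROWS_PER_PAGE = 100  # page size used by the get_rows helper
# Assumed base URL; the excerpt does not show the client's configuration.
BASE_URL = "https://datasets-server.huggingface.co"


def rows_params(dataset: str, config: str, split: str, page: int = 0) -> dict:
    """Hypothetical helper: build the /rows query parameters for a 0-based page."""
    return {
        "dataset": dataset,
        "config": config,
        "split": split,
        "offset": page * ROWS_PER_PAGE,  # first row index of the page
        "length": ROWS_PER_PAGE,
    }


def build_rows_url(dataset: str, config: str, split: str, page: int = 0) -> str:
    """Hypothetical helper: full request URL for one page of rows."""
    query = urlencode(rows_params(dataset, config, split, page))
    return f"{BASE_URL}/rows?{query}"
```

With page=2 the offset is 200 and the length is 100, so the request covers row indices 200 through 299.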

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/privetin/dataset-viewer'
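
The same request can be sketched in Python with the standard library. The function names build_request and fetch_server_info are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed by this page.

```python
import json
import urllib.request

SERVER_URL = "https://glama.ai/api/mcp/v1/servers/privetin/dataset-viewer"


def build_request(url: str = SERVER_URL) -> urllib.request.Request:
    """Hypothetical helper: build the GET request that the curl command sends."""
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Accept": "application/json"},
    )


def fetch_server_info(url: str = SERVER_URL, timeout: float = 10.0):
    """Hypothetical helper: send the request and decode the body.

    Assumes the response body is JSON.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_server_info() performs the network request; build_request() alone only constructs it, which is convenient for inspecting the URL and headers first.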

If you have feedback or need assistance with the MCP directory API, please join our Discord server