Dataset Viewer MCP Server

by privetin

get_rows

Retrieve paginated data rows from Hugging Face datasets by specifying dataset identifier, configuration, and split for browsing or analysis.

Instructions

Get paginated rows from a Hugging Face dataset

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| dataset | Yes | Hugging Face dataset identifier in the format owner/dataset | |
| config | Yes | Dataset configuration/subset name. Use get_info to list available configs | |
| split | Yes | Dataset split name. Splits partition the data for training/evaluation | |
| page | No | Page number (0-based), returns 100 rows per page | 0 |
| auth_token | No | Hugging Face auth token for private/gated datasets | |
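
As an illustration of how a caller might check arguments before invoking the tool, here is a minimal sketch using only the standard library. The function name `validate_get_rows_args` is hypothetical and is not part of the server. The dataset pattern is copied from the tool schema; the requirement that `page` be non-negative is an assumption inferred from the "0-based" wording, since the `get_rows` schema itself declares no minimum.

```python
import re

# Same pattern the tool schema declares for the "dataset" property.
DATASET_PATTERN = re.compile(r"^[^/]+/[^/]+$")
REQUIRED = ("dataset", "config", "split")


def validate_get_rows_args(arguments: dict) -> dict:
    """Hypothetical helper: check arguments against the documented schema.

    Returns a copy of the arguments with the default page filled in.
    """
    missing = [name for name in REQUIRED if name not in arguments]
    if missing:
        raise ValueError(f"missing required parameter(s): {', '.join(missing)}")
    for name in REQUIRED:
        if not isinstance(arguments[name], str):
            raise ValueError(f"{name} must be a string")
    if not DATASET_PATTERN.match(arguments["dataset"]):
        raise ValueError("dataset must look like owner/dataset")
    page = arguments.get("page", 0)
    # bool is a subclass of int in Python, so reject it explicitly.
    # Assumption: negative pages are invalid because pages are 0-based.
    if isinstance(page, bool) or not isinstance(page, int) or page < 0:
        raise ValueError("page must be a non-negative integer")
    return {**arguments, "page": page}
```

Calling it with `{"dataset": "ylecun/mnist", "config": "mnist", "split": "train"}` would return the same arguments with `page` set to 0, while a dataset value without an owner prefix would raise `ValueError`.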

Implementation Reference

  • Tool handler dispatch logic that extracts parameters, calls the DatasetViewerAPI.get_rows method, formats the result as JSON, and returns it as TextContent.
    elif name == "get_rows":
        dataset = arguments["dataset"]
        config = arguments["config"]
        split = arguments["split"]
        page = arguments.get("page", 0)
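        # Note: auth_token is not defined in this excerpt; it is presumably resolved earlier in the handler.
        rows = await DatasetViewerAPI(auth_token=auth_token).get_rows(dataset, config=config, split=split, page=page)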
        return [
            types.TextContent(
                type="text",
                text=json.dumps(rows, indent=2)
            )
        ]
  • Core helper function implementing the logic to fetch paginated dataset rows via HTTP request to the dataset viewer API endpoint /rows.
    async def get_rows(self, dataset: str, config: str, split: str, page: int = 0) -> dict:
        """Get paginated rows of a dataset"""
        params = {
            "dataset": dataset,
            "config": config,
            "split": split,
            "offset": page * 100,  # 100 rows per page
            "length": 100
        }
        response = await self.client.get("/rows", params=params)
        response.raise_for_status()
        return response.json()
  • Tool schema definition including name, description, and input schema for parameter validation.
    types.Tool(
        name="get_rows",
        description="Get paginated rows from a Hugging Face dataset",
        inputSchema={
            "type": "object",
            "properties": {
                "dataset": {
                    "type": "string",
                    "description": "Hugging Face dataset identifier in the format owner/dataset",
                    "pattern": "^[^/]+/[^/]+$",
                    "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                },
                "config": {
                    "type": "string",
                    "description": "Dataset configuration/subset name. Use get_info to list available configs",
                    "examples": ["default", "en", "es"]
                },
                "split": {
                    "type": "string",
                    "description": "Dataset split name. Splits partition the data for training/evaluation",
                    "examples": ["train", "validation", "test"]
                },
                "page": {"type": "integer", "description": "Page number (0-based), returns 100 rows per page", "default": 0},
                "auth_token": {
                    "type": "string",
                    "description": "Hugging Face auth token for private/gated datasets",
                    "optional": True
                }
            },
            "required": ["dataset", "config", "split"],
        }
    ),
  • Registration of all tools including get_rows via the list_tools handler that returns the list of Tool objects.
    @server.list_tools()
    async def handle_list_tools() -> list[types.Tool]:
        """List available dataset tools for Hugging Face datasets"""
        return [
            types.Tool(
                name="get_info",
                description="Get detailed information about a Hugging Face dataset including description, features, splits, and statistics. Run validate first to check if the dataset exists and is accessible.",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset"],
                }
            ),
            types.Tool(
                name="get_rows",
                description="Get paginated rows from a Hugging Face dataset",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "config": {
                            "type": "string",
                            "description": "Dataset configuration/subset name. Use get_info to list available configs",
                            "examples": ["default", "en", "es"]
                        },
                        "split": {
                            "type": "string",
                            "description": "Dataset split name. Splits partition the data for training/evaluation",
                            "examples": ["train", "validation", "test"]
                        },
                        "page": {"type": "integer", "description": "Page number (0-based), returns 100 rows per page", "default": 0},
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset", "config", "split"],
                }
            ),
            types.Tool(
                name="get_first_rows",
                description="Get first rows from a Hugging Face dataset split",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "config": {
                            "type": "string",
                            "description": "Dataset configuration/subset name. Use get_info to list available configs",
                            "examples": ["default", "en", "es"]
                        },
                        "split": {
                            "type": "string",
                            "description": "Dataset split name. Splits partition the data for training/evaluation",
                            "examples": ["train", "validation", "test"]
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset", "config", "split"],
                }
            ),
            types.Tool(
                name="search_dataset",
                description="Search for text within a Hugging Face dataset",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "config": {
                            "type": "string",
                            "description": "Dataset configuration/subset name. Use get_info to list available configs",
                            "examples": ["default", "en", "es"]
                        },
                        "split": {
                            "type": "string",
                            "description": "Dataset split name. Splits partition the data for training/evaluation",
                            "examples": ["train", "validation", "test"]
                        },
                        "query": {"type": "string", "description": "Text to search for in the dataset"},
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset", "config", "split", "query"],
                }
            ),
            types.Tool(
                name="filter",
                description="Filter rows in a Hugging Face dataset using SQL-like conditions",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "config": {
                            "type": "string",
                            "description": "Dataset configuration/subset name. Use get_info to list available configs",
                            "examples": ["default", "en", "es"]
                        },
                        "split": {
                            "type": "string",
                            "description": "Dataset split name. Splits partition the data for training/evaluation",
                            "examples": ["train", "validation", "test"]
                        },
                        "where": {
                            "type": "string",
                            "description": "SQL-like WHERE clause to filter rows",
                            "examples": ["column = \"value\"", "score > 0.5", "text LIKE \"%query%\""]
                        },
                        "orderby": {
                            "type": "string",
                            "description": "SQL-like ORDER BY clause to sort results",
                            "optional": True,
                            "examples": ["column ASC", "score DESC", "name ASC, id DESC"]
                        },
                        "page": {
                            "type": "integer",
                            "description": "Page number for paginated results (100 rows per page)",
                            "default": 0,
                            "minimum": 0
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset", "config", "split", "where"],
                }
            ),
            types.Tool(
                name="get_statistics",
                description="Get statistics about a Hugging Face dataset",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "config": {
                            "type": "string",
                            "description": "Dataset configuration/subset name. Use get_info to list available configs",
                            "examples": ["default", "en", "es"]
                        },
                        "split": {
                            "type": "string",
                            "description": "Dataset split name. Splits partition the data for training/evaluation",
                            "examples": ["train", "validation", "test"]
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset", "config", "split"],
                }
            ),
            types.Tool(
                name="get_parquet",
                description="Export Hugging Face dataset split as Parquet file",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string",
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset"],
                }
            ),
            types.Tool(
                name="validate",
                description="Check if a Hugging Face dataset exists and is accessible",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "dataset": {
                            "type": "string", 
                            "description": "Hugging Face dataset identifier in the format owner/dataset",
                            "pattern": "^[^/]+/[^/]+$",
                            "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                        },
                        "auth_token": {
                            "type": "string",
                            "description": "Hugging Face auth token for private/gated datasets",
                            "optional": True
                        }
                    },
                    "required": ["dataset"],
                }
            ),
        ]
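
The page-to-offset mapping used by the `get_rows` helper above can be sketched in isolation. This is a minimal illustration, not the server's code: the function names are hypothetical, and the base URL is an assumption (the excerpt only shows the relative `/rows` path). The total row count passed to `page_count` would have to come from elsewhere, for example from the dataset information.

```python
import math
from urllib.parse import urlencode

ROWS_PER_PAGE = 100  # fixed page size used by the get_rows helper

# Assumption: public dataset viewer base URL; the excerpt only shows "/rows".
BASE_URL = "https://datasets-server.huggingface.co"


def rows_params(dataset: str, config: str, split: str, page: int = 0) -> dict:
    """Build the same query parameters the helper sends for a given page."""
    return {
        "dataset": dataset,
        "config": config,
        "split": split,
        "offset": page * ROWS_PER_PAGE,  # page 0 -> rows 0..99, page 1 -> 100..199
        "length": ROWS_PER_PAGE,
    }


def rows_url(dataset: str, config: str, split: str, page: int = 0) -> str:
    """Hypothetical helper: full request URL for one page of rows."""
    return f"{BASE_URL}/rows?{urlencode(rows_params(dataset, config, split, page))}"


def page_count(total_rows: int) -> int:
    """Number of pages needed to cover total_rows at 100 rows per page."""
    return math.ceil(total_rows / ROWS_PER_PAGE)
```

Because the page size is fixed at 100, the last page of a split is usually partial: a split with 101 rows needs two pages, the second holding a single row.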
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions pagination (100 rows per page) but fails to describe critical behaviors such as rate limits, error handling for invalid inputs, whether it's read-only or has side effects, or what the output format looks like. This leaves significant gaps in understanding how the tool operates.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
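
One way the error-handling gap could be narrowed is to have the tool return a structured error instead of letting an exception propagate. The sketch below is illustrative only: `call_tool_safely` and `fake_get_rows` are hypothetical names, and the stand-in fetcher does not contact any API.

```python
import asyncio
import json


async def call_tool_safely(fetch, **kwargs) -> str:
    """Hypothetical wrapper: run an async fetcher and always return JSON text."""
    try:
        result = await fetch(**kwargs)
    except Exception as exc:  # broad on purpose in this sketch
        return json.dumps({"error": type(exc).__name__, "message": str(exc)}, indent=2)
    return json.dumps(result, indent=2)


async def fake_get_rows(dataset: str, config: str, split: str, page: int = 0) -> dict:
    """Stand-in for the real helper; raises for a negative page."""
    if page < 0:
        raise ValueError("page must be >= 0")
    return {"rows": [], "offset": page * 100}


if __name__ == "__main__":
    text = asyncio.run(
        call_tool_safely(fake_get_rows, dataset="ylecun/mnist", config="mnist", split="train")
    )
    print(text)
```

With this shape, an agent receives the exception type and message as text it can act on, rather than an opaque failure.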

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core functionality ('Get paginated rows from a Hugging Face dataset'). It wastes no words and directly communicates the essential action, making it highly concise and well-structured for quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of a 5-parameter tool with no annotations and no output schema, the description is incomplete. It lacks details on behavioral traits (e.g., rate limits, error handling), usage context relative to siblings, and output format, which are crucial for an AI agent to invoke this tool effectively in a dataset retrieval scenario.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, providing clear documentation for all 5 parameters (e.g., dataset format, config usage, split purpose, page details, auth token). The description adds no additional parameter semantics beyond what's in the schema, so it meets the baseline score of 3 without compensating or detracting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Get paginated rows') and resource ('from a Hugging Face dataset'), making the purpose immediately understandable. However, it doesn't explicitly differentiate this tool from siblings such as 'get_first_rows' or 'search_dataset', which also return rows but with a different scope (the first rows of a split, or rows matching a text query).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives such as 'get_first_rows' (for initial rows), 'search_dataset' (for text search), or 'filter' (for SQL-like conditions). It mentions pagination but doesn't say when paginated retrieval is preferable to those methods, leaving the agent without context for tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
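
As a sketch of what fuller usage guidance might look like, the definition below rewords the description using only facts visible in the implementation reference (a GET request to /rows, 100 rows per page, JSON returned as text, and the sibling tool names). The wording and the `GET_ROWS_TOOL` name are hypothetical, and the schema is abbreviated to keep the example short.

```python
# One possible rewording of the get_rows description; not the author's text.
GET_ROWS_TOOL = {
    "name": "get_rows",
    "description": (
        "Get paginated rows from a Hugging Face dataset split. "
        "Read-only: issues a GET request to the dataset viewer /rows endpoint. "
        "Returns the JSON response as text, 100 rows per page (page is 0-based). "
        "Use get_first_rows for a quick preview, search_dataset for text search, "
        "and filter for SQL-like conditions."
    ),
    "inputSchema": {
        "type": "object",
        # Property definitions omitted here; see the full schema above.
        "required": ["dataset", "config", "split"],
    },
}
```

This trades some conciseness for explicit tool-selection hints, which is the balance the rubric's Conciseness and Usage Guidelines dimensions pull against each other.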

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/privetin/dataset-viewer'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.