Dataset Viewer MCP Server

by privetin

search_dataset

Search for specific text within a single split of a Hugging Face dataset by specifying the dataset identifier, configuration, split, and query. The matching results are returned as JSON-formatted text.

Instructions

Search for text within a Hugging Face dataset

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| auth_token | No | Hugging Face auth token for private/gated datasets | |
| config | Yes | Dataset configuration/subset name. Use get_info to list available configs | |
| dataset | Yes | Hugging Face dataset identifier in the format owner/dataset | |
| query | Yes | Text to search for in the dataset | |
| split | Yes | Dataset split name. Splits partition the data for training/evaluation | |
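
As a rough illustration of the schema above, the following sketch checks a set of arguments before they are sent to the tool. It is not part of the server; `check_search_arguments` is a hypothetical helper, and the example values (including the `plain_text` config) are illustrative only. The required names and the `owner/dataset` pattern are taken from the tool's input schema.

```python
import re

# Mirrors the "required" list and the dataset "pattern" from the tool's input schema.
REQUIRED = ("dataset", "config", "split", "query")
DATASET_PATTERN = re.compile(r"^[^/]+/[^/]+$")


def check_search_arguments(arguments: dict) -> list[str]:
    """Hypothetical helper: return a list of problems; an empty list means the arguments look valid."""
    problems = [
        f"missing required argument: {name}"
        for name in REQUIRED
        if name not in arguments
    ]
    # Every parameter in the schema is declared as a string.
    for name, value in arguments.items():
        if not isinstance(value, str):
            problems.append(f"{name} must be a string")
    dataset = arguments.get("dataset")
    if isinstance(dataset, str) and not DATASET_PATTERN.match(dataset):
        problems.append("dataset must look like owner/dataset")
    return problems


# Illustrative values only; use get_info to confirm the real config and split names.
example_arguments = {
    "dataset": "stanfordnlp/imdb",
    "config": "plain_text",
    "split": "train",
    "query": "wonderful",
}

print(check_search_arguments(example_arguments))  # → []
```

The optional `auth_token` is left out here; it only needs to be supplied for private or gated datasets.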

Implementation Reference

  • Handler in @server.call_tool() that extracts arguments and calls DatasetViewerAPI.search(), returning JSON-formatted results.
    elif name == "search_dataset":
        dataset = arguments["dataset"]
        config = arguments["config"]
        split = arguments["split"]
        query = arguments["query"]
        search_result = await DatasetViewerAPI(auth_token=auth_token).search(
            dataset, config=config, split=split, query=query
        )
        return [
            types.TextContent(
                type="text",
                text=json.dumps(search_result, indent=2)
            )
        ]
  • Input schema definition for the search_dataset tool, specifying parameters like dataset, config, split, query.
    name="search_dataset",
    description="Search for text within a Hugging Face dataset",
    inputSchema={
        "type": "object",
        "properties": {
            "dataset": {
                "type": "string",
                "description": "Hugging Face dataset identifier in the format owner/dataset",
                "pattern": "^[^/]+/[^/]+$",
                "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
            },
            "config": {
                "type": "string",
                "description": "Dataset configuration/subset name. Use get_info to list available configs",
                "examples": ["default", "en", "es"]
            },
            "split": {
                "type": "string",
                "description": "Dataset split name. Splits partition the data for training/evaluation",
                "examples": ["train", "validation", "test"]
            },
            "query": {"type": "string", "description": "Text to search for in the dataset"},
            "auth_token": {
                "type": "string",
                "description": "Hugging Face auth token for private/gated datasets",
                "optional": True
            }
        },
        "required": ["dataset", "config", "split", "query"],
    }
    ),
  • Registration of the search_dataset tool in the @server.list_tools() return list.
    types.Tool(
        name="search_dataset",
        description="Search for text within a Hugging Face dataset",
        inputSchema={
            "type": "object",
            "properties": {
                "dataset": {
                    "type": "string",
                    "description": "Hugging Face dataset identifier in the format owner/dataset",
                    "pattern": "^[^/]+/[^/]+$",
                    "examples": ["ylecun/mnist", "stanfordnlp/imdb"]
                },
                "config": {
                    "type": "string",
                    "description": "Dataset configuration/subset name. Use get_info to list available configs",
                    "examples": ["default", "en", "es"]
                },
                "split": {
                    "type": "string",
                    "description": "Dataset split name. Splits partition the data for training/evaluation",
                    "examples": ["train", "validation", "test"]
                },
                "query": {"type": "string", "description": "Text to search for in the dataset"},
                "auth_token": {
                    "type": "string",
                    "description": "Hugging Face auth token for private/gated datasets",
                    "optional": True
                }
            },
            "required": ["dataset", "config", "split", "query"],
        }
    ),
  • Core search method in DatasetViewerAPI class that queries the Hugging Face dataset viewer /search endpoint.
    async def search(self, dataset: str, config: str, split: str, query: str) -> dict:
        """Search for text within a dataset split"""
        params = {
            "dataset": dataset,
            "config": config,
            "split": split,
            "query": query
        }
        response = await self.client.get("/search", params=params)
        response.raise_for_status()
        return response.json()
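
To make the request issued by the search method more concrete, here is a minimal sketch that builds the equivalent URL and headers without sending anything. The base URL and the `Authorization: Bearer` header are assumptions (the excerpt only shows the relative `/search` path and does not show how `auth_token` is applied), and `build_search_request` is a hypothetical helper rather than part of the server.

```python
from urllib.parse import urlencode

# Assumption: the public dataset viewer host; the excerpt shows only the relative "/search" path.
BASE_URL = "https://datasets-server.huggingface.co"


def build_search_request(dataset, config, split, query, auth_token=None):
    """Hypothetical helper: return the (url, headers) pair for a /search call."""
    # Same four query parameters the search method passes to client.get().
    params = {"dataset": dataset, "config": config, "split": split, "query": query}
    url = f"{BASE_URL}/search?{urlencode(params)}"
    # Assumption: the token is sent as a bearer token when one is supplied.
    headers = {"Authorization": f"Bearer {auth_token}"} if auth_token else {}
    return url, headers


# Illustrative values only.
url, headers = build_search_request("stanfordnlp/imdb", "plain_text", "train", "great movie")
print(url)
```

Building the URL separately from sending it keeps the sketch runnable offline; the real method delegates both steps to its HTTP client and raises on a non-success status via `raise_for_status()`.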

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/privetin/dataset-viewer'

If you have feedback or need assistance with the MCP directory API, please join our Discord server