HF Dataset MCP
Provides tools for interacting with the Hugging Face Dataset Viewer API, allowing users to search datasets, fetch metadata, retrieve rows, and perform full-text search or SQL-like filtering on dataset content.
MCP server for the Hugging Face Dataset Viewer API. Search datasets, fetch rows, filter data, and more.
Installation
npx @cfahlgren1/hf-dataset-mcp
Configuration
Claude Desktop
Add the following to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
  "mcpServers": {
    "hf-datasets": {
      "command": "npx",
      "args": ["-y", "@cfahlgren1/hf-dataset-mcp"],
      "env": {
        "HF_TOKEN": "hf_..."
      }
    }
  }
}
Environment Variables
| Variable | Description |
| --- | --- |
| HF_TOKEN | Hugging Face API token (required for private/gated datasets) |
|  | Custom Dataset Viewer API URL (default: …) |
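The token in HF_TOKEN only matters once it is attached to outgoing requests. A minimal sketch, assuming the usual Hugging Face `Authorization: Bearer <token>` header; `buildHeaders` is a hypothetical helper for illustration, not this package's code.

```typescript
// Hypothetical helper: builds request headers for the Hugging Face APIs.
// Assumption: the token is sent as a Bearer token.
function buildHeaders(token?: string): Record<string, string> {
  const headers: Record<string, string> = { Accept: "application/json" };
  // Attach credentials only when a token is configured; public datasets need none.
  if (token && token.trim() !== "") {
    headers["Authorization"] = `Bearer ${token.trim()}`;
  }
  return headers;
}

// In Node, the token would typically come from process.env.HF_TOKEN.
console.log(buildHeaders("hf_example"));
```

Leaving the header off entirely when no token is set keeps anonymous access to public datasets working.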
Tools
search_datasets
Find datasets on the Hugging Face Hub by name, tag, or author.
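If search_datasets forwards its parameters to the Hub's public dataset listing endpoint (https://huggingface.co/api/datasets), which is an assumption about this server's internals, the request URL could be assembled as follows. `SearchOptions` and `buildSearchUrl` are hypothetical names.

```typescript
// Hypothetical options type mirroring the search_datasets parameters.
interface SearchOptions {
  search?: string;
  author?: string;
  filter?: string[];
  sort?: string;
  limit?: number;
}

// Builds a query URL for the Hub's dataset listing endpoint (assumed mapping).
function buildSearchUrl(opts: SearchOptions): string {
  const params: string[] = [];
  const add = (key: string, value: string) =>
    params.push(`${encodeURIComponent(key)}=${encodeURIComponent(value)}`);

  if (opts.search) add("search", opts.search);
  if (opts.author) add("author", opts.author);
  // Each tag becomes its own repeated `filter` parameter.
  for (const tag of opts.filter ?? []) add("filter", tag);
  if (opts.sort) add("sort", opts.sort);
  if (opts.limit !== undefined) add("limit", String(opts.limit));

  const base = "https://huggingface.co/api/datasets";
  return params.length > 0 ? `${base}?${params.join("&")}` : base;
}

console.log(buildSearchUrl({ filter: ["task_categories:text-classification"], sort: "downloads", limit: 10 }));
// https://huggingface.co/api/datasets?filter=task_categories%3Atext-classification&sort=downloads&limit=10
```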
search_datasets(search?: string, author?: string, filter?: string[], sort?: string, limit?: number)
validate_dataset
Check whether a dataset is accessible and which viewer features it supports.
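A client could use the validation result to decide which tools are worth calling. This sketch assumes the Dataset Viewer `/is-valid` response carries one boolean per feature (`preview`, `viewer`, `search`, `filter`, `statistics`); `availableFeatures` is a hypothetical helper.

```typescript
// Assumed response shape: one boolean flag per viewer feature.
interface IsValidResponse {
  preview: boolean;
  viewer: boolean;
  search: boolean;
  filter: boolean;
  statistics: boolean;
}

// Returns the names of enabled features, in a fixed order.
function availableFeatures(res: IsValidResponse): string[] {
  const order: (keyof IsValidResponse)[] = ["preview", "viewer", "search", "filter", "statistics"];
  return order.filter((feature) => res[feature]);
}

const example: IsValidResponse = { preview: true, viewer: true, search: true, filter: false, statistics: false };
console.log(availableFeatures(example).join(", ")); // preview, viewer, search
```

Under this assumption, a dataset reporting `filter: false` would presumably not support filter_rows, so checking first avoids a failing call.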
validate_dataset(dataset: string)
list_splits
Get all available configurations and splits for a dataset.
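The `config` and `split` arguments used by later tools come from this listing. Assuming the result is a flat list of `{ dataset, config, split }` entries (as in the Dataset Viewer `/splits` response), a hypothetical `groupSplits` helper can index split names by configuration:

```typescript
// Assumed shape of one entry in the splits listing.
interface SplitEntry {
  dataset: string;
  config: string;
  split: string;
}

// Groups a flat list of (config, split) pairs into config -> split names.
function groupSplits(entries: SplitEntry[]): Map<string, string[]> {
  const byConfig = new Map<string, string[]>();
  for (const { config, split } of entries) {
    const splits = byConfig.get(config) ?? [];
    splits.push(split);
    byConfig.set(config, splits);
  }
  return byConfig;
}

// Illustrative input only.
const grouped = groupSplits([
  { dataset: "stanfordnlp/imdb", config: "plain_text", split: "train" },
  { dataset: "stanfordnlp/imdb", config: "plain_text", split: "test" },
]);
console.log(grouped.get("plain_text")?.join(", ")); // train, test
```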
list_splits(dataset: string)
get_dataset_info
Get the schema, metadata, and row counts for a dataset configuration.
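The schema part of the result is a map from column name to a feature description. This sketch assumes the Hugging Face `datasets` feature encoding (`_type` of `"Value"` with a `dtype`, or `"ClassLabel"` with `names`) and simplifies everything else; `describeColumns` is a hypothetical helper.

```typescript
// Assumed, simplified shape of one column entry in the schema.
interface Feature {
  _type: string;    // e.g. "Value" or "ClassLabel"
  dtype?: string;   // present for "Value" features
  names?: string[]; // present for "ClassLabel" features
}

// Produces one readable line per column.
function describeColumns(features: Record<string, Feature>): string[] {
  return Object.entries(features).map(([name, f]) => {
    if (f._type === "Value" && f.dtype) return `${name}: ${f.dtype}`;
    if (f._type === "ClassLabel" && f.names) return `${name}: ClassLabel(${f.names.join(", ")})`;
    // Fall back to the raw feature type for anything else (images, sequences, ...).
    return `${name}: ${f._type}`;
  });
}

// Illustrative input only.
console.log(
  describeColumns({
    text: { _type: "Value", dtype: "string" },
    label: { _type: "ClassLabel", names: ["neg", "pos"] },
  }).join("; "),
); // text: string; label: ClassLabel(neg, pos)
```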
get_dataset_info(dataset: string, config: string)
get_rows
Fetch a slice of rows from a dataset split.
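Reading more rows than a single call returns means paging with `offset` and `length`. This sketch assumes a cap of 100 rows per call, the documented maximum for the Dataset Viewer `/rows` endpoint; whether get_rows enforces the same cap is an assumption. `pageWindows` is a hypothetical helper.

```typescript
// Assumption: at most 100 rows per get_rows call.
const MAX_PAGE = 100;

interface Page {
  offset: number;
  length: number;
}

// Splits the row range [start, start + count) into get_rows-sized pages.
function pageWindows(start: number, count: number, pageSize: number = MAX_PAGE): Page[] {
  if (pageSize <= 0) throw new RangeError("pageSize must be positive");
  const pages: Page[] = [];
  for (let offset = start; offset < start + count; offset += pageSize) {
    // The last page may be shorter than pageSize.
    pages.push({ offset, length: Math.min(pageSize, start + count - offset) });
  }
  return pages;
}

console.log(pageWindows(0, 250).map((p) => `${p.offset}+${p.length}`).join(", "));
// 0+100, 100+100, 200+50
```

Each window maps directly onto one `get_rows(dataset, config, split, offset, length)` call.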
get_rows(dataset: string, config: string, split: string, offset?: number, length?: number)
search_dataset
Full-text search within a dataset split using BM25 ranking.
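Assuming search_dataset maps onto the Dataset Viewer `/search` endpoint on the public host https://datasets-server.huggingface.co (a custom API URL would replace `base`), the request URL could be built like this. `buildSearchRowsUrl` is a hypothetical helper; undefined optional arguments are simply omitted.

```typescript
// Hypothetical helper: the /search URL for the arguments search_dataset takes.
function buildSearchRowsUrl(
  dataset: string,
  config: string,
  split: string,
  query: string,
  offset?: number,
  length?: number,
  base: string = "https://datasets-server.huggingface.co",
): string {
  const params: [string, string | number | undefined][] = [
    ["dataset", dataset],
    ["config", config],
    ["split", split],
    ["query", query],
    ["offset", offset],
    ["length", length],
  ];
  const qs = params
    .filter(([, v]) => v !== undefined) // drop unset optional parameters
    .map(([k, v]) => `${k}=${encodeURIComponent(String(v))}`)
    .join("&");
  return `${base}/search?${qs}`;
}

console.log(buildSearchRowsUrl("stanfordnlp/imdb", "plain_text", "train", "amazing movie"));
// https://datasets-server.huggingface.co/search?dataset=stanfordnlp%2Fimdb&config=plain_text&split=train&query=amazing%20movie
```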
search_dataset(dataset: string, config: string, split: string, query: string, offset?: number, length?: number)
filter_rows
Filter dataset rows using SQL-like WHERE conditions.
filter_rows(dataset: string, config: string, split: string, where: string, orderby?: string, offset?: number, length?: number)
WHERE syntax: column names in double quotes, string values in single quotes. Supports =, <>, >, <, >=, <=, AND, OR, NOT.
Example: "label"=1 AND "text" LIKE '%hello%'
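Hand-writing `where` strings gets error-prone once values contain quotes. A sketch of hypothetical helpers that follow the quoting rules above; escaping an embedded quote by doubling it is standard SQL and is assumed, not confirmed, to be accepted by the API.

```typescript
// Column names go in double quotes; embedded double quotes are doubled (assumed escape).
function quoteColumn(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Numbers stay bare; strings go in single quotes with embedded single quotes doubled.
function quoteLiteral(value: string | number): string {
  return typeof value === "number" ? String(value) : `'${value.replace(/'/g, "''")}'`;
}

type Op = "=" | "<>" | ">" | "<" | ">=" | "<=";

function condition(column: string, op: Op, value: string | number): string {
  return `${quoteColumn(column)}${op}${quoteLiteral(value)}`;
}

// Combines comparisons with AND.
function allOf(...conditions: string[]): string {
  return conditions.join(" AND ");
}

console.log(allOf(condition("label", "=", 1), condition("text", "<>", "it's fine")));
// "label"=1 AND "text"<>'it''s fine'
```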
get_dataset_size
Get row counts and byte sizes for all configs and splits.
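The size report is easier to read with human-friendly units. This sketch assumes each split entry exposes `config`, `split`, `num_rows`, and `num_bytes_parquet_files`, as the Dataset Viewer `/size` response does; `formatBytes` and `summarizeSizes` are hypothetical helpers, and the values in the usage line are made up.

```typescript
// Assumed subset of one per-split entry in the size report.
interface SplitSize {
  config: string;
  split: string;
  num_rows: number;
  num_bytes_parquet_files: number;
}

// Formats a byte count using binary units (KiB, MiB, ...).
function formatBytes(bytes: number): string {
  const units = ["B", "KiB", "MiB", "GiB", "TiB"];
  let value = bytes;
  let unit = 0;
  while (value >= 1024 && unit < units.length - 1) {
    value /= 1024;
    unit++;
  }
  return `${value.toFixed(unit === 0 ? 0 : 1)} ${units[unit]}`;
}

// One summary line per config/split pair.
function summarizeSizes(splits: SplitSize[]): string[] {
  return splits.map(
    (s) => `${s.config}/${s.split}: ${s.num_rows} rows, ${formatBytes(s.num_bytes_parquet_files)} as Parquet`,
  );
}

console.log(summarizeSizes([{ config: "plain_text", split: "train", num_rows: 1000, num_bytes_parquet_files: 3145728 }])[0]);
// plain_text/train: 1000 rows, 3.0 MiB as Parquet
```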
get_dataset_size(dataset: string)
list_parquet_files
Get URLs for the dataset's Parquet files for direct download or processing.
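Since the listing covers every config and split, downstream processing usually needs only one slice of it. This sketch assumes each entry carries `config`, `split`, `url`, and `size` (bytes), as in the Dataset Viewer `/parquet` response; `parquetUrlsFor` is a hypothetical helper and the URLs in the test data are placeholders.

```typescript
// Assumed subset of one entry in the Parquet file listing.
interface ParquetFile {
  config: string;
  split: string;
  url: string;
  size: number; // bytes
}

// Selects the download URLs for one config/split, plus their combined size in bytes.
function parquetUrlsFor(
  files: ParquetFile[],
  config: string,
  split: string,
): { urls: string[]; totalBytes: number } {
  const matching = files.filter((f) => f.config === config && f.split === split);
  return {
    urls: matching.map((f) => f.url),
    totalBytes: matching.reduce((sum, f) => sum + f.size, 0),
  };
}
```

The returned URLs can then be handed to any Parquet reader that accepts HTTP sources.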
list_parquet_files(dataset: string)
get_statistics
Get descriptive statistics for each column in a dataset split.
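One common use of column statistics is spotting incomplete columns. This sketch assumes each column entry has a `column_name` and a `column_statistics` object that may include `nan_count` and `nan_proportion`, as in the Dataset Viewer `/statistics` response; `columnsWithMissing` is a hypothetical helper.

```typescript
// Assumed subset of one per-column entry in the statistics result.
interface ColumnStats {
  column_name: string;
  column_type: string;
  column_statistics: { nan_count?: number; nan_proportion?: number };
}

// Lists columns that contain missing values, most incomplete first.
function columnsWithMissing(stats: ColumnStats[]): string[] {
  const proportion = (c: ColumnStats) => c.column_statistics.nan_proportion ?? 0;
  return stats
    .filter((c) => (c.column_statistics.nan_count ?? 0) > 0)
    .sort((a, b) => proportion(b) - proportion(a))
    .map((c) => `${c.column_name} (${(proportion(c) * 100).toFixed(1)}% missing)`);
}
```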
get_statistics(dataset: string, config: string, split: string)
Examples
Find text classification datasets
search_datasets(filter: ["task_categories:text-classification"], sort: "downloads", limit: 10)
Get IMDB dataset info
list_splits(dataset: "stanfordnlp/imdb")
get_dataset_info(dataset: "stanfordnlp/imdb", config: "plain_text")
Fetch rows from a dataset
get_rows(dataset: "stanfordnlp/imdb", config: "plain_text", split: "train", offset: 0, length: 10)
Search for specific content
search_dataset(dataset: "stanfordnlp/imdb", config: "plain_text", split: "train", query: "amazing movie")
Filter rows
filter_rows(dataset: "stanfordnlp/imdb", config: "plain_text", split: "train", where: "\"label\"=1", length: 10)
License
MIT