Which integrations are available for this server?

Provides tools for interacting with the Hugging Face Dataset Viewer API, allowing users to search datasets, fetch metadata, retrieve rows, and perform full-text search or SQL-like filtering on dataset content.

How do I use HF Dataset MCP?

1. Click on "Install Server". 2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state. 3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@HF Dataset MCP Search for the top 5 most downloaded text classification datasets" That's it! The server will respond to your query, and you can continue using it as needed. Here is a step-by-step guide with screenshots.

HF Dataset MCP

by cfahlgren1

Overview Schema Related Servers Score Discussions

TypeScript

Remote

HF Dataset MCP

MCP server for the Hugging Face Dataset Viewer API. Search datasets, fetch rows, filter data, and more.

Installation

npx @cfahlgren1/hf-dataset-mcp

Related MCP server: Hugging Face Hub Semantic Search MCP

Configuration

Claude Desktop

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "hf-datasets": {
      "command": "npx",
      "args": ["-y", "@cfahlgren1/hf-dataset-mcp"],
      "env": {
        "HF_TOKEN": "hf_..."
      }
    }
  }
}

Environment Variables

Variable	Description
`HF_TOKEN`	Hugging Face API token (required for private/gated datasets)
`HF_DATASETS_SERVER`	Custom Dataset Viewer API URL (default: `https://datasets-server.huggingface.co`)

Tools

`search_datasets`

Find datasets on the Hugging Face Hub by name, tag, or author.

search_datasets(search?: string, author?: string, filter?: string[], sort?: string, limit?: number)

`validate_dataset`

Check if a dataset is accessible and which viewer features are available.

validate_dataset(dataset: string)

`list_splits`

Get all available configurations and splits for a dataset.

list_splits(dataset: string)

`get_dataset_info`

Get the schema, metadata, and row counts for a dataset configuration.

get_dataset_info(dataset: string, config: string)

`get_rows`

Fetch a slice of rows from a dataset split.

get_rows(dataset: string, config: string, split: string, offset?: number, length?: number)

`search_dataset`

Full-text search within a dataset split using BM25 ranking.

search_dataset(dataset: string, config: string, split: string, query: string, offset?: number, length?: number)

`filter_rows`

Filter dataset rows using SQL-like WHERE conditions.

filter_rows(dataset: string, config: string, split: string, where: string, orderby?: string, offset?: number, length?: number)

WHERE syntax: Column names in double quotes, strings in single quotes. Supports =, <>, >, <, >=, <=, AND, OR, NOT.

Example: "label"=1 AND "text" LIKE '%hello%'

`get_dataset_size`

Get row counts and byte sizes for all configs and splits.

get_dataset_size(dataset: string)

`list_parquet_files`

Get URLs for the dataset's Parquet files for direct download or processing.

list_parquet_files(dataset: string)

`get_statistics`

Get descriptive statistics for each column in a dataset split.

get_statistics(dataset: string, config: string, split: string)

Examples

Find text classification datasets

search_datasets(filter: ["task_categories:text-classification"], sort: "downloads", limit: 10)

Get IMDB dataset info

list_splits(dataset: "stanfordnlp/imdb")
get_dataset_info(dataset: "stanfordnlp/imdb", config: "plain_text")

Fetch rows from a dataset

get_rows(dataset: "stanfordnlp/imdb", config: "plain_text", split: "train", offset: 0, length: 10)

Search for specific content

search_dataset(dataset: "stanfordnlp/imdb", config: "plain_text", split: "train", query: "amazing movie")

Filter rows

filter_rows(dataset: "stanfordnlp/imdb", config: "plain_text", split: "train", where: "\"label\"=1", length: 10)

License

MIT

Install Server

license - not found

quality

maintenance

How are these scores calculated?

Maintenance

–Maintainers

–Response time

–Release cycle

–Releases (12mo)

Commit activity

Resources

GitHub Repository

Need Help?

Related Servers

Unclaimed servers have limited discoverability.

Looking for Admin?

If you are the server author, to access and configure the admin panel.

Tools

View all tools

Appeared in Searches

Informational MCP Servers with Croissant Dataset Format Support

Latest Blog Posts

Your AI Chatbot Just Exposed Your CEO's Salary to an Intern
By Om-Shree-0709 on July 2, 2026.
Agent Identity
MCP Security
OAuth Delegation
Why MCP Servers Need Execution Sandboxing (And Why Your Current Stack Isn't Enough)
By Om-Shree-0709 on June 30, 2026.
Agentic Ai
Prompt Injection
WebAssembly
Lightport: Open-Sourcing Glama's AI Gateway
By punkpeye on April 27, 2026.
OpenAI
open source

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/cfahlgren1/hf-dataset-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server