Dataset Viewer MCP Server
by privetin
# Dataset Viewer MCP Server
An MCP server for interacting with the [Hugging Face Dataset Viewer API](https://huggingface.co/docs/dataset-viewer), providing capabilities to browse and analyze datasets hosted on the Hugging Face Hub.
## Features
### Resources
- Uses `dataset://` URI scheme for accessing Hugging Face datasets
- Supports dataset configurations and splits
- Provides paginated access to dataset contents
- Handles authentication for private datasets
- Supports searching and filtering dataset contents
- Provides dataset statistics and analysis
### Tools
The server provides the following tools:
1. **validate**
- Check if a dataset exists and is accessible
- Parameters:
- `dataset`: Dataset identifier (e.g. 'stanfordnlp/imdb')
- `auth_token` (optional): For private datasets
2. **get_info**
- Get detailed information about a dataset
- Parameters:
- `dataset`: Dataset identifier
- `auth_token` (optional): For private datasets
3. **get_rows**
- Get paginated contents of a dataset
- Parameters:
- `dataset`: Dataset identifier
- `config`: Configuration name
- `split`: Split name
- `page` (optional): Page number (0-based)
- `auth_token` (optional): For private datasets
4. **get_first_rows**
- Get first rows from a dataset split
- Parameters:
- `dataset`: Dataset identifier
- `config`: Configuration name
- `split`: Split name
- `auth_token` (optional): For private datasets
5. **get_statistics**
- Get statistics about a dataset split
- Parameters:
- `dataset`: Dataset identifier
- `config`: Configuration name
- `split`: Split name
- `auth_token` (optional): For private datasets
6. **search_dataset**
- Search for text within a dataset
- Parameters:
- `dataset`: Dataset identifier
- `config`: Configuration name
- `split`: Split name
- `query`: Text to search for
- `auth_token` (optional): For private datasets
7. **filter**
- Filter rows using SQL-like conditions
- Parameters:
- `dataset`: Dataset identifier
- `config`: Configuration name
- `split`: Split name
- `where`: SQL WHERE clause (e.g. "score > 0.5")
- `orderby` (optional): SQL ORDER BY clause
- `page` (optional): Page number (0-based)
- `auth_token` (optional): For private datasets
8. **get_parquet**
- Download entire dataset in Parquet format
- Parameters:
- `dataset`: Dataset identifier
- `auth_token` (optional): For private datasets
## Installation
### Prerequisites
- Python 3.12 or higher
- [uv](https://github.com/astral-sh/uv) - Fast Python package installer and resolver
### Setup
1. Clone the repository:
```bash
git clone https://github.com/privetin/dataset-viewer.git
cd dataset-viewer
```
2. Create a virtual environment and install:
```bash
# Create virtual environment
uv venv
# Activate virtual environment
# On Unix:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate
# Install in development mode
uv add -e .
```
## Configuration
### Environment Variables
- `HUGGINGFACE_TOKEN`: Your Hugging Face API token for accessing private datasets
### Claude Desktop Integration
Add the following to your Claude Desktop config file:
On Windows: `%APPDATA%\Claude\claude_desktop_config.json`
On MacOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
```json
{
"mcpServers": {
"dataset-viewer": {
"command": "uv",
"args": [
"run",
"dataset-viewer"
]
}
}
}
```
## Usage Examples
1. Validate a dataset:
```json
{
"dataset": "stanfordnlp/imdb"
}
```
2. Get dataset information:
```json
{
"dataset": "stanfordnlp/imdb"
}
```
3. Search dataset contents:
```json
{
"dataset": "stanfordnlp/imdb",
"config": "plain_text",
"split": "train",
"query": "great movie"
}
```
4. Filter and sort rows:
```json
{
"dataset": "stanfordnlp/imdb",
"config": "plain_text",
"split": "train",
"where": "label = 'positive'",
"orderby": "text DESC",
"page": 0
}
```
5. Get dataset statistics:
```json
{
"dataset": "stanfordnlp/imdb",
"config": "plain_text",
"split": "train"
}
```
## License
MIT License - see [LICENSE](LICENSE) for details