# deepghs-mcp

A Python MCP server for the DeepGHS anime AI ecosystem. Connect it to any MCP-compatible client (Claude Desktop, Cursor, etc.) to browse datasets, discover pre-built character training sets, look up tags across 18 platforms, and generate complete data pipeline scripts – all directly from your AI assistant.
## Features

### Dataset & Model Discovery

- **Browse all DeepGHS datasets** – Danbooru2024 (8M+ images), Sankaku, Gelbooru, Zerochan, BangumiBase, and more
- **Full file trees** – see exactly which tar/parquet files a dataset contains and how large they are before downloading anything
- **Model catalog** – find the right model for your task: CCIP, WD Tagger Enhanced, aesthetic scorer, face/head detector, anime classifier, and more
- **Live demos** – browse DeepGHS Spaces (interactive web apps) for testing models without code

### Cross-Platform Tag Intelligence

- **site_tags lookup** – 2.5M+ tags unified across 18 platforms in one query
- **Tag format translation** – Danbooru uses `hatsune_miku`, Zerochan uses `Hatsune Miku`, Pixiv uses `初音ミク` – this tool maps them all together
- **Ready-to-use Parquet queries** – get copy-paste code to filter the tag database programmatically

### Character Dataset Finder

- **Pre-built LoRA datasets** – search both `deepghs` and `CyberHarem` namespaces for existing character image collections
- **Ready-to-run download commands** – get the exact `cheesechaser` command to pull what you need
- **Smart fallback** – if no pre-built dataset exists, the tool hands off directly to the waifuc script generator

### Training Pipeline Code Generation

- **waifuc scripts** – generate complete, annotated Python data collection pipelines for any character from any source (Danbooru, Pixiv, Gelbooru, Zerochan, Sankaku, or Auto)
- **cheesechaser scripts** – generate targeted download scripts to pull specific post IDs from indexed multi-TB datasets without downloading the whole archive
- **Format-aware** – crop sizes, bucket ranges, and export formats automatically adjusted for SD 1.5, SDXL, or Flux
## Installation

### Prerequisites

- Python 3.10+
- git

### Quick Start

Clone the repository:

```bash
git clone https://github.com/citronlegacy/deepghs-mcp.git
cd deepghs-mcp
```

Run the installer:

```bash
chmod +x install.sh && ./install.sh
# or without chmod:
bash install.sh
```

Or install manually:

```bash
pip install -r requirements.txt
```

## Authentication
`HF_TOKEN` is optional for public datasets but strongly recommended – it raises HuggingFace's API rate limit and is required for any gated or private repositories.
Get your token at huggingface.co/settings/tokens (read access is sufficient).
Without it, the server still works for all public DeepGHS datasets.
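If you want to see what the token does mechanically, here is a minimal sketch of standard HuggingFace Hub bearer authentication. The helper name is illustrative, not part of this server's API; the server itself reads `HF_TOKEN` from its environment.

```python
import os

def hf_auth_headers(env=None):
    """Build the standard HuggingFace Hub bearer-auth header from HF_TOKEN.

    Returns an empty dict when no token is set, so requests for public
    DeepGHS datasets still work (at a lower rate limit).
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    return {"Authorization": f"Bearer {token}"} if token else {}
```

For example, `hf_auth_headers({"HF_TOKEN": "hf_abc"})` yields `{"Authorization": "Bearer hf_abc"}`, while an empty environment yields `{}`.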
## Running the Server

```bash
python deepghs_mcp.py
# or via the venv created by install.sh:
.venv/bin/python deepghs_mcp.py
```

## Configuration
### Claude Desktop

Add the following to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "deepghs": {
      "command": "/absolute/path/to/.venv/bin/python",
      "args": ["/absolute/path/to/deepghs_mcp.py"],
      "env": {
        "HF_TOKEN": "hf_your_token_here"
      }
    }
  }
}
```

### Other MCP Clients

- Command: `/absolute/path/to/.venv/bin/python`
- Args: `/absolute/path/to/deepghs_mcp.py`
- Transport: stdio
## Usage Examples

### Browse available datasets

"What anime datasets does DeepGHS have on HuggingFace?"

The assistant calls `deepghs_list_datasets` and returns all datasets sorted by download count – Danbooru2024, Sankaku, Gelbooru WebP, BangumiBase, site_tags, and more – with links and update dates.

### Check dataset contents before downloading

"What files are in deepghs/danbooru2024? How big is it?"

The assistant calls `deepghs_get_repo_info` and returns the full file tree – every .tar and .parquet file with individual and total sizes – so you know exactly what you're committing to before you download.
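The size summary behind that answer is simple arithmetic over the file listing. A self-contained sketch, using an illustrative file tree (not the real danbooru2024 contents):

```python
def total_size(files):
    """Sum per-file sizes (in bytes) from a repo file listing."""
    return sum(size for _, size in files)

def human(nbytes):
    """Render a byte count with a binary-prefix unit."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if nbytes < 1024 or unit == "TiB":
            return f"{nbytes:.1f} {unit}"
        nbytes /= 1024

# Illustrative file tree: (path, size-in-bytes) pairs.
files = [
    ("images/0000.tar", 10 * 1024**3),
    ("images/0001.tar", 12 * 1024**3),
    ("index.parquet", 512 * 1024**2),
]
print(human(total_size(files)))  # 22.5 GiB
```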
### Find a pre-built character dataset

"Is there already a dataset for Rem from Re:Zero I can use for LoRA training?"

The assistant calls `deepghs_find_character_dataset`, searches both deepghs and CyberHarem namespaces, and returns any matches with download counts and a one-liner download command.

Example response:

```
## Character Dataset Search: Rem

Found 2 dataset(s):

### CyberHarem/rem_rezero
Downloads: 4.2K | Likes: 31 | Updated: 2024-08-12
```

Download with cheesechaser:

```python
from cheesechaser.datapool import SimpleDataPool

pool = SimpleDataPool('CyberHarem/rem_rezero')
pool.batch_download_to_directory('./dataset', max_workers=4)
```

### Look up a tag across all platforms
"What is the correct tag for 'Hatsune Miku' on Danbooru, Zerochan, and Pixiv?"
The assistant calls deepghs_search_tags and returns the format for every platform, plus a pre-filtered link to the site_tags dataset viewer and Parquet query code.
Example response:
Tag Lookup: Hatsune Miku
Danbooru: hatsune_miku
Gelbooru: hatsune_miku
Sankaku: character:hatsune_miku
Zerochan: Hatsune Miku
Pixiv: ๅ้ณใใฏ (Japanese preferred)
Yande.re: hatsune_miku
Wallhaven: hatsune mikuThis directly solves the MultiBoru tag normalization problem.
### Generate a full data collection pipeline

"Generate a waifuc script to build a LoRA dataset for Surtr from Arknights, from Danbooru and Pixiv, for SDXL, safe-only."

The assistant calls `deepghs_generate_waifuc_script` and returns a complete annotated Python script. Here is what that script does when run:

1. Crawls Danbooru (`surtr_(arknights)`) + Pixiv (`スルト アークナイツ`)
2. Converts all images to RGB with a white background
3. Drops monochrome, sketch, manga panels, and 3D renders
4. Filters to safe rating only
5. Deduplicates near-identical images (差分 / variants)
6. Drops images with no detected face
7. Splits group images – each character becomes its own crop
8. CCIP identity filter – AI verifies every image actually shows Surtr
9. WD14 auto-tagging – writes `.txt` caption files automatically
10. Resizes and pads to 1024×1024 (SDXL standard)
11. Exports in a kohya_ss-compatible folder structure

No manual curation. No manual tagging. Drop the output folder straight into your trainer.
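As a rough illustration of how such a generator stays format-aware, here is a hypothetical sketch. The step names and the `build_plan` helper are invented for illustration – the real generated script wires up waifuc source/action/export objects – but it shows the key idea: the target resolution changes with the model format while the step order (dedupe, then the CCIP identity check, then tagging, then resize) stays fixed.

```python
# Hypothetical sketch of a format-aware pipeline plan; the actual
# generated script uses waifuc actions rather than (name, detail) pairs.
RESOLUTIONS = {"sd15": 512, "sdxl": 1024, "flux": 1024}

def build_plan(model_format: str, safe_only: bool = True):
    size = RESOLUTIONS[model_format]
    plan = [
        ("convert", "RGB, white background"),
        ("drop_monochrome", "no sketches/greyscale"),
        ("class_filter", "illustration only"),
    ]
    if safe_only:
        plan.append(("rating_filter", "safe"))
    plan += [
        ("dedupe", "drop near-identical variants"),
        ("face_filter", "exactly one face"),
        ("person_split", "one crop per character"),
        ("ccip", "verify character identity"),
        ("wd14_tag", "write .txt captions"),
        ("resize_pad", f"{size}x{size}"),
    ]
    return plan

print(build_plan("sdxl")[-1])  # ('resize_pad', '1024x1024')
```

Swapping `"sdxl"` for `"sd15"` changes only the final resize target to 512×512.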
### Generate a targeted download script

"Give me a cheesechaser script to download post IDs 1234, 5678, 9012 from deepghs/danbooru2024."

The assistant calls `deepghs_generate_cheesechaser_script` and returns a complete Python script – with options for downloading by post ID list, downloading everything, or filtering from the Parquet index by tag.
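The Parquet-index filtering such a script performs boils down to selecting matching rows before downloading anything. A pure-Python sketch over a synthetic index – the `id`/`tags` column names and the rows are illustrative, not the real danbooru2024 schema:

```python
# Synthetic stand-in for rows of a dataset's Parquet index.
index = [
    {"id": 1234, "tags": "1girl hatsune_miku"},
    {"id": 5678, "tags": "1girl surtr_(arknights)"},
    {"id": 9012, "tags": "scenery"},
]

def select_ids(index, wanted_ids=None, tag=None):
    """Pick post IDs either from an explicit ID list or by tag match."""
    rows = index
    if wanted_ids is not None:
        rows = [r for r in rows if r["id"] in set(wanted_ids)]
    if tag is not None:
        rows = [r for r in rows if tag in r["tags"].split()]
    return [r["id"] for r in rows]

print(select_ids(index, wanted_ids=[1234, 5678, 9012]))  # [1234, 5678, 9012]
print(select_ids(index, tag="surtr_(arknights)"))        # [5678]
```

Only the selected IDs are then pulled from the tar archives, which is what makes targeted access to a multi-TB dataset cheap.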
### Find the right model for a task

"Does DeepGHS have a model for scoring image aesthetics?"

The assistant calls `deepghs_list_models` with `search: "aesthetic"` and returns matching models with pipeline task, download counts, and direct HuggingFace links.

### Try a model in the browser

"Is there a live demo for the DeepGHS face detection model?"

The assistant calls `deepghs_list_spaces` with `search: "detection"` and returns matching Spaces with direct links to the interactive demos.
## Available Tools

| Tool | Description |
|------|-------------|
| `deepghs_list_datasets` | Browse all DeepGHS datasets with search, sort, and pagination |
| `deepghs_list_models` | Browse all DeepGHS models |
| `deepghs_list_spaces` | Browse all DeepGHS live demo Spaces |
| `deepghs_get_repo_info` | Full file tree + metadata for any dataset/model/space |
| `deepghs_search_tags` | Cross-platform tag lookup across 18 platforms via site_tags |
| `deepghs_find_character_dataset` | Find pre-built LoRA training datasets for a character |
| `deepghs_generate_waifuc_script` | Generate a complete data collection + cleaning pipeline script |
| `deepghs_generate_cheesechaser_script` | Generate a targeted dataset download script |
## Tools Reference

### deepghs_list_datasets

Lists all public datasets from DeepGHS, sortable and filterable by keyword.

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `search` | string | no | Keyword filter |
| `limit` | integer | no | Results per page (max 100) |
| `offset` | integer | no | Pagination offset |
### deepghs_list_models

Lists all public models – CCIP, WD Tagger Enhanced, aesthetic scorer, face/head/person detectors, anime classifier, furry detector, NSFW censor, style era classifier, and more.

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `search` | string | no | Keyword filter |
| `limit` | integer | no | Results per page (max 100) |
| `offset` | integer | no | Pagination offset |
### deepghs_list_spaces

Lists all public Spaces – live demos for face detection, head detection, CCIP character similarity, WD tagger, aesthetic scorer, reverse image search, Danbooru character lookup, and more.

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `search` | string | no | Keyword filter |
| `limit` | integer | no | Results per page (max 100) |
### deepghs_get_repo_info

Get full metadata for any dataset, model, or space – including the complete file tree with individual file sizes. Essential before downloading a multi-TB dataset.

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `repo_id` | string | yes | Full HF repo ID, e.g. `deepghs/danbooru2024` |
| `repo_type` | string | no | `dataset`, `model`, or `space` |

> **Tip:** Datasets containing `.tar` files automatically get a ready-to-copy cheesechaser snippet appended.
### deepghs_search_tags

Look up any tag across 18 platforms using the deepghs/site_tags dataset. Returns per-platform format guidance, a pre-filtered dataset viewer link, and Parquet query code.

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `query` | string | yes | Tag in any format or language |

> **LLM Tip:** Call this before `deepghs_generate_waifuc_script` to confirm the correct Danbooru/Pixiv tag format for a character.
### deepghs_find_character_dataset

Searches deepghs and CyberHarem on HuggingFace for pre-built character datasets. CyberHarem datasets are built with the full automated pipeline: crawl → CCIP filter → WD14 tag → upload.

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `character_name` | string | yes | Character name |
### deepghs_generate_waifuc_script

Generates a complete, annotated Python pipeline script using waifuc.

Pipeline actions included (in order):

| Action | What it does | Why it matters for LoRA |
|--------|--------------|--------------------------|
| `ModeConvertAction` | Convert to RGB, white background | Standardizes input format |
| `NoMonochromeAction` | Drop greyscale/sketch images | Prevents style contamination |
| `ClassFilterAction` | Keep illustration/anime only | Drops manga panels and 3D |
| `RatingFilterAction` | Filter by content rating | Keep dataset SFW if needed |
| `FilterSimilarAction` | Deduplicate similar images | Prevents overfitting to variants |
| `FaceCountAction` | Require exactly 1 face | Removes group shots and objects |
| `PersonSplitAction` | Crop each character from group images | Maximizes usable data |
| `CCIPAction` | AI identity verification | Removes wrong characters (the most important step) |
| `TaggingAction` | WD14 auto-tagging | Generates `.txt` caption files |
| `AlignMinSizeAction` | Resize to minimum resolution | Ensures quality floor |
| `PaddingAlignAction` | Pad to square | Standard training resolution |

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `character_name` | string | yes | Display name |
| `danbooru_tag` | string | no (auto-guessed) | Danbooru tag |
| `pixiv_tag` | string | no | Pixiv search query, Japanese preferred; required if Pixiv is among the sources |
| `sources` | list | no | Sources to crawl (Danbooru, Pixiv, Gelbooru, Zerochan, Sankaku, Auto) |
| `model_format` | string | no | Target model format (SD 1.5, SDXL, or Flux) |
| `output_dir` | string | no | Output directory |
| `max_images` | integer | no (default: no limit) | Cap total images collected |
| `pixiv_token` | string | no | Pixiv refresh token (required for the Pixiv source) |

Source tag formats:

| Source | Tag format | Notes |
|--------|------------|-------|
| Danbooru | snake_case with series in parens | |
| Gelbooru | Same as Danbooru | |
| Pixiv | Japanese preferred | Better results with native-language queries |
| Zerochan | Title Case | Strict mode enabled |
| Sankaku | snake_case | |
| Auto | Character name | Uses the gchar database – best for game characters |
### deepghs_generate_cheesechaser_script

Generates a Python download script using cheesechaser to pull specific images from indexed tar datasets without downloading entire archives.

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `repo_id` | string | yes | HF dataset, e.g. `deepghs/danbooru2024` |
| `output_dir` | string | no | Local download directory |
| `post_ids` | list[int] | no | Specific post IDs; if omitted, downloads everything (can be very large) |
| `max_workers` | integer | no | Parallel download threads (1–16) |
## Key DeepGHS Datasets

| Dataset | Description | Use Case |
|---------|-------------|----------|
| `deepghs/danbooru2024` | Full Danbooru archive, 8M+ images | Bulk downloads, data mining |
| Danbooru2024 WebP | Compressed WebP version | Faster downloads |
| Sankaku | Full Sankaku Channel dataset | Alternative tag ecosystem |
| Gelbooru WebP | Gelbooru compressed | Western fanart coverage |
| `deepghs/site_tags` | 2.5M+ tags, 18 platforms | Tag normalization |
| Anime face detection | YOLO face detection labels | Train detection models |
| BangumiBase | Character frames from anime | Character dataset bootstrapping |
## Recommended Workflows

### Workflow A: Use a pre-built dataset

1. `deepghs_find_character_dataset` – check if a dataset exists
2. `deepghs_get_repo_info` – inspect file sizes
3. `deepghs_generate_cheesechaser_script` – get the download command
4. Run the script → train

### Workflow B: Build a new dataset from scratch

1. `deepghs_search_tags` – find the correct Danbooru tag
2. `deepghs_generate_waifuc_script` – generate the full pipeline script
3. Run the script → crawl, filter, tag, crop
4. Review the output → remove any remaining noise
5. Train with kohya_ss

### Workflow C: Mine specific images from a large dataset

1. `deepghs_get_repo_info` – inspect the dataset's Parquet structure
2. `deepghs_search_tags` – confirm tag names
3. `deepghs_generate_cheesechaser_script` – generate a Parquet-filter script
4. Run the script → downloads only matching images

## Notes for LLMs
- **Check pre-built first:** Always call `deepghs_find_character_dataset` before generating a waifuc script. CyberHarem has hundreds of ready-to-use LoRA datasets.
- **Danbooru tag format:** `character_(series)` with underscores – `rem_(re:zero)`, not `Rem (Re:Zero)`. Use `deepghs_search_tags` to confirm.
- **File sizes:** Datasets like `danbooru2024` are multi-TB. Always check `deepghs_get_repo_info` before recommending a full download. Use cheesechaser with post IDs for targeted access.
- **CCIP is essential:** It's the most important pipeline step – without it, 20–40% of a character dataset will be wrong-character noise. Always include it in waifuc scripts.
- **Pixiv source:** Requires a `pixiv_token`. If the user hasn't set one up, suggest Danbooru + Gelbooru instead.
- **Model format crop sizes:** SD 1.5 = 512×512, SDXL/Flux = 1024×1024. This controls `AlignMinSizeAction` and `PaddingAlignAction` in generated scripts.
## Known Limitations

- **HuggingFace rate limits:** Without `HF_TOKEN`, the Hub API may throttle requests under heavy usage.
- **Gated datasets:** Some datasets require explicit approval on HuggingFace before downloading. The server returns a clear error with guidance.
- **CyberHarem search:** Niche characters may need manual browsing at huggingface.co/CyberHarem.
- **waifuc runtime:** The generated scripts require waifuc installed separately (it is not in this server's dependencies). The first run downloads ~500MB of CCIP + WD14 model weights.
## Troubleshooting

**Server won't start:**

- Ensure Python 3.10+: `python --version`
- Re-run the installer: `bash install.sh`

**Rate limit / 429 errors:**

- Set `HF_TOKEN` in your MCP env config
- Get a free token: huggingface.co/settings/tokens

**403 / Forbidden on a dataset:**

- The dataset is gated – visit the dataset page on HuggingFace and click "Request Access"
- Ensure the `HF_TOKEN` is from an account that has been granted access

**Character dataset not found:**

- Try alternate spellings: `"Rem"`, `"rem_(re:zero)"`, `"rem re:zero"`
- Browse manually: huggingface.co/CyberHarem
- Generate from scratch with `deepghs_generate_waifuc_script`

**waifuc script fails:**

- Install waifuc: `pip install git+https://github.com/deepghs/waifuc.git`
- GPU support: `pip install "waifuc[gpu]"` (much faster CCIP + tagging)
- The first run downloads ~500MB of model weights – this is expected
## Contributing

Pull requests are welcome! If a tool is returning incorrect data, a script template is outdated, or a new DeepGHS dataset or model should be highlighted, please open an issue or PR.

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request
## License

MIT License – see LICENSE for details.

## Links

- imgutils – image processing
- MCP documentation
- Bug Reports
- Feature Requests

### Related MCP Servers

- gelbooru-mcp – search Gelbooru, generate SD prompts from character tag data
- zerochan-mcp – browse Zerochan's high-quality anime image board