
deepghs_generate_waifuc_script

Generate Python scripts to crawl, clean, and prepare anime character image datasets for LoRA training by filtering, tagging, and cropping images to model-specific resolutions.

Instructions

Generate a ready-to-run waifuc Python script to crawl and clean anime character images for LoRA training.

waifuc is DeepGHS's data pipeline framework. This tool generates a complete, properly configured script that:

  1. Crawls images from the specified sources (Danbooru, Pixiv, Gelbooru, etc.)

  2. Converts images to RGB and standardizes backgrounds

  3. Filters out monochrome, sketch, and 3D images (NoMonochromeAction, ClassFilterAction)

  4. Filters out duplicate and similar images (FilterSimilarAction)

  5. Detects faces and splits images into single-person crops (FaceCountAction, PersonSplitAction)

  6. Filters out wrong characters using CCIP AI identity matching (CCIPAction)

  7. Tags all images with the WD14 tagger (TaggingAction)

  8. Crops to target resolution for the specified model format (SD1.5/SDXL/Flux)

  9. Exports in the correct format for the target trainer
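The nine steps above correspond to a waifuc source with a chain of attached actions. The following is a minimal sketch of how such a script could be rendered as text, not this tool's actual implementation. `render_waifuc_script`, `_TEMPLATE`, and their arguments are hypothetical names. The action names for steps 3 to 7 come from the list above; `DanbooruSource`, `ModeConvertAction`, `FirstNSelectAction`, `TextualInversionExporter`, and all constructor arguments are assumed from the waifuc README and should be checked against the installed version. Step 8 is left as a comment because the list does not name the action used for the resolution crop.

```python
from textwrap import dedent

# Hypothetical template. Action names for steps 3-7 come from the list above;
# the source, exporter, ModeConvertAction, FirstNSelectAction and every
# constructor argument are assumptions based on the waifuc README.
_TEMPLATE = dedent('''\
    from waifuc.action import (
        CCIPAction, ClassFilterAction, FaceCountAction, FilterSimilarAction,
        FirstNSelectAction, ModeConvertAction, NoMonochromeAction,
        PersonSplitAction, TaggingAction,
    )
    from waifuc.export import TextualInversionExporter
    from waifuc.source import DanbooruSource

    if __name__ == "__main__":
        # Step 1: crawl images for the character tag
        source = DanbooruSource([{tag!r}, "solo"])
        source.attach(
            # Step 2: convert to RGB on a white background
            ModeConvertAction("RGB", "white"),
            # Step 3: drop monochrome images, keep illustration-like classes
            NoMonochromeAction(),
            ClassFilterAction(["illustration", "bangumi"]),
            # Step 4: drop duplicate and similar images
            FilterSimilarAction("all"),
            # Step 5: split multi-person images, keep single-face crops
            PersonSplitAction(),
            FaceCountAction(1),
            # Step 6: drop images of other characters
            CCIPAction(),
            # Step 7: tag with the WD14 tagger
            TaggingAction(force=True),
            # Step 8: resolution crop omitted; the action is not named above
            FirstNSelectAction({max_images}),
        ).export(
            # Step 9: export images together with their tag files
            TextualInversionExporter({output_dir!r})
        )
''')


def render_waifuc_script(tag: str, output_dir: str, max_images: int) -> str:
    """Fill the template; returns script text and never imports waifuc."""
    if max_images <= 0:
        raise ValueError("max_images must be positive")
    return _TEMPLATE.format(tag=tag, output_dir=output_dir, max_images=max_images)
```

Calling `render_waifuc_script("rem_(re:zero)", "./dataset/rem", 200)` returns the script as a string, which can be written to a `.py` file and run in an environment where waifuc is installed.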

Crop sizes by model format:

  • SD1.5: 512×512 base, bucket range 256–768

  • SDXL: 1024×1024 base, bucket range 512–2048

  • Flux: 1024×1024 base, bucket range 512–2048
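The sizes above can be held in a small lookup table. This is only a sketch: `CropSpec`, `CROP_SPECS`, and `fits_bucket_range` are hypothetical names, and reading "bucket range" as a minimum and maximum length for each image side is an assumption, since the page does not define it.

```python
from typing import NamedTuple


class CropSpec(NamedTuple):
    base: int        # side length of the square base resolution, in pixels
    bucket_min: int  # smallest side length allowed (assumed meaning)
    bucket_max: int  # largest side length allowed (assumed meaning)


# Values copied from the list above.
CROP_SPECS = {
    "sd1.5": CropSpec(base=512, bucket_min=256, bucket_max=768),
    "sdxl": CropSpec(base=1024, bucket_min=512, bucket_max=2048),
    "flux": CropSpec(base=1024, bucket_min=512, bucket_max=2048),
}


def fits_bucket_range(model_format: str, width: int, height: int) -> bool:
    """Return True if both sides fall inside the format's bucket range."""
    spec = CROP_SPECS[model_format]
    return all(spec.bucket_min <= side <= spec.bucket_max for side in (width, height))
```

Under that reading, a 512×768 crop fits SD1.5, while a 1024×512 crop does not because 1024 exceeds the 768 upper bound.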

Args:

  params (GenerateWaifucScriptInput):

  • character_name (str): Character display name (used in comments/output path)

  • danbooru_tag (Optional[str]): Danbooru tag e.g. 'rem_(re:zero)'

  • pixiv_query (Optional[str]): Pixiv search string e.g. 'レム リゼロ'

  • sources (list[ImageSource]): ['danbooru', 'pixiv', 'gelbooru', 'zerochan', 'sankaku', 'auto']

  • model_format (ModelFormat): 'sd1.5', 'sdxl', or 'flux'

  • content_rating (ContentRating): 'safe', 'safe_r15', or 'all'

  • output_dir (str): Output directory path

  • max_images (Optional[int]): Max images to collect

  • pixiv_token (Optional[str]): Pixiv refresh token (required for Pixiv source)
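The documented fields can be mirrored in a small input model. The real `GenerateWaifucScriptInput` is not shown on this page and may well be defined differently (for example as a Pydantic model); the standard-library dataclass below is a hypothetical sketch that only reproduces the field names, types, and allowed values listed above. The page gives no defaults, so the optional fields defaulting to `None` and the positive `max_images` check are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ImageSource(str, Enum):
    DANBOORU = "danbooru"
    PIXIV = "pixiv"
    GELBOORU = "gelbooru"
    ZEROCHAN = "zerochan"
    SANKAKU = "sankaku"
    AUTO = "auto"


class ModelFormat(str, Enum):
    SD15 = "sd1.5"
    SDXL = "sdxl"
    FLUX = "flux"


class ContentRating(str, Enum):
    SAFE = "safe"
    SAFE_R15 = "safe_r15"
    ALL = "all"


@dataclass
class GenerateWaifucScriptInput:
    character_name: str
    output_dir: str
    sources: List[ImageSource]
    model_format: ModelFormat
    content_rating: ContentRating
    danbooru_tag: Optional[str] = None   # assumed default
    pixiv_query: Optional[str] = None    # assumed default
    max_images: Optional[int] = None     # assumed default
    pixiv_token: Optional[str] = None    # assumed default

    def __post_init__(self) -> None:
        # Coerce plain strings to enums; unknown values raise ValueError.
        self.sources = [ImageSource(s) for s in self.sources]
        self.model_format = ModelFormat(self.model_format)
        self.content_rating = ContentRating(self.content_rating)
        # Documented rule: the Pixiv source needs a refresh token.
        if ImageSource.PIXIV in self.sources and not self.pixiv_token:
            raise ValueError("pixiv_token is required when 'pixiv' is in sources")
        # Assumed rule: a limit, if given, must be positive.
        if self.max_images is not None and self.max_images <= 0:
            raise ValueError("max_images must be positive")
```

Constructing the model with `sources=["pixiv"]` and no `pixiv_token` raises `ValueError`, so the missing token is caught before any script is generated.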

Returns: str: Complete, ready-to-run Python script with inline comments explaining each pipeline action and its purpose for LoRA training quality.

Input Schema

Name | Required | Description | Default
params | Yes | |

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/citronlegacy/deepghs-mcp'
