deepghs_generate_waifuc_script
Generate Python scripts to crawl, clean, and prepare anime character image datasets for LoRA training by filtering, tagging, and cropping images to model-specific resolutions.
Instructions
Generate a ready-to-run waifuc Python script to crawl and clean anime character images for LoRA training.
waifuc is DeepGHS's data pipeline framework. This tool generates a complete, properly configured script that:
Crawls images from the specified sources (Danbooru, Pixiv, Gelbooru, etc.)
Converts to RGB and standardizes backgrounds
Filters out monochrome, sketch, and 3D images (NoMonochromeAction, ClassFilterAction)
Removes duplicate and near-duplicate images (FilterSimilarAction)
Detects people and faces and splits images into single-person crops (FaceCountAction, PersonSplitAction)
Removes images of other characters using CCIP character-identity matching (CCIPAction)
Tags all images with the WD14 tagger (TaggingAction)
Crops to target resolution for the specified model format (SD1.5/SDXL/Flux)
Exports in the correct format for the target trainer
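The steps above can be sketched as a small generator that emits a Danbooru-only waifuc script in the same order. This is a hypothetical illustration, not the tool's implementation: the function name `render_waifuc_script`, the constructor arguments, the use of `AlignMinSizeAction` as the resize step, and `TextualInversionExporter` as the export format are assumptions modeled on waifuc's README-style examples. Other sources, content-rating handling, and the exact crop logic are omitted. Check every action name and argument against the installed waifuc version before running the emitted script.

```python
from typing import List, Optional, Tuple

# Base resolution per model format, taken from the crop-size list in this document.
BASE_SIZE = {"sd1.5": 512, "sdxl": 1024, "flux": 1024}


def render_waifuc_script(
    character_name: str,
    danbooru_tag: str,
    model_format: str,
    output_dir: str,
    max_images: Optional[int] = None,
) -> str:
    """Hypothetical sketch: build a Danbooru-only waifuc script as a string (not executed here)."""
    size = BASE_SIZE[model_format]
    # (constructor call, inline comment) pairs, in the order the pipeline is described.
    # Constructor arguments are assumptions modeled on waifuc's README examples.
    steps: List[Tuple[str, str]] = [
        ("ModeConvertAction('RGB', 'white')", "RGB, transparent areas filled with white"),
        ("NoMonochromeAction()", "drop monochrome/greyscale images"),
        ("ClassFilterAction(['illustration', 'bangumi'])", "keep illustrations/screencaps, drop comics and 3D"),
        ("FilterSimilarAction('all')", "drop near-duplicates"),
        ("PersonSplitAction()", "split each detected person into a separate crop"),
        ("FaceCountAction(1)", "keep crops with exactly one face"),
        ("CCIPAction()", "drop crops that do not match the target character"),
        ("TaggingAction(force=True)", "WD14 tags, used as training captions"),
        (f"AlignMinSizeAction({size})", f"assumed resize step toward the {model_format} base size"),
    ]
    if max_images is not None:
        steps.append((f"FirstNSelectAction({max_images})", "stop after max_images images"))

    # Import exactly the action classes used above, derived from the call strings.
    action_names = ", ".join(sorted({call.split("(")[0] for call, _ in steps}))
    body = "\n".join(f"    {call},  # {comment}" for call, comment in steps)
    return (
        f"# waifuc dataset script for {character_name} ({model_format})\n"
        f"from waifuc.action import {action_names}\n"
        "from waifuc.export import TextualInversionExporter\n"
        "from waifuc.source import DanbooruSource\n"
        "\n"
        f"source = DanbooruSource([{danbooru_tag!r}])\n"
        "source.attach(\n"
        f"{body}\n"
        f").export(TextualInversionExporter({output_dir!r}))\n"
    )
```

For example, `render_waifuc_script("Rem", "rem_(re:zero)", "sdxl", "/data/rem", max_images=200)` returns a script whose `attach(...)` call lists one action per line, each with its purpose as an inline comment. Keeping the steps as data makes it easy to reorder or drop actions per source without touching the string template.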
Crop sizes by model format:
SD1.5: 512×512 base, bucket range 256–768
SDXL: 1024×1024 base, bucket range 512–2048
Flux: 1024×1024 base, bucket range 512–2048
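The crop-size list can be turned into candidate aspect-ratio buckets. A minimal sketch, assuming buckets are multiples of 64 px, stay inside the listed bucket range, and never exceed the base area (base × base). This resembles the aspect-ratio bucketing used by common LoRA trainers, but each trainer computes its own buckets, and the names `CROP_SIZES` and `bucket_resolutions` are hypothetical.

```python
from typing import Dict, List, Tuple

# (base side, min bucket side, max bucket side), copied from the list above.
CROP_SIZES: Dict[str, Tuple[int, int, int]] = {
    "sd1.5": (512, 256, 768),
    "sdxl": (1024, 512, 2048),
    "flux": (1024, 512, 2048),
}


def bucket_resolutions(model_format: str, step: int = 64) -> List[Tuple[int, int]]:
    """Enumerate (width, height) buckets whose area does not exceed base * base.

    Assumption: sides are multiples of `step` and lie within the bucket range.
    """
    base, lo, hi = CROP_SIZES[model_format]
    max_area = base * base
    buckets: List[Tuple[int, int]] = []
    for width in range(lo, hi + 1, step):
        # Largest multiple of `step` that keeps width * height within the area budget.
        height = min(hi, (max_area // width) // step * step)
        if height >= lo:
            buckets.append((width, height))
    return buckets
```

For SD1.5 this yields buckets from (256, 768) through (512, 512) to (768, 320), so portrait, square, and landscape crops all keep roughly the 512×512 pixel budget.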
Args:
params (GenerateWaifucScriptInput):
- character_name (str): Character display name (used in comments and the output path)
- danbooru_tag (Optional[str]): Danbooru tag, e.g. 'rem_(re:zero)'
- pixiv_query (Optional[str]): Pixiv search string, e.g. 'レム リゼロ' ("Rem Re:Zero")
- sources (list[ImageSource]): one or more of ['danbooru', 'pixiv', 'gelbooru', 'zerochan', 'sankaku', 'auto']
- model_format (ModelFormat): 'sd1.5', 'sdxl', or 'flux'
- content_rating (ContentRating): 'safe', 'safe_r15', or 'all'
- output_dir (str): Output directory path
- max_images (Optional[int]): Maximum number of images to collect
- pixiv_token (Optional[str]): Pixiv refresh token (required when 'pixiv' is in sources)
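A plain-dataclass stand-in for `GenerateWaifucScriptInput` makes the documented constraints concrete. This is a hypothetical sketch: the real model may be a pydantic model with defaults that are not documented here, so only the `Optional` fields get a default (`None`). The Pixiv-token rule comes from the field description; the non-empty `sources` and positive `max_images` checks are assumptions.

```python
from dataclasses import dataclass
from typing import List, Literal, Optional, get_args

ImageSource = Literal["danbooru", "pixiv", "gelbooru", "zerochan", "sankaku", "auto"]
ModelFormat = Literal["sd1.5", "sdxl", "flux"]
ContentRating = Literal["safe", "safe_r15", "all"]


@dataclass
class GenerateWaifucScriptInput:
    """Hypothetical mirror of the documented fields (not the tool's actual class)."""

    character_name: str
    sources: List[ImageSource]
    model_format: ModelFormat
    content_rating: ContentRating
    output_dir: str
    danbooru_tag: Optional[str] = None
    pixiv_query: Optional[str] = None
    max_images: Optional[int] = None
    pixiv_token: Optional[str] = None

    def __post_init__(self) -> None:
        # Literal annotations are not enforced at runtime, so check them explicitly.
        allowed_sources = get_args(ImageSource)
        if not self.sources or any(s not in allowed_sources for s in self.sources):
            raise ValueError(f"sources must be a non-empty subset of {allowed_sources}")
        if self.model_format not in get_args(ModelFormat):
            raise ValueError(f"unknown model_format: {self.model_format!r}")
        if self.content_rating not in get_args(ContentRating):
            raise ValueError(f"unknown content_rating: {self.content_rating!r}")
        # Documented rule: the Pixiv source needs a refresh token.
        if "pixiv" in self.sources and not self.pixiv_token:
            raise ValueError("pixiv_token is required when 'pixiv' is in sources")
        # Assumed rule: a cap of zero or fewer images makes no sense.
        if self.max_images is not None and self.max_images <= 0:
            raise ValueError("max_images must be positive")
```

Validating in `__post_init__` means an invalid combination, such as a Pixiv source without a token, fails before any script text is generated.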
Returns: str: Complete, ready-to-run Python script with inline comments explaining each pipeline action and how it contributes to LoRA training quality.
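Because the tool returns the script as a string, a caller may want to sanity-check it before running anything. A minimal sketch using only the standard library's `ast` module (the helper name `waifuc_imports` is hypothetical): it parses the script without executing it, so malformed text raises `SyntaxError`, and it lists the names the script imports from waifuc so they can be compared against the installed version.

```python
import ast
from typing import List


def waifuc_imports(script: str) -> List[str]:
    """Return names imported from any waifuc module, without executing the script.

    ast.parse raises SyntaxError if the generated text is not valid Python.
    """
    tree = ast.parse(script)
    names: List[str] = []
    for node in ast.walk(tree):
        # Only `from waifuc... import X` statements; relative imports have module=None.
        if (
            isinstance(node, ast.ImportFrom)
            and node.module
            and node.module.split(".")[0] == "waifuc"
        ):
            names.extend(alias.name for alias in node.names)
    return names
```

Parsing rather than executing keeps the check free of side effects: no network crawling, model downloads, or writes to `output_dir` happen until the script is actually run.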
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| params | Yes | GenerateWaifucScriptInput object with the fields listed under Args | |