
deepghs_generate_waifuc_script

Read-only · Idempotent

Generate Python scripts to crawl, clean, and prepare anime character image datasets for LoRA training by filtering, tagging, and cropping images to model-specific resolutions.

Instructions

Generate a ready-to-run waifuc Python script to crawl and clean anime character images for LoRA training.

waifuc is DeepGHS's data pipeline framework. This tool generates a complete, properly configured script that:

  1. Crawls images from the specified sources (Danbooru, Pixiv, Gelbooru, etc.)

  2. Converts to RGB and standardizes backgrounds

  3. Filters monochrome/sketch/3D images (NoMonochromeAction, ClassFilterAction)

  4. Filters duplicate/similar images (FilterSimilarAction)

  5. Detects and splits to single-person crops (FaceCountAction, PersonSplitAction)

  6. Filters out wrong characters using CCIP AI identity matching (CCIPAction)

  7. Tags all images with WD14 tagger (TaggingAction)

  8. Crops to target resolution for the specified model format (SD1.5/SDXL/Flux)

  9. Exports in the correct format for the target trainer
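The order of these stages matters: destructive filters run before expensive transforms, so tagging and cropping are only spent on images that survive. The sketch below models that filter-then-transform ordering in plain Python; the callables are placeholders, not waifuc's actual Action classes, which the generated script would chain instead.

```python
def run_pipeline(images, filters, transforms):
    # Stage 1: filter actions drop images entirely
    # (monochrome, duplicates, multi-person, wrong character).
    kept = [img for img in images if all(f(img) for f in filters)]
    # Stage 2: transform actions rewrite the survivors
    # (RGB conversion, tagging, cropping).
    for t in transforms:
        kept = [t(img) for img in kept]
    return kept

# Placeholder stages standing in for NoMonochromeAction,
# PersonSplitAction/FaceCountAction, TaggingAction, etc.
images = [
    {"mode": "RGB", "people": 1},
    {"mode": "L",   "people": 1},  # monochrome -> dropped
    {"mode": "RGB", "people": 3},  # multi-person -> dropped
]
filters = [
    lambda im: im["mode"] == "RGB",
    lambda im: im["people"] == 1,
]
transforms = [lambda im: {**im, "tags": ["1girl"]}]

cleaned = run_pipeline(images, filters, transforms)
print(len(cleaned))  # 1
```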

Crop sizes by model format:

  • SD1.5: 512×512 base, bucket range 256–768

  • SDXL: 1024×1024 base, bucket range 512–2048

  • Flux: 1024×1024 base, bucket range 512–2048
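The table above is just a lookup from model format to crop geometry. A minimal helper expressing it as data (illustrative only; the generated script embeds these values directly rather than exposing such a function):

```python
# Crop/bucket settings from the table above, keyed by model_format.
CROP_SETTINGS = {
    "sd1.5": {"base": 512,  "bucket_min": 256, "bucket_max": 768},
    "sdxl":  {"base": 1024, "bucket_min": 512, "bucket_max": 2048},
    "flux":  {"base": 1024, "bucket_min": 512, "bucket_max": 2048},
}

def crop_settings(model_format: str) -> dict:
    """Return base resolution and bucket range for a model format."""
    try:
        return CROP_SETTINGS[model_format]
    except KeyError:
        raise ValueError(f"unknown model_format: {model_format!r}")

print(crop_settings("sdxl")["base"])  # 1024
```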

Args: params (GenerateWaifucScriptInput):

  • character_name (str): Character display name (used in comments/output path)

  • danbooru_tag (Optional[str]): Danbooru tag, e.g. 'rem_(re:zero)'

  • pixiv_query (Optional[str]): Pixiv search string, e.g. 'レム リゼロ'

  • sources (list[ImageSource]): ['danbooru', 'pixiv', 'gelbooru', 'zerochan', 'sankaku', 'auto']

  • model_format (ModelFormat): 'sd1.5', 'sdxl', or 'flux'

  • content_rating (ContentRating): 'safe', 'safe_r15', or 'all'

  • output_dir (str): Output directory path

  • max_images (Optional[int]): Maximum number of images to collect

  • pixiv_token (Optional[str]): Pixiv refresh token (required when 'pixiv' is in sources)

Returns: str: Complete, ready-to-run Python script with inline comments explaining each pipeline action and its purpose for LoRA training quality.
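A sample params payload using the field names from the Args section. The dependency check below (a Pixiv source needs pixiv_token) mirrors the documented requirement, but is a hypothetical client-side sketch; the server's actual validation may differ.

```python
# Example params for deepghs_generate_waifuc_script (field names per Args).
params = {
    "character_name": "Rem",
    "danbooru_tag": "rem_(re:zero)",
    "sources": ["danbooru", "pixiv"],
    "pixiv_query": "レム リゼロ",
    "pixiv_token": "<your-refresh-token>",
    "model_format": "sdxl",
    "content_rating": "safe",
    "output_dir": "./dataset/rem",
    "max_images": 300,
}

def check_params(p: dict) -> None:
    """Sanity-check a params payload before calling the tool (illustrative)."""
    required = ("character_name", "sources", "model_format",
                "content_rating", "output_dir")
    missing = [k for k in required if k not in p]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if "pixiv" in p["sources"] and not p.get("pixiv_token"):
        raise ValueError("pixiv source requires pixiv_token")

check_params(params)  # passes for the payload above
```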

Input Schema

  • params: required (no description or default provided in the schema)

Output Schema

  • result: required (no description or default provided in the schema)
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, destructiveHint=false, idempotentHint=true, and openWorldHint=false. The description adds valuable behavioral context beyond annotations by detailing the 9-step pipeline (crawling, filtering, cropping, etc.), crop size specifications by model format, and the script's purpose for LoRA training quality. This significantly enhances understanding without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well structured and front-loaded with the core purpose, followed by detailed pipeline steps, crop size specifications, and parameter documentation. Every sentence adds value; there is no redundant information. It efficiently communicates complex functionality in a digestible format.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (9-step pipeline, multiple parameters, model format variations) and the existence of an output schema (returns a string script), the description provides comprehensive context. It explains the full pipeline, parameter meanings, crop specifications, and output format, making it complete enough for an agent to understand and use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description carries the full burden of parameter documentation. It provides detailed parameter semantics in the 'Args' section, explaining each parameter's purpose, format examples, and requirements (e.g., 'character_name' for display, 'pixiv_token' required for Pixiv source). This fully compensates for the schema's lack of descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Generate a ready-to-run waifuc Python script to crawl and clean anime character images for LoRA training.' It specifies the exact action (generate script), resource (waifuc Python script), and distinguishes from siblings by focusing on script generation rather than dataset finding, repo info, or other operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool (for LoRA training data preparation) and implicitly distinguishes it from siblings like 'deepghs_find_character_dataset' (which finds existing datasets rather than generating collection scripts). However, it doesn't explicitly state when NOT to use this tool or name specific alternatives, keeping it at a 4.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/citronlegacy/deepghs-mcp'
