deepghs_generate_cheesechaser_script
Generate a Python script that downloads specific images from indexed HuggingFace datasets by post ID, extracting only the needed files instead of entire archives.
Instructions
Generate a cheesechaser Python script to download images from an indexed DeepGHS dataset.
cheesechaser is DeepGHS's tool for selectively downloading images from HuggingFace datasets that are stored as indexed tar archives. Instead of downloading entire multi-GB tar files, you provide a list of post IDs and it extracts only those images.
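The following stdlib-only sketch shows why an index makes this cheap. It records each tar member's data offset and size once, then reads just that byte span. The `(offset, size)` index layout and the helper names are hypothetical, not the actual DeepGHS index file format. The archive is held in memory here; a remote reader would presumably fetch the same span from HuggingFace with an HTTP Range request instead of downloading the whole tar.

```python
import io
import tarfile

# ASSUMPTION: illustrative index layout {name: (data_offset, size)},
# not the DeepGHS index file format.


def build_tar(files: dict[str, bytes]) -> bytes:
    """Pack name -> content pairs into an in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def build_index(archive: bytes) -> dict[str, tuple[int, int]]:
    """Map each regular file to (byte offset of its data, size in bytes)."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return {m.name: (m.offset_data, m.size) for m in tar.getmembers() if m.isfile()}


def read_member(archive: bytes, index: dict[str, tuple[int, int]], name: str) -> bytes:
    """Return one file's bytes by slicing its span; a remote reader would
    request exactly this byte range instead of the whole archive."""
    offset, size = index[name]
    return archive[offset:offset + size]


archive = build_tar({"1001.webp": b"first image", "1002.webp": b"second image"})
index = build_index(archive)
print(read_member(archive, index, "1002.webp"))  # b'second image'
```

Building the index costs one full pass over the archive, but only once; after that, every lookup touches only the bytes of the requested file.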
This is the most efficient way to get specific images from datasets such as:
- deepghs/danbooru2024 (~8M images, hundreds of GB total)
- deepghs/gelbooru-webp-4Mpixel (millions of images)
- deepghs/sankaku_full (millions of images)
Args:
- params (GenerateCheesechaserScriptInput):
  - repo_id (str): HF dataset repo ID (e.g. 'deepghs/danbooru2024')
  - output_dir (str): Local directory to save downloaded images
  - post_ids (Optional[list[int]]): Specific post IDs to download
  - max_workers (int): Parallel download threads (1–16, default: 4)
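The sketch below shows one way these parameters could become the returned script. The hypothetical `ScriptParams` dataclass mirrors `GenerateCheesechaserScriptInput`, enforces the documented 1–16 range for `max_workers`, and fills a text template. The `batch_download_to_directory(resource_ids=..., dst_dir=..., max_workers=...)` call follows the usage example in the cheesechaser README. The default pool class `DanbooruNewestDataPool` is an assumption, because the right class depends on `repo_id`, so check it against your installed cheesechaser version.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: template modeled on the cheesechaser README example; the pool
# class is a parameter because the correct one depends on the dataset.
TEMPLATE = """\
# Generated download script (sketch). Verify the pool class for {repo_id}.
from cheesechaser.datapool import {pool_class}

pool = {pool_class}()

# Only the tar members for these post IDs are fetched.
post_ids = {post_ids!r}

pool.batch_download_to_directory(
    resource_ids=post_ids,
    dst_dir={output_dir!r},
    max_workers={max_workers},
)
"""


@dataclass
class ScriptParams:
    """Hypothetical mirror of GenerateCheesechaserScriptInput."""
    repo_id: str
    output_dir: str
    post_ids: Optional[list[int]] = None
    max_workers: int = 4

    def __post_init__(self) -> None:
        # Documented bound: 1-16 parallel download threads.
        if not 1 <= self.max_workers <= 16:
            raise ValueError("max_workers must be between 1 and 16")

    def render(self, pool_class: str = "DanbooruNewestDataPool") -> str:
        # No IDs given: render an empty list for the user to fill in.
        return TEMPLATE.format(
            repo_id=self.repo_id,
            pool_class=pool_class,
            post_ids=self.post_ids or [],
            output_dir=self.output_dir,
            max_workers=self.max_workers,
        )


print(ScriptParams("deepghs/danbooru2024", "./images", [1, 2, 3]).render())
```

Rendering only builds a string, so this runs without cheesechaser installed; the generated script itself needs the `cheesechaser` package and network access to HuggingFace.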
Returns:
- str: Complete cheesechaser Python script with inline comments, plus guidance on how to find post IDs from Danbooru/Gelbooru search results.
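Post IDs usually come from post page URLs copied out of search results. The following is a minimal sketch, assuming Danbooru post pages look like `/posts/<id>` and Gelbooru post pages carry the ID in an `id=` query parameter (`index.php?page=post&s=view&id=<id>`). The helper names are hypothetical. IDs are site-specific, so Gelbooru IDs only make sense for a Gelbooru-derived dataset such as deepghs/gelbooru-webp-4Mpixel.

```python
import re
from urllib.parse import parse_qs, urlparse


def post_id_from_url(url: str) -> int:
    """Extract a numeric post ID from a Danbooru or Gelbooru post URL."""
    parsed = urlparse(url)
    # Gelbooru-style: index.php?page=post&s=view&id=12345
    query_ids = parse_qs(parsed.query).get("id")
    if query_ids and query_ids[0].isdigit():
        return int(query_ids[0])
    # Danbooru-style: /posts/12345, optionally followed by ?q=...
    match = re.search(r"/posts/(\d+)", parsed.path)
    if match:
        return int(match.group(1))
    raise ValueError(f"no post ID found in {url!r}")


def post_ids_from_urls(urls: list[str]) -> list[int]:
    """Parse every URL, dropping duplicates while keeping first-seen order."""
    return list(dict.fromkeys(post_id_from_url(u) for u in urls))


urls = [
    "https://danbooru.donmai.us/posts/1234567?q=1girl",
    "https://gelbooru.com/index.php?page=post&s=view&id=7654321",
    "https://danbooru.donmai.us/posts/1234567",
]
print(post_ids_from_urls(urls))  # [1234567, 7654321]
```

The resulting list can be passed directly as `post_ids`.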
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| params | Yes | GenerateCheesechaserScriptInput object with repo_id, output_dir, post_ids, and max_workers | |