deepghs_generate_cheesechaser_script

Read-only, Idempotent

Generate a Python script to selectively download specific images from indexed HuggingFace datasets without downloading entire archives, using post IDs to extract only needed files.

Instructions

Generate a cheesechaser Python script to download images from an indexed DeepGHS dataset.

cheesechaser is DeepGHS's tool for selectively downloading images from HuggingFace datasets that are stored as indexed tar archives. Instead of downloading entire multi-GB tar files, you provide a list of post IDs and it extracts only those images.

This is the most efficient way to get specific images from datasets like:

  • deepghs/danbooru2024 (~8M images, hundreds of GB total)

  • deepghs/gelbooru-webp-4Mpixel (millions of images)

  • deepghs/sankaku_full (millions of images)
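The idea behind an indexed tar archive can be sketched with the standard library alone. This is an illustration of the general technique, not cheesechaser's actual implementation: the helper names (`build_tar`, `build_index`, `read_member`) and the file names are hypothetical, and the archive lives in memory here, whereas a real client would presumably fetch the same byte range from the remote file.

```python
import io
import tarfile


def build_tar(files: dict[str, bytes]) -> bytes:
    """Pack files into an uncompressed in-memory tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def build_index(tar_bytes: bytes) -> dict[str, tuple[int, int]]:
    """Map each member name to (byte offset of its data, size in bytes)."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as archive:
        return {
            member.name: (member.offset_data, member.size)
            for member in archive.getmembers()
            if member.isfile()
        }


def read_member(tar_bytes: bytes, index: dict[str, tuple[int, int]], name: str) -> bytes:
    """Return one member's bytes by slicing, without scanning the archive."""
    offset, size = index[name]
    return tar_bytes[offset:offset + size]


# Hypothetical post IDs used as file names.
archive_bytes = build_tar({
    "1001.webp": b"first image bytes",
    "1002.webp": b"second",
    "1003.webp": b"third one",
})
archive_index = build_index(archive_bytes)
wanted = read_member(archive_bytes, archive_index, "1002.webp")
```

Because the index records where each file's data starts and how long it is, only that slice has to be read, which is what makes selective download cheaper than fetching the whole archive.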

Args:
  params (GenerateCheesechaserScriptInput):
    - repo_id (str): HF dataset repo ID (e.g. 'deepghs/danbooru2024')
    - output_dir (str): Local directory to save downloaded images
    - post_ids (Optional[list[int]]): Specific post IDs to download
    - max_workers (int): Parallel download threads (1–16, default: 4)
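The constraints listed in the Args can be expressed as a small validating container. This is a minimal sketch: `ScriptParams` is a hypothetical stand-in for `GenerateCheesechaserScriptInput`, whose real definition is not shown here, and the `owner/name` check on `repo_id` is an assumption based on the example value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScriptParams:
    """Hypothetical stand-in for GenerateCheesechaserScriptInput."""

    repo_id: str
    output_dir: str
    post_ids: Optional[list[int]] = None
    max_workers: int = 4  # documented default

    def __post_init__(self) -> None:
        # Assumption: repo IDs follow the 'owner/name' form of the example.
        if "/" not in self.repo_id:
            raise ValueError("repo_id should look like 'deepghs/danbooru2024'")
        if not self.output_dir:
            raise ValueError("output_dir must not be empty")
        # Documented range for parallel download threads.
        if not 1 <= self.max_workers <= 16:
            raise ValueError("max_workers must be between 1 and 16")
        if self.post_ids is not None:
            if not all(isinstance(post_id, int) for post_id in self.post_ids):
                raise ValueError("post_ids must contain only integers")


params = ScriptParams(
    repo_id="deepghs/danbooru2024",
    output_dir="./images",
    post_ids=[1001, 1002],
)
```

Leaving `max_workers` unset falls back to the documented default of 4, and a value such as 17 raises `ValueError` before any script is generated.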

Returns: str: Complete cheesechaser Python script with inline comments, plus guidance on how to find post IDs from Danbooru/Gelbooru search results.
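As a rough idea of what the returned script might look like, the sketch below renders one from the four parameters. It is not the tool's real output. The function name `render_script` is hypothetical, and the pool class and `batch_download_to_directory` call follow the usage example in cheesechaser's README, which targets the Danbooru newest dataset; the pool class that matches a given `repo_id` should be checked against cheesechaser's documentation.

```python
def render_script(
    repo_id: str,
    output_dir: str,
    post_ids: list[int],
    max_workers: int = 4,
) -> str:
    """Render a small download script as text (hypothetical helper)."""
    unique_ids = sorted(set(post_ids))  # drop duplicate post IDs
    lines = [
        f"# Selective download of {len(unique_ids)} post(s) from {repo_id}",
        "# Assumption: DanbooruNewestDataPool is taken from cheesechaser's README;",
        "# swap in the pool class that matches your dataset.",
        "from cheesechaser.datapool import DanbooruNewestDataPool",
        "",
        f"POST_IDS = {unique_ids!r}",
        f"OUTPUT_DIR = {output_dir!r}",
        f"MAX_WORKERS = {max_workers}",
        "",
        "pool = DanbooruNewestDataPool()",
        "pool.batch_download_to_directory(",
        "    resource_ids=POST_IDS,",
        "    dst_dir=OUTPUT_DIR,",
        "    max_workers=MAX_WORKERS,",
        ")",
    ]
    return "\n".join(lines) + "\n"


script_text = render_script("deepghs/danbooru2024", "./images", [1003, 1001, 1003])
```

Rendering the script as text, rather than running the download, is consistent with the tool's read-only annotation: nothing is fetched until the user executes the script.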

Input Schema

Name | Required | Description | Default
params | Yes | - | -

Output Schema

Name | Required | Description | Default
result | Yes | - | -

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true, destructiveHint=false, idempotentHint=true, and openWorldHint=false, covering safety and idempotency. The description adds useful context beyond annotations: it explains that cheesechaser avoids downloading multi-GB tar files, extracts only specified images, and provides efficiency benefits for large datasets. It doesn't contradict annotations, as generating a script is a read-only, non-destructive action.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded: it starts with the core purpose, explains cheesechaser, provides usage context with examples, lists parameters clearly, and describes the return value. Every sentence adds value without redundancy, making it efficient and easy to scan.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (generating a download script for large datasets), the description is complete. It covers purpose, usage, parameters, and return value (a script with guidance). With annotations providing safety info and an output schema present (implied by 'Returns' section), no critical gaps remain. It effectively guides an agent in selecting and using this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description includes an 'Args' section that details all parameters (repo_id, output_dir, post_ids, max_workers) with examples and constraints (e.g., '1–16, default: 4'). This compensates for the lack of schema descriptions. However, it doesn't add significant meaning beyond what's implied by parameter names and basic info, so it meets the baseline for adequate coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Generate a cheesechaser Python script to download images from an indexed DeepGHS dataset.' It specifies the verb ('Generate'), resource ('cheesechaser Python script'), and distinguishes from siblings by focusing on script generation rather than dataset listing, searching, or other operations. The description also explains what cheesechaser does, adding valuable context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: 'This is the most efficient way to get specific images from datasets like...' It lists example datasets (e.g., deepghs/danbooru2024) and contrasts with downloading entire tar files. It also mentions alternatives implicitly by describing cheesechaser's selective download capability versus bulk downloads, though it doesn't name specific sibling tools as alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/citronlegacy/deepghs-mcp'
