harbor-registry-mcp

harbor_cleanup_candidates

Read-only · Idempotent

Identifies Harbor artifacts eligible for deletion—untagged, never pulled, or old versions—to help reclaim storage space.

Instructions

Suggest which artifacts could be deleted to reclaim space.

READ-ONLY — never deletes anything; just produces a list with reasons. Use harbor_delete_artifact / harbor_delete_untagged / harbor_delete_old_artifacts to act on the results.

Reasons emitted:

- untagged — artifact has no tags (orphaned layer)
- never_pulled — artifact has never been pulled (and is past the keep_latest_per_repo cutoff)
- old_version — artifact is older than the keep_latest_per_repo newest tagged artifacts
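The three rules can be sketched as a standalone function — a simplified, hypothetical mirror of the handler logic shown in the implementation reference below, with artifact dicts trimmed to the two fields the rules inspect:

```python
def cleanup_reasons(artifacts, include_untagged=True, include_zero_pulls=True,
                    keep_latest_per_repo=1):
    """Classify artifacts (newest first) against the cleanup rules (sketch).

    Each artifact is a simplified dict: {"tags": [...], "pull_time": ... | None}.
    Returns a parallel list of reason lists; an empty list means "keep".
    """
    out = []
    n = len(artifacts)
    for idx, art in enumerate(artifacts):
        reasons = []
        is_untagged = not art["tags"]
        never_pulled = art["pull_time"] is None
        if include_untagged and is_untagged:
            reasons.append("untagged")
        if idx >= keep_latest_per_repo and include_zero_pulls and never_pulled:
            reasons.append("never_pulled")
        if idx >= keep_latest_per_repo and not is_untagged and n > keep_latest_per_repo:
            reasons.append("old_version")
        out.append(reasons)
    return out

# Newest artifact first, as in the real handler (sorted by push_time descending).
arts = [
    {"tags": ["v2"], "pull_time": "2024-01-01"},  # newest: always kept
    {"tags": ["v1"], "pull_time": None},          # old and never pulled
    {"tags": [], "pull_time": None},              # untagged orphan
]
print(cleanup_reasons(arts))
# → [[], ['never_pulled', 'old_version'], ['untagged', 'never_pulled']]
```

Note that an untagged artifact never receives the old_version reason: keeping N "versions" only applies to tagged artifacts.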

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| project_name | Yes | Harbor project name. | |
| include_untagged | No | Suggest deleting untagged artifacts (orphaned layers). | True |
| include_zero_pulls | No | Suggest deleting artifacts that have never been pulled. | True |
| keep_latest_per_repo | No | How many newest artifacts to always keep per repository. | 1 |

Output Schema

| Name | Required | Description |
| --- | --- | --- |
| project | Yes | Harbor project name (echoed back). |
| candidates_count | Yes | Number of deletion candidates found. |
| total_reclaimable | Yes | Human-readable total size of all candidates. |
| total_reclaimable_bytes | Yes | Total size of all candidates in bytes. |
| candidates | Yes | Candidate artifacts, each with its reasons. |
| hint | Yes | Pointer to the deletion tools that act on the results. |
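The output fields map to the CleanupCandidatesOutput TypedDict shown in the implementation reference; a minimal illustrative payload (every value here is made up) looks like:

```python
# Illustrative CleanupCandidatesOutput payload (all values are made up).
example_output = {
    "project": "library",
    "candidates_count": 1,
    "total_reclaimable": "2.0 KB",
    "total_reclaimable_bytes": 2048,
    "candidates": [
        {
            "repository": "nginx",
            "tags": [],                     # empty list marks an untagged artifact
            "digest": "sha256:deadbeef",
            "size": "2.0 KB",
            "size_bytes": 2048,
            "push_time": "2024-01-01T00:00:00Z",
            "reasons": ["untagged", "never_pulled"],
        }
    ],
    "hint": "Use harbor_delete_artifact / harbor_delete_untagged / "
            "harbor_delete_old_artifacts to act on the results.",
}

# Invariants the handler maintains:
assert example_output["candidates_count"] == len(example_output["candidates"])
assert example_output["total_reclaimable_bytes"] == sum(
    c["size_bytes"] for c in example_output["candidates"]
)
```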

Implementation Reference

  • Tool registration via @mcp.tool decorator with name='harbor_cleanup_candidates', annotations (title, readOnlyHint, destructiveHint, idempotentHint, openWorldHint), and structured_output=True
    @mcp.tool(
        name="harbor_cleanup_candidates",
        annotations={
            "title": "Cleanup Candidates",
            "readOnlyHint": True,
            "destructiveHint": False,
            "idempotentHint": True,
            "openWorldHint": True,
        },
        structured_output=True,
    )
    async def harbor_cleanup_candidates(
        project_name: Annotated[str, Field(min_length=1, max_length=255, description="Harbor project name.")],
        ctx: Context,
        include_untagged: Annotated[
            bool, Field(default=True, description="Suggest deleting untagged artifacts (orphaned layers).")
        ] = True,
        include_zero_pulls: Annotated[
            bool, Field(default=True, description="Suggest deleting artifacts that have never been pulled.")
        ] = True,
        keep_latest_per_repo: Annotated[
            int, Field(default=1, ge=0, le=100, description="How many newest artifacts to always keep per repository.")
        ] = 1,
    ) -> CleanupCandidatesOutput:
        """Suggest which artifacts could be deleted to reclaim space.
    
        **READ-ONLY** — never deletes anything; just produces a list with
        reasons. Use ``harbor_delete_artifact`` / ``harbor_delete_untagged`` /
        ``harbor_delete_old_artifacts`` to act on the results.
    
        Reasons emitted:
            - ``untagged``      — artifact has no tags (orphaned layer)
            - ``never_pulled``  — artifact has never been pulled (and is past
              the ``keep_latest_per_repo`` cutoff)
            - ``old_version``   — artifact is older than the ``keep_latest_per_repo``
              newest tagged artifacts
        """
        try:
            client = get_client()
            await _report(ctx, 0.05, f"listing repositories in {project_name}")
            repos = await asyncio.to_thread(_list_repos, client, project_name)
            candidates: list[CleanupCandidate] = []
            total_reclaimable = 0
            for i, repo in enumerate(repos):
                short_name = repo["name"].replace(f"{project_name}/", "")
                await _report(ctx, 0.1 + 0.85 * (i / max(len(repos), 1)), f"scanning {short_name}")
                artifacts_raw = await asyncio.to_thread(_list_artifacts, client, project_name, short_name)
                sorted_arts = sorted(artifacts_raw, key=lambda a: a.get("push_time") or "", reverse=True)
    
                for idx, art in enumerate(sorted_arts):
                    shaped = _shape_artifact(art)
                    reasons: list[str] = []
                    is_untagged = not shaped["tags"]
                    no_pulls = shaped["pull_time"] is None
    
                    if include_untagged and is_untagged:
                        reasons.append("untagged")
                    if idx >= keep_latest_per_repo and include_zero_pulls and no_pulls:
                        reasons.append("never_pulled")
                    if idx >= keep_latest_per_repo and not is_untagged and len(sorted_arts) > keep_latest_per_repo:
                        reasons.append("old_version")
    
                    if reasons:
                        total_reclaimable += shaped["size_bytes"]
                        candidates.append(
                            {
                                "repository": short_name,
                                "tags": shaped["tags"] or [],
                                "digest": shaped["digest"],
                                "size": shaped["size"],
                                "size_bytes": shaped["size_bytes"],
                                "push_time": shaped["push_time"],
                                "reasons": reasons,
                            }
                        )
    
            candidates.sort(key=lambda c: c["size_bytes"], reverse=True)
            await _report(ctx, 1.0, f"{len(candidates)} candidates found")
    
            result: CleanupCandidatesOutput = {
                "project": project_name,
                "candidates_count": len(candidates),
                "total_reclaimable": size_human(total_reclaimable),
                "total_reclaimable_bytes": total_reclaimable,
                "candidates": candidates,
                "hint": (
                    "Use harbor_delete_artifact for individual artifacts, "
                    "harbor_delete_untagged for bulk untagged, or "
                    "harbor_delete_old_artifacts (dry_run=True) for keep-N policies."
                ),
            }
            header = (
                f"## Cleanup candidates in {project_name}\n\n"
                f"Reclaimable: **{size_human(total_reclaimable)}** "
                f"across {len(candidates)} artifacts\n\n"
            )
            md = header + "\n".join(
                [
                    f"- **{c['repository']}** — "
                    f"{','.join(c['tags']) or '(untagged)'} — "
                    f"{c['size']} ({', '.join(c['reasons'])})"
                    for c in candidates[:30]
                ]
            )
            return output.ok(result, md)  # type: ignore[return-value]
        except Exception as exc:
            output.fail(exc, f"finding cleanup candidates in {project_name}")
  • Main handler function 'harbor_cleanup_candidates' — lists all repositories in a project, fetches all artifacts per repo, evaluates each artifact against include_untagged/include_zero_pulls/keep_latest_per_repo rules, collects candidates with reasons ('untagged', 'never_pulled', 'old_version'), sorts by size descending, and returns CleanupCandidatesOutput with a markdown summary (full body shown under the registration bullet above).
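  • The size_human helper is called throughout the handler but is not shown on this page; a plausible sketch of such a formatter (an assumption — the actual helper may round or label differently):

```python
def size_human(n: int) -> str:
    """Render a byte count as a short human-readable string (sketch)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(n)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            # Whole bytes get no decimal; larger units get one decimal place.
            return f"{int(value)} {unit}" if unit == "B" else f"{value:.1f} {unit}"
        value /= 1024

print(size_human(0))        # → 0 B
print(size_human(1536))     # → 1.5 KB
print(size_human(1048576))  # → 1.0 MB
```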
  • TypedDict schemas CleanupCandidate (repository, tags, digest, size, size_bytes, push_time, reasons) and CleanupCandidatesOutput (project, candidates_count, total_reclaimable, total_reclaimable_bytes, candidates, hint)
    class CleanupCandidate(TypedDict):
        repository: str
        tags: list[str]
        digest: str
        size: str
        size_bytes: int
        push_time: str | None
        reasons: list[str]
    
    
    class CleanupCandidatesOutput(TypedDict):
        project: str
        candidates_count: int
        total_reclaimable: str
        total_reclaimable_bytes: int
        candidates: list[CleanupCandidate]
        hint: str
  • Helper _shape_artifact — converts Harbor's raw artifact JSON into ArtifactSummary, used to extract tags, digest, size, timestamps for candidate evaluation.
    def _shape_artifact(a: dict[str, Any]) -> ArtifactSummary:
        """Convert Harbor's artifact JSON into :class:`ArtifactSummary`.
    
        Extracts tag list, scan status / vulnerabilities from the first
        available scanner in ``scan_overview``.
        """
        tags = [t["name"] for t in (a.get("tags") or []) if t.get("name")]
        scan_status: str | None = None
        vulnerabilities: dict[str, int] | None = None
        overview = a.get("scan_overview") or {}
        if overview:
            first = next(iter(overview.values()), {})
            scan_status = first.get("scan_status")
            summary = (first.get("summary") or {}).get("summary") or {}
            vulnerabilities = summary or None
    
        size_bytes = int(a.get("size", 0) or 0)
        return {
            "tags": tags or [],
            "digest": a.get("digest", ""),
            "size": size_human(size_bytes),
            "size_bytes": size_bytes,
            "push_time": _normalize_ts(a.get("push_time")),
            "pull_time": _normalize_ts(a.get("pull_time")),
            "scan_status": scan_status,
            "vulnerabilities": vulnerabilities,
        }
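  • The shaping step can be exercised with a trimmed, self-contained sketch (timestamp normalization, size formatting, and scan-overview extraction from the real helper are omitted; field names follow the code above):

```python
def shape_artifact(a: dict) -> dict:
    """Simplified sketch of _shape_artifact: tags, digest, size, pull time."""
    # Tag entries without a name are dropped, mirroring the real helper.
    tags = [t["name"] for t in (a.get("tags") or []) if t.get("name")]
    size_bytes = int(a.get("size", 0) or 0)
    return {
        "tags": tags,
        "digest": a.get("digest", ""),
        "size_bytes": size_bytes,
        "pull_time": a.get("pull_time"),  # real code normalizes the timestamp
    }

raw = {  # trimmed example of Harbor's raw artifact JSON
    "digest": "sha256:deadbeef",
    "size": 2048,
    "tags": [{"name": "latest"}, {"name": None}],
    "pull_time": None,
}
print(shape_artifact(raw))
# → {'tags': ['latest'], 'digest': 'sha256:deadbeef', 'size_bytes': 2048, 'pull_time': None}
```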
  • Helper functions _list_artifacts and _list_repos — paginate through all artifacts/repositories via the Harbor API client's get_all_pages method.
    def _list_artifacts(client: Any, project_name: str, repository_name: str) -> list[dict[str, Any]]:
        """Fetch every artifact for a repository across all pages."""
        return client.get_all_pages(
            f"/projects/{project_name}/repositories/{encode_repo(repository_name)}/artifacts",
            page_size=100,
            extra_params={"with_tag": True, "with_scan_overview": True},
        )
    
    
    def _list_repos(client: Any, project_name: str) -> list[dict[str, Any]]:
        """Fetch every repository for a project across all pages."""
        return client.get_all_pages(
            f"/projects/{project_name}/repositories",
            page_size=100,
        )
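  • get_all_pages belongs to the Harbor client wrapper and is not shown here; a generic accumulate-until-short-page loop of the kind it presumably implements (the fetch_page callable and its signature are assumptions for illustration):

```python
def get_all_pages(fetch_page, page_size=100):
    """Accumulate results from a page-based API until a short page appears.

    ``fetch_page(page, page_size)`` is a hypothetical callable returning one
    page of results as a list; the real client's pagination may differ
    (e.g. it may read an X-Total-Count header instead).
    """
    results = []
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        results.extend(batch)
        if len(batch) < page_size:  # short (or empty) page ends the scan
            break
        page += 1
    return results

# Fake backend with 5 items and page_size=2 → pages of 2, 2, 1.
items = list(range(5))

def fake_fetch(page, page_size):
    start = (page - 1) * page_size
    return items[start:start + page_size]

print(get_all_pages(fake_fetch, page_size=2))  # → [0, 1, 2, 3, 4]
```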
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, destructiveHint, and idempotentHint. The description adds behavioral context by listing the specific reasons emitted (untagged, never_pulled, old_version) and confirming no deletion occurs. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise yet fully informative: it front-loads the purpose and uses bullet points for the reasons. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that the tool has an output schema, the description complements it by explaining the reasons that appear in the output. Input parameters are fully covered by the input schema, and references to sibling tools for acting on the results complete the context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description does not add additional meaning beyond the schema for parameters; it focuses on output reasons. This is adequate but not enhanced.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool suggests artifacts for deletion to reclaim space, using specific verbs and resources. It distinguishes itself from deletion tools by emphasizing read-only nature and listing reasons emitted, which differentiates it from siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states it is read-only and never deletes, and directs users to sibling deletion tools (harbor_delete_artifact, etc.) for acting on results. This provides clear when-to-use and when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
