deduplicate_images
Remove duplicate images by identifying and retaining the most diverse subset using Jina CLIP v2 embeddings and submodular optimization. Ideal for managing large sets of visually similar images efficiently.
Instructions
Return the top-k most semantically distinct images from a set of inputs (URLs or base64-encoded strings), using Jina CLIP v2 embeddings and submodular optimization. Use this when you have many visually similar images and want the most diverse subset.
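
To make the selection step concrete, here is a minimal sketch of greedy submodular selection over precomputed embeddings. It is an illustration under assumptions, not the tool's actual implementation: the facility-location objective, the cosine similarity measure, the `greedy_diverse_subset` name, and the `min_gain_ratio` stopping threshold are all hypothetical choices, and the embeddings are assumed to have been computed beforehand (for example by Jina CLIP v2).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def greedy_diverse_subset(embeddings, k=None, min_gain_ratio=0.05):
    """Greedily pick indices that maximize a facility-location objective.

    f(S) = sum over all items i of max(0, max over j in S of sim(i, j)).
    If k is None, stop once the marginal gain falls below
    min_gain_ratio times the first gain (an assumed diminishing-returns rule).
    """
    n = len(embeddings)
    if n == 0:
        return []
    sim = [[cosine(a, b) for b in embeddings] for a in embeddings]
    coverage = [0.0] * n  # best similarity of each item to the selected set
    selected = []
    first_gain = None
    limit = n if k is None else min(k, n)

    while len(selected) < limit:
        best_idx, best_gain = None, -1.0
        for c in range(n):
            if c in selected:
                continue
            # Marginal gain: how much candidate c improves total coverage.
            gain = sum(max(sim[i][c] - coverage[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_idx, best_gain = c, gain
        if first_gain is None:
            first_gain = best_gain
        elif k is None and best_gain < min_gain_ratio * first_gain:
            break  # adding more images no longer helps much
        selected.append(best_idx)
        coverage = [max(coverage[i], sim[i][best_idx]) for i in range(n)]
    return selected


if __name__ == "__main__":
    # Two near-duplicate pairs: items 0 and 1 are similar, as are 2 and 3.
    vectors = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
    print(greedy_diverse_subset(vectors, k=2))
    print(greedy_diverse_subset(vectors))  # k chosen by the stopping rule
```

The greedy loop is a common choice for monotone submodular objectives because each step only needs the marginal gain of every remaining candidate, and those gains shrink as the selected set grows, which is what makes a diminishing-returns cutoff possible.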
Input Schema
Name | Required | Description | Default |
---|---|---|---|
images | Yes | Array of image inputs to deduplicate. Each item can be either an HTTP(S) URL or a raw base64-encoded image string (without data URI prefix). | |
k | No | Number of unique images to return. If not provided, the optimal k is found automatically by looking at diminishing returns. | |
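
As a sketch of how the arguments in the table above might be assembled, the helper below normalizes mixed inputs into the `images` array and includes `k` only when it is set. The function names `to_image_input` and `build_arguments` are hypothetical, as is the choice to strip a data URI prefix on the caller's side; only the `images` and `k` field names come from the schema.

```python
import base64
import json


def to_image_input(item):
    """Normalize one input to an HTTP(S) URL or a raw base64 string."""
    if isinstance(item, bytes):
        # Raw image bytes: encode to base64 without any data URI prefix.
        return base64.b64encode(item).decode("ascii")
    if item.startswith(("http://", "https://")):
        return item
    if item.startswith("data:") and "," in item:
        # Drop the "data:image/...;base64," prefix, keep the payload.
        return item.split(",", 1)[1]
    return item  # assumed to already be a raw base64 string


def build_arguments(items, k=None):
    """Build the tool arguments; omit k to let the tool choose it."""
    args = {"images": [to_image_input(i) for i in items]}
    if k is not None:
        args["k"] = k
    return args


if __name__ == "__main__":
    example = build_arguments(
        ["https://example.com/a.png", "data:image/png;base64,AAAA"], k=1
    )
    print(json.dumps(example, indent=2))
```

Leaving `k` out of the arguments entirely, rather than sending a null value, matches the schema's description of `k` as optional.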