deduplicate_images
Eliminate redundant images by identifying and retaining the most diverse subset using Jina CLIP v2 embeddings and submodular optimization. Ideal for processing large sets of visually similar images.
Instructions
Get the top-k semantically unique images from a set of inputs (URLs or base64-encoded strings) using Jina CLIP v2 embeddings and submodular optimization. Use this when you have many visually similar images and want the most diverse subset. Returns the selected images as base64-encoded PNGs.
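The exact objective the tool optimizes is not stated here. One common way to pose "most diverse subset" as submodular optimization is greedy maximization of a facility-location objective over cosine similarities, and the sketch below assumes that formulation. It takes precomputed embedding vectors (for example, from Jina CLIP v2) as plain lists; `cosine` and `greedy_select` are illustrative names, not part of the tool.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_select(embeddings, k):
    """Greedily pick up to k items under an ASSUMED facility-location objective.

    The objective is the sum, over all items, of each item's best similarity
    to any selected item (negative similarities count as zero).
    Returns (selected_indices, marginal_gains).
    """
    n = len(embeddings)
    sim = [[cosine(embeddings[i], embeddings[j]) for j in range(n)]
           for i in range(n)]
    coverage = [0.0] * n  # best similarity of each item to the current selection
    selected, gains = [], []
    for _ in range(min(k, n)):
        best_j, best_gain = None, -1.0
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain: how much adding j improves coverage of every item.
            gain = sum(max(sim[i][j] - coverage[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_j, best_gain = j, gain
        selected.append(best_j)
        gains.append(best_gain)
        coverage = [max(coverage[i], sim[i][best_j]) for i in range(n)]
    return selected, gains


if __name__ == "__main__":
    # Two tight clusters of toy 2-D "embeddings".
    vectors = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
    print(greedy_select(vectors, k=2))
```

Because near-duplicates add almost nothing once one of them is selected, the greedy loop tends to pick one representative per cluster before it picks a second image from any cluster.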
Input Schema
Name | Required | Description | Default |
---|---|---|---|
images | Yes | Array of image inputs to deduplicate. Each item can be either an HTTP(S) URL or a raw base64-encoded image string (without data URI prefix). | |
k | No | Number of unique images to return. If not provided, the optimal k is found automatically by looking at diminishing returns. | |
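A minimal sketch of preparing these two parameters, under stated assumptions. `to_image_input` strips a data URI prefix so that base64 inputs arrive as raw strings, as the `images` description requires, and `build_arguments` omits `k` when the caller wants it chosen automatically. `pick_k` illustrates one possible diminishing-returns rule over a list of per-image marginal gains; the tool's actual rule and threshold are not documented here, so the function name, the 10% threshold, and the gain values are all hypothetical.

```python
import base64
import re

# Matches a leading data URI header such as "data:image/png;base64,".
_DATA_URI_PREFIX = re.compile(r"^data:[^,]*,", re.IGNORECASE)


def to_image_input(item: str) -> str:
    """Return an HTTP(S) URL unchanged, or a raw base64 string without a data URI prefix."""
    item = item.strip()
    if item.lower().startswith(("http://", "https://")):
        return item
    raw = _DATA_URI_PREFIX.sub("", item, count=1)
    # Raises binascii.Error (a ValueError subclass) if the string is not valid base64.
    base64.b64decode(raw, validate=True)
    return raw


def build_arguments(items, k=None):
    """Build the tool arguments; leave out k to request automatic selection."""
    arguments = {"images": [to_image_input(item) for item in items]}
    if k is not None:
        arguments["k"] = k
    return arguments


def pick_k(gains, threshold=0.1):
    """HYPOTHETICAL diminishing-returns rule, not the tool's documented behavior.

    Keeps images until the marginal gain of the next one drops below
    `threshold` times the first gain.
    """
    if not gains:
        return 0
    for index, gain in enumerate(gains):
        if gain < threshold * gains[0]:
            return index
    return len(gains)


if __name__ == "__main__":
    encoded = base64.b64encode(b"placeholder bytes, not a real image").decode()
    print(build_arguments(["https://example.com/a.png",
                           "data:image/png;base64," + encoded]))
    print(pick_k([2.3, 1.7, 0.005, 0.001]))
```

Validating base64 locally is a design choice that surfaces malformed inputs before a request is sent; it does not check that the decoded bytes are a real image.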