deduplicate_images
Identify and extract top-k visually unique images from a large set using Jina CLIP v2 embeddings and submodular optimization. Ideal for reducing redundancy in image collections.
Instructions
Get top-k semantically unique images (URLs or base64-encoded) using Jina CLIP v2 embeddings and submodular optimization. Use this when you have many visually similar images and want the most diverse subset.
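The tool's internal objective is not documented here, so the following is only an illustrative sketch of how diversity selection by submodular optimization can work. It assumes a facility-location objective over cosine similarities, maximized greedily, with a hypothetical `min_gain_ratio` threshold standing in for the "diminishing returns" stopping rule. The names `select_diverse`, `cosine`, and `min_gain_ratio` are invented for this example, and the toy vectors stand in for real Jina CLIP v2 embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_diverse(embeddings, k=None, min_gain_ratio=0.05):
    """Greedy facility-location selection; returns indices in pick order.

    Assumption: this is a generic submodular sketch, not the tool's actual code.
    If k is None, stop once the best marginal gain falls below
    min_gain_ratio times the gain of the first pick.
    """
    n = len(embeddings)
    if n == 0:
        return []
    # Clamp negatives to zero so the objective stays monotone.
    sim = [[max(0.0, cosine(a, b)) for b in embeddings] for a in embeddings]
    limit = n if k is None else min(int(k), n)
    coverage = [0.0] * n  # best similarity of each item to anything selected so far
    selected = []
    first_gain = None
    while len(selected) < limit:
        best_idx, best_gain = -1, -1.0
        for c in range(n):
            if c in selected:
                continue
            # Marginal gain: how much candidate c improves coverage of all items.
            gain = sum(max(sim[c][j] - coverage[j], 0.0) for j in range(n))
            if gain > best_gain:
                best_idx, best_gain = c, gain
        if first_gain is None:
            first_gain = best_gain
        elif k is None and best_gain < min_gain_ratio * first_gain:
            break  # diminishing returns: remaining items are near-duplicates
        selected.append(best_idx)
        coverage = [max(coverage[j], sim[best_idx][j]) for j in range(n)]
    return selected


# Toy data: two near-duplicate pairs (indices 0-1 and 2-3).
toy = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.01, 0.99]]
picked_auto = select_diverse(toy)        # k chosen by the stopping rule
picked_fixed = select_diverse(toy, k=3)  # k forced to 3
```

On the toy data, the automatic mode stops after one pick from each pair, because every remaining candidate adds almost nothing to the coverage. Passing `k` explicitly overrides that stopping rule.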
Input Schema
Name | Required | Description | Default |
---|---|---|---|
images | Yes | Array of image inputs to deduplicate. Each item can be either an HTTP(S) URL or a raw base64-encoded image string (without data URI prefix). | |
k | No | Number of unique images to return. If not provided, the optimal k is found automatically from the point of diminishing returns. | |
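As a rough illustration of the table above, the arguments object could be assembled as follows. This is a sketch under assumptions: the helper names `normalize_image` and `build_arguments` are hypothetical, and the non-empty check on `images` and the `k >= 1` check are sensible guesses rather than documented constraints. The helper strips a data URI prefix because the tool expects raw base64 without one.

```python
import base64
import re

# Matches a leading data URI header such as "data:image/png;base64,".
_DATA_URI = re.compile(r"^data:[^;,]+(?:;[^,]*)?,", re.IGNORECASE)


def normalize_image(item):
    """Return an HTTP(S) URL unchanged, or raw base64 with any data URI prefix removed."""
    item = item.strip()
    if item.lower().startswith(("http://", "https://")):
        return item
    raw = _DATA_URI.sub("", item, count=1)
    # Raises binascii.Error (a ValueError subclass) if the string is not valid base64.
    base64.b64decode(raw, validate=True)
    return raw


def build_arguments(images, k=None):
    """Build the arguments dict: `images` is required, `k` is optional."""
    if not images:
        raise ValueError("images must be a non-empty list")  # assumed constraint
    args = {"images": [normalize_image(i) for i in images]}
    if k is not None:
        if k < 1:
            raise ValueError("k must be at least 1")  # assumed constraint
        args["k"] = k
    return args


# Usage: one URL plus one data URI; the placeholder bytes are not a real image.
payload = base64.b64encode(b"fake").decode()
example = build_arguments(
    ["https://example.com/a.png", "data:image/png;base64," + payload],
    k=2,
)
```

Leaving `k` out of the dict entirely, rather than sending `null`, matches the schema, which types `k` as a number and marks only `images` as required.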
Input Schema (JSON Schema)
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "additionalProperties": false,
  "properties": {
    "images": {
      "description": "Array of image inputs to deduplicate. Each item can be either an HTTP(S) URL or a raw base64-encoded image string (without data URI prefix).",
      "items": {
        "type": "string"
      },
      "type": "array"
    },
    "k": {
      "description": "Number of unique images to return. If not provided, automatically finds optimal k by looking at diminishing return",
      "type": "number"
    }
  },
  "required": [
    "images"
  ],
  "type": "object"
}