deduplicate_images

Remove visually similar images from collections using semantic analysis to identify and return the most diverse subset, reducing redundancy while preserving visual variety.

Instructions

Get top-k semantically unique images (URLs or base64-encoded) using Jina CLIP v2 embeddings and submodular optimization. Use this when you have many visually similar images and want the most diverse subset.
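The Implementation Reference below calls a helper named lazyGreedySelection but does not show which submodular objective it maximizes. The following is a minimal sketch that assumes a facility-location objective over cosine similarities. That objective is a common choice for diversity, but it is not confirmed to be the one the server uses. The function name lazyGreedyFacilityLocation is hypothetical.

```typescript
// Cosine similarity of two equal-length, non-zero vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hypothetical stand-in for the server's lazyGreedySelection (objective assumed):
// greedily maximizes facility location f(S) = sum_i max(0, max_{j in S} sim(i, j)).
function lazyGreedyFacilityLocation(embeddings: number[][], k: number): number[] {
  const n = embeddings.length;
  const sim = embeddings.map((a) => embeddings.map((b) => cosine(a, b)));
  const coverage = new Array<number>(n).fill(0); // best similarity of each item to the selected set
  const bounds = new Array<number>(n).fill(Infinity); // cached upper bounds on marginal gains
  const chosen = new Array<boolean>(n).fill(false);
  const selected: number[] = [];

  // Marginal gain of adding candidate j, given the current coverage.
  const gain = (j: number): number => {
    let g = 0;
    for (let i = 0; i < n; i++) g += Math.max(0, sim[i][j] - coverage[i]);
    return g;
  };

  while (selected.length < Math.min(k, n)) {
    while (true) {
      // Unchosen candidate with the largest cached bound.
      let best = -1;
      for (let j = 0; j < n; j++) {
        if (!chosen[j] && (best === -1 || bounds[j] > bounds[best])) best = j;
      }
      // Lazy step: refresh only this candidate's gain.
      const fresh = gain(best);
      bounds[best] = fresh;
      // Stale bounds can only overestimate (submodularity), so if the fresh
      // gain still matches or beats every other bound, it is the true maximum.
      let beaten = false;
      for (let j = 0; j < n; j++) {
        if (!chosen[j] && j !== best && bounds[j] > fresh) {
          beaten = true;
          break;
        }
      }
      if (!beaten) {
        chosen[best] = true;
        selected.push(best);
        for (let i = 0; i < n; i++) coverage[i] = Math.max(coverage[i], sim[i][best]);
        break;
      }
    }
  }
  return selected;
}

// Three near-duplicates of one direction plus one distinct direction.
const picks = lazyGreedyFacilityLocation([[1, 0], [1, 0], [0, 1], [0.99, 0.01]], 2);
// picks is [3, 2]: one representative of the near-duplicate cluster, then the distinct image
```

Lazy evaluation is exact for submodular objectives because a candidate's marginal gain can only shrink as the selected set grows. As a result, most stale candidates never need to be re-scored. The full similarity matrix costs O(n²) memory, which is reasonable for tool-sized inputs.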

Input Schema

Name	Required	Description	Default
`images`	Yes	Array of strings to deduplicate. Each item is either an HTTP(S) URL or a raw base64-encoded image string (without a data URI prefix).	n/a
`k`	No	Number of unique images to return, between 1 and the number of input images. If omitted, the optimal k is chosen automatically by detecting diminishing returns.	Chosen automatically
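When `k` is omitted, the handler calls lazyGreedySelectionWithSaturation, which returns `selected` and `values`. The reference does not show the cutoff rule it applies. The sketch below is a hypothetical rule, not the server's own. It assumes a list of per-step greedy marginal gains and keeps picking while each gain is at least a fixed fraction (`ratio`, an assumed parameter) of the first gain. The name chooseKBySaturation is illustrative.

```typescript
// Hypothetical saturation rule (the server's actual rule is not shown):
// keep adding picks while the next greedy marginal gain is at least
// `ratio` times the first pick's gain.
function chooseKBySaturation(gains: number[], ratio = 0.1): number {
  if (gains.length === 0) return 0;
  const threshold = gains[0] * ratio;
  let k = 1; // always keep the first (largest-gain) pick
  while (k < gains.length && gains[k] >= threshold) k++;
  return k;
}

// Marginal gains from a greedy run: a sharp drop after the second pick.
const autoK = chooseKBySaturation([3.01, 0.99, 0.0001, 0.0001]); // 2
```

Greedy marginal gains on a submodular objective never increase from one step to the next. Stopping at the first gain below the threshold is therefore equivalent to discarding every later pick.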

Implementation Reference

src/tools/jina-tools.ts:990-1105 (handler)
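Handler that embeds the input images with the Jina CLIP v2 embeddings API and selects the top-k most diverse images with lazy greedy submodular selection. It downloads any selected URLs as base64 and returns the selected images as MCP content, plus a text item for each failed download.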
async ({ images, k }: { images: string[]; k?: number }) => {
  try {
    const props = getProps();
    const tokenError = checkBearerToken(props.bearerToken);
    if (tokenError) {
      return tokenError;
    }

    if (images.length === 0) {
      throw new Error("No images provided for deduplication");
    }
    if (k !== undefined && (k <= 0 || k > images.length)) {
      throw new Error(`Invalid k value: ${k}. Must be between 1 and ${images.length}`);
    }

    // Prepare input for image embeddings API
    const embeddingInput = images.map((img) => ({ image: img }));

    // Get image embeddings from Jina API using CLIP v2
    const response = await fetch('https://api.jina.ai/v1/embeddings', {
      method: 'POST',
      headers: {
        'Accept': 'application/json',
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${props.bearerToken}`,
      },
      body: JSON.stringify({
        model: 'jina-clip-v2',
        input: embeddingInput,
      }),
    });

    if (!response.ok) {
      return handleApiError(response, "Getting image embeddings");
    }

    const data = await response.json() as any;
    if (!data.data || !Array.isArray(data.data)) {
      throw new Error("Invalid response format from embeddings API");
    }

    // Extract embeddings
    const embeddings = data.data.map((item: any) => item.embedding);

    // Use submodular optimization to select diverse images
    let selectedIndices: number[];
    let values: number[];
    if (k !== undefined) {
      selectedIndices = lazyGreedySelection(embeddings, k);
      values = [];
    } else {
      const result = lazyGreedySelectionWithSaturation(embeddings);
      selectedIndices = result.selected;
      values = result.values;
    }

    // Get the selected images
    const selectedImages = selectedIndices.map((idx) => ({ index: idx, source: images[idx] }));

    // Use our consolidated downloadImages utility for consistency
    const urlsToDownload = selectedImages
      .filter(({ source }) => /^https?:\/\//i.test(source))
      .map(({ source }) => source);
    const base64Images = selectedImages
      .filter(({ source }) => !/^https?:\/\//i.test(source))
      .map(({ source }) => source);

    const contentItems: Array<{ type: 'image'; data: string; mimeType: string } | { type: 'text'; text: string }> = [];

    // Download URLs using our utility
    if (urlsToDownload.length > 0) {
      const downloadResults = await downloadImages(urlsToDownload, 3, 15000);
      for (let i = 0; i < downloadResults.length; i++) {
        const result = downloadResults[i];
        const selectedImage = selectedImages.find(({ source }) => source === urlsToDownload[i]);
        if (result.success && result.data) {
          contentItems.push({
            type: 'image' as const,
            data: result.data,
            mimeType: result.mimeType,
          });
        } else {
          contentItems.push({
            type: 'text' as const,
            // `??` instead of `||`, so a failed image at index 0 is reported as index 0
            text: `Failed to download image at index ${selectedImage?.index ?? i}: ${result.error || 'Unknown error'}`,
          });
        }
      }
    }

    // Add base64 images directly
    for (const base64Image of base64Images) {
      contentItems.push({
        type: 'image' as const,
        data: base64Image,
        mimeType: 'image/jpeg', // Assumed type: raw base64 inputs are passed through without conversion
      });
    }

    if (contentItems.length === 0) {
      throw new Error("No images to return after deduplication");
    }

    return { content: contentItems };
  } catch (error) {
    return createErrorResponse(`Error: ${error instanceof Error ? error.message : String(error)}`);
  }
},
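The embedding step of the handler can be isolated into two pure functions: one builds the request that the handler sends and one applies the handler's response check. This is a sketch. The names buildEmbeddingRequest and parseEmbeddings are illustrative, while the URL, headers, model name, and `{ image }` input shape are taken from the handler above.

```typescript
// Request shape mirroring the handler's fetch call (URL, headers, body).
interface EmbeddingRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildEmbeddingRequest(images: string[], bearerToken: string): EmbeddingRequest {
  return {
    url: "https://api.jina.ai/v1/embeddings",
    method: "POST",
    headers: {
      Accept: "application/json",
      "Content-Type": "application/json",
      Authorization: `Bearer ${bearerToken}`,
    },
    // Each input (HTTP(S) URL or raw base64 string) is wrapped as { image: ... }.
    body: JSON.stringify({ model: "jina-clip-v2", input: images.map((image) => ({ image })) }),
  };
}

// Same check as the handler: the response must carry a `data` array.
// One embedding is taken per entry; like the handler, this assumes input order.
function parseEmbeddings(json: unknown): number[][] {
  const data = typeof json === "object" && json !== null ? (json as { data?: unknown }).data : undefined;
  if (!Array.isArray(data)) {
    throw new Error("Invalid response format from embeddings API");
  }
  return data.map((item: { embedding: number[] }) => item.embedding);
}
```

To send the request, destructure `url` from the result and pass the remaining fields to `fetch(url, init)`. Then call `parseEmbeddings` on the decoded JSON body. Keeping these steps pure makes the request shape testable without network access.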
src/tools/jina-tools.ts:986-989 (schema)
Zod input schema for the deduplicate_images tool defining the images array and optional k parameter.
{
  images: z.array(z.string()).describe("Array of image inputs to deduplicate. Each item can be either an HTTP(S) URL or a raw base64-encoded image string (without data URI prefix)."),
  k: z.number().optional().describe("Number of unique images to return. If not provided, automatically finds optimal k by looking at diminishing return"),
},
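On its own, the schema accepts an empty `images` array and any number for `k`. The handler adds the remaining range checks at call time. The sketch below collects the schema's shape and those runtime checks into one dependency-free validator. The names DedupArgs and validateDedupArgs are illustrative, and the error messages are copied from the handler. Neither the schema nor these checks require `k` to be an integer.

```typescript
// Arguments accepted by the tool, matching the Zod schema above.
interface DedupArgs {
  images: string[];
  k?: number;
}

// Mirrors the handler's runtime checks, which the schema alone does not enforce.
function validateDedupArgs({ images, k }: DedupArgs): void {
  if (images.length === 0) {
    throw new Error("No images provided for deduplication");
  }
  if (k !== undefined && (k <= 0 || k > images.length)) {
    throw new Error(`Invalid k value: ${k}. Must be between 1 and ${images.length}`);
  }
}

// Example arguments: one URL and one raw base64 string (no "data:" prefix).
const exampleArgs: DedupArgs = {
  images: ["https://example.com/cat-1.jpg", "/9j/4AAQSkZJRgABAQ"], // base64 value is a truncated placeholder
  k: 1,
};
validateDedupArgs(exampleArgs); // passes
```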
src/tools/jina-tools.ts:982-1107 (registration)
Registers the deduplicate_images tool on the MCP server with server.tool, but only when isToolEnabled("deduplicate_images") returns true.
if (isToolEnabled("deduplicate_images")) {
  server.tool(
    "deduplicate_images",
    "Get top-k semantically unique images (URLs or base64-encoded) using Jina CLIP v2 embeddings and submodular optimization. Use this when you have many visually similar images and want the most diverse subset.",
    { /* input schema: see src/tools/jina-tools.ts:986-989 above */ },
    async ({ images, k }: { images: string[]; k?: number }) => { /* handler body: see src/tools/jina-tools.ts:990-1105 above */ },
  );
}
src/index.ts:24-24 (registration)
Lists 'deduplicate_images' in the ALL_TOOLS array, which is used to filter and enable tools.
"sort_by_relevance", "deduplicate_strings", "deduplicate_images", "extract_pdf"
src/index.ts:100-100 (registration)
Calls registerJinaTools, which includes the deduplicate_images registration.
registerJinaTools(server, () => currentProps, enabledTools);
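After registration, a client invokes the tool with an MCP `tools/call` request and receives the `content` array that the handler builds. The sketch below shows both sides. The helper names dedupCall and splitContent are illustrative. Error responses from createErrorResponse are not modeled, because their shape is not shown in the reference.

```typescript
// MCP JSON-RPC request that invokes the tool via the tools/call method.
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: { images: string[]; k?: number } };
}

function dedupCall(id: number, images: string[], k?: number): ToolsCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    // Omit k entirely to let the server choose it automatically.
    params: { name: "deduplicate_images", arguments: k === undefined ? { images } : { images, k } },
  };
}

// Content items the handler returns: selected images, or text for failed downloads.
type ContentItem =
  | { type: "image"; data: string; mimeType: string }
  | { type: "text"; text: string };

function splitContent(content: ContentItem[]): { images: string[]; errors: string[] } {
  const images: string[] = [];
  const errors: string[] = [];
  for (const item of content) {
    if (item.type === "image") images.push(item.data);
    else errors.push(item.text);
  }
  return { images, errors };
}
```

The handler appends downloaded URL images before base64 inputs, so a client should not assume the returned items follow the selection order.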

Jina AI Remote MCP Server
