memory_find_duplicates
Identify and analyze duplicate memory entries in Memora by scanning cross-references and using semantic comparison to maintain clean memory storage.
Instructions
Find potential duplicate memory pairs with optional LLM-powered comparison.
Scans cross-references to find memory pairs with similarity >= threshold, then optionally uses LLM to semantically compare them. Uses the same threshold (0.85) as the graph UI duplicate detection.
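The threshold scan described above can be pictured with a minimal sketch. It assumes cross-references are available as `(memory_a_id, memory_b_id, similarity_score)` tuples. The name `filter_candidates`, the tuple layout, and the sample data are hypothetical and do not reflect Memora's actual storage API.

```python
from typing import List, Tuple

# Hypothetical cross-reference record: (memory_a_id, memory_b_id, similarity_score)
CrossRef = Tuple[int, int, float]


def filter_candidates(
    cross_refs: List[CrossRef],
    min_similarity: float = 0.85,
    limit: int = 10,
) -> List[CrossRef]:
    """Keep unique pairs scoring at or above the threshold, highest first."""
    seen = set()
    kept = []
    for a, b, score in cross_refs:
        key = (min(a, b), max(a, b))  # treat (a, b) and (b, a) as the same pair
        if score >= min_similarity and key not in seen:
            seen.add(key)
            kept.append((key[0], key[1], score))
    kept.sort(key=lambda pair: pair[2], reverse=True)
    return kept[:limit]


# Illustrative data only
sample = [(1, 2, 0.91), (2, 1, 0.91), (3, 4, 0.60), (5, 6, 0.85)]
candidates = filter_candidates(sample)
print(candidates)
```

Note that the comparison is inclusive (`>=`), matching the "similarity >= threshold" wording, so a pair scoring exactly 0.85 is kept.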
Args:
- min_similarity: Minimum similarity score to consider (default: 0.85)
- max_similarity: Maximum similarity score (default: 1.0, kept for backward compatibility)
- limit: Maximum pairs to analyze (default: 10)
- use_llm: Whether to use LLM for semantic comparison (default: True)

Returns: Dictionary with:
- pairs: List of potential duplicate pairs with analysis
- total_candidates: Total pairs found
- analyzed: Number of pairs analyzed with LLM
- llm_available: Whether LLM comparison was available
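A caller might triage the returned dictionary along the following lines. This is a sketch, not part of Memora: the helper `triage_pairs`, the `min_confidence` cutoff, and the sample data are hypothetical, and the verdict label `"duplicate"` is an assumption (the handler source only shows `"review"` as the fallback verdict). The `llm_*` field names and the `{"error": "rate_limited", ...}` shape are taken from the handler code listed under Implementation Reference.

```python
from typing import Any, Dict, List, Tuple

IdPair = Tuple[int, int]


def triage_pairs(
    result: Dict[str, Any], min_confidence: float = 0.8
) -> Tuple[List[IdPair], List[IdPair]]:
    """Split pairs into (confident, needs_review) lists of memory-ID tuples."""
    if "error" in result:  # e.g. {"error": "rate_limited", "message": ...}
        raise RuntimeError(result.get("message", result["error"]))

    confident: List[IdPair] = []
    needs_review: List[IdPair] = []
    for pair in result["pairs"]:
        ids = (pair["memory_a"]["id"], pair["memory_b"]["id"])
        # LLM fields are absent when use_llm is False or the LLM was unavailable
        verdict = pair.get("llm_verdict", "review")
        confidence = pair.get("llm_confidence", 0)
        if verdict != "review" and confidence >= min_confidence:
            confident.append(ids)
        else:
            needs_review.append(ids)
    return confident, needs_review


# Illustrative result; "duplicate" is an assumed verdict label
sample_result = {
    "pairs": [
        {
            "memory_a": {"id": 1, "preview": "Deploy notes", "tags": []},
            "memory_b": {"id": 2, "preview": "Deploy notes (copy)", "tags": []},
            "similarity_score": 0.97,
            "llm_verdict": "duplicate",
            "llm_confidence": 0.95,
        },
        {
            "memory_a": {"id": 3, "preview": "API keys", "tags": []},
            "memory_b": {"id": 4, "preview": "API key rotation", "tags": []},
            "similarity_score": 0.86,
        },
    ],
    "total_candidates": 2,
    "analyzed": 2,
    "llm_available": True,
}

confident, needs_review = triage_pairs(sample_result)
print(confident, needs_review)
```

Pairs without LLM fields fall through to manual review, which keeps the helper usable when `llm_available` is false.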
Rate limited: 120s cooldown.
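For intuition, a per-tool cooldown of this kind could be tracked roughly as follows. This is a hypothetical sketch only: the `ToolCooldown` class is not Memora's `_check_tool_cooldown` / `_finish_tool` implementation, and it assumes the cooldown window starts when the previous call finishes, which the source does not confirm.

```python
import time
from typing import Callable, Dict, Optional


class ToolCooldown:
    """Hypothetical per-tool cooldown tracker (not Memora's implementation)."""

    def __init__(
        self,
        cooldown_s: float = 120.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._cooldown_s = cooldown_s
        self._clock = clock  # injectable so the behaviour can be tested
        self._last_finished: Dict[str, float] = {}

    def check(self, tool: str) -> Optional[str]:
        """Return a message if the tool is still cooling down, else None."""
        last = self._last_finished.get(tool)
        if last is None:
            return None
        remaining = self._cooldown_s - (self._clock() - last)
        if remaining > 0:
            return f"{tool} is rate limited; retry in {remaining:.0f}s"
        return None

    def finish(self, tool: str) -> None:
        """Record that a call completed (assumed start of the cooldown window)."""
        self._last_finished[tool] = self._clock()


# Drive the tracker with a fake clock instead of sleeping
now = [0.0]
cooldown = ToolCooldown(cooldown_s=120.0, clock=lambda: now[0])

first = cooldown.check("memory_find_duplicates")   # no previous call
cooldown.finish("memory_find_duplicates")
now[0] = 30.0
second = cooldown.check("memory_find_duplicates")  # inside the window
now[0] = 150.0
third = cooldown.check("memory_find_duplicates")   # window elapsed
print(first, second, third)
```

Using a monotonic clock avoids surprises if the system wall clock is adjusted while a cooldown is pending.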
Input Schema
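| Name | Required | Description | Default |
|---|---|---|---|
| min_similarity | No | Minimum similarity score to consider | 0.85 |
| max_similarity | No | Maximum similarity score (kept for backward compatibility) | 1.0 |
| limit | No | Maximum pairs to analyze | 10 |
| use_llm | No | Whether to use LLM for semantic comparison | True |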
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | Dictionary with `pairs`, `total_candidates`, `analyzed`, and `llm_available` | |
Implementation Reference
- memora/server.py:1206-1264 (handler) The implementation of the logic for finding duplicate memory pairs, including optional LLM comparison.

```python
async def _find_duplicates_impl(
    min_similarity: float, max_similarity: float, limit: int, use_llm: bool
) -> Dict[str, Any]:
    from .storage import compare_memories_llm, connect, find_duplicate_candidates

    with connect() as conn:
        candidates = find_duplicate_candidates(conn, min_similarity, limit * 2)

    total_candidates = len(candidates)
    pairs = []
    llm_available = False

    for candidate in candidates[:limit]:
        mem_a = _get_memory(candidate["memory_a_id"])
        mem_b = _get_memory(candidate["memory_b_id"])

        if not mem_a or not mem_b:
            continue

        pair_result = {
            "memory_a": {
                "id": mem_a["id"],
                "preview": mem_a["content"][:150] + "..." if len(mem_a["content"]) > 150 else mem_a["content"],
                "tags": mem_a.get("tags", []),
            },
            "memory_b": {
                "id": mem_b["id"],
                "preview": mem_b["content"][:150] + "..." if len(mem_b["content"]) > 150 else mem_b["content"],
                "tags": mem_b.get("tags", []),
            },
            "similarity_score": round(candidate["similarity_score"], 3),
        }

        # Run LLM comparison if enabled
        if use_llm:
            llm_result = compare_memories_llm(
                mem_a["content"],
                mem_b["content"],
                mem_a.get("metadata"),
                mem_b.get("metadata"),
            )
            if llm_result:
                llm_available = True
                pair_result["llm_verdict"] = llm_result.get("verdict", "review")
                pair_result["llm_confidence"] = llm_result.get("confidence", 0)
                pair_result["llm_reasoning"] = llm_result.get("reasoning", "")
                pair_result["suggested_action"] = llm_result.get("suggested_action", "review")
                if llm_result.get("merge_suggestion"):
                    pair_result["merge_suggestion"] = llm_result["merge_suggestion"]

        pairs.append(pair_result)

    return {
        "pairs": pairs,
        "total_candidates": total_candidates,
        "analyzed": len(pairs),
        "llm_available": llm_available,
    }
```

- memora/server.py:1171-1203 (handler) The tool registration for 'memory_find_duplicates', which uses '_find_duplicates_impl' for its execution logic.

```python
async def memory_find_duplicates(
    min_similarity: float = 0.85,
    max_similarity: float = 1.0,
    limit: int = 10,
    use_llm: bool = True,
) -> Dict[str, Any]:
    """Find potential duplicate memory pairs with optional LLM-powered comparison.

    Scans cross-references to find memory pairs with similarity >= threshold,
    then optionally uses LLM to semantically compare them. Uses the same
    threshold (0.85) as the graph UI duplicate detection.

    Args:
        min_similarity: Minimum similarity score to consider (default: 0.85)
        max_similarity: Maximum similarity score (default: 1.0, kept for backward compatibility)
        limit: Maximum pairs to analyze (default: 10)
        use_llm: Whether to use LLM for semantic comparison (default: True)

    Returns:
        Dictionary with:
        - pairs: List of potential duplicate pairs with analysis
        - total_candidates: Total pairs found
        - analyzed: Number of pairs analyzed with LLM
        - llm_available: Whether LLM comparison was available

    Rate limited: 120s cooldown.
    """
    if msg := _check_tool_cooldown("memory_find_duplicates"):
        return {"error": "rate_limited", "message": msg}
    try:
        return await _find_duplicates_impl(min_similarity, max_similarity, limit, use_llm)
    finally:
        _finish_tool("memory_find_duplicates")
```