find_semantic_duplicates
Identify duplicate functions in your codebase using AST hash matching for exact copies or embedding cosine similarity for conceptual clones.
Instructions
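Find duplicate functions. Use method='ast' for a fast, hash-based pass that catches copy-paste duplicates, or method='embedding' to compare Nomic embeddings by cosine similarity and catch conceptual clones. With method='embedding', each cluster is tagged sim=min..mean.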
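To make the hash-based idea concrete, here is a minimal sketch of AST-hash duplicate detection using only Python's standard library. It illustrates the general technique and is not this tool's implementation: the helper name `find_ast_duplicates`, the choice of SHA-256, and the decision to hash arguments plus body (so that renamed copies still match) are all assumptions made for the example.

```python
import ast
import hashlib
from collections import defaultdict


def find_ast_duplicates(source: str, min_lines: int = 2) -> list[list[str]]:
    """Group function names whose arguments and bodies share an identical AST.

    Hypothetical helper for illustration; not the tool's actual code.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Skip functions shorter than min_lines (def line through last statement).
        if node.end_lineno - node.lineno + 1 < min_lines:
            continue
        # Hash the arguments and body, not the function name, so a renamed
        # copy still matches. ast.dump omits line numbers by default, and
        # comments and whitespace never reach the AST.
        fingerprint = ast.dump(node.args) + "".join(ast.dump(s) for s in node.body)
        digest = hashlib.sha256(fingerprint.encode()).hexdigest()
        groups[digest].append(node.name)
    # Only hashes shared by two or more functions count as duplicate groups.
    return [names for names in groups.values() if len(names) > 1]


SAMPLE = '''
def add(a, b):
    # sum two numbers
    return a + b

def plus(a, b):
    return a + b

def sub(a, b):
    return a - b
'''

duplicates = find_ast_duplicates(SAMPLE)
print(duplicates)
```

In this sketch `add` and `plus` land in the same group despite the differing name and comment, while `sub` does not. A function that renames its variables or reorders statements produces a different hash, which is the gap the embedding method is meant to cover.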
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| min_lines | No | Skip functions shorter than this. Applies to method='ast'. | 2 |
| max_groups | No | Max duplicate groups to return. Raise for full audit. | 10 |
| method | No | ast (fast, exact) or embedding (slower, catches conceptual clones). Embedding reuses the symbol_vectors index from search_codebase(semantic=True); the first call triggers a ~2min reindex. | ast |
| min_similarity | No | Cosine threshold for method='embedding'. Lower = more recall + more noise. | 0.90 |
| project | No | Project name/path. | active |
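The effect of min_similarity can be sketched with a small cosine-threshold clustering example. Everything here is an assumption for illustration: the toy three-dimensional vectors stand in for real Nomic embeddings, the helper names are hypothetical, the union-find grouping is one plausible way to form clusters, and reading sim=min..mean as the minimum and mean pairwise similarity within a cluster is an interpretation, not something the tool's documentation states.

```python
import math
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_by_similarity(vectors: dict[str, list[float]],
                          min_similarity: float = 0.90) -> list[dict]:
    """Hypothetical helper: link pairs at or above the threshold, then
    report each cluster with its min and mean pairwise similarity."""
    parent = {name: name for name in vectors}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= min_similarity:
            parent[find(a)] = find(b)

    clusters: dict[str, list[str]] = {}
    for name in vectors:
        clusters.setdefault(find(name), []).append(name)

    results = []
    for members in clusters.values():
        if len(members) < 2:
            continue  # a singleton is not a duplicate group
        sims = [cosine(vectors[a], vectors[b]) for a, b in combinations(members, 2)]
        results.append({
            "members": sorted(members),
            "sim_min": min(sims),
            "sim_mean": sum(sims) / len(sims),
        })
    return results


# Toy vectors standing in for function embeddings (assumed values).
TOY_VECTORS = {
    "load_json": [1.0, 0.0, 0.0],
    "read_json": [0.9, 0.1, 0.0],
    "send_email": [0.0, 0.0, 1.0],
}

clusters = cluster_by_similarity(TOY_VECTORS, min_similarity=0.90)
for c in clusters:
    print(f"{c['members']} sim={c['sim_min']:.2f}..{c['sim_mean']:.2f}")
```

With the default threshold the two JSON helpers form one cluster and `send_email` stays out. Raising min_similarity toward 1.0 drops borderline pairs, and lowering it merges more loosely related functions, which matches the recall-versus-noise trade-off described in the table.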