find_similar_sections
Detects near-duplicate and overlapping sections in documentation by fusing embedding similarity and lexical Jaccard scores, then clustering and ranking results by importance.
Instructions
Multi-signal section dedup detection. Fuses embedding cosine (when available) with title + body lexical Jaccard, clusters via union-find, ranks each cluster's canonical by backlink_count + size. Verdict tiers: near_duplicate, overlapping_topic, parallel_tutorial. Read-only.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| repo | Yes | ||
| min_score | No | Pairwise score floor for clustering. Default 0.7. | |
| near_duplicate_threshold | No | Score at/above which a cluster is flagged near_duplicate. | |
| max_clusters | No | ||
| exclude_same_doc | No | Skip pairs in the same doc. Useful for long pages with repeated structure. | |
| max_sections | No | Hard cap on sections examined. Default 1000. |