find_potential_duplicates
Identifies potential duplicate files by grouping files of identical size and hashing sampled chunks, providing efficient detection without full file comparison.
Instructions
Find potential duplicate files by grouping files of identical size, then hashing the candidates in each group. Hashing uses XXH3-128 over 5 × 2 MiB sampled chunks per file (the first, the last, and 3 pseudo-random interior chunks). Because only sampled chunks are hashed, results are potential duplicates, not guaranteed exact matches.
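
The two-stage approach described above can be sketched roughly as follows. This is an illustrative sketch, not the tool's actual implementation. It makes several assumptions: `hashlib.blake2b` with a 128-bit digest stands in for XXH3-128 (which needs a third-party package), interior offsets are seeded from the file size so that equal-sized files are sampled at the same positions, and files smaller than one chunk are hashed whole. The names `sample_offsets` and `sampled_digest` are hypothetical.

```python
import hashlib
import os
import random
from collections import defaultdict

CHUNK = 2 * 1024 * 1024  # 2 MiB per sampled chunk
INTERIOR = 3             # pseudo-random interior chunks per file


def sample_offsets(size):
    """Return sorted chunk offsets: first, last, and up to 3 interior ones."""
    if size <= CHUNK:
        return [0]  # assumption: small files are read in a single chunk
    offsets = {0, size - CHUNK}
    # Assumption: seed from the size so equal-sized files share offsets.
    rng = random.Random(size)
    for _ in range(INTERIOR):
        offsets.add(rng.randrange(0, size - CHUNK + 1))
    return sorted(offsets)


def sampled_digest(path, size):
    """Hash only the sampled chunks of a file."""
    h = hashlib.blake2b(digest_size=16)  # 128-bit stand-in for XXH3-128
    with open(path, "rb") as f:
        for offset in sample_offsets(size):
            f.seek(offset)
            h.update(f.read(CHUNK))
    return h.hexdigest()


def find_potential_duplicates(root, min_size=1):
    """Group files by size, then by sampled digest within each size group."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable entries are skipped
            if size >= min_size:
                by_size[size].append(path)

    groups = []
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sampled_digest(path, size)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Two large files that differ only in a region that none of the sampled chunks cover would land in the same group here, which is why the groups are only potential duplicates. A full byte-for-byte comparison of each group is needed to confirm a match.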
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| path | No | Relative path within the file server root. Empty string for root. | |
| minSize | No | Minimum file size in bytes to consider. | 1 |
| maxDepth | No | Maximum depth to traverse (max: 10). | 10 |
| maxResults | No | Maximum number of duplicate groups to return (max: 100). | 20 |
| include | No | Comma-separated glob patterns to include (e.g. '*.log, *.txt'). Only matching files are returned. Supports * and ? wildcards. Empty means all files. | |
| exclude | No | Comma-separated glob patterns to exclude (e.g. 'node_modules, *.tmp'). Matching files and directories are skipped. Supports * and ? wildcards. | |
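
The `include` and `exclude` semantics could be approximated as in the sketch below. It assumes that patterns are split on commas with surrounding whitespace trimmed and that they are matched against the base name of each entry. Neither detail is confirmed by this page. The helper names are hypothetical. Note that Python's `fnmatchcase` also accepts `[seq]` character classes, which goes beyond the `*` and `?` wildcards documented for this tool.

```python
from fnmatch import fnmatchcase


def parse_patterns(spec):
    """Split a comma-separated pattern string, dropping empty entries."""
    return [p.strip() for p in spec.split(",") if p.strip()]


def is_included(name, include):
    """An empty include list means every file is a candidate."""
    patterns = parse_patterns(include)
    return not patterns or any(fnmatchcase(name, p) for p in patterns)


def is_excluded(name, exclude):
    """Applies to file and directory names alike."""
    return any(fnmatchcase(name, p) for p in parse_patterns(exclude))
```

With these helpers, `is_included("app.log", "*.log, *.txt")` is true, and `is_excluded("node_modules", "node_modules, *.tmp")` is also true, so that directory would be skipped during traversal.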