Dissimilis

DirForge

find_potential_duplicates

Identifies potential duplicate files by grouping files of identical size and hashing sampled chunks, providing efficient detection without full file comparison.

Instructions

Find potential duplicate files by grouping files with identical sizes then hashing candidates. Uses XXH3-128 with 5 × 2 MiB probabilistic chunk sampling (first, last, and 3 pseudo-random interior chunks per file) — results are potential duplicates, not guaranteed exact matches.
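
The two stages described above (group by size, then hash sampled chunks) can be sketched roughly as follows. This is a minimal illustration, not DirForge's implementation: `hashlib.blake2b` with a 16-byte digest stands in for XXH3-128 so the example needs only the standard library, the interior offsets are seeded from the file size (an assumption, chosen so equal-size files are sampled at the same positions), and every function name here is hypothetical.

```python
import hashlib
import os
import random
from collections import defaultdict

CHUNK = 2 * 1024 * 1024  # 2 MiB per sampled chunk
SAMPLES = 5              # first, last, and 3 interior chunks


def sample_offsets(size: int) -> list[int]:
    """Return the byte offsets of the chunks to read for a file of this size."""
    if size <= CHUNK * SAMPLES:
        # Small file: the samples would cover it anyway, so read all of it.
        return list(range(0, size, CHUNK))
    # Assumption: seed from the size so equal-size files share offsets.
    rng = random.Random(size)
    interior = [rng.randrange(CHUNK, size - 2 * CHUNK + 1)
                for _ in range(SAMPLES - 2)]
    return [0, *sorted(interior), size - CHUNK]


def sampled_digest(path: str, size: int) -> str:
    """Hash only the sampled chunks of the file."""
    h = hashlib.blake2b(digest_size=16)  # 128-bit stand-in for XXH3-128
    with open(path, "rb") as f:
        for offset in sample_offsets(size):
            f.seek(offset)
            h.update(f.read(CHUNK))
    return h.hexdigest()


def find_potential_duplicates(root: str, min_size: int = 1) -> list[list[str]]:
    """Group files by size, then by sampled digest within each size group."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks to avoid broken or repeated targets
            size = os.path.getsize(path)
            if size >= min_size:
                by_size[size].append(path)

    groups = []
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sampled_digest(path, size)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Because only sampled chunks are hashed for large files, two files that differ solely in an unsampled region would still land in the same group, which is why the results are potential duplicates rather than confirmed ones.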

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| path | No | Relative path within the file server root. Empty string for root. | |
| minSize | No | Minimum file size in bytes to consider (default: 1). | 1 |
| maxDepth | No | Maximum depth to traverse (default: 10, max: 10). | 10 |
| maxResults | No | Maximum number of duplicate groups to return (default: 20, max: 100). | 20 |
| include | No | Comma-separated glob patterns to include (e.g. '*.log, *.txt'). Only matching files are returned. Supports * and ? wildcards. Empty means all files. | |
| exclude | No | Comma-separated glob patterns to exclude (e.g. 'node_modules, *.tmp'). Matching files and directories are skipped. Supports * and ? wildcards. | |
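
The include and exclude parameters take comma-separated glob patterns. A rough sketch of how such filters might be applied to a file or directory name is shown below. It makes several assumptions the schema does not state: patterns are matched against the base name, matching is case-sensitive, and exclude takes precedence over include. The helper names are hypothetical, and Python's `fnmatchcase` also accepts `[...]` character classes, which the tool only documents as `*` and `?`.

```python
from fnmatch import fnmatchcase


def parse_patterns(spec: str) -> list[str]:
    """Split a comma-separated pattern string, dropping blanks and whitespace."""
    return [p.strip() for p in spec.split(",") if p.strip()]


def is_selected(name: str, include: str = "", exclude: str = "") -> bool:
    """Return True if the name passes the exclude filter and the include filter."""
    included = parse_patterns(include)
    excluded = parse_patterns(exclude)
    # Assumption: an exclude match always wins over an include match.
    if any(fnmatchcase(name, p) for p in excluded):
        return False
    # An empty include list means every remaining name is accepted.
    return not included or any(fnmatchcase(name, p) for p in included)
```

For example, `is_selected("build.log", include="*.log, *.txt")` accepts the file, while `is_selected("node_modules", exclude="node_modules, *.tmp")` rejects the directory name.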

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes the hashing algorithm (XXH3-128) and sampling method (5×2 MiB chunks), and clearly states results are potential duplicates, not exact. Good transparency, though it does not mention side effects or resource usage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: first states the high-level function, second provides technical detail. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

There is no output schema, and the description does not explain the return format (e.g., the structure of duplicate groups). It could also mention performance considerations. Adequate but not fully complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema covers all 6 parameters with descriptions; the tool description adds no extra meaning beyond the schema. A baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it finds potential duplicate files using size grouping and hashing. Distinct from sibling tools which do not perform duplicate detection.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies usage for finding potential duplicates but provides no explicit guidance on when to use or avoid this tool. Given many sibling tools, differentiation is implicit only.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Dissimilis/DirForge'
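
The same request can be issued from a script. The sketch below uses only Python's standard library and assumes the endpoint returns a JSON body, which this page does not confirm; `build_request` and `fetch_server` are hypothetical helper names, not part of any Glama client.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_request(owner: str, server: str) -> urllib.request.Request:
    """Build the GET request for one server entry in the MCP directory."""
    url = f"{API_BASE}/{owner}/{server}"
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )


def fetch_server(owner: str, server: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the body (assumed here to be JSON)."""
    with urllib.request.urlopen(build_request(owner, server), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_server("Dissimilis", "DirForge")` targets the same URL as the curl command above.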

If you have feedback or need assistance with the MCP directory API, please join our Discord server.