find_duplicates
Identify and locate near-duplicate content within manuscript directories to maintain content uniqueness and avoid repetition.
Instructions
Find near-duplicate content
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| project_path | No | Path to manuscript directory (defaults to current directory) | |
| scope | No | File scope pattern | |
| similarity_threshold | No | Similarity threshold (0-1) | |
| min_length | No | Minimum content length | |
| limit | No | Maximum results |
Implementation Reference
- src/tools/WriterToolDefinitions.ts:255-272 (registration)Tool registration and input schema for 'find_duplicates' in the MCP tool definitions array{ name: "find_duplicates", description: "Find near-duplicate content", inputSchema: { type: "object", properties: { project_path: { type: "string", description: "Path to manuscript directory (defaults to current directory)" }, scope: { type: "string", description: "File scope pattern" }, similarity_threshold: { type: "number", description: "Similarity threshold (0-1)", default: 0.8, }, min_length: { type: "number", description: "Minimum content length", default: 50 }, limit: { type: "number", description: "Maximum results", default: 30 }, }, }, },
- src/tools/WriterToolHandlers.ts:277-284 (handler)Handler function that processes tool arguments, applies pagination limits, and calls the underlying WritersAid serviceprivate async findDuplicates(args: Record<string, unknown>) { const scope = args.scope as string | undefined; const similarityThreshold = (args.similarity_threshold as number) || 0.8; const minLength = (args.min_length as number) || 50; const limit = resolvePaginationLimit("find_duplicates", args.limit as number | undefined); return this.writersAid.findDuplicates({ scope, similarityThreshold, minLength, limit }); }
- Core duplicate finding logic that compares content paragraphs across files using Jaccard similarityasync findDuplicates(options: { scope?: string; similarityThreshold?: number; minLength?: number; limit?: number; }): Promise<DuplicateMatch[]> { const { similarityThreshold = 0.8, minLength = 50, limit } = options; const files = await this.storage.getAllFiles(); const matches: DuplicateMatch[] = []; for (let i = 0; i < files.length; i++) { for (let j = i + 1; j < files.length; j++) { const duplicates = this.compareFiles( files[i], files[j], similarityThreshold, minLength ); matches.push(...duplicates); } } // Sort by similarity (highest first) before pagination const sorted = matches.sort((a, b) => b.similarity - a.similarity); return paginateResults(sorted, limit); }
- src/WritersAid.ts:225-232 (helper)Delegation method in WritersAid that forwards the call to DuplicateFinder instanceasync findDuplicates(options?: { scope?: string; similarityThreshold?: number; minLength?: number; limit?: number; }) { return this.duplicateFinder.findDuplicates(options || {}); }
- Dispatch case in handleTool switch statement that routes to the specific handlercase "find_duplicates": return this.findDuplicates(args);