find_duplicates

Identify and locate near-duplicate content within manuscript directories to maintain content uniqueness and avoid repetition.

Instructions

Find near-duplicate content

Input Schema

Name                  Required  Description                                                   Default
project_path          No        Path to manuscript directory (defaults to current directory)
scope                 No        File scope pattern
similarity_threshold  No        Similarity threshold (0-1)                                    0.8
min_length            No        Minimum content length                                        50
limit                 No        Maximum results                                               30
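
As a rough illustration of how a client might pass these parameters, the sketch below wraps them in an MCP JSON-RPC "tools/call" request. The helper buildFindDuplicatesRequest, the two interfaces, and the glob-style scope value "chapters/*.md" are hypothetical and not part of the server; the range check simply mirrors the "(0-1)" note in the schema.

```typescript
// Arguments accepted by find_duplicates, mirroring the input schema.
interface FindDuplicatesArgs {
  project_path?: string;
  scope?: string;
  similarity_threshold?: number; // expected range: 0-1
  min_length?: number;
  limit?: number;
}

// Shape of an MCP JSON-RPC "tools/call" request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: FindDuplicatesArgs };
}

// Hypothetical helper: checks the threshold range, then wraps the arguments.
function buildFindDuplicatesRequest(
  id: number,
  args: FindDuplicatesArgs = {}
): ToolCallRequest {
  const threshold = args.similarity_threshold;
  if (threshold !== undefined && (threshold < 0 || threshold > 1)) {
    throw new RangeError("similarity_threshold must be between 0 and 1");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "find_duplicates", arguments: args },
  };
}

// Example: stricter threshold, fewer results. The scope pattern is an assumed glob.
const request = buildFindDuplicatesRequest(1, {
  scope: "chapters/*.md",
  similarity_threshold: 0.9,
  limit: 10,
});
console.log(JSON.stringify(request, null, 2));
```

Every parameter is optional, so an empty arguments object is also valid and leaves the server to apply its own defaults.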

Implementation Reference

  • Tool registration and input schema for 'find_duplicates' in the MCP tool definitions array
    {
      name: "find_duplicates",
      description: "Find near-duplicate content",
      inputSchema: {
        type: "object",
        properties: {
          project_path: {
            type: "string",
            description: "Path to manuscript directory (defaults to current directory)",
          },
          scope: { type: "string", description: "File scope pattern" },
          similarity_threshold: {
            type: "number",
            description: "Similarity threshold (0-1)",
            default: 0.8,
          },
          min_length: {
            type: "number",
            description: "Minimum content length",
            default: 50,
          },
          limit: { type: "number", description: "Maximum results", default: 30 },
        },
      },
    },
  • Handler function that processes tool arguments, applies pagination limits, and calls the underlying WritersAid service
    private async findDuplicates(args: Record<string, unknown>) {
      const scope = args.scope as string | undefined;
      // `||` falls back to the default for any falsy value, so an explicit 0 is also replaced.
      const similarityThreshold = (args.similarity_threshold as number) || 0.8;
      const minLength = (args.min_length as number) || 50;
      // Resolve the effective result limit for this tool.
      const limit = resolvePaginationLimit("find_duplicates", args.limit as number | undefined);
      return this.writersAid.findDuplicates({ scope, similarityThreshold, minLength, limit });
    }
  • Core duplicate finding logic that compares content paragraphs across files using Jaccard similarity
    async findDuplicates(options: {
      scope?: string;
      similarityThreshold?: number;
      minLength?: number;
      limit?: number;
    }): Promise<DuplicateMatch[]> {
      // Note: `scope` is accepted but not read in this excerpt.
      const { similarityThreshold = 0.8, minLength = 50, limit } = options;
      const files = await this.storage.getAllFiles();
      const matches: DuplicateMatch[] = [];

      // Compare every unordered pair of files exactly once.
      for (let i = 0; i < files.length; i++) {
        for (let j = i + 1; j < files.length; j++) {
          const duplicates = this.compareFiles(
            files[i],
            files[j],
            similarityThreshold,
            minLength
          );
          matches.push(...duplicates);
        }
      }

      // Sort by similarity (highest first) before pagination
      const sorted = matches.sort((a, b) => b.similarity - a.similarity);
      return paginateResults(sorted, limit);
    }
  • Delegation method in WritersAid that forwards the call to the DuplicateFinder instance
    async findDuplicates(options?: {
      scope?: string;
      similarityThreshold?: number;
      minLength?: number;
      limit?: number;
    }) {
      return this.duplicateFinder.findDuplicates(options || {});
    }
  • Dispatch case in the handleTool switch statement that routes to the specific handler
    case "find_duplicates":
      return this.findDuplicates(args);
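
The compareFiles method is not shown in the excerpts above, so the following is only a minimal sketch of what paragraph-level Jaccard comparison could look like, not the project's actual code. The names tokenize, jaccard, splitParagraphs, compareTexts, and ParagraphMatch are hypothetical, and the choices to split on blank lines, to tokenize into lowercase ASCII words, and to measure minLength in characters are assumptions.

```typescript
interface ParagraphMatch {
  paragraphA: string;
  paragraphB: string;
  similarity: number;
}

// Assumption: a paragraph is reduced to its set of lowercase ASCII words.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|. Two empty sets score 0 here.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let intersection = 0;
  a.forEach((word) => {
    if (b.has(word)) intersection++;
  });
  return intersection / (a.size + b.size - intersection);
}

// Assumption: paragraphs are separated by blank lines; short ones are skipped.
function splitParagraphs(content: string, minLength: number): string[] {
  return content
    .split(/\n\s*\n/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length >= minLength);
}

// Compare every paragraph of one text with every paragraph of the other.
function compareTexts(
  contentA: string,
  contentB: string,
  similarityThreshold = 0.8,
  minLength = 50
): ParagraphMatch[] {
  const paragraphsB = splitParagraphs(contentB, minLength).map((text) => ({
    text,
    words: tokenize(text),
  }));
  const matches: ParagraphMatch[] = [];
  for (const paragraphA of splitParagraphs(contentA, minLength)) {
    const wordsA = tokenize(paragraphA);
    for (const b of paragraphsB) {
      const similarity = jaccard(wordsA, b.words);
      if (similarity >= similarityThreshold) {
        matches.push({ paragraphA, paragraphB: b.text, similarity });
      }
    }
  }
  // Highest similarity first, matching the sort order used before pagination.
  return matches.sort((x, y) => y.similarity - x.similarity);
}

const demoMatches = compareTexts(
  "The storm broke over the harbor at dawn.\n\nShe packed her bags and left.",
  "At dawn the storm broke over the harbor.",
  0.8,
  10
);
console.log(demoMatches);
```

Because Jaccard similarity works on word sets, it ignores word order and repetition, which is why reordered sentences can still score as near-duplicates.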

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/xiaolai/claude-writers-aid-mcp'
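
The same request can be sketched in TypeScript using the global fetch function (available in Node 18+ and in browsers). The helpers serverUrl and fetchServer are hypothetical, the owner/name path layout is inferred from the single URL above, and the response is assumed to be JSON; its schema is not described here, so it is typed as unknown.

```typescript
// Hypothetical helper: builds the directory API URL from an owner and server name.
function serverUrl(owner: string, name: string): string {
  return (
    "https://glama.ai/api/mcp/v1/servers/" +
    encodeURIComponent(owner) +
    "/" +
    encodeURIComponent(name)
  );
}

// Hypothetical helper: fetches the server record and parses it as JSON (assumed format).
async function fetchServer(owner: string, name: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, name), { method: "GET" });
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  return response.json();
}
```

Calling fetchServer("xiaolai", "claude-writers-aid-mcp") issues the same GET request as the curl command.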