by xiaolai

find_duplicates

Identify and locate near-duplicate content within manuscript directories to maintain content uniqueness and avoid repetition.

Instructions

Find near-duplicate content

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| project_path | No | Path to manuscript directory (defaults to current directory) | |
| scope | No | File scope pattern | |
| similarity_threshold | No | Similarity threshold (0-1) | 0.8 |
| min_length | No | Minimum content length | 50 |
| limit | No | Maximum results | 30 |
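
As a rough illustration of how a client might assemble arguments for this tool, the sketch below mirrors the parameters above as a TypeScript type and wraps them in a JSON-RPC `tools/call` request, the shape MCP uses for tool invocation. The names `FindDuplicatesArgs`, `withDefaults`, and `buildToolCall` are hypothetical and do not come from the server's code. The defaults are the ones declared in the tool's input schema; the server may resolve `limit` differently through its own pagination helper.

```typescript
// Hypothetical argument type mirroring the find_duplicates input schema.
interface FindDuplicatesArgs {
  project_path?: string;
  scope?: string;
  similarity_threshold?: number;
  min_length?: number;
  limit?: number;
}

// Defaults as declared in the input schema (assumed to apply when a field is omitted).
const SCHEMA_DEFAULTS = { similarity_threshold: 0.8, min_length: 50, limit: 30 };

// Fill in omitted fields and reject thresholds outside the documented 0-1 range.
function withDefaults(args: FindDuplicatesArgs) {
  const threshold = args.similarity_threshold ?? SCHEMA_DEFAULTS.similarity_threshold;
  if (threshold < 0 || threshold > 1) {
    throw new RangeError("similarity_threshold must be between 0 and 1");
  }
  return {
    ...args,
    similarity_threshold: threshold,
    min_length: args.min_length ?? SCHEMA_DEFAULTS.min_length,
    limit: args.limit ?? SCHEMA_DEFAULTS.limit,
  };
}

// Build the JSON-RPC request body for an MCP tools/call invocation.
function buildToolCall(id: number, args: FindDuplicatesArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "find_duplicates", arguments: args },
  };
}
```

A call such as `buildToolCall(1, withDefaults({ scope: "chapters/*.md" }))` would produce a request with the threshold, minimum length, and limit filled in; the scope pattern here is only an example value.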

Implementation Reference

  • Tool registration and input schema for 'find_duplicates' in the MCP tool definitions array
    {
      name: "find_duplicates",
      description: "Find near-duplicate content",
      inputSchema: {
        type: "object",
        properties: {
          project_path: { type: "string", description: "Path to manuscript directory (defaults to current directory)" },
          scope: { type: "string", description: "File scope pattern" },
          similarity_threshold: {
            type: "number",
            description: "Similarity threshold (0-1)",
            default: 0.8,
          },
          min_length: { type: "number", description: "Minimum content length", default: 50 },
          limit: { type: "number", description: "Maximum results", default: 30 },
        },
      },
    },
  • Handler function that processes tool arguments, applies pagination limits, and calls the underlying WritersAid service
    private async findDuplicates(args: Record<string, unknown>) {
      const scope = args.scope as string | undefined;
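      // Use ?? rather than || so an explicit 0 is not silently replaced by the default.
      const similarityThreshold = (args.similarity_threshold as number | undefined) ?? 0.8;
      const minLength = (args.min_length as number | undefined) ?? 50;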
      const limit = resolvePaginationLimit("find_duplicates", args.limit as number | undefined);
    
      return this.writersAid.findDuplicates({ scope, similarityThreshold, minLength, limit });
    }
  • Core duplicate finding logic that compares content paragraphs across files using Jaccard similarity
    async findDuplicates(options: {
      scope?: string;
      similarityThreshold?: number;
      minLength?: number;
      limit?: number;
    }): Promise<DuplicateMatch[]> {
      const { similarityThreshold = 0.8, minLength = 50, limit } = options;
    
      const files = await this.storage.getAllFiles();
      const matches: DuplicateMatch[] = [];
    
      for (let i = 0; i < files.length; i++) {
        for (let j = i + 1; j < files.length; j++) {
          const duplicates = this.compareFiles(
            files[i],
            files[j],
            similarityThreshold,
            minLength
          );
          matches.push(...duplicates);
        }
      }
    
      // Sort by similarity (highest first) before pagination
      const sorted = matches.sort((a, b) => b.similarity - a.similarity);
      return paginateResults(sorted, limit);
    }
  • Delegation method in WritersAid that forwards the call to DuplicateFinder instance
    async findDuplicates(options?: {
      scope?: string;
      similarityThreshold?: number;
      minLength?: number;
      limit?: number;
    }) {
      return this.duplicateFinder.findDuplicates(options || {});
    }
  • Dispatch case in handleTool switch statement that routes to the specific handler
    case "find_duplicates":
      return this.findDuplicates(args);
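
The listing does not show `compareFiles`, only that it compares paragraphs using Jaccard similarity. The following is a minimal sketch of that technique, not the project's implementation. It assumes paragraphs are separated by blank lines, that tokens are lowercase ASCII words, and that `minLength` is measured in characters. All function and type names here are hypothetical.

```typescript
// Hypothetical helpers; the real compareFiles is not shown in the listing.

// Lowercase ASCII word tokens as a set (an assumed tokenization).
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9']+/g) ?? []);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, defined here as 0 when both sets are empty.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let intersection = 0;
  a.forEach((token) => {
    if (b.has(token)) intersection++;
  });
  const union = a.size + b.size - intersection;
  return intersection / union;
}

// Split on blank lines and drop paragraphs shorter than minLength characters.
function splitParagraphs(content: string, minLength: number): string[] {
  return content
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length >= minLength);
}

interface ParagraphMatch {
  a: string;
  b: string;
  similarity: number;
}

// Compare every qualifying paragraph of one text against every one of the other.
function compareTexts(
  left: string,
  right: string,
  threshold = 0.8,
  minLength = 50
): ParagraphMatch[] {
  const rightParas = splitParagraphs(right, minLength).map((text) => ({
    text,
    tokens: tokenize(text),
  }));
  const matches: ParagraphMatch[] = [];
  for (const p of splitParagraphs(left, minLength)) {
    const tokens = tokenize(p);
    for (const q of rightParas) {
      const similarity = jaccard(tokens, q.tokens);
      if (similarity >= threshold) matches.push({ a: p, b: q.text, similarity });
    }
  }
  return matches;
}
```

Because the core loop in the listing compares every pair of files, a paragraph-level comparison like this one runs once per file pair, so the total work grows quadratically with the number of files.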

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/xiaolai/claude-writers-aid-mcp'
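
The same request could be issued from TypeScript, sketched here with the global `fetch` available in modern browsers and Node 18+. The helper names `serverUrl` and `fetchServerInfo` are hypothetical, and the response is assumed to be JSON; its shape is not documented on this page, so it is typed as `unknown`.

```typescript
// Base path taken from the curl example above.
const MCP_API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Build the server URL from an owner and server name (hypothetical helper).
function serverUrl(owner: string, name: string): string {
  return `${MCP_API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Fetch the server record; assumes a JSON response body.
async function fetchServerInfo(owner: string, name: string): Promise<unknown> {
  const res = await fetch(serverUrl(owner, name));
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.json();
}
```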

If you have feedback or need assistance with the MCP directory API, please join our Discord server.