Skip to main content
Glama

visual_diff

Save a reference screenshot with 'snapshot' action, then compare new screenshots against it to detect visual regressions.

Instructions

Compare deux screenshots pour detecter des regressions visuelles. action='snapshot' pour sauver une reference, action='compare' pour comparer avec la reference.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
actionYes'snapshot' pour sauver, 'compare' pour comparer
nameNoNom du snapshot (defaut: 'default')default
thresholdNoSeuil de diff en % pour considerer PASS (defaut: 0.1)

Implementation Reference

  • Main handler function for the visual_diff tool. Takes action ('snapshot' or 'compare'), name, and threshold parameters. For 'snapshot': takes a screenshot and saves it as a reference. For 'compare': compares current screenshot against stored reference, computes pixel diff, and returns PASS/FAIL with diff stats.
      async ({ action, name, threshold }) => {
        const result = await resolveDevice();
        if ("error" in result) return { content: [{ type: "text", text: result.error }], isError: true };
        const dev = result.device;
    
        try {
          if (action === "snapshot") {
            const buffer = await takeScreenshot(dev.platform, dev.id);
            evictOldestSnapshot();
            referenceSnapshots.set(name, buffer);
    
            // Persist to disk
            await mkdir(SNAPSHOT_DIR, { recursive: true });
            await writeFile(`${SNAPSHOT_DIR}/${name}.png`, buffer);
    
            const png = decodePng(buffer);
            return {
              content: [{ type: "text", text: `Snapshot "${name}" sauve (${png.width}x${png.height}). Utilise action='compare' pour comparer plus tard.` }],
            };
          }
    
          // --- COMPARE ---
          // Get reference
          let refBuffer = referenceSnapshots.get(name);
          if (!refBuffer) {
            // Try loading from disk
            const diskPath = `${SNAPSHOT_DIR}/${name}.png`;
            const diskExists = await access(diskPath).then(() => true).catch(() => false);
            if (diskExists) {
              refBuffer = await readFile(diskPath);
              referenceSnapshots.set(name, refBuffer);
            } else {
              return { content: [{ type: "text", text: `Aucun snapshot "${name}". Utilise action='snapshot' d'abord.` }], isError: true };
            }
          }
    
          // Take current screenshot
          const currentBuffer = await takeScreenshot(dev.platform, dev.id);
    
          // Decode both
          const refPng = decodePng(refBuffer);
          const currentPng = decodePng(currentBuffer);
    
          // Size check
          if (refPng.width !== currentPng.width || refPng.height !== currentPng.height) {
            return {
              content: [{
                type: "text",
                text: `FAIL — Tailles differentes : reference ${refPng.width}x${refPng.height} vs actuel ${currentPng.width}x${currentPng.height}. Device ou orientation differente ?`,
              }],
              isError: true,
            };
          }
    
          // Compare
          const { diffCount, totalPixels, grid } = comparePixels(refPng, currentPng);
          const diffPct = (diffCount / totalPixels) * 100;
          const pass = diffPct <= threshold;
    
          const cellW = Math.ceil(refPng.width / 8);
          const cellH = Math.ceil(refPng.height / 8);
    
          const lines = [
            `Visual Diff : "${name}"`,
            `Reference : ${refPng.width}x${refPng.height}`,
            `Pixels differents : ${diffCount.toLocaleString()} / ${totalPixels.toLocaleString()} (${diffPct.toFixed(3)}%)`,
            `Seuil : ${threshold}%`,
            `Resultat : ${pass ? "PASS" : "FAIL"}`,
            "",
            formatGrid(grid, cellW, cellH),
          ];
    
          return {
            content: [{ type: "text", text: lines.join("\n") }],
            isError: !pass,
          };
        } catch (err) {
          const msg = err instanceof Error ? err.message : String(err);
          return { content: [{ type: "text", text: `Erreur visual_diff: ${msg}` }], isError: true };
        }
      }
    );
  • Registration function registerVisualDiff that registers the tool named 'visual_diff' on the MCP server with Zod schema for action, name, and threshold parameters.
    export function registerVisualDiff(server: McpServer): void {
      server.tool(
        "visual_diff",
        "Compare deux screenshots pour detecter des regressions visuelles. action='snapshot' pour sauver une reference, action='compare' pour comparer avec la reference.",
        {
          action: z.enum(["snapshot", "compare"]).describe("'snapshot' pour sauver, 'compare' pour comparer"),
          name: z.string().optional().default("default").describe("Nom du snapshot (defaut: 'default')"),
          threshold: z.number().min(0).max(100).optional().default(0.1).describe("Seuil de diff en % pour considerer PASS (defaut: 0.1)"),
        },
        async ({ action, name, threshold }) => {
          const result = await resolveDevice();
          if ("error" in result) return { content: [{ type: "text", text: result.error }], isError: true };
          const dev = result.device;
    
          try {
            if (action === "snapshot") {
              const buffer = await takeScreenshot(dev.platform, dev.id);
              evictOldestSnapshot();
              referenceSnapshots.set(name, buffer);
    
              // Persist to disk
              await mkdir(SNAPSHOT_DIR, { recursive: true });
              await writeFile(`${SNAPSHOT_DIR}/${name}.png`, buffer);
    
              const png = decodePng(buffer);
              return {
                content: [{ type: "text", text: `Snapshot "${name}" sauve (${png.width}x${png.height}). Utilise action='compare' pour comparer plus tard.` }],
              };
            }
    
            // --- COMPARE ---
            // Get reference
            let refBuffer = referenceSnapshots.get(name);
            if (!refBuffer) {
              // Try loading from disk
              const diskPath = `${SNAPSHOT_DIR}/${name}.png`;
              const diskExists = await access(diskPath).then(() => true).catch(() => false);
              if (diskExists) {
                refBuffer = await readFile(diskPath);
                referenceSnapshots.set(name, refBuffer);
              } else {
                return { content: [{ type: "text", text: `Aucun snapshot "${name}". Utilise action='snapshot' d'abord.` }], isError: true };
              }
            }
    
            // Take current screenshot
            const currentBuffer = await takeScreenshot(dev.platform, dev.id);
    
            // Decode both
            const refPng = decodePng(refBuffer);
            const currentPng = decodePng(currentBuffer);
    
            // Size check
            if (refPng.width !== currentPng.width || refPng.height !== currentPng.height) {
              return {
                content: [{
                  type: "text",
                  text: `FAIL — Tailles differentes : reference ${refPng.width}x${refPng.height} vs actuel ${currentPng.width}x${currentPng.height}. Device ou orientation differente ?`,
                }],
                isError: true,
              };
            }
    
            // Compare
            const { diffCount, totalPixels, grid } = comparePixels(refPng, currentPng);
            const diffPct = (diffCount / totalPixels) * 100;
            const pass = diffPct <= threshold;
    
            const cellW = Math.ceil(refPng.width / 8);
            const cellH = Math.ceil(refPng.height / 8);
    
            const lines = [
              `Visual Diff : "${name}"`,
              `Reference : ${refPng.width}x${refPng.height}`,
              `Pixels differents : ${diffCount.toLocaleString()} / ${totalPixels.toLocaleString()} (${diffPct.toFixed(3)}%)`,
              `Seuil : ${threshold}%`,
              `Resultat : ${pass ? "PASS" : "FAIL"}`,
              "",
              formatGrid(grid, cellW, cellH),
            ];
    
            return {
              content: [{ type: "text", text: lines.join("\n") }],
              isError: !pass,
            };
          } catch (err) {
            const msg = err instanceof Error ? err.message : String(err);
            return { content: [{ type: "text", text: `Erreur visual_diff: ${msg}` }], isError: true };
          }
        }
      );
    }
  • Zod schema defining the input parameters: action (enum: 'snapshot'/'compare'), name (optional string, default 'default'), threshold (optional number 0-100, default 0.1%).
    "Compare deux screenshots pour detecter des regressions visuelles. action='snapshot' pour sauver une reference, action='compare' pour comparer avec la reference.",
    {
      action: z.enum(["snapshot", "compare"]).describe("'snapshot' pour sauver, 'compare' pour comparer"),
      name: z.string().optional().default("default").describe("Nom du snapshot (defaut: 'default')"),
      threshold: z.number().min(0).max(100).optional().default(0.1).describe("Seuil de diff en % pour considerer PASS (defaut: 0.1)"),
    },
  • comparePixels function: pixel-by-pixel comparison of two PNG images with a color distance threshold of 30. Also builds an 8x8 grid to identify regions with differences.
    function comparePixels(ref: PNG, current: PNG): { diffCount: number; totalPixels: number; grid: number[][] } {
      const width = ref.width;
      const height = ref.height;
      const totalPixels = width * height;
      let diffCount = 0;
    
      // 8x8 grid for region detection
      const gridCols = 8;
      const gridRows = 8;
      const grid: number[][] = Array.from({ length: gridRows }, () => Array(gridCols).fill(0));
      const cellW = Math.ceil(width / gridCols);
      const cellH = Math.ceil(height / gridRows);
    
      for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
          const idx = (y * width + x) * 4;
          const dr = Math.abs(ref.data[idx] - current.data[idx]);
          const dg = Math.abs(ref.data[idx + 1] - current.data[idx + 1]);
          const db = Math.abs(ref.data[idx + 2] - current.data[idx + 2]);
    
          // Pixel is "different" if color distance > threshold
          if (dr + dg + db > 30) {
            diffCount++;
            const gx = Math.min(Math.floor(x / cellW), gridCols - 1);
            const gy = Math.min(Math.floor(y / cellH), gridRows - 1);
            grid[gy][gx]++;
          }
        }
      }
    
      return { diffCount, totalPixels, grid };
    }
  • takeScreenshot utility used by visual_diff to capture screenshots on iOS or Android platforms.
    export async function takeScreenshot(platform: "ios" | "android", deviceId: string): Promise<Buffer> {
      if (platform === "ios") return iosScreenshot(deviceId);
      return androidScreenshot();
    }
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description bears full responsibility for behavioral disclosure. It does not mention what the tool returns or any side effects (e.g., if it modifies state). This is a significant gap for a tool that mutates reference images.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concisely written in two sentences, front-loading the purpose and explaining the two actions without unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 3 parameters and no output schema, the description covers the two actions but lacks information on the output (e.g., pass/fail, diff image) and how it fits with sibling tools like 'screenshot' or 'assert_visible'. It is minimally adequate but leaves room for improvement.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema already documents all parameters. The description adds minimal value beyond the schema by giving brief context for 'action' values, but does not elaborate on 'name' or 'threshold' beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool compares two screenshots for visual regressions and distinguishes the two modes (snapshot and compare). This provides a specific verb and resource, and differentiates from sibling tools like 'screenshot'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear guidance on when to use each action ('snapshot' to save a reference, 'compare' to compare with the reference). However, it lacks explicit guidance on when not to use this tool or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/nthImpulse/phantom-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server