Glama

design.regression_test

Read-only, Idempotent

Perform pixel-level regression testing by comparing a baseline snapshot with the current version of a web page. Get a pass/fail result based on a configurable threshold, along with a diff image and the change percentage.

Instructions

Pixel-level comparison between a baseline snapshot and the current web page via Pixelmatch. Returns a threshold-based pass/fail result, a diff image (Base64 PNG), and the change percentage. Use snapshots saved with the design.track_changes snapshot action as the baseline.
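As a rough illustration of the threshold rule: the change percentage is the fraction of changed pixels, and the test passes when that fraction is at or below the threshold. The sketch mirrors step 4 of runVisualRegression in the Implementation Reference, assuming changePercentage = changedPixels / totalPixels (computeDiff itself is not shown); the helper name `decideRegression` and the zero-pixel guard are hypothetical. At the default 1920 × 1080 viewport there are 2,073,600 pixels, so the default threshold of 0.001 tolerates at most 2,073 changed pixels.

```typescript
// Hypothetical helper mirroring step 4 of runVisualRegression. Assumes
// changePercentage = changedPixels / totalPixels (computeDiff is not shown).
interface RegressionDecision {
  passed: boolean;
  changePercentage: number; // rounded to 4 decimal places, as reported
}

function decideRegression(
  changedPixels: number,
  totalPixels: number,
  threshold = 0.001 // documented default: 0.1% of pixels
): RegressionDecision {
  if (totalPixels <= 0) {
    // Guard added for the sketch; the service's behavior here is not shown.
    throw new RangeError("totalPixels must be positive");
  }
  const raw = changedPixels / totalPixels;
  return {
    passed: raw <= threshold, // the verdict uses the unrounded ratio
    changePercentage: Math.round(raw * 10000) / 10000,
  };
}

const defaultViewportPixels = 1920 * 1080; // 2,073,600
console.log(decideRegression(2073, defaultViewportPixels)); // { passed: true, changePercentage: 0.001 }
console.log(decideRegression(2074, defaultViewportPixels)); // { passed: false, changePercentage: 0.001 }
```

Because the service rounds only the reported value to four decimal places, 2,073 and 2,074 changed pixels both report a change_percentage of 0.001, yet the first passes and the second fails. Read the `passed` field rather than re-deriving the verdict from the rounded figure.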

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | Target web page URL | - |
| baseline_snapshot_id | Yes | Baseline snapshot ID (UUID, from design.track_changes) | - |
| threshold | No | Pass/fail threshold, 0 to 1 (0.001 = 0.1%) | 0.001 |
| viewport_width | No | Viewport width, integer 320 to 4096 | 1920 |
| viewport_height | No | Viewport height, integer 240 to 16384 | 1080 |
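A client can apply the same defaults and bounds before sending a request. The sketch below assembles an MCP `tools/call` JSON-RPC message for this tool. The bounds and defaults come from the schema documented on this page; the helper name `buildRegressionTestCall` and the generic UUID regex are assumptions (the server's own UUID_PATTERN is not shown and may be stricter).

```typescript
// Hypothetical client-side helper: applies the documented defaults and
// bounds, then builds a JSON-RPC "tools/call" request for the tool.
interface RegressionTestArgs {
  url: string;
  baseline_snapshot_id: string;
  threshold?: number;
  viewport_width?: number;
  viewport_height?: number;
}

// Assumption: generic 8-4-4-4-12 hex UUID pattern, not the server's UUID_PATTERN.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function checkInt(name: string, value: number, min: number, max: number): void {
  if (!Number.isInteger(value) || value < min || value > max) {
    throw new RangeError(`${name} must be an integer in [${min}, ${max}]`);
  }
}

function buildRegressionTestCall(id: number, args: RegressionTestArgs) {
  new URL(args.url); // throws TypeError on a malformed URL
  if (!UUID_RE.test(args.baseline_snapshot_id)) {
    throw new Error("baseline_snapshot_id must be a UUID");
  }
  const threshold = args.threshold ?? 0.001;
  if (!(threshold >= 0 && threshold <= 1)) {
    throw new RangeError("threshold must be in [0, 1]"); // also rejects NaN
  }
  const viewport_width = args.viewport_width ?? 1920;
  const viewport_height = args.viewport_height ?? 1080;
  checkInt("viewport_width", viewport_width, 320, 4096);
  checkInt("viewport_height", viewport_height, 240, 16384);

  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: {
      name: "design.regression_test",
      arguments: {
        url: args.url,
        baseline_snapshot_id: args.baseline_snapshot_id,
        threshold,
        viewport_width,
        viewport_height,
      },
    },
  };
}
```

Sending the defaults explicitly is optional, since the server's Zod schema fills in the same values. Doing it on the client simply makes the request self-describing and catches out-of-range values before a round trip.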

Implementation Reference

  • Main handler function for design.regression_test. Validates input with the Zod schema, performs an SSRF check on the URL, calls the runVisualRegression service, and returns the pass/fail result with the diff image.
    export async function designRegressionTestHandler(
      input: unknown
    ): Promise<DesignRegressionTestOutput> {
      const startTime = Date.now();
    
      // Input validation
      let parsed: DesignRegressionTestInput;
      try {
        parsed = designRegressionTestInputSchema.parse(input);
      } catch (error) {
        const message =
          error instanceof z.ZodError
            ? error.errors.map((e) => `${e.path.join(".")}: ${e.message}`).join("; ")
            : "Invalid input";
        return {
          success: false,
          error: `${VISUAL_REGRESSION_ERROR_CODES.VALIDATION_ERROR}: ${message}`,
        };
      }
    
      // SSRF prevention
      const urlValidation = validateExternalUrl(parsed.url);
      if (!urlValidation.valid) {
        return {
          success: false,
          error: `${VISUAL_REGRESSION_ERROR_CODES.VALIDATION_ERROR}: URL blocked by security policy`,
        };
      }
    
      try {
        const result = await runVisualRegression({
          baselineSnapshotId: parsed.baseline_snapshot_id,
          url: parsed.url,
          threshold: parsed.threshold,
          viewportWidth: parsed.viewport_width,
          viewportHeight: parsed.viewport_height,
        });
    
        if (!result.success) {
          return { success: false, error: result.error ?? "Unknown error" };
        }
    
        const output: DesignRegressionTestOutput = { success: true };
        if (result.passed !== undefined) output.passed = result.passed;
        if (result.changePercentage !== undefined) output.change_percentage = result.changePercentage;
        if (result.changedPixels !== undefined) output.changed_pixels = result.changedPixels;
        if (result.totalPixels !== undefined) output.total_pixels = result.totalPixels;
        if (result.threshold !== undefined) output.threshold = result.threshold;
        if (result.diffImageBase64 !== undefined) output.diff_image_base64 = result.diffImageBase64;
        if (result.baseline) {
          output.baseline = {
            snapshot_id: result.baseline.snapshotId,
            snapshot_at: result.baseline.snapshotAt,
            web_page_url: result.baseline.webPageUrl,
          };
        }
        return output;
      } catch (error) {
        logger.warn("[design.regression_test] Handler failed", {
          error: sanitizeErrorMessage(error),
        });
        return {
          success: false,
          error: `${VISUAL_REGRESSION_ERROR_CODES.DIFF_FAILED}: ${sanitizeErrorMessage(error)}`,
        };
      } finally {
        logger.info("[design.regression_test] completed", {
          url: parsed.url,
          processingTimeMs: Date.now() - startTime,
        });
      }
    }
  • Zod input schema for design.regression_test: url (string, URL format), baseline_snapshot_id (UUID), threshold (0-1, default 0.001), viewport_width (320-4096, default 1920), viewport_height (240-16384, default 1080).
    export const designRegressionTestInputSchema = z.object({
      url: z
        .string()
        .url({ message: "有効なURL形式を指定してください / Valid URL format required" })
        .describe("比較対象のWebページURL / Target web page URL to compare against baseline"),
      baseline_snapshot_id: z
        .string()
        .regex(UUID_PATTERN, "Invalid UUID format")
        .describe(
          "ベースラインとして使用するスナップショットID(UUID形式) / " +
            "Baseline snapshot ID (UUID format, from design.track_changes snapshot action)"
        ),
      threshold: z
        .number()
        .min(0)
        .max(1)
        .optional()
        .default(0.001)
        .describe(
          "pass/fail判定の閾値(0-1、デフォルト0.001 = 0.1%)。変更ピクセル割合がこの値以下ならpass / " +
            "Threshold for pass/fail (0-1, default 0.001 = 0.1%). Pass if change percentage ≤ threshold"
        ),
      viewport_width: z
        .number()
        .int()
        .min(320)
        .max(4096)
        .optional()
        .default(1920)
        .describe(
          "スクリーンショットのビューポート幅(デフォルト1920) / Viewport width (default 1920)"
        ),
      viewport_height: z
        .number()
        .int()
        .min(240)
        .max(16384)
        .optional()
        .default(1080)
        .describe(
          "スクリーンショットのビューポート高さ(デフォルト1080) / Viewport height (default 1080)"
        ),
    });
  • Output type interface DesignRegressionTestOutput: success, passed, change_percentage, changed_pixels, total_pixels, threshold, diff_image_base64, baseline info, and error.
    export interface DesignRegressionTestOutput {
      success: boolean;
      /** Test result (pass/fail) */
      passed?: boolean;
      /** Fraction of changed pixels (0-1, e.g. 0.001 = 0.1%) */
      change_percentage?: number;
      /** Changed pixel count */
      changed_pixels?: number;
      /** Total pixel count */
      total_pixels?: number;
      /** Threshold used */
      threshold?: number;
      /** Diff image (Base64 PNG) */
      diff_image_base64?: string;
      /** Baseline info */
      baseline?: {
        snapshot_id: string;
        snapshot_at: string;
        web_page_url: string;
      };
      /** Error info */
      error?: string;
    }
  • Tool definition object designRegressionTestToolDefinition with name 'design.regression_test', description, annotations, and inputSchema for MCP framework registration.
    export const designRegressionTestToolDefinition = {
      name: "design.regression_test",
      description:
        "ベースラインスナップショットと現在のWebページをPixelmatchでピクセルレベル比較し、" +
        "閾値ベースのpass/fail判定を行います。diff画像(Base64 PNG)と変更ピクセル割合を返却します。" +
        "design.track_changesのsnapshotアクションで保存したスナップショットをベースラインとして使用します。" +
        " / Pixel-level comparison between baseline snapshot and current web page via Pixelmatch. " +
        "Returns threshold-based pass/fail, diff image (Base64 PNG), and change percentage. " +
        "Use snapshots from design.track_changes snapshot action as baseline.",
      annotations: {
        title: "Visual Regression Test",
        readOnlyHint: true,
        idempotentHint: true,
        openWorldHint: true,
      },
      inputSchema: {
        type: "object" as const,
        properties: {
          url: {
            type: "string",
            format: "uri",
            description: "比較対象のWebページURL / Target web page URL",
          },
          baseline_snapshot_id: {
            type: "string",
            format: "uuid",
            description:
              "ベースラインスナップショットID(design.track_changesで取得) / " +
              "Baseline snapshot ID (from design.track_changes)",
          },
          threshold: {
            type: "number",
            minimum: 0,
            maximum: 1,
            default: 0.001,
            description: "pass/fail閾値(デフォルト0.001 = 0.1%) / Threshold (default 0.001 = 0.1%)",
          },
          viewport_width: {
            type: "integer",
            minimum: 320,
            maximum: 4096,
            default: 1920,
            description: "ビューポート幅 / Viewport width",
          },
          viewport_height: {
            type: "integer",
            minimum: 240,
            maximum: 16384,
            default: 1080,
            description: "ビューポート高さ / Viewport height",
          },
        },
        required: ["url", "baseline_snapshot_id"],
      },
    };
  • Registration of designRegressionTestHandler in the toolHandlers map at tools/index.ts line 850.
    // design.regression_test (visual regression test, v0.4.0)
    "design.regression_test": designRegressionTestHandler,
    // page.batch_analyze (batch analysis, v0.4.0)
  • Core service runVisualRegression: retrieves baseline snapshot from DB, captures current screenshot via Playwright, computes Pixelmatch diff, and returns pass/fail decision.
    export async function runVisualRegression(
      input: VisualRegressionInput
    ): Promise<VisualRegressionResult> {
      const threshold = input.threshold ?? DEFAULT_REGRESSION_THRESHOLD;
      const viewportWidth = input.viewportWidth ?? 1920;
      const viewportHeight = input.viewportHeight ?? 1080;
    
      // 1. Fetch the baseline snapshot
      const baselineData = await getBaselineScreenshot(input.baselineSnapshotId);
      if (!baselineData) {
        return {
          success: false,
          error: `${VISUAL_REGRESSION_ERROR_CODES.BASELINE_NOT_FOUND}: Baseline snapshot not found or has no screenshot`,
        };
      }
    
      // 2. Capture the current screenshot
      let currentBuffer: Buffer;
      try {
        currentBuffer = await captureScreenshot(input.url, viewportWidth, viewportHeight);
      } catch (error) {
        logger.warn("[VisualRegression] Screenshot capture failed", {
          url: input.url,
          error: sanitizeErrorMessage(error),
        });
        return {
          success: false,
          error: `${VISUAL_REGRESSION_ERROR_CODES.CAPTURE_FAILED}: ${sanitizeErrorMessage(error)}`,
        };
      }
    
      // 3. Compute the diff
      let diffResult;
      try {
        diffResult = await computeDiff(baselineData.buffer, currentBuffer);
      } catch (error) {
        logger.warn("[VisualRegression] Diff computation failed", {
          error: sanitizeErrorMessage(error),
        });
        return {
          success: false,
          error: `${VISUAL_REGRESSION_ERROR_CODES.DIFF_FAILED}: ${sanitizeErrorMessage(error)}`,
        };
      }
    
      // 4. Pass/fail decision
      const passed = diffResult.changePercentage <= threshold;
    
      return {
        success: true,
        passed,
        changePercentage: Math.round(diffResult.changePercentage * 10000) / 10000, // rounded to 4 decimal places
        changedPixels: diffResult.changedPixels,
        totalPixels: diffResult.totalPixels,
        threshold,
        diffImageBase64: diffResult.diffImageBase64,
        baseline: {
          snapshotId: baselineData.snapshot.id,
          snapshotAt: baselineData.snapshot.snapshotAt.toISOString(),
          webPageUrl: baselineData.snapshot.webPage.url,
        },
      };
    }
  • Permission configuration: design.regression_test requires DESIGN_WRITE permission.
    "design.regression_test": [PERMISSIONS.DESIGN_WRITE],
  • Rate limiter configuration: design.regression_test categorized as 'analysis' tier.
    "design.regression_test": "analysis",
  • DI initialization: registers the VisualRegression Prisma client factory during service initialization.
    function initializeVisualRegressionService(
      config: ServiceInitializerConfig,
      result: SearchRegistrarResult
    ): void {
      try {
        // DI registration for VisualRegressionService (Prisma designSnapshot model)
        // eslint-disable-next-line @typescript-eslint/no-explicit-any -- Prisma model access requires any cast from IPrismaClientMinimal
        const prismaWithDesignSnapshot = config.prisma as any;
        setVisualRegressionPrismaClientFactory(
          () =>
            ({
              designSnapshot: prismaWithDesignSnapshot.designSnapshot,
            }) as IVisualRegressionPrismaClient
        );
    
        result.registeredFactories.push("visualRegressionPrisma");
        result.categories.push("visualRegression");
        logger.info("[ServiceInitializer] visualRegression factory registered (prisma)");
      } catch (error) {
        recordInitError(result, "VisualRegression", ["visualRegressionPrisma"], error);
      }
    }
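The handler rejects URLs that fail `validateExternalUrl`, whose implementation is not shown. A minimal string-level filter in the same spirit might look like the following; the function names, the blocked IPv4 ranges, and the choice to reject every IPv6 literal are assumptions. It does not resolve DNS, so a public hostname that resolves to a private address (or a DNS-rebinding setup) would still pass; a production check would also need to validate resolved addresses and any redirects the browser follows.

```typescript
// Hypothetical SSRF filter in the spirit of validateExternalUrl.
// String-level only: no DNS resolution, so treat it as a first layer.
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||                           // "this" network
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // loopback
    (a === 169 && b === 254) ||          // link-local, incl. cloud metadata
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)             // 192.168.0.0/16
  );
}

function isExternalUrlSketch(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  // WHATWG URL parsing normalizes numeric hosts, e.g. "2130706433" -> "127.0.0.1".
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  if (host.startsWith("[")) return false; // conservatively reject all IPv6 literals
  return !isPrivateIPv4(host);
}
```

Normalizing through `URL` before checking matters: comparing the raw string against "127.0.0.1" would miss equivalent spellings such as a plain decimal host.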
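`computeDiff` is not shown; the tool description says it uses Pixelmatch. To make the changed-pixel count concrete, here is a deliberately simplified stand-in that marks a pixel as changed when any RGBA channel differs by more than a tolerance. This is not Pixelmatch's algorithm (Pixelmatch uses a perceptual YIQ color distance and can ignore anti-aliased pixels), and the names and the `channelTolerance` parameter are hypothetical. Both inputs are assumed to be already-decoded RGBA buffers of the same size; turning the mask into the Base64 PNG the tool returns would additionally need a PNG encoder, which is omitted.

```typescript
// Simplified, hypothetical stand-in for the Pixelmatch step in computeDiff.
// Raw per-channel comparison only: no YIQ color distance, no anti-aliasing
// detection. Inputs are decoded RGBA buffers (4 bytes per pixel).
interface DiffSketchResult {
  changedPixels: number;
  totalPixels: number;
  changePercentage: number; // changedPixels / totalPixels, in [0, 1]
  diffMask: Uint8Array;     // 1 where a pixel changed, else 0
}

function diffRgbaSketch(
  baseline: Uint8Array,
  current: Uint8Array,
  width: number,
  height: number,
  channelTolerance = 0 // hypothetical: largest per-channel difference ignored
): DiffSketchResult {
  const totalPixels = width * height;
  if (baseline.length !== totalPixels * 4 || current.length !== totalPixels * 4) {
    // Pixelmatch also throws when the two images differ in size.
    throw new Error("Image sizes do not match");
  }
  const diffMask = new Uint8Array(totalPixels);
  let changedPixels = 0;
  for (let p = 0; p < totalPixels; p++) {
    const k = p * 4;
    for (let c = 0; c < 4; c++) {
      if (Math.abs(baseline[k + c] - current[k + c]) > channelTolerance) {
        diffMask[p] = 1;
        changedPixels++;
        break; // one differing channel is enough to mark the pixel
      }
    }
  }
  return { changedPixels, totalPixels, changePercentage: changedPixels / totalPixels, diffMask };
}
```

Pixelmatch's own `threshold` option (per-pixel color sensitivity, default 0.1) is a different knob from this tool's `threshold`, which applies to the overall fraction of changed pixels. Which Pixelmatch options computeDiff passes is not shown.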
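`initializeVisualRegressionService` registers a factory rather than handing the service a client directly. A minimal sketch of that setter/getter pattern follows. Every name and type here is hypothetical (the real IVisualRegressionPrismaClient and the designSnapshot fields are not shown), and the `findUnique({ where: { id } })` call only mimics the shape of Prisma's model API. The benefit of the pattern is that service code never imports Prisma, and tests can register an in-memory fake.

```typescript
// Hypothetical shapes; the real IVisualRegressionPrismaClient is not shown.
interface SnapshotRecord { id: string; screenshot: Uint8Array | null }
interface SnapshotModelLike {
  findUnique(args: { where: { id: string } }): Promise<SnapshotRecord | null>;
}
interface VisualRegressionClientLike { designSnapshot: SnapshotModelLike }

let clientFactory: (() => VisualRegressionClientLike) | null = null;

// Counterpart of setVisualRegressionPrismaClientFactory: called once at startup.
function setClientFactory(factory: () => VisualRegressionClientLike): void {
  clientFactory = factory;
}

// Service code resolves the client lazily and fails loudly if DI never ran.
function getClient(): VisualRegressionClientLike {
  if (!clientFactory) throw new Error("VisualRegression client factory not registered");
  return clientFactory();
}

// Hypothetical analogue of getBaselineScreenshot: null means "not found or
// no screenshot", which the service maps to BASELINE_NOT_FOUND.
async function getBaselineScreenshotSketch(id: string): Promise<Uint8Array | null> {
  const snapshot = await getClient().designSnapshot.findUnique({ where: { id } });
  return snapshot?.screenshot ?? null;
}
```

In a test, registering `() => ({ designSnapshot: fakeModel })` is enough to exercise the service without a database.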

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

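Annotations already declare readOnlyHint and idempotentHint, so the description does not need to restate those. It adds the comparison algorithm (Pixelmatch) and the output format (diff image, change percentage). There are no contradictions, but it adds no behavioral details beyond the annotations: for example, it does not mention that the tool loads the target URL in a browser to capture a screenshot, that it requires the DESIGN_WRITE permission, or that it is rate-limited in the 'analysis' tier.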

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: three short sentences (given in both Japanese and English) that front-load purpose and output. Every sentence adds value: the first two explain what the tool does and returns, and the third specifies where the baseline comes from. Apart from the bilingual repetition, there is no fluff or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description adequately explains return values (diff image, change percentage, pass/fail). It also covers parameter usage, prerequisite (baseline snapshot source), and algorithm. The tool is fully described for an agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% for all 5 parameters. The description adds context that baseline_snapshot_id comes from design.track_changes, but overall does not significantly enhance meaning beyond the parameter descriptions in the schema. Baseline score 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool performs a pixel-level comparison between a baseline snapshot and the current web page using Pixelmatch, and that it returns a threshold-based pass/fail result, a diff image, and the change percentage. It distinguishes itself from sibling tools by specifying that baselines come from design.track_changes snapshots, which makes its role unique among similar tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear prerequisite: use snapshots from design.track_changes snapshot action as baseline. It implies the use case for regression testing. However, it does not explicitly state when not to use this tool or compare with alternatives like design.compare.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/TKMD/ReftrixMCP'
