carloshpdoc / memorydetective

Run an XCUITest with leak detection (CI-runnable)

detectLeaksInXCUITest

Detects new retain cycles in XCUITest by comparing memory graph snapshots before and after test execution, failing CI builds when unexpected leaks appear.

Instructions

[mg.ci] Build the workspace for testing, launch the test cycle, capture a baseline .memgraph once the app appears, run the test to completion, capture an after .memgraph, and diff. Returns passed: false when new ROOT CYCLE blocks appear that aren't in the allowlistPatterns list. Designed for CI gating — non-zero exit code on failure.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| workspace | Yes | Path to the `.xcworkspace` or `.xcodeproj` for the project. | |
| scheme | Yes | Xcode scheme that builds and runs the XCUITest target. | |
| testIdentifier | Yes | XCUITest identifier in `<TestTarget>/<TestClass>/<testMethod>` form. Passed to `-only-testing` so we run exactly one test cycle. | |
| appName | Yes | App process name as it appears in `pgrep -x` (e.g. "DemoApp"). | |
| destination | No | xcodebuild destination string. Default targets the most common iOS Simulator profile. | `platform=iOS Simulator,name=iPhone 11,OS=latest` |
| outputDir | No | Directory where the baseline + after `.memgraph` snapshots are written. | `/tmp/memorydetective-xcuitest` |
| allowlistPatterns | No | Substrings of class names that are allowed to leak (e.g. pre-existing SwiftUI internals you can't fix, third-party SDK leaks). Cycles whose root class contains any of these substrings won't fail the run. | `[]` |
| skipBuild | No | Skip the build-for-testing step (faster on CI when the build is already cached). | `false` |
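Under the schema above, a call might look like the following. All values here are hypothetical placeholders for illustration, not taken from a real project:

```typescript
// Hypothetical example input for detectLeaksInXCUITest.
// Optional fields omitted below fall back to the schema defaults:
//   destination: "platform=iOS Simulator,name=iPhone 11,OS=latest"
//   outputDir:   "/tmp/memorydetective-xcuitest"
const exampleInput = {
  workspace: "DemoApp.xcworkspace",
  scheme: "DemoAppUITests",
  // <TestTarget>/<TestClass>/<testMethod>, as required by -only-testing
  testIdentifier: "DemoAppUITests/LeakTests/testScrollFeed",
  appName: "DemoApp",
  // Don't fail the run on pre-existing SwiftUI-internal cycles.
  allowlistPatterns: ["SwiftUI."],
  // Reuse a cached build-for-testing product on CI.
  skipBuild: true,
};
```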

Implementation Reference

  • Main handler function `detectLeaksInXCUITest` that orchestrates the CI-runnable XCUITest leak detection flow: builds for testing, runs the test, captures baseline and after memgraphs, diffs them, and reports new ROOT CYCLEs not in the allowlist.
    export async function detectLeaksInXCUITest(
      input: DetectLeaksInXCUITestInput,
    ): Promise<XCUITestLeakResult> {
      const workspace = resolvePath(input.workspace);
      if (!existsSync(workspace)) {
        throw new Error(`Workspace not found: ${workspace}`);
      }
      const outputDir = resolvePath(input.outputDir);
      if (!existsSync(outputDir)) mkdirSync(outputDir, { recursive: true });
    
      const baselinePath = joinPath(outputDir, `${basename(input.workspace)}-baseline.memgraph`);
      const afterPath = joinPath(outputDir, `${basename(input.workspace)}-after.memgraph`);
      const steps: string[] = [];
    
      const isWorkspace = workspace.endsWith(".xcworkspace");
      const projectFlag = isWorkspace ? "-workspace" : "-project";
    
      // 1. Build for testing (once).
      if (!input.skipBuild) {
        await runXcodebuild(
          [
            projectFlag,
            workspace,
            "-scheme",
            input.scheme,
            "-destination",
            input.destination,
            "build-for-testing",
            "-quiet",
          ],
          "build-for-testing",
          steps,
        );
      } else {
        steps.push("(skipped build-for-testing)");
      }
    
      // 2. Run the test once and capture two snapshots around it.
      //
      // Capturing a true pre-test baseline from a single xcodebuild invocation
      // is awkward, so we take the tractable route: launch the test in the
      // background, poll `pgrep` until the app process appears, capture the
      // baseline .memgraph at that point, then let the test run to completion
      // and capture the after .memgraph once the test method returns (XCUITest
      // holds the app open until the harness tears down).
      //
      // Caveat: the "baseline" is best-effort; the test may already have
      // started executing by the time the first capture lands.
      steps.push(`Running test: ${input.testIdentifier}`);
    
      // Run the test in the background so we can capture during/after.
      const testArgs = [
        projectFlag,
        workspace,
        "-scheme",
        input.scheme,
        "-destination",
        input.destination,
        "-only-testing:" + input.testIdentifier,
        "test-without-building",
        "-quiet",
      ];
    
      const { spawn } = await import("node:child_process");
      const child = spawn("xcodebuild", testArgs);
      let testStdout = "";
      let testStderr = "";
      child.stdout.on("data", (c: Buffer) => (testStdout += c.toString("utf8")));
      child.stderr.on("data", (c: Buffer) => (testStderr += c.toString("utf8")));
      const testPromise = new Promise<number>((resolve) => {
        child.on("close", (code) => resolve(code ?? -1));
      });
    
      // Poll pgrep until the app appears, then capture baseline.
      const startedAt = Date.now();
      let captured = false;
      while (Date.now() - startedAt < 5 * 60_000) {
        try {
          const pgrep = await runCommand("pgrep", ["-x", input.appName], {
            timeoutMs: 5_000,
          });
          if (pgrep.code === 0 && pgrep.stdout.trim()) {
            await captureMemgraphForApp(input.appName, baselinePath);
            steps.push(`Captured baseline: ${baselinePath}`);
            captured = true;
            break;
          }
        } catch {
          // app not running yet; keep polling
        }
        await new Promise((r) => setTimeout(r, 1500));
      }
      if (!captured) {
        child.kill("SIGTERM");
        throw new Error(
          `Timed out waiting for the app process "${input.appName}" to appear under the simulator. Is the test target actually launching the app?`,
        );
      }
    
      const testExitCode = await testPromise;
      steps.push(`Test exited with code ${testExitCode}`);
    
      // After the test method finishes, the app process is usually still around for a
      // short window before the simulator tears it down. Try the after-capture immediately.
      let afterCaptured = false;
      try {
        await captureMemgraphForApp(input.appName, afterPath);
        steps.push(`Captured after: ${afterPath}`);
        afterCaptured = true;
      } catch (err) {
        steps.push(
          `Skipped after-capture — app process ended before we could attach. ${err instanceof Error ? err.message : String(err)}`,
        );
      }
    
      if (!afterCaptured) {
        return {
          ok: false,
          passed: false,
          baselineMemgraph: baselinePath,
          afterMemgraph: "",
          testIdentifier: input.testIdentifier,
          totals: {
            baselineLeaks: 0,
            afterLeaks: 0,
            leakDelta: 0,
          },
          newCycles: [],
          failureReason:
            "After-capture failed. Configure the XCUITest to keep the app alive at end-of-test (e.g. `XCTAssertTrue(true); _ = XCTWaiter.wait(for: [...], timeout: 1.0)`) or run with a longer simulator boot.",
          steps,
        };
      }
    
      // 3. Diff.
      const [baseline, after] = await Promise.all([
        runLeaksAndParse(baselinePath),
        runLeaksAndParse(afterPath),
      ]);
      const baselineReport: LeaksReport = baseline.report;
      const afterReport: LeaksReport = after.report;
    
      const baselineRootClasses = new Set(
        rootCyclesOnly(baselineReport.cycles).map((c) => c.className || c.address),
      );
      const afterRoots = rootCyclesOnly(afterReport.cycles);
      const newOnes = afterRoots.filter(
        (c) => !baselineRootClasses.has(c.className || c.address),
      );
    
      const allowlistedFlags = newOnes.map((c) =>
        isAllowlisted(c.className, input.allowlistPatterns ?? []),
      );
    
      const failingCycles = newOnes
        .filter((_, i) => !allowlistedFlags[i])
        .map((c) => ({
          rootClass: c.className || c.address,
          // +1 so chainLength counts the root node itself, matching newCycles below.
          chainLength: countDescendants(c.children) + 1,
          allowlisted: false,
        }));
    
      const newCycles = newOnes.map((c, i) => ({
        rootClass: c.className || c.address,
        chainLength: countDescendants(c.children) + 1,
        allowlisted: allowlistedFlags[i],
      }));
    
      const passed = failingCycles.length === 0 && testExitCode === 0;
    
      return {
        ok: true,
        passed,
        baselineMemgraph: baselinePath,
        afterMemgraph: afterPath,
        testIdentifier: input.testIdentifier,
        totals: {
          baselineLeaks: baselineReport.totals.leakCount,
          afterLeaks: afterReport.totals.leakCount,
          leakDelta:
            afterReport.totals.leakCount - baselineReport.totals.leakCount,
        },
        newCycles,
        failureReason: passed
          ? undefined
          : testExitCode !== 0
            ? `Test failed with exit code ${testExitCode}.`
            : `${failingCycles.length} new ROOT CYCLE(s) appeared after the test that aren't in the allowlist: ${failingCycles.map((c) => c.rootClass).slice(0, 5).join(", ")}${failingCycles.length > 5 ? ", ..." : ""}`,
        steps,
      };
    }
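The handler above calls several helpers (`rootCyclesOnly`, `isAllowlisted`, `countDescendants`) whose definitions aren't shown on this page. A minimal sketch of two of them, inferred from the call sites — these are assumptions about their shape, not the actual source:

```typescript
// Assumed node shape for a parsed leak cycle (inferred from usage above).
interface CycleNode {
  className: string;
  address: string;
  children: CycleNode[];
}

// A cycle is allowlisted when its root class name contains any pattern
// substring, matching the schema's "root class contains any of these
// substrings" wording.
function isAllowlisted(className: string, patterns: string[]): boolean {
  return patterns.some((p) => className.includes(p));
}

// Count every node below the given children (the root itself is added
// separately via the `+ 1` at the call site).
function countDescendants(children: CycleNode[]): number {
  return children.reduce(
    (sum, child) => sum + 1 + countDescendants(child.children),
    0,
  );
}
```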
  • Zod schema `detectLeaksInXCUITestSchema` defining input parameters: workspace, scheme, testIdentifier, appName, destination, outputDir, allowlistPatterns, and skipBuild.
    export const detectLeaksInXCUITestSchema = z.object({
      workspace: z
        .string()
        .min(1)
        .describe("Path to the .xcworkspace or .xcodeproj for the project."),
      scheme: z
        .string()
        .min(1)
        .describe("Xcode scheme that builds and runs the XCUITest target."),
      testIdentifier: z
        .string()
        .min(1)
        .describe(
          "XCUITest identifier in `<TestTarget>/<TestClass>/<testMethod>` form. Passed to `-only-testing` so we run exactly one test cycle.",
        ),
      appName: z
        .string()
        .min(1)
        .describe("App process name as it appears in `pgrep -x` (e.g. \"DemoApp\")."),
      destination: z
        .string()
        .default("platform=iOS Simulator,name=iPhone 11,OS=latest")
        .describe(
          "xcodebuild destination string. Default targets the most common iOS Simulator profile.",
        ),
      outputDir: z
        .string()
        .default("/tmp/memorydetective-xcuitest")
        .describe(
          "Directory where the baseline + after `.memgraph` snapshots are written.",
        ),
      allowlistPatterns: z
        .array(z.string())
        .default([])
        .describe(
          "Substrings of class names that are allowed to leak. Examples: pre-existing SwiftUI internals you can't fix, third-party SDK leaks. Cycles whose root class contains any of these substrings won't fail the run.",
        ),
      skipBuild: z
        .boolean()
        .default(false)
        .describe(
          "Skip the build-for-testing step (faster on CI when the build is already cached).",
        ),
    });
  • src/index.ts:397-409 (registration)
    Tool registration via `server.registerTool('detectLeaksInXCUITest', ...)` with title, description, inputSchema, and handler that calls `detectLeaksInXCUITest(input)`.
    server.registerTool(
      "detectLeaksInXCUITest",
      {
        title: "Run an XCUITest with leak detection (CI-runnable)",
        description:
          "[mg.ci] Build the workspace for testing, launch the test cycle, capture a baseline `.memgraph` once the app appears, run the test to completion, capture an after `.memgraph`, and diff. Returns `passed: false` when new ROOT CYCLE blocks appear that aren't in the `allowlistPatterns` list. Designed for CI gating — non-zero exit code on failure.",
        inputSchema: detectLeaksInXCUITestSchema.shape,
      },
      async (input) => {
        const result = await detectLeaksInXCUITest(input);
        return { content: [{ type: "text", text: JSON.stringify(result, null, 2) }] };
      },
    );
  • Helper `captureMemgraphForApp` that resolves app PID via `resolveAppNameToPid` and runs `leaks --outputGraph` to capture a memgraph snapshot.
    async function captureMemgraphForApp(
      appName: string,
      outputPath: string,
    ): Promise<void> {
      const pid = await resolveAppNameToPid(appName);
      const result = await runCommand(
        "leaks",
        ["--outputGraph", outputPath, String(pid)],
        { timeoutMs: 120_000 },
      );
      if (result.code !== 0 && result.code !== 1) {
        throw new Error(
          `leaks --outputGraph failed (code ${result.code}): ${result.stderr || result.stdout}`,
        );
      }
      if (!existsSync(outputPath)) {
        throw new Error(`leaks reported success but output file is missing: ${outputPath}`);
      }
    }
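`resolveAppNameToPid` isn't listed on this page either. Assuming it shells out to `pgrep -x <appName>`, which prints one PID per line, and takes the first match, the parsing step could look like this sketch (a hypothetical helper, not the actual source):

```typescript
// Parse the first PID out of `pgrep -x` output. pgrep prints one PID per
// line; an empty result means the process isn't running.
function parseFirstPid(pgrepStdout: string): number {
  const first = pgrepStdout.trim().split("\n")[0];
  const pid = Number.parseInt(first, 10);
  if (Number.isNaN(pid)) {
    throw new Error(`No PID found in pgrep output: "${pgrepStdout}"`);
  }
  return pid;
}
```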
  • Helper `runXcodebuild` that runs xcodebuild with given args, tracking steps and throwing on failure.
    async function runXcodebuild(
      args: string[],
      step: string,
      steps: string[],
    ): Promise<void> {
      steps.push(`$ xcodebuild ${args.join(" ")}`);
      const result = await runCommand("xcodebuild", args, { timeoutMs: 30 * 60_000 });
      if (result.code !== 0) {
        throw new Error(
          `${step} failed (code ${result.code}): ${result.stderr || result.stdout || "<no output>"}`,
        );
      }
    }
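`runCommand` is used throughout but not shown. A plausible minimal sketch, assuming it wraps `node:child_process.spawn`, collects stdout/stderr, and enforces a timeout — the real helper may differ:

```typescript
import { spawn } from "node:child_process";

interface CommandResult {
  code: number;
  stdout: string;
  stderr: string;
}

// Run a command, buffering its output. Rejects if the process doesn't
// exit within timeoutMs; resolves with the exit code otherwise.
function runCommand(
  cmd: string,
  args: string[],
  opts: { timeoutMs?: number } = {},
): Promise<CommandResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args);
    let stdout = "";
    let stderr = "";
    const timer = opts.timeoutMs
      ? setTimeout(() => {
          child.kill("SIGKILL");
          reject(new Error(`${cmd} timed out after ${opts.timeoutMs}ms`));
        }, opts.timeoutMs)
      : undefined;
    child.stdout?.on("data", (c: Buffer) => (stdout += c.toString("utf8")));
    child.stderr?.on("data", (c: Buffer) => (stderr += c.toString("utf8")));
    child.on("error", reject);
    child.on("close", (code) => {
      if (timer) clearTimeout(timer);
      resolve({ code: code ?? -1, stdout, stderr });
    });
  });
}
```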
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description carries the full burden and does well: it details the multi-step process, return value (`passed: false` when new root cycles appear outside allowlistPatterns), and exit code behavior. The only missing aspect is potential side effects like file modifications or permissions, but the description is sufficiently transparent for its purpose.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured paragraph of about 4 sentences. It front-loads the purpose, then explains the process, return value, and CI context. Every sentence adds value; no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multi-step test with leak detection) and no output schema, the description covers the essential flow, return value, and exit code. It could mention prerequisites like Xcode availability or that it requires a simulator, but it is complete enough for an agent to understand and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the input schema already provides detailed descriptions for all 8 parameters. The tool description does not add new semantics beyond what the schema offers (e.g., the allowlistPatterns explanation is already in the schema). Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool runs an XCUITest with leak detection for CI gating, listing the exact steps (build, capture baseline, run test, capture after, diff) and the return condition. It distinguishes itself from sibling tools like analyzeMemgraph or diffMemgraphs by focusing on automated test execution.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Designed for CI gating — non-zero exit code on failure,' providing clear context for when to use it. However, it lacks explicit guidance on when not to use it or alternatives (e.g., using findCycles or analyzeMemgraph directly), which would improve the score to 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
