
Debugg AI MCP (Official)
by debugg-ai

Trigger App Crawl

trigger_crawl

Trigger a browser-agent crawl of a web app to populate its knowledge graph. Use after shipping significant features or when onboarding a project, so future evaluations have context. Supports localhost URLs automatically.

Instructions

Trigger a browser-agent crawl of a web app to build the project's knowledge graph. The crawl systematically explores pages, UI states, and navigation flows, then populates the backend's knowledge graph so future evaluations and tests have context about the app.

LOCALHOST SUPPORT: Pass any localhost URL (e.g. http://localhost:3000) and it just works: a secure tunnel is created automatically so the remote browser can reach your local dev server.

WHEN TO USE: after a significant new feature, a new environment, or when onboarding a project. NOT for per-change verification — use check_app_in_browser for that.

SCOPE: one crawl per call against one URL. The crawl is long-running (minutes to tens of minutes depending on app size) and populates backend state asynchronously; the tool returns the execution status once the workflow completes. This does NOT return pass/fail — it returns executionId + status + outcome.
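
For orientation, a completed call returns a JSON payload roughly like the following. Field names are taken from the handler implementation below; all values here are invented for illustration:

    {
      "executionId": "c0ffee00-1111-4222-8333-444455556666",
      "status": "completed",
      "targetUrl": "http://localhost:3000",
      "durationMs": 412000,
      "outcome": "success",
      "crawlSummary": {
        "pagesDiscovered": 14,
        "actionsExecuted": 92,
        "stepsTaken": 120,
        "transitionsRecorded": 61,
        "knowledgeGraphStates": 37,
        "success": true
      },
      "knowledgeGraph": {
        "imported": true,
        "skipped": false,
        "edgesImported": 61,
        "statesImported": 37,
        "knowledgeGraphId": "kg-example-001"
      }
    }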

Input Schema

  • url (required): URL to crawl. Can be any public URL or a localhost/local dev server URL. For localhost URLs, a secure tunnel is automatically created — just make sure your dev server is running on that port.
  • projectUuid (optional): UUID of the project whose knowledge graph the crawl should populate. Auto-detected from the current git repo if omitted.
  • environmentId (optional): UUID of a specific environment to use for the crawl. See available environments in the tool description above.
  • credentialId (optional): UUID of a specific credential for authenticated crawls. See available credentials in the tool description above.
  • credentialRole (optional): Pick a credential by role (e.g. 'admin', 'guest') from the resolved environment.
  • username (optional): A real, existing account email for the target app. Do NOT invent credentials — use one from the available credentials or ask the user.
  • password (optional): The real password for the username above. Do NOT guess.
  • headless (optional): Run the browser in headless mode. Defaults to backend configuration.
  • timeoutSeconds (optional): Maximum wall-time the crawl may run, in seconds (1..1800). Backend enforces this per workflow execution.
  • repoName (optional): GitHub repository name (e.g. 'my-org/my-repo'). Auto-detected from the current git repo — only provide this to run against a different project.
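
Two argument sets that would satisfy this schema, for illustration (the URLs and values are hypothetical):

    // Minimal: crawl a public URL.
    const publicCrawl = { url: 'https://staging.example.com' };

    // Authenticated crawl of a local dev server, selecting a credential by role.
    const localAuthedCrawl = { url: 'http://localhost:3000', credentialRole: 'admin', timeoutSeconds: 900 };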

Implementation Reference

  • Main handler function for trigger_crawl tool. Implements the 4-step pattern: find template, provision tunnel if localhost, execute workflow, poll for result. Handles localhost URLs with automatic tunnel provisioning, crawl workflow execution via DebuggAI API, polling with transient error retry, and response formatting with crawl metrics and knowledge graph import results.
    export async function triggerCrawlHandler(
      input: TriggerCrawlInput,
      context: ToolContext,
      rawProgressCallback?: ProgressCallback,
    ): Promise<ToolResponse> {
      const startTime = Date.now();
      logger.toolStart('trigger_crawl', input);
    
      // Bead 0bq: progress circuit-breaker — see testPageChangesHandler for rationale.
      let progressDisabled = false;
      const progressCallback: ProgressCallback | undefined = rawProgressCallback
        ? async (update) => {
            if (progressDisabled) return;
            try {
              await rawProgressCallback(update);
            } catch (err) {
              progressDisabled = true;
              logger.warn('Progress emission failed; disabling further emissions for this request', {
                error: err instanceof Error ? err.message : String(err),
              });
            }
          }
        : undefined;
    
      const client = new DebuggAIServerClient(config.api.key);
      await client.init();
    
      const originalUrl = resolveTargetUrl(input);
      let ctx = buildContext(originalUrl);
      let keyId: string | undefined;
    
      const abortController = new AbortController();
      const onStdinClose = () => {
        abortController.abort();
        progressDisabled = true;
      };
      process.stdin.once('close', onStdinClose);
    
      try {
        // --- Tunnel: reuse existing or provision a fresh one ---
        if (ctx.isLocalhost) {
          // Bead 1om: pre-flight local port probe BEFORE provision/ngrok/backend.
          const localPort = extractLocalhostPort(ctx.originalUrl);
          if (typeof localPort === 'number') {
            const probe = await probeLocalPort(localPort);
            if (!probe.reachable) {
              const payload = {
                error: 'LocalServerUnreachable',
                message: `No server listening on 127.0.0.1:${localPort}. Start your dev server on that port before running trigger_crawl. Probe result: ${probe.code} (${probe.detail ?? 'no detail'}).`,
                detail: { port: localPort, probeCode: probe.code, probeDetail: probe.detail, elapsedMs: probe.elapsedMs },
              };
              logger.warn(`Pre-flight port probe failed for ${ctx.originalUrl}: ${probe.code} in ${probe.elapsedMs}ms`);
              return { content: [{ type: 'text', text: JSON.stringify(payload, null, 2) }], isError: true };
            }
          }
    
          if (progressCallback) {
            await progressCallback({ progress: 1, total: 4, message: 'Provisioning secure tunnel for localhost...' });
          }
    
          const reused = findExistingTunnel(ctx);
          if (reused) {
            ctx = reused;
          } else {
            let tunnel;
            try {
              tunnel = await client.tunnels!.provisionWithRetry();
            } catch (provisionError) {
              const msg = provisionError instanceof Error ? provisionError.message : String(provisionError);
              const diag = provisionError instanceof TunnelProvisionError ? ` ${provisionError.diagnosticSuffix()}` : '';
              throw new Error(
                `Failed to provision tunnel for ${ctx.originalUrl}. ` +
                `The remote browser needs a secure tunnel to reach your local dev server. ` +
                `(Detail: ${msg})${diag}`,
              );
            }
            keyId = tunnel.keyId;
            ctx = await ensureTunnel(
              ctx,
              tunnel.tunnelKey,
              tunnel.tunnelId,
              tunnel.keyId,
              () => client.revokeNgrokKey(tunnel.keyId),
            );
          }
    
          // Bead 1om: post-tunnel health check — verify traffic actually flows.
          if (ctx.targetUrl) {
            const health = await probeTunnelHealth(ctx.targetUrl);
            if (!health.healthy) {
              const payload = {
                error: 'TunnelTrafficBlocked',
                message: `Tunnel was established but traffic isn't reaching the dev server. ${health.detail ?? ''} Common causes: dev server binds to 0.0.0.0 or ::1 but not 127.0.0.1; dev server crashed; firewall.`,
                detail: {
                  code: health.code,
                  status: health.status,
                  ngrokErrorCode: health.ngrokErrorCode,
                  elapsedMs: health.elapsedMs,
                },
              };
              logger.warn(`Tunnel health probe failed for ${ctx.targetUrl}: ${health.code} ${health.ngrokErrorCode ?? ''} in ${health.elapsedMs}ms`);
              if (ctx.tunnelId) {
                tunnelManager.stopTunnel(ctx.tunnelId).catch((err) =>
                  logger.warn(`Failed to stop broken tunnel ${ctx.tunnelId}: ${err}`),
                );
              }
              keyId = undefined;
              return { content: [{ type: 'text', text: JSON.stringify(payload, null, 2) }], isError: true };
            }
          }
        }
    
        // --- Find the crawl workflow template (cached across calls) ---
        if (progressCallback) {
          await progressCallback({ progress: 2, total: 4, message: 'Locating crawl workflow template...' });
        }
    
        const templateUuid = await getCachedTemplateUuid(TEMPLATE_KEYWORD, async (name) => {
          return client.workflows!.findTemplateByName(name);
        });
        if (!templateUuid) {
          throw new Error(
            `Raw Crawl Workflow Template not found. ` +
            `Ensure the backend has a template matching "${TEMPLATE_KEYWORD}" seeded and accessible.`,
          );
        }
    
        // --- Build contextData + env ---
        const contextData: Record<string, any> = {
          targetUrl: ctx.targetUrl ?? ctx.originalUrl,
        };
        if (input.projectUuid) contextData.projectId = input.projectUuid;
        if (typeof input.headless === 'boolean') contextData.headless = input.headless;
        if (typeof input.timeoutSeconds === 'number') contextData.timeoutSeconds = input.timeoutSeconds;
    
        const env: Record<string, any> = {};
        if (input.environmentId) env.environmentId = input.environmentId;
        if (input.credentialId) env.credentialId = input.credentialId;
        if (input.credentialRole) env.credentialRole = input.credentialRole;
        if (input.username) env.username = input.username;
        if (input.password) env.password = input.password;
    
        // --- Execute ---
        if (progressCallback) {
          await progressCallback({ progress: 3, total: 4, message: 'Queuing crawl execution...' });
        }
    
        // --- Execute + Poll (with bounded retry on transient errors, bead kbo9) ---
        const TERMINAL_STATUSES = new Set(['completed', 'failed', 'cancelled']);
        const MAX_RETRIES = getMaxTransientRetries();
    
        let executeResponse: import('../services/workflows.js').WorkflowExecuteResponse | undefined;
        let executionUuid = '';
        let finalExecution: import('../services/workflows.js').WorkflowExecution | undefined;
        let attempt = 0;
    
        while (true) {
          attempt++;
    
          if (attempt > 1) {
            Telemetry.capture(TelemetryEvents.WORKFLOW_TRANSIENT_RETRY, {
              tool: 'trigger_crawl',
              attempt,
              reason: transientReasonTag(finalExecution),
              previousExecutionId: executionUuid,
              previousErrorMessage: finalExecution?.errorMessage?.slice(0, 200),
              previousStateError: finalExecution?.state?.error?.slice(0, 200),
            });
            if (progressCallback) {
              await progressCallback({
                progress: 3, total: 4,
                message: `Transient backend error — retrying crawl (attempt ${attempt}/${MAX_RETRIES + 1})...`,
              });
            }
            await new Promise(r => setTimeout(r, 1000 * (attempt - 1)));
          }
    
          executeResponse = await client.workflows!.executeWorkflow(
            templateUuid,
            contextData,
            Object.keys(env).length > 0 ? env : undefined,
          );
          executionUuid = executeResponse.executionUuid;
          logger.info(`Crawl execution queued: ${executionUuid}${attempt > 1 ? ` (retry ${attempt - 1}/${MAX_RETRIES})` : ''}`);
    
          // --- Poll ---
          // Bead 0bq: emit the final progress (4/4 "Complete:...") INSIDE onUpdate
          // when terminal status detected, so there's no post-resolve emission that
          // could race the response and cause stale-progressToken transport tear-down.
      finalExecution = await client.workflows!.pollExecution(executionUuid, async (exec) => {
        if (ctx.tunnelId) touchTunnelById(ctx.tunnelId);
        if (!progressCallback) return;
        const nodeCount = (exec.nodeExecutions ?? []).length;
        if (TERMINAL_STATUSES.has(exec.status)) {
          // Terminal: emit the final 4/4 progress here, inside onUpdate (bead 0bq).
          await progressCallback({
            progress: 4, total: 4,
            message: `Crawl ${exec.status} (${nodeCount} nodes)`,
          });
          return;
        }
        // In-flight: stay at 3/4 so only the terminal update reads as complete.
        await progressCallback({
          progress: 3, total: 4,
          message: `Crawl ${exec.status} (${nodeCount} nodes)...`,
        });
      }, abortController.signal);
    
          if (attempt > MAX_RETRIES) break;
          if (!isTransientWorkflowError(finalExecution)) break;
          logger.warn(
            `Transient backend error detected on crawl (${transientReasonTag(finalExecution) ?? 'unknown'}) — ` +
            `retrying (attempt ${attempt + 1}/${MAX_RETRIES + 1})`,
          );
        }
    
        const duration = Date.now() - startTime;
        const nodes = finalExecution.nodeExecutions ?? [];
    
        // --- Format response ---
        const responsePayload: Record<string, any> = {
          executionId: executionUuid,
          status: finalExecution.status,
          targetUrl: ctx.originalUrl,
          durationMs: finalExecution.durationMs ?? duration,
        };
        const outcome = finalExecution.state?.outcome;
        if (outcome !== undefined && outcome !== null) responsePayload.outcome = outcome;
        if (finalExecution.errorMessage) responsePayload.errorMessage = finalExecution.errorMessage;
        if (finalExecution.errorInfo?.failedNodeId) responsePayload.failedNode = finalExecution.errorInfo.failedNodeId;
        if (executeResponse.resolvedEnvironmentId) responsePayload.resolvedEnvironmentId = executeResponse.resolvedEnvironmentId;
        if (executeResponse.resolvedCredentialId) responsePayload.resolvedCredentialId = executeResponse.resolvedCredentialId;
        // Backend release 2026-04-25: browser_session block on execution detail.
        // Crawl runs through the same browser pipeline, so the field is populated
        // here too. Pass through verbatim (presigned S3 URLs).
        if (finalExecution.browserSession) {
          responsePayload.browserSession = finalExecution.browserSession;
        }
    
        // Extract crawl metrics from surfer.crawl node (absent in older graph shapes)
        const crawlNode = nodes.find(n => n.nodeType === 'surfer.crawl');
        if (crawlNode?.outputData) {
          const d = crawlNode.outputData;
          responsePayload.crawlSummary = {
            pagesDiscovered: d.pagesDiscovered,
            actionsExecuted: d.actionsExecuted,
            stepsTaken: d.stepsTaken,
            transitionsRecorded: d.transitionsRecorded,
            knowledgeGraphStates: d.knowledgeGraphStates,
            success: d.success,
            ...(d.error ? { error: d.error } : {}),
          };
        }
    
        // Extract KG import result from knowledge_graph.import node (absent in older graph shapes)
        const kgNode = nodes.find(n => n.nodeType === 'knowledge_graph.import');
        if (kgNode?.outputData) {
          const d = kgNode.outputData;
          responsePayload.knowledgeGraph = {
            imported: !d.skipped,
            skipped: d.skipped ?? false,
            reason: d.reason ?? '',
            edgesImported: d.edgesImported ?? 0,
            statesImported: d.statesImported ?? 0,
            knowledgeGraphId: d.knowledgeGraphId ?? '',
            ...(Array.isArray(d.importErrors) && d.importErrors.length > 0 ? { importErrors: d.importErrors } : {}),
          };
        }
    
        logger.toolComplete('trigger_crawl', duration);
        // Bead 0bq: final progress is emitted INSIDE pollExecution's onUpdate when
        // terminal status is detected. Emitting it here would race the response
        // and could cause stale-progressToken transport tear-down.
    
        const sanitizedPayload = sanitizeResponseUrls(responsePayload, ctx);
        return {
          content: [{ type: 'text', text: JSON.stringify(sanitizedPayload, null, 2) }],
        };
      } catch (error) {
        const duration = Date.now() - startTime;
        logger.toolError('trigger_crawl', error as Error, duration);
        if (error instanceof Error && (error.message.includes('not found') || error.message.includes('401'))) {
          invalidateTemplateCache();
        }
        throw handleExternalServiceError(error, 'DebuggAI', 'crawl execution');
      } finally {
        process.stdin.removeListener('close', onStdinClose);
        // Tunnel intentionally NOT torn down (reuse path per bead vwd).
        // If tunnel creation failed after key provision, revoke the orphaned key.
        if (!ctx.tunnelId && keyId) {
          client.revokeNgrokKey(keyId).catch(err =>
            logger.warn(`Failed to revoke unused ngrok key ${keyId}: ${err}`),
          );
        }
      }
    }
  • Zod validation schema and TypeScript type for TriggerCrawlInput. Validates url (required, auto-normalized), projectUuid, environmentId, credentialId, credentialRole, username, password, headless, timeoutSeconds (max 1800), and repoName. Uses .strict() to reject unknown fields.
    export const TriggerCrawlInputSchema = z.object({
      url: z.preprocess(
        normalizeUrl,
        z.string().url('Invalid URL. Pass a full URL like "http://localhost:3000" or "https://example.com". Localhost URLs are auto-tunneled to the remote browser.'),
      ),
      projectUuid: z.string().uuid().optional(),
      environmentId: z.string().uuid().optional(),
      credentialId: z.string().uuid().optional(),
      credentialRole: z.string().optional(),
      username: z.string().optional(),
      password: z.string().optional(),
      headless: z.boolean().optional(),
      timeoutSeconds: z.number().int().positive().max(1800, 'timeoutSeconds cannot exceed 1800 (30 min)').optional(),
      repoName: z.string().optional(),
    }).strict();
    
    export type TriggerCrawlInput = z.infer<typeof TriggerCrawlInputSchema>;
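    
    // Usage sketch (not from the source): parse() validates and types raw
    // arguments before dispatch. This call succeeds against the schema above;
    // the values are illustrative.
    const parsed: TriggerCrawlInput = TriggerCrawlInputSchema.parse({
      url: 'http://localhost:3000',
      timeoutSeconds: 900,
    });
    // Throws: .strict() rejects unknown fields, e.g. { url, extra: true }.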
  • Tool registration in initTools() (tools/index.ts:24-58). buildTriggerCrawlTool(ctx) registers the plain Tool for listTools, and buildValidatedTriggerCrawlTool(ctx) registers the validated tool with schema+handler in the toolRegistry map under the name 'trigger_crawl'; a dispatch sketch follows this list.
    export function initTools(ctx: ProjectContext | null): void {
      const tools: Tool[] = [
        buildTestPageChangesTool(ctx),
        buildTriggerCrawlTool(ctx),
        buildProbePageTool(),
        buildSearchProjectsTool(),
        buildSearchEnvironmentsTool(),
        buildCreateEnvironmentTool(),
        buildUpdateEnvironmentTool(),
        buildDeleteEnvironmentTool(),
        buildUpdateProjectTool(),
        buildDeleteProjectTool(),
        buildSearchExecutionsTool(),
        buildCreateProjectTool(),
      ];
      const validated: ValidatedTool[] = [
        buildValidatedTestPageChangesTool(ctx),
        buildValidatedTriggerCrawlTool(ctx),
        buildValidatedProbePageTool(),
        buildValidatedSearchProjectsTool(),
        buildValidatedSearchEnvironmentsTool(),
        buildValidatedCreateEnvironmentTool(),
        buildValidatedUpdateEnvironmentTool(),
        buildValidatedDeleteEnvironmentTool(),
        buildValidatedUpdateProjectTool(),
        buildValidatedDeleteProjectTool(),
        buildValidatedSearchExecutionsTool(),
        buildValidatedCreateProjectTool(),
      ];
    
      _tools = tools;
      _validatedTools = validated;
    
      toolRegistry.clear();
      for (const v of validated) toolRegistry.set(v.name, v);
    }
  • Tool definition builder that constructs the Tool object with name 'trigger_crawl', title 'Trigger App Crawl', description (enriched with project environments/credentials at startup), and input schema with all parameter descriptions.
    export function buildTriggerCrawlTool(ctx: ProjectContext | null): Tool {
      return {
        name: 'trigger_crawl',
        title: 'Trigger App Crawl',
        description: buildTriggerCrawlDescription(ctx),
        inputSchema: {
          type: 'object',
          properties: {
            url: {
              type: 'string',
              description: 'URL to crawl. Can be any public URL or a localhost/local dev server URL. For localhost URLs, a secure tunnel is automatically created — just make sure your dev server is running on that port.',
            },
            projectUuid: {
              type: 'string',
              description: 'UUID of the project whose knowledge graph the crawl should populate. Auto-detected from the current git repo if omitted.',
            },
            environmentId: {
              type: 'string',
              description: 'UUID of a specific environment to use for the crawl. See available environments in the tool description above.',
            },
            credentialId: {
              type: 'string',
              description: 'UUID of a specific credential for authenticated crawls. See available credentials in the tool description above.',
            },
            credentialRole: {
              type: 'string',
              description: "Pick a credential by role (e.g. 'admin', 'guest') from the resolved environment.",
            },
            username: {
              type: 'string',
              description: 'A real, existing account email for the target app. Do NOT invent credentials — use one from the available credentials or ask the user.',
            },
            password: {
              type: 'string',
              description: 'The real password for the username above. Do NOT guess.',
            },
            headless: {
              type: 'boolean',
              description: 'Run the browser in headless mode. Defaults to backend configuration.',
            },
            timeoutSeconds: {
              type: 'number',
              description: 'Maximum wall-time the crawl may run, in seconds (1..1800). Backend enforces this per workflow execution.',
            },
            repoName: {
              type: 'string',
              description: "GitHub repository name (e.g. 'my-org/my-repo'). Auto-detected from the current git repo — only provide this to run against a different project.",
            },
          },
          required: ['url'],
          additionalProperties: false,
        },
      };
    }
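
For orientation, here is a minimal sketch of how a call likely flows through toolRegistry at request time. The ValidatedTool field names (inputSchema, handler) and the dispatch helper are assumptions for illustration, not taken from the source:

    // Hypothetical dispatch path (assumed ValidatedTool shape):
    async function dispatch(name: string, args: unknown, ctx: ToolContext): Promise<ToolResponse> {
      const tool = toolRegistry.get(name);
      if (!tool) throw new Error(`Unknown tool: ${name}`);
      const input = tool.inputSchema.parse(args); // Zod: rejects unknown fields via .strict()
      return tool.handler(input, ctx);
    }

    // e.g. await dispatch('trigger_crawl', { url: 'http://localhost:3000' }, ctx);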
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It discloses that the crawl is long-running (minutes to tens of minutes), populates backend state asynchronously, and returns executionId + status + outcome (not pass/fail). It also covers localhost support with automatic tunneling. However, it does not explicitly address potential side effects or destructive actions, though the tool appears additive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
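
As a point of comparison, the MCP spec's tool annotations could carry part of this disclosure declaratively. A plausible set for this tool follows; the hint values are my characterization based on the behavior described above, not taken from the source:

    // MCP ToolAnnotations hints (advisory, not enforced). Values are an
    // assumed characterization of trigger_crawl:
    const triggerCrawlAnnotations = {
      readOnlyHint: false,    // mutates backend state (populates the knowledge graph)
      destructiveHint: false, // additive: imports states/edges, does not delete
      idempotentHint: false,  // each call queues a fresh crawl execution
      openWorldHint: true,    // drives a remote browser against an external app
    };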

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (first paragraph, LOCALHOST SUPPORT, WHEN TO USE, SCOPE). Every sentence provides useful information, and there is no redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (10 parameters, no output schema), the description adequately explains the overall flow, return values, and asynchronous nature. It could mention error handling or what happens on failure, but it is largely complete for agent decision-making.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
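
On the error-handling point: the implementation does return structured failure payloads for localhost issues (the 'LocalServerUnreachable' and 'TunnelTrafficBlocked' codes in the handler above), which the description could surface. For example, a pre-flight probe failure comes back with isError: true and a body shaped like this (values illustrative):

    {
      "error": "LocalServerUnreachable",
      "message": "No server listening on 127.0.0.1:3000. Start your dev server on that port before running trigger_crawl. Probe result: ECONNREFUSED (connect ECONNREFUSED 127.0.0.1:3000).",
      "detail": { "port": 3000, "probeCode": "ECONNREFUSED", "probeDetail": "connect ECONNREFUSED 127.0.0.1:3000", "elapsedMs": 9 }
    }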

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The tool description adds no new per-parameter information beyond what the schema already provides. It gives overall context but does not enhance parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action ('Trigger a browser-agent crawl') and its purpose ('to build the project's knowledge graph'). It distinguishes from the sibling 'check_app_in_browser' by explicitly noting that it is NOT for per-change verification.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use ('after a significant new feature...') and when not to use ('NOT for per-change verification'), including the alternative tool name ('check_app_in_browser'). It also gives context on scope and typical duration.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

