forensics

Close the learning loop on failed agent runs by collecting failure data, recording replays, and annotating outcomes for continuous improvement.

Instructions

Failure dataset & replays — close the learning loop on failed agent runs.

Input Schema

Name         Required  Default  Description
action       Yes       -        One of: collect, summarize, write-learnings, list-replays, record-replay, load-replay, annotate-replay, reflect
projectRoot  No        cwd      Project root directory; resolves to the current working directory when omitted
replayId     No        -        For action=load-replay and action=annotate-replay: the replay id to operate on
payload      No        -        For action=record-replay: the Task() payload to store
outcome      No        -        For action=annotate-replay: the outcome object to merge into the replay record
agent        No        -        For action=reflect: agent name (e.g. executor)
dryRun       No        -        For action=reflect: only save the assembled prompt, no API call
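As a sketch of how these parameters combine, a record-replay call followed by an annotate-replay call on the stored record might look like this (all field values are illustrative, and the replay id follows the timestamp-plus-slug scheme used by the implementation below):

```javascript
// Hypothetical arguments for action=record-replay (values are illustrative):
const recordArgs = {
  action: 'record-replay',
  projectRoot: '/path/to/project',
  payload: { agent: 'executor', phase: '03', plan: '2' },
};

// Later, annotate the stored replay with an outcome:
const annotateArgs = {
  action: 'annotate-replay',
  projectRoot: '/path/to/project',
  replayId: '2024-01-01T00-00-00-000Z-03-2-executor',
  outcome: { status: 'failed', reason: 'verification did not pass' },
};
```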

Implementation Reference

  • MCP tool schema/definition for the 'forensics' tool. Defines the tool name, description, and inputSchema with actions (collect, summarize, write-learnings, list-replays, record-replay, load-replay, annotate-replay, reflect) and parameters.
    {
      name: 'forensics',
      description: 'Failure dataset & replays — close the learning loop on failed agent runs.',
      inputSchema: {
        type: 'object',
        properties: {
          action:      { type: 'string', enum: ['collect', 'summarize', 'write-learnings', 'list-replays', 'record-replay', 'load-replay', 'annotate-replay', 'reflect'] },
          projectRoot: { type: 'string' },
          replayId:    { type: 'string' },
          payload:     { type: 'object', description: 'For action=record-replay: the Task() payload to store.' },
          outcome:     { type: 'object', description: 'For action=annotate-replay' },
          agent:       { type: 'string', description: 'For action=reflect: agent name (e.g. executor)' },
          dryRun:      { type: 'boolean', description: 'For action=reflect: only save the assembled prompt, no API call' },
        },
        required: ['action'],
      },
    },
  • The MCP handler function (handleForensics) that dispatches forensics actions to the core logic functions like collectFailures, summarizeByAgent, writeLearnings, listReplays, recordReplay, loadReplay, annotateReplay, and reflect.
    async function handleForensics(args) {
      const projectRoot = args.projectRoot;
      switch (args.action) {
        case 'collect':         return collectFailures({ projectRoot });
        case 'summarize': {
          const failures = await collectFailures({ projectRoot });
          return summarizeByAgent(failures);
        }
        case 'write-learnings': {
          const failures = await collectFailures({ projectRoot });
          return writeLearnings(failures, { projectRoot });
        }
        case 'list-replays':    return listReplays({ projectRoot });
        case 'record-replay':   return recordReplay(args.payload, { projectRoot });
        case 'load-replay':     return loadReplay(args.replayId, { projectRoot });
        case 'annotate-replay': return annotateReplay(args.replayId, args.outcome, { projectRoot });
        case 'reflect': return reflect({
          agent: args.agent, projectRoot, dryRun: args.dryRun,
          apply: false, interactive: false,  // MCP never auto-applies
        });
        default: return { error: `Unknown action: ${args.action}` };
      }
    }
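The dispatch pattern above can be exercised in isolation. Below is a minimal, self-contained sketch with a stubbed core function (the stub is an assumption, not the real collectFailures), showing that unknown actions return a structured error object rather than throwing:

```javascript
// Minimal sketch of the forensics dispatch, with a stubbed core function.
// The stub below is a placeholder, not the real implementation.
const stubs = {
  collectFailures: async ({ projectRoot }) =>
    ({ projectRoot, counts: { debug: 0, verify: 0, forensics: 0 }, items: [] }),
};

async function dispatchForensics(args) {
  const projectRoot = args.projectRoot;
  switch (args.action) {
    case 'collect': return stubs.collectFailures({ projectRoot });
    default:        return { error: `Unknown action: ${args.action}` };
  }
}

// Unknown actions produce a structured error instead of an exception:
dispatchForensics({ action: 'nope' }).then((r) => console.log(r.error));
// → Unknown action: nope
```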
  • The HANDLERS map that registers 'forensics' -> handleForensics, so the CallToolRequestSchema dispatcher routes the forensics tool to its handler.
    const HANDLERS = {
      kit:           handleKit,
      sync:          handleSync,
  'reverse-sync': handleReverseSync,
      gates:         handleGates,
      forensics:     handleForensics,
      install:       handleInstall,
    };
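A CallToolRequestSchema dispatcher built on this map presumably looks up the handler by tool name and falls back to an error for unregistered tools. A hedged sketch, with handleForensics replaced by a stand-in stub:

```javascript
// Sketch of how a CallToolRequest dispatcher might route through HANDLERS.
// handleForensics here is a stand-in stub; the real handler is shown above.
const handleForensics = async (args) => ({ ok: true, action: args.action });

const HANDLERS = {
  forensics: handleForensics,
};

async function callTool(name, args) {
  const handler = HANDLERS[name];
  if (!handler) return { error: `Unknown tool: ${name}` };
  return handler(args);
}

callTool('forensics', { action: 'list-replays' }).then((r) => console.log(r.action));
// → list-replays
```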
  • Core helper collectFailures() - aggregates debug sessions, failed verifications, and forensics reports into a structured dataset. Used by the 'collect', 'summarize', and 'write-learnings' actions.
    export async function collectFailures(opts = {}) {
      const projectRoot = path.resolve(opts.projectRoot ?? process.cwd());
      const planning    = path.join(projectRoot, '.planning');
    
      const [debugFailures, verifyFailures, forensicsReports] = await Promise.all([
        readDebugSessions(path.join(planning, 'debug', 'resolved')),
        readFailedVerifications(path.join(planning, 'phases')),
        readForensics(path.join(planning, 'forensics')),
      ]);
    
      return {
        projectRoot,
        counts: {
          debug:     debugFailures.length,
          verify:    verifyFailures.length,
          forensics: forensicsReports.length,
        },
        items: [...debugFailures, ...verifyFailures, ...forensicsReports],
      };
    }
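The summarizeByAgent helper consumed by the 'summarize' action is not shown on this page. A hypothetical sketch of what it might do, assuming it groups the collected items by their agent field and counts them (the real implementation may differ):

```javascript
// Hypothetical sketch: group collected failure items by agent and count them.
function summarizeByAgent(failures) {
  const byAgent = {};
  for (const item of failures.items) {
    const agent = item.agent ?? 'unknown';
    byAgent[agent] = (byAgent[agent] ?? 0) + 1;
  }
  return { total: failures.items.length, byAgent };
}

const sample = { items: [{ agent: 'executor' }, { agent: 'executor' }, { agent: 'verifier' }] };
console.log(summarizeByAgent(sample));
// → { total: 3, byAgent: { executor: 2, verifier: 1 } }
```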
  • Core helper functions (recordReplay, listReplays, loadReplay, annotateReplay) for managing replay data - used by the replay-related forensics actions.
    export async function recordReplay(payload, opts = {}) {
      const projectRoot = path.resolve(opts.projectRoot ?? process.cwd());
      const dir = path.join(projectRoot, REPLAY_DIR_REL);
      await fs.mkdir(dir, { recursive: true });
    
      const ts   = new Date().toISOString().replace(/[:.]/g, '-');
      const slug = [payload.phase, payload.plan, payload.agent].filter(Boolean).join('-') || 'unknown';
      const id   = `${ts}-${slug}`;
      const file = path.join(dir, `${id}.json`);
    
      const record = { id, recorded_at: new Date().toISOString(), ...payload };
      await fs.writeFile(file, JSON.stringify(record, null, 2), 'utf8');
      return { id, file, record };
    }
    
    export async function listReplays(opts = {}) {
      const projectRoot = path.resolve(opts.projectRoot ?? process.cwd());
      const dir = path.join(projectRoot, REPLAY_DIR_REL);
      let entries;
      try { entries = await fs.readdir(dir); } catch { return []; }
      const items = [];
      for (const e of entries) {
        if (!e.endsWith('.json')) continue;
        try {
          const r = JSON.parse(await fs.readFile(path.join(dir, e), 'utf8'));
          items.push({ id: r.id, agent: r.agent, phase: r.phase, plan: r.plan, recorded_at: r.recorded_at });
        } catch {}
      }
      return items.sort((a, b) => (b.recorded_at ?? '').localeCompare(a.recorded_at ?? ''));
    }
    
    export async function loadReplay(id, opts = {}) {
      const projectRoot = path.resolve(opts.projectRoot ?? process.cwd());
      const file = path.join(projectRoot, REPLAY_DIR_REL, `${id}.json`);
      const raw  = await fs.readFile(file, 'utf8');
      return JSON.parse(raw);
    }
    
    export async function annotateReplay(id, outcome, opts = {}) {
      const projectRoot = path.resolve(opts.projectRoot ?? process.cwd());
      const file = path.join(projectRoot, REPLAY_DIR_REL, `${id}.json`);
      const r = JSON.parse(await fs.readFile(file, 'utf8'));
      r.outcome = { ...(r.outcome ?? {}), ...outcome, annotated_at: new Date().toISOString() };
      await fs.writeFile(file, JSON.stringify(r, null, 2), 'utf8');
      return r;
    }
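The replay id scheme and outcome merge used by these helpers can be demonstrated without touching disk. A self-contained sketch of the same logic, kept in memory for illustration:

```javascript
// Sketch of the replay id scheme (timestamp + phase/plan/agent slug) and the
// outcome merge performed by annotateReplay, reproduced in memory.
function makeReplayId(payload, now = new Date()) {
  const ts   = now.toISOString().replace(/[:.]/g, '-');
  const slug = [payload.phase, payload.plan, payload.agent].filter(Boolean).join('-') || 'unknown';
  return `${ts}-${slug}`;
}

function mergeOutcome(record, outcome, now = new Date()) {
  return { ...record, outcome: { ...(record.outcome ?? {}), ...outcome, annotated_at: now.toISOString() } };
}

const payload = { agent: 'executor', phase: '03', plan: '2' };
const id = makeReplayId(payload, new Date('2024-01-01T00:00:00.000Z'));
console.log(id);
// → 2024-01-01T00-00-00-000Z-03-2-executor

const annotated = mergeOutcome({ id, ...payload }, { status: 'failed' });
console.log(annotated.outcome.status);
// → failed
```

Replacing `:` and `.` in the ISO timestamp keeps the id safe to use as a filename on all platforms, which is why the stored file is simply `${id}.json`.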
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations exist, so the description carries the full burden. It mentions 'failure dataset & replays' but does not disclose behavioral traits such as whether the tool modifies state, requires authentication, or has side effects. The schema includes actions like 'write-learnings' that imply mutations, but the descriptions are silent on this.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single sentence is concise but lacks structure. It does not front-load key information or use bullet points, but it is not verbose. Adequate yet minimal.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, many actions, nested objects), the description is severely incomplete. It does not explain the individual actions, how replayId is used, or what the tool returns. The absence of an output schema further reduces completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 57%, with some parameters described (payload, outcome, agent, dryRun) but action and projectRoot lack descriptions. The description adds no meaning beyond the schema, failing to compensate for missing param details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Failure dataset & replays — close the learning loop on failed agent runs' suggests a general purpose related to collecting and replaying failures, but lacks a specific verb and resource. It does not clearly distinguish from siblings, though siblings are unrelated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives, nor any conditions or exclusions. The description does not help an agent decide when to invoke this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
