srt_generate_postmortem

Generate a structured postmortem report for completed SRT incidents, featuring timeline, root cause analysis, successes and failures, prevention actions, and key metrics such as TTD, TTDiag, and TTR.

Instructions

Generate a structured postmortem report for a completed SRT incident. Includes timeline, root cause, what worked/failed, prevention actions, metrics (TTD/TTDiag/TTR), and optional playbook delta. Classification: ADVISORY.

Input Schema

Name         Required  Description                             Default
incident_id  Yes       Incident ID to generate postmortem for
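
As a rough illustration of how a client might invoke the tool, the sketch below builds a JSON-RPC 2.0 `tools/call` request carrying the single `incident_id` argument. The helper name `buildPostmortemRequest`, the non-empty check, and the sample ID are assumptions made for this example; the page does not show the real ID format or any client code.

```typescript
// Shape of a JSON-RPC 2.0 request for MCP's `tools/call` method.
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

// Build the request body for srt_generate_postmortem.
// The schema only requires a string; rejecting blank IDs is an extra
// assumption of this sketch, not something the tool is documented to do.
function buildPostmortemRequest(incidentId: string, id = 1): ToolCallRequest {
  if (incidentId.trim() === '') {
    throw new Error('incident_id must be a non-empty string');
  }
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: {
      name: 'srt_generate_postmortem',
      arguments: { incident_id: incidentId },
    },
  };
}

// 'INC-example-001' is a made-up ID used only for illustration.
const request = buildPostmortemRequest('INC-example-001');
console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library would normally construct this envelope itself; the point here is only which tool name and argument key the server expects.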

Implementation Reference

  • The main handler function 'registerSRTGeneratePostmortemTool' that registers the srt_generate_postmortem MCP tool. It takes incident_id, builds a timeline, calculates metrics (TTD, TTDiag, TTR), identifies what worked/failed, suggests prevention actions, optionally creates a playbook delta, generates the postmortem report, persists the incident, and returns the structured postmortem.
    export function registerSRTGeneratePostmortemTool(server: McpServer, engine: GovernanceEngine): void {
      server.tool(
        'srt_generate_postmortem',
        'Generate a structured postmortem report for a completed SRT incident. Includes timeline, root cause, what worked/failed, prevention actions, metrics (TTD/TTDiag/TTR), and optional playbook delta. Classification: ADVISORY.',
        {
          incident_id: z.string().describe('Incident ID to generate postmortem for'),
        },
        { title: 'Generate Postmortem Report', readOnlyHint: false, idempotentHint: false, destructiveHint: false, openWorldHint: false },
        async (input) => {
          try {
            const incident = incidents.get(input.incident_id);
            if (!incident) {
              return { content: [{ type: 'text' as const, text: JSON.stringify({ error: 'INCIDENT_NOT_FOUND', incidentId: input.incident_id }) }], isError: true };
            }
    
            // Build timeline
            const timeline: Array<{ timestamp: string; agent: string; action: string; detail: string }> = [];
    
            if (incident.finding) {
              timeline.push({
                timestamp: incident.finding.timestamp,
                agent: 'watchdog',
                action: 'FINDING_EMITTED',
                detail: `${incident.finding.findingType}: ${incident.finding.signal} (${incident.finding.severity})`,
              });
            }
            if (incident.diagnosis) {
              timeline.push({
                timestamp: incident.diagnosis.timestamp,
                agent: 'diagnostician',
                action: 'DIAGNOSIS_COMPLETE',
                detail: `Root cause: ${incident.diagnosis.suspectedRootCause.substring(0, 200)}`,
              });
            }
            if (incident.repairPlan?.approvedAt) {
              timeline.push({
                timestamp: incident.repairPlan.approvedAt,
                agent: 'repair',
                action: 'REPAIR_APPROVED',
                detail: `Approved by ${incident.repairPlan.approvedBy}. Plan: ${incident.repairPlan.planId}`,
              });
            }
            if (incident.repairPlan?.completedAt) {
              timeline.push({
                timestamp: incident.repairPlan.completedAt,
                agent: 'repair',
                action: `REPAIR_${incident.repairPlan.result}`,
                detail: `Result: ${incident.repairPlan.result}. ${incident.repairPlan.commands.length} commands.`,
              });
            }
    
            // Calculate metrics
            // TTD  = Time-to-Detect: from incident creation to the finding timestamp (normally 0, because the watchdog finding is created with the incident)
            // TTDiag = Time-to-Diagnose: from the finding timestamp to diagnosis completion
            // TTR  = Time-to-Repair: from diagnosis completion to repair completion
            const created = new Date(incident.createdAt).getTime();
            const findingTime = incident.finding?.timestamp ? new Date(incident.finding.timestamp).getTime() : created;
            const diagnosed = incident.diagnosis ? new Date(incident.diagnosis.timestamp).getTime() : created;
            const repaired = incident.repairPlan?.completedAt ? new Date(incident.repairPlan.completedAt).getTime() : diagnosed;
    
            const ttd = Math.max(0, Math.round((findingTime - created) / 60000));
            const ttdiag = Math.max(0, Math.round((diagnosed - findingTime) / 60000));
            const ttr = Math.max(0, Math.round((repaired - diagnosed) / 60000));
    
            const manualEstimate = incident.severity === 'CRITICAL' ? 90 : incident.severity === 'HIGH' ? 60 : 30;
            const humanTimeSaved = Math.max(0, manualEstimate - Math.round((ttd + ttdiag + ttr) * 0.1));
            const downtimeCost = incident.severity === 'CRITICAL' ? 100 : incident.severity === 'HIGH' ? 50 : 10;
            const costAvoided = humanTimeSaved * downtimeCost;
    
            // Similar incident count
            const similar = Array.from(incidents.values()).filter(i =>
              i.incidentId !== incident.incidentId &&
              i.finding?.findingType === incident.finding?.findingType
            );
    
            // What worked / failed
            const whatWorked: string[] = [];
            const whatFailed: string[] = [];
    
            if (incident.finding) whatWorked.push(`Watchdog detected ${incident.finding.findingType}`);
            if (incident.diagnosis?.confidence === 'high') whatWorked.push('Matched known playbook pattern');
            if (incident.diagnosis?.confidence === 'low') whatFailed.push('No matching playbook — needed manual investigation');
            if (incident.repairPlan?.result === 'SUCCESS') whatWorked.push('Repair executed successfully');
            if (incident.repairPlan?.result === 'FAILED') whatFailed.push('Repair failed — rollback required');
    
            // Prevention actions
            const preventionActions: string[] = [];
            const ft = incident.finding?.findingType;
            if (ft === 'DISK_PRESSURE') preventionActions.push('Set up weekly Docker image pruning', 'Add 70% disk alerting');
            else if (ft === 'TLS_EXPIRING') preventionActions.push('Configure certbot auto-renewal cron', 'Add 30-day cert monitoring');
            else if (ft === 'SERVICE_DOWN') preventionActions.push('Review container restart policies', 'Add health check alerting');
            else preventionActions.push('Monitor for recurrence', 'Consider dedicated health check for this failure mode');
    
            // Playbook delta if this was a new pattern
            let playbookDelta;
            if (incident.diagnosis?.confidence !== 'high' && incident.repairPlan?.result === 'SUCCESS') {
              playbookDelta = {
                deltaId: genId('DELTA'),
                targetPackId: 'incident-playbooks-v1',
                additions: {
                  patterns: [`${incident.finding?.findingType}_learned_${Date.now().toString(36)}`],
                  diagnosticSteps: incident.diagnosis?.actionsPerformed || [],
                  repairRecipes: incident.repairPlan?.commands.map(c => `Step ${c.step}: ${c.command}`) || [],
                },
                promotionStatus: 'PENDING — requires PLAYBOOK_PROMOTION gate (MANDATORY, ORG owner)',
              };
            }
    
            const postmortem = {
              postmortemId: genId('PM'),
              incidentId: incident.incidentId,
              title: `Incident: ${incident.finding?.findingType || 'Unknown'} — ${incident.severity}`,
              severity: incident.severity,
              timeline,
              rootCause: incident.diagnosis?.suspectedRootCause || 'Not determined',
              whatWorked,
              whatFailed,
              preventionActions,
              playbookDelta,
              metrics: {
                timeToDetectMinutes: ttd,
                timeToDiagnoseMinutes: ttdiag,
                timeToRepairMinutes: ttr,
                totalResolutionMinutes: ttd + ttdiag + ttr,
                humanTimeSavedMinutes: humanTimeSaved,
                costAvoidedUSD: costAvoided,
                recurrenceCount: similar.length,
              },
              timestamp: new Date().toISOString(),
            };
    
            // Update incident
            incident.postmortem = postmortem;
            incident.status = 'POSTMORTEM_GENERATED';
            incident.resolvedAt = new Date().toISOString();
            incident.updatedAt = new Date().toISOString();
            persistIncident(incident); // Write-through: postmortem + resolution
    
            engine.telemetryService.emitToolCall('srt_generate_postmortem', incident.incidentId, 'ADVISORY', true);
    
            return { content: [{ type: 'text' as const, text: JSON.stringify({
              postmortem,
              incidentResolved: true,
              status: 'POSTMORTEM_GENERATED',
            }, null, 2) }] };
          } catch (error) {
            engine.telemetryService.emitToolCall('srt_generate_postmortem', `postmortem-err-${Date.now().toString(36)}`, 'ADVISORY', false);
            return { content: [{ type: 'text' as const, text: JSON.stringify({ error: 'POSTMORTEM_FAILED', message: String(error) }) }], isError: true };
          }
        }
      );
    }
  • The 'registerSRTTools' convenience function that registers all 4 SRT tools including srt_generate_postmortem with the MCP server.
    export function registerSRTTools(server: McpServer, engine: GovernanceEngine): void {
      registerSRTRunWatchdogTool(server, engine);
      registerSRTDiagnoseTool(server, engine);
      registerSRTApproveRepairTool(server, engine);
      registerSRTGeneratePostmortemTool(server, engine);
    }
  • Registration entry in the MCP server configuration that wires registerSRTTools (which includes srt_generate_postmortem) into the 'operator' tier.
    { tier: 'tenant', register: registerChainOfReasoningTools, description: 'chain_of_reasoning (Governed Cognition provenance trail)' },
    { tier: 'operator', register: registerSRTTools, description: 'srt (run_watchdog, diagnose, approve_repair, generate_postmortem)' },
    { tier: 'operator', register: registerRemediationPackTools, description: 'remediation (scan_environment, list_packs, dry_run_pack, apply_pack, run_patrol)' },
  • The SRTIncident interface that defines the postmortem shape (postmortemId, title, timeline, rootCause, whatWorked, whatFailed, preventionActions, metrics) used by the postmortem handler.
    interface SRTSuccessCriterion {
      check: string;
      expected: string;
      timeout: number;
    }
    
    interface SRTIncident {
      incidentId: string;
      status: string;
      severity: string;
      finding?: SRTFinding;
      diagnosis?: {
        diagnosisId: string;
        suspectedRootCause: string;
        confidence: string;
        actionsPerformed: string[];
        evidence: string[];
        fixOptions: Array<{ optionId: string; description: string; risk: string; estimatedMinutes: number; commands: string[]; rollback: string[]; recommended: boolean }>;
        riskAssessment: string;
        timestamp: string;
      };
      repairPlan?: {
        planId: string;
        reason: string;
        risk: string;
        commands: SRTRepairCommand[];
        rollback: SRTRepairCommand[];
        successCriteria: SRTSuccessCriterion[];
        estimatedMinutes: number;
        gateId: string;
        gateStatus: string;
        approvedBy?: string;
        approvedAt?: string;
        executedAt?: string;
        completedAt?: string;
        result?: string;
      };
      postmortem?: {
        postmortemId: string;
        title: string;
        timeline: Array<{ timestamp: string; agent: string; action: string; detail: string }>;
        rootCause: string;
        whatWorked: string[];
        whatFailed: string[];
        preventionActions: string[];
        metrics: {
          timeToDetectMinutes: number;
          timeToDiagnoseMinutes: number;
          timeToRepairMinutes: number;
          totalResolutionMinutes: number;
          humanTimeSavedMinutes: number;
          costAvoidedUSD: number;
          recurrenceCount: number;
        };
      };
      createdAt: string;
      updatedAt: string;
      resolvedAt?: string;
    }
  • The input schema for srt_generate_postmortem — takes a single required parameter: incident_id (string).
    {
      incident_id: z.string().describe('Incident ID to generate postmortem for'),
    },
    { title: 'Generate Postmortem Report', readOnlyHint: false, idempotentHint: false, destructiveHint: false, openWorldHint: false },
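
The metric arithmetic in the handler above can be reproduced in isolation to check a report by hand. The following is a minimal sketch that copies the handler's constants and fallbacks; the names `MetricInputs`, `minutesBetween`, and `computeMetrics` and the sample timestamps are invented for this example.

```typescript
interface MetricInputs {
  createdAt: string;
  findingAt?: string;    // incident.finding.timestamp
  diagnosedAt?: string;  // incident.diagnosis.timestamp
  repairedAt?: string;   // incident.repairPlan.completedAt
  severity: string;
}

// Whole minutes between two epoch-millisecond values, clamped at zero.
const minutesBetween = (fromMs: number, toMs: number): number =>
  Math.max(0, Math.round((toMs - fromMs) / 60000));

function computeMetrics(input: MetricInputs) {
  const created = new Date(input.createdAt).getTime();
  // Missing timestamps fall back the same way the handler does.
  const finding = input.findingAt ? new Date(input.findingAt).getTime() : created;
  const diagnosed = input.diagnosedAt ? new Date(input.diagnosedAt).getTime() : created;
  const repaired = input.repairedAt ? new Date(input.repairedAt).getTime() : diagnosed;

  const ttd = minutesBetween(created, finding);
  const ttdiag = minutesBetween(finding, diagnosed);
  const ttr = minutesBetween(diagnosed, repaired);
  const total = ttd + ttdiag + ttr;

  // Severity-based estimates copied from the handler.
  const manualEstimate = input.severity === 'CRITICAL' ? 90 : input.severity === 'HIGH' ? 60 : 30;
  const humanTimeSaved = Math.max(0, manualEstimate - Math.round(total * 0.1));
  const downtimeCost = input.severity === 'CRITICAL' ? 100 : input.severity === 'HIGH' ? 50 : 10;

  return { ttd, ttdiag, ttr, total, humanTimeSaved, costAvoided: humanTimeSaved * downtimeCost };
}

// Hypothetical HIGH-severity incident: diagnosed after 12 minutes, repaired 18 minutes later.
const metrics = computeMetrics({
  createdAt: '2024-01-01T10:00:00Z',
  findingAt: '2024-01-01T10:00:00Z',
  diagnosedAt: '2024-01-01T10:12:00Z',
  repairedAt: '2024-01-01T10:30:00Z',
  severity: 'HIGH',
});
console.log(metrics);
```

For these inputs the result is ttd 0, ttdiag 12, ttr 18, total 30, humanTimeSaved 57 (60 minus 10% of 30), and costAvoided 2850 (57 × 50). Note that humanTimeSavedMinutes and costAvoidedUSD are estimates derived from fixed per-severity constants, not measurements.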
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are present (readOnlyHint=false, destructiveHint=false, idempotentHint=false) but are generic. The description clarifies the output structure and that the classification is advisory, adding behavioral context. However, it does not disclose side effects (the handler sets the incident's status to POSTMORTEM_GENERATED, stamps resolvedAt, and persists the record) or specify whether repeated calls produce the same report. The description complements annotations adequately.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
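
To make the consequences concrete: the handler in the Implementation Reference ends by mutating the incident and persisting it. The sketch below mirrors that "Update incident" step in a simplified form to show why a second call is not a no-op. `IncidentState` and `applyPostmortem` are hypothetical names, and storing only a `postmortemId` (rather than the full postmortem object) is a simplification of this example.

```typescript
// Reduced incident record; the real SRTIncident has many more fields.
interface IncidentState {
  incidentId: string;
  status: string;
  updatedAt: string;
  resolvedAt?: string;
  postmortemId?: string;
}

// Mirrors the handler's update step: the record is changed in place.
function applyPostmortem(incident: IncidentState, postmortemId: string, now: Date): void {
  const iso = now.toISOString();
  incident.postmortemId = postmortemId;
  incident.status = 'POSTMORTEM_GENERATED';
  incident.resolvedAt = iso;
  incident.updatedAt = iso;
}

const incident: IncidentState = {
  incidentId: 'INC-example-001', // made-up ID
  status: 'REPAIRED',            // made-up prior status
  updatedAt: '2024-01-01T10:30:00.000Z',
};

// First call resolves the incident.
applyPostmortem(incident, 'PM-1', new Date('2024-01-01T10:30:00Z'));
const firstResolvedAt = incident.resolvedAt;

// A repeated call replaces the postmortem and overwrites resolvedAt.
applyPostmortem(incident, 'PM-2', new Date('2024-01-01T11:00:00Z'));
console.log(firstResolvedAt, '->', incident.resolvedAt);
```

Because each call generates a fresh postmortem ID and overwrites resolvedAt, idempotentHint=false is accurate, and an agent that cares about the original resolution time should avoid calling the tool twice.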

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loading the core purpose. Every piece of information (timeline, root cause, metrics, playbook delta) is valuable and succinct. No redundant or unnecessary text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the single parameter and no output schema, the description covers the tool's purpose and output structure well. It lists all major sections of the report. However, it does not mention the output format (the handler returns JSON serialized into a single text content item) or the side effects on the stored incident. Slightly more detail on the return value would raise completeness to 5.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
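
Since the description leaves the return format implicit, here is a hedged sketch of how a caller might unpack the result, based on the return statements in the handler: a single text content item containing JSON, with `isError: true` on failure. `ToolResult`, `Parsed`, and `parsePostmortemResult` are names invented here, and the `EMPTY_RESULT` and `INVALID_JSON` codes are assumptions of this sketch; only `INCIDENT_NOT_FOUND` and `POSTMORTEM_FAILED` appear in the source.

```typescript
interface ToolResult {
  content: Array<{ type: 'text'; text: string }>;
  isError?: boolean;
}

type Parsed =
  | { ok: true; postmortem: unknown; status: string }
  | { ok: false; error: string };

function parsePostmortemResult(result: ToolResult): Parsed {
  const first = result.content[0];
  if (!first) return { ok: false, error: 'EMPTY_RESULT' };

  let body: unknown;
  try {
    body = JSON.parse(first.text);
  } catch {
    return { ok: false, error: 'INVALID_JSON' };
  }
  if (typeof body !== 'object' || body === null) {
    return { ok: false, error: 'INVALID_JSON' };
  }

  const fields = body as Record<string, unknown>;
  if (result.isError) {
    // Error payloads carry an `error` code such as INCIDENT_NOT_FOUND.
    return { ok: false, error: String(fields['error'] ?? 'UNKNOWN_ERROR') };
  }
  return { ok: true, postmortem: fields['postmortem'], status: String(fields['status']) };
}

// Error case, shaped like the handler's not-found response.
const notFound = parsePostmortemResult({
  content: [{ type: 'text', text: JSON.stringify({ error: 'INCIDENT_NOT_FOUND', incidentId: 'INC-x' }) }],
  isError: true,
});

// Success case with a deliberately tiny postmortem body.
const success = parsePostmortemResult({
  content: [{ type: 'text', text: JSON.stringify({ postmortem: { severity: 'HIGH' }, incidentResolved: true, status: 'POSTMORTEM_GENERATED' }) }],
});
console.log(notFound, success);
```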

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has one parameter 'incident_id' with full description coverage (100%). The tool description repeats the schema description without adding extra meaning, so no value added beyond schema. Baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Generate' and the resource 'structured postmortem report for a completed SRT incident'. It lists specific contents (timeline, root cause, etc.) and ends with classification 'ADVISORY', making it distinct from siblings like srt_diagnose or srt_approve_repair.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies 'for a completed SRT incident', implying it should be used only after an incident is resolved. However, it does not explicitly state when not to use it, nor does it mention alternative tools for ongoing incidents (e.g., srt_diagnose) or other conditions. This leaves some ambiguity for the agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
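
One way a caller could encode such guidance on its own side is a small selector over the incident's state. The sketch below is only an assumption about a sensible ordering, inferred from the four SRT tool names; the handler itself does not check that diagnosis or repair has finished, and `IncidentSummary` and `suggestNextTool` are hypothetical names.

```typescript
// Minimal view of an incident; fields follow the SRTIncident interface.
interface IncidentSummary {
  diagnosis?: { timestamp: string };
  repairPlan?: { completedAt?: string; result?: string };
  postmortem?: unknown;
}

type NextStep = 'srt_diagnose' | 'srt_approve_repair' | 'srt_generate_postmortem' | 'none';

// Suggest which SRT tool fits the incident's current state.
// The ordering is this sketch's assumption, not documented server behavior.
function suggestNextTool(incident: IncidentSummary): NextStep {
  if (incident.postmortem) return 'none';                              // already has a report
  if (!incident.diagnosis) return 'srt_diagnose';                      // still undiagnosed
  if (!incident.repairPlan?.completedAt) return 'srt_approve_repair';  // repair not finished
  return 'srt_generate_postmortem';                                    // incident is complete
}

const undiagnosed = suggestNextTool({});
const completed = suggestNextTool({
  diagnosis: { timestamp: '2024-01-01T10:12:00Z' },
  repairPlan: { completedAt: '2024-01-01T10:30:00Z', result: 'SUCCESS' },
});
console.log(undiagnosed, completed);
```

A description sentence along the lines of "use srt_diagnose for incidents that are still open" would give agents the same signal without any client-side logic.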

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/knowledgepa3/gia-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.