srt_generate_postmortem
Generate a structured postmortem report for completed SRT incidents, featuring timeline, root cause analysis, successes and failures, prevention actions, and key metrics such as TTD, TTDiag, and TTR.
Instructions
Generate a structured postmortem report for a completed SRT incident. Includes timeline, root cause, what worked/failed, prevention actions, metrics (TTD/TTDiag/TTR), and optional playbook delta. Classification: ADVISORY.
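
The three metrics named above can be sketched as simple timestamp differences in whole minutes. This is a minimal illustration, not the project's code: `IncidentTimes`, `minutesBetween`, and `computeMetrics` are hypothetical names, and the fallback rules (a missing timestamp collapses to the previous stage) are an assumption modeled on the handler shown under Implementation Reference.

```typescript
// Hypothetical, simplified input: only the timestamps that drive the metrics.
interface IncidentTimes {
  createdAt: string;    // ISO 8601, when the incident record was created
  findingAt?: string;   // when the watchdog emitted its finding
  diagnosedAt?: string; // when the diagnosis completed
  repairedAt?: string;  // when the repair completed
}

interface Metrics {
  ttd: number;    // time to detect, minutes
  ttdiag: number; // time to diagnose, minutes
  ttr: number;    // time to repair, minutes
  total: number;  // sum of the three
}

const MS_PER_MINUTE = 60000;

// Rounded, non-negative difference in minutes between two epoch timestamps.
function minutesBetween(startMs: number, endMs: number): number {
  return Math.max(0, Math.round((endMs - startMs) / MS_PER_MINUTE));
}

function computeMetrics(t: IncidentTimes): Metrics {
  const created = new Date(t.createdAt).getTime();
  // Assumed fallbacks: a missing stage contributes zero minutes.
  const finding = t.findingAt ? new Date(t.findingAt).getTime() : created;
  const diagnosed = t.diagnosedAt ? new Date(t.diagnosedAt).getTime() : created;
  const repaired = t.repairedAt ? new Date(t.repairedAt).getTime() : diagnosed;

  const ttd = minutesBetween(created, finding);
  const ttdiag = minutesBetween(finding, diagnosed);
  const ttr = minutesBetween(diagnosed, repaired);
  return { ttd, ttdiag, ttr, total: ttd + ttdiag + ttr };
}

// Illustrative timestamps only.
const example = computeMetrics({
  createdAt: '2024-01-01T10:00:00Z',
  findingAt: '2024-01-01T10:00:00Z',
  diagnosedAt: '2024-01-01T10:04:00Z',
  repairedAt: '2024-01-01T10:19:00Z',
});
console.log(example);
```

Clamping each difference at zero keeps out-of-order timestamps from producing negative durations, at the cost of hiding clock skew.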
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| incident_id | Yes | Incident ID to generate postmortem for | |
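
Since the tool takes a single string argument, a call can be sketched as an MCP `tools/call` JSON-RPC request. The helper name `buildPostmortemRequest` and the incident ID `INC-example-001` are placeholders invented for illustration; real incident IDs come from the other SRT tools.

```typescript
// Shape of a JSON-RPC 2.0 request for the MCP `tools/call` method.
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: builds the request body for srt_generate_postmortem.
function buildPostmortemRequest(incidentId: string, requestId = 1): ToolCallRequest {
  // The schema only requires a string; rejecting an empty string here is an
  // extra client-side guard, not something the tool's schema enforces.
  if (incidentId.length === 0) {
    throw new Error('incident_id is required');
  }
  return {
    jsonrpc: '2.0',
    id: requestId,
    method: 'tools/call',
    params: {
      name: 'srt_generate_postmortem',
      arguments: { incident_id: incidentId },
    },
  };
}

// Placeholder incident ID, for illustration only.
const request = buildPostmortemRequest('INC-example-001');
console.log(JSON.stringify(request, null, 2));
```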
Implementation Reference
- src/mcp/tools/srt.ts:981-1131 (handler) The 'registerSRTGeneratePostmortemTool' function registers the srt_generate_postmortem MCP tool. Its handler looks up the incident by incident_id, builds a timeline, calculates metrics (TTD, TTDiag, TTR), identifies what worked and what failed, suggests prevention actions, optionally creates a playbook delta, assembles the postmortem report, marks the incident as resolved and persists it, and returns the structured postmortem.

      export function registerSRTGeneratePostmortemTool(server: McpServer, engine: GovernanceEngine): void {
        server.tool(
          'srt_generate_postmortem',
          'Generate a structured postmortem report for a completed SRT incident. Includes timeline, root cause, what worked/failed, prevention actions, metrics (TTD/TTDiag/TTR), and optional playbook delta. Classification: ADVISORY.',
          {
            incident_id: z.string().describe('Incident ID to generate postmortem for'),
          },
          { title: 'Generate Postmortem Report', readOnlyHint: false, idempotentHint: false, destructiveHint: false, openWorldHint: false },
          async (input) => {
            try {
              const incident = incidents.get(input.incident_id);
              if (!incident) {
                return { content: [{ type: 'text' as const, text: JSON.stringify({ error: 'INCIDENT_NOT_FOUND', incidentId: input.incident_id }) }], isError: true };
              }

              // Build timeline
              const timeline: Array<{ timestamp: string; agent: string; action: string; detail: string }> = [];

              if (incident.finding) {
                timeline.push({
                  timestamp: incident.finding.timestamp,
                  agent: 'watchdog',
                  action: 'FINDING_EMITTED',
                  detail: `${incident.finding.findingType}: ${incident.finding.signal} (${incident.finding.severity})`,
                });
              }
              if (incident.diagnosis) {
                timeline.push({
                  timestamp: incident.diagnosis.timestamp,
                  agent: 'diagnostician',
                  action: 'DIAGNOSIS_COMPLETE',
                  detail: `Root cause: ${incident.diagnosis.suspectedRootCause.substring(0, 200)}`,
                });
              }
              if (incident.repairPlan?.approvedAt) {
                timeline.push({
                  timestamp: incident.repairPlan.approvedAt,
                  agent: 'repair',
                  action: 'REPAIR_APPROVED',
                  detail: `Approved by ${incident.repairPlan.approvedBy}. Plan: ${incident.repairPlan.planId}`,
                });
              }
              if (incident.repairPlan?.completedAt) {
                timeline.push({
                  timestamp: incident.repairPlan.completedAt,
                  agent: 'repair',
                  action: `REPAIR_${incident.repairPlan.result}`,
                  detail: `Result: ${incident.repairPlan.result}. ${incident.repairPlan.commands.length} commands.`,
                });
              }

              // Calculate metrics
              // TTD = Time-to-Detect: watchdog detection is instantaneous (finding created at incident creation)
              // TTDiag = Time-to-Diagnose: from incident creation to diagnosis completion
              // TTR = Time-to-Repair: from diagnosis completion to repair completion
              const created = new Date(incident.createdAt).getTime();
              const findingTime = incident.finding?.timestamp ? new Date(incident.finding.timestamp).getTime() : created;
              const diagnosed = incident.diagnosis ? new Date(incident.diagnosis.timestamp).getTime() : created;
              const repaired = incident.repairPlan?.completedAt ? new Date(incident.repairPlan.completedAt).getTime() : diagnosed;
              const ttd = Math.max(0, Math.round((findingTime - created) / 60000));
              const ttdiag = Math.max(0, Math.round((diagnosed - findingTime) / 60000));
              const ttr = Math.max(0, Math.round((repaired - diagnosed) / 60000));
              const manualEstimate = incident.severity === 'CRITICAL' ? 90 : incident.severity === 'HIGH' ? 60 : 30;
              const humanTimeSaved = Math.max(0, manualEstimate - Math.round((ttd + ttdiag + ttr) * 0.1));
              const downtimeCost = incident.severity === 'CRITICAL' ? 100 : incident.severity === 'HIGH' ? 50 : 10;
              const costAvoided = humanTimeSaved * downtimeCost;

              // Similar incident count
              const similar = Array.from(incidents.values()).filter(i =>
                i.incidentId !== incident.incidentId && i.finding?.findingType === incident.finding?.findingType
              );

              // What worked / failed
              const whatWorked: string[] = [];
              const whatFailed: string[] = [];
              if (incident.finding) whatWorked.push(`Watchdog detected ${incident.finding.findingType}`);
              if (incident.diagnosis?.confidence === 'high') whatWorked.push('Matched known playbook pattern');
              if (incident.diagnosis?.confidence === 'low') whatFailed.push('No matching playbook — needed manual investigation');
              if (incident.repairPlan?.result === 'SUCCESS') whatWorked.push('Repair executed successfully');
              if (incident.repairPlan?.result === 'FAILED') whatFailed.push('Repair failed — rollback required');

              // Prevention actions
              const preventionActions: string[] = [];
              const ft = incident.finding?.findingType;
              if (ft === 'DISK_PRESSURE') preventionActions.push('Set up weekly Docker image pruning', 'Add 70% disk alerting');
              else if (ft === 'TLS_EXPIRING') preventionActions.push('Configure certbot auto-renewal cron', 'Add 30-day cert monitoring');
              else if (ft === 'SERVICE_DOWN') preventionActions.push('Review container restart policies', 'Add health check alerting');
              else preventionActions.push('Monitor for recurrence', 'Consider dedicated health check for this failure mode');

              // Playbook delta if this was a new pattern
              let playbookDelta;
              if (incident.diagnosis?.confidence !== 'high' && incident.repairPlan?.result === 'SUCCESS') {
                playbookDelta = {
                  deltaId: genId('DELTA'),
                  targetPackId: 'incident-playbooks-v1',
                  additions: {
                    patterns: [`${incident.finding?.findingType}_learned_${Date.now().toString(36)}`],
                    diagnosticSteps: incident.diagnosis?.actionsPerformed || [],
                    repairRecipes: incident.repairPlan?.commands.map(c => `Step ${c.step}: ${c.command}`) || [],
                  },
                  promotionStatus: 'PENDING — requires PLAYBOOK_PROMOTION gate (MANDATORY, ORG owner)',
                };
              }

              const postmortem = {
                postmortemId: genId('PM'),
                incidentId: incident.incidentId,
                title: `Incident: ${incident.finding?.findingType || 'Unknown'} — ${incident.severity}`,
                severity: incident.severity,
                timeline,
                rootCause: incident.diagnosis?.suspectedRootCause || 'Not determined',
                whatWorked,
                whatFailed,
                preventionActions,
                playbookDelta,
                metrics: {
                  timeToDetectMinutes: ttd,
                  timeToDiagnoseMinutes: ttdiag,
                  timeToRepairMinutes: ttr,
                  totalResolutionMinutes: ttd + ttdiag + ttr,
                  humanTimeSavedMinutes: humanTimeSaved,
                  costAvoidedUSD: costAvoided,
                  recurrenceCount: similar.length,
                },
                timestamp: new Date().toISOString(),
              };

              // Update incident
              incident.postmortem = postmortem;
              incident.status = 'POSTMORTEM_GENERATED';
              incident.resolvedAt = new Date().toISOString();
              incident.updatedAt = new Date().toISOString();
              persistIncident(incident); // Write-through: postmortem + resolution

              engine.telemetryService.emitToolCall('srt_generate_postmortem', incident.incidentId, 'ADVISORY', true);

              return {
                content: [{
                  type: 'text' as const,
                  text: JSON.stringify({
                    postmortem,
                    incidentResolved: true,
                    status: 'POSTMORTEM_GENERATED',
                  }, null, 2)
                }]
              };
            } catch (error) {
              engine.telemetryService.emitToolCall('srt_generate_postmortem', `postmortem-err-${Date.now().toString(36)}`, 'ADVISORY', false);
              return { content: [{ type: 'text' as const, text: JSON.stringify({ error: 'POSTMORTEM_FAILED', message: String(error) }) }], isError: true };
            }
          }
        );
      }

- src/mcp/tools/srt.ts:1141-1146 (registration) The 'registerSRTTools' convenience function registers all 4 SRT tools, including srt_generate_postmortem, with the MCP server.

      export function registerSRTTools(server: McpServer, engine: GovernanceEngine): void {
        registerSRTRunWatchdogTool(server, engine);
        registerSRTDiagnoseTool(server, engine);
        registerSRTApproveRepairTool(server, engine);
        registerSRTGeneratePostmortemTool(server, engine);
      }

- src/mcp/server.ts:120-122 (registration) Registration entry in the MCP server configuration that wires registerSRTTools (which includes srt_generate_postmortem) into the 'operator' tier.

      { tier: 'tenant', register: registerChainOfReasoningTools, description: 'chain_of_reasoning (Governed Cognition provenance trail)' },
      { tier: 'operator', register: registerSRTTools, description: 'srt (run_watchdog, diagnose, approve_repair, generate_postmortem)' },
      { tier: 'operator', register: registerRemediationPackTools, description: 'remediation (scan_environment, list_packs, dry_run_pack, apply_pack, run_patrol)' },

- src/mcp/tools/srt.ts:69-127 (helper) The SRTIncident interface that defines the postmortem shape (postmortemId, title, timeline, rootCause, whatWorked, whatFailed, preventionActions, metrics) used by the postmortem handler.

      interface SRTSuccessCriterion {
        check: string;
        expected: string;
        timeout: number;
      }

      interface SRTIncident {
        incidentId: string;
        status: string;
        severity: string;
        finding?: SRTFinding;
        diagnosis?: {
          diagnosisId: string;
          suspectedRootCause: string;
          confidence: string;
          actionsPerformed: string[];
          evidence: string[];
          fixOptions: Array<{
            optionId: string;
            description: string;
            risk: string;
            estimatedMinutes: number;
            commands: string[];
            rollback: string[];
            recommended: boolean
          }>;
          riskAssessment: string;
          timestamp: string;
        };
        repairPlan?: {
          planId: string;
          reason: string;
          risk: string;
          commands: SRTRepairCommand[];
          rollback: SRTRepairCommand[];
          successCriteria: SRTSuccessCriterion[];
          estimatedMinutes: number;
          gateId: string;
          gateStatus: string;
          approvedBy?: string;
          approvedAt?: string;
          executedAt?: string;
          completedAt?: string;
          result?: string;
        };
        postmortem?: {
          postmortemId: string;
          title: string;
          timeline: Array<{ timestamp: string; agent: string; action: string; detail: string }>;
          rootCause: string;
          whatWorked: string[];
          whatFailed: string[];
          preventionActions: string[];
          metrics: {
            timeToDetectMinutes: number;
            timeToDiagnoseMinutes: number;
            timeToRepairMinutes: number;
            totalResolutionMinutes: number;
            humanTimeSavedMinutes: number;
            costAvoidedUSD: number;
            recurrenceCount: number;
          };
        };
        createdAt: string;
        updatedAt: string;
        resolvedAt?: string;
      }

- src/mcp/tools/srt.ts:985-988 (schema) The input schema for srt_generate_postmortem, which takes a single required parameter: incident_id (string).

      {
        incident_id: z.string().describe('Incident ID to generate postmortem for'),
      },
      { title: 'Generate Postmortem Report', readOnlyHint: false, idempotentHint: false, destructiveHint: false, openWorldHint: false },
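
The handler returns its payload as a JSON string inside a single text content item, with isError set on failure. A caller might unpack that along the following lines. This is a hedged sketch: `ToolResult`, `ParsedResult`, and `parsePostmortemResult` are illustrative names, the `EMPTY_RESULT` and `UNKNOWN` codes are invented for this example, and the sample payloads are made up to match the fields shown in the handler.

```typescript
// Minimal view of the tool result as the handler above constructs it.
interface ToolResult {
  content: Array<{ type: 'text'; text: string }>;
  isError?: boolean;
}

type ParsedResult =
  | { ok: true; postmortemId: string; totalResolutionMinutes: number }
  | { ok: false; error: string };

// Hypothetical helper: decodes the JSON text and separates success from error.
function parsePostmortemResult(result: ToolResult): ParsedResult {
  const first = result.content[0];
  if (!first) {
    return { ok: false, error: 'EMPTY_RESULT' }; // invented code for this sketch
  }
  const payload = JSON.parse(first.text);
  if (result.isError) {
    // Error payloads carry an `error` code such as INCIDENT_NOT_FOUND.
    return { ok: false, error: String(payload.error ?? 'UNKNOWN') };
  }
  return {
    ok: true,
    postmortemId: payload.postmortem.postmortemId,
    totalResolutionMinutes: payload.postmortem.metrics.totalResolutionMinutes,
  };
}

// Made-up sample results, shaped like the handler's two return paths.
const notFound: ToolResult = {
  content: [{ type: 'text', text: JSON.stringify({ error: 'INCIDENT_NOT_FOUND', incidentId: 'INC-example-001' }) }],
  isError: true,
};
const success: ToolResult = {
  content: [{
    type: 'text',
    text: JSON.stringify({
      postmortem: { postmortemId: 'PM-example', metrics: { totalResolutionMinutes: 19 } },
      incidentResolved: true,
      status: 'POSTMORTEM_GENERATED',
    }),
  }],
};

const failedParse = parsePostmortemResult(notFound);
const okParse = parsePostmortemResult(success);
console.log(failedParse, okParse);
```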