srt_diagnose
Analyze an incident by matching findings to known playbooks, identifying root cause, and proposing a staged repair plan.
Instructions
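Runs the SRT Diagnostician on an incident: matches the incident's watchdog finding against known playbooks, identifies a suspected root cause, and proposes a staged repair plan. Classification: ADVISORY. The tool executes no repair commands. It does record the diagnosis and proposed plan on the incident record, and the plan cannot run until a human approves it via srt_approve_repair.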
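The decision logic behind the proposed plan can be sketched as a small pure function. This is a simplified, hypothetical mirror of the handler's branching, not code from srt.ts: the names `MiniPlaybook`, `FixOption`, and `summarizeDiagnosis`, the reduced field set, and the sample playbook's risk, duration, and command are all assumptions made for illustration. It shows that a matched playbook yields high confidence and a recommended playbook fix, with a full stack restart kept as a non-recommended alternative, while no match yields low confidence and makes the restart the recommended option.

```typescript
// Hypothetical, reduced playbook shape; the real Playbook type carries
// step-numbered commands, rollback, success criteria, and diagnostic steps.
interface MiniPlaybook {
  pattern: string;
  risk: string;
  estimatedMinutes: number;
  commands: string[];
}

interface FixOption {
  description: string;
  risk: string;
  estimatedMinutes: number;
  commands: string[];
  recommended: boolean;
}

interface DiagnosisSummary {
  confidence: 'high' | 'low';
  fixOptions: FixOption[];
  selectedFix: FixOption;
  gateStatus: 'PENDING';
}

// Mirrors the handler: an optional playbook fix, then an always-present
// full stack restart that is recommended only when no playbook matched.
function summarizeDiagnosis(pb: MiniPlaybook | null): DiagnosisSummary {
  const fixOptions: FixOption[] = [];
  if (pb) {
    fixOptions.push({
      description: `Playbook: ${pb.pattern}`,
      risk: pb.risk,
      estimatedMinutes: pb.estimatedMinutes,
      commands: pb.commands,
      recommended: true,
    });
  }
  fixOptions.push({
    description: 'Full stack restart',
    risk: 'MEDIUM',
    estimatedMinutes: 10,
    commands: ['docker compose down', 'sleep 5', 'docker compose up -d'],
    recommended: !pb,
  });
  const selectedFix = fixOptions.find(f => f.recommended) || fixOptions[0];
  return {
    confidence: pb ? 'high' : 'low',
    fixOptions,
    selectedFix,
    // Every proposed plan starts gated until a human approves it.
    gateStatus: 'PENDING',
  };
}

// Sample values below are made up; only the pattern name appears in the source.
const matched = summarizeDiagnosis({
  pattern: 'nginx_502_upstream_unhealthy',
  risk: 'LOW',
  estimatedMinutes: 5,
  commands: ['docker compose restart upstream'],
});
console.log(matched.confidence, matched.selectedFix.description);
// high Playbook: nginx_502_upstream_unhealthy

const unmatched = summarizeDiagnosis(null);
console.log(unmatched.confidence, unmatched.selectedFix.description);
// low Full stack restart
```

Keeping the restart option in both branches means a reviewer always sees a generic fallback next to the playbook fix, even when the playbook is the recommended choice.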
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| incident_id | Yes | Incident ID from a watchdog finding (string) | |
| additional_observations | No | Additional diagnostic observations, e.g. from manual log reading (array of strings) | |
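A minimal sketch of the input contract, assuming a caller wants to check arguments before invoking the tool. `SrtDiagnoseInput`, `validateDiagnoseInput`, and `mergeEvidence` are hypothetical names, the hand-rolled check only stands in for the zod schema the tool actually registers, and the incident ID and observation strings are invented. `mergeEvidence` reflects how the handler builds `diagnosis.evidence`: the finding's own observations first, then any caller-supplied ones.

```typescript
// Hypothetical type mirroring the registered schema:
// incident_id is a required string, additional_observations an optional string[].
interface SrtDiagnoseInput {
  incident_id: string;
  additional_observations?: string[];
}

// Stand-in for zod validation; returns an error message, or null if valid.
function validateDiagnoseInput(raw: unknown): string | null {
  if (typeof raw !== 'object' || raw === null) return 'input must be an object';
  const obj = raw as Record<string, unknown>;
  if (typeof obj.incident_id !== 'string') return 'incident_id must be a string';
  const extra = obj.additional_observations;
  if (extra !== undefined &&
      !(Array.isArray(extra) && extra.every(o => typeof o === 'string'))) {
    return 'additional_observations must be an array of strings';
  }
  return null;
}

// Evidence order used by the handler: finding observations, then extras.
function mergeEvidence(findingObservations: string[], input: SrtDiagnoseInput): string[] {
  return [...findingObservations, ...(input.additional_observations || [])];
}

// Example arguments; the ID format is made up for illustration.
const args: SrtDiagnoseInput = {
  incident_id: 'INC-example',
  additional_observations: ['nginx error.log: upstream prematurely closed connection'],
};
console.log(validateDiagnoseInput(args)); // null
console.log(validateDiagnoseInput({ additional_observations: 'x' })); // incident_id must be a string
console.log(mergeEvidence(['HTTP 502 from /health'], args).length); // 2
```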
Implementation Reference
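- src/mcp/tools/srt.ts:708-845 (handler) Main handler for the srt_diagnose tool. Accepts incident_id (and optional additional_observations) and looks up the incident, returning INCIDENT_NOT_FOUND or NO_FINDING if the incident is missing or has no watchdog finding. Otherwise it matches the finding to a known playbook (e.g., nginx_502_upstream_unhealthy, database_unreachable, ssh_hardening_required) and generates a diagnosis with a suspected root cause, a confidence level, and fix options, plus a repair plan with commands, rollback steps, and success criteria. When no playbook matches, it falls back to a low-confidence diagnosis with a full stack restart as the recommended fix. The incident moves to REPAIR_PROPOSED and is persisted, and the plan's gate stays PENDING until a human approves it. Telemetry is emitted on both success and failure; unexpected errors are caught and returned as a DIAGNOSE_FAILED error result rather than thrown.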
```typescript
export function registerSRTDiagnoseTool(server: McpServer, engine: GovernanceEngine): void {
  server.tool(
    'srt_diagnose',
    'Run the SRT Diagnostician on an incident. Matches finding to known playbooks, identifies root cause, and proposes a staged repair plan. Classification: ADVISORY — read-only analysis, no mutations.',
    {
      incident_id: z.string().describe('Incident ID from watchdog finding'),
      additional_observations: z.array(z.string()).optional().describe('Additional diagnostic observations (e.g. from manual log reading)'),
    },
    { title: 'Diagnose Incident', readOnlyHint: false, idempotentHint: false, destructiveHint: false, openWorldHint: false },
    async (input) => {
      try {
        const incident = incidents.get(input.incident_id);
        if (!incident) {
          return {
            content: [{
              type: 'text' as const,
              text: JSON.stringify({
                error: 'INCIDENT_NOT_FOUND',
                incidentId: input.incident_id,
                available: Array.from(incidents.keys()),
              }),
            }],
            isError: true,
          };
        }
        if (!incident.finding) {
          return {
            content: [{
              type: 'text' as const,
              text: JSON.stringify({ error: 'NO_FINDING', message: 'Incident has no watchdog finding' }),
            }],
            isError: true,
          };
        }

        incident.status = 'DIAGNOSING';
        incident.updatedAt = new Date().toISOString();

        // Match playbook
        const pb = matchPlaybook(
          incident.finding.findingType,
          incident.finding.signal,
          incident.finding.observations,
        );

        const diagnosisId = genId('DIAG');
        const actionsPerformed = ['INSPECT_CONTAINER', 'CHECK_ENV', 'VALIDATE_CONFIG'];
        if (['DB_UNREACHABLE'].includes(incident.finding.findingType)) actionsPerformed.push('CHECK_CONNECTIVITY');
        if (['TLS_EXPIRING', 'DNS_FAILURE'].includes(incident.finding.findingType)) actionsPerformed.push('CHECK_TLS', 'CHECK_DNS');

        const suspectedRootCause = pb
          ? `Matched playbook: ${pb.pattern}. Known pattern with ${pb.risk} risk repair.`
          : `Unknown pattern: ${incident.finding.findingType}. Signal: ${incident.finding.signal}. Manual investigation recommended.`;
        const confidence = pb ? 'high' : 'low';

        // Build fix options
        const fixOptions: Array<{
          optionId: string;
          description: string;
          risk: string;
          estimatedMinutes: number;
          commands: string[];
          rollback: string[];
          recommended: boolean;
        }> = [];
        if (pb) {
          fixOptions.push({
            optionId: genId('FIX'),
            description: `Playbook: ${pb.pattern}`,
            risk: pb.risk,
            estimatedMinutes: pb.estimatedMinutes,
            commands: pb.commands.map(c => c.command),
            rollback: pb.rollback.map(c => c.command),
            recommended: true,
          });
        }
        fixOptions.push({
          optionId: genId('FIX'),
          description: 'Full stack restart',
          risk: 'MEDIUM',
          estimatedMinutes: 10,
          commands: ['docker compose down', 'sleep 5', 'docker compose up -d'],
          rollback: [],
          recommended: !pb,
        });

        // Build repair plan
        const selectedFix = fixOptions.find(f => f.recommended) || fixOptions[0];
        const commands = pb
          ? pb.commands
          : selectedFix.commands.map((cmd, i) => ({
              step: i + 1,
              command: cmd,
              description: cmd,
              timeout: 120,
              requiresElevation: false,
              sensitive: false,
            }));
        const rollback = pb ? pb.rollback : [];
        const successCriteria = pb ? pb.successCriteria : [{ check: 'System healthy', expected: 'true', timeout: 60 }];
        const planId = genId('REPAIR');
        const gateId = genId('GATE');

        incident.diagnosis = {
          diagnosisId,
          suspectedRootCause,
          confidence,
          actionsPerformed,
          evidence: [
            ...incident.finding.observations,
            ...(input.additional_observations || []),
          ],
          fixOptions,
          riskAssessment: pb
            ? `Known pattern. ${pb.risk} risk. ${pb.diagnosticSteps.length} diagnostic steps matched.`
            : 'Unknown pattern. Elevated risk. Conservative approach recommended.',
          timestamp: new Date().toISOString(),
        };

        incident.repairPlan = {
          planId,
          reason: suspectedRootCause,
          risk: selectedFix.risk,
          commands: commands as SRTRepairCommand[],
          rollback: rollback as SRTRepairCommand[],
          successCriteria,
          estimatedMinutes: selectedFix.estimatedMinutes,
          gateId,
          gateStatus: 'PENDING',
        };

        incident.status = 'REPAIR_PROPOSED';
        incident.updatedAt = new Date().toISOString();
        persistIncident(incident); // Write-through: diagnosis + repair plan

        engine.telemetryService.emitToolCall('srt_diagnose', incident.incidentId, 'ADVISORY', true);

        return {
          content: [{
            type: 'text' as const,
            text: JSON.stringify({
              diagnosed: true,
              incidentId: incident.incidentId,
              diagnosis: incident.diagnosis,
              repairPlan: {
                planId: incident.repairPlan.planId,
                reason: incident.repairPlan.reason,
                risk: incident.repairPlan.risk,
                commands: incident.repairPlan.commands,
                rollback: incident.repairPlan.rollback,
                successCriteria: incident.repairPlan.successCriteria,
                estimatedMinutes: incident.repairPlan.estimatedMinutes,
                gateId: incident.repairPlan.gateId,
                gateStatus: 'PENDING — requires MANDATORY human approval',
              },
              nextStep: 'Call srt_approve_repair to approve or reject this plan. Repair CANNOT execute without human approval.',
            }, null, 2),
          }],
        };
      } catch (error) {
        engine.telemetryService.emitToolCall('srt_diagnose', `diag-err-${Date.now().toString(36)}`, 'ADVISORY', false);
        return {
          content: [{ type: 'text' as const, text: JSON.stringify({ error: 'DIAGNOSE_FAILED', message: String(error) }) }],
          isError: true,
        };
      }
    }
  );
}
```

- src/mcp/tools/srt.ts:712-715 (schema) Zod schema for srt_diagnose input: a required 'incident_id' string and an optional 'additional_observations' string array.
```typescript
{
  incident_id: z.string().describe('Incident ID from watchdog finding'),
  additional_observations: z.array(z.string()).optional().describe('Additional diagnostic observations (e.g. from manual log reading)'),
},
```

- src/mcp/tools/srt.ts:1141-1146 (registration) Convenience registration function that registers all 4 SRT tools. Called from server.ts via `import { registerSRTTools } from './tools/srt.js'`.
```typescript
export function registerSRTTools(server: McpServer, engine: GovernanceEngine): void {
  registerSRTRunWatchdogTool(server, engine);
  registerSRTDiagnoseTool(server, engine);
  registerSRTApproveRepairTool(server, engine);
  registerSRTGeneratePostmortemTool(server, engine);
}
```

- src/mcp/server.ts:121-122 (registration) SRT tool registration entry in the server's tool visibility configuration. The tier is 'operator' (internal infrastructure), so these tools are not exposed to external clients.
```typescript
{ tier: 'operator', register: registerSRTTools, description: 'srt (run_watchdog, diagnose, approve_repair, generate_postmortem)' },
{ tier: 'operator', register: registerRemediationPackTools, description: 'remediation (scan_environment, list_packs, dry_run_pack, apply_pack, run_patrol)' },
```

- src/mcp/tools/srt.ts:391-398 (helper) Helper used by the srt_diagnose handler to match an incident's finding type, signal, and observations against known playbooks. Playbooks are checked in order, and a playbook matches when its matchTypes list contains the finding type or any of its matchSignals occurs in the lowercased text of the finding type, signal, and observations. Returns the first matching playbook, or null.
```typescript
function matchPlaybook(findingType: string, signal: string, observations: string[]): Playbook | null {
  const combined = `${findingType} ${signal} ${observations.join(' ')}`.toLowerCase();
  for (const pb of PLAYBOOKS) {
    if (pb.matchTypes.includes(findingType)) return pb;
    if (pb.matchSignals.some(s => combined.includes(s))) return pb;
  }
  return null;
}
```
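Because both checks run per playbook inside a single loop, playbook order matters more than match kind: a signal match on an earlier playbook wins over an exact type match on a later one. The self-contained sketch below reproduces the `matchPlaybook` logic against a hypothetical two-entry table to make that visible. `MatchablePlaybook`, the `HTTP_5XX` finding type, and every `matchTypes`/`matchSignals` value are assumptions; the real PLAYBOOKS contents are not shown in this reference.

```typescript
// Minimal stand-in for Playbook: only the fields matchPlaybook reads.
interface MatchablePlaybook {
  pattern: string;
  matchTypes: string[];
  matchSignals: string[];
}

// Hypothetical table; real entries and their match values are not shown.
const PLAYBOOKS: MatchablePlaybook[] = [
  { pattern: 'nginx_502_upstream_unhealthy', matchTypes: ['HTTP_5XX'], matchSignals: ['502', 'bad gateway'] },
  { pattern: 'database_unreachable', matchTypes: ['DB_UNREACHABLE'], matchSignals: ['econnrefused'] },
];

// Same logic as matchPlaybook: for each playbook in order, an exact
// findingType match or any signal substring found in the lowercased
// combined text returns that playbook immediately.
function matchPlaybook(findingType: string, signal: string, observations: string[]): MatchablePlaybook | null {
  const combined = `${findingType} ${signal} ${observations.join(' ')}`.toLowerCase();
  for (const pb of PLAYBOOKS) {
    if (pb.matchTypes.includes(findingType)) return pb;
    if (pb.matchSignals.some(s => combined.includes(s))) return pb;
  }
  return null;
}

// The observation mentions "502", so the earlier nginx playbook wins even
// though the finding type is an exact match for database_unreachable.
console.log(matchPlaybook('DB_UNREACHABLE', 'connect ECONNREFUSED', ['proxy returned 502'])?.pattern);
// nginx_502_upstream_unhealthy
console.log(matchPlaybook('DISK_FULL', 'no space left', [])?.pattern); // undefined
```

Only `combined` is lowercased, not the signals themselves, so a signal stored with uppercase letters could never match; the sketch keeps its hypothetical signals lowercase for that reason.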