<!--
Sync Impact Report
Version change: (unversioned) → 1.0.0
Modified principles: (initial population)
Added sections: Core Principles; Operational & Technical Constraints; Development Workflow & Quality Gates; Governance
Removed sections: None
Templates requiring updates:
- .specify/templates/plan-template.md ✅ (Constitution Check gates align automatically)
- .specify/templates/spec-template.md ✅ (User Scenarios & Requirements map to principles for testability & safety)
- .specify/templates/tasks-template.md ✅ (Task grouping supports independent testability & safety tasks)
- .specify/templates/checklist-template.md ✅ (Can generate Safety, Offline, Freshness categories)
- .specify/templates/agent-file-template.md ⚠ (Follow-up: auto-generation should list Active Principles summary)
Follow-up TODOs: None (no unresolved placeholders)
-->
# On-Call Runbook MCP Server Constitution
## Core Principles
### 1. Source-of-Truth Runbooks (Canonical & Fresh)
Runbook markdown files (with required frontmatter) ARE the single source of operational knowledge. A file without
valid frontmatter is ignored. Each response MUST cite concrete file paths and chunk identifiers. Any runbook with
`last_verified_at` older than the freshness threshold (default 90 days) MUST trigger a staleness warning in answers.
Missing or malformed freshness dates MUST surface a validation error (never silently ignored).
Rationale: Ensures operators rely on auditable, dated content and reduces tribal-knowledge drift.
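The freshness rule above can be sketched as a pure function with an injected clock. This is a minimal illustration, not the project's implementation: the function name `evaluate_freshness`, the `FRESH`/`STALE` return values, and the assumption that `last_verified_at` is an ISO 8601 date (`YYYY-MM-DD`) are all hypothetical.

```python
from datetime import date, timedelta

DEFAULT_THRESHOLD_DAYS = 90  # default threshold stated in Principle 1


def evaluate_freshness(frontmatter: dict, today: date,
                       threshold_days: int = DEFAULT_THRESHOLD_DAYS) -> str:
    """Return 'FRESH' or 'STALE'; raise ValueError for a missing or malformed date.

    `today` is injected so the result is deterministic under test.
    Assumes `last_verified_at` is an ISO 8601 date string (hypothetical format).
    """
    raw = frontmatter.get("last_verified_at")
    if raw is None:
        # Never silently ignored: surface as a validation error.
        raise ValueError("last_verified_at is missing")
    try:
        verified = date.fromisoformat(str(raw))
    except ValueError as exc:
        raise ValueError(f"last_verified_at is malformed: {raw!r}") from exc
    # "Older than the threshold" is read here as strictly greater than N days.
    age = today - verified
    return "STALE" if age > timedelta(days=threshold_days) else "FRESH"
```

A caller would attach a staleness warning to the answer whenever the result is `STALE`, and add the file to the validation report when `ValueError` is raised.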
### 2. Deterministic Retrieval & Non‑Hallucination (Safety Over Fluency)
Answer generation MUST always start from ranked chunks. If no chunk is sufficiently relevant (every score < policy threshold),
the system MUST return an explicit "UNKNOWN / ESCALATE" result with owner escalation info (from frontmatter) rather than
fabricating steps. LLM usage is OPTIONAL; offline fallback MUST yield extraction-based responses with identical safety
guards and citations.
Rationale: Guarantees reliability independent of external model availability and eliminates silent hallucinations.
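As a rough sketch of the escalation guard, the function below returns an explicit "UNKNOWN / ESCALATE" result when every ranked chunk scores below the threshold. The `Chunk` type, the `answer_or_escalate` name, the result dictionary shape, and the choice to take owner info from the candidate chunks that were considered are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    path: str         # runbook file path, used in citations
    chunk_id: str     # chunk identifier, used in citations
    score: float      # relevance score from the ranker
    owner_slack: str  # from runbook frontmatter
    owner_team: str   # from runbook frontmatter


def answer_or_escalate(ranked: list[Chunk], threshold: float) -> dict:
    """Cite relevant chunks, or escalate instead of fabricating steps."""
    relevant = [c for c in ranked if c.score >= threshold]
    if not relevant:
        # Assumption: escalate to the owners of the candidates that were considered.
        owners = sorted({(c.owner_team, c.owner_slack) for c in ranked})
        return {"status": "UNKNOWN / ESCALATE", "citations": [], "escalate_to": owners}
    return {
        "status": "ANSWERED",
        "citations": [(c.path, c.chunk_id) for c in relevant],
        "escalate_to": [],
    }
```

Because the function never touches an LLM, the same guard applies unchanged in offline mode.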
### 3. Explicit Risk & Reversibility for Commands (NON‑NEGOTIABLE)
All extracted commands are classified as `safe_ops` or `risk_ops`. Every `risk_ops` item MUST include: (a) a ⚠ marker, (b)
an expected impact summary (derived, or marked "UNSPECIFIED"), (c) a rollback procedure, or a "VERIFY ROLLBACK MANUALLY"
warning if none is documented. Answers MUST present risk operations and safe operations in separate sections. A response
containing risk operations without this structure FAILS review gates.
Rationale: Minimizes operator error blast radius and enforces operational discipline.
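The annotation structure can be illustrated with a small renderer. This is a sketch under assumptions: the `Command` type, the `render_commands` name, and the plain-text layout are hypothetical; only the ⚠ marker and the "UNSPECIFIED" and "VERIFY ROLLBACK MANUALLY" fallbacks come from the principle itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Command:
    text: str
    risk: bool                      # True => risk_ops, False => safe_ops
    impact: Optional[str] = None    # expected impact, if documented or derived
    rollback: Optional[str] = None  # rollback step, if documented


def render_commands(commands: list[Command]) -> str:
    """Render safe and risk operations in separate sections."""
    lines = ["safe_ops:"]
    lines += [f"  {c.text}" for c in commands if not c.risk]
    lines.append("risk_ops:")
    for c in commands:
        if not c.risk:
            continue
        lines.append(f"  ⚠ {c.text}")
        # Fall back to the explicit markers when nothing is documented.
        lines.append(f"    impact: {c.impact or 'UNSPECIFIED'}")
        lines.append(f"    rollback: {c.rollback or 'VERIFY ROLLBACK MANUALLY'}")
    return "\n".join(lines)
```

A review gate could then check that every line starting with ⚠ is followed by both an `impact:` and a `rollback:` line.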
### 4. Offline-First & Graceful Degradation
All core tools (search, read, checklist, commands, severity, handoff, postmortem) MUST function with zero network access
or missing API keys. Feature design MUST define exact degraded behaviors (e.g., omit summarization, still return raw
citations). No feature may block on LLM availability; latency targets are measured in offline mode.
Rationale: Incident response often occurs under constrained connectivity or revoked keys.
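One way to picture the degraded behavior is an answer builder that always returns citations and raw excerpts, and only adds a summary when an LLM is usable. Everything named here is hypothetical: `build_answer`, `llm_enabled`, the `(path, chunk_id, text)` tuple layout, and the `LLM_API_KEY` variable are illustrative assumptions. `NO_LLM=1` is the switch used by the offline simulation gate.

```python
from typing import Callable, Optional


def llm_enabled(env: dict) -> bool:
    """LLM use is optional: disabled by NO_LLM=1 or by a missing API key."""
    if env.get("NO_LLM") == "1":
        return False
    return bool(env.get("LLM_API_KEY"))  # hypothetical variable name


def build_answer(chunks: list[tuple[str, str, str]], env: dict,
                 summarize: Optional[Callable[[list[str]], str]] = None) -> dict:
    """chunks are (path, chunk_id, text) tuples; the env mapping is injected."""
    answer = {
        "citations": [(path, chunk_id) for path, chunk_id, _ in chunks],
        "excerpts": [text for _, _, text in chunks],
        "summary": None,
        "mode": "offline",
    }
    if summarize is not None and llm_enabled(env):
        try:
            answer["summary"] = summarize(answer["excerpts"])
            answer["mode"] = "llm"
        except Exception:
            # Never block on LLM availability: keep the extraction-based answer.
            pass
    return answer
```

In production the caller might pass `dict(os.environ)` as `env`; passing a plain dictionary keeps the function pure and easy to test in both modes.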
### 5. Modular Testable Core (Pure Logic Before Integration)
Parsing, chunking, scoring, rule aggregation, risk classification, freshness evaluation, and severity heuristics MUST be
implemented as pure functions with deterministic outputs (given static clock injection for freshness). Unit tests MUST
cover: positive path, no-match path, malformed runbook path. Integration tests MUST validate end‑to‑end `rb.answer`
flow with and without LLM. Performance budgets (<500ms for a local corpus of 100 runbooks) MUST be benchmarked.
Rationale: Pure, testable units accelerate safe iteration and enable regression detection.
## Operational & Technical Constraints
1. Frontmatter REQUIRED fields: `title, service, component, severity_default, last_verified_at, owner_slack, owner_team`.
Missing field ⇒ file excluded + surfaced in validation report.
2. Optional arrays: `risk_ops[]`, `safe_ops[]` may be augmented by inline code fence extraction (deduplicated case-insensitively).
3. Path Security: Reject path traversal attempts (`..` segments, absolute paths, drive changes) before filesystem access.
4. Freshness Threshold: Default 90 days; configurable per deployment; MUST be logged and enforced uniformly.
5. Performance: P95 local query latency (corpus of <100 runbooks, 2KB avg chunk) <500ms in offline mode; record a baseline before optimizing.
6. Ranking: Initial set-overlap baseline; advanced modes (TF-IDF / embeddings) MUST preserve deterministic fallback order.
7. Conflict Resolution: If multiple runbooks match, select by: exact component match > service match > most recent
`last_verified_at`; ties resolved by lexicographical path.
8. Logging & Observability: All tool invocations MUST emit structured logs (timestamp, tool, latency, corpus size, result
classification: ANSWERED | ESCALATE | STALE | ERROR). No PII in logs.
9. Degradation Policy: Without API key ⇒ skip generative synthesis, still apply rule aggregation & safety annotations.
10. Versioning of Runbooks: Any breaking operational change to runbook content MUST update `last_verified_at` and add an
entry to a `## Change Log` section with date + summary.
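The conflict-resolution order in constraint 7 maps naturally onto a sort key. The sketch below assumes a hypothetical `Runbook` record and `select_runbook` function, and reads the rule as a strict priority order: component match first, then service match, then newest `last_verified_at`, then path.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Runbook:
    path: str
    service: str
    component: str
    last_verified_at: date


def select_runbook(candidates: list[Runbook], service: str, component: str) -> Runbook:
    """Pick one runbook deterministically; `candidates` must be non-empty."""
    def priority(rb: Runbook) -> tuple:
        return (
            rb.component != component,          # exact component match sorts first
            rb.service != service,              # then service match
            -rb.last_verified_at.toordinal(),   # then most recently verified
            rb.path,                            # ties: lexicographical path
        )
    return min(candidates, key=priority)
```

Because every element of the key is derived from the runbook itself, the same candidates always yield the same winner regardless of input order.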
## Development Workflow & Quality Gates
Phases mirror task templates. Every feature plan MUST include a "Constitution Check" section explicitly confirming:
- P1: Source-of-Truth compliance (frontmatter schema impact? new required fields?)
- P2: Deterministic retrieval unaffected or changes to scoring documented with test cases
- P3: Risk annotation impacts (new command patterns? classification updates?)
- P4: Offline behavior defined (degraded output differences enumerated)
- P5: Pure function coverage list + target unit test files
- Performance: Estimated latency impact, with a mitigation plan if the impact exceeds 20% of the current baseline
- Freshness handling: Any alteration to threshold logic or override rationale
Pull Requests MUST include:
- Updated/added tests for each modified pure function
- An explicit statement: "NO HALLUCINATION RISK INTRODUCED" or rationale
- Benchmark delta (if touching retrieval logic)
Blocking Gates (must pass before merge):
1. Schema Validation: All example or new runbooks parse valid frontmatter.
2. Risk Guard: Any new command extractor pattern has classification tests.
3. Offline Simulation: Test run with `NO_LLM=1` passes.
4. Freshness Unit Tests: At least one stale and one fresh scenario.
5. Coverage: Core logic files ≥ baseline coverage (baseline to be defined when CI is introduced).
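The Schema Validation gate could be backed by a check like the one below. It is a sketch only: it assumes frontmatter has already been parsed into a dictionary, the names `missing_fields` and `validation_report` are hypothetical, and treating an empty string as missing is an assumption. The required field list is the one given under Operational & Technical Constraints.

```python
REQUIRED_FIELDS = (
    "title", "service", "component", "severity_default",
    "last_verified_at", "owner_slack", "owner_team",
)


def missing_fields(frontmatter: dict) -> list[str]:
    """Return required fields that are absent or empty (assumption: '' counts as missing)."""
    return [f for f in REQUIRED_FIELDS if frontmatter.get(f) in (None, "")]


def validation_report(corpus: dict[str, dict]) -> dict:
    """Map each runbook path to included/excluded, with reasons for exclusion."""
    included: list[str] = []
    excluded: dict[str, list[str]] = {}
    for path in sorted(corpus):  # sorted for deterministic output
        missing = missing_fields(corpus[path])
        if missing:
            excluded[path] = missing
        else:
            included.append(path)
    return {"included": included, "excluded": excluded}
```

The gate would pass only when `excluded` is empty for all example and newly added runbooks.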
## Governance
Scope & Supremacy: This constitution overrides conflicting ad-hoc practices. Principles 2 & 3 are NON‑NEGOTIABLE and cannot be
waived in individual PRs.
Versioning Policy (Semantic Governance Versioning):
- MAJOR: Removal or redefinition of a core principle; alteration that invalidates existing feature compliance.
- MINOR: Addition of a new principle, a new mandatory gate, or expanded constraint section.
- PATCH: Clarifications, wording precision, examples, or non-normative guidance.
Amendment Procedure:
1. Open proposal (PR) referencing current version and desired bump classification with rationale.
2. Provide diff summary (principle changes, added/removed constraints, migration steps if any).
3. Obtain approval from designated maintainers (minimum 2) or project owner if sole maintainer.
4. Update this file with Sync Impact Report HTML comment at top and adjust `Last Amended` date.
5. Cascade review: confirm templates still align; update gating language if needed.
Compliance Review:
- Automated lints (future CI) parse runbooks & ensure required fields.
- Nightly (or scheduled) freshness audit outputs a list of stale runbooks; any runbook left stale for >30 days is escalated to its `owner_team`.
- Quarterly governance review: reassess latency target, freshness threshold, retrieval quality metrics.
Incident Learnings Integration:
- Postmortem templates MUST capture latency issues, misclassified risk_ops, hallucination incidents (if any) and feed back
into principle amendments (possible MINOR or PATCH bump).
Deviation Handling:
- Temporary deviations require an explicit EXPIRATION DATE and remediation task ID; tracked until resolved.
**Version**: 1.0.0 | **Ratified**: 2025-10-13 | **Last Amended**: 2025-10-13