Operations Runbook
Every failure must be visible, explainable, and recoverable.
Failure flow
flowchart TD
A[Job triggered] --> B{Gate pass?}
B -- Yes --> C[Publish pack + manifest]
B -- No --> D[Record incident]
D --> E[Notify channels]
E --> F[Open recovery item]
F --> G[Retry with idempotency key]
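
The final step of the flow, retrying with an idempotency key, can be sketched as follows. This is a minimal illustration, not the pipeline's actual implementation: the names `idempotency_key` and `RetryLedger`, the choice of key inputs (job id, scope, pack version), and the in-memory store are all assumptions made for the example.

```python
import hashlib


def idempotency_key(job_id: str, scope: str, pack_version: str) -> str:
    """Derive a stable key from the inputs that define one logical publish attempt.

    The three inputs are hypothetical; use whatever uniquely identifies a publish.
    """
    # A separator that cannot appear in normal identifiers avoids ambiguous joins.
    raw = "\x1f".join((job_id, scope, pack_version))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class RetryLedger:
    """In-memory record of completed keys (a real system would persist this)."""

    def __init__(self):
        self._done = {}

    def run_once(self, key, action):
        # If this key already completed, return the stored result
        # instead of running the action a second time.
        if key in self._done:
            return self._done[key]
        result = action()
        self._done[key] = result
        return result
```

With this shape, a retry that reuses the same key after a successful publish returns the earlier result and does not publish twice, while a retry after a failed attempt (where the action raised before being recorded) runs the action again.
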
Incident sequence
- Identify failing gate and impacted locality scope.
- Pause publish for affected scope if gate confidence drops.
- Capture audit evidence and open tracked incident.
- Apply fix with explicit rollback step.
- Re-run targeted checks and close only on proof.
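
The "close only on proof" rule in the sequence above could be enforced in code roughly like this. The `Incident` record and its field names are hypothetical, chosen only to mirror the steps in the list; the runbook does not define a schema.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical incident record mirroring the sequence steps."""

    gate: str                                      # failing gate
    scope: str                                     # impacted locality scope
    evidence: list = field(default_factory=list)   # captured audit evidence
    rollback_step: str = ""                        # explicit rollback step for the fix
    checks_passed: bool = False                    # result of the targeted re-run

    def can_close(self) -> bool:
        # Close only on proof: evidence captured, a rollback step stated,
        # and the targeted checks re-run successfully.
        return bool(self.evidence) and bool(self.rollback_step) and self.checks_passed
```

Keeping the closing condition in one method means an incident cannot be marked closed while any of the three proofs is missing.
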
Rollback checklist
| Step | Action | Evidence |
|---|---|---|
| 1 | Rebind manifest to last stable pack | publisher event log |
| 2 | Confirm integrity hash alignment | client sync output |
| 3 | Trigger canary run | workflow trace |
| 4 | Attach postmortem note | incident archive |
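
Step 2 of the checklist, confirming integrity hash alignment, might look like the sketch below. It assumes the manifest stores a SHA-256 hex digest of the pack bytes; the runbook does not name the hash algorithm, and `pack_digest` and `hashes_aligned` are illustrative helper names.

```python
import hashlib


def pack_digest(pack_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of the pack contents (algorithm is an assumption)."""
    return hashlib.sha256(pack_bytes).hexdigest()


def hashes_aligned(pack_bytes: bytes, manifest_hash: str) -> bool:
    """Compare the recomputed pack digest with the hash recorded in the manifest."""
    # Normalize whitespace and case so formatting differences do not cause false alarms.
    return pack_digest(pack_bytes) == manifest_hash.strip().lower()
```

A mismatch here means the manifest is not bound to the pack that clients actually received, so the rollback should not proceed to the canary run in step 3.
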
Whisper-audit wording
[Intention: Amanah] Publish remains paused for this scope until gate confidence recovers and rollback evidence is complete.