T4 epic · SRE / on-call: three evidence sources gathered in parallel, a typed timeline, then the honest part: nika:wait a settle window, re-poll the status API, and nika:assert refuses to draft a « resolved » postmortem while prod still burns. on_finally: files the journal event and the ping NO MATTER WHAT.
The job
The hour after an incident is logs, Slack archaeology and a postmortem nobody wants to start. This pipeline assembles the evidence, rebuilds the timeline as typed events, proves recovery, and leaves a draft (summary, impact, hypotheses, actions) in the incidents folder before the retro is scheduled.
The shape
The file
t4-incident-war-room.nika.yaml
How it works
Evidence gathers in one wave
logs (structured capture), status_history (with retry, since status
pages flap during incidents) and runbook share no deps: one
parallel wave, three sources.
Settle, recheck, PROVE
nika:wait duration: 60s gives the system a settle window, the
re-poll reads .current.state, and the assert fails the run,
loudly, if it isn’t operational. No optimistic postmortems.
Constructs you just used
| Construct | Where | Reference |
|---|---|---|
| capture: structured | logs | The 4 verbs |
| nika:wait | settle | Builtins |
| nika:assert recovery gate | confirmed | Builtins |
| on_finally: | save | Workflows |
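
To make the control flow behind these constructs concrete, here is a minimal Python analogy of the same shape: one parallel evidence wave, a settle window, a re-poll, a recovery gate, and a hook that fires whatever happens. This is not Nika syntax. The function and parameter names (run_war_room, poll_status, file_event and the rest) are hypothetical, the status payload shape is assumed from the .current.state path mentioned above, and retry on the status history is left out for brevity.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_war_room(fetch_logs, fetch_status_history, fetch_runbook,
                 poll_status, file_event, settle_seconds=60, sleep=time.sleep):
    """Gather evidence in parallel, settle, re-poll, then gate on recovery.

    All callables are hypothetical stand-ins for the pipeline's tasks.
    """
    outcome = "failed"
    try:
        # One parallel wave: the three sources share no dependencies.
        with ThreadPoolExecutor(max_workers=3) as pool:
            futures = {
                "logs": pool.submit(fetch_logs),
                "status_history": pool.submit(fetch_status_history),
                "runbook": pool.submit(fetch_runbook),
            }
            evidence = {name: f.result() for name, f in futures.items()}

        # Settle window, then re-poll the status API.
        sleep(settle_seconds)
        state = poll_status()["current"]["state"]  # assumed payload shape

        # Recovery gate: fail the run loudly unless prod is operational.
        if state != "operational":
            raise AssertionError(f"not recovered: state={state!r}")

        outcome = "confirmed"
        return evidence
    finally:
        # Analogue of on_finally: the event is filed on success and on failure.
        file_event({"outcome": outcome})
```

Passing the clock in as `sleep` keeps the sketch testable without a real 60-second pause. The `finally` block is what carries the "no matter what" guarantee: the event is filed even when the gate raises.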
Make it yours
- Pull the incident channel export as a fourth evidence source and let the timeline cite humans, not just machines.
- Auto-file the retro: nika:fetch method: POST to your calendar/issue API with ${{ tasks.timeline.output.events[0].at }} as the anchor.
- Track MTTR over time: the on_finally event stream is already your dataset.
Next · CEO Monday brief
The closer: a three-branch gather, jq arithmetic, a thinking
synthesis — and a run that reports its own bill.