# Runbooks

Operational runbooks for common incidents. **Currently placeholders**: they will be written as the platform reaches production.

Planned runbooks:

- `bridge-down.md`: when the MQTT bridge to the hub fails to connect.
- `gitops-conflict.md`: when reconcile fails due to a merge conflict.
- `cell-disk-critical.md`: when Cell disk usage exceeds 95%.
- `camera-mass-offline.md`: when multiple cameras go offline simultaneously.
- `wan-failover.md`: when the primary WAN goes down and failover engages.
- `cert-renewal-needed.md`: when an mTLS certificate is nearing expiration.
- `frigate-crash-loop.md`: when Frigate repeatedly fails to start.
- `evidence-export.md`: step-by-step instructions for an operator to export an evidence pack for legal counsel.
- `site-recovery.md`: full recovery from a Cell hardware failure.
- `password-recovery.md`: recovery procedures for an operator who has forgotten their password.

Each runbook follows a standard structure:

1. **Symptoms**: what you see that triggers this runbook.
2. **Severity**: how urgent the incident is.
3. **Triage**: the first 60 seconds, what to check before acting.
4. **Resolution**: step-by-step fix.
5. **Verification**: how to confirm the fix.
6. **Postmortem checklist**: what to document if the issue recurs.
7. **See also**: related runbooks, ADRs, and docs.

Until these are written, on-call engineers will work from the architecture docs and their own judgment. The first runbook to be written should be `bridge-down.md`, since it is the most common foreseeable issue.
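
The standard structure above could be stamped out as empty skeletons so that each planned runbook starts from the same seven sections. The following is only a minimal sketch, not existing tooling: the `render_skeleton` and `scaffold` helpers, the `TODO` placeholder text, and the `##` heading level are all assumptions made for illustration.

```python
from pathlib import Path

# Section headings copied from the standard structure in this README.
SECTIONS = [
    "Symptoms",
    "Severity",
    "Triage",
    "Resolution",
    "Verification",
    "Postmortem checklist",
    "See also",
]


def render_skeleton(filename: str) -> str:
    """Return placeholder markdown for one runbook, e.g. 'bridge-down.md'."""
    # Hypothetical title convention: 'bridge-down.md' becomes 'Bridge down'.
    title = Path(filename).stem.replace("-", " ").capitalize()
    lines = [f"# {title}", ""]
    for number, section in enumerate(SECTIONS, start=1):
        lines += [f"## {number}. {section}", "", "TODO", ""]
    return "\n".join(lines)


def scaffold(directory: Path, filenames: list[str]) -> list[Path]:
    """Create any missing runbook files; never overwrite an existing one."""
    directory.mkdir(parents=True, exist_ok=True)
    created = []
    for name in filenames:
        path = directory / name
        if not path.exists():
            path.write_text(render_skeleton(name), encoding="utf-8")
            created.append(path)
    return created
```

For example, `scaffold(Path("runbooks"), ["bridge-down.md"])` would create a single skeleton file in a (hypothetical) `runbooks` directory. Running it again leaves the file untouched, so a runbook that has already been filled in is not clobbered.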