diff --git a/decisions/0004-gitops-como-source-of-truth.md b/decisions/0004-gitops-como-source-of-truth.md
new file mode 100644
index 0000000..78ce3e8
--- /dev/null
+++ b/decisions/0004-gitops-como-source-of-truth.md
@@ -0,0 +1,49 @@
+# ADR-0004 · GitOps as source of truth
+
+**Status**: accepted
+**Date**: 2026-05
+
+## Context
+
+Configuration of a Blocao site involves OpenWrt UCI files, Frigate config, MQTT bridge policy, firewall rules, container compose files, retention policy and custom DNS lists. All of it needs to be:
+
+- Auditable (who changed what, when, and why).
+- Reproducible (a new router should reach the same state as an existing one).
+- Transactional (changes apply together or not at all).
+- Reversible (roll back to the last known good state).
+
+Config edits through the LuCI web interface and ad-hoc SSH changes satisfy none of these. Configuration management tools (Ansible, Puppet) work, but they introduce a layer of indirection between intent and state.
+
+## Decision
+
+**Two repositories per site**: `site-config` (per-site overrides) and `fleet-config` (organization-wide common config). Configuration changes happen exclusively as commits to these repos.
+
+The router clones both repos and reconciles every 5 minutes:
+
+1. `git fetch` both repos.
+2. Detect drift (compare the SHA-256 of each live file against the last applied config).
+3. Apply in layers: `fleet-config` first, then `site-config` overrides on top.
+4. If a change fails to apply, roll back to the last known good state and raise an alert.
+
+**Drift detection**: any file edited live (e.g. `vi /etc/config/firewall` outside the repo) is flagged in the UI as `DRIFTED` until the change is either committed back to the repo or reverted.
+
+The **GitOps panel** in the console shows the Applied, HEAD and Remote SHAs and lets the operator trigger a fetch, reconcile or rollback. For deeper changes, the operator pushes commits through the Gitea/GitHub UI or the git CLI.
+
+## Consequences
+
+**Good**:
+- Anyone with read access to the repo can audit what is running on a site.
+- Rolling back a regression is a `git revert` followed by a reconcile.
+- The same operator experience scales from 1 site to 1000 sites.
+- The `fleet-config` repo enables "edit once, apply to all" for organization-wide policy changes.
+
+**Bad / trade-offs**:
+- Steeper learning curve for operators who don't know git. Mitigated by the GitOps panel, which handles common operations.
+- The 5-minute reconcile lag makes urgent changes feel slow. Mitigated by the **manual reconcile button**.
+- Secrets cannot be stored in the repos in plain text. They are encrypted with `sops` and decrypted at apply time.
+
+## Alternatives considered
+
+- **Ansible push**: requires a central control node and separate secrets management, and doesn't keep the audit trail in the same place as the code.
+- **Salt or Puppet**: heavier than needed at our scale.
+- **Direct UCI edits via API**: works for one-off changes but produces no history and no rollback.
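
The layered apply and SHA-256 drift detection described in the ADR's Decision section can be sketched roughly as below. This is a minimal, hypothetical Python illustration, not Blocao's implementation. It assumes each repo mirrors the router's filesystem layout (for example `etc/config/firewall`), that `site-config` overrides `fleet-config` at whole-file granularity, and that the applied state is kept as a `{path: sha256}` manifest. The helper names `layered_files`, `apply_files` and `detect_drift` are illustrative only, and `git fetch`, rollback and alerting are left out.

```python
# Hypothetical sketch of ADR-0004's reconcile core: layered apply + drift detection.
# Assumptions (not from the ADR): repos mirror the router filesystem layout,
# site-config overrides fleet-config per whole file, and the applied state is a
# {relative_path: sha256_hex} manifest. git fetch, rollback and alerts are omitted.
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def layered_files(fleet_dir: Path, site_dir: Path) -> dict[str, Path]:
    """Map each relative path to its source file; site-config wins on conflicts."""
    merged: dict[str, Path] = {}
    for layer in (fleet_dir, site_dir):  # later layers override earlier ones
        for src in sorted(layer.rglob("*")):
            rel = src.relative_to(layer)
            # Skip directories and the repo's own .git metadata.
            if src.is_file() and ".git" not in rel.parts:
                merged[rel.as_posix()] = src
    return merged


def apply_files(files: dict[str, Path], live_root: Path) -> dict[str, str]:
    """Copy each layered file into place and return the applied manifest.

    Note: this simple copy is not transactional; a real reconcile would stage
    the files and keep the previous manifest for rollback.
    """
    manifest: dict[str, str] = {}
    for rel, src in sorted(files.items()):
        dest = live_root / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
        manifest[rel] = sha256_of(src)
    return manifest


def detect_drift(live_root: Path, manifest: dict[str, str]) -> list[str]:
    """Return paths whose live copy is missing or no longer matches the manifest."""
    return [
        rel
        for rel, expected in sorted(manifest.items())
        if not (live_root / rel).is_file() or sha256_of(live_root / rel) != expected
    ]
```

For example, with `live_root=Path("/")`, an out-of-band `vi /etc/config/firewall` would make `detect_drift` return `["etc/config/firewall"]`, which is the case the ADR flags as `DRIFTED`. Whole-file override is the simplest layering rule to reason about; merging individual UCI options across layers would need a UCI-aware parser instead.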