ADR-0004 · GitOps as source of truth
Status: accepted Date: 2026-05
Context
Configuring a Blocao site involves OpenWrt UCI files, Frigate config, MQTT bridge policy, firewall rules, container compose files, retention policy, and custom DNS lists. That configuration needs to be:
- Auditable (who changed what, when, why).
- Reproducible (a new router should reach the same state as an existing one).
- Transactional (changes apply together or not at all).
- Reversible (rollback to last known good).
LuCI's web-based config edits and ad-hoc SSH changes don't satisfy any of those. Configuration management tools (Ansible, Puppet) work but introduce a layer of indirection between intent and state.
Decision
Two repositories per site: site-config (per-site overrides) and fleet-config (organization-wide common config). Configuration changes happen exclusively as commits to these repos.
The router clones both repos and reconciles every 5 minutes:
- git fetch both repos.
- Detect drift (SHA256 of live files vs applied config).
- Apply layered: fleet-config first, site-config overrides.
- If a change fails to apply, roll back to last known good and alert.
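The layered apply and rollback steps above can be sketched as follows. This is a minimal illustration, not the router's actual reconciler: the function names, the `.last-good` snapshot directory, and the `health_check` callback are all assumptions made for the example, and it treats the live config as a plain directory of files.

```python
import shutil
from pathlib import Path


def layered_files(fleet_dir: Path, site_dir: Path) -> dict[str, Path]:
    """Map relative path -> source file; site-config entries override fleet-config."""
    merged: dict[str, Path] = {}
    for layer in (fleet_dir, site_dir):  # the later layer wins on conflicts
        for src in sorted(layer.rglob("*")):
            rel = src.relative_to(layer)
            if src.is_file() and ".git" not in rel.parts:
                merged[rel.as_posix()] = src
    return merged


def apply_layers(fleet_dir: Path, site_dir: Path, live_dir: Path, health_check) -> bool:
    """Apply both layers to live_dir; restore the snapshot if anything fails.

    Returns True on success, False after a rollback (the caller would alert).
    """
    # Hypothetical snapshot location for "last known good".
    backup = live_dir.with_name(live_dir.name + ".last-good")
    if backup.exists():
        shutil.rmtree(backup)
    shutil.copytree(live_dir, backup)
    try:
        for rel, src in layered_files(fleet_dir, site_dir).items():
            dest = live_dir / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
        if not health_check(live_dir):
            raise RuntimeError("health check failed after apply")
        return True
    except (OSError, RuntimeError):
        # Roll back: discard the partial apply and restore the snapshot.
        shutil.rmtree(live_dir)
        shutil.copytree(backup, live_dir)
        return False
```

Snapshotting before the apply is what makes the change transactional in this sketch: either every layered file lands and the check passes, or the live directory returns to its previous contents.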
Drift detection: anything edited live (e.g. vi /etc/config/firewall outside the repo) is flagged in the UI as DRIFTED until either committed back to the repo or reverted.
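As a rough illustration of the SHA256 comparison behind the DRIFTED flag, the sketch below checks live files against hashes recorded at the last apply. The manifest format (a dict of relative path to hex digest) and the function names are assumptions for the example; a missing file is treated as drifted here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record_applied(live_dir: Path, rel_paths: list[str]) -> dict[str, str]:
    """Build the manifest of applied hashes right after a successful apply."""
    return {rel: sha256_of(live_dir / rel) for rel in rel_paths}


def drifted_files(live_dir: Path, applied: dict[str, str]) -> list[str]:
    """Return relative paths whose live content no longer matches the manifest."""
    drifted = []
    for rel, expected in sorted(applied.items()):
        live = live_dir / rel
        if not live.is_file() or sha256_of(live) != expected:
            drifted.append(rel)
    return drifted
```

A file edited live, such as /etc/config/firewall, would appear in the returned list until the manifest is rebuilt from a new apply or the file is reverted.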
The GitOps panel in the console exposes Applied/HEAD/Remote SHAs and lets the operator trigger fetch/reconcile/rollback. For deeper changes, the operator pushes commits via Gitea/GitHub UI or git CLI.
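One way to read the three SHAs together is sketched below. The interpretation is an assumption, not taken from the console's code: Applied is the commit last applied to the router, HEAD is the tip of the router's local clone, and Remote is the tip of the tracked branch on the server. The status strings and the `RepoState` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoState:
    applied: str  # commit last applied to the router (assumed meaning)
    head: str     # HEAD of the router's local clone (assumed meaning)
    remote: str   # tip of the tracked remote branch (assumed meaning)


def panel_status(state: RepoState) -> str:
    """Suggest the next operator action from the three SHAs."""
    if state.head != state.remote:
        return "fetch needed"       # local clone is not at the remote tip
    if state.applied != state.head:
        return "reconcile pending"  # fetched commits have not been applied
    return "in sync"
```

For example, `panel_status(RepoState("a1", "b2", "b2"))` returns `"reconcile pending"`, which corresponds to the case the manual reconcile button is meant for.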
Consequences
Good:
- Anyone with read access to the repo can audit "what's running here".
- Rolling back a regression is a git revert + reconcile.
- Same operator experience scales from 1 site to 1000 sites.
- The fleet-config repo enables "edit once, apply to all" for org-wide policy changes.
Bad / trade-offs:
- Steeper learning curve for ops who don't know git. Mitigated by the GitOps panel handling common operations.
- 5-minute reconcile lag means urgent changes feel slow. Mitigated by the manual reconcile button.
- Secrets in repos are a problem, addressed by encrypting them with sops and decrypting at apply time.
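Decrypting at apply time could look roughly like the sketch below, which shells out to the sops CLI and keeps the plaintext in memory. It assumes sops is installed on the router and already has access to the decryption key; the file path in the usage note and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def sops_decrypt_command(encrypted: Path) -> list[str]:
    """Build the sops invocation; --decrypt prints the plaintext to stdout."""
    return ["sops", "--decrypt", str(encrypted)]


def decrypt_secret(encrypted: Path) -> bytes:
    """Return the decrypted content without writing it back into the repo checkout."""
    result = subprocess.run(
        sops_decrypt_command(encrypted),
        check=True,           # raise CalledProcessError if sops fails
        capture_output=True,  # keep plaintext out of the process's stdout
    )
    return result.stdout
```

A reconciler might call `decrypt_secret(Path("secrets/mqtt.enc.yaml"))` and write the result straight to its destination with restrictive permissions, so the encrypted file stays the only copy under version control.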
Alternatives considered
- Ansible push: requires a central control node and separate secrets management, and doesn't keep the audit trail in the same place as the code.
- Salt or Puppet: heavier than needed for our scale.
- Direct UCI edits via API: works for one-off changes but produces no history and no rollback.