ADR-0004 · GitOps as source of truth

Status: accepted · Date: 2026-05

Context

Configuration of a Blocao site involves: OpenWrt UCI files, Frigate config, MQTT bridge policy, firewall rules, container compose files, retention policy, custom DNS lists. This configuration needs to be:

  • Auditable (who changed what, when, why).
  • Reproducible (a new router should reach the same state as an existing one).
  • Transactional (changes apply together or not at all).
  • Reversible (rollback to last known good).

LuCI's web-based config edits and ad-hoc SSH changes satisfy none of these requirements. Configuration management tools (Ansible, Puppet) work but introduce a layer of indirection between intent and state.

Decision

Each site draws its configuration from two repositories: fleet-config (organization-wide common config, shared by all sites) and site-config (per-site overrides). Configuration changes happen exclusively as commits to these repos.

The router clones both and reconciles every 5 minutes:

  1. Run git fetch on both repos.
  2. Detect drift (compare SHA-256 hashes of live files against the applied config).
  3. Apply in layers: fleet-config first, then site-config overrides.
  4. If a change fails to apply, roll back to last known good and alert.
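
The layering and rollback steps above can be sketched as follows. This is a minimal illustration, not the router's actual reconciler: the `Config` mapping (file path to rendered content), `layer`, `reconcile`, `apply_fn` and `alert_fn` are all hypothetical names, and the fetch and drift-detection steps are left out.

```python
from typing import Callable, Dict

# Assumption: a config is modelled as "file path -> rendered file content".
Config = Dict[str, str]


def layer(fleet: Config, site: Config) -> Config:
    """Fleet-wide config first; per-site overrides win on conflict."""
    merged = dict(fleet)
    merged.update(site)
    return merged


def reconcile(
    fleet: Config,
    site: Config,
    last_good: Config,
    apply_fn: Callable[[Config], None],
    alert_fn: Callable[[str], None],
) -> Config:
    """Apply the layered config as one unit.

    Returns the config that is in effect afterwards: the new one on
    success, or last_good if the apply failed and was rolled back.
    """
    desired = layer(fleet, site)
    if desired == last_good:
        return last_good  # nothing changed, nothing to apply
    try:
        apply_fn(desired)
    except Exception as exc:
        apply_fn(last_good)  # roll back to last known good
        alert_fn(f"apply failed, rolled back: {exc}")
        return last_good
    return desired
```

Passing `apply_fn` and `alert_fn` in as callables keeps the sketch independent of how files are actually written and how alerts are delivered.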

Drift detection: anything edited live (e.g. vi /etc/config/firewall outside the repo) is flagged in the UI as DRIFTED until it is either committed back to the repo or reverted.
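
A minimal sketch of such a check, assuming the reconciler records a manifest of SHA-256 hashes (relative path to hex digest) each time it applies config. The manifest format and the names `sha256_of` and `find_drift` are illustrative, not taken from the implementation.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's current content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_drift(applied: Dict[str, str], root: Path) -> List[str]:
    """Return the managed paths whose live content differs from what was applied.

    `applied` maps a path relative to `root` to the digest recorded at
    apply time. A managed file that has gone missing counts as drifted.
    """
    drifted = []
    for rel, expected in sorted(applied.items()):
        live = root / rel
        if not live.is_file() or sha256_of(live) != expected:
            drifted.append(rel)
    return drifted
```

Any path returned by `find_drift` would be the one shown as DRIFTED; an empty list means the live files match the applied config.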

The GitOps panel in the console exposes Applied/HEAD/Remote SHAs and lets the operator trigger fetch/reconcile/rollback. For deeper changes, the operator pushes commits via Gitea/GitHub UI or git CLI.
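
One way the three SHAs could be turned into a status for the panel is sketched below. The interpretation is an assumption: Applied is taken to be the commit last reconciled onto the router, HEAD the tip of the local clone, and Remote the upstream tip seen at the last fetch. `RepoShas`, `sync_state` and the state names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoShas:
    applied: str  # assumption: commit last reconciled onto the router
    head: str     # assumption: tip of the router's local clone
    remote: str   # assumption: upstream tip seen at the last fetch


def sync_state(shas: RepoShas) -> str:
    """Classify one repo's state from its three commit SHAs."""
    if shas.remote != shas.head:
        return "behind-remote"      # upstream has commits not in the local clone
    if shas.head != shas.applied:
        return "reconcile-pending"  # fetched but not yet applied
    return "in-sync"
```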

Consequences

Good:

  • Anyone with read access to the repo can audit "what's running here".
  • Rolling back a regression is a git revert followed by a reconcile.
  • The same operator experience scales from 1 site to 1000 sites.
  • The fleet-config repo enables "edit once, apply to all" for org-wide policy changes.

Bad / trade-offs:

  • Steeper learning curve for operators who don't know git. Mitigated by the GitOps panel handling common operations.
  • 5-minute reconcile lag means urgent changes feel slow. Mitigated by the manual reconcile button.
  • Secrets in repos are a problem. Addressed by encrypting them with sops and decrypting at apply time.
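
Decrypting at apply time could look roughly like the sketch below, which shells out to the sops CLI with its `--decrypt` flag. The function names, the injectable `run` parameter and the example file path are assumptions for illustration; key provisioning on the router is out of scope here.

```python
import subprocess
from typing import Callable, List


def sops_decrypt_cmd(path: str) -> List[str]:
    """Command line that prints the decrypted file to stdout."""
    return ["sops", "--decrypt", path]


def decrypt_secret(
    path: str,
    run: Callable[[List[str]], bytes] = subprocess.check_output,
) -> bytes:
    """Return the plaintext of an encrypted file from the repo.

    The plaintext is only returned to the caller; this function never
    writes it back into the working tree. `run` is injectable so the
    function can be exercised without sops installed.
    """
    return run(sops_decrypt_cmd(path))
```

A caller would invoke, for example, `decrypt_secret("secrets/mqtt.enc.yaml")` (a hypothetical path) while rendering the final config, so only ciphertext is ever committed.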

Alternatives considered

  • Ansible push: requires a central control node and separate secrets management, and doesn't keep the audit trail in the same place as the code.
  • Salt or Puppet: heavier than needed for our scale.
  • Direct UCI edits via API: works for one-off changes but produces no history and no rollback.