# ADR-0004 · GitOps as source of truth
**Status**: accepted

**Date**: 2026-05
## Context

Configuring a Blocao site involves OpenWrt UCI files, Frigate config, MQTT bridge policy, firewall rules, container compose files, retention policy, and custom DNS lists. All of this configuration needs to be:
- Auditable (who changed what, when, and why).
- Reproducible (a new router should reach the same state as an existing one).
- Transactional (changes apply together or not at all).
- Reversible (roll back to the last known good state).

LuCI's web-based config edits and ad-hoc SSH changes satisfy none of these. Configuration management tools (Ansible, Puppet) work, but they introduce a layer of indirection between intent and state.
## Decision

**Two repositories per site**: `site-config` (per-site overrides) and `fleet-config` (organization-wide common config). Configuration changes are made exclusively as commits to these repos.

The router clones both repos and reconciles against them every 5 minutes:
1. `git fetch` both repos.
2. Detect drift by comparing the SHA-256 of each live file against the applied config.
3. Apply in layers: `fleet-config` first, then `site-config` overrides on top.
4. If a change fails to apply, roll back to the last known good state and alert.
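Steps 3 and 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the router's actual agent.

- It assumes each repo has already been parsed into a hypothetical `{file: {option: value}}` mapping.
- It leaves out fetching and drift detection.
- It uses a caller-supplied `apply_fn` (hypothetical) to stand in for writing UCI files and reloading services.

Site values override fleet values option by option, and a failed apply re-applies the last configuration that succeeded.

```python
from copy import deepcopy
from typing import Callable, Dict, Optional

# Hypothetical shape: config file name -> {UCI option: value}.
Config = Dict[str, Dict[str, str]]


def layer(fleet: Config, site: Config) -> Config:
    """Start from fleet-config, then let site-config override per option."""
    merged = deepcopy(fleet)  # never mutate the fleet layer itself
    for path, options in site.items():
        merged.setdefault(path, {}).update(options)
    return merged


class Reconciler:
    def __init__(self, apply_fn: Callable[[Config], None],
                 alert_fn: Callable[[str], None]) -> None:
        self.apply_fn = apply_fn  # hypothetical: writes files, reloads services
        self.alert_fn = alert_fn
        self.last_known_good: Optional[Config] = None

    def reconcile(self, fleet: Config, site: Config) -> bool:
        desired = layer(fleet, site)
        try:
            self.apply_fn(desired)
        except Exception as exc:
            self.alert_fn(f"apply failed: {exc}")
            # Roll back; on the very first run there is nothing to roll back to.
            # If re-applying the last known good also fails, the error propagates.
            if self.last_known_good is not None:
                self.apply_fn(self.last_known_good)
            return False
        # Only a config that applied cleanly becomes the rollback target.
        self.last_known_good = desired
        return True
```

Recording last-known-good only after a successful apply means a bad commit never becomes the state that a later rollback returns to.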
**Drift detection**: any file edited live (e.g. `vi /etc/config/firewall` outside the repo) is flagged in the UI as `DRIFTED` until the change is either committed back to the repo or reverted.

The **GitOps panel** in the console shows the Applied, HEAD, and Remote SHAs and lets the operator trigger a fetch, reconcile, or rollback. For deeper changes, the operator pushes commits via the Gitea/GitHub UI or the git CLI.
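The drift check (step 2) and the `DRIFTED` flag could work roughly like the following sketch. It hashes each managed file with SHA-256 right after a successful apply, then later re-hashes the live files and reports mismatches. The manifest format (path to hex digest) and the function names are hypothetical, and where the real agent stores applied hashes is not specified here.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def sha256_file(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large files aren't loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_applied(paths: Iterable[Path]) -> Dict[str, str]:
    """Hypothetical 'applied' manifest, taken right after a successful apply."""
    return {str(p): sha256_file(Path(p)) for p in paths}


def detect_drift(manifest: Dict[str, str]) -> List[str]:
    """Paths whose live content no longer matches what was applied."""
    drifted = []
    for path, applied_hash in sorted(manifest.items()):
        live = Path(path)
        # A deleted managed file also counts as drift.
        if not live.is_file() or sha256_file(live) != applied_hash:
            drifted.append(path)
    return drifted
```

A deleted file is treated as drifted, since removing a managed file is also a live edit outside the repo.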
## Consequences

**Good**:

- Anyone with read access to the repos can audit "what's running here".
- Rolling back a regression is a `git revert` followed by a reconcile.
- The same operator experience scales from 1 site to 1,000 sites.
- The `fleet-config` repo enables "edit once, apply to all" for organization-wide policy changes.

**Bad / trade-offs**:

- Steeper learning curve for operators who don't know git. Mitigated by the GitOps panel, which handles the common operations.
- Because reconciliation runs every 5 minutes, an urgent change can wait up to a full interval before it is applied. Mitigated by the **manual reconcile button**.
- Secrets must not be committed to the repos in plaintext. They are encrypted with `sops` and decrypted at apply time.
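The interplay between the 5-minute interval and the manual reconcile button can be illustrated with a small scheduler. This sketch assumes the reconcile step is a plain callable, and the class and its `trigger()` method are hypothetical names. A background thread runs the callable, then waits up to 300 seconds on a `threading.Event`. Pressing the button corresponds to setting that event, which cuts the wait short.

```python
import threading
from typing import Callable


class ReconcileScheduler:
    """Runs `reconcile` every `interval` seconds, or sooner when triggered."""

    def __init__(self, reconcile: Callable[[], None], interval: float = 300.0) -> None:
        self._reconcile = reconcile
        self._interval = interval          # 300 s matches the 5-minute cadence
        self._wake = threading.Event()     # set by trigger() or stop()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def trigger(self) -> None:
        """What a manual reconcile button might call: end the current wait early."""
        self._wake.set()

    def stop(self) -> None:
        """Ask the loop to exit and wait for it (call only after start())."""
        self._stop.set()
        self._wake.set()
        self._thread.join()

    def _run(self) -> None:
        # Exceptions from reconcile() are not caught in this sketch.
        while not self._stop.is_set():
            self._reconcile()
            # Sleep until the interval elapses or trigger()/stop() sets the event.
            self._wake.wait(timeout=self._interval)
            self._wake.clear()
```

Triggers that arrive while a reconcile is already running are coalesced into a single follow-up run rather than queued.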
## Alternatives considered

- **Ansible push**: requires a central control node and separate secrets management, and doesn't keep the audit trail in the same place as the code.
- **Salt or Puppet**: heavier than needed at our scale.
- **Direct UCI edits via API**: works for one-off changes but produces no history and no rollback.