# Architecture · Overview
This is the high-level technical view. For specific subsystems, see the sibling files in this folder.
## The four tiers
```
┌──────────────────────────────────────┐
│ Hub (EU sovereign bare-metal)        │
│  - multi-site control plane          │
│  - cross-site forensic search        │
│  - operator auth (Keycloak)          │
│  - long-term embeddings index        │
└───────────────────┬──────────────────┘
                    │  MQTT bridge over TLS
                    │  + HTTPS for blob storage
  ╔═════════════════╪═══════════════════╗
  ║                 │                   ║  per-site
  ║       ┌─────────▼─────────┐         ║  boundary
  ║       │ Router (OpenWrt)  │         ║
  ║       │ - mosquitto       │         ║
  ║       │ - tailscale       │         ║
  ║       │ - GitOps recon.   │         ║
  ║       │ - SPA host        │         ║
  ║       │ - reverse proxy   │         ║
  ║       └────┬─────────┬────┘         ║
  ║            │         │              ║
  ║         VLAN-10   VLAN-20           ║
  ║         cameras   compute           ║
  ║       ┌──┐ ┌──┐ ┌──────────┐        ║
  ║       │c1│ │c2│ │ Cell     │        ║
  ║       └──┘ └──┘ │ RK3588   │        ║
  ║          ...    │          │        ║
  ║                 │ frigate  │        ║
  ║                 │ enricher │        ║
  ║                 │ re-id    │        ║
  ║                 │ healthd  │        ║
  ║                 └──────────┘        ║
  ║                      ▲              ║
  ║                      │              ║
  ║                      ▼              ║
  ║               (optional Core)       ║
  ║               ┌─────────────┐       ║
  ║               │ Jetson Orin │       ║
  ║               │ federates   │       ║
  ║               │ N Cells     │       ║
  ║               └─────────────┘       ║
  ╚═════════════════════════════════════╝
```
## Data flow
**Capture**: cameras (RTSP) and microphones stream to the Cell.

**Inference**: Frigate runs object detection on the streams and generates events. The enricher consumes those events and produces embeddings, hashes, and re-ID vectors.

**Bus**: everything flows over MQTT topics on the local broker. The contract is documented in [`mqtt-contract.md`](mqtt-contract.md).
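
The Inference and Bus steps can be sketched as a message handler. This is a minimal illustration, assuming the payload shape Frigate publishes on its `frigate/events` topic (a `type` of `new`, `update` or `end`, plus `before`/`after` snapshots of the tracked object). `EnrichmentJob`, `TASKS_BY_LABEL` and the task names are hypothetical; the real output topics and schemas are defined by the MQTT contract. In a deployment, `handle_frigate_event` would be called from an MQTT client's message callback subscribed to `frigate/events`.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class EnrichmentJob:
    """Hypothetical work item handed to the enricher."""
    event_id: str
    camera: str
    label: str
    tasks: Tuple[str, ...]


# Hypothetical mapping: which enrichment outputs to compute per Frigate label.
TASKS_BY_LABEL = {
    "person": ("embedding", "hash", "reid"),
    "car": ("embedding", "hash", "reid"),
}
DEFAULT_TASKS = ("embedding", "hash")


def handle_frigate_event(payload: bytes) -> Optional[EnrichmentJob]:
    """Turn one frigate/events message into an enrichment job.

    Only 'end' events are enriched, so each tracked object is processed once,
    after Frigate has finished tracking it; 'new' and 'update' are skipped.
    """
    event = json.loads(payload)
    if event.get("type") != "end":
        return None
    after = event["after"]  # final state of the tracked object
    label = after["label"]
    return EnrichmentJob(
        event_id=after["id"],
        camera=after["camera"],
        label=label,
        tasks=TASKS_BY_LABEL.get(label, DEFAULT_TASKS),
    )
```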

**Storage**:

- Raw video clips → Cell's encrypted disk (NVMe hot + HDD cold). Never bridged.
- Embeddings + metadata → Cell's local index, optionally bridged to hub.
- Snapshots → local only.
- Aggregated health/state → bridged to hub.
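
The storage rules above amount to a default-deny egress decision per data class. A minimal sketch follows; `DataClass`, `may_bridge` and the `bridge_embeddings` flag are hypothetical names, and in practice the enforcement point would more likely be the topic patterns in the Mosquitto bridge configuration than application code. This only models the decision itself.

```python
from enum import Enum


class DataClass(Enum):
    RAW_CLIP = "raw_clip"     # raw video clips: never bridged
    EMBEDDING = "embedding"   # embeddings + metadata: optionally bridged
    SNAPSHOT = "snapshot"     # snapshots: local only
    HEALTH = "health"         # aggregated health/state: bridged


NEVER_BRIDGED = frozenset({DataClass.RAW_CLIP, DataClass.SNAPSHOT})
ALWAYS_BRIDGED = frozenset({DataClass.HEALTH})


def may_bridge(data_class: DataClass, bridge_embeddings: bool = False) -> bool:
    """Default-deny check: True only if this data class may leave the site."""
    if data_class in NEVER_BRIDGED:
        return False
    if data_class in ALWAYS_BRIDGED:
        return True
    if data_class is DataClass.EMBEDDING:
        return bridge_embeddings  # per-site opt-in
    return False  # anything not explicitly allowed stays local
```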

**Console**: a single-page app (SPA) hosted by the router. It talks to the router via `/api/router/*` (ubus) and to the Cell via `/api/cell/*` (reverse-proxied by the router).

**Configuration**: GitOps repos are cloned by the router and reconciled every 5 minutes. See [ADR-0004](../../decisions/0004-gitops-como-source-of-truth.md).

**Egress sovereignty**: only what the bridge policy explicitly allows leaves the site. See [`data-sovereignty.md`](data-sovereignty.md).
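
The reconcile step can be sketched as a single pass that either converges or leaves the last-known-good revision in place and reports the failure, matching the GitOps behaviour listed under failure modes. The real implementation is a custom ucode script; this Python version is only illustrative. `Reconciler` and its callbacks are hypothetical, and the sketch assumes `apply` is all-or-nothing (a partial apply would need its own rollback).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class Reconciler:
    fetch: Callable[[], Tuple[str, Dict[str, Any]]]   # -> (revision, desired config)
    validate: Callable[[Dict[str, Any]], None]        # raises on invalid config
    apply: Callable[[Dict[str, Any]], None]           # assumed all-or-nothing
    publish_status: Callable[[Dict[str, Any]], None]  # e.g. to an MQTT status topic
    last_known_good: Optional[str] = None

    def run_once(self) -> bool:
        """One reconcile pass; returns False if the desired state was not applied."""
        try:
            revision, desired = self.fetch()
            if revision == self.last_known_good:
                return True  # already converged, nothing to do
            self.validate(desired)
            self.apply(desired)
        except Exception as exc:  # fetch, validation or apply failed
            # Keep the last-known-good revision and report the failure.
            self.publish_status(
                {"ok": False, "error": str(exc), "applied": self.last_known_good}
            )
            return False
        self.last_known_good = revision
        self.publish_status({"ok": True, "applied": revision})
        return True
```

Because the reconcile is cron-driven, `run_once` would be invoked once per 5-minute tick, and `last_known_good` would need to be persisted between runs rather than held in memory.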
## Key technologies
| Layer | Choice |
|---|---|
| Router OS | OpenWrt 23.05+ |
| Router hardware | GL.iNet GL-MT6000 (default), Banana Pi BPI-R4 (alternative) |
| Cell OS | balenaOS on RK3588 |
| Cell hardware | Banana Pi BPI-W3, Radxa Rock 5B+ |
| Edge AI engine | Frigate v0.14+ with RKNN |
| Models | YOLOv8n (detection), CLIP (embeddings), re-ID models, Whisper-v3 (audio) |
| Broker | Mosquitto with bridge |
| Time sync | chrony with NTS |
| VPN | Tailscale |
| GitOps | Git + cron-driven reconcile (custom ucode script) |
| Hub OS | Debian on Hetzner bare-metal |
| Hub services | Mosquitto, MinIO, Qdrant, TimescaleDB, Keycloak, Caddy |
## Provisioning lifecycle
1. **Image build**: OpenWrt Image Builder produces a router firmware with `luci-app-blocao-console` and dependencies.
2. **First boot**: the router serves the setup wizard at `http://blocao-router.local/`, which walks the installer through 6 steps.
3. **Cell enrollment**: Cell devices on the local network are auto-discovered (mDNS + MQTT announce), and Balena provisions them over the air.
4. **GitOps repos**: created by the wizard on the hub side. The site repo is initialized with the applied UCI config.
5. **Hub registration**: the router exchanges an enrollment token for an mTLS certificate and the bridge starts. The new site then appears in the hub's Sites Overview.
6. **Camera onboarding**: cameras on VLAN-10 are scanned, identified, authenticated, tested, and configured; the result is committed to GitOps and Frigate is reloaded.
7. **Operator login**: operators open the console on the router via Tailscale or the local network.
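
Step 6 can be sketched as an ordered pipeline that stops at the first failing stage, followed by generating one camera entry for Frigate's config. The stage callables, `Probe`, and the default RTSP port 554 are assumptions; the `ffmpeg.inputs[].path`/`roles` and `detect.width`/`height` keys follow Frigate's camera config layout. Committing the entry to the GitOps repo and reloading Frigate are not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Probe:
    """What onboarding has learned about one camera (hypothetical fields)."""
    ip: str
    rtsp_path: Optional[str] = None  # filled in by the "identify" stage
    width: int = 0                   # detect resolution, from the "test" stage
    height: int = 0


Stage = Callable[[Probe], bool]


def onboard(probe: Probe, stages: List[Tuple[str, Stage]]) -> Tuple[bool, Optional[str]]:
    """Run stages in order; return (ok, name of the first failing stage)."""
    for name, stage in stages:
        if not stage(probe):
            return False, name
    return True, None


def frigate_camera_entry(name: str, probe: Probe, user: str, password: str) -> dict:
    """Build the cameras.<name> block of a Frigate config for one camera."""
    # Port 554 is the conventional RTSP port; assumed here, not discovered.
    url = f"rtsp://{user}:{password}@{probe.ip}:554{probe.rtsp_path}"
    return {
        name: {
            "ffmpeg": {"inputs": [{"path": url, "roles": ["detect", "record"]}]},
            "detect": {"width": probe.width, "height": probe.height},
        }
    }
```

A production config would likely avoid embedding the password in the URL, for example by using Frigate's environment-variable substitution.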
## Failure modes considered
| Failure | Mitigation |
|---|---|
| WAN down | Site continues operating; events queue locally; bridge resumes when WAN returns |
| Hub down | Router operates standalone; queue grows; reconnects automatically |
| Cell crash | Frigate auto-restarts via Balena; events buffered; selftest alerts in <30s |
| Camera offline | Detected by selftest; alert in SYNOPSIS and CAMS; doesn't block other cameras |
| Bridge cert expires | Selftest warns 90 days before; auto-renewal planned (currently manual) |
| GitOps push conflict | Reconcile fails, alerts via `_bridge/status`, last-known-good remains applied |
| Cell disk full | Retention rotates oldest first; soft alert at 75%, hard at 85%, evidence locker has separate quota |
| Operator forgets password | Router has local recovery via console port; hub has Keycloak admin recovery |
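
Two of the selftest behaviours in the table (disk thresholds with oldest-first rotation, and the 90-day certificate warning) can be sketched as pure functions. The 75%, 85% and 90-day figures come from the table; `Clip`, the function names, and rotating down to the soft threshold are assumptions, since the rotation target is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

SOFT_ALERT = 0.75               # soft alert at 75% disk usage
HARD_ALERT = 0.85               # hard alert at 85%
CERT_WARN = timedelta(days=90)  # warn 90 days before bridge cert expiry


@dataclass
class Clip:
    path: str
    recorded_at: datetime
    size_bytes: int
    in_evidence_locker: bool = False  # separate quota, never rotated here


def disk_alert(used: int, capacity: int) -> Optional[str]:
    """Return 'hard', 'soft' or None for the current disk usage."""
    ratio = used / capacity
    if ratio >= HARD_ALERT:
        return "hard"
    if ratio >= SOFT_ALERT:
        return "soft"
    return None


def clips_to_rotate(clips: List[Clip], used: int, capacity: int,
                    target: float = SOFT_ALERT) -> List[Clip]:
    """Select oldest-first, non-evidence clips until usage falls below target."""
    victims = []
    for clip in sorted(clips, key=lambda c: c.recorded_at):
        if used / capacity < target:
            break
        if clip.in_evidence_locker:
            continue
        victims.append(clip)
        used -= clip.size_bytes
    return victims


def cert_warning(not_after: datetime, now: datetime) -> bool:
    """True once the bridge certificate is within the 90-day warning window."""
    return not_after - now <= CERT_WARN
```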
## Out of scope (for the design phase)
- Mobile operator app — explicitly post-MVP.
- Real-time video streaming to operator over WAN — bandwidth-prohibitive, only on-demand clips.
- Automated camera positioning / PTZ control — vendor-specific, deferred.
- Cross-site real-time correlation (e.g., follow a vehicle across sites in real time) — analytics-grade feature, post-MVP.
## See also
- [`tiers.md`](tiers.md) — detailed roles per tier.
- [`mqtt-contract.md`](mqtt-contract.md) — MQTT topic schema.
- [`data-sovereignty.md`](data-sovereignty.md) — what stays local, what leaves.
- [`storage-retention.md`](storage-retention.md) — capacity planning.
- [`network-topology.md`](network-topology.md) — VLANs, firewall, segmentation.