Multi-site fleet deployment
Pattern for customers with 5+ sites managed centrally.
Topology
                ┌──────────────────────────────┐
                │  Blocao Hub (Hetzner DE/FI)  │
                │  - mosquitto                 │
                │  - keycloak                  │
                │  - qdrant + timescaledb      │
                │  - sites overview UI         │
                └────────────┬─────────────────┘
                             │ MQTT bridges over TLS
                             │
     ┌───────────────┬───────┴───────┬───────────────┐
     │               │               │               │
┌────▼────┐     ┌────▼────┐     ┌────▼────┐     ┌────▼────┐
│BL-LAB-1 │     │BL-LAB-2 │     │BL-WH-N  │     │BL-WH-S  │
│   R+1   │     │   R+1   │     │   R+2   │     │   R+1   │
└─────────┘     └─────────┘     └─────────┘     └─────────┘
Each site is independent: it keeps operating without the hub if the WAN goes down. The hub is a coordination layer, not a critical-path dependency.
Two GitOps repos
When fleet management is in play, the hub provisions two kinds of GitOps repo for each customer:
- fleet-config (org-wide common settings):
  - Default firewall rules.
  - Default Frigate model versions.
  - Default retention policy.
  - DNS allowlist baseline.
  - Common operator role definitions.
- site-config-<site_id> (per-site overrides):
  - Site identity (BL-...).
  - Camera definitions specific to this site.
  - Retention overrides if different from fleet default.
  - Network specifics.
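Each site's router clones fleet-config and its own site-config repo. Reconcile applies the fleet settings first, then the site overrides, so the site value wins wherever both repos define the same setting.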
This separation means:
- "Update Frigate to v0.15 across the fleet" → one commit to fleet-config, propagates to all sites in the next reconcile.
- "Add a camera to BL-LAB-2" → one commit to site-config-bl-lab-2, only affects that site.
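The fleet-then-site reconcile order can be illustrated with a layered merge. This is a minimal sketch, not the router's actual reconcile code: the function name `merge_config`, the dictionary layout, and every value in the sample configs (retention days, DNS host, camera name) are assumptions made up for illustration.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_config(fleet: Dict[str, Any], site: Dict[str, Any]) -> Dict[str, Any]:
    """Return fleet defaults with site overrides applied on top.

    Nested dicts are merged key by key; any other value in `site`
    (scalars, lists) replaces the fleet value outright.
    """
    merged = deepcopy(fleet)
    for key, value in site.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical contents of fleet-config (applied first).
fleet_config = {
    "frigate": {"version": "0.15"},
    "retention": {"days": 30},
    "dns_allowlist": ["ntp.example.org"],
}

# Hypothetical contents of site-config-bl-lab-2 (applied second).
site_config = {
    "site_id": "BL-LAB-2",
    "retention": {"days": 90},
    "cameras": [{"name": "dock-1"}],
}

effective = merge_config(fleet_config, site_config)
print(effective["frigate"]["version"])  # inherited from the fleet layer
print(effective["retention"]["days"])   # overridden by the site layer
```

Under these assumptions, a fleet-wide change only touches `fleet_config`, and a site keeps any value it has explicitly overridden.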
Operator workflow
A fleet operator (e.g., security ops at the headquarters of a 30-store retailer) typically works through these steps:
- Hub Sites Overview: see all sites with health/alerts.
- Drill down: click a site → Tailscale tunnel opens to that site's console.
- Investigation: query forensics either at the site (single-site context) or at the hub (cross-site context).
- Bulk policy: edit fleet-config repo for org-wide changes.
Cross-site forensic search
Implemented in Epic 6 (post-MVP).
The hub maintains a consolidated embeddings index. Each site publishes embeddings (tagged with its site_id) over its bridge, and the hub merges them. An operator query at hub level fans out in three steps:
- Direct lookup in consolidated index for "find vehicle plate L-7234" or "find this face".
- Re-ranking with site-specific context.
- Deep dive into a site's full data via Tailscale tunnel.
Raw video stays at the sites; only embeddings and metadata are held at the hub, which preserves data sovereignty.
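The first step, a direct lookup in the consolidated index, might look roughly like the sketch below. It is an assumption-heavy illustration: it uses an in-memory list and plain cosine similarity in place of the hub's vector store, and the `Record` type, the `search` function, the event IDs, and the two-dimensional embeddings are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Record:
    site_id: str                 # which site published the embedding
    event_id: str                # hypothetical reference back to the site's data
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: List[Record], query: Sequence[float],
           top_k: int = 3) -> List[Tuple[float, Record]]:
    """Score every record against the query and return the best top_k."""
    scored = [(cosine(r.embedding, query), r) for r in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Consolidated index: embeddings merged from several sites, no raw video.
index = [
    Record("BL-LAB-1", "evt-1", [0.9, 0.1]),
    Record("BL-WH-N", "evt-2", [0.1, 0.9]),
    Record("BL-WH-S", "evt-3", [0.7, 0.3]),
]

hits = search(index, query=[1.0, 0.0], top_k=2)
for score, record in hits:
    print(record.site_id, record.event_id, round(score, 3))
```

Because every hit carries its `site_id`, the operator knows which site to open a tunnel to for the deep dive; the re-ranking step is not modeled here.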
Sites Overview UI
Separate from the per-site router console. Implemented in apps/hub (future code repo).
Mockup not yet created — design TBD post-MVP. Likely:
- Map of sites with status pins.
- Aggregated health panel (% of sites green/warn/err).
- Aggregated alerts panel (active across the fleet).
- Bulk actions (update fleet-config, push command to N sites).
Pricing model considerations
A 30-site customer should pay more than a 1-site customer. Subscription tiers:
| Tier | Sites | Monthly per site |
|---|---|---|
| Starter | 1-5 | €30-50 |
| Standard | 6-25 | €25-40 |
| Fleet | 26-100 | €20-30 |
| Enterprise | 101+ | Custom |
Hardware is sold separately. Support tiers add a flat monthly fee.
(All numbers placeholder; finalize with sales lead.)
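As a worked example of the table, the sketch below computes the monthly subscription range for a given site count. It assumes that the tier reached by the total site count sets the per-site rate for all sites (volume pricing rather than graduated pricing), which the table does not state; the `TIERS` structure and `monthly_range` function are hypothetical, and the figures are the same placeholders as above.

```python
from typing import List, Optional, Tuple

# (tier name, max sites in tier, (low, high) EUR per site per month).
# Placeholder numbers copied from the table; Enterprise is custom-priced.
TIERS: List[Tuple[str, int, Tuple[int, int]]] = [
    ("Starter", 5, (30, 50)),
    ("Standard", 25, (25, 40)),
    ("Fleet", 100, (20, 30)),
]


def monthly_range(sites: int) -> Tuple[str, Optional[Tuple[int, int]]]:
    """Return (tier, (low_total, high_total)) in EUR per month.

    Assumes one per-site rate for the whole fleet, chosen by total
    site count. Returns None as the range for custom-priced Enterprise.
    """
    if sites < 1:
        raise ValueError("a customer needs at least one site")
    for name, max_sites, (low, high) in TIERS:
        if sites <= max_sites:
            return name, (sites * low, sites * high)
    return "Enterprise", None


print(monthly_range(1))    # single-site customer
print(monthly_range(30))   # the 30-site customer from the text
```

Under this assumption a 30-site customer lands in the Fleet tier at €600-900 per month, compared with €30-50 for a single site, before hardware and support.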
Operational considerations
- Hub HA: a production hub should run on at least 2 nodes (active-passive at minimum). For >50 sites, use active-active with shared MinIO.
- Hub backup: daily snapshots to a second region (OVH France as standard secondary).
- Site offline handling: alerts after 5 min of bridge silence. Auto-resolve on reconnect.
- Cert management: each site's mTLS cert renews automatically every 6 months. Monitoring alerts at 90 days.
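The site offline rule (alert after 5 minutes of bridge silence, auto-resolve on reconnect) can be sketched as a small state tracker. This is an illustration under assumptions, not the hub's implementation: the `BridgeMonitor` class, its method names, and the event tuples are hypothetical, and timestamps are passed in explicitly (in seconds) so the logic is easy to test.

```python
from typing import Dict, List, Set, Tuple

SILENCE_THRESHOLD_S = 5 * 60  # 5 minutes of bridge silence

Event = Tuple[str, str]  # ("alert" | "resolved", site_id)


class BridgeMonitor:
    """Tracks the last message time per site and raises/resolves alerts."""

    def __init__(self, threshold_s: float = SILENCE_THRESHOLD_S) -> None:
        self.threshold_s = threshold_s
        self.last_seen: Dict[str, float] = {}
        self.offline: Set[str] = set()

    def on_message(self, site_id: str, now: float) -> List[Event]:
        """Record bridge traffic; auto-resolve if the site was offline."""
        self.last_seen[site_id] = now
        if site_id in self.offline:
            self.offline.discard(site_id)
            return [("resolved", site_id)]
        return []

    def check(self, now: float) -> List[Event]:
        """Run periodically; alert once per site that has gone silent."""
        events: List[Event] = []
        for site_id, seen in self.last_seen.items():
            silent_for = now - seen
            if site_id not in self.offline and silent_for >= self.threshold_s:
                self.offline.add(site_id)
                events.append(("alert", site_id))
        return events


monitor = BridgeMonitor()
monitor.on_message("BL-LAB-1", now=0.0)
print(monitor.check(now=299.0))                   # still within the window
print(monitor.check(now=300.0))                   # silence threshold reached
print(monitor.check(now=400.0))                   # already alerted, no repeat
print(monitor.on_message("BL-LAB-1", now=410.0))  # reconnect resolves
```

Keeping the offline set separate from the timestamps means an alert fires once per outage rather than on every check.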
Customer journey
- Pilot: 1-2 sites, hub provisioned, validate fleet workflow.
- Rollout: phased install of remaining sites.
- Optimization: 3 months in, review which fleet-config defaults to tighten.
- Steady state: ongoing ops, occasional new sites, regular fleet-config updates.
Total time from pilot to 30 sites in production: typically 6-9 months for a customer with established cabling/infra at each site.