docs(deployments): multi-site fleet pattern
@@ -0,0 +1,112 @@
# Multi-site fleet deployment

Pattern for customers with 5+ sites managed centrally.

## Topology

```
                ┌──────────────────────────────┐
                │  Blocao Hub (Hetzner DE/FI)  │
                │  - mosquitto                 │
                │  - keycloak                  │
                │  - qdrant + timescaledb      │
                │  - sites overview UI         │
                └────────────┬─────────────────┘
                             │ MQTT bridges over TLS
                             │
     ┌───────────────┬───────┴───────┬────────────────┐
     │               │               │                │
┌────▼────┐     ┌────▼────┐     ┌────▼────┐      ┌────▼────┐
│BL-LAB-1 │     │BL-LAB-2 │     │BL-WH-N  │      │BL-WH-S  │
│   R+1   │     │   R+1   │     │   R+2   │      │   R+1   │
└─────────┘     └─────────┘     └─────────┘      └─────────┘
```

Each site is independent: it keeps operating without the hub if the WAN link goes down. The hub is a **coordination layer**, not a critical-path dependency.

## Two GitOps repos

When fleet management is in use, the hub provisions two repos per customer:

1. **`fleet-config`** (org-wide common settings):
   - Default firewall rules.
   - Default Frigate model versions.
   - Default retention policy.
   - DNS allowlist baseline.
   - Common operator role definitions.

2. **`site-config-<site_id>`** (per-site overrides):
   - Site identity (BL-...).
   - Camera definitions specific to this site.
   - Retention overrides if different from the fleet default.
   - Network specifics.

The router clones both repos. Reconcile applies `fleet-config` first, then the site overrides, so a site-level value wins wherever both repos set the same key.

This separation means:

- "Update Frigate to v0.15 across the fleet" → one commit to `fleet-config`, which propagates to all sites on the next reconcile.
- "Add a camera to BL-LAB-2" → one commit to `site-config-bl-lab-2`, which affects only that site.
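The reconcile order described above (fleet defaults first, site overrides second) can be sketched as a recursive merge. This is a minimal illustration, not the actual reconcile code: the function name `merge_config`, the config keys, and the rule that lists are replaced rather than concatenated are all assumptions.

```python
from copy import deepcopy
from typing import Any


def merge_config(fleet: dict[str, Any], site: dict[str, Any]) -> dict[str, Any]:
    """Return fleet defaults with site overrides applied on top.

    Nested dicts are merged key by key; any other value (scalar or list)
    from the site config replaces the fleet value outright (assumed rule).
    """
    merged = deepcopy(fleet)
    for key, value in site.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical contents of fleet-config and site-config-bl-lab-2.
fleet_config = {
    "frigate": {"version": "0.15"},
    "retention": {"days": 30},
}
site_config = {
    "site_id": "BL-LAB-2",
    "retention": {"days": 90},          # site-specific retention override
    "cameras": [{"name": "dock-door"}],  # exists only at this site
}

effective = merge_config(fleet_config, site_config)
```

Here `effective` keeps the fleet-wide Frigate version while taking the site's retention value, and `fleet_config` itself is left unmodified so it can be reused for the next site.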

## Operator workflow

A fleet operator (e.g., security ops at the headquarters of a 30-store retailer) typically works through these steps:

1. **Hub Sites Overview**: see all sites with their health and active alerts.
2. **Drill down**: click a site → a Tailscale tunnel opens to that site's console.
3. **Investigation**: query forensics either at the site (single-site context) or at the hub (cross-site context).
4. **Bulk policy**: edit the `fleet-config` repo for org-wide changes.

## Cross-site forensic search

Implemented in Epic 6 (post-MVP).

The hub maintains a consolidated embeddings index. Each site publishes its embeddings, tagged with `site_id`, to its MQTT bridge, and the hub merges them into that index. An operator query at hub level fans out as follows:

- Direct lookup in the consolidated index, e.g. "find vehicle plate L-7234" or "find this face".
- Re-ranking with site-specific context.
- Deep dive into a site's full data via a Tailscale tunnel.

Raw video stays at the sites; only embeddings and metadata are held at the hub, which preserves data sovereignty.
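The merge-then-lookup flow can be illustrated with a small in-memory stand-in for the consolidated index. This is only a sketch: a real hub would presumably use the qdrant instance shown in the topology, and the class name `ConsolidatedIndex`, the `Entry` fields, and the cosine-similarity ranking are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    site_id: str               # which site published the embedding
    event_id: str              # pointer to the event; raw video stays on site
    vector: tuple[float, ...]  # the embedding itself


def cosine(a, b) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ConsolidatedIndex:
    """Hub-side index holding embeddings from every site (hypothetical)."""

    def __init__(self):
        self._entries: list[Entry] = []

    def merge(self, site_id, batch):
        """Add a batch of (event_id, vector) pairs published by one site."""
        for event_id, vector in batch:
            self._entries.append(Entry(site_id, event_id, tuple(vector)))

    def search(self, query, top_k=3):
        """Return the top_k (score, entry) pairs across all sites."""
        scored = [(cosine(query, e.vector), e) for e in self._entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


index = ConsolidatedIndex()
index.merge("BL-LAB-1", [("evt-1", (1.0, 0.0)), ("evt-2", (0.0, 1.0))])
index.merge("BL-WH-N", [("evt-9", (0.9, 0.1))])
hits = index.search((1.0, 0.0), top_k=2)
```

Each hit carries its `site_id`, which is what would let the operator continue with site-specific re-ranking or open a tunnel to the right site for the deep dive.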

## Sites Overview UI

Separate from the per-site router console. It will be implemented in `apps/hub` (a future code repo).

No mockup exists yet; the design is TBD post-MVP. Likely elements:

- Map of sites with status pins.
- Aggregated health panel (% of sites green/warn/err).
- Aggregated alerts panel (alerts active across the fleet).
- Bulk actions (update `fleet-config`, push a command to N sites).
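Since the design is still open, the following is only a sketch of how the aggregated health panel's percentages might be computed. The function name `health_summary` and the input shape (a mapping of site ID to status string) are assumptions; only the three status names come from the list above.

```python
from collections import Counter

STATUSES = ("green", "warn", "err")


def health_summary(site_status: dict[str, str]) -> dict[str, float]:
    """Return the percentage of sites in each status, rounded to 0.1."""
    total = len(site_status)
    if total == 0:
        return {status: 0.0 for status in STATUSES}
    counts = Counter(site_status.values())
    return {status: round(100 * counts[status] / total, 1) for status in STATUSES}


summary = health_summary(
    {"BL-LAB-1": "green", "BL-LAB-2": "green", "BL-WH-N": "warn", "BL-WH-S": "err"}
)
```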

## Pricing model considerations

A 30-site customer should pay more in total than a 1-site customer, but less per site. Subscription tiers:

| Tier | Sites | Monthly price per site |
|---|---|---|
| Starter | 1-5 | €30-50 |
| Standard | 6-25 | €25-40 |
| Fleet | 26-100 | €20-30 |
| Enterprise | 101+ | Custom |

Hardware is sold separately. Support tiers add a flat monthly fee.

(All numbers are placeholders; finalize with the sales lead.)
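As a worked example of the table, the sketch below computes the monthly subscription range for a given site count. It assumes that every site is billed at the rate of the tier matching the customer's total site count (not a graduated scheme), which the table does not state, and it uses the same placeholder prices. Hardware and support fees are excluded.

```python
# (tier name, min sites, max sites, low EUR per site, high EUR per site)
TIERS = [
    ("Starter", 1, 5, 30, 50),
    ("Standard", 6, 25, 25, 40),
    ("Fleet", 26, 100, 20, 30),
]


def monthly_range(sites: int):
    """Return (tier, low total, high total) in EUR per month.

    Totals are None for Enterprise, where pricing is custom.
    """
    if sites < 1:
        raise ValueError("a customer needs at least one site")
    for name, min_sites, max_sites, low, high in TIERS:
        if min_sites <= sites <= max_sites:
            return name, sites * low, sites * high
    return "Enterprise", None, None


quote_30 = monthly_range(30)   # 30 sites fall in the Fleet tier
quote_1 = monthly_range(1)     # a single site falls in the Starter tier
```

With these placeholder numbers, 30 sites come to €600-900 per month against €30-50 for a single site.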

## Operational considerations

- **Hub HA**: a production hub should run on at least 2 nodes (active-passive at minimum). For >50 sites, use active-active with shared MinIO.
- **Hub backup**: daily snapshots to a second region (OVH France as the standard secondary).
- **Site offline handling**: an alert fires after 5 min of bridge silence and auto-resolves on reconnect.
- **Cert management**: each site's mTLS cert renews automatically every 6 months. Monitoring alerts at 90 days.
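The site offline rule (alert after 5 minutes of bridge silence, auto-resolve on reconnect) can be sketched as a small state tracker. The class name `BridgeWatch` and its methods are hypothetical; in practice the timestamps would presumably come from bridge traffic seen by the hub's mosquitto broker.

```python
from datetime import datetime, timedelta

SILENCE_THRESHOLD = timedelta(minutes=5)


class BridgeWatch:
    """Tracks the last bridge message per site and which sites are alerting."""

    def __init__(self):
        self.last_seen: dict[str, datetime] = {}
        self.offline: set[str] = set()

    def on_message(self, site_id: str, now: datetime) -> bool:
        """Record bridge traffic. Returns True if this resolves an open alert."""
        self.last_seen[site_id] = now
        if site_id in self.offline:
            self.offline.discard(site_id)
            return True
        return False

    def check(self, now: datetime) -> list[str]:
        """Return sites that have just crossed the silence threshold."""
        newly_offline = []
        for site_id, seen in self.last_seen.items():
            if site_id not in self.offline and now - seen >= SILENCE_THRESHOLD:
                self.offline.add(site_id)
                newly_offline.append(site_id)
        return newly_offline


t0 = datetime(2025, 1, 1, 12, 0)
watch = BridgeWatch()
watch.on_message("BL-WH-S", t0)
early = watch.check(t0 + timedelta(minutes=4))     # still within threshold
alerted = watch.check(t0 + timedelta(minutes=5))   # threshold reached
repeated = watch.check(t0 + timedelta(minutes=6))  # alert already open
resolved = watch.on_message("BL-WH-S", t0 + timedelta(minutes=7))
```

Keeping the open alerts in a set means a silent site raises one alert rather than one per check, and the first message after reconnect clears it.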

## Customer journey

1. **Pilot**: 1-2 sites, hub provisioned, validate the fleet workflow.
2. **Rollout**: phased install of the remaining sites.
3. **Optimization**: 3 months in, review which `fleet-config` defaults to tighten.
4. **Steady state**: ongoing ops, occasional new sites, regular `fleet-config` updates.

Total time from pilot to 30 sites in production: typically 6-9 months for a customer with established cabling/infra at each site.