# Multi-site fleet deployment

Pattern for customers with 5+ sites managed centrally.

## Topology

```
                ┌──────────────────────────────┐
                │  Blocao Hub (Hetzner DE/FI)  │
                │  - mosquitto                 │
                │  - keycloak                  │
                │  - qdrant + timescaledb      │
                │  - sites overview UI         │
                └────────────┬─────────────────┘
                             │ MQTT bridges over TLS
                             │
     ┌───────────────┬───────┴───────┬────────────────┐
     │               │               │                │
┌────▼────┐     ┌────▼────┐     ┌────▼────┐      ┌────▼────┐
│BL-LAB-1 │     │BL-LAB-2 │     │BL-WH-N  │      │BL-WH-S  │
│   R+1   │     │   R+1   │     │   R+2   │      │   R+1   │
└─────────┘     └─────────┘     └─────────┘      └─────────┘
```

Each site is independent: it keeps operating without the hub if the WAN link goes down. The hub is a **coordination layer**, not a critical-path dependency.

## Two GitOps repos

When fleet management is in play, the hub provisions two repos per customer:

1. **`fleet-config`** (org-wide common settings):
   - Default firewall rules.
   - Default Frigate model versions.
   - Default retention policy.
   - DNS allowlist baseline.
   - Common operator role definitions.

2. **`site-config-<site_id>`** (per-site overrides):
   - Site identity (BL-...).
   - Camera definitions specific to this site.
   - Retention overrides if different from fleet default.
   - Network specifics.

The router clones both repos. Reconcile applies `fleet-config` first, then layers the site overrides on top.

This separation means:

- "Update Frigate to v0.15 across the fleet" → one commit to `fleet-config`, which propagates to all sites in the next reconcile.
- "Add a camera to BL-LAB-2" → one commit to `site-config-bl-lab-2`, which only affects that site.

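
The "fleet first, then site overrides" ordering can be sketched as a recursive dictionary merge. This is only an illustration under assumptions: the function name `merge_config`, the config keys, and the 30/90-day retention values are hypothetical and not taken from the actual reconcile code.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any


def merge_config(fleet: dict[str, Any], site: dict[str, Any]) -> dict[str, Any]:
    """Return fleet defaults with site overrides layered on top.

    Nested dicts are merged key by key; any other value (including lists)
    from the site config replaces the fleet value wholesale.
    """
    merged = deepcopy(fleet)
    for key, value in site.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical contents of fleet-config (org-wide defaults).
fleet_config = {
    "frigate": {"version": "0.15"},
    "retention": {"days": 30},
}

# Hypothetical contents of site-config-bl-lab-2 (per-site overrides).
site_config = {
    "site_id": "BL-LAB-2",
    "retention": {"days": 90},
    "cameras": [{"name": "dock-door"}],
}

effective = merge_config(fleet_config, site_config)
```

Here `effective` keeps the fleet-wide Frigate version while picking up the site's retention override and camera list. Whether lists should replace or extend fleet defaults is a design choice this sketch does not settle.
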
## Operator workflow

A fleet operator (e.g., security ops at the headquarters of a 30-store retailer) typically works as follows:

1. **Hub Sites Overview**: see all sites with health and alerts.
2. **Drill down**: click a site → a Tailscale tunnel opens to that site's console.
3. **Investigation**: query forensics either at the site (single-site context) or at the hub (cross-site context).
4. **Bulk policy**: edit the `fleet-config` repo for org-wide changes.

## Cross-site forensic search

Implemented in Epic 6 (post-MVP).

The hub maintains a consolidated embeddings index. Each site publishes embeddings (tagged with `site_id`) to its bridge, and the hub merges them. An operator query at hub level fans out:

- Direct lookup in the consolidated index for "find vehicle plate L-7234" or "find this face".
- Re-ranking with site-specific context.
- Deep dive into a site's full data via Tailscale tunnel.

Raw video stays at the sites; only embeddings and metadata live at the hub, so data sovereignty is preserved.

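
The merge-then-lookup flow can be illustrated with a small in-memory index. This is a sketch under assumptions, not the hub's implementation: the real index would presumably sit in qdrant, and the `HubIndex` class, the event IDs, and the two-dimensional vectors are hypothetical stand-ins.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Embedding:
    site_id: str          # which site published this embedding
    event_id: str         # hypothetical identifier for the source event
    vector: tuple[float, ...]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class HubIndex:
    """Toy consolidated index: stores embeddings from all sites in one list."""

    def __init__(self) -> None:
        self._entries: list[Embedding] = []

    def merge(self, site_id: str, published: list[tuple[str, list[float]]]) -> None:
        """Add a batch of (event_id, vector) pairs published by one site."""
        for event_id, vector in published:
            self._entries.append(Embedding(site_id, event_id, tuple(vector)))

    def search(self, query: list[float], top_k: int = 3) -> list[tuple[float, Embedding]]:
        """Return the top_k most similar entries across all sites."""
        q = tuple(query)
        scored = [(cosine(q, e.vector), e) for e in self._entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


index = HubIndex()
index.merge("BL-LAB-1", [("evt-1", [1.0, 0.0]), ("evt-2", [0.0, 1.0])])
index.merge("BL-WH-N", [("evt-9", [0.9, 0.1])])
hits = index.search([1.0, 0.0], top_k=2)
```

Each hit carries its `site_id`, which is what lets the operator move from a hub-level match to a deep dive at the originating site.
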
## Sites Overview UI

Separate from the per-site router console. Implemented in `apps/hub` (future code repo).

No mockup exists yet; the design is TBD post-MVP. Likely elements:

- Map of sites with status pins.
- Aggregated health panel (% of sites green/warn/err).
- Aggregated alerts panel (active across the fleet).
- Bulk actions (update fleet-config, push command to N sites).

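
Since the design is still open, the following is only a rough sketch of how the aggregated health panel's percentages might be computed. The function name `health_summary` and the shape of the input (a mapping from site ID to one of `green`, `warn`, `err`) are assumptions.

```python
from __future__ import annotations

from collections import Counter

STATUSES = ("green", "warn", "err")


def health_summary(site_status: dict[str, str]) -> dict[str, float]:
    """Return the percentage of sites in each status, rounded to one decimal."""
    total = len(site_status)
    if total == 0:
        return {status: 0.0 for status in STATUSES}
    counts = Counter(site_status.values())
    return {status: round(100 * counts[status] / total, 1) for status in STATUSES}


summary = health_summary(
    {
        "BL-LAB-1": "green",
        "BL-LAB-2": "green",
        "BL-WH-N": "warn",
        "BL-WH-S": "err",
    }
)
```

With the four sites from the topology diagram in these hypothetical states, half the fleet is green and a quarter each is in warn and err.
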
## Pricing model considerations

A 30-site customer should pay more in total than a 1-site customer, but less per site. Subscription tiers:

| Tier | Sites | Monthly price per site |
|---|---|---|
| Starter | 1-5 | €30-50 |
| Standard | 6-25 | €25-40 |
| Fleet | 26-100 | €20-30 |
| Enterprise | 100+ | Custom |

Hardware is sold separately. Support tiers add a flat monthly fee.

(All numbers are placeholders; finalize with the sales lead.)

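
To make the tier arithmetic concrete, here is a small estimator built on the placeholder numbers above. It assumes every site is billed at the rate of the tier the customer's total site count falls into (not graduated pricing) and that exactly 100 sites still counts as Fleet; neither rule is stated in this document, and `monthly_estimate` is a hypothetical helper.

```python
from __future__ import annotations

# (tier name, min sites, max sites, (low EUR per site, high EUR per site))
# Placeholder figures copied from the table; not final pricing.
TIERS = [
    ("Starter", 1, 5, (30, 50)),
    ("Standard", 6, 25, (25, 40)),
    ("Fleet", 26, 100, (20, 30)),
]


def monthly_estimate(sites: int) -> tuple[str, int | None, int | None]:
    """Return (tier, low total EUR, high total EUR) for a given site count.

    Enterprise pricing is custom, so its totals are returned as None.
    """
    if sites < 1:
        raise ValueError("a customer needs at least one site")
    for name, low, high, (min_eur, max_eur) in TIERS:
        if low <= sites <= high:
            return name, sites * min_eur, sites * max_eur
    return "Enterprise", None, None


tier, low_total, high_total = monthly_estimate(30)
```

For the 30-store retailer used as an example elsewhere in this document, this gives the Fleet tier at €600 to €900 per month, before hardware and support.
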
## Operational considerations

- **Hub HA**: a production hub should run at least 2 nodes (active-passive at minimum). For >50 sites, use active-active with shared MinIO.
- **Hub backup**: daily snapshots to a second region (OVH France as the standard secondary).
- **Site offline handling**: alert after 5 min of bridge silence; auto-resolve on reconnect.
- **Cert management**: each site's mTLS cert renews automatically every 6 months. Monitoring alerts at 90 days.

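
The site offline rule (alert after 5 minutes of bridge silence, auto-resolve on reconnect) can be sketched as a small state tracker. The `BridgeMonitor` class and its method names are hypothetical, and timestamps are passed in explicitly so the logic can be exercised without waiting on a real clock.

```python
from __future__ import annotations

from dataclasses import dataclass, field

SILENCE_THRESHOLD_S = 5 * 60  # 5 minutes of bridge silence


@dataclass
class BridgeMonitor:
    last_seen: dict[str, float] = field(default_factory=dict)
    alerting: set[str] = field(default_factory=set)

    def heartbeat(self, site_id: str, now: float) -> None:
        """Record bridge traffic from a site and clear any open alert."""
        self.last_seen[site_id] = now
        self.alerting.discard(site_id)  # auto-resolve on reconnect

    def check(self, now: float) -> set[str]:
        """Return the sites that have been silent for the threshold or longer."""
        for site_id, seen in self.last_seen.items():
            if now - seen >= SILENCE_THRESHOLD_S:
                self.alerting.add(site_id)
        return set(self.alerting)


monitor = BridgeMonitor()
monitor.heartbeat("BL-WH-S", now=0.0)
before = monitor.check(now=299.0)     # still within the 5-minute window
during = monitor.check(now=300.0)     # threshold reached, alert opens
monitor.heartbeat("BL-WH-S", now=360.0)
after = monitor.check(now=361.0)      # reconnect cleared the alert
```

In a real deployment the heartbeat would be driven by messages arriving over the MQTT bridge, and `check` would run on a timer; both are left out here.
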
## Customer journey

1. **Pilot**: 1-2 sites, hub provisioned, validate fleet workflow.
2. **Rollout**: phased install of remaining sites.
3. **Optimization**: 3 months in, review which fleet-config defaults to tighten.
4. **Steady state**: ongoing ops, occasional new sites, regular fleet-config updates.

Total time from pilot to 30 sites in production: typically 6-9 months for a customer with established cabling/infra at each site.