# Multi-site fleet deployment

Pattern for customers with 5+ sites managed centrally.

## Topology

```
                   ┌──────────────────────────────┐
                   │  Blocao Hub (Hetzner DE/FI)  │
                   │  - mosquitto                 │
                   │  - keycloak                  │
                   │  - qdrant + timescaledb      │
                   │  - sites overview UI         │
                   └────────────┬─────────────────┘
                                │ MQTT bridges over TLS
                                │
        ┌───────────────┬───────┴───────┬────────────────┐
        │               │               │                │
   ┌────▼────┐     ┌────▼────┐     ┌────▼────┐     ┌────▼────┐
   │BL-LAB-1 │     │BL-LAB-2 │     │BL-WH-N  │     │BL-WH-S  │
   │ R+1     │     │ R+1     │     │ R+2     │     │ R+1     │
   └─────────┘     └─────────┘     └─────────┘     └─────────┘
```

Each site is independent: it keeps operating without the hub if the WAN goes down. The hub is a **coordination layer**, not a critical-path dependency.

## Two GitOps repos

When fleet management is enabled, the hub provisions two kinds of repo per customer (one shared, plus one for each site):

1. **`fleet-config`** (org-wide common settings):
   - Default firewall rules.
   - Default Frigate model versions.
   - Default retention policy.
   - DNS allowlist baseline.
   - Common operator role definitions.

2. **`site-config-<site_id>`** (per-site overrides):
   - Site identity (BL-...).
   - Camera definitions specific to this site.
   - Retention overrides if different from fleet default.
   - Network specifics.

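Each site's router clones both `fleet-config` and its own site repo. Reconcile applies the fleet settings first, then the site overrides, so the site value wins wherever both define the same setting.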

This separation means:

- "Update Frigate to v0.15 across the fleet" → one commit to `fleet-config`, which propagates to all sites on the next reconcile.
- "Add a camera to BL-LAB-2" → one commit to `site-config-bl-lab-2`, which affects only that site.
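
The "fleet first, then site overrides" ordering can be illustrated with a minimal sketch. It assumes both repos resolve to nested dictionaries and that overrides merge key by key; the actual reconcile format and merge rules are not specified here, and `merge_config`, the key names, and the sample values are hypothetical.

```python
from copy import deepcopy
from typing import Any


def merge_config(fleet: dict[str, Any], site: dict[str, Any]) -> dict[str, Any]:
    """Return fleet defaults with site overrides applied on top.

    Nested dicts are merged key by key; any other value (including lists)
    is replaced wholesale by the site value. Inputs are not mutated.
    """
    merged = deepcopy(fleet)
    for key, value in site.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical contents of fleet-config and site-config-bl-lab-2.
fleet_config = {
    "frigate": {"version": "0.15"},
    "retention": {"days": 30},
}
site_config = {
    "site_id": "BL-LAB-2",
    "retention": {"days": 90},          # site-specific retention override
    "cameras": [{"name": "dock-door"}],
}

effective = merge_config(fleet_config, site_config)
```

Here `effective` keeps the fleet-wide Frigate version while taking the site's retention value, which is the behavior the two example commits above rely on.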

## Operator workflow

A fleet operator (e.g., security ops at headquarters of a 30-store retailer) typically:

1. **Hub Sites Overview**: see all sites with health/alerts.
2. **Drill down**: click a site → Tailscale tunnel opens to that site's console.
3. **Investigation**: query forensics either at the site (single-site context) or at the hub (cross-site context).
4. **Bulk policy**: edit fleet-config repo for org-wide changes.

## Cross-site forensic search

Planned for Epic 6 (post-MVP).

The hub maintains a consolidated embeddings index. Each site publishes its embeddings, tagged with `site_id`, over its MQTT bridge, and the hub merges them. An operator query at hub level proceeds in stages:

- Direct lookup in consolidated index for "find vehicle plate L-7234" or "find this face".
- Re-ranking with site-specific context.
- Deep dive into a site's full data via Tailscale tunnel.

Raw video stays at the sites; only embeddings and metadata are held at the hub, which preserves data sovereignty.
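
As a rough illustration of the direct-lookup stage, the sketch below ranks entries of a consolidated index by cosine similarity and reports which sites to drill into. It is a brute-force stand-in written with the standard library only, not the hub's vector store; `Entry`, `search`, the `ref` field, and the three-dimensional vectors are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    site_id: str               # site that published the embedding
    ref: str                   # site-local event reference (hypothetical field)
    vector: tuple[float, ...]  # the embedding itself; raw video never leaves the site


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: list[Entry], query, top_k: int = 3) -> list[tuple[float, Entry]]:
    """Score every entry against the query and return the best top_k matches."""
    scored = [(cosine(query, entry.vector), entry) for entry in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy consolidated index merged from three sites.
index = [
    Entry("BL-LAB-1", "event-17", (0.9, 0.1, 0.0)),
    Entry("BL-WH-N", "event-42", (0.1, 0.9, 0.2)),
    Entry("BL-WH-S", "event-03", (0.8, 0.2, 0.1)),
]

hits = search(index, (1.0, 0.0, 0.0), top_k=2)
sites_to_drill_into = [entry.site_id for _, entry in hits]
```

Because every entry carries its `site_id`, the result tells the operator which site tunnels to open for the deep-dive stage.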

## Sites Overview UI

Separate from the per-site router console. Implemented in `apps/hub` (future code repo).

No mockup exists yet; the design is TBD post-MVP. Likely elements:

- Map of sites with status pins.
- Aggregated health panel (% of sites green/warn/err).
- Aggregated alerts panel (active across the fleet).
- Bulk actions (update fleet-config, push command to N sites).
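
Since the design is still open, the following is only a sketch of how the aggregated health panel's percentages might be computed. The three status names come from the list above; `health_summary` and the input shape (a mapping of site ID to status) are assumptions.

```python
from collections import Counter

STATUSES = ("green", "warn", "err")


def health_summary(site_status: dict[str, str]) -> dict[str, float]:
    """Return the percentage of sites in each status, rounded to one decimal."""
    unknown = set(site_status.values()) - set(STATUSES)
    if unknown:
        raise ValueError(f"unknown status values: {sorted(unknown)}")
    total = len(site_status)
    if total == 0:
        return {status: 0.0 for status in STATUSES}
    counts = Counter(site_status.values())
    return {status: round(100 * counts[status] / total, 1) for status in STATUSES}


summary = health_summary(
    {"BL-LAB-1": "green", "BL-LAB-2": "green", "BL-WH-N": "warn", "BL-WH-S": "err"}
)
```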

## Pricing model considerations

A 30-site customer should pay more in total than a 1-site customer, while the per-site price falls as the fleet grows. Subscription tiers:

| Tier | Sites | Monthly per site |
|---|---|---|
| Starter | 1-5 | €30-50 |
| Standard | 6-25 | €25-40 |
| Fleet | 26-100 | €20-30 |
| Enterprise | 101+ | Custom |

Hardware is sold separately. Support tiers add a flat monthly fee.

(All numbers placeholder; finalize with sales lead.)
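
To make the arithmetic concrete, here is a small sketch that maps a site count to a tier and a monthly subscription range using the placeholder figures from the table. It assumes the whole fleet is billed at the per-site rate of the tier it falls into (not graduated per bracket) and that exactly 100 sites counts as Fleet; both are assumptions, as are the names `TIERS` and `monthly_range`.

```python
from typing import Optional

# (tier, min sites, max sites, low EUR per site, high EUR per site), placeholder values
TIERS = [
    ("Starter", 1, 5, 30, 50),
    ("Standard", 6, 25, 25, 40),
    ("Fleet", 26, 100, 20, 30),
]


def monthly_range(sites: int) -> tuple[str, Optional[tuple[int, int]]]:
    """Return (tier name, (low, high) monthly total in EUR), or None for custom pricing."""
    if sites < 1:
        raise ValueError("a customer needs at least one site")
    for name, lo, hi, price_lo, price_hi in TIERS:
        if lo <= sites <= hi:
            return name, (sites * price_lo, sites * price_hi)
    return "Enterprise", None  # above the Fleet tier: priced individually


one_site = monthly_range(1)
thirty_sites = monthly_range(30)
```

Under these assumptions a 1-site customer pays €30-50 per month and a 30-site customer €600-900. The same model also produces a drop at tier boundaries (25 sites: €625-1000; 26 sites: €520-780), which may be worth checking when the numbers are finalized.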

## Operational considerations

- **Hub HA**: a production hub should run on at least 2 nodes (active-passive at minimum). For more than 50 sites, use active-active with shared MinIO.
- **Hub backup**: daily snapshots to a second region (OVH France as standard secondary).
- **Site offline handling**: alerts after 5 min of bridge silence. Auto-resolve on reconnect.
- **Cert management**: each site's mTLS cert renews automatically every 6 months. Monitoring alerts at 90 days.
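
The site offline rule (alert after 5 min of bridge silence, auto-resolve on reconnect) can be sketched as a small state tracker. This is an illustration under assumptions, not the hub's implementation: `BridgeWatch`, its method names, and the event strings are hypothetical, and timestamps are passed in explicitly (in seconds) so the logic is easy to test.

```python
from dataclasses import dataclass, field

SILENCE_THRESHOLD_S = 5 * 60  # 5 min of bridge silence


@dataclass
class BridgeWatch:
    last_seen: dict[str, float] = field(default_factory=dict)
    offline: set[str] = field(default_factory=set)

    def on_message(self, site_id: str, now: float) -> list[str]:
        """Record bridge traffic from a site; resolve its alert if one is open."""
        self.last_seen[site_id] = now
        if site_id in self.offline:
            self.offline.discard(site_id)
            return [f"RESOLVED {site_id}"]
        return []

    def check(self, now: float) -> list[str]:
        """Raise one alert per site whose bridge has been silent too long."""
        events = []
        for site_id, seen in self.last_seen.items():
            if site_id not in self.offline and now - seen >= SILENCE_THRESHOLD_S:
                self.offline.add(site_id)
                events.append(f"ALERT {site_id}")
        return events


watch = BridgeWatch()
watch.on_message("BL-WH-N", now=0.0)
first = watch.check(now=299.0)                   # under 5 min: no alert
second = watch.check(now=300.0)                  # threshold reached: alert
third = watch.check(now=400.0)                   # already alerted: no duplicate
fourth = watch.on_message("BL-WH-N", now=410.0)  # reconnect resolves the alert
```

Tracking the set of offline sites separately from `last_seen` is what keeps a long outage from producing a new alert on every check.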

## Customer journey

1. **Pilot**: 1-2 sites, hub provisioned, validate fleet workflow.
2. **Rollout**: phased install of remaining sites.
3. **Optimization**: 3 months in, review which fleet-config defaults to tighten.
4. **Steady state**: ongoing ops, occasional new sites, regular fleet-config updates.

Total time from pilot to 30 sites in production: typically 6-9 months for a customer with established cabling/infra at each site.