gigafibre-fsm/docs/ARCHITECTURE.md
louispaulb e50ea88c08 feat: unify vision on Gemini + port field tech scan/device into /j
- Invoice OCR migrated from Ollama (GPU-bound, local) to Gemini 2.5
  Flash via new targo-hub /vision/invoice endpoint with responseSchema
  enforcement. Ops VM no longer needs a GPU.
- Ops /j/* now has full camera scanner (TechScanPage) ported from
  apps/field with 8s timeout + offline queue + auto-link to Dispatch
  Job context on serial/barcode/MAC 3-tier lookup.
- New TechDevicePage reached via /j/device/:serial showing every
  ERPNext entity related to a scanned device: Service Equipment,
  Customer, Service Location, active Subscription, open Issues,
  upcoming Dispatch Jobs, OLT info.
- New docs/VISION_AND_OCR.md (full pipeline + §10 relationship graph
  + §8.1 secrets/rotation policy). Cross-linked from ARCHITECTURE,
  ROADMAP, HANDOFF, README.
- Nginx /ollama/ proxy blocks removed from both ops + field.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 11:26:01 -04:00


# Gigafibre FSM — Ecosystem Architecture
> Unified reference document for infrastructure, platform strategy, and application architecture on the remote Docker environment.
## 1. Executive Summary & Platform Strategy
Gigafibre FSM represents the complete operations platform for Gigafibre, shifting from a polling-based legacy PHP system to a modern, real-time push ecosystem (Vue 3, Node.js, ERPNext, TR-369).
The strategy centers on a **unified core platform** running entirely on a remote Proxmox VM (`96.125.196.67`):
- **ERPNext v16** as the authoritative source of truth (CRM, billing, ticketing).
- **Targo Ops PWA** as the single pane of glass for internal teams.
- **Targo Hub** as the real-time API gateway (SMS, SSE, AI, TR-069 proxy).
- **client.gigafibre.ca** for customer self-service.
**Legacy Retirement Plan (April-May 2026):**
- *Retire* `dispatch-app`: its functionality now lives in Ops plus a lightweight mobile tech page (`/t/{token}`).
- *Retire* `apps/field`: redundant with the mobile tech page workflow.
- *Retire* `auth.targo.ca`: fully migrated to Authentik at `id.gigafibre.ca`.
---
## 2. Infrastructure & Docker Networks
All services are containerized and run on a single Proxmox VM (`96.125.196.67`), with Traefik as the reverse proxy in front of them.
```text
Internet
96.125.196.67 (Proxmox VM, Ubuntu 24.04)
├─ Traefik v2.11 (:80/:443, Let's Encrypt, ForwardAuth)
├─ Authentik SSO (id.gigafibre.ca) → Secures /ops/ and client portal
├─ ERPNext v16.10.1 (erp.gigafibre.ca) → 9 containers (db, redis, workers)
├─ Targo Ops App (erp.gigafibre.ca/ops/) → Served via nginx:alpine
├─ n8n (n8n.gigafibre.ca) → Auto-login proxy wired to Authentik headers
├─ Oktopus CE (oss.gigafibre.ca) → TR-369 CPE management
└─ WWW / Frontend (www.gigafibre.ca) → React marketing site
```
**DNS Configuration (Cloudflare):**
- Domain `gigafibre.ca` is strictly DNS-only (Cloudflare proxy disabled) so Traefik can obtain Let's Encrypt certificates directly.
- Email DNS records for Mailjet and Google Workspace are configured on the root domain.
**Docker Networks:**
- `proxy`: Public-facing network connected to Traefik.
- `erpnext_erpnext`: Internal network for Frappe, Postgres, Redis, and targo-hub routing.
---
## 3. Core Services
### ERPNext (The Backend)
- **Database:** PostgreSQL (`erpnext-db-1`).
- **Extensions:** Custom doctypes for Dispatch Job, Technician, Tag, Service Location, Service Equipment, Subscription.
- **API Token Auth:** `targo-hub` and the Ops PWA interact with Frappe via a highly privileged service token (`Authorization: token ...`).
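
A minimal sketch of what a service-token call to Frappe's REST API can look like, assuming Node 18+ (global `fetch`). The `token <api_key>:<api_secret>` header format and the `/api/resource/<DocType>` endpoint with JSON-encoded `filters` and `fields` query parameters are standard Frappe; the helper names and the `serial_no` field used below are illustrative assumptions, not the actual targo-hub code.

```typescript
// Hypothetical helper types; credentials come from the hub's secret store.
interface FrappeCredentials {
  apiKey: string;
  apiSecret: string;
}

interface FrappeRequest {
  url: string;
  headers: Record<string, string>;
}

// Frappe token auth: `Authorization: token <api_key>:<api_secret>`.
function frappeAuthHeader(creds: FrappeCredentials): string {
  return `token ${creds.apiKey}:${creds.apiSecret}`;
}

// Builds a GET /api/resource/<DocType> list request. Frappe expects
// `filters` and `fields` as JSON-encoded arrays in the query string,
// and DocType names containing spaces must be URL-encoded.
function buildResourceQuery(
  baseUrl: string,
  doctype: string,
  filters: Array<[string, string, string]>,
  fields: string[],
  creds: FrappeCredentials,
): FrappeRequest {
  const params = new URLSearchParams({
    filters: JSON.stringify(filters),
    fields: JSON.stringify(fields),
  });
  return {
    url: `${baseUrl}/api/resource/${encodeURIComponent(doctype)}?${params.toString()}`,
    headers: {
      Authorization: frappeAuthHeader(creds),
      Accept: "application/json",
    },
  };
}
```

Usage would look like `const req = buildResourceQuery("https://erp.gigafibre.ca", "Service Equipment", [["serial_no", "=", serial]], ["name", "customer"], creds); const res = await fetch(req.url, { headers: req.headers });`.
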
### Targo-Hub (API Gateway)
- **Stack:** Node.js 20 (`msg.gigafibre.ca:3300`).
- **Purpose:** Handles the heavy or real-time workflows that fall outside ERPNext's scope.
- **Key Abilities:**
  - Real-time Server-Sent Events (SSE) for timeline/chat updates.
  - Twilio SMS / Voice (IVR) routing.
  - Modem polling (GenieACS, OLT SNMP proxy).
  - Webhook handling (Stripe payments, Uptime-Kuma, 3CX).
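
A sketch of the SSE wire format the hub needs to emit for timeline/chat updates. The framing (optional `id:` line, `event:` line, one `data:` line per payload line, blank-line terminator) and the `text/event-stream` content type follow the SSE specification; the function names here, and the `timeline`/`chat` event names, are hypothetical.

```typescript
import type { ServerResponse } from "node:http";

// Serialize one SSE message. A payload containing line breaks must be split
// into several `data:` lines; the trailing blank line ends the event.
function formatSseEvent(event: string, data: unknown, id?: string): string {
  const payload = typeof data === "string" ? data : JSON.stringify(data);
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  lines.push(`event: ${event}`);
  // SSE treats CRLF, CR, and LF all as line terminators.
  for (const line of payload.split(/\r\n|\r|\n/)) lines.push(`data: ${line}`);
  return lines.join("\n") + "\n\n";
}

// Switch an HTTP response into streaming mode and return a send() helper.
function openSseStream(res: ServerResponse): (event: string, data: unknown, id?: string) => void {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  return (event, data, id) => {
    res.write(formatSseEvent(event, data, id));
  };
}
```

On the browser side, the Ops PWA would subscribe with `new EventSource(url)` and `addEventListener("timeline", ...)`. `EventSource` cannot set custom request headers, so it relies on cookies for auth, which fits a session-based proxy.
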
### Modem-Bridge
- **Stack:** Playwright/Chromium (`:3301` internal).
- **Purpose:** Reads encrypted TR-181 parameters from TP-Link XX230v modems by letting the modem's own JavaScript cryptography run in a headless browser, then exposes the results to targo-hub through a simple JSON REST API on the internal network.
### Vision / OCR (Gemini via targo-hub)
- **Model:** Gemini 2.5 Flash (Google) — no local GPU, all inference remote.
- **Endpoints (hub):** `/vision/barcodes`, `/vision/equipment`, `/vision/invoice`.
- **Why centralized:** The ops VM has no GPU, so the legacy Ollama `llama3.2-vision` install was retired. All three frontends (ops, field-as-ops `/j`, future client portal) call the hub, which enforces a JSON `responseSchema` per endpoint.
- **Client-side resilience:** barcode scans use an 8s timeout + IndexedDB retry queue so techs in weak-LTE zones don't lose data. See [VISION_AND_OCR.md](VISION_AND_OCR.md) for the full pipeline.
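
A minimal sketch of the client-side resilience pattern: post the scan to the hub with an 8 s `AbortController` timeout, and on timeout, network error, or non-2xx response, park it in a retry queue instead of dropping it. `ScanQueue` is a hypothetical stand-in for the IndexedDB-backed queue (its persistence and replay logic are not shown), and the request body shape is an assumption.

```typescript
// Hypothetical shape of a scan waiting to be (re)sent.
interface PendingScan {
  imageBase64: string;
  jobId?: string;
  queuedAt: number;
}

// Stand-in for the IndexedDB-backed retry queue.
interface ScanQueue {
  enqueue(scan: PendingScan): Promise<void>;
}

type ScanResult =
  | { status: "ok"; body: unknown }
  | { status: "queued"; reason: string };

const SCAN_TIMEOUT_MS = 8_000;

async function scanWithFallback(
  endpoint: string,
  scan: Omit<PendingScan, "queuedAt">,
  queue: ScanQueue,
  fetchImpl: typeof fetch = fetch,
): Promise<ScanResult> {
  const controller = new AbortController();
  // Abort the request (including reading the body) after 8 s.
  const timer = setTimeout(() => controller.abort(), SCAN_TIMEOUT_MS);
  try {
    const res = await fetchImpl(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(scan),
      signal: controller.signal,
    });
    if (!res.ok) throw new Error(`hub responded ${res.status}`);
    return { status: "ok", body: await res.json() };
  } catch (err) {
    // Timeout (abort), network failure, or non-2xx: keep the scan for later.
    await queue.enqueue({ ...scan, queuedAt: Date.now() });
    return { status: "queued", reason: err instanceof Error ? err.message : String(err) };
  } finally {
    clearTimeout(timer);
  }
}
```

The fetch implementation is injectable so the fallback path can be exercised without a network.
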
---
## 4. Security & Authentication Flow
```text
User → app.gigafibre.ca
→ Traefik checks the session via the ForwardAuth middleware
→ No valid session? User is redirected to Authentik (id.gigafibre.ca) to log in
→ Authorized? Request is forwarded to the upstream container with an 'X-Authentik-Email' header
```
- **ForwardAuth (`authentik-client@file`):** Currently protects `erp.gigafibre.ca/ops/`, `n8n`, and `hub`.
- **API Security:** Frontends go through the Authentik session proxy; backend services and scripts send an `Authorization: token` header directly to Frappe's `/api/method` endpoints.
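
A sketch of how a service behind Traefik might tell the two auth paths apart. It assumes the ForwardAuth middleware copies Authentik's `X-Authentik-Email` response header onto the forwarded request, and that the service is reachable only through Traefik; otherwise a client could forge the header. The function name and `Caller` type are hypothetical, and the token branch only classifies the caller: Frappe remains responsible for validating the key and secret.

```typescript
type Caller =
  | { kind: "user"; email: string }
  | { kind: "service"; apiKey: string }
  | { kind: "anonymous" };

// Node lower-cases incoming header names, so look them up in lower case.
function resolveCaller(headers: Record<string, string | string[] | undefined>): Caller {
  const first = (v: string | string[] | undefined) => (Array.isArray(v) ? v[0] : v);

  // Browser path: identity injected by Traefik ForwardAuth after Authentik login.
  const email = first(headers["x-authentik-email"]);
  if (email) return { kind: "user", email };

  // Script/service path: Frappe-style `token <api_key>:<api_secret>`.
  // Only the key is kept; the secret is never logged or stored here.
  const auth = first(headers["authorization"]);
  const match = auth?.match(/^token\s+([^:\s]+):(\S+)$/i);
  if (match) return { kind: "service", apiKey: match[1] };

  return { kind: "anonymous" };
}
```
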
---
## 5. Network Intelligence & CPE Flow
**Device Diagnostics (`targo-hub → GenieACS / OLT`)**
When a CSR clicks "Diagnostiquer" in the Ops app:
1. The Ops app calls targo-hub's `/devices/lookup?serial=X`.
2. `targo-hub` queries the GenieACS NBI.
3. If deeper data is needed, `targo-hub` queries `modem-bridge` (for TP-Link) or polls the OLT over SNMP directly.
4. `targo-hub` returns a consolidated payload (interfaces, mesh, Wi-Fi, and `opticalStatus`) to the UI.
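
A sketch of the hub-side consolidation in steps 2 to 4. The `/devices/?query=` form with a JSON query on `_deviceId._SerialNumber` is standard GenieACS NBI (default port 7557); the `Diagnostics` shape, the injected `fromGenieacs` and `deep` functions, and the rule that any empty category triggers the slower modem-bridge/OLT path are assumptions for illustration.

```typescript
// Hypothetical consolidated payload returned to the Ops UI.
interface Diagnostics {
  serial: string;
  interfaces: unknown[];
  mesh: unknown[];
  wifi: unknown[];
  opticalStatus: unknown[];
  sources: string[];
}

type PartialSource = (serial: string) => Promise<Partial<Diagnostics>>;

// GenieACS NBI: GET /devices/?query=<URL-encoded MongoDB-style JSON>.
function genieacsDeviceUrl(nbiBase: string, serial: string): string {
  const query = JSON.stringify({ "_deviceId._SerialNumber": serial });
  return `${nbiBase}/devices/?query=${encodeURIComponent(query)}`;
}

async function lookupDevice(
  serial: string,
  fromGenieacs: PartialSource,
  deep?: PartialSource, // modem-bridge or OLT SNMP, whichever applies
): Promise<Diagnostics> {
  const base = await fromGenieacs(serial);
  const result: Diagnostics = {
    serial,
    interfaces: base.interfaces ?? [],
    mesh: base.mesh ?? [],
    wifi: base.wifi ?? [],
    opticalStatus: base.opticalStatus ?? [],
    sources: ["genieacs"],
  };

  // Only hit the slower source when GenieACS left a category empty.
  const keys = ["interfaces", "mesh", "wifi", "opticalStatus"] as const;
  if (deep && keys.some((k) => result[k].length === 0)) {
    const extra = await deep(serial);
    for (const k of keys) {
      const v = extra[k];
      if (result[k].length === 0 && v && v.length > 0) result[k] = v;
    }
    result.sources.push("deep");
  }
  return result;
}
```

In practice `fromGenieacs` would fetch `genieacsDeviceUrl(...)` and map the returned device document's TR-181 parameters into these categories; that mapping is not shown.
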
**Future: QR Code Flow**
- A tech applies a QR sticker linking to `msg.gigafibre.ca/q/{mac}` to the modem.
- The client scans the QR code → `targo-hub` identifies the customer by matching the MAC in ERPNext.
- `targo-hub` sends an SMS OTP → once verified, the client views the diagnostic portal.
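
A sketch of the helpers this future flow would need: normalizing MAC addresses so the sticker value matches what ERPNext stores, building the sticker URL, and generating and checking a 6-digit OTP with Node's `crypto.randomInt` and `crypto.timingSafeEqual`. The lowercase, separator-free MAC encoding, the `https://` scheme, and the 6-digit OTP length are assumptions; the actual sticker format is not specified here.

```typescript
import { randomInt, timingSafeEqual } from "node:crypto";

// Accepts "AA:BB:CC:DD:EE:FF", "aa-bb-cc-dd-ee-ff", "aabb.ccdd.eeff", etc.
// Returns 12 lowercase hex digits, or null if the input is not a MAC.
function normalizeMac(raw: string): string | null {
  const hex = raw.replace(/[:\-.\s]/g, "").toLowerCase();
  return /^[0-9a-f]{12}$/.test(hex) ? hex : null;
}

// Hypothetical sticker URL format.
function qrUrlForMac(mac: string): string {
  const norm = normalizeMac(mac);
  if (!norm) throw new Error(`invalid MAC: ${mac}`);
  return `https://msg.gigafibre.ca/q/${norm}`;
}

// 6-digit OTP from a CSPRNG; randomInt's upper bound is exclusive.
function generateOtp(): string {
  return randomInt(0, 1_000_000).toString().padStart(6, "0");
}

// Constant-time comparison; timingSafeEqual throws on length mismatch,
// so lengths are checked first.
function otpMatches(expected: string, submitted: string): boolean {
  const a = Buffer.from(expected);
  const b = Buffer.from(submitted);
  return a.length === b.length && timingSafeEqual(a, b);
}
```

OTP storage, expiry, and rate limiting are omitted from this sketch.
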
---
## 6. Development Gotchas
1. **Traefik v3** is incompatible with Docker 29 due to API changes. Stay on v2.11.
2. **MongoDB 5+** (Oktopus) requires a CPU with AVX. Set the Proxmox VM's CPU type to `host` so AVX is exposed to the guest.
3. Never click "Generate Keys" for the Administrator user in ERPNext: doing so regenerates the API secret and breaks the `targo-hub` API token.
4. **Traccar API** supports only one `deviceId` per request. Use parallel polling.
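
Since each Traccar request carries a single `deviceId`, a poll cycle fans out one request per device. A minimal sketch with a concurrency cap so a large fleet does not open hundreds of connections at once; `fetchPositions` is a hypothetical wrapper around Traccar's `GET /api/positions?deviceId=<id>`, and the default limit of 5 is arbitrary.

```typescript
// Poll every device in parallel, at most `concurrency` requests in flight.
async function pollAllDevices<T>(
  deviceIds: number[],
  fetchPositions: (deviceId: number) => Promise<T>,
  concurrency = 5,
): Promise<Map<number, T | Error>> {
  const results = new Map<number, T | Error>();
  let next = 0;

  // Each worker pulls the next id; `next++` is safe because JS runs
  // the increment synchronously between awaits.
  async function worker(): Promise<void> {
    while (next < deviceIds.length) {
      const id = deviceIds[next++];
      try {
        results.set(id, await fetchPositions(id));
      } catch (err) {
        // One failing device must not abort the whole poll cycle.
        results.set(id, err instanceof Error ? err : new Error(String(err)));
      }
    }
  }

  const n = Math.max(1, Math.min(concurrency, deviceIds.length));
  await Promise.all(Array.from({ length: n }, () => worker()));
  return results;
}

function traccarPositionsUrl(baseUrl: string, deviceId: number): string {
  return `${baseUrl}/api/positions?deviceId=${deviceId}`;
}
```
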