gigafibre-fsm/docs/architecture/overview.md
louispaulb 2ec5e49a06 docs: fix DocuSeal hostname (sign.gigafibre.ca, not docs.gigafibre.ca)
Three places in last week's docs refresh got the DocuSeal URL wrong —
must have been a copy-paste glitch since other parts of the same docs
(roadmap.md, module-interactions.md, archive snapshots) had it right.
Verified against the deployed Traefik labels:

  traefik.http.routers.docuseal.rule = Host(`sign.gigafibre.ca`)

`docs.gigafibre.ca` doesn't even resolve in DNS. Fixed in:
  • README.md (services table)
  • docs/README.md (services tree)
  • docs/architecture/overview.md (infra ASCII diagram)
2026-05-08 09:25:29 -04:00

Gigafibre FSM — Ecosystem Architecture

Unified reference document for infrastructure, platform strategy, and application architecture on the remote Docker environment.

1. Executive Summary & Platform Strategy

Gigafibre FSM is the operations platform for Gigafibre. It replaces a legacy PHP/MariaDB stack with a real-time push ecosystem (Vue 3, Node.js, ERPNext) running on a single Proxmox VM at 96.125.196.67.

Core pillars:

  • ERPNext v16 — undisputed Source of Truth (CRM, billing, ticketing).
  • Ops SPA at erp.gigafibre.ca/ops/ — single pane of glass for internal teams (dispatch, clients, settings, agent flows).
  • targo-hub at msg.gigafibre.ca — real-time API gateway (SMS, SSE, AI, OAuth admin, Stripe webhooks, Traccar proxy).
  • Client portal at client.gigafibre.ca — customer self-service.

Decommissioned (May 2026):

  • Oktopus CE (TR-369 stack at oss.gigafibre.ca) — broker spammed 75 GB of debug logs over 13 days, took ERPNext down for 4. Stack removed (containers + volumes + images). The hub gates the integration behind OKTOPUS_DISABLED=1 so the modules can be re-enabled later if we deploy a different USP controller.
  • dispatch-app (legacy PHP SPA at dispatch.gigafibre.ca) — now 301-redirects to /ops/#/dispatch. nginx config at /opt/dispatch-app/nginx.conf on the prod box.
  • apps/field — replaced by the lightweight mobile tech page at /t/{token} (server-rendered by services/targo-hub/lib/tech-mobile.js).

Two Authentik instances, in parallel — not a migration:

  • auth.targo.ca (staff) — protects /ops/, n8n, Gitea; OAuth provider for ERPNext sign-in.
  • id.gigafibre.ca (clients) — protects the customer portal.

2. Infrastructure & Docker Networks

All services are containerized and housed on a single Proxmox VM (96.125.196.67), managed via Traefik.

Internet
  │
96.125.196.67 (Proxmox VM, Ubuntu 24.04)
  │
  ├─ Traefik v2.11 (:80/:443, Let's Encrypt, ForwardAuth)
  │
  ├─ Authentik (auth.targo.ca)        → SSO for staff (ops, n8n, Gitea, ERPNext OAuth)
  ├─ Authentik (id.gigafibre.ca)      → SSO for client portal
  │
  ├─ ERPNext v16.10.1 (erp.gigafibre.ca) → 9 containers (db, redis, backend, queues, scheduler, websocket, n8n, n8n-proxy)
  │
  ├─ Ops SPA (erp.gigafibre.ca/ops/)  → Served via nginx:alpine from /opt/ops-app/
  ├─ Dispatch redirect (dispatch.gigafibre.ca) → 301 → /ops/#/dispatch (former dispatch-app, decommissioned)
  │
  ├─ targo-hub (msg.gigafibre.ca)     → Node 20, /opt/targo-hub/
  ├─ DocuSeal (sign.gigafibre.ca)     → Contract e-signature
  ├─ traccar-proxy                    → nginx relay for Traccar UI
  │
  └─ Marketing site (www.gigafibre.ca) → React/Vite/Tailwind

DNS Configuration (Cloudflare):

  • Domain gigafibre.ca is strictly DNS-only (no Cloudflare proxy) to allow Traefik Let's Encrypt generation.
  • Email via Mailjet + Google Workspace records configured on root.

Docker Networks:

  • proxy: Public-facing network connected to Traefik.
  • erpnext_erpnext: Internal network for Frappe, Postgres, Redis, and targo-hub routing.

3. Core Services

ERPNext (The Backend)

  • Database: PostgreSQL (erpnext-db-1).
  • Extensions: Custom doctypes for Dispatch Job, Technician, Tag, Service Location, Service Equipment, Subscription.
  • API Token Auth: targo-hub and the Ops PWA interact with Frappe via a highly-privileged service token (Authorization: token ...).
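
The token header above can be sketched as follows. This is a minimal illustration, not the hub's actual client: `ERP_URL`, `frappeHeaders`, and `listDocs` are hypothetical names, and the token is assumed to be in Frappe's usual `api_key:api_secret` form.

```javascript
// Sketch only: ERP_URL, frappeHeaders, and listDocs are illustrative names,
// not identifiers taken from targo-hub.
const ERP_URL = process.env.ERP_URL || "https://erp.gigafibre.ca";

// Frappe token auth expects "token <api_key>:<api_secret>" in Authorization.
function frappeHeaders(serviceToken) {
  if (!serviceToken || !serviceToken.includes(":")) {
    throw new Error("expected a token of the form api_key:api_secret");
  }
  return {
    Authorization: `token ${serviceToken}`,
    Accept: "application/json",
  };
}

// List documents of one doctype through Frappe's /api/resource REST endpoint.
// fetchImpl is injectable so the function can be exercised without a network.
async function listDocs(doctype, headers, fetchImpl = fetch) {
  const url = `${ERP_URL}/api/resource/${encodeURIComponent(doctype)}`;
  const res = await fetchImpl(url, { headers });
  if (!res.ok) throw new Error(`ERPNext responded ${res.status}`);
  return (await res.json()).data;
}
```

Because the token is highly privileged, a sketch like this belongs on the server side only; the browser should never see it.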

Targo-Hub (API Gateway)

  • Stack: Node.js 20 (msg.gigafibre.ca:3300).
  • Purpose: Acts as the middleman for all heavy or real-time workflows that fall outside ERPNext's scope.
  • Key Abilities:
    • Real-time Server-Sent Events (SSE) for timeline/chat updates.
    • Twilio SMS / Voice (IVR) routing.
    • Modem polling (GenieACS, OLT SNMP proxy).
    • Webhooks handling (Stripe payments, Uptime-Kuma, 3CX).
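
As an illustration of the SSE ability listed above, here is a minimal sketch of how a Node service can frame and broadcast events. `formatSseEvent` and `SseChannel` are hypothetical names, and the event names and payload shape are assumptions, not the hub's real contract.

```javascript
// Format one Server-Sent Events frame: optional id, event name, JSON data,
// terminated by a blank line as the SSE wire format requires.
function formatSseEvent(event, data, id) {
  const lines = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  lines.push(`event: ${event}`);
  // JSON.stringify never emits raw newlines, so a single data line is enough.
  lines.push(`data: ${JSON.stringify(data)}`);
  return lines.join("\n") + "\n\n";
}

// Minimal broadcaster: tracks open HTTP responses and writes frames to each.
class SseChannel {
  constructor() {
    this.clients = new Set();
  }

  // res is a Node http.ServerResponse (or anything with the same methods).
  subscribe(res) {
    res.writeHead(200, {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    });
    this.clients.add(res);
    res.on("close", () => this.clients.delete(res));
  }

  // Returns the number of clients the frame was written to.
  publish(event, data) {
    const frame = formatSseEvent(event, data);
    for (const res of this.clients) res.write(frame);
    return this.clients.size;
  }
}
```

On the browser side, an `EventSource` pointed at the stream URL and an `addEventListener` call per event name would consume these frames.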

Modem-Bridge

  • Stack: Playwright/Chromium (:3301 internal).
  • Purpose: Allows reading encrypted TR-181 parameters from TP-Link XX230v modems by leveraging the modem's native JS cryptography. Exposes a simple JSON REST API locally to targo-hub.

Vision / OCR (Gemini via targo-hub)

  • Model: Gemini 2.5 Flash (Google) — no local GPU, all inference remote.
  • Endpoints (hub): /vision/barcodes, /vision/equipment, /vision/invoice.
  • Why centralized: ops VM has no GPU, so the legacy Ollama llama3.2-vision install was retired. All three frontends (ops, field-as-ops /j, future client portal) hit the hub, which enforces JSON responseSchema per endpoint.
  • Client-side resilience: barcode scans use an 8s timeout + IndexedDB retry queue so techs in weak-LTE zones don't lose data. See ../features/vision-ocr.md for the full pipeline.
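
The timeout-plus-retry behaviour described above could look roughly like this. It is a sketch under stated assumptions: an in-memory array stands in for the IndexedDB queue, `sendScan` and `flushQueue` are hypothetical names, and the JSON request body is a guess at the payload format.

```javascript
// Sketch: retryQueue replaces IndexedDB, and the request body format is assumed.
const SCAN_TIMEOUT_MS = 8000;
const retryQueue = [];

// Try to send one scan; on timeout, network error, or non-2xx, queue it.
async function sendScan(payload, fetchImpl = fetch, timeoutMs = SCAN_TIMEOUT_MS) {
  try {
    const res = await fetchImpl("/vision/barcodes", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
      signal: AbortSignal.timeout(timeoutMs), // aborts the request after 8 s
    });
    if (!res.ok) throw new Error(`hub responded ${res.status}`);
    return { status: "sent", result: await res.json() };
  } catch (err) {
    retryQueue.push(payload); // keep the scan so it can be replayed later
    return { status: "queued", reason: String(err) };
  }
}

// Replay everything that was queued; failed sends re-queue themselves.
async function flushQueue(fetchImpl = fetch) {
  const pending = retryQueue.splice(0, retryQueue.length);
  let sent = 0;
  for (const payload of pending) {
    const outcome = await sendScan(payload, fetchImpl);
    if (outcome.status === "sent") sent += 1;
  }
  return sent;
}
```

A real client would probably call `flushQueue` when connectivity returns, and would cap retries so a permanently rejected scan does not loop forever.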

4. Security & Authentication Flow

Staff user → erp.gigafibre.ca/ops/  (or n8n, Gitea)
  → Traefik checks session via ForwardAuth middleware
  → Outpost validates with Authentik staff (auth.targo.ca)
  → Authorized? Request forwarded to upstream container
    with X-Authentik-Email + X-Authentik-Groups headers
  → Ops SPA reads X-Authentik-Email; useUserGroups maps groups
    to in-app capabilities

Customer user → client.gigafibre.ca
  → Traefik checks session via separate ForwardAuth chain
  → Outpost validates with Authentik client (id.gigafibre.ca)

Two distinct ForwardAuth middlewares:

  • authentik@file → backed by auth.targo.ca (staff)
  • authentik-client@file → backed by id.gigafibre.ca (customers)
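
To make the staff flow concrete, here is a minimal sketch of turning the forwarded headers into in-app capabilities. The group names and capability flags are invented for illustration (the real mapping lives in useUserGroups), and the pipe separator reflects how Authentik's proxy outpost is documented to join group names, so verify it against the deployed version.

```javascript
// Hypothetical group names and capability flags, for illustration only.
const GROUP_CAPABILITIES = new Map([
  ["ops-admin", ["dispatch", "clients", "settings"]],
  ["ops-dispatch", ["dispatch"]],
]);

// headers: a Node-style header object (names already lower-cased).
function identityFromHeaders(headers) {
  const email = headers["x-authentik-email"] || null;
  // Assumption: group names arrive joined with "|".
  const groups = (headers["x-authentik-groups"] || "")
    .split("|")
    .map((g) => g.trim())
    .filter(Boolean);

  // Union of the capabilities granted by each known group.
  const capabilities = new Set();
  for (const group of groups) {
    for (const cap of GROUP_CAPABILITIES.get(group) || []) capabilities.add(cap);
  }
  return { email, groups, capabilities: [...capabilities] };
}
```

Unknown groups simply grant nothing, so a user with no recognised group ends up with an empty capability list rather than an error.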

ERPNext OAuth: auth.targo.ca is also configured as a Frappe Social Login Key (provider name Authentik). The login page at /login shows both the password form and the "Login with Authentik" button. OAuth client_id P0rFFdq2hhun7hOLwkF5zm87vvDqcVYAhLtoZnFX, redirect_uri /api/method/frappe.integrations.oauth2_logins.custom/authentik.

Adding new users is centralized through the hub, not the Authentik admin UI. The ops Settings page (Settings → Utilisateurs → Inviter) hits POST /auth/users on msg.gigafibre.ca which:

  1. Creates the Authentik user (random username from local-part of email, password set explicitly), assigns OPS_GROUPS.
  2. Sets a temp password (readable, no look-alikes) and emails it via the hub's Mailjet SMTP — Authentik's own recovery flow isn't wired (flow_recovery=None on the brand) and its global SMTP is unset, so the hub does it directly.
  3. Creates the matching ERPNext User (System User, social_logins = [{provider:authentik, userid:email}]) so OAuth finds it on first login.

The temp password is also returned to the admin (UI shows it with a copy button) so they can hand it over manually if Mailjet drops the message. See services/targo-hub/lib/auth.js for the full flow.
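
A "readable, no look-alikes" temporary password can be generated along these lines. This is a sketch, not the code in services/targo-hub/lib/auth.js: the alphabet, the default length, and the name `generateTempPassword` are all assumptions.

```javascript
const { randomInt } = require("node:crypto");

// Alphabet without look-alike characters (no 0/O/o, no 1/l/I).
// The exact character set and length are assumptions for this sketch.
const ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnpqrstuvwxyz23456789";

function generateTempPassword(length = 12) {
  let out = "";
  for (let i = 0; i < length; i += 1) {
    // randomInt draws from a CSPRNG and is uniform over [0, ALPHABET.length).
    out += ALPHABET[randomInt(ALPHABET.length)];
  }
  return out;
}
```

Using `randomInt` rather than `Math.random` matters here because the password protects a staff account until the user changes it.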

API Security: frontends rely on the Authentik session cookie forwarded by Traefik. Backend scripts and the hub send an Authorization: token <ERP_SERVICE_TOKEN> header. This is Frappe's token scheme, not a Bearer token.


5. Network Intelligence & CPE Flow

Device Diagnostics (targo-hub → GenieACS / OLT)

When a CSR clicks "Diagnostiquer" in the Ops app:

  1. Ops app asks /devices/lookup?serial=X.
  2. targo-hub polls GenieACS NBI.
  3. If deep data is needed, targo-hub queries modem-bridge (for TP-Link) or the OLT SNMP directly.
  4. Returns consolidated interface, mesh, wifi, and opticalStatus array to the UI.
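
The four steps above can be sketched as a single orchestration function. Everything here is illustrative: `genieacs`, `modemBridge`, and `olt` are hypothetical client objects passed in by the caller, and the `needsDeepData` and `vendor` fields are assumptions about what the first lookup returns.

```javascript
// Sketch with injected clients; none of these interfaces are the hub's real ones.
async function lookupDevice(serial, { genieacs, modemBridge, olt }) {
  // Step 2: ask GenieACS first.
  const base = await genieacs.findBySerial(serial);
  if (!base) return { found: false, serial };

  // Step 3: fetch deep data only when needed, choosing the source by vendor.
  let deep = {};
  if (base.needsDeepData) {
    deep = base.vendor === "TP-Link"
      ? await modemBridge.read(serial)
      : await olt.read(serial);
  }

  // Step 4: one consolidated payload; deep values win over GenieACS values.
  return {
    found: true,
    serial,
    interface: deep.interface ?? base.interface ?? null,
    mesh: deep.mesh ?? base.mesh ?? null,
    wifi: deep.wifi ?? base.wifi ?? null,
    opticalStatus: deep.opticalStatus ?? base.opticalStatus ?? [],
  };
}
```

Injecting the three clients keeps the function testable with stubs and makes it easy to swap a backend without touching the consolidation logic.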

Future: QR Code Flow

  • Tech applies QR sticker to modem (msg.gigafibre.ca/q/{mac}).
  • Client scans QR → targo-hub identifies customer via MAC matching in ERPNext.
  • Triggers SMS OTP → Client views diagnostic portal.
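
Since this flow is not built yet, the following is only a sketch of its first step: extracting and normalizing the MAC from a /q/{mac} path so it can be matched in ERPNext. `macFromQrPath` is a hypothetical helper, and the accepted separator styles are an assumption.

```javascript
// Hypothetical helper: returns "AA:BB:CC:DD:EE:FF" or null for invalid input.
function macFromQrPath(pathname) {
  const match = /^\/q\/([^/]+)\/?$/.exec(pathname);
  if (!match) return null;

  let raw;
  try {
    raw = decodeURIComponent(match[1]); // handles %3A-encoded colons
  } catch {
    return null; // malformed percent-encoding
  }

  // Drop common separators, then require exactly 12 hex digits.
  const hex = raw.replace(/[:.\-]/g, "").toUpperCase();
  if (!/^[0-9A-F]{12}$/.test(hex)) return null;
  return hex.match(/.{2}/g).join(":");
}
```

Normalizing to one canonical form before the lookup avoids misses caused by stickers printed with a different separator or letter case.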

6. Development Gotchas

  1. Traefik v3 is incompatible with Docker 29 due to API changes. Stay on v2.11.
  2. Never click "Generate Keys" for the Administrator user in ERPNext — it breaks the targo-hub API token (silently).
  3. Traccar API supports only one deviceId per request. Use parallel polling (Promise.allSettled) — see services/targo-hub/lib/traccar.js.
  4. Docker log rotation is set globally via /etc/docker/daemon.json (max-size=100m, max-file=3). It is applied at container creation, so old containers keep their previous (uncapped) policy until you recreate them with docker compose up -d --force-recreate. We learned this the hard way when the Oktopus broker filled /var/sdb with 75 GB of debug logs in 13 days.
  5. Weekly prune runs via /etc/cron.d/docker-prune on Sundays at 03:00 ET and clears anything not used in 30 days. If you add a stack that you only run monthly, give it restart: always, or it will be pruned.
  6. PostgreSQL transaction-aborted errors in the backend log are usually benign (one bad query in the Frappe scheduler). If they persist, the connection pool needs a recycle; docker restart erpnext-backend-1 resolves it.
  7. Authentik recovery flow isn't configured on the brand. Don't use recovery_email/ from the API — use the hub invite flow described in §4 instead.
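
The parallel polling mentioned in gotcha 3 can be sketched as below. This is not the code in services/targo-hub/lib/traccar.js: `pollPositions` is a hypothetical name, and `fetchPosition` stands for whatever function performs one Traccar request for one deviceId.

```javascript
// Poll every device in parallel; one failing device does not sink the rest.
async function pollPositions(deviceIds, fetchPosition) {
  // The async wrapper turns synchronous throws into rejections as well.
  const settled = await Promise.allSettled(
    deviceIds.map(async (id) => fetchPosition(id)),
  );

  const positions = {};
  const failures = {};
  settled.forEach((outcome, i) => {
    const id = deviceIds[i];
    if (outcome.status === "fulfilled") positions[id] = outcome.value;
    else failures[id] = String(outcome.reason);
  });
  return { positions, failures };
}
```

`Promise.allSettled` is preferred over `Promise.all` here because a single unreachable tracker would otherwise reject the whole batch and hide the positions that did come back.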