Three places in last week's docs refresh got the DocuSeal URL wrong; it must have been a copy-paste glitch, since other parts of the same docs (roadmap.md, module-interactions.md, archive snapshots) had it right. Verified against the deployed Traefik labels:
traefik.http.routers.docuseal.rule = Host(`sign.gigafibre.ca`)
`docs.gigafibre.ca` doesn't even resolve in DNS.
Fixed in:
• README.md (services table)
• docs/README.md (services tree)
• docs/architecture/overview.md (infra ASCII diagram)
Gigafibre FSM — Ecosystem Architecture
Unified reference document for infrastructure, platform strategy, and application architecture on the remote Docker environment.
1. Executive Summary & Platform Strategy
Gigafibre FSM is the operations platform for Gigafibre. It replaces a
legacy PHP/MariaDB stack with a real-time push ecosystem (Vue 3,
Node.js, ERPNext) running on a single Proxmox VM at 96.125.196.67.
Core pillars:
- ERPNext v16 — undisputed Source of Truth (CRM, billing, ticketing).
- Ops SPA at erp.gigafibre.ca/ops/: single pane of glass for internal teams (dispatch, clients, settings, agent flows).
- targo-hub at msg.gigafibre.ca: real-time API gateway (SMS, SSE, AI, OAuth admin, Stripe webhooks, Traccar proxy).
- Client portal at client.gigafibre.ca: customer self-service.
Decommissioned (May 2026):
- ✗ Oktopus CE (TR-369 stack at oss.gigafibre.ca): broker spammed 75 GB of debug logs over 13 days, took ERPNext down for 4. Stack removed (containers + volumes + images). The hub gates the integration behind OKTOPUS_DISABLED=1 so the modules can be re-enabled later if we deploy a different USP controller.
- ✗ dispatch-app (legacy PHP SPA at dispatch.gigafibre.ca): now 301-redirects to /ops/#/dispatch. nginx config at /opt/dispatch-app/nginx.conf on the prod box.
- ✗ apps/field: replaced by the lightweight mobile tech page at /t/{token} (server-rendered by services/targo-hub/lib/tech-mobile.js).
Two Authentik instances, in parallel — not a migration:
- auth.targo.ca (staff): protects /ops/, n8n, Gitea; OAuth provider for ERPNext sign-in.
- id.gigafibre.ca (clients): protects the customer portal.
2. Infrastructure & Docker Networks
All services are containerized and housed on a single Proxmox VM (96.125.196.67), managed via Traefik.
Internet
│
96.125.196.67 (Proxmox VM, Ubuntu 24.04)
│
├─ Traefik v2.11 (:80/:443, Let's Encrypt, ForwardAuth)
│
├─ Authentik (auth.targo.ca) → SSO for staff (ops, n8n, Gitea, ERPNext OAuth)
├─ Authentik (id.gigafibre.ca) → SSO for client portal
│
├─ ERPNext v16.10.1 (erp.gigafibre.ca) → 9 containers (db, redis, backend, queues, scheduler, websocket, n8n, n8n-proxy)
│
├─ Ops SPA (erp.gigafibre.ca/ops/) → Served via nginx:alpine from /opt/ops-app/
├─ Dispatch redirect (dispatch.gigafibre.ca) → 301 → /ops/#/dispatch (former dispatch-app, decommissioned)
│
├─ targo-hub (msg.gigafibre.ca) → Node 20, /opt/targo-hub/
├─ DocuSeal (sign.gigafibre.ca) → Contract e-signature
├─ traccar-proxy → nginx relay for Traccar UI
│
└─ Marketing site (www.gigafibre.ca) → React/Vite/Tailwind
DNS Configuration (Cloudflare):
- Domain gigafibre.ca is strictly DNS-only (no Cloudflare proxy) to allow Traefik Let's Encrypt generation.
- Email via Mailjet + Google Workspace records configured on root.
Docker Networks:
- proxy: Public-facing network connected to Traefik.
- erpnext_erpnext: Internal network for Frappe, Postgres, Redis, and targo-hub routing.
3. Core Services
ERPNext (The Backend)
- Database: PostgreSQL (erpnext-db-1).
- Extensions: Custom doctypes for Dispatch Job, Technician, Tag, Service Location, Service Equipment, Subscription.
- API Token Auth: targo-hub and the Ops PWA interact with Frappe via a highly-privileged service token (Authorization: token ...).
Targo-Hub (API Gateway)
- Stack: Node.js 20 (msg.gigafibre.ca:3300).
- Purpose: Acts as the middleman for all heavy or real-time workflows out of ERPNext's scope.
- Key Abilities:
- Real-time Server-Sent Events (SSE) for timeline/chat updates.
- Twilio SMS / Voice (IVR) routing.
- Modem polling (GenieACS, OLT SNMP proxy).
- Webhooks handling (Stripe payments, Uptime-Kuma, 3CX).
Modem-Bridge
- Stack: Playwright/Chromium (:3301 internal).
- Purpose: Allows reading encrypted TR-181 parameters from TP-Link XX230v modems by leveraging the modem's native JS cryptography. Exposes a simple JSON REST API locally to targo-hub.
Vision / OCR (Gemini via targo-hub)
- Model: Gemini 2.5 Flash (Google) — no local GPU, all inference remote.
- Endpoints (hub): /vision/barcodes, /vision/equipment, /vision/invoice.
- Why centralized: ops VM has no GPU, so the legacy Ollama llama3.2-vision install was retired. All three frontends (ops, field-as-ops/j, future client portal) hit the hub, which enforces JSON responseSchema per endpoint.
- Client-side resilience: barcode scans use an 8s timeout + IndexedDB retry queue so techs in weak-LTE zones don't lose data. See ../features/vision-ocr.md for the full pipeline.
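The client-side resilience pattern above (timeout, then park the scan for retry) could look roughly like the sketch below. It is not the actual Ops code: an in-memory array stands in for the IndexedDB store, and createScanClient, submitScan, flushQueue and the JSON payload shape are hypothetical names chosen for illustration. Only the /vision/barcodes path and the 8 s timeout come from this document.

```javascript
// Hypothetical sketch: 8 s timeout plus a retry queue for barcode scans.
// An in-memory array stands in for the IndexedDB store used in the browser.
const SCAN_TIMEOUT_MS = 8000;

function createScanClient({
  fetchImpl = fetch,
  baseUrl = "https://msg.gigafibre.ca",
  timeoutMs = SCAN_TIMEOUT_MS,
} = {}) {
  const queue = []; // scans waiting for a retry, oldest first

  async function post(scan) {
    const res = await fetchImpl(`${baseUrl}/vision/barcodes`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(scan), // payload shape is an assumption
      signal: AbortSignal.timeout(timeoutMs), // aborts the request after timeoutMs
    });
    if (!res.ok) throw new Error(`hub responded ${res.status}`);
    return res.json();
  }

  // Try once; on timeout or network error, park the scan for a later retry.
  async function submitScan(scan) {
    try {
      return { status: "sent", result: await post(scan) };
    } catch (err) {
      queue.push(scan);
      return { status: "queued", reason: err.message };
    }
  }

  // Replay queued scans in order; stop at the first failure to keep ordering.
  async function flushQueue() {
    let sent = 0;
    while (queue.length > 0) {
      try {
        await post(queue[0]);
        queue.shift();
        sent += 1;
      } catch {
        break;
      }
    }
    return { sent, remaining: queue.length };
  }

  return { submitScan, flushQueue, pending: () => queue.length };
}
```

A caller would typically invoke flushQueue() when connectivity returns (for example on the browser's online event); that trigger is also an assumption.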
4. Security & Authentication Flow
Staff user → erp.gigafibre.ca/ops/ (or n8n, Gitea)
→ Traefik checks session via ForwardAuth middleware
→ Outpost validates with Authentik staff (auth.targo.ca)
→ Authorized? Request forwarded to upstream container
with X-Authentik-Email + X-Authentik-Groups headers
→ Ops SPA reads X-Authentik-Email; useUserGroups maps groups
to in-app capabilities
Customer user → client.gigafibre.ca
→ Traefik checks session via separate ForwardAuth chain
→ Outpost validates with Authentik client (id.gigafibre.ca)
Two distinct ForwardAuth middlewares:
- authentik@file → backed by auth.targo.ca (staff)
- authentik-client@file → backed by id.gigafibre.ca (customers)
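To make the last step of the staff flow concrete, here is a minimal sketch of turning the forwarded headers into in-app capabilities. It is not the real useUserGroups implementation: the group names, capability names and the capabilitiesFor helper are hypothetical, and it assumes the outpost sends groups as a single pipe-separated header value.

```javascript
// Hypothetical group-to-capability table; the real mapping lives in the Ops SPA.
const CAPABILITIES_BY_GROUP = new Map([
  ["ops-admin", ["dispatch", "clients", "settings"]],
  ["ops-dispatch", ["dispatch"]],
]);

// Assumption: X-Authentik-Groups arrives as one value such as "group-a|group-b".
function parseGroups(headerValue) {
  return (headerValue ?? "")
    .split("|")
    .map((g) => g.trim())
    .filter(Boolean);
}

// Header keys are expected lowercased, as Node's http module delivers them.
function capabilitiesFor(headers) {
  const caps = new Set();
  for (const group of parseGroups(headers["x-authentik-groups"])) {
    for (const cap of CAPABILITIES_BY_GROUP.get(group) ?? []) caps.add(cap);
  }
  return {
    email: headers["x-authentik-email"] ?? null,
    capabilities: [...caps].sort(),
  };
}
```

Unknown groups are ignored rather than rejected, so adding a group in Authentik before the app knows about it does not lock anyone out; whether the real app behaves this way is not stated in this document.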
ERPNext OAuth — auth.targo.ca is also configured as a Frappe
Social Login Key (provider name Authentik). The login page at
/login shows both the password form and the "Login with Authentik"
button. OAuth client_id P0rFFdq2hhun7hOLwkF5zm87vvDqcVYAhLtoZnFX,
redirect_uri /api/method/frappe.integrations.oauth2_logins.custom/authentik.
Adding new users is centralized through the hub, not the Authentik
admin UI. The ops Settings page (Settings → Utilisateurs → Inviter)
hits POST /auth/users on msg.gigafibre.ca which:
- Creates the Authentik user (random username from local-part of email, password set explicitly), assigns OPS_GROUPS.
- Sets a temp password (readable, no look-alikes) and emails it via the hub's Mailjet SMTP. Authentik's own recovery flow isn't wired (flow_recovery=None on the brand) and its global SMTP is unset, so the hub does it directly.
- Creates the matching ERPNext User (System User, social_logins = [{provider:authentik, userid:email}]) so OAuth finds it on first login.
The temp password is also returned to the admin (UI shows it with a
copy button) so they can hand it over manually if Mailjet drops the
message. See services/targo-hub/lib/auth.js for the full flow.
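As an illustration of a "readable, no look-alikes" temp password, the sketch below draws characters from an alphabet that omits 0/O, 1/l/I and lowercase o. The alphabet, the default length of 12 and the name generateTempPassword are assumptions; the hub's actual generator is in services/targo-hub/lib/auth.js and may differ.

```javascript
const { randomInt } = require("node:crypto");

// Assumed alphabet: letters and digits minus look-alikes (0, O, o, 1, l, I).
const ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnpqrstuvwxyz23456789";

// Builds a password of the requested length from a CSPRNG-backed index.
function generateTempPassword(length = 12) {
  let out = "";
  for (let i = 0; i < length; i += 1) {
    out += ALPHABET[randomInt(ALPHABET.length)]; // randomInt(max) returns 0..max-1
  }
  return out;
}
```

Using crypto.randomInt rather than Math.random keeps the choice unbiased and unpredictable, which matters because the password is a real credential until the user changes it.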
API Security: frontends rely on the Authentik session cookie
forwarded by Traefik. Backend scripts and the hub send an
Authorization: token <ERP_SERVICE_TOKEN> header (Frappe token auth, not a Bearer token).
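For backend scripts, a request against the Frappe REST API with the service token might be assembled as in this sketch. The frappeRequest helper is hypothetical, and it assumes the token has Frappe's usual api_key:api_secret form; the doctype name is one of the custom doctypes listed in §3.

```javascript
// Hypothetical helper: builds the URL and headers for a Frappe resource call.
// Assumes `token` is the "api_key:api_secret" pair Frappe token auth expects.
function frappeRequest(doctype, token, baseUrl = "https://erp.gigafibre.ca") {
  return {
    url: `${baseUrl}/api/resource/${encodeURIComponent(doctype)}`,
    headers: {
      Authorization: `token ${token}`, // scheme is "token", not "Bearer"
      Accept: "application/json",
    },
  };
}

// Usage (token read from the environment, never hard-coded):
// const { url, headers } = frappeRequest("Dispatch Job", process.env.ERP_SERVICE_TOKEN);
// const jobs = await fetch(url, { headers }).then((r) => r.json());
```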
5. Network Intelligence & CPE Flow
Device Diagnostics (targo-hub → GenieACS / OLT)
When a CSR clicks "Diagnostiquer" in the Ops app:
- Ops app asks /devices/lookup?serial=X.
- targo-hub polls GenieACS NBI.
- If deep data is needed, targo-hub queries modem-bridge (for TP-Link) or the OLT SNMP directly.
- Returns consolidated interface, mesh, wifi, and opticalStatus array to the UI.
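The lookup steps above can be sketched as a single orchestration function. This is only an outline under stated assumptions: the genieacs, modemBridge and oltSnmp client objects, their method names, the vendor field and the deep option are all hypothetical stand-ins, not the hub's real modules.

```javascript
// All three clients are hypothetical and injected by the caller.
async function lookupDevice(serial, { genieacs, modemBridge, oltSnmp }, { deep = false } = {}) {
  const base = await genieacs.findBySerial(serial); // step 2: GenieACS NBI
  if (!base) return null; // unknown serial

  let extra = {};
  if (deep) {
    // step 3: TP-Link goes through modem-bridge, everything else via OLT SNMP
    extra = base.vendor === "TP-Link"
      ? await modemBridge.read(serial)
      : await oltSnmp.read(serial);
  }

  // step 4: consolidated payload for the UI
  return {
    serial,
    interfaces: extra.interfaces ?? base.interfaces ?? [],
    mesh: extra.mesh ?? [],
    wifi: extra.wifi ?? base.wifi ?? [],
    opticalStatus: extra.opticalStatus ?? [],
  };
}
```

Injecting the clients keeps the function testable without a live GenieACS or OLT.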
Future: QR Code Flow
- Tech applies QR sticker to modem (msg.gigafibre.ca/q/{mac}).
- Client scans QR → targo-hub identifies customer via MAC matching in ERPNext.
- Triggers SMS OTP → Client views diagnostic portal.
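Since this flow is not built yet, the following is purely illustrative: one way the hub could extract and normalize the MAC from a /q/{mac} path before matching it in ERPNext. The macFromQrPath helper and the canonical AA:BB:CC:DD:EE:FF output format are assumptions.

```javascript
// Hypothetical helper: pulls the MAC out of "/q/{mac}" and normalizes it.
// Accepts colon, dash or bare-hex forms; returns null for anything else.
function macFromQrPath(pathname) {
  const match = /^\/q\/([^/]+)$/.exec(pathname);
  if (!match) return null;

  let raw;
  try {
    raw = decodeURIComponent(match[1]); // handles %3A-encoded colons
  } catch {
    return null; // malformed percent-encoding
  }

  const hex = raw.replace(/[:\-.]/g, "").toUpperCase();
  if (!/^[0-9A-F]{12}$/.test(hex)) return null;
  return hex.match(/.{2}/g).join(":");
}
```

Normalizing to one format before the ERPNext lookup avoids misses caused by stickers printed with a different separator.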
6. Development Gotchas
- Traefik v3 is incompatible with Docker 29 due to API changes. Stay on v2.11.
- Never click "Generate Keys" for the Administrator user in ERPNext: it breaks the targo-hub API token (silently).
- Traccar API supports only one deviceId per request. Use parallel polling (Promise.allSettled); see services/targo-hub/lib/traccar.js.
- Docker log rotation is set globally via /etc/docker/daemon.json (max-size=100m, max-file=3). Applied at container creation: old containers keep their previous (uncapped) policy until you compose up -d --force-recreate them. We learned this the hard way when the Oktopus broker filled /var/sdb with 75 GB of debug logs in 13 days.
- Weekly prune runs via /etc/cron.d/docker-prune Sunday 03:00 ET and clears anything not used in 30 days. Don't add a stack you only run monthly without restart: always or it'll get pruned out.
- PostgreSQL transaction-aborted errors in the backend log are usually benign (one bad query in the Frappe scheduler), but if persistent, it's the connection pool needing a recycle. docker restart erpnext-backend-1 resolves.
- Authentik recovery flow isn't configured on the brand. Don't use recovery_email/ from the API; use the hub invite flow described in §4 instead.
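For the Traccar gotcha in the list above, parallel polling with Promise.allSettled might look like the sketch below. It is not the code in services/targo-hub/lib/traccar.js: pollDevices is a hypothetical name, and fetchPositions(deviceId) is a caller-supplied async function that performs one Traccar request for a single device.

```javascript
// fetchPositions(deviceId) is a hypothetical async helper: one request per device.
async function pollDevices(deviceIds, fetchPositions) {
  // Fire all requests at once; allSettled never rejects, so one slow or
  // failing device cannot sink the whole batch.
  const settled = await Promise.allSettled(deviceIds.map((id) => fetchPositions(id)));

  const positions = {};
  const failed = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      positions[deviceIds[i]] = outcome.value;
    } else {
      failed.push({ deviceId: deviceIds[i], error: String(outcome.reason) });
    }
  });
  return { positions, failed };
}
```

Promise.all would reject the whole batch on the first failed device, which is why allSettled is the better fit here.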