Mass refresh — the docs were last touched 2026-04-22, two weeks behind
shipped reality. This commit updates 9 files to reflect current truth.
WHAT CHANGED IN THE PRODUCT (since 22 Apr) THAT THE DOCS NOW REFLECT:
• Oktopus CE / TR-369 stack decommissioned (containers + volumes +
images all removed; broker had filled /dev/sdb with 75 GB of debug
logs and took ERPNext down for 4 days). Hub gates the integration
behind OKTOPUS_DISABLED=1 — modules retained, no-op'd at runtime.
• dispatch.gigafibre.ca (legacy PHP SPA) replaced by an nginx 301
redirect to /ops/#/dispatch.
• Top toolbar of the dispatch module: collapsed to single-color
Lucide icons + ⋯ overflow menu + "Vue principale ▾" + "[👥 N ▾]"
resource type chip (defaults to techs, materials in the dropdown
only when relevant).
• Tech home base / departure point: editable per-tech via 📍 button,
address geocode (Nominatim) or click-on-map picker, right-click
on tech pin opens the same actions. Map defaults centered on
Gigafibre HQ (1867 chemin de la Rivière, Sainte-Clotilde) instead
of downtown Montreal.
• POST /auth/users invite flow on the hub: creates the Authentik
user, sets a temp password, mails it via Mailjet (Authentik's
own recovery flow isn't configured), creates the matching ERPNext
System User. Surfaced in ops Settings → Utilisateurs → Inviter.
• Two Authentik instances clarified as parallel-and-permanent (not
a migration): auth.targo.ca for staff, id.gigafibre.ca for clients.
FILES TOUCHED:
README.md — service table refreshed, arch diagram redrawn (no
Oktopus row), auth section explains the invite flow + two
parallel instances.
docs/architecture/overview.md — new "Decommissioned" section,
correct retirement status for dispatch-app + apps/field, two
Authentik instances explicitly distinguished, dev-gotchas list
rewritten (drops MongoDB AVX, adds log-rotation hard-learned
lesson, adds note about Authentik recovery flow).
docs/architecture/data-model.md — Step 5 hardware provisioning
now describes the GenieACS path (TR-069 Inform → preset push)
instead of the dead TR-369 path.
docs/architecture/module-interactions.md — oktopus.js and
oktopus-mqtt.js entries marked as gated, provision.js note
updated, GenieACS row in external-integrations updated, MQTT
row removed from real-time channels, interaction matrix loses
the Oktopus column and gains an Authentik admin REST cell.
docs/features/dispatch.md — Top bar section completely rewritten
to match the current chrome (left/center/right regions,
single-color Lucide, dropdowns); new Tech home base section
documenting the 📍 + map-pick + right-click flows; retirement
note now reads as a status, not a plan.
docs/features/cpe-management.md — full rewrite. Oktopus migration
plan replaced by a "decommissioned" note + the existing GenieACS
+ modem-bridge architecture as the steady state. TP-Link XX230v
deep-dive sections preserved (still accurate).
docs/README.md, docs/features/README.md, docs/roadmap.md —
intent-table descriptions and live-URLs table corrected.
The docs/archive/ snapshots (2026-04-18, 2026-04-19) are untouched —
they're historical and should remain that way.
Gigafibre FSM — CPE Hardware Management
Managing the customer-premises equipment fleet (ONTs, routers, mesh nodes). Covers TR-069 (GenieACS), the modem-bridge for deep TP-Link diagnostics, and the diagnostic-swap workflow.
1. Protocol & Tooling Stack
The current ACS is GenieACS (TR-069 / CWMP — HTTP/SOAP polling). It's external to the prod box (separate VM managed by the network team). The hub talks to it via the GenieACS NBI (Northbound Interface) on the internal network.
About TR-369 / USP — we ran a parallel Oktopus CE deployment for
USP (TR-369 over MQTT/WebSocket) but decommissioned it in May 2026
after the broker filled the disk with debug spam. The integration
modules in the hub (lib/oktopus.js, lib/oktopus-mqtt.js) remain
in the tree gated behind OKTOPUS_DISABLED=1 so we can re-enable them
later if we settle on a different USP controller. For now, all CPE
management goes through TR-069 + GenieACS.
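A minimal sketch of what that runtime gate could look like. Only the OKTOPUS_DISABLED variable name comes from this doc; the helper names (isOktopusDisabled, gated) and the wrapped pushUspConfig call are hypothetical and do not describe the actual contents of lib/oktopus.js.

```javascript
// Sketch only: the real lib/oktopus.js may be structured differently.

// True when the integration should be a no-op at runtime.
function isOktopusDisabled(env = process.env) {
  return env.OKTOPUS_DISABLED === '1';
}

// Wraps an async integration function so that, when gated, it resolves to a
// "skipped" marker instead of contacting the decommissioned USP controller.
function gated(fn, env = process.env) {
  return async (...args) => {
    if (isOktopusDisabled(env)) {
      return { skipped: true, reason: 'OKTOPUS_DISABLED=1' };
    }
    return fn(...args);
  };
}

// Hypothetical usage: pushUspConfig stands in for any Oktopus-facing call.
const pushUspConfig = gated(
  async (serial) => ({ skipped: false, serial }),
  { OKTOPUS_DISABLED: '1' },
);
```

Wrapping at the call boundary keeps the modules importable, so callers such as provision.js would not need their own checks.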
Polling cadence — GenieACS receives an Inform from each ONT every ~5 minutes (configurable in the device firmware). For real-time deep dives that can't wait for the next Inform window, the hub falls back to:
- modem-bridge (services/modem-bridge/): Playwright-driven HTTP client that scrapes encrypted TR-181 parameters from the modem's native admin UI. Specifically targets the TP-Link XX230v / Deco mesh.
- OLT SNMP: direct SNMPv2 walk against the OLT for ONT optical status, when even the modem itself is unreachable.
2. TP-Link XX230v (Deco XE75) Deep Diagnostics
The XX230v exposes a rich TR-181 data model. When customers report "WiFi issues", CSRs and techs should not blindly swap the hardware. Poll these parameters first to find the actual root cause.
A. Optical Signal (Is it the Fibre?)
Device.Optical.Interface.1.Stats.SignalRxPower → target: -8 to -25 dBm
Device.Optical.Interface.1.Stats.ErrorsSent
Diagnosis — RxPower < -25 dBm = dirty connector or fibre break. Not an ONT hardware fault. Don't swap; dispatch a fibre tech.
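The rule above can be expressed as a small classifier. This is a sketch, not hub code: the function name and the status/action labels are hypothetical, and the handling of readings above -8 dBm is an assumption, since this doc only states the target range.

```javascript
// Target range from this doc: -8 to -25 dBm.
const RX_MAX_DBM = -8;
const RX_MIN_DBM = -25;

// Maps a SignalRxPower reading (in dBm) to a suggested next step.
function classifyRxPower(rxPowerDbm) {
  if (typeof rxPowerDbm !== 'number' || Number.isNaN(rxPowerDbm)) {
    return { status: 'unknown', action: 'retry-read' };
  }
  if (rxPowerDbm < RX_MIN_DBM) {
    // Dirty connector or fibre break: not an ONT fault, so no swap.
    return { status: 'low-signal', action: 'dispatch-fibre-tech' };
  }
  if (rxPowerDbm > RX_MAX_DBM) {
    // Assumption: the doc does not say how to treat readings above target.
    return { status: 'above-target', action: 'review' };
  }
  return { status: 'ok', action: 'none' };
}
```

For example, classifyRxPower(-27.3) suggests dispatching a fibre tech, while a reading of exactly -25 still counts as in range because the rule is a strict less-than.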
B. WiFi Radio & Topology (Is it Interference?)
Device.WiFi.Radio.1.Stats.Noise → 2.4 GHz interference
Device.WiFi.Radio.2.Stats.Noise → 5 GHz interference
Device.WiFi.MultiAP.APDevice.{i}.Radio.{j}.Utilization → mesh backhaul load
Diagnosis — high noise/errors on a band = environmental channel congestion (neighbours, microwave, baby monitor). High backhaul utilization on a satellite Deco = customer needs to move it closer to the main unit.
C. Live Speed Test (Is it the Client Device?)
Device.IP.Diagnostics.DownloadDiagnostics.DiagnosticsState = "Requested"
Diagnosis — kicks off a server-to-ONT speed test, which eliminates WiFi-side latency variables. ONT speed test fast + customer's iPhone slow → the iPhone or the WiFi link is the bottleneck, not the line.
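One way to trigger this from the hub is a setParameterValues task on the GenieACS NBI. The sketch below only builds the request and does not send it. The base URL, device ID, and test-file URL are hypothetical; setting DownloadURL alongside DiagnosticsState follows the TR-143 download test, and whether the XX230v accepts both in a single task is an assumption.

```javascript
const DIAG = 'Device.IP.Diagnostics.DownloadDiagnostics';

// Builds (but does not send) a GenieACS NBI request that queues a
// setParameterValues task and asks for an immediate connection request.
function buildDownloadDiagnosticsRequest(nbiBaseUrl, deviceId, downloadUrl) {
  return {
    method: 'POST',
    url: `${nbiBaseUrl}/devices/${encodeURIComponent(deviceId)}/tasks?connection_request`,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      name: 'setParameterValues',
      parameterValues: [
        [`${DIAG}.DownloadURL`, downloadUrl, 'xsd:string'],
        [`${DIAG}.DiagnosticsState`, 'Requested', 'xsd:string'],
      ],
    }),
  };
}

// Hypothetical values, for illustration only.
const req = buildDownloadDiagnosticsRequest(
  'http://genieacs.internal:7557',
  'EXAMPLE-OUI-XX230v-SERIAL',
  'http://speedtest.example/100MB.bin',
);
```

The returned object can be handed to any HTTP client. The result parameters are then read back once the device reports the diagnostic as complete.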
3. The "Diagnostic Swap" Workflow
Techs sometimes swap equipment simply because they aren't sure which component is defective. With a binary Défectueux status, working units end up flagged as defective, which creates inventory chaos.
We use a 3-way diagnostic status instead of a binary Défectueux:
- Remplacement définitif — the equipment is dead. (Old → Défectueux, New → Actif)
- Swap diagnostic — swapping to test if the problem resolves. (Old → En diagnostic, New → Actif (temporary))
- Retour de diagnostic — the old unit was actually fine. (Old → Actif (returned to use), Test unit → Retourné)
If a tech chooses Swap diagnostic, an ERPNext Task is automatically
generated scheduling a follow-through test on the removed hardware
within 7 days. If the unit tests fine at the warehouse, it goes back
to En inventaire instead of being trashed.
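The three outcomes and the 7-day follow-up can be summarised as a lookup table. This is an illustrative sketch: the status strings come from the list above, but planSwap, its field names, and the date format are hypothetical and do not reflect how the ERPNext Task is actually created.

```javascript
// Status transitions per swap type, as listed above.
// "newUnit" is the unit that went in (the test unit, for a diagnostic return).
const SWAP_TRANSITIONS = {
  'Remplacement définitif': { oldUnit: 'Défectueux', newUnit: 'Actif', followUp: false },
  'Swap diagnostic': { oldUnit: 'En diagnostic', newUnit: 'Actif', followUp: true },
  'Retour de diagnostic': { oldUnit: 'Actif', newUnit: 'Retourné', followUp: false },
};

const FOLLOW_UP_DAYS = 7;

// Returns the target statuses and, for a diagnostic swap, the follow-up
// due date (YYYY-MM-DD, UTC) for the test of the removed hardware.
function planSwap(kind, now = new Date()) {
  const transition = SWAP_TRANSITIONS[kind];
  if (!transition) throw new Error(`Unknown swap type: ${kind}`);

  const plan = { oldUnit: transition.oldUnit, newUnit: transition.newUnit, followUpDue: null };
  if (transition.followUp) {
    const due = new Date(now.getTime());
    due.setUTCDate(due.getUTCDate() + FOLLOW_UP_DAYS);
    plan.followUpDue = due.toISOString().slice(0, 10);
  }
  return plan;
}
```

Keeping the transitions in one table makes it harder for the UI and the follow-up task logic to disagree about which status a swap produces.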
4. Hub Endpoints (/devices/*)
The ops "Diagnostiquer" button on a customer or equipment row hits the hub, which orchestrates GenieACS / OLT / modem-bridge in the right order:
- /devices/lookup?serial=X: finds the ONT by serial in GenieACS.
- The hub returns the latest Inform snapshot (interface, mesh, wifi, opticalStatus). If older than 60s, it kicks an immediate Refresh task to GenieACS.
- For TP-Link models specifically, deeper params (encrypted TR-181) are fetched via modem-bridge if needed.
- Final response is a consolidated JSON consumed by apps/ops/src/components/equipment/EquipmentDiagnostic.vue.
See module-interactions.md for the full call graph.
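The ordering of this flow can be sketched with injected dependencies, so it runs without a live GenieACS or modem-bridge. All names here (diagnose, lookup, refresh, fetchDeep, and the device fields) are hypothetical. Only the 60-second staleness threshold and the order of the steps come from this section, and the sketch simplifies "if needed" to "whenever the device is a TP-Link".

```javascript
const STALE_AFTER_MS = 60_000; // "older than 60s" from the flow above

// deps.lookup(serial)    -> device record or null (GenieACS lookup)
// deps.refresh(deviceId) -> queues an immediate refresh in GenieACS
// deps.fetchDeep(device) -> modem-bridge read of encrypted TR-181 params
async function diagnose(serial, deps, now = Date.now()) {
  const device = await deps.lookup(serial);
  if (!device) return { found: false, serial };

  // Refresh only when the last Inform snapshot is stale.
  let refreshed = false;
  if (now - device.lastInformMs > STALE_AFTER_MS) {
    await deps.refresh(device.id);
    refreshed = true;
  }

  // Simplification: always go deep for TP-Link models.
  const deep = device.isTpLink ? await deps.fetchDeep(device) : null;

  // Consolidated result for the UI to consume.
  return { found: true, serial, refreshed, snapshot: device.snapshot, deep };
}
```

Passing the three collaborators in as functions keeps the sequencing testable in isolation from the network.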
5. Cross-references
- ../architecture/module-interactions.md — hub call graph for device flows.
- vision-ocr.md — barcode scan during install (ONT serial, MAC) feeds back into Service Equipment.
- dispatch.md §5.6 — equipment install/swap from the tech mobile page.