Mass refresh — the docs were last touched 2026-04-22, two weeks behind
shipped reality. This commit updates 9 files to reflect current truth.

WHAT CHANGED IN THE PRODUCT (since 22 Apr) THAT THE DOCS NOW REFLECT:
• Oktopus CE / TR-369 stack decommissioned (containers + volumes +
images all removed; broker had filled /dev/sdb with 75 GB of debug
logs and took ERPNext down for 4 days). Hub gates the integration
behind OKTOPUS_DISABLED=1 — modules retained, no-op'd at runtime.
• dispatch.gigafibre.ca (legacy PHP SPA) replaced by an nginx 301
redirect to /ops/#/dispatch.
• Top toolbar of the dispatch module: collapsed to single-color
Lucide icons + ⋯ overflow menu + "Vue principale ▾" + "[👥 N ▾]"
resource type chip (defaults to techs, materials in the dropdown
only when relevant).
• Tech home base / departure point: editable per-tech via 📍 button,
address geocode (Nominatim) or click-on-map picker, right-click
on tech pin opens the same actions. Map is now centered by default on
Gigafibre HQ (1867 chemin de la Rivière, Sainte-Clotilde) instead
of downtown Montreal.
• POST /auth/users invite flow on the hub: creates the Authentik
user, sets a temp password, mails it via Mailjet (Authentik's
own recovery flow isn't configured), creates the matching ERPNext
System User. Surfaced in ops Settings → Utilisateurs → Inviter.
• Two Authentik instances clarified as parallel-and-permanent (not
a migration): auth.targo.ca for staff, id.gigafibre.ca for clients.

FILES TOUCHED:
README.md — service table refreshed, arch diagram redrawn (no
Oktopus row), auth section explains the invite flow + two
parallel instances.
docs/architecture/overview.md — new "Decommissioned" section,
correct retirement status for dispatch-app + apps/field, two
Authentik instances explicitly distinguished, dev-gotchas list
rewritten (drops MongoDB AVX, adds the hard-learned log-rotation
lesson, adds a note about the Authentik recovery flow).
docs/architecture/data-model.md — Step 5 hardware provisioning
now describes the GenieACS path (TR-069 Inform → preset push)
instead of the dead TR-369 path.
docs/architecture/module-interactions.md — oktopus.js and
oktopus-mqtt.js entries marked as gated, provision.js note
updated, GenieACS row in external-integrations updated, MQTT
row removed from real-time channels, interaction matrix loses
the Oktopus column and gains an Authentik admin REST cell.
docs/features/dispatch.md — Top bar section completely rewritten
to match the current chrome (left/center/right regions,
single-color Lucide, dropdowns); new Tech home base section
documenting the 📍 + map-pick + right-click flows; retirement
note now reads as a status, not a plan.
docs/features/cpe-management.md — full rewrite. Oktopus migration
plan replaced by a "decommissioned" note + the existing GenieACS
+ modem-bridge architecture as the steady state. TP-Link XX230v
deep-dive sections preserved (still accurate).
docs/README.md, docs/features/README.md, docs/roadmap.md —
intent-table descriptions and live-URLs table corrected.

The docs/archive/ snapshots (2026-04-18, 2026-04-19) are untouched;
they're historical and should remain that way.
# Gigafibre FSM — CPE Hardware Management

> Managing the customer-premises equipment fleet (ONTs, routers, mesh
> nodes). Covers TR-069 (GenieACS), the modem-bridge for deep TP-Link
> diagnostics, and the diagnostic-swap workflow.

---

## 1. Protocol & Tooling Stack

The current ACS is **GenieACS** (TR-069 / CWMP, HTTP/SOAP polling).
It's external to the prod box (separate VM managed by the network
team). The hub talks to it via the GenieACS NBI (Northbound Interface)
on the internal network.

**About TR-369 / USP:** we ran a parallel Oktopus CE deployment for
USP (TR-369 over MQTT/WebSocket) but **decommissioned it in May 2026**
after the broker filled the disk with debug spam. The integration
modules in the hub (`lib/oktopus.js`, `lib/oktopus-mqtt.js`) remain
in the tree gated behind `OKTOPUS_DISABLED=1` so we can re-enable them
later if we settle on a different USP controller. For now, all CPE
management goes through TR-069 + GenieACS.

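
The gating described above can be pictured with a small sketch. It is written in Python purely for illustration (the real hub modules are JavaScript), and `publish_usp_message` is a hypothetical entry point, not a function taken from `lib/oktopus.js`. Only the `OKTOPUS_DISABLED=1` flag comes from the text.

```python
import os


def oktopus_enabled(env=None):
    """Return False when OKTOPUS_DISABLED=1 is set (the gate named above)."""
    env = os.environ if env is None else env
    return env.get("OKTOPUS_DISABLED") != "1"


def publish_usp_message(topic, payload, env=None):
    """Hypothetical entry point: a no-op while the gate is closed."""
    if not oktopus_enabled(env):
        # Module stays importable, but does nothing at runtime.
        return {"sent": False, "reason": "oktopus disabled"}
    # A real implementation would hand the payload to the USP controller here.
    return {"sent": True, "topic": topic, "bytes": len(payload)}
```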

**Polling cadence:** GenieACS receives an Inform from each ONT every
~5 minutes (configurable in the device firmware). For real-time deep
dives that can't wait for the next Inform window, the hub falls back
to:
- **modem-bridge** (`services/modem-bridge/`): Playwright-driven HTTP
  client that scrapes encrypted TR-181 parameters from the modem's
  native admin UI. Specifically targets the TP-Link XX230v / Deco mesh.
- **OLT SNMP**: direct SNMPv2 walk against the OLT for ONT optical
  status, when even the modem itself is unreachable.

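
A minimal sketch of that fallback order, assuming the decision depends only on whether the modem is reachable, whether it is a TP-Link unit, and whether the caller needs real-time data. `DeviceState` and `pick_source` are hypothetical names for illustration, not hub code.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    modem_reachable: bool  # can the hub reach the modem's admin UI at all?
    is_tplink: bool        # modem-bridge only targets TP-Link XX230v / Deco


def pick_source(state, need_realtime):
    """Choose the data source, following the fallback order described above."""
    if not state.modem_reachable:
        # Last resort: ask the OLT about the ONT's optical status.
        return "olt-snmp"
    if need_realtime and state.is_tplink:
        # Can't wait for the next ~5 minute Inform window.
        return "modem-bridge"
    # Default path: the latest Inform snapshot held by GenieACS.
    return "genieacs"
```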

---

## 2. TP-Link XX230v (Deco XE75) Deep Diagnostics

The XX230v exposes a rich TR-181 data model. When customers report
"WiFi issues", CSRs and techs should not blindly swap the hardware.
Poll these TR-181 parameters first to find the actual root cause.

### A. Optical Signal (Is it the Fibre?)

```text
Device.Optical.Interface.1.Stats.SignalRxPower → target: -8 to -25 dBm
Device.Optical.Interface.1.Stats.ErrorsSent
```

**Diagnosis:** RxPower < -25 dBm = dirty connector or fibre break.
**Not** an ONT hardware fault. Don't swap; dispatch a fibre tech.

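
The optical rule above reduces to a threshold check. The sketch below uses the -8 to -25 dBm window from the text. The `out-of-range-high` label for readings above -8 dBm is an assumption, since the text does not say what to do in that case.

```python
RX_MIN_DBM = -25.0  # lower bound of the target window quoted above
RX_MAX_DBM = -8.0   # upper bound of the target window quoted above


def classify_rx_power(rx_dbm):
    """Map an optical RxPower reading (dBm) to a next action."""
    if rx_dbm < RX_MIN_DBM:
        # Dirty connector or fibre break: not an ONT fault, don't swap.
        return "dispatch-fibre-tech"
    if rx_dbm > RX_MAX_DBM:
        # Assumption: the text gives no action for readings above -8 dBm.
        return "out-of-range-high"
    return "optical-ok"
```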

### B. WiFi Radio & Topology (Is it Interference?)

```text
Device.WiFi.Radio.1.Stats.Noise → 2.4 GHz interference
Device.WiFi.Radio.2.Stats.Noise → 5 GHz interference
Device.WiFi.MultiAP.APDevice.{i}.Radio.{j}.Utilization → mesh backhaul load
```

**Diagnosis:** high noise/errors on a band = environmental channel
congestion (neighbours, microwave, baby monitor). High backhaul
utilization on a satellite Deco = customer needs to move it closer
to the main unit.

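
As a rough sketch of how those readings could be turned into findings: the text gives no numeric cut-offs, so the defaults below (-80 dBm noise, 70 % utilization) are placeholder assumptions to be tuned. The sketch also assumes utilization has already been normalised to a 0-100 percentage; check the raw parameter's scale on the device.

```python
def diagnose_wifi(noise_dbm_by_band, backhaul_util_pct,
                  noise_limit_dbm=-80.0, util_limit_pct=70.0):
    """Return human-readable findings; thresholds are illustrative defaults."""
    findings = []
    for band, noise in sorted(noise_dbm_by_band.items()):
        if noise > noise_limit_dbm:
            # Noisy band: environmental congestion, not faulty hardware.
            findings.append(f"{band}: channel congestion")
    for node, util in sorted(backhaul_util_pct.items()):
        if util > util_limit_pct:
            # Saturated backhaul on a satellite unit.
            findings.append(f"{node}: move closer to main unit")
    return findings
```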

### C. Live Speed Test (Is it the Client Device?)

```text
Device.IP.Diagnostics.DownloadDiagnostics.DiagnosticsState = "Requested"
```

**Diagnosis:** setting this parameter kicks off a server-to-ONT speed
test, which takes WiFi-side variables out of the measurement. ONT speed
test fast + customer's iPhone slow → the iPhone or the WiFi link is the
bottleneck, not the line.

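
The comparison above can be written as a small decision function. This is a sketch under assumptions: "fast" and "slow" are not quantified in the text, so the 0.8 ratio is a placeholder, and `plan_mbps` (the subscribed speed) is a hypothetical input.

```python
def locate_bottleneck(ont_mbps, client_mbps, plan_mbps, ok_ratio=0.8):
    """Compare the ONT-side test against the plan, then the client against the ONT."""
    line_ok = ont_mbps >= ok_ratio * plan_mbps
    client_ok = client_mbps >= ok_ratio * ont_mbps
    if not line_ok:
        # Even the ONT can't reach the plan speed: the line is suspect.
        return "line"
    if not client_ok:
        # ONT is fast but the customer's device is slow.
        return "client-or-wifi"
    return "none"
```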

---

## 3. The "Diagnostic Swap" Workflow

A common problem occurs when techs swap equipment simply because they
aren't sure what is defective. This creates inventory chaos.

We use a **3-way diagnostic status** instead of a binary `Défectueux`:

1. **Remplacement définitif**: the equipment is dead.
   *(Old → Défectueux, New → Actif)*
2. **Swap diagnostic**: swapping to test if the problem resolves.
   *(Old → En diagnostic, New → Actif (temporary))*
3. **Retour de diagnostic**: the old unit was actually fine.
   *(Old → Actif (returned to use), Test unit → Retourné)*

If a tech chooses **Swap diagnostic**, an ERPNext Task is automatically
generated scheduling a follow-up test on the removed hardware
within 7 days. If the unit tests fine at the warehouse, it goes back
to `En inventaire` instead of being trashed.

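
The three outcomes and the 7-day follow-up can be expressed as a lookup table plus one rule. The status strings come from the list above; `apply_swap` and the shape of its return value are hypothetical and do not describe the actual ERPNext integration.

```python
from datetime import date, timedelta

# (old unit status, new/test unit status) for each outcome listed above
TRANSITIONS = {
    "Remplacement définitif": ("Défectueux", "Actif"),
    "Swap diagnostic": ("En diagnostic", "Actif"),
    "Retour de diagnostic": ("Actif", "Retourné"),
}
FOLLOW_UP_DAYS = 7  # follow-up window for a diagnostic swap


def apply_swap(outcome, today):
    """Return the resulting statuses and, for a diagnostic swap, a due date."""
    old_status, new_status = TRANSITIONS[outcome]
    result = {"old": old_status, "new": new_status, "follow_up_due": None}
    if outcome == "Swap diagnostic":
        # Only this outcome schedules a test of the removed hardware.
        result["follow_up_due"] = today + timedelta(days=FOLLOW_UP_DAYS)
    return result
```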

---

## 4. Hub Endpoints (`/devices/*`)

The ops "Diagnostiquer" button on a customer or equipment row hits the
hub, which orchestrates GenieACS / OLT / modem-bridge in the right
order:

1. `/devices/lookup?serial=X` finds the ONT by serial in GenieACS.
2. The hub returns the latest Inform snapshot (interface, mesh, wifi,
   opticalStatus). If the snapshot is older than 60s, it kicks an
   immediate `Refresh` task to GenieACS.
3. For TP-Link models specifically, deeper params (encrypted TR-181)
   are fetched via modem-bridge if needed.
4. Final response is a consolidated JSON consumed by
   `apps/ops/src/components/equipment/EquipmentDiagnostic.vue`.

See [module-interactions.md](../architecture/module-interactions.md)
for the full call graph.

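
The four steps above can be sketched as a single function. `lookup`, `refresh` and `deep_fetch` are hypothetical callables standing in for the GenieACS and modem-bridge clients, and the dictionary keys are illustrative. Only the 60-second staleness rule and the step order come from the text.

```python
STALE_AFTER_S = 60  # staleness threshold from step 2


def diagnose(serial, now, lookup, refresh, deep_fetch):
    """Consolidate a diagnostic response, following steps 1 to 4 above."""
    device = lookup(serial)  # step 1: find the ONT by serial
    if device is None:
        return {"serial": serial, "found": False}

    refresh_queued = False
    if now - device["last_inform"] > STALE_AFTER_S:  # step 2: stale snapshot?
        refresh(serial)
        refresh_queued = True

    result = {
        "serial": serial,
        "found": True,
        "snapshot": device["snapshot"],
        "refresh_queued": refresh_queued,
    }
    if device.get("vendor") == "TP-Link":  # step 3: deeper params
        result["deep"] = deep_fetch(serial)
    return result  # step 4: consolidated payload for the UI
```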

---

## 5. Cross-references

- [../architecture/module-interactions.md](../architecture/module-interactions.md): hub call graph for device flows.
- [vision-ocr.md](vision-ocr.md): barcode scan during install (ONT
  serial, MAC) feeds back into Service Equipment.
- [dispatch.md](dispatch.md) §5.6: equipment install/swap from the
  tech mobile page.