gigafibre-fsm/docs/features/cpe-management.md
louispaulb beb6ddc5e5 docs: reorganize into architecture/features/reference/archive folders
All docs moved with git mv so --follow preserves history. Flattens the
single-folder layout into goal-oriented folders and adds a README.md index
at every level.

- docs/README.md — new landing page with "I want to…" intent table
- docs/architecture/ — overview, data-model, app-design
- docs/features/ — billing-payments, cpe-management, vision-ocr, flow-editor
- docs/reference/ — erpnext-item-diff, legacy-wizard/
- docs/archive/ — HANDOFF-2026-04-18, MIGRATION, status-snapshots/
- docs/assets/ — pptx sources, build scripts (fixed hardcoded path)
- roadmap.md gains a "Modules in production" section with clickable
  URLs for every ops/tech/portal route and admin surface
- Phase 4 (Customer Portal) flipped to "Largely Shipped" based on
  audit of services/targo-hub/lib/payments.js (16 endpoints, webhook,
  PPA cron, Klarna BNPL all live)
- Archive files get an "ARCHIVED" banner so stale links inside them
  don't mislead readers

Code comments + nginx configs rewritten to use new doc paths. Root
README.md documentation table replaced with intent-oriented index.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 11:51:33 -04:00


# Gigafibre FSM — CPE Hardware Management
> A consolidated guide for managing Customer Premises Equipment (CPE) fleets. This document covers TR-069/TR-369 protocols, ACS migration strategies, and deep hardware diagnostics (specifically the TP-Link XX230v / Deco).
---
## 1. Protocol Strategy (TR-069 to TR-369)
We are gradually migrating our management plane from **GenieACS** (TR-069, HTTP/SOAP polling) to **Oktopus CE** (TR-369, real-time USP over MQTT/WebSocket).
**The goal is bidirectional, real-time device management.** With TR-069, the ACS typically has to wait for the CPE's next periodic Inform (often hours away) before it can execute a reboot or read parameters. With TR-369, the device keeps a persistent connection open, so commands are delivered immediately.
### Migration Phases
1. **Parallel Run:** Oktopus is deployed at `oss.gigafibre.ca`. It has a TR-069 adapter, allowing it to natively accept legacy CWMP connections.
2. **Translation:** Manually port the existing GenieACS JavaScript provision scripts to the Oktopus event subscriptions and policy webhook engine.
3. **Migration:** Update the `Device.ManagementServer.URL` on CPEs to point to the new Oktopus TR-069 Adapter. Keep GenieACS read-only.
4. **Upgrade to TR-369:** Where CPE firmware allows (e.g., ZTE F680, Nokia G-010G-Q 3.x+), push firmware updates that include a native USP agent and point them to Oktopus MQTT.
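
As a rough illustration of phase 3, the re-pointing could be queued through the GenieACS NBI as a `setParameterValues` task. This is a minimal sketch, not the project's actual tooling: the NBI base URL, the device ID, and the adapter URL are placeholders, and the function only builds the request so it can be reviewed before anything is sent.

```python
import json
from urllib.parse import quote

MGMT_URL_PARAM = "Device.ManagementServer.URL"


def build_repoint_request(genieacs_nbi: str, device_id: str, new_acs_url: str) -> dict:
    """Describe the NBI call that queues a setParameterValues task.

    Nothing is sent here; the caller decides how and when to POST it.
    """
    task = {
        "name": "setParameterValues",
        "parameterValues": [[MGMT_URL_PARAM, new_acs_url, "xsd:string"]],
    }
    base = genieacs_nbi.rstrip("/")
    # Device IDs can contain reserved characters, so escape the whole ID.
    path = f"/devices/{quote(device_id, safe='')}/tasks?connection_request"
    return {"method": "POST", "url": base + path, "body": json.dumps(task)}


# Placeholder values (assumptions), not real hosts or devices.
request = build_repoint_request(
    "http://genieacs.example.invalid:7557",
    "AABBCC-XX230v-SERIAL0001",
    "https://acs.example.invalid/cwmp",
)
```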
---
## 2. TP-Link XX230v (Deco XE75) Deep Diagnostics
The TP-Link XX230v exposes a rich TR-181 data model. When customers report "WiFi issues", CSRs and techs should not blindly swap the hardware. Read the following parameters first to establish the actual root cause of the fault.
### A. Optical Signal (Is it the Fibre?)
```text
Device.Optical.Interface.1.Stats.SignalRxPower → target: -8 to -25 dBm
Device.Optical.Interface.1.Stats.ErrorsSent
```
*Diagnosis:* If RxPower is below -25 dBm, suspect a dirty connector or a fibre break. This is **not** a hardware fault in the ONT, so do not swap it.
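
The rule above can be expressed as a small check. This is a sketch under assumptions: the reading is taken to be already in dBm, the function name and return labels are hypothetical, and the "above target" branch is an addition, since the text only describes the low-power case.

```python
RX_TARGET_MAX_DBM = -8.0   # upper end of the target window
RX_TARGET_MIN_DBM = -25.0  # lower end of the target window


def classify_rx_power(rx_dbm: float) -> str:
    """Map an optical RX power reading (dBm) to a triage label."""
    if rx_dbm < RX_TARGET_MIN_DBM:
        # Dirty connector or fibre break: a line problem, so do not swap the ONT.
        return "check_fibre"
    if rx_dbm > RX_TARGET_MAX_DBM:
        # Outside the target window on the high side (assumed label).
        return "above_target"
    return "ok"
```

For example, `classify_rx_power(-27.3)` returns `"check_fibre"`, while `classify_rx_power(-18.0)` returns `"ok"`.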
### B. WiFi Radio & Topology (Is it Interference?)
```text
Device.WiFi.Radio.1.Stats.Noise → Interference measure (2.4 GHz)
Device.WiFi.Radio.2.Stats.Noise → Interference measure (5 GHz)
Device.WiFi.MultiAP.APDevice.{i}.Radio.{j}.Utilization → Backhaul traffic load on Deco Mesh
```
*Diagnosis:* High noise or error counts on a specific band indicate environmental channel congestion. If backhaul utilization is very high on a satellite Deco, advise the customer to move it closer to the main unit.
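
A minimal sketch of how these readings might be screened. The thresholds are assumptions chosen for illustration (this document does not define "high"), the names are hypothetical, and utilization is assumed to be normalised to a percentage before the call, since the raw scale can differ between firmware versions.

```python
from dataclasses import dataclass

NOISE_THRESHOLD_DBM = -80      # assumption: noisier than this counts as congested
BACKHAUL_LIMIT_PCT = 80.0      # assumption: at or above this counts as overloaded


@dataclass
class RadioReading:
    band: str        # e.g. "2.4 GHz" or "5 GHz"
    noise_dbm: int   # value read from Device.WiFi.Radio.{i}.Stats.Noise


def congested_bands(readings, threshold_dbm=NOISE_THRESHOLD_DBM):
    """Return the bands whose noise floor is above the threshold."""
    return [r.band for r in readings if r.noise_dbm > threshold_dbm]


def backhaul_overloaded(utilization_pct, limit_pct=BACKHAUL_LIMIT_PCT):
    """True when a satellite's backhaul utilization reaches the limit."""
    return utilization_pct >= limit_pct
```

Keeping the thresholds as parameters lets them be tuned per deployment instead of being baked into the check.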
### C. Live Speed Test (Is it the Client Device?)
```text
Device.IP.Diagnostics.DownloadDiagnostics.DiagnosticsState = "Requested"
```
*Diagnosis:* You can instruct the ONT to run its own speed test, which takes WiFi out of the measurement. If the ONT's download test is fast but the customer's iPhone is slow, the iPhone or the WiFi signal is to blame.
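
Once the test finishes, throughput can be derived from the result parameters under the same `DownloadDiagnostics` object. The sketch below assumes the standard TR-143 result fields (`BOMTime`, `EOMTime`, `TestBytesReceived`) are reported by the device and have already been fetched into a dictionary. The helper names are hypothetical, and both `Complete` and `Completed` are accepted because the final state string differs between data model versions.

```python
from datetime import datetime


def parse_tr143_time(value: str) -> datetime:
    """Parse a UTC timestamp such as 2026-04-22T15:00:00.000000Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def download_mbps(result: dict) -> float:
    """Compute download throughput in Mbit/s from diagnostics results."""
    if result.get("DiagnosticsState") not in ("Complete", "Completed"):
        raise ValueError("diagnostics did not complete")
    started = parse_tr143_time(result["BOMTime"])   # first payload byte received
    finished = parse_tr143_time(result["EOMTime"])  # last payload byte received
    seconds = (finished - started).total_seconds()
    if seconds <= 0:
        raise ValueError("invalid test duration")
    return int(result["TestBytesReceived"]) * 8 / seconds / 1_000_000


# Made-up sample values for illustration only.
sample = {
    "DiagnosticsState": "Completed",
    "BOMTime": "2026-04-22T15:00:00.000000Z",
    "EOMTime": "2026-04-22T15:00:02.000000Z",
    "TestBytesReceived": "125000000",
}
speed = download_mbps(sample)  # 500.0
```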
---
## 3. The "Diagnostic Swap" Workflow
Techs often swap equipment simply because they are not sure what is defective. This leaves inventory records unreliable.
**We are moving from a single binary status (`Défectueux`) to three swap types:**
1. **Remplacement Définitif** (permanent replacement): The equipment is dead.
*(Old = Défectueux, New = Actif)*
2. **Swap Diagnostic** (diagnostic swap): The unit is swapped to test whether the problem goes away.
*(Old = En diagnostic, New = Actif, temporary)*
3. **Retour de diagnostic** (return from diagnostic): The old unit was fine and is put back.
*(Old = Actif (put back into use), Test unit = Retourné)*
If a tech chooses **Swap Diagnostic**, an ERPNext Task is generated automatically to schedule a follow-up test of the removed hardware within 7 days. If the unit tests fine at the warehouse, it is returned to `En inventaire` instead of being scrapped.
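
The three swap types and the 7-day follow-up can be summarised as a lookup table. This is an illustrative sketch only: the keys and function name are hypothetical, the status strings are the ones listed above, and the real ERPNext Task creation is not shown.

```python
from datetime import date, timedelta

FOLLOW_UP_DAYS = 7  # follow-up test window for a diagnostic swap

# Hypothetical keys; each maps to (status of old unit, status of the other unit).
SWAP_OUTCOMES = {
    "definitive": ("Défectueux", "Actif"),
    "diagnostic_swap": ("En diagnostic", "Actif"),    # new unit is temporary
    "diagnostic_return": ("Actif", "Retourné"),       # test unit goes back
}


def apply_swap(kind: str, swap_date: date) -> dict:
    """Return the status changes and, for a diagnostic swap, a follow-up date."""
    old_status, other_status = SWAP_OUTCOMES[kind]
    follow_up = None
    if kind == "diagnostic_swap":
        # Latest date by which the removed unit should be tested.
        follow_up = swap_date + timedelta(days=FOLLOW_UP_DAYS)
    return {
        "old_unit": old_status,
        "other_unit": other_status,
        "follow_up_due": follow_up,
    }
```

Only the diagnostic swap produces a follow-up date; the other two types return `None` for `follow_up_due`.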