gigafibre-fsm/docs/XX230V-DIAGNOSTICS-AND-OKTOPUS-TEST.md
louispaulb bfffed2b41 feat: ONT diagnostics — grouped mesh topology, signal RSSI, management link
2026-04-03 21:26:14 -04:00


# XX230v Diagnostic Gaps & Oktopus Test Plan
## The Problem
When a customer reports WiFi issues, the tech currently has almost no data to diagnose whether the problem is:
- **The XX230v ONT** (GPON side failing, internal routing broken)
- **The Deco mesh** (WiFi radio issue, coverage gap)
- **The fibre itself** (signal degradation, dirty connector)
- **The customer's device** (driver issue, band-steering problem)
The current GenieACS provision (`xx230v_inform`) only reads:
- GPON serial, MAC, uptime
- WiFi SSID list (names only, no signal/client data)
- Connected hosts (cleared every inform! → useless for diagnosis)
**We're blind.** The XX230v exposes a rich TR-181 data model that we're not reading.
---
## What TR-181 Parameters the XX230v Actually Has
The XX230v (TP-Link Deco XE75 ONT variant) supports the full TR-181:2 data model. Here's what we SHOULD be reading but currently aren't:
### 1. Optical Signal (fibre health)
```
Device.Optical.Interface.1.Status → Up/Down
Device.Optical.Interface.1.Stats.SignalRxPower → dBm (target: -8 to -25)
Device.Optical.Interface.1.Stats.SignalTxPower → dBm
Device.Optical.Interface.1.Stats.BytesSent
Device.Optical.Interface.1.Stats.BytesReceived
Device.Optical.Interface.1.Stats.ErrorsSent
Device.Optical.Interface.1.Stats.ErrorsReceived
Device.Optical.Interface.1.Stats.DiscardPacketsSent
Device.Optical.Interface.1.Stats.DiscardPacketsReceived
Device.Optical.Interface.1.LowerLayers → underlying interface
```
**Diagnosis power:** If RxPower < -25 dBm → fibre problem (dirty connector, bend, break). If errors are high → OLT port issue or fibre degradation. No need to swap the ONT.
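The rule of thumb above can be sketched as a small triage helper. This is only an illustration: the function name `classifyOptical`, the returned labels, and the default `errorThreshold` are assumptions, and the code assumes `SignalRxPower` is already expressed in dBm. The -8 and -25 dBm bounds come from the target range noted in the parameter list.

```javascript
// Hypothetical triage helper; names, labels and errorThreshold are illustrative.
const RX_MIN_DBM = -25; // lower bound of the target range
const RX_MAX_DBM = -8;  // upper bound of the target range

function classifyOptical({ status, rxPowerDbm, errorsReceived = 0 }, errorThreshold = 1000) {
  if (status !== "Up") return "link-down";
  if (typeof rxPowerDbm !== "number" || Number.isNaN(rxPowerDbm)) return "no-reading";
  if (rxPowerDbm < RX_MIN_DBM) return "fibre-low-signal";      // dirty connector, bend, break
  if (rxPowerDbm > RX_MAX_DBM) return "fibre-signal-too-high"; // above the target range
  if (errorsReceived > errorThreshold) return "check-olt-or-fibre";
  return "ok";
}

console.log(classifyOptical({ status: "Up", rxPowerDbm: -27.3 }));
```

Returning a label rather than a boolean lets the Ops app choose its own wording and colour for each case.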
### 2. WiFi Radios (per-band stats)
```
Device.WiFi.Radio.1.Status → Up/Down (2.4GHz)
Device.WiFi.Radio.1.Channel
Device.WiFi.Radio.1.OperatingFrequencyBand → 2.4GHz / 5GHz / 6GHz
Device.WiFi.Radio.1.CurrentOperatingChannelBandwidth → 20MHz/40MHz/80MHz/160MHz
Device.WiFi.Radio.1.Stats.Noise → dBm (background noise)
Device.WiFi.Radio.1.Stats.BytesSent
Device.WiFi.Radio.1.Stats.BytesReceived
Device.WiFi.Radio.1.Stats.ErrorsSent
Device.WiFi.Radio.1.Stats.ErrorsReceived
Device.WiFi.Radio.1.Stats.PacketsSent
Device.WiFi.Radio.1.Stats.PacketsReceived
Device.WiFi.Radio.2.* → 5GHz band
Device.WiFi.Radio.3.* → 6GHz band (XE75 has WiFi 6E)
```
**Diagnosis power:** High noise + errors on a specific band → channel congestion (not a hardware fault). Channel 1/6/11 comparison shows interference.
### 3. WiFi Access Points & Connected Clients
```
Device.WiFi.AccessPoint.1.AssociatedDevice.{i}.
  MACAddress → client MAC
  SignalStrength → dBm (uplink signal from the client, as measured by the AP)
  Noise → dBm (uplink noise, as measured by the AP)
  Retransmissions → high = poor signal quality
  Active → currently connected?
  LastDataDownlinkRate → last PHY rate to client (kbps per TR-181; confirm the unit on the XX230v)
  LastDataUplinkRate → last PHY rate from client (kbps per TR-181; confirm the unit on the XX230v)
  OperatingStandard → ax/ac/n/g (WiFi generation)
Device.WiFi.AccessPoint.1.
  AssociatedDeviceNumberOfEntries → client count
  Status → Enabled/Disabled
  SSIDAdvertisementEnabled → SSID visible?
```
**Diagnosis power:** If client SignalStrength > -65 dBm but Retransmissions high → interference. If LastDataDownlinkRate < 50 Mbps on WiFi 6 → something wrong with negotiation. If 0 clients on 5GHz → band steering issue.
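These per-client rules could be encoded roughly as follows. The -65 dBm and 50 Mbps cut-offs come from the text; everything else is an assumption made for illustration: the function names, the `retransmissionThreshold` default, the input field names, the extra condition that 2.4GHz still has clients, and the choice to take the rate in kbps (the unit TR-181 defines for `LastDataDownlinkRate`; adjust if the device reports Mbps).

```javascript
// Hypothetical per-client triage; field names and the retransmission
// threshold are illustrative, not taken from the XX230v or targo-hub.
function triageClient(client, retransmissionThreshold = 100) {
  const findings = [];
  // Strong signal but many retransmissions points at interference.
  if (client.signalStrength > -65 && client.retransmissions > retransmissionThreshold) {
    findings.push("interference");
  }
  // Assumes the rate is reported in kbps and converts it to Mbps.
  const downlinkMbps = client.lastDataDownlinkRateKbps / 1000;
  if (client.operatingStandard === "ax" && downlinkMbps < 50) {
    findings.push("rate-negotiation");
  }
  return findings;
}

// Assumption: only flag band steering when 2.4GHz has clients and 5GHz has none.
function bandSteeringSuspected(clientCountByBand) {
  const on5 = clientCountByBand["5GHz"] || 0;
  const on24 = clientCountByBand["2.4GHz"] || 0;
  return on5 === 0 && on24 > 0;
}

console.log(triageClient({
  signalStrength: -55,
  retransmissions: 500,
  operatingStandard: "ax",
  lastDataDownlinkRateKbps: 30000,
}));
```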
### 4. Deco Mesh Topology (MultiAP / EasyMesh)
```
Device.WiFi.MultiAP.APDevice.{i}.
  MACAddress → mesh node MAC
  Manufacturer
  SerialNumber
  Radio.{j}.
    Noise
    Utilization → % channel busy
    AP.{k}.
      SSID
      UnicastBytesSent
      AssociatedDevice.{l}.
        MACAddress
        SignalStrength
        LastDataDownlinkRate
Device.WiFi.DataElements.Network.Device.{i}.
  ID → mesh node ID
  MultiAPCapabilities
  Radio.{j}.
    CurrentOperatingClassProfile
    UnassociatedSTA.{k}. → devices seen but not connected
```
**Diagnosis power:** Shows which Deco node each client is connected to, signal quality per-node, backhaul utilization. If satellite Deco has poor backhaul → move it closer or add wired backhaul.
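As a rough sketch of the client-to-node mapping, the function below groups clients by mesh node from a flat map of parameter paths to values. The input shape and the name `groupClientsByNode` are assumptions for illustration; the actual `/devices/:id/hosts` implementation in targo-hub may work differently.

```javascript
// Hypothetical grouping of MultiAP clients by mesh node.
// `params` is assumed to be a flat object: { "<TR-181 path>": value }.
function groupClientsByNode(params) {
  const nodePattern = /^Device\.WiFi\.MultiAP\.APDevice\.(\d+)\.MACAddress$/;
  const clientPattern =
    /^Device\.WiFi\.MultiAP\.APDevice\.(\d+)\.Radio\.(\d+)\.AP\.(\d+)\.AssociatedDevice\.(\d+)\.(\w+)$/;
  const nodes = new Map();
  const getNode = (index) => {
    if (!nodes.has(index)) nodes.set(index, { mac: null, clients: new Map() });
    return nodes.get(index);
  };

  for (const [path, value] of Object.entries(params)) {
    let match = nodePattern.exec(path);
    if (match) {
      getNode(match[1]).mac = value;
      continue;
    }
    match = clientPattern.exec(path);
    if (match) {
      const [, node, radio, ap, device, field] = match;
      const clients = getNode(node).clients;
      const key = `${radio}.${ap}.${device}`; // unique per client within a node
      if (!clients.has(key)) clients.set(key, {});
      clients.get(key)[field] = value;
    }
  }

  // Flatten the maps into plain arrays for the UI.
  return [...nodes.values()].map((n) => ({ mac: n.mac, clients: [...n.clients.values()] }));
}
```

Each returned entry carries the node MAC and its clients, which is the shape a collapsible per-node list in the device detail view would need.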
### 5. IP Diagnostics (built-in speed/ping test)
```
Device.IP.Diagnostics.IPPing.
  Host → target to ping
  NumberOfRepetitions
  Timeout
  DataBlockSize
  DiagnosticsState → set to "Requested" to trigger
  → Results:
    SuccessCount, FailureCount
    AverageResponseTime
    MinimumResponseTime
    MaximumResponseTime
Device.IP.Diagnostics.DownloadDiagnostics.
  DownloadURL → URL to download from
  DiagnosticsState → "Requested" to trigger
  → Results:
    ROMTime, BOMTime, EOMTime
    TestBytesReceived
    TotalBytesReceived
  → Calculate: speed = TestBytesReceived / (EOMTime - BOMTime)
Device.IP.Diagnostics.UploadDiagnostics.
  UploadURL → URL to upload to
  DiagnosticsState → "Requested" to trigger
  → Results: same pattern as download
```
**Diagnosis power:** Remote speed test FROM the ONT itself. Eliminates WiFi as a variable. If ONT→server speed is good but client speed is bad → WiFi issue. If ONT→server speed is bad → fibre/OLT issue.
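The speed calculation from the download results can be sketched as below. It assumes `BOMTime` and `EOMTime` arrive as ISO 8601 timestamps with optional fractional seconds (possibly down to microseconds) and that `TestBytesReceived` is a byte count. The helper names are hypothetical. Fractional seconds are parsed by hand so that precision beyond milliseconds is not lost.

```javascript
// Convert an ISO 8601 timestamp with optional fractional seconds to microseconds.
// Assumption: a missing time zone is treated as UTC, which does not affect differences.
function toMicros(timestamp) {
  const match =
    /^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})?$/.exec(timestamp);
  if (!match) throw new Error(`Unrecognised timestamp: ${timestamp}`);
  const [, base, fraction = "", zone = "Z"] = match;
  const wholeMs = Date.parse(base + zone);
  const micros = Number(fraction.padEnd(6, "0").slice(0, 6));
  return wholeMs * 1000 + micros;
}

// speed = TestBytesReceived / (EOMTime - BOMTime), expressed in Mbit/s.
// Bits per microsecond is numerically equal to megabits per second.
function throughputMbps({ bomTime, eomTime, testBytesReceived }) {
  const durationMicros = toMicros(eomTime) - toMicros(bomTime);
  if (durationMicros <= 0) throw new Error("EOMTime must be later than BOMTime");
  return (testBytesReceived * 8) / durationMicros;
}

console.log(throughputMbps({
  bomTime: "2026-04-03T21:26:14.000000Z",
  eomTime: "2026-04-03T21:26:24.000000Z",
  testBytesReceived: 125000000,
}));
```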
### 6. Ethernet Ports
```
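Device.Ethernet.Interface.{i}.
  Status → Up/Down
  MACAddress
  MaxBitRate → link speed in Mbps (100/1000/2500); TR-181 defines this as the maximum rate and CurrentBitRate as the negotiated one, so verify which the XX230v populates
  DuplexMode → Full/Half
  Stats.BytesSent/Received
  Stats.ErrorsSent/Received
  Stats.PacketsSent/Received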
```
**Diagnosis power:** If Ethernet negotiated at 100Mbps instead of 1Gbps → bad cable or port. If errors high → physical layer issue.
### 7. DNS & Routing
```
Device.DNS.Client.Server.{i}.
  DNSServer → configured DNS
  Status
  Type → DHCPv4 / Static
Device.Routing.Router.1.IPv4Forwarding.{i}.
  DestIPAddress
  GatewayIPAddress
  Interface
  Status
```
---
## What to Add to GenieACS Inform (Immediate)
Update the `xx230v_inform` provision to read diagnostic data on every inform:
```javascript
// === DIAGNOSTIC DATA (add to xx230v_inform) ===
// Assumes `now` is already defined earlier in the provision, e.g. `const now = Date.now();`
// Optical signal — THE most important diagnostic
declare("Device.Optical.Interface.1.Status", {value: now});
declare("Device.Optical.Interface.1.Stats.SignalRxPower", {value: now});
declare("Device.Optical.Interface.1.Stats.SignalTxPower", {value: now});
declare("Device.Optical.Interface.1.Stats.ErrorsSent", {value: now});
declare("Device.Optical.Interface.1.Stats.ErrorsReceived", {value: now});
// WiFi radio stats (per band)
declare("Device.WiFi.Radio.1.Status", {value: now});
declare("Device.WiFi.Radio.1.Channel", {value: now});
declare("Device.WiFi.Radio.1.CurrentOperatingChannelBandwidth", {value: now});
declare("Device.WiFi.Radio.1.Stats.Noise", {value: now});
declare("Device.WiFi.Radio.1.Stats.ErrorsSent", {value: now});
declare("Device.WiFi.Radio.2.Status", {value: now});
declare("Device.WiFi.Radio.2.Channel", {value: now});
declare("Device.WiFi.Radio.2.CurrentOperatingChannelBandwidth", {value: now});
declare("Device.WiFi.Radio.2.Stats.Noise", {value: now});
declare("Device.WiFi.Radio.3.Status", {value: now}); // 6GHz if supported
declare("Device.WiFi.Radio.3.Channel", {value: now});
// Connected clients count per AP
declare("Device.WiFi.AccessPoint.1.AssociatedDeviceNumberOfEntries", {value: now});
declare("Device.WiFi.AccessPoint.2.AssociatedDeviceNumberOfEntries", {value: now});
declare("Device.WiFi.AccessPoint.3.AssociatedDeviceNumberOfEntries", {value: now});
// WAN IP (already partially read)
declare("Device.IP.Interface.1.IPv4Address.1.IPAddress", {value: now});
// Ethernet port status
declare("Device.Ethernet.Interface.1.Status", {value: now});
declare("Device.Ethernet.Interface.1.MaxBitRate", {value: now});
declare("Device.Ethernet.Interface.2.Status", {value: now});
declare("Device.Ethernet.Interface.2.MaxBitRate", {value: now});
```
---
## Equipment Swap Status — Fix the "Not Sure It's Defective" Problem
Instead of immediately marking as "Défectueux", the swap flow should support **diagnostic swap**:
### Updated Status Flow
```
Normal equipment statuses:
En inventaire → Actif → [issue reported] →
Option A: "En diagnostic" ← tech swaps to test, original goes back to warehouse
Option B: "Défectueux" ← confirmed dead
Option C: "Retourné" ← returned to stock after test (was fine)
Swap types:
1. "Remplacement définitif" → old = Défectueux, new = Actif
2. "Swap diagnostic" → old = En diagnostic, new = Actif (temporary)
3. "Retour de diagnostic" → old = Actif (put back), temp = Retourné
```
### Updated Wizard Step 2 (Reason)
```
Step 2: "Type de remplacement" [select-cards]
🔴 Remplacement définitif — L'équipement est mort
🟡 Swap diagnostic — Tester si le problème vient de l'équipement
🔵 Mise à niveau — Remplacer par un modèle supérieur
```
If "Swap diagnostic" is chosen:
- Old equipment status → "En diagnostic" (not Défectueux)
- A follow-up task is auto-created: "Diagnostic result for {serial}"
- Due in 7 days
- Options: "Confirmed defective" → mark Défectueux | "Works fine" → mark Retourné
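The swap types and the follow-up rule can be sketched as a lookup table plus one function. The status and swap-type strings are the ones listed in the status flow; the function name `planSwap`, the shape of the returned plan, and the date handling are assumptions for illustration. "Mise à niveau" is left out because no status mapping is defined for it here.

```javascript
// Status transitions per swap type, as listed in the status flow.
const SWAP_TRANSITIONS = {
  "Remplacement définitif": { old: "Défectueux", replacement: "Actif" },
  "Swap diagnostic": { old: "En diagnostic", replacement: "Actif" },
  "Retour de diagnostic": { old: "Actif", replacement: "Retourné" },
};

const FOLLOW_UP_DAYS = 7;

// Hypothetical planner: returns the new statuses and, for a diagnostic swap,
// the follow-up task to create.
function planSwap(swapType, serial, today = new Date()) {
  const transition = SWAP_TRANSITIONS[swapType];
  if (!transition) throw new Error(`Unknown swap type: ${swapType}`);
  const plan = { ...transition, followUp: null };
  if (swapType === "Swap diagnostic") {
    const due = new Date(today.getTime() + FOLLOW_UP_DAYS * 24 * 60 * 60 * 1000);
    plan.followUp = {
      title: `Diagnostic result for ${serial}`,
      dueDate: due.toISOString().slice(0, 10), // YYYY-MM-DD in UTC
      options: ["Confirmed defective", "Works fine"],
    };
  }
  return plan;
}

console.log(planSwap("Swap diagnostic", "SN123", new Date("2026-04-03T12:00:00Z")));
```

Keeping the transitions in a table means a new swap type only needs a new entry, not another branch.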
---
## Oktopus Test Plan — What to Verify
### Step 1: Deploy Oktopus with TR-069 Adapter
Oktopus is already deployed at `oss.gigafibre.ca`, running 8 containers. We need to:
1. **Verify it's running**: `curl https://oss.gigafibre.ca/api/health`
2. **Check the TR-069 adapter port**: Oktopus listens for CWMP connections (typically 7547)
3. **Check the admin UI**: Oktopus CE has a web dashboard
### Step 2: Point ONE Test XX230v to Oktopus
On GenieACS, push a parameter change to a single test device:
```
Device.ManagementServer.URL = "https://acs-new.gigafibre.ca:7547"
```
Or physically: configure the ACS URL in the XX230v admin panel.
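A minimal sketch of the GenieACS side, assuming the change is pushed through the GenieACS NBI as a `setParameterValues` task. The helper name `buildAcsUrlTask`, the NBI base URL, and the device ID in the example are placeholders. The function only builds the request so it can be reviewed before anything is sent.

```javascript
// Build (but do not send) the NBI request that repoints one device to a new ACS.
// nbiBase and deviceId are placeholders supplied by the caller.
function buildAcsUrlTask(nbiBase, deviceId, newAcsUrl) {
  return {
    method: "POST",
    // ?connection_request asks GenieACS to contact the device right away
    // instead of waiting for its next inform.
    url: `${nbiBase}/devices/${encodeURIComponent(deviceId)}/tasks?connection_request`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      name: "setParameterValues",
      parameterValues: [["Device.ManagementServer.URL", newAcsUrl, "xsd:string"]],
    }),
  };
}

const request = buildAcsUrlTask(
  "http://localhost:7557",           // placeholder NBI address
  "202BC1-BM632w-000000",            // placeholder device ID
  "https://acs-new.gigafibre.ca:7547"
);
console.log(request.url);
console.log(request.body);
```

Once the device accepts the new URL it stops informing GenieACS, so it is worth noting the previous `Device.ManagementServer.URL` value beforehand to be able to roll back.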
### Step 3: What to Check on Oktopus
Once the XX230v connects to Oktopus:
| Check | What to Look For | Why It Matters |
|-------|-----------------|----------------|
| Device appears | Does Oktopus see the TR-069 inform? | Basic connectivity |
| Data model | Does it show the full TR-181 tree? | TR-181 paths are the same |
| Parameter read | Can we read `Device.Optical.Interface.1.Stats.SignalRxPower`? | Real-time diagnostics |
| Parameter set | Can we push WiFi SSID/password? | Provisioning works |
| Reboot | Can we trigger a remote reboot? | Basic management |
| Firmware | Can we push a firmware update? | Fleet management |
| Tasks | Can we queue tasks for next inform? | Offline task queue |
| Webhooks | Does Oktopus fire webhooks on events? | n8n integration |
| Multiple devices | Can it handle the full fleet? | Scalability |
### Step 4: TR-369 (USP) Check
If the XX230v firmware supports USP (check TP-Link release notes for your version):
```
Device.LocalAgent.Controller.{i}.
  EndpointID
  Protocol → MQTT / WebSocket
  MTP.{j}.
    Enable
    Protocol
    MQTT.Topic
```
If USP is available → configure Oktopus as USP controller → real-time bidirectional management (no more polling!).
### Step 5: Compare GenieACS vs Oktopus Data
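Run the same device through both ACS and compare. A CWMP client reports to a single `Device.ManagementServer.URL`, so in practice this means alternating the test device between GenieACS and Oktopus (or, if the firmware supports USP, keeping GenieACS on CWMP while Oktopus connects over USP) rather than informing both at once: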
| Aspect | GenieACS | Oktopus | Notes |
|--------|----------|---------|-------|
| Inform data | _lastInform, basic params | ? | Same TR-181 paths |
| WiFi clients | Cleared by provision (broken!) | ? | Don't clear in Oktopus |
| Optical power | Only via summarizeDevice() | ? | Should be real-time |
| Remote diagnostics | Via NBI task push | Via USP Operate | USP is synchronous |
| Webhook events | None (we poll) | Built-in | Major improvement |
| Firmware mgmt | GridFS upload + provision | Native firmware repo | Cleaner |
| Bulk operations | JS provisions | Device groups + policies | More scalable |
---
## Immediate Actions
### 1. Update xx230v_inform provision (15 min)
Add the diagnostic parameters listed above. Once saved in GenieACS, the provision applies to the whole XX230v fleet, each device picking it up on its next inform.
### 2. Update summarizeDevice() in targo-hub (15 min)
Add optical signal, WiFi radio stats, client counts, Ethernet link speed to the device summary.
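One possible shape for the added fields, sketched under the assumption that the device document comes from the GenieACS NBI, where parameters are nested objects and each leaf holds its value under `_value`. The names `readParam` and `summarizeDiagnostics` and the output field names are hypothetical; the real `summarizeDevice()` in targo-hub may be structured differently.

```javascript
// Walk a nested GenieACS device document and return the leaf's `_value`,
// or undefined when any segment of the path is missing.
function readParam(device, path) {
  let node = device;
  for (const key of path.split(".")) {
    if (node === null || typeof node !== "object") return undefined;
    node = node[key];
  }
  return node !== null && typeof node === "object" ? node._value : undefined;
}

// Hypothetical summary of the new diagnostic fields.
function summarizeDiagnostics(device) {
  const clientCounts = [1, 2, 3].map(
    (ap) =>
      Number(readParam(device, `Device.WiFi.AccessPoint.${ap}.AssociatedDeviceNumberOfEntries`)) || 0
  );
  return {
    opticalStatus: readParam(device, "Device.Optical.Interface.1.Status"),
    rxPowerDbm: readParam(device, "Device.Optical.Interface.1.Stats.SignalRxPower"),
    txPowerDbm: readParam(device, "Device.Optical.Interface.1.Stats.SignalTxPower"),
    radioChannels: [1, 2, 3].map((r) => readParam(device, `Device.WiFi.Radio.${r}.Channel`)),
    wifiClients: clientCounts.reduce((sum, n) => sum + n, 0),
    ethernetMbps: [1, 2].map((p) => readParam(device, `Device.Ethernet.Interface.${p}.MaxBitRate`)),
  };
}
```

Missing parameters come back as `undefined` rather than throwing, so the Ops app can show "not available" for devices that have not yet reported the new values.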
### 3. Update Ops app device detail view (30 min)
Show the new diagnostic data in the device panel (Optical power, WiFi channels, client count per band).
### 4. SSH to Oktopus server, verify it's running (10 min)
Check `docker compose ps` at `/opt/oktopus/`, verify API responds, check admin UI.
### 5. Point one test XX230v to Oktopus (10 min)
Via GenieACS: set `Device.ManagementServer.URL` on a single device.
### 6. Add "En diagnostic" status to Service Equipment (5 min)
Add the new status option + update swap flow logic.