gigafibre-fsm/docs/XX230V-DIAGNOSTICS-AND-OKTOPUS-TEST.md
louispaulb bfffed2b41 feat: ONT diagnostics — grouped mesh topology, signal RSSI, management link

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-03 21:26:14 -04:00

XX230v Diagnostic Gaps & Oktopus Test Plan

The Problem

When a customer reports WiFi issues, the tech currently has almost no data to diagnose whether the problem is:

  • The XX230v ONT (GPON side failing, internal routing broken)
  • The Deco mesh (WiFi radio issue, coverage gap)
  • The fibre itself (signal degradation, dirty connector)
  • The customer's device (driver issue, band-steering problem)

The current GenieACS provision (xx230v_inform) only reads:

  • GPON serial, MAC, uptime
  • WiFi SSID list (names only, no signal/client data)
  • Connected hosts (cleared every inform! → useless for diagnosis)

We're blind. The XX230v exposes a rich TR-181 data model that we're not reading.


What TR-181 Parameters the XX230v Actually Has

The XX230v (TP-Link Deco XE75 ONT variant) implements the TR-181 Device:2 data model. These are the parameters we SHOULD be reading but currently aren't:

1. Optical Signal (fibre health)

Device.Optical.Interface.1.Status                    → Up/Down
Device.Optical.Interface.1.Stats.SignalRxPower        → dBm (target: -8 to -25)
Device.Optical.Interface.1.Stats.SignalTxPower        → dBm
Device.Optical.Interface.1.Stats.BytesSent
Device.Optical.Interface.1.Stats.BytesReceived
Device.Optical.Interface.1.Stats.ErrorsSent
Device.Optical.Interface.1.Stats.ErrorsReceived
Device.Optical.Interface.1.Stats.DiscardPacketsSent
Device.Optical.Interface.1.Stats.DiscardPacketsReceived
Device.Optical.Interface.1.LowerLayers               → underlying interface

Diagnosis power: If RxPower < -25 dBm → fibre problem (dirty connector, bend, break). If the error counters keep climbing between informs → OLT port issue or fibre degradation. In both cases there is no need to swap the ONT.
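
The rule above can be sketched as a small classifier. This is a minimal illustration, not existing targo-hub code: the function name, the returned labels and the `errorLimit` tolerance are assumptions, and it presumes the firmware reports SignalRxPower directly in dBm.

```javascript
// Target window taken from the parameter list above (-8 to -25 dBm).
const RX_MAX_DBM = -8;
const RX_MIN_DBM = -25;

// rxPowerDbm:  value of Device.Optical.Interface.1.Stats.SignalRxPower (assumed dBm).
// errorsDelta: growth of Stats.ErrorsReceived between two informs.
// errorLimit:  hypothetical tolerance, not a vendor figure.
function classifyOptical(rxPowerDbm, errorsDelta = 0, errorLimit = 100) {
  if (!Number.isFinite(rxPowerDbm)) return "unknown";
  if (rxPowerDbm < RX_MIN_DBM) return "fibre-low-signal";   // dirty connector, bend, break
  if (rxPowerDbm > RX_MAX_DBM) return "out-of-range-high";  // outside the target window
  if (errorsDelta > errorLimit) return "errors-rising";     // OLT port or fibre degradation
  return "ok";
}
```

For example, `classifyOptical(-27.3)` falls below the window and is reported as a fibre problem, while a reading inside the window with a stable error counter is reported as healthy.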

2. WiFi Radios (per-band stats)

Device.WiFi.Radio.1.Status                           → Up/Down (2.4GHz)
Device.WiFi.Radio.1.Channel
Device.WiFi.Radio.1.OperatingFrequencyBand           → 2.4GHz / 5GHz / 6GHz
Device.WiFi.Radio.1.CurrentOperatingChannelBandwidth  → 20MHz/40MHz/80MHz/160MHz
Device.WiFi.Radio.1.Stats.Noise                      → dBm (background noise)
Device.WiFi.Radio.1.Stats.BytesSent
Device.WiFi.Radio.1.Stats.BytesReceived
Device.WiFi.Radio.1.Stats.ErrorsSent
Device.WiFi.Radio.1.Stats.ErrorsReceived
Device.WiFi.Radio.1.Stats.PacketsSent
Device.WiFi.Radio.1.Stats.PacketsReceived

Device.WiFi.Radio.2.*                                → 5GHz band
Device.WiFi.Radio.3.*                                → 6GHz band (XE75 has WiFi 6E)

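Diagnosis power: High noise + errors on a specific band → channel congestion (not a hardware fault). On 2.4 GHz, comparing noise before and after moving between the non-overlapping channels 1/6/11 shows whether the interference is tied to one channel.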
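
A minimal sketch of that congestion check follows. The text only says "high", so both thresholds, the function name and the input field names are assumptions to tune per site.

```javascript
// Hypothetical thresholds; the document does not define "high".
const NOISE_HIGH_DBM = -80;
const ERROR_RATE_HIGH = 0.05;

// radio: plain object holding values already read from Device.WiFi.Radio.{i}
// (Stats.Noise, Stats.ErrorsSent, Stats.PacketsSent). Field names are illustrative.
function assessRadio(radio) {
  const { band, noiseDbm, errorsSent, packetsSent } = radio;
  const errorRate = packetsSent > 0 ? errorsSent / packetsSent : 0;
  const noisy = noiseDbm > NOISE_HIGH_DBM;
  const lossy = errorRate > ERROR_RATE_HIGH;

  let verdict = "ok";
  if (noisy && lossy) verdict = "channel-congestion"; // the case described above
  else if (lossy) verdict = "errors-without-noise";   // worth a closer look
  else if (noisy) verdict = "noisy-but-clean";
  return { band, errorRate, verdict };
}
```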

3. WiFi Access Points & Connected Clients

Device.WiFi.AccessPoint.1.AssociatedDevice.{i}.
  MACAddress                → client MAC
  SignalStrength             → dBm (uplink: how well the AP hears the client)
  Noise                     → dBm (uplink noise, as measured by the AP)
  Retransmissions            → high = poor signal quality
  Active                    → currently connected?
  LastDataDownlinkRate       → last PHY rate to client (kbps per TR-181)
  LastDataUplinkRate         → last PHY rate from client (kbps per TR-181)
  OperatingStandard          → ax/ac/n/g (WiFi generation)

Device.WiFi.AccessPoint.1.
  AssociatedDeviceNumberOfEntries → client count
  Status                    → Enabled/Disabled
  SSIDAdvertisementEnabled   → SSID visible?

Diagnosis power: If client SignalStrength > -65 dBm but Retransmissions high → interference. If LastDataDownlinkRate < 50 Mbps on WiFi 6 → something wrong with negotiation. If 0 clients on 5GHz → band steering issue.
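
These per-client rules can be sketched as follows. It is an illustration only: the function and field names are hypothetical, the retransmission limit is an assumption (the text only says "high"), and the rate is taken in kbps, which is how TR-181 defines LastDataDownlinkRate.

```javascript
const STRONG_SIGNAL_DBM = -65; // from the text
const LOW_RATE_KBPS = 50000;   // 50 Mbps from the text, expressed in kbps
const RETRANS_HIGH = 100;      // hypothetical limit

// c: one AssociatedDevice entry flattened into a plain object.
function flagClient(c) {
  const flags = [];
  if (c.signalDbm > STRONG_SIGNAL_DBM && c.retransmissions > RETRANS_HIGH) {
    flags.push("interference"); // good signal but many retries
  }
  if (c.standard === "ax" && c.downlinkKbps < LOW_RATE_KBPS) {
    flags.push("low-phy-rate"); // WiFi 6 client negotiating a poor rate
  }
  return flags;
}
```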

4. Deco Mesh Topology (MultiAP / EasyMesh)

Device.WiFi.MultiAP.APDevice.{i}.
  MACAddress                → mesh node MAC
  Manufacturer
  SerialNumber
  Radio.{j}.
    Noise
    Utilization              → % channel busy
    AP.{k}.
      SSID
      UnicastBytesSent
      AssociatedDevice.{l}.
        MACAddress
        SignalStrength
        LastDataDownlinkRate

Device.WiFi.DataElements.Network.Device.{i}.
  ID                        → mesh node ID
  MultiAPCapabilities
  Radio.{j}.
    CurrentOperatingClassProfile
    UnassociatedSTA.{k}.     → devices seen but not connected

Diagnosis power: Shows which Deco node each client is connected to, signal quality per-node, backhaul utilization. If satellite Deco has poor backhaul → move it closer or add wired backhaul.
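
As a sketch of the client-to-node mapping, the helper below walks a plain-object mirror of the MultiAP tree. The input shape (`mac`, `radios`, `aps`, `associatedDevices`) is an assumption made for illustration; the real values would come from the Device.WiFi.MultiAP.APDevice.{i} paths listed above.

```javascript
// apDevices: array of mesh nodes, each mirroring APDevice.{i} → Radio.{j} → AP.{k}
// → AssociatedDevice.{l}. Returns a Map of node MAC → list of clients on that node.
function clientsByNode(apDevices) {
  const byNode = new Map();
  for (const dev of apDevices) {
    const clients = [];
    for (const radio of dev.radios ?? []) {
      for (const ap of radio.aps ?? []) {
        for (const sta of ap.associatedDevices ?? []) {
          clients.push({ mac: sta.mac, ssid: ap.ssid, signalDbm: sta.signalDbm });
        }
      }
    }
    byNode.set(dev.mac, clients); // nodes without clients still appear, with []
  }
  return byNode;
}
```

Keeping empty nodes in the result makes a satellite with no clients visible, which is itself a useful hint about coverage or backhaul.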

5. IP Diagnostics (built-in speed/ping test)

Device.IP.Diagnostics.IPPing.
  Host                      → target to ping
  NumberOfRepetitions
  Timeout
  DataBlockSize
  DiagnosticsState           → set to "Requested" to trigger
  → Results:
  SuccessCount, FailureCount
  AverageResponseTime
  MinimumResponseTime
  MaximumResponseTime

Device.IP.Diagnostics.DownloadDiagnostics.
  DownloadURL               → URL to download from
  DiagnosticsState           → "Requested" to trigger
  → Results:
  ROMTime, BOMTime, EOMTime
  TestBytesReceived
  TotalBytesReceived
  → Calculate: speed (bit/s) = TestBytesReceived × 8 / (EOMTime - BOMTime, in seconds)

Device.IP.Diagnostics.UploadDiagnostics.
  UploadURL                 → URL to upload to
  DiagnosticsState           → "Requested" to trigger
  → Results: same pattern as download

Diagnosis power: Remote speed test FROM the ONT itself. Eliminates WiFi as a variable. If ONT→server speed is good but client speed is bad → WiFi issue. If ONT→server speed is bad → fibre/OLT issue.
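
The download calculation can be sketched like this. The helper names are hypothetical, and it assumes BOMTime and EOMTime arrive as ISO 8601 timestamps with optional sub-second digits; the fraction is handled separately so microsecond precision is not lost to Date's millisecond resolution.

```javascript
// Split an ISO timestamp into whole-second epoch milliseconds and the fractional second.
function splitTimestamp(ts) {
  const m = /^(.+?)(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})$/.exec(ts);
  if (!m) throw new Error(`unparseable timestamp: ${ts}`);
  const wholeMs = Date.parse(m[1] + m[3]);
  if (Number.isNaN(wholeMs)) throw new Error(`unparseable timestamp: ${ts}`);
  return { wholeMs, frac: m[2] ? Number(`0.${m[2]}`) : 0 };
}

// Throughput in Mbit/s from the DownloadDiagnostics result parameters.
function throughputMbps(bomTime, eomTime, testBytesReceived) {
  const b = splitTimestamp(bomTime);
  const e = splitTimestamp(eomTime);
  const seconds = (e.wholeMs - b.wholeMs) / 1000 + (e.frac - b.frac);
  if (seconds <= 0) throw new Error("EOMTime must be later than BOMTime");
  return (testBytesReceived * 8) / seconds / 1e6;
}
```

For example, 31 250 000 bytes received over 2.5 seconds works out to 100 Mbit/s.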

6. Ethernet Ports

Device.Ethernet.Interface.{i}.
  Status                    → Up/Down
  MACAddress
  MaxBitRate                → configured max link speed in Mbps (-1 = auto)
  CurrentBitRate            → negotiated link speed in Mbps (100/1000/2500)
  DuplexMode                → Full/Half
  Stats.BytesSent/Received
  Stats.ErrorsSent/Received
  Stats.PacketsSent/Received

Diagnosis power: If Ethernet negotiated at 100Mbps instead of 1Gbps → bad cable or port. If errors high → physical layer issue.
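
A short sketch of the port check follows. The function name, the returned labels and the `expectedMbps` default are assumptions; `linkMbps` stands for whichever bit-rate parameter the firmware reports for the negotiated speed.

```javascript
// port: plain object with values read from Device.Ethernet.Interface.{i}.
// expectedMbps: hypothetical per-port expectation (1000 for a gigabit LAN port).
function assessEthernet(port, expectedMbps = 1000) {
  if (port.status !== "Up") return "link-down";
  if (port.duplex === "Half") return "half-duplex";
  if (port.linkMbps > 0 && port.linkMbps < expectedMbps) {
    return "negotiated-below-expected"; // bad cable or port, per the rule above
  }
  return "ok";
}
```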

7. DNS & Routing

Device.DNS.Client.Server.{i}.
  DNSServer                 → configured DNS
  Status
  Type                      → DHCPv4 / Static

Device.Routing.Router.1.IPv4Forwarding.{i}.
  DestIPAddress
  GatewayIPAddress
  Interface
  Status

What to Add to GenieACS Inform (Immediate)

Update the xx230v_inform provision to read diagnostic data on every inform:

// === DIAGNOSTIC DATA (add to xx230v_inform) ===
// Assumes `const now = Date.now();` is already defined at the top of the provision.

// Optical signal — THE most important diagnostic
declare("Device.Optical.Interface.1.Status", {value: now});
declare("Device.Optical.Interface.1.Stats.SignalRxPower", {value: now});
declare("Device.Optical.Interface.1.Stats.SignalTxPower", {value: now});
declare("Device.Optical.Interface.1.Stats.ErrorsSent", {value: now});
declare("Device.Optical.Interface.1.Stats.ErrorsReceived", {value: now});

// WiFi radio stats (per band)
declare("Device.WiFi.Radio.1.Status", {value: now});
declare("Device.WiFi.Radio.1.Channel", {value: now});
declare("Device.WiFi.Radio.1.CurrentOperatingChannelBandwidth", {value: now});
declare("Device.WiFi.Radio.1.Stats.Noise", {value: now});
declare("Device.WiFi.Radio.1.Stats.ErrorsSent", {value: now});
declare("Device.WiFi.Radio.2.Status", {value: now});
declare("Device.WiFi.Radio.2.Channel", {value: now});
declare("Device.WiFi.Radio.2.CurrentOperatingChannelBandwidth", {value: now});
declare("Device.WiFi.Radio.2.Stats.Noise", {value: now});
declare("Device.WiFi.Radio.3.Status", {value: now});  // 6GHz if supported
declare("Device.WiFi.Radio.3.Channel", {value: now});

// Connected clients count per AP
declare("Device.WiFi.AccessPoint.1.AssociatedDeviceNumberOfEntries", {value: now});
declare("Device.WiFi.AccessPoint.2.AssociatedDeviceNumberOfEntries", {value: now});
declare("Device.WiFi.AccessPoint.3.AssociatedDeviceNumberOfEntries", {value: now});

// WAN IP (already partially read)
declare("Device.IP.Interface.1.IPv4Address.1.IPAddress", {value: now});

// Ethernet port status
declare("Device.Ethernet.Interface.1.Status", {value: now});
declare("Device.Ethernet.Interface.1.MaxBitRate", {value: now});
declare("Device.Ethernet.Interface.2.Status", {value: now});
declare("Device.Ethernet.Interface.2.MaxBitRate", {value: now});

Equipment Swap Status — Fix the "Not Sure It's Defective" Problem

Instead of immediately marking the equipment as "Défectueux", the swap flow should support a diagnostic swap:

Updated Status Flow

Normal equipment statuses:
  En inventaire → Actif → [issue reported] →

  Option A: "En diagnostic"    ← tech swaps to test, original goes back to warehouse
  Option B: "Défectueux"       ← confirmed dead
  Option C: "Retourné"         ← returned to stock after test (was fine)

Swap types:
  1. "Remplacement définitif"  → old = Défectueux, new = Actif
  2. "Swap diagnostic"         → old = En diagnostic, new = Actif (temporary)
  3. "Retour de diagnostic"    → old = Actif (put back), temp = Retourné

Updated Wizard Step 2 (Reason)

Step 2: "Type de remplacement" [select-cards]
  🔴 Remplacement définitif — L'équipement est mort
  🟡 Swap diagnostic — Tester si le problème vient de l'équipement
  🔵 Mise à niveau — Remplacer par un modèle supérieur

If "Swap diagnostic" is chosen:

  • Old equipment → status "En diagnostic" (not Défectueux)
  • A follow-up task is auto-created: "Diagnostic result for {serial}"
    • Due in 7 days
    • Options: "Confirmed defective" → mark Défectueux | "Works fine" → mark Retourné
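
The status rules above can be sketched as a lookup table plus a follow-up task. The function and field names are hypothetical and not taken from the existing swap wizard. Only the three swap types with defined statuses are covered, since the text does not say which statuses "Mise à niveau" should set.

```javascript
// Status pairs copied from the "Swap types" list: `old` is the equipment being
// removed (or put back), `other` is the unit on the other side of the swap.
const SWAP_TYPES = {
  "Remplacement définitif": { old: "Défectueux", other: "Actif" },
  "Swap diagnostic": { old: "En diagnostic", other: "Actif" },
  "Retour de diagnostic": { old: "Actif", other: "Retourné" },
};
const DIAGNOSTIC_OUTCOMES = {
  "Confirmed defective": "Défectueux",
  "Works fine": "Retourné",
};
const FOLLOW_UP_DAYS = 7;

function planSwap(swapType, oldSerial, swapDate = new Date()) {
  const statuses = SWAP_TYPES[swapType];
  if (!statuses) throw new Error(`unknown swap type: ${swapType}`);
  const plan = { oldStatus: statuses.old, otherStatus: statuses.other, followUp: null };
  if (swapType === "Swap diagnostic") {
    plan.followUp = {
      title: `Diagnostic result for ${oldSerial}`,
      due: new Date(swapDate.getTime() + FOLLOW_UP_DAYS * 24 * 60 * 60 * 1000),
    };
  }
  return plan;
}

// Final status of the equipment once the follow-up task is answered.
function resolveDiagnostic(outcome) {
  const status = DIAGNOSTIC_OUTCOMES[outcome];
  if (!status) throw new Error(`unknown outcome: ${outcome}`);
  return status;
}
```

Keeping the transitions in data rather than in branching code makes it straightforward to add a status later without touching the wizard logic.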

Oktopus Test Plan — What to Verify

Step 1: Deploy Oktopus with TR-069 Adapter

Oktopus is already at oss.gigafibre.ca with 8 containers. We need to:

  1. Verify it's running: curl https://oss.gigafibre.ca/api/health
  2. Check the TR-069 adapter port: Oktopus listens for CWMP connections (typically 7547)
  3. Check the admin UI: Oktopus CE has a web dashboard

Step 2: Point ONE Test XX230v to Oktopus

On GenieACS, push a parameter change to a single test device:

Device.ManagementServer.URL = "https://acs-new.gigafibre.ca:7547"

Or physically: configure the ACS URL in the XX230v admin panel.
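
One way to push that change is a setParameterValues task through the GenieACS NBI. The sketch below only builds the request. The NBI base URL and the device ID are placeholders to replace with real values, and the helper name is hypothetical.

```javascript
// Builds the NBI request that queues a setParameterValues task and asks GenieACS
// to trigger a connection request so the task runs without waiting for the next inform.
function buildAcsSwitchRequest(nbiBase, deviceId, newAcsUrl) {
  const base = nbiBase.replace(/\/+$/, "");
  return {
    url: `${base}/devices/${encodeURIComponent(deviceId)}/tasks?connection_request`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        name: "setParameterValues",
        parameterValues: [["Device.ManagementServer.URL", newAcsUrl, "xsd:string"]],
      }),
    },
  };
}

// Usage (Node 18+, placeholder NBI address and device ID):
//   const { url, init } = buildAcsSwitchRequest(
//     "http://localhost:7557", "<device-id>", "https://acs-new.gigafibre.ca:7547");
//   const res = await fetch(url, init);
```

Note the old ManagementServer.URL value before sending this, because once the device moves to the new ACS, GenieACS can no longer push it back.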

Step 3: What to Check on Oktopus

Once the XX230v connects to Oktopus:

| Check | What to Look For | Why It Matters |
|---|---|---|
| Device appears | Does Oktopus see the TR-069 inform? | Basic connectivity |
| Data model | Does it show the full TR-181 tree? | TR-181 paths are the same |
| Parameter read | Can we read Device.Optical.Interface.1.Stats.SignalRxPower? | Real-time diagnostics |
| Parameter set | Can we push WiFi SSID/password? | Provisioning works |
| Reboot | Can we trigger a remote reboot? | Basic management |
| Firmware | Can we push a firmware update? | Fleet management |
| Tasks | Can we queue tasks for next inform? | Offline task queue |
| Webhooks | Does Oktopus fire webhooks on events? | n8n integration |
| Multiple devices | Can it handle the full fleet? | Scalability |

Step 4: TR-369 (USP) Check

If the XX230v firmware supports USP (check TP-Link release notes for your version):

Device.LocalAgent.Controller.{i}.
  EndpointID
  Protocol                  → MQTT / WebSocket
  MTP.{j}.
    Enable
    Protocol
    MQTT.Topic

If USP is available → configure Oktopus as USP controller → real-time bidirectional management (no more polling!).

Step 5: Compare GenieACS vs Oktopus Data

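A CPE reports to only one Device.ManagementServer.URL at a time, so run the comparison either on two identical test units (one per ACS) or by switching a single unit between the two, then compare: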

| Aspect | GenieACS | Oktopus | Notes |
|---|---|---|---|
| Inform data | _lastInform, basic params | ? | Same TR-181 paths |
| WiFi clients | Cleared by provision (broken!) | ? | Don't clear in Oktopus |
| Optical power | Only via summarizeDevice() | ? | Should be real-time |
| Remote diagnostics | Via NBI task push | Via USP Operate | USP is synchronous |
| Webhook events | None (we poll) | Built-in | Major improvement |
| Firmware mgmt | GridFS upload + provision | Native firmware repo | Cleaner |
| Bulk operations | JS provisions | Device groups + policies | More scalable |

Immediate Actions

1. Update xx230v_inform provision (15 min)

Add the diagnostic parameters listed above. Once the provision is saved in GenieACS, every XX230v in the fleet picks it up on its next inform.

2. Update summarizeDevice() in targo-hub (15 min)

Add optical signal, WiFi radio stats, client counts, Ethernet link speed to the device summary.
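
The extra fields could be extracted along these lines. This is a sketch under assumptions, not the actual summarizeDevice() code: `readValue` and `summarizeDiagnostics` are hypothetical names, and the input is assumed to be a GenieACS device document in which each parameter leaf carries its value under `_value`.

```javascript
// Walk a dotted TR-181 path through the nested device document.
function readValue(device, path) {
  let node = device;
  for (const key of path.split(".")) {
    if (node == null || typeof node !== "object") return undefined;
    node = node[key];
  }
  return node && typeof node === "object" ? node._value : undefined;
}

function summarizeDiagnostics(device) {
  const get = (path) => readValue(device, path);
  return {
    opticalStatus: get("Device.Optical.Interface.1.Status"),
    rxPowerDbm: get("Device.Optical.Interface.1.Stats.SignalRxPower"),
    txPowerDbm: get("Device.Optical.Interface.1.Stats.SignalTxPower"),
    radios: [1, 2, 3].map((i) => ({
      channel: get(`Device.WiFi.Radio.${i}.Channel`),
      noiseDbm: get(`Device.WiFi.Radio.${i}.Stats.Noise`),
    })),
    // Missing or non-numeric counts are treated as 0.
    wifiClients: [1, 2, 3].reduce(
      (sum, i) =>
        sum + (Number(get(`Device.WiFi.AccessPoint.${i}.AssociatedDeviceNumberOfEntries`)) || 0),
      0
    ),
    ethernetMbps: [1, 2].map((i) => get(`Device.Ethernet.Interface.${i}.MaxBitRate`)),
  };
}
```

Parameters the device never reported simply come back as `undefined`, so the Ops app can hide a field instead of showing a misleading zero.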

3. Update Ops app device detail view (30 min)

Show the new diagnostic data in the device panel (Optical power, WiFi channels, client count per band).

4. SSH to Oktopus server, verify it's running (10 min)

Check docker compose ps at /opt/oktopus/, verify API responds, check admin UI.

5. Point one test XX230v to Oktopus (10 min)

Via GenieACS: set Device.ManagementServer.URL on a single device.

6. Add "En diagnostic" status to Service Equipment (5 min)

Add the new status option + update swap flow logic.