Vision & OCR Pipeline

All vision runs on Gemini 2.5 Flash via targo-hub. No local Ollama. The ops/ERPNext VM has no GPU, so every vision request — bills, barcodes, equipment labels — goes to Google's Gemini API from a single backend service and gets normalized before hitting the frontend.

Last refreshed: 2026-04-22 (cutover from Ollama → Gemini)


1. Architecture at a glance

          ┌──────────────────┐        ┌───────────────────────┐
          │ apps/ops (PWA)   │        │ apps/field (PWA)      │
          │ /ops/*           │        │ /field/*   (retiring) │
          └────────┬─────────┘        └──────────┬────────────┘
                   │                             │
                   │  src/api/ocr.js             │  src/api/ocr.js
                   │  {ocrBill, scanBarcodes,    │  {ocrBill, scanBarcodes,
                   │   scanEquipmentLabel}       │   checkOllamaStatus}
                   │                             │
                   └──────────────┬──────────────┘
                                  │  POST https://msg.gigafibre.ca/vision/*
                                  ▼
                        ┌───────────────────────┐
                        │  targo-hub            │
                        │  lib/vision.js        │
                        │   ├─ /vision/barcodes │
                        │   ├─ /vision/equipment│
                        │   └─ /vision/invoice  │
                        └──────────┬────────────┘
                                   │  generativelanguage.googleapis.com
                                   ▼
                        ┌───────────────────────┐
                        │  Gemini 2.5 Flash     │
                        │  (text + image, JSON  │
                        │   responseSchema)     │
                        └───────────────────────┘

Why route everything through the hub:

  1. No GPU on ops VM. The only machine with a local Ollama was retired in Phase 2.5. Centralizing on Gemini means the frontend stops caring where inference happens.
  2. Single AI_API_KEY rotation surface. Key lives in the hub env only.
  3. Schema guarantees. Gemini supports responseSchema in the v1beta API — the hub enforces it per endpoint, so the frontend can trust the JSON shape without defensive parsing.
  4. Observability. Every call is logged in the hub with image size, model, latency, output preview (first 300 chars).

2. Hub endpoints (services/targo-hub/lib/vision.js)

All three endpoints:

  • are POST with JSON body { image: <base64 or data URI> },
  • return structured JSON (see per-endpoint schemas below),
  • require AI_API_KEY in the hub environment,
  • are unauthenticated from the browser (rate-limiting is the hub's job).
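The shared contract above can be sketched as a single helper. This is a sketch, not the actual ocr.js code; callVision and the injectable fetchImpl parameter are hypothetical names used so the sketch can be exercised without a network:

```javascript
// Sketch of the shared endpoint contract: POST JSON { image }, parse JSON back.
// `fetchImpl` defaults to the global fetch but is injectable for testing.
async function callVision(endpoint, image, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(`https://msg.gigafibre.ca${endpoint}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ image }),
  });
  if (!res.ok) throw new Error(`vision ${endpoint} failed: ${res.status}`);
  return res.json(); // structured JSON, shape enforced hub-side via responseSchema
}
```

All three per-endpoint schemas below ride on this same request shape; only the response body differs.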

POST /vision/barcodes

Extracts up to 3 identifiers (serials, MACs, GPON SNs, barcodes).

{
  "barcodes": ["1608K44D9E79FAFF5", "0418D6A1B2C3", "TPLG-A1B2C3D4"]
}

Used by: tech scan page, equipment link dialog, invoice scan (fallback).

POST /vision/equipment

Structured equipment-label parse (ONT/ONU/router/modem).

{
  "brand": "TP-Link",
  "model": "XX230v",
  "serial_number": "2234567890ABCD",
  "mac_address": "0418D6A1B2C3",
  "gpon_sn": "TPLGA1B2C3D4",
  "hw_version": "1.0",
  "equipment_type": "ont",
  "barcodes": ["..."]
}

Post-processing: mac_address stripped of separators + uppercased; serial_number trimmed of whitespace.

Used by: useEquipmentActions in the ops client detail page to pre-fill a "create Service Equipment" dialog.
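The post-processing described above amounts to two one-liners. Helper names are hypothetical; the real code lives in lib/vision.js:

```javascript
// Post-processing applied to /vision/equipment output:
// MAC: strip separators (:, -, .) and uppercase; serial: trim whitespace.
function normalizeMac(mac) {
  return String(mac).replace(/[^0-9a-fA-F]/g, '').toUpperCase();
}

function normalizeSerial(serial) {
  return String(serial).trim();
}
```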

POST /vision/invoice

Structured invoice/bill OCR. Canadian-tax-aware (GST/TPS + QST/TVQ).

{
  "vendor": "Acme Fibre Supplies",
  "vendor_address": "123 rue Somewhere, Montréal, QC",
  "invoice_number": "INV-2026-0042",
  "date": "2026-04-18",
  "due_date": "2026-05-18",
  "subtotal": 1000.00,
  "tax_gst": 50.00,
  "tax_qst": 99.75,
  "total": 1149.75,
  "currency": "CAD",
  "items": [
    { "description": "OLT SFP+ module", "qty": 4, "rate": 250.00, "amount": 1000.00 }
  ],
  "notes": "Payment terms: net 30"
}

Post-processing: string-shaped numbers (e.g. "1,234.56") are coerced to floats, both at the invoice level and per line item.

Used by: apps/ops/src/pages/OcrPage.vue (invoice intake), future supplier-bill wizard.
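The number coercion is worth pinning down, since the model sometimes returns formatted strings for amounts. A sketch (toNumber is a hypothetical name; the set of stripped characters is an assumption):

```javascript
// Coerce "1,234.56" / 1234.56 / "  42 " to a float; non-numeric becomes null.
function toNumber(value) {
  if (typeof value === 'number') return value;
  if (typeof value === 'string') {
    const n = parseFloat(value.replace(/[,\s$]/g, ''));
    return Number.isNaN(n) ? null : n;
  }
  return null;
}
```

The same coercion is applied per line item (qty, rate, amount) as well as to the invoice-level totals.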


3. Frontend surface (apps/ops/src/api/ocr.js)

Thin wrapper over the hub. Same signatures for ops and field during the migration window (see apps/field/src/api/ocr.js — same file, different HUB_URL source).

| Function | Endpoint | Error behavior |
|---|---|---|
| ocrBill(image) | /vision/invoice | Throws on non-2xx — caller shows Notify |
| scanBarcodes(image) | /vision/barcodes | Throws on non-2xx — useScanner catches + queues |
| scanEquipmentLabel(image) | /vision/equipment | Throws on non-2xx |
| checkOllamaStatus() | /health | Returns {online, models, hasVision}. Name kept for back-compat. |

The checkOllamaStatus name is a leftover from the Ollama era — it now pings the hub's health endpoint and reports models: ['gemini-2.5-flash'] so existing callers (status chips, diagnostics panels) keep working. It will be renamed to checkVisionStatus once no page references the old symbol.
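A sketch of what the back-compat shim does with the hub's health payload. The field names of the health response (ok) and the helper name are assumptions here, not the hub's documented shape:

```javascript
// checkOllamaStatus kept its Ollama-era return shape; under Gemini it simply
// reports the hub's health as if it were a local model list.
function toLegacyStatus(healthJson) {
  const online = Boolean(healthJson && healthJson.ok); // `ok` field is assumed
  return {
    online,
    models: online ? ['gemini-2.5-flash'] : [],
    hasVision: online,
  };
}
```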


4. Scanner composable (apps/ops/src/composables/useScanner.js)

Wraps the API with camera capture and resilience. Two modes on one composable:

Mode A — processPhoto(file) (barcodes, resilient)

  1. Resize the File twice:
    • 400px thumbnail for on-screen preview
    • 1600px @ q=0.92 for Gemini (text must stay readable)
  2. Race scanBarcodes(aiImage) against an 8s timeout (SCAN_TIMEOUT_MS).
  3. On timeout / network error, if the error is retryable (ScanTimeout | Failed to fetch | NetworkError | TypeError):
    • persist { id, image, ts, status: 'queued' } to IndexedDB via useOfflineStore.enqueueVisionScan,
    • flag photos[idx].queued = true for the UI chip,
    • show "Réseau faible — scan en attente. Reprise automatique au retour du signal."
  4. Otherwise, show the raw error.

On success, newly found codes are merged into barcodes.value (capped at MAX_BARCODES = 5, dedup by value), and the optional onNewCode(code) callback fires for each one.
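The merge described above (dedup by value, cap at MAX_BARCODES = 5, fire onNewCode per genuinely new code) can be sketched as a standalone function; the real logic lives inside the composable and operates on barcodes.value:

```javascript
const MAX_BARCODES = 5;

// Merge newly scanned codes into the existing list: dedup by value,
// cap the list, and report each genuinely new code via onNewCode.
function mergeCodes(existing, incoming, onNewCode = () => {}) {
  const merged = [...existing];
  for (const code of incoming) {
    if (merged.length >= MAX_BARCODES) break;
    if (!merged.includes(code)) {
      merged.push(code);
      onNewCode(code);
    }
  }
  return merged;
}
```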

Mode B — scanEquipmentLabel(file) (structured, synchronous)

No timeout, no queue. Returns the full Gemini response. Auto-merges any serial_number + barcodes[] into the same barcodes.value list so a page using both modes shares one visible list. Used in desktop/wifi flows where callers want a sync answer to pre-fill a form.

Late-delivered results

The composable runs a watch(() => offline.scanResults.length) so that when the offline store later completes a queued scan (tech walks out of the basement, signal returns), the codes appear in the UI as if they had come back synchronously. onNewCode fires for queued codes too, so lookup-and-notify side-effects happen regardless of path.

It also drains offline.scanResults once at mount, to catch the case where a scan completed while the page was unmounted (phone locked, app backgrounded, queue sync ran, user reopens ScanPage).


5. Offline store (apps/ops/src/stores/offline.js)

Pinia store, two queues, IndexedDB (idb-keyval):

Mutation queue

{ type: 'create'|'update', doctype, name?, data, ts, id } — ERPNext mutations. Flushed when window emits online. Failed items stay queued across reconnects. Keyed under offline-queue.

Vision queue

{ id, image (base64), ts, status } — photos whose Gemini call timed out or failed. Keyed under vision-queue.

Retries are time-driven, not event-driven. We don't trust navigator.onLine because it reports true on 2-bar LTE that can't actually reach msg.gigafibre.ca. First retry at 5s, back off to 30s on repeated failure. A reconnect (online event) also triggers an opportunistic immediate sync.
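The retry cadence (5s after the first failure, 30s thereafter) reduces to a tiny pure function; nextRetryDelayMs is a hypothetical name used for illustration:

```javascript
const FIRST_RETRY_MS = 5_000;
const BACKOFF_MS = 30_000;

// Time-driven retry schedule: 5s after the first failure, 30s on repeats.
// navigator.onLine is deliberately not consulted (it lies on weak LTE).
function nextRetryDelayMs(failureCount) {
  return failureCount <= 1 ? FIRST_RETRY_MS : BACKOFF_MS;
}
```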

Successful scans land in scanResults (keyed vision-results) and the scanner composable consumes them via watcher + consumeScanResult(id) to avoid duplicates.

Generic cache

cacheData(key, data) / getCached(key) — plain read cache used by list pages for offline browsing. Keyed under cache-{key}.
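A sketch of the cache surface, with a Map-backed stand-in for idb-keyval (whose real get/set are async and persist to IndexedDB). The cache-{key} prefix matches the store's convention; the store object here is illustrative:

```javascript
// Stand-in store exposing the same async get/set surface as idb-keyval.
const mem = new Map();
const store = {
  get: async (k) => mem.get(k),
  set: async (k, v) => { mem.set(k, v); },
};

// Plain read cache used by list pages for offline browsing.
async function cacheData(key, data) {
  await store.set(`cache-${key}`, data);
}

async function getCached(key) {
  return store.get(`cache-${key}`); // undefined on cache miss
}
```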


6. Data flow example (tech scans an ONT in a basement)

[1] Tech taps "Scan" in /j/ScanPage          (camera opens)
[2] Tech takes photo                         (File → input.change)
[3] useScanner.processPhoto(file)
      → resizeImage(file, 400)               (thumbnail shown immediately)
      → resizeImage(file, 1600, 0.92)
      → Promise.race([scanBarcodes(ai), timeout(8s)])

CASE A — signal ok:
[4a] Gemini responds in 2s → barcodes[] merged → onNewCode fires
      → ERPNext lookup → Notify "ONT lié au client Untel"

CASE B — weak signal / timeout:
[4b] 8s timeout fires → isRetryable('ScanTimeout') → true
      → offline.enqueueVisionScan({ image: aiImage })
      → photos[idx].queued = true         (chip "scan en attente")
      → tech keeps scanning next device
[5b] Tech walks out of basement — window.online fires
      → syncVisionQueue() retries the queued photo
      → Gemini responds → scanResults.push({id, barcodes, ts})
[6b] useScanner watcher on scanResults.length fires
      → mergeCodes(barcodes, 'queued') → onNewCode fires (late)
      → Notify arrives while tech is walking back to the truck
      → consumeScanResult(id)             (removed from persistent queue)

7. Changes from the previous (Ollama) pipeline

| Aspect | Before (Phase 2) | After (Phase 2.5) |
|---|---|---|
| Invoice OCR | Ollama llama3.2-vision:11b on the serving VM | Gemini 2.5 Flash via /vision/invoice |
| Barcode scan | Hub /vision/barcodes (already Gemini) | Unchanged |
| Equipment label | Hub /vision/equipment (already Gemini) | Unchanged |
| GPU requirement | Yes (11GB VRAM for vision model) | None — all inference remote |
| Offline resilience | Only barcode mode, only in apps/field | Now in apps/ops too (ready for /j) |
| Schema validation | Hand-parsed from prompt-constrained JSON | Gemini responseSchema enforces shape |
| Frontend import path | 'src/api/ocr' (both apps) | Unchanged — same symbols |

8. Where to look next

  • Hub implementation: services/targo-hub/lib/vision.js, services/targo-hub/server.js (routes: /vision/barcodes, /vision/equipment, /vision/invoice).
  • Frontend API client: apps/ops/src/api/ocr.js (+ apps/field/src/api/ocr.js kept in sync during migration).
  • Scanner composable: apps/ops/src/composables/useScanner.js.
  • Offline store: apps/ops/src/stores/offline.js.

8.1 Secrets, keys and rotation

The only secret this pipeline needs is the Gemini API key. Everything else (models, base URL, hub public URL) is non-sensitive config.

| Variable | Where it's read | Default | Notes |
|---|---|---|---|
| AI_API_KEY | services/targo-hub/lib/config.js:38 | (none — required) | Google AI Studio key for generativelanguage.googleapis.com. Server-side only, never reaches the browser bundle. |
| AI_MODEL | config.js:39 | gemini-2.5-flash | Primary vision model. |
| AI_FALLBACK_MODEL | config.js:40 | gemini-2.5-flash-lite-preview | Used by text-only calls (not vision) when the primary rate-limits. |
| AI_BASE_URL | config.js:41 | https://generativelanguage.googleapis.com/v1beta/openai/ | OpenAI-compatible endpoint used by agent code. Vision bypasses this and talks to the native /v1beta/models/...:generateContent URL. |

Storage policy. The repo is private and follows the same posture as the ERPNext service token already hardcoded in apps/ops/infra/nginx.conf:15 and apps/field/infra/nginx.conf:13. The Gemini key can live in any of three places, in increasing order of "checked into git":

  1. Prod VM env only (status quo): key is in the environment: block of the targo-hub service in /opt/targo-hub/docker-compose.yml on 96.125.196.67. config.js:38 reads it via process.env.AI_API_KEY. Rotation = edit that one line + docker compose restart targo-hub.
  2. In-repo fallback in config.js: change line 38 to AI_API_KEY: env('AI_API_KEY', 'AIzaSy...') — the env var still wins when set, so prod doesn't break, but a fresh clone Just Works. Same pattern as nginx's ERPNext token.
  3. Hardcoded constant (not recommended): replace env(...) entirely. Loses the ability to override per environment (dev, staging).

If/when option 2 is chosen, the literal value should also be recorded in MEMORY.md (reference_google_ai.md) so that file is the rotation source of truth — not copies scattered across the codebase.

Browser exposure. Zero. The ops nginx config proxies /hub/* to targo-hub on an internal Docker network; the hub injects the key before calling Google. apps/ops/src/api/ocr.js just does fetch('/hub/vision/barcodes', ...) — no key in the bundle, no key in DevTools, no key in the browser's Network tab.



10. Data-model relationships triggered by a scan

A scan is never just "identify a barcode." Every successful lookup fans out into the ERPNext graph: the scanned Service Equipment is the entry point, and the tech page (/j/device/:serial) surfaces everything tied to the same Customer and Service Location. This section documents that graph, the exact fields read per entity, and the write rules.

10.1 Graph (Service Equipment is the anchor)

                          ┌─────────────────────────┐
                          │  Service Equipment      │
                          │  EQP-#####              │◀───── scanned serial / MAC / barcode
                          │                         │       (3-tier lookup in TechScanPage)
                          │  • serial_number        │
                          │  • mac_address          │
                          │  • barcode              │
                          │  • equipment_type (ONT) │
                          │  • brand / model        │
                          │  • firmware / hw_version│
                          │  • status               │
                          │                         │
                          │  FK → customer ─────────┼───┐
                          │  FK → service_location ─┼─┐ │
                          │  FK → olt / port        │ │ │   (ONT-specific, TR-069 bind)
                          └─────────────────────────┘ │ │
                                                      │ │
                    ┌─────────────────────────────────┘ │
                    │                                   │
                    ▼                                   ▼
            ┌───────────────────┐             ┌───────────────────┐
            │ Service Location  │             │ Customer          │
            │ LOC-#####         │             │ CUST-#####        │
            │ • address         │             │ • customer_name   │
            │ • city            │             │ • stripe_id       │
            │ • postal_code     │             │ • ppa_enabled     │
            │ • connection_type │             │ • legacy_*_id     │
            │ • olt_port        │             └────────┬──────────┘
            │ • gps lat/lng     │                      │
            └───┬──────────┬────┘                      │
                │          │                           │
         inbound│          │inbound             inbound│
                │          │                           │
                ▼          ▼                           ▼
        ┌────────────┐ ┌──────────────┐        ┌──────────────┐
        │ Issue      │ │ Dispatch Job │        │ Subscription │
        │ TCK-#####  │ │ DJ-#####     │        │ SUB-#####    │
        │            │ │              │        │              │
        │ open       │ │ upcoming     │        │ active plan  │
        │ tickets    │ │ installs /   │        │ billing      │
        │            │ │ repairs      │        │ RADIUS creds │
        └────────────┘ └──────────────┘        └──────────────┘
              FK: service_location       FK: party_type='Customer', party=<cust>

Two FK axes, not one. Tickets + Dispatch Jobs pivot on where the problem is (Service Location). Subscriptions pivot on who owns the account (Customer). A customer can have multiple locations (duplex, rental, commercial); the scan page shows the customer's freshest subscription, even if the scanned device is at a secondary address.

10.2 Exact reads issued from TechDevicePage.vue

| Step | Call | Filter | Fields read | Purpose |
|---|---|---|---|---|
| 1 | listDocs('Service Equipment') | serial_number = :serial | name | Exact-serial lookup |
| 1 | listDocs('Service Equipment') | barcode = :serial | name | Fallback if serial missed |
| 2 | getDoc('Service Equipment', name) | (none) | full doc | Device card: brand/model/MAC/firmware/customer/service_location/olt_* |
| 3 | getDoc('Service Location', loc) | (none) | full doc | Address, GPS, connection_type, olt_port |
| 4 | listDocs('Subscription') | party_type='Customer', party=<cust>, status='Active' | name, status, start_date, current_invoice_end | Active plan chip |
| 5 | listDocs('Issue') | service_location=<loc>, status ∈ {Open, In Progress, On Hold} | name, subject, status, priority, opening_date | Open tickets list |
| 6 | listDocs('Dispatch Job') | service_location=<loc>, status ∈ {Planned, Scheduled, En Route, In Progress} | name, subject, job_type, status, scheduled_date, technician | Upcoming interventions |

All five fan-out queries run in parallel via Promise.allSettled, so a permission error on any single doctype (e.g. tech role can't read Subscription in some envs) doesn't block the page render — just that card is omitted.

10.3 Writes issued from TechScanPage.vue

The scan page writes to exactly one doctype, Service Equipment — never to Customer, Location, Subscription, Issue, or Dispatch Job. All relationship changes happen via FK updates on the equipment row:

| Trigger | Write | Why |
|---|---|---|
| Auto-link from job context | updateDoc('Service Equipment', name, { customer, service_location }) | Tech opened Scan from a Dispatch Job (?job=&customer=&location=) and the scanned equipment has no location yet — this "claims" the device for the install. |
| Manual link dialog | updateDoc('Service Equipment', name, { customer, service_location }) | Tech searched customer + picked one of the customer's locations. |
| Create new device | createDoc('Service Equipment', data) | 3-tier lookup came up empty — create a stub and tie it to the current job if available. |
| Customer re-link (from TechDevicePage) | updateDoc('Service Equipment', name, { customer }) | Tech realized the device is at the wrong account; re-linking the customer auto-reloads the subscription card. |

Subscription / Issue / Dispatch Job are read-only in the scan flow. The tech app never creates a ticket from a scan — that's the job of the ops dispatcher in DispatchPage.vue + ClientDetailPage.vue. The scan page's contribution is to make the FK (service_location on the equipment) accurate so those downstream cards light up correctly when the dispatcher or the next tech opens the page.

When TechScanPage is opened from a Dispatch Job (goScan on TechJobDetailPage propagates ?job=<name>&customer=<id>&location=<loc>), each successful lookup runs:

if (result.found && jobContext.customer && !result.equipment.service_location) {
  await updateDoc('Service Equipment', result.equipment.name, {
    customer: jobContext.customer,
    service_location: jobContext.location,   // only if the job has one
  })
}

Why gated on "no existing service_location": a device that's already tied to address A should never silently move to address B just because a tech scanned it on a job ticket. If the location is already set, the tech has to use the "Re-link" action in TechDevicePage, which is explicit and logged. This prevents swap-out scenarios (tech brings a tested spare ONT from another install and scans it to confirm serial) from corrupting address ownership.
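The guard in the snippet above reduces to a pure predicate, shown standalone here for testability (shouldAutoLink is a hypothetical name):

```javascript
// Auto-link only when the lookup found a device, the job context carries a
// customer, and the device is not already tied to a service_location.
function shouldAutoLink(result, jobContext) {
  return Boolean(
    result.found &&
    jobContext.customer &&
    !result.equipment.service_location
  );
}
```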

10.5 Why this matters for offline mode

The offline store (stores/offline.js) queues updateDoc calls under the mutation queue, not the vision queue. That means:

  • Scan photo → offline → vision-queue → retries against Gemini when signal returns.
  • Auto-link / create-equipment → offline → offline-queue → retries against ERPNext when signal returns.

Because both queues drain on timers, a tech who scans 6 ONTs in a no-signal basement comes back to the truck and the phone silently:

  1. Sends the 6 photos to Gemini (vision queue)
  2. Receives the 6 barcode lists
  3. Fans each one through lookupInERPNext (the scan page watcher)
  4. For found + unlinked devices, enqueues 6 updateDoc calls
  5. Drains the mutation queue → all 6 devices now carry customer + service_location FKs
  6. Next time dispatcher opens the Dispatch Job, all 6 equipment rows appear in the equipment list (via reverse FK query from the job page)

The FK write on Service Equipment is what "connects" the scan to every downstream card (ticket list, subscription chip, dispatch job list). Everything else is a read on those FKs.