# Vision & OCR Pipeline > **All vision runs on Gemini 2.5 Flash via `targo-hub`.** No local Ollama. The > ops/ERPNext VM has no GPU, so every vision request — bills, barcodes, > equipment labels — goes to Google's Gemini API from a single backend > service and gets normalized before hitting the frontend. **Last refreshed:** 2026-04-22 (cutover from Ollama → Gemini) --- ## 1. Architecture at a glance ```text ┌──────────────────┐ ┌───────────────────────┐ │ apps/ops (PWA) │ │ apps/field (PWA) │ │ /ops/* │ │ /field/* (retiring) │ └────────┬─────────┘ └──────────┬────────────┘ │ │ │ src/api/ocr.js │ src/api/ocr.js │ {ocrBill, scanBarcodes, │ {ocrBill, scanBarcodes, │ scanEquipmentLabel} │ checkOllamaStatus} │ │ └──────────────┬──────────────┘ │ POST https://msg.gigafibre.ca/vision/* ▼ ┌───────────────────────┐ │ targo-hub │ │ lib/vision.js │ │ ├─ /vision/barcodes │ │ ├─ /vision/equipment│ │ └─ /vision/invoice │ └──────────┬────────────┘ │ generativelanguage.googleapis.com ▼ ┌───────────────────────┐ │ Gemini 2.5 Flash │ │ (text + image, JSON │ │ responseSchema) │ └───────────────────────┘ ``` **Why route everything through the hub:** 1. **No GPU on ops VM.** The only machine with a local Ollama was retired in Phase 2.5. Centralizing on Gemini means the frontend stops caring where inference happens. 2. **Single AI_API_KEY rotation surface.** Key lives in the hub env only. 3. **Schema guarantees.** Gemini supports `responseSchema` in the v1beta API — the hub enforces it per endpoint, so the frontend can trust the JSON shape without defensive parsing. 4. **Observability.** Every call is logged in the hub with image size, model, latency, output preview (first 300 chars). --- ## 2. Hub endpoints (`services/targo-hub/lib/vision.js`) All three endpoints: - are `POST` with JSON body `{ image: }`, - return structured JSON (see per-endpoint schemas below), - require `AI_API_KEY` in the hub environment, - are unauthenticated from the browser (rate-limiting is the hub's job). 
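The `responseSchema` guarantee from point 3 can be illustrated with a small sketch of how a hub endpoint might assemble the native `generateContent` payload. `buildVisionRequest` and the schema below are illustrative only, not the actual `lib/vision.js` code:

```javascript
// Hypothetical sketch — the real lib/vision.js may structure this differently.
function buildVisionRequest(imageBase64, mimeType, prompt, responseSchema) {
  return {
    contents: [{
      role: 'user',
      parts: [
        { text: prompt },
        { inlineData: { mimeType, data: imageBase64 } }, // image travels inline
      ],
    }],
    generationConfig: {
      responseMimeType: 'application/json', // forces JSON output
      responseSchema,                       // enforced shape, per endpoint
    },
  }
}

// Illustrative shape of the /vision/barcodes schema.
const barcodesSchema = {
  type: 'object',
  properties: {
    barcodes: { type: 'array', items: { type: 'string' }, maxItems: 3 },
  },
  required: ['barcodes'],
}

const body = buildVisionRequest('…base64…', 'image/jpeg', 'Extract identifiers.', barcodesSchema)
// POSTed to .../v1beta/models/gemini-2.5-flash:generateContent with the AI_API_KEY
```

Because the schema is enforced server-side by Gemini, the hub can hand the parsed JSON straight to the frontend without defensive re-parsing.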
### `POST /vision/barcodes` Extracts up to 3 identifiers (serials, MACs, GPON SNs, barcodes). ```json { "barcodes": ["1608K44D9E79FAFF5", "0418D6A1B2C3", "TPLG-A1B2C3D4"] } ``` Used by: tech scan page, equipment link dialog, invoice scan (fallback). ### `POST /vision/equipment` Structured equipment-label parse (ONT/ONU/router/modem). ```json { "brand": "TP-Link", "model": "XX230v", "serial_number": "2234567890ABCD", "mac_address": "0418D6A1B2C3", "gpon_sn": "TPLGA1B2C3D4", "hw_version": "1.0", "equipment_type": "ont", "barcodes": ["..."] } ``` Post-processing: `mac_address` stripped of separators + uppercased; `serial_number` trimmed of whitespace. Used by: `useEquipmentActions` in the ops client detail page to pre-fill a "create Service Equipment" dialog. ### `POST /vision/invoice` Structured invoice/bill OCR. Canadian-tax-aware (GST/TPS + QST/TVQ). ```json { "vendor": "Acme Fibre Supplies", "vendor_address": "123 rue Somewhere, Montréal, QC", "invoice_number": "INV-2026-0042", "date": "2026-04-18", "due_date": "2026-05-18", "subtotal": 1000.00, "tax_gst": 50.00, "tax_qst": 99.75, "total": 1149.75, "currency": "CAD", "items": [ { "description": "OLT SFP+ module", "qty": 4, "rate": 250.00, "amount": 1000.00 } ], "notes": "Payment terms: net 30" } ``` Post-processing: string-shaped numbers (e.g. `"1,234.56"`) are coerced to floats, both at the invoice level and per line item. Used by: `apps/ops/src/pages/OcrPage.vue` (invoice intake), future supplier-bill wizard. --- ## 3. Frontend surface (`apps/ops/src/api/ocr.js`) Thin wrapper over the hub. Same signatures for ops and field during the migration window (see `apps/field/src/api/ocr.js` — same file, different HUB_URL source). 
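A minimal sketch of one wrapper function, assuming a fetch-based client — the names, `HUB_URL` resolution, and error text are illustrative, and the real `apps/ops/src/api/ocr.js` may differ:

```javascript
// Hypothetical sketch of the thin-wrapper shape.
const HUB_URL = 'https://msg.gigafibre.ca' // assumption: resolved from build-time config

async function ocrBill(image, fetchImpl = fetch) {
  const res = await fetchImpl(`${HUB_URL}/vision/invoice`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ image }),
  })
  // Throws on non-2xx; callers surface the failure as a Notify.
  if (!res.ok) throw new Error(`vision/invoice failed: ${res.status}`)
  return res.json()
}
```

The other wrappers follow the same pattern against their respective endpoints.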
| Function | Endpoint | Error behavior |
|---|---|---|
| `ocrBill(image)` | `/vision/invoice` | Throws on non-2xx — caller shows Notify |
| `scanBarcodes(image)` | `/vision/barcodes` | Throws on non-2xx — **`useScanner` catches + queues** |
| `scanEquipmentLabel(image)` | `/vision/equipment` | Throws on non-2xx |
| `checkOllamaStatus()` | `/health` | Returns `{online, models, hasVision}`. Name kept for back-compat. |

The `checkOllamaStatus` name is a leftover from the Ollama era — it now pings the hub's health endpoint and reports `models: ['gemini-2.5-flash']` so existing callers (status chips, diagnostics panels) keep working. It will be renamed to `checkVisionStatus` once no page references the old symbol.

---

## 4. Scanner composable (`apps/ops/src/composables/useScanner.js`)

Wraps the API with camera capture and resilience. Two modes in one composable:

### Mode A — `processPhoto(file)` (barcodes, resilient)

1. Resize the `File` twice:
   - 400px thumbnail for on-screen preview
   - 1600px @ q=0.92 for Gemini (text must stay readable)
2. Race `scanBarcodes(aiImage)` against an **8s timeout** (`SCAN_TIMEOUT_MS`).
3. On timeout / network error, if the error is retryable (ScanTimeout | Failed to fetch | NetworkError | TypeError):
   - persist `{ id, image, ts, status: 'queued' }` to IndexedDB via `useOfflineStore.enqueueVisionScan`,
   - flag `photos[idx].queued = true` for the UI chip,
   - show "Réseau faible — scan en attente. Reprise automatique au retour du signal." ("Weak network — scan pending. Automatic retry when the signal returns.")
4. Otherwise, show the raw error.

On success, newly found codes are merged into `barcodes.value` (capped at `MAX_BARCODES = 5`, dedup by value), and the optional `onNewCode(code)` callback fires for each one.

### Mode B — `scanEquipmentLabel(file)` (structured, synchronous)

No timeout, no queue. Returns the full Gemini response. Auto-merges any `serial_number` + `barcodes[]` into the same `barcodes.value` list so a page using both modes shares one visible list. 
Used in desktop/wifi flows where callers want a sync answer to pre-fill a form. ### Late-delivered results The composable runs a `watch(() => offline.scanResults.length)` so that when the offline store later completes a queued scan (tech walks out of the basement, signal returns), the codes appear in the UI *as if* they had come back synchronously. `onNewCode` fires for queued codes too, so lookup-and-notify side-effects happen regardless of path. It also drains `offline.scanResults` once at mount, to catch the case where a scan completed while the page was unmounted (phone locked, app backgrounded, queue sync ran, user reopens ScanPage). --- ## 5. Offline store (`apps/ops/src/stores/offline.js`) Pinia store, two queues, IndexedDB (`idb-keyval`): ### Mutation queue `{ type: 'create'|'update', doctype, name?, data, ts, id }` — ERPNext mutations. Flushed when `window` emits `online`. Failed items stay queued across reconnects. Keyed under `offline-queue`. ### Vision queue `{ id, image (base64), ts, status }` — photos whose Gemini call timed out or failed. Keyed under `vision-queue`. **Retries are time-driven, not event-driven.** We don't trust `navigator.onLine` because it reports `true` on 2-bar LTE that can't actually reach msg.gigafibre.ca. First retry at 5s, back off to 30s on repeated failure. A reconnect (online event) also triggers an opportunistic immediate sync. Successful scans land in `scanResults` (keyed `vision-results`) and the scanner composable consumes them via watcher + `consumeScanResult(id)` to avoid duplicates. ### Generic cache `cacheData(key, data)` / `getCached(key)` — plain read cache used by list pages for offline browsing. Keyed under `cache-{key}`. --- ## 6. 
Data flow example (tech scans an ONT in a basement) ``` [1] Tech taps "Scan" in /j/ScanPage (camera opens) [2] Tech takes photo (File → input.change) [3] useScanner.processPhoto(file) → resizeImage(file, 400) (thumbnail shown immediately) → resizeImage(file, 1600, 0.92) → Promise.race([scanBarcodes(ai), timeout(8s)]) CASE A — signal ok: [4a] Gemini responds in 2s → barcodes[] merged → onNewCode fires → ERPNext lookup → Notify "ONT lié au client Untel" CASE B — weak signal / timeout: [4b] 8s timeout fires → isRetryable('ScanTimeout') → true → offline.enqueueVisionScan({ image: aiImage }) → photos[idx].queued = true (chip "scan en attente") → tech keeps scanning next device [5b] Tech walks out of basement — window.online fires → syncVisionQueue() retries the queued photo → Gemini responds → scanResults.push({id, barcodes, ts}) [6b] useScanner watcher on scanResults.length fires → mergeCodes(barcodes, 'queued') → onNewCode fires (late) → Notify arrives while tech is walking back to the truck → consumeScanResult(id) (removed from persistent queue) ``` --- ## 7. Changes from the previous (Ollama) pipeline | Aspect | Before (Phase 2) | After (Phase 2.5) | |---|---|---| | Invoice OCR | Ollama `llama3.2-vision:11b` on the serving VM | Gemini 2.5 Flash via `/vision/invoice` | | Barcode scan | Hub `/vision/barcodes` (already Gemini) | Unchanged | | Equipment label | Hub `/vision/equipment` (already Gemini) | Unchanged | | GPU requirement | Yes (11GB VRAM for vision model) | None — all inference remote | | Offline resilience | Only barcode mode, only in apps/field | Now in apps/ops too (ready for /j) | | Schema validation | Hand-parsed from prompt-constrained JSON | Gemini `responseSchema` enforces shape | | Frontend import path | `'src/api/ocr'` (both apps) | Unchanged — same symbols | --- ## 8. Where to look next - **Hub implementation:** `services/targo-hub/lib/vision.js`, `services/targo-hub/server.js` (routes: `/vision/barcodes`, `/vision/equipment`, `/vision/invoice`). 
- **Frontend API client:** `apps/ops/src/api/ocr.js` (+ `apps/field/src/api/ocr.js` kept in sync during migration). - **Scanner composable:** `apps/ops/src/composables/useScanner.js`. - **Offline store:** `apps/ops/src/stores/offline.js`. ### 8.1 Secrets, keys and rotation The only secret this pipeline needs is the Gemini API key. Everything else (models, base URL, hub public URL) is non-sensitive config. | Variable | Where it's read | Default | Notes | |---|---|---|---| | `AI_API_KEY` | `services/targo-hub/lib/config.js:38` | *(none — required)* | Google AI Studio key for `generativelanguage.googleapis.com`. **Server-side only**, never reaches the browser bundle. | | `AI_MODEL` | `config.js:39` | `gemini-2.5-flash` | Primary vision model. | | `AI_FALLBACK_MODEL` | `config.js:40` | `gemini-2.5-flash-lite-preview` | Used by text-only calls (not vision) when primary rate-limits. | | `AI_BASE_URL` | `config.js:41` | `https://generativelanguage.googleapis.com/v1beta/openai/` | OpenAI-compatible endpoint used by agent code. Vision bypasses this and talks to the native `/v1beta/models/...:generateContent` URL. | **Storage policy.** The repo is private and follows the same posture as the ERPNext service token already hardcoded in `apps/ops/infra/nginx.conf:15` and `apps/field/infra/nginx.conf:13`. The Gemini key can live in any of three places, in increasing order of "checked into git": 1. **Prod VM env only** (status quo): key is in the `environment:` block of the `targo-hub` service in `/opt/targo-hub/docker-compose.yml` on `96.125.196.67`. `config.js:38` reads it via `process.env.AI_API_KEY`. Rotation = edit that one line + `docker compose restart targo-hub`. 2. **In-repo fallback in `config.js`**: change line 38 to `AI_API_KEY: env('AI_API_KEY', 'AIzaSy...')` — the env var still wins when set, so prod doesn't break, but a fresh clone Just Works. Same pattern as nginx's ERPNext token. 3. **Hardcoded constant** (not recommended): replace `env(...)` entirely. 
Loses the ability to override per environment (dev, staging). If/when option 2 is chosen, the literal value should also be recorded in `MEMORY.md` (`reference_google_ai.md`) so that's the rotation source of truth — not scattered across the codebase. **Browser exposure.** Zero. The ops nginx config proxies `/hub/*` to targo-hub on an internal Docker network; the hub injects the key before calling Google. `apps/ops/src/api/ocr.js` just does `fetch('/hub/vision/barcodes', ...)` — no key in the bundle, no key in DevTools, no key in the browser's `Network` tab. --- ## 9. Related - [ARCHITECTURE.md](ARCHITECTURE.md) — the full service map this lives in. - [CPE_MANAGEMENT.md](CPE_MANAGEMENT.md) — how scanned serials flow into the TR-069/TR-369 device management plane. - [APP_DESIGN_GUIDELINES.md](APP_DESIGN_GUIDELINES.md) — frontend conventions (Vue 3 Composition API, feature folders). --- ## 10. Data-model relationships triggered by a scan A scan is never just "identify a barcode." Every successful lookup fans out into the ERPNext graph: the scanned Service Equipment is the entry point, and the tech page (`/j/device/:serial`) surfaces everything tied to the same Customer and Service Location. This section documents that graph, the exact fields read per entity, and the write rules. 
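Before the graph, a sketch of what "entry point" means in code — a hypothetical tiered lookup. `findEquipment` is an illustrative name, `listDocs` is injected for the sketch, and the MAC tier is an assumption (10.2 lists the serial and barcode tiers explicitly):

```javascript
// Hypothetical sketch of the tiered Service Equipment lookup.
async function findEquipment(identifier, listDocs) {
  const tiers = [
    { serial_number: identifier },                                      // tier 1: exact serial
    { barcode: identifier },                                            // tier 2: barcode fallback
    { mac_address: identifier.replace(/[:.\-\s]/g, '').toUpperCase() }, // assumed tier 3: normalized MAC
  ]
  for (const filters of tiers) {
    const rows = await listDocs('Service Equipment', { filters, fields: ['name'] })
    if (rows.length > 0) return rows[0].name // first tier that matches wins
  }
  return null // no match → the "create new device" path
}
```

A `null` result is what routes the tech into the create-equipment dialog described in 10.3.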
### 10.1 Graph (Service Equipment is the anchor) ```text ┌─────────────────────────┐ │ Service Equipment │ │ EQP-##### │◀───── scanned serial / MAC / barcode │ │ (3-tier lookup in TechScanPage) │ • serial_number │ │ • mac_address │ │ • barcode │ │ • equipment_type (ONT) │ │ • brand / model │ │ • firmware / hw_version│ │ • status │ │ │ │ FK → customer ─────────┼───┐ │ FK → service_location ─┼─┐ │ │ FK → olt / port │ │ │ (ONT-specific, TR-069 bind) └─────────────────────────┘ │ │ │ │ ┌─────────────────────────────────┘ │ │ │ ▼ ▼ ┌───────────────────┐ ┌───────────────────┐ │ Service Location │ │ Customer │ │ LOC-##### │ │ CUST-##### │ │ • address │ │ • customer_name │ │ • city │ │ • stripe_id │ │ • postal_code │ │ • ppa_enabled │ │ • connection_type │ │ • legacy_*_id │ │ • olt_port │ └────────┬──────────┘ │ • gps lat/lng │ │ └───┬──────────┬────┘ │ │ │ │ inbound│ │inbound inbound│ │ │ │ ▼ ▼ ▼ ┌────────────┐ ┌──────────────┐ ┌──────────────┐ │ Issue │ │ Dispatch Job │ │ Subscription │ │ TCK-##### │ │ DJ-##### │ │ SUB-##### │ │ │ │ │ │ │ │ open │ │ upcoming │ │ active plan │ │ tickets │ │ installs / │ │ billing │ │ │ │ repairs │ │ RADIUS creds │ └────────────┘ └──────────────┘ └──────────────┘ FK: service_location FK: party_type='Customer', party= ``` **Two FK axes, not one.** Tickets + Dispatch Jobs pivot on *where* the problem is (Service Location). Subscriptions pivot on *who owns the account* (Customer). A customer can have multiple locations (duplex, rental, commercial); the scan page shows the subscription freshest for the customer, even if the scanned device is at a secondary address. 
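The two FK axes translate into two filter shapes when the page fans out its reads. A sketch, assuming an injected `listDocs(doctype, filters)` helper (filter syntax and names are illustrative):

```javascript
// Sketch: location-keyed vs customer-keyed reads, issued together so that a
// permission error on one doctype only drops that one card.
async function fanOut(location, customer, listDocs) {
  const results = await Promise.allSettled([
    listDocs('Issue', { service_location: location }),        // axis 1: where the problem is
    listDocs('Dispatch Job', { service_location: location }), // axis 1: where the work is
    listDocs('Subscription', { party_type: 'Customer', party: customer, status: 'Active' }), // axis 2: who owns the account
  ])
  const value = (r) => (r.status === 'fulfilled' ? r.value : null) // null → card omitted
  return { issues: value(results[0]), jobs: value(results[1]), subscriptions: value(results[2]) }
}
```

`Promise.allSettled` never rejects, so one denied doctype can't take down the whole page render.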
### 10.2 Exact reads issued from `TechDevicePage.vue` | Step | Call | Filter | Fields read | Purpose | |------|------|--------|-------------|---------| | 1 | `listDocs('Service Equipment')` | `serial_number = :serial` | `name` | Exact-serial lookup | | 1 | `listDocs('Service Equipment')` | `barcode = :serial` | `name` | Fallback if serial missed | | 2 | `getDoc('Service Equipment', name)` | — | full doc | Device card: brand/model/MAC/firmware/customer/service_location/olt_* | | 3 | `getDoc('Service Location', loc)` | — | full doc | Address, GPS, connection_type, olt_port | | 4 | `listDocs('Subscription')` | `party_type='Customer', party=, status='Active'` | `name, status, start_date, current_invoice_end` | Active plan chip | | 5 | `listDocs('Issue')` | `service_location=, status ∈ {Open, In Progress, On Hold}` | `name, subject, status, priority, opening_date` | Open tickets list | | 6 | `listDocs('Dispatch Job')` | `service_location=, status ∈ {Planned, Scheduled, En Route, In Progress}` | `name, subject, job_type, status, scheduled_date, technician` | Upcoming interventions | All five fan-out queries run in parallel via `Promise.allSettled`, so a permission error on any single doctype (e.g. tech role can't read `Subscription` in some envs) doesn't block the page render — just that card is omitted. ### 10.3 Writes issued from `TechScanPage.vue` The scan page writes to **exactly one doctype** — `Service Equipment` — never to Customer, Location, Subscription, Issue, or Dispatch Job. All relationship changes happen via FK updates on the equipment row: | Trigger | Write | Why | |---------|-------|-----| | Auto-link from job context | `updateDoc('Service Equipment', name, { customer, service_location })` | Tech opened Scan from a Dispatch Job (`?job=&customer=&location=`) and the scanned equipment has no location yet — this "claims" the device for the install. 
| | Manual link dialog | `updateDoc('Service Equipment', name, { customer, service_location })` | Tech searched customer + picked one of the customer's locations. | | Create new device | `createDoc('Service Equipment', data)` | 3-tier lookup came up empty — create a stub and tie it to the current job if available. | | Customer re-link (from TechDevicePage) | `updateDoc('Service Equipment', name, { customer })` | Tech realized the device is at the wrong account; re-linking the customer auto-reloads the subscription card. | **Subscription / Issue / Dispatch Job are read-only in the scan flow.** The tech app never creates a ticket from a scan — that's the job of the ops dispatcher in `DispatchPage.vue` + `ClientDetailPage.vue`. The scan page's contribution is to make the FK (`service_location` on the equipment) accurate so those downstream cards light up correctly when the dispatcher or the next tech opens the page. ### 10.4 Auto-link rule (the one piece of scan-time "business logic") When TechScanPage is opened from a Dispatch Job (`goScan` on TechJobDetailPage propagates `?job=&customer=&location=`), each successful lookup runs: ```js if (result.found && jobContext.customer && !result.equipment.service_location) { await updateDoc('Service Equipment', result.equipment.name, { customer: jobContext.customer, service_location: jobContext.location, // only if the job has one }) } ``` **Why gated on "no existing service_location":** a device that's already tied to address A should never silently move to address B just because a tech scanned it on a job ticket. If the location is already set, the tech has to use the "Re-link" action in TechDevicePage, which is explicit and logged. This prevents swap-out scenarios (tech brings a tested spare ONT from another install and scans it to confirm serial) from corrupting address ownership. 
### 10.5 Why this matters for offline mode The offline store (`stores/offline.js`) queues `updateDoc` calls under the mutation queue, not the vision queue. That means: - **Scan photo** → offline → `vision-queue` → retries against Gemini when signal returns. - **Auto-link / create-equipment** → offline → `offline-queue` → retries against ERPNext when signal returns. Because both queues drain automatically, a tech who scans 6 ONTs in a no-signal basement comes back to the truck, and the phone silently: 1. Sends the 6 photos to Gemini (vision queue) 2. Receives the 6 barcode lists 3. Fans each one through `lookupInERPNext` (the scan page watcher) 4. For found + unlinked devices, enqueues 6 `updateDoc` calls 5. Drains the mutation queue → all 6 devices now carry `customer + service_location` FKs 6. Next time the dispatcher opens the Dispatch Job, all 6 equipment rows appear in the equipment list (via reverse FK query from the job page) The FK write on Service Equipment is what "connects" the scan to every downstream card (ticket list, subscription chip, dispatch job list). Everything else is a read on those FKs. ---
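The six-step drain in 10.5 can be compressed into a sketch. `drainAll`, `backends`, and the store shape are illustrative, not the actual `stores/offline.js` API:

```javascript
// Hypothetical sketch of one drain pass across both queues.
async function drainAll(store, backends) {
  // Vision queue: queued photos → Gemini → barcode lists (steps 1–2)
  for (const scan of store.visionQueue.splice(0)) {
    const { barcodes } = await backends.gemini(scan.image)
    store.scanResults.push({ id: scan.id, barcodes, ts: Date.now() }) // picked up by the scanner watcher (step 3)
  }
  // Mutation queue: queued ERPNext writes, e.g. the auto-link updateDoc calls (step 5)
  for (const mut of store.mutationQueue.splice(0)) {
    await backends.erpnext(mut)
  }
}
```

In the real store the two loops run on independent retry timers with their own backoff, but the data flow is the same: photos drain toward Gemini, FK writes drain toward ERPNext.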