Add comprehensive README with architecture, gotchas, rebuild instructions

Louis Paul 2026-03-26 13:29:59 +00:00
parent e97ba51a6c
commit b9225a0d23

# Gigafibre Infrastructure
## Architecture
```
Internet -> Traefik (80/443) -> Docker containers
|-- oss.gigafibre.ca -> Oktopus (CPE management TR-069/TR-369)
|-- git.gigafibre.ca -> Gitea
|-- dispatch.gigafibre.ca -> Dispatch App
|-- hub.gigafibre.ca -> Traefik Hub (management UI)
|-- traefik.gigafibre.ca -> Traefik Dashboard
```
## Quick Setup
```bash
git clone https://git.targo.ca/louis/gigafibre-infra.git /opt/infra
cd /opt/infra && bash setup.sh
```
## Services
| Service | Compose | Domain | Notes |
|---------|---------|--------|-------|
| Traefik v2.11 | traefik/ | traefik.gigafibre.ca | Reverse proxy + SSL |
| Traefik Hub | traefik-hub/ | hub.gigafibre.ca | Custom mgmt UI (DNS + routes + containers) |
| Oktopus CE | oktopus/ | oss.gigafibre.ca | ACS port 9292, MQTT port 1883 |
| Gitea | apps/ | git.gigafibre.ca | Container registry enabled |
| Dispatch | apps/ | dispatch.gigafibre.ca | Field technician scheduling |
| PostgreSQL x2 | apps/ | internal | targo-db + gitea-db |
## Network
- ens18: LAN 10.100.5.61/24 (VLAN 4000 native)
- ens19: WAN 96.125.196.67/26 (VLAN 4001)
- Default route: WAN (metric 100) is preferred over LAN (metric 200); the lower metric wins
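
To confirm that the WAN interface really is the preferred default route, the output of `ip route show default` can be checked for the entry with the lowest metric. The helper below is a minimal sketch; the function name `preferred_default_dev` is hypothetical, and it assumes the usual `default via <gateway> dev <iface> ... metric <n>` line format.

```bash
# Hypothetical helper: reads `ip route show default` output on stdin and
# prints the interface of the default route with the lowest metric.
# A route printed without a "metric" field is treated as metric 0.
preferred_default_dev() {
  awk '
    {
      dev = ""; metric = 0
      for (i = 1; i < NF; i++) {
        if ($i == "dev")    dev = $(i + 1)
        if ($i == "metric") metric = $(i + 1) + 0
      }
      if (dev != "" && (best == "" || metric < best_metric)) {
        best = dev; best_metric = metric
      }
    }
    END { if (best != "") print best }
  '
}

# Usage on the host: ip route show default | preferred_default_dev
```

With the metrics listed above, this should print `ens19`.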
## Security
- UFW firewall: 22, 80, 443, 1883, 9292 only
- Fail2ban: SSH (3 attempts = 1h ban, systemd backend)
- Traefik Hub: session auth with HttpOnly cookies
- SSH: key-based auth (ed25519)
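
The firewall policy above could be reproduced with a handful of `ufw` commands. This is a sketch rather than the contents of `setup.sh`: it assumes every listed port is TCP and that the policy is default-deny for incoming traffic. The function only prints the commands so they can be reviewed before being applied.

```bash
# Ports from the Security list above; treating them all as TCP is an assumption.
ALLOWED_TCP_PORTS="22 80 443 1883 9292"

# Hypothetical helper: prints the ufw commands instead of running them.
ufw_rules() {
  echo "ufw default deny incoming"
  echo "ufw default allow outgoing"
  for port in $ALLOWED_TCP_PORTS; do
    echo "ufw allow ${port}/tcp"
  done
  echo "ufw --force enable"
}

# Review the output first; to apply it as root: ufw_rules | sh
ufw_rules
```

Allowing port 22 before enabling the firewall matters when working over SSH, which is why `enable` comes last.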
## DNS
Managed via OpenSRS API (user: targo, domain: gigafibre.ca).
Records can also be managed from the Traefik Hub UI (DNS Records page).
## SSL
Let's Encrypt via Traefik HTTP-01 challenge.
**Do NOT enable global HTTP-to-HTTPS redirect** - it breaks the Let's Encrypt challenge.
Use per-router redirect middleware after certs are issued instead.
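
A per-router redirect could be expressed with Traefik's `redirectscheme` middleware attached to one service's HTTP router. The sketch below prints a possible set of Docker labels for a single service. The helper name `redirect_labels`, the entrypoint name `web`, and the router/middleware naming scheme are assumptions, not taken from this repository's compose files.

```bash
# Hypothetical helper: prints Docker labels that redirect one service's
# plain-HTTP router to HTTPS.
#   $1 = router name prefix, $2 = hostname
# Assumption: the plain-HTTP entrypoint is called "web".
redirect_labels() {
  printf 'traefik.http.middlewares.%s-https.redirectscheme.scheme=https\n' "$1"
  printf 'traefik.http.routers.%s-http.rule=Host(`%s`)\n' "$1" "$2"
  printf 'traefik.http.routers.%s-http.entrypoints=web\n' "$1"
  printf 'traefik.http.routers.%s-http.middlewares=%s-https\n' "$1" "$1"
}

redirect_labels gitea git.gigafibre.ca
```

The printed lines would go under the service's `labels:` in its compose file, once its certificate has been issued.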
## Known Issues and Gotchas
1. **Traefik v3 vs Docker 29** - The Traefik v3.x Docker client requests API v1.24, but Docker 29 requires at least v1.44. Stay on v2.11 until fixed.
2. **HTTP redirect breaks SSL** - Global entrypoint redirect prevents Let's Encrypt HTTP-01 validation.
3. **MongoDB needs AVX** - MongoDB 5.0+ requires a CPU with AVX support. The Proxmox CPU type must be "host", not an emulated type.
4. **netplan overrides** - On Debian cloud images, netplan generates runtime configs that override the static ones. Fix: remove the netplan.io package.
5. **Multi-network routing** - Containers on multiple Docker networks need label `traefik.docker.network=proxy` so Traefik picks the correct IP.
6. **NATS JetStream** - Oktopus controller requires `--jetstream` flag on NATS.
7. **MQTT adapter config** - Needs explicit `MQTT_URL=tcp://mqtt:1883` environment variable.
8. **Adapter service** - Oktopus needs the `adapter` service (not just mqtt-adapter) for device queries via NATS.
9. **Frontend API URL** - Oktopus frontend `NEXT_PUBLIC_REST_ENDPOINT` must be empty so browser requests go through Traefik, not internal Docker DNS.
10. **Cloudflare tunnels** - Quick tunnels change URL on restart. Use systemd service for persistence, or better: direct SSH via public IP.
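
For gotcha 3, whether the VM actually sees AVX can be checked before MongoDB is started. A minimal sketch, assuming a Linux guest on x86 where `/proc/cpuinfo` has a `flags` line; the function name `has_avx` is hypothetical.

```bash
# Hypothetical helper: succeeds if the CPU flags include "avx".
# Optional $1 = path to a cpuinfo-style file (defaults to /proc/cpuinfo).
has_avx() {
  # -w matches "avx" as a whole word, so "avx2" or "avx512f" alone do not count.
  grep -m1 '^flags' "${1:-/proc/cpuinfo}" 2>/dev/null | grep -qw avx
}

if has_avx; then
  echo "AVX available"
else
  echo "AVX not detected: check that the Proxmox CPU type is set to host"
fi
```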
## Container Registry
Push custom images to git.targo.ca:
```bash
docker login git.targo.ca -u louis
docker tag myimage:latest git.targo.ca/louis/myimage:latest
docker push git.targo.ca/louis/myimage:latest
```
## Rebuild from Scratch
```bash
# On a fresh Debian 12 VM (Proxmox CPU type: host)
apt-get update && apt-get install -y git
git clone https://git.targo.ca/louis/gigafibre-infra.git /opt/infra
cd /opt/infra && bash setup.sh
# Then configure DNS via hub.gigafibre.ca
```