Stack 1B / Linux containers + UE5 (local-first)

Status: opportunistic Dev/QA environment, no SLA, started on demand. Role: local-first pilot for validating the product without any cloud dependency.

Host

Item	Value
Machine	rancher2 (WireGuard jumpbox)
IP	100.100.100.249
GPU	NVIDIA RTX 4060 Ti 16 GB VRAM (Ada Lovelace AD106, PCI 10de:2805)
Driver	nvidia 580.105.08
OS	Linux + Podman/Docker
Note	.249 also serves as WireGuard jumpbox (dual role)

Application components (L2 containers)

graph TB
    Customer[Kiosk customer] -->|WebRTC stream| Browser[Kiosk browser]
    subgraph Stack1B["Stack 1B on rancher2 (.249) / RTX 4060 Ti 16 GB"]
        Avatar[u5-avatar<br/>Unreal Engine 5<br/>MetaHuman + Pixel Streaming]
        Signal[u5-signalling<br/>WebRTC signaling server]
        Orchestrator[u5-orchestrator<br/>state machine + pipeline]
        Speaches[u5-speaches<br/>STT faster-whisper]
        TTS[u5-kokoro<br/>local TTS]
    end
    subgraph External["Adjacent services"]
        Ollama[Ollama<br/>100.100.100.179:11434<br/>Qwen 2.5:14b]
        Qdrant[Qdrant<br/>qdrant.cledorze.lan<br/>RAG vector DB]
        OFF[Open Food Facts<br/>public API]
    end
    Browser <-->|WebRTC| Signal
    Signal -->|signaling| Avatar
    Avatar -->|pixel stream| Browser
    Browser -->|audio mic| Orchestrator
    Orchestrator --> Speaches
    Speaches -->|text| Orchestrator
    Orchestrator -->|prompt| Ollama
    Ollama -->|response| Orchestrator
    Orchestrator -->|context lookup| Qdrant
    Orchestrator -->|product fiche| OFF
    Orchestrator -->|text| TTS
    TTS -->|audio + viseme| Avatar
    classDef hero fill:#166534,stroke:#22c55e,stroke-width:3px,color:#fff
    classDef service fill:#0f766e,stroke:#5eead4,stroke-width:2px,color:#fff
    classDef external fill:#1e293b,stroke:#475569,stroke-width:2px,color:#e2e8f0
    class Avatar,Orchestrator hero
    class Signal,Speaches,TTS service
    class Ollama,Qdrant,OFF external

Detailed design references

The full Stack B detailed design lives in ../../stack-b/ (8 documents migrated from the original stack-b-design repo):

01-virbe-architecture.md / reference of the Virbe pipeline we're modeling against
02-gap-analysis.md / Stack B vs Virbe feature delta
03-conversation-pipeline.md / orchestrator pipeline detail
04-knowledge-base.md / RAG / Qdrant design
05-scalability-model.md / per-kiosk to N-kiosks scaling
06-ui-ux-specs.md / interface specs
07-tts-stt-integration.md / STT/TTS plumbing
08-deployment-model.md / Helm / Fleet deployment model

These docs are currently the most detailed source of truth for Stack 1B. This README provides the C4 L2 summary above; see ../../stack-b/ for the L3 details.

Per-site production topology (Phase 1B)

The dev/QA host above (rancher2 .249) runs the full stack on a single box. In a store, the same stack becomes the master, with the screens as thin workers (ADR-009):

Master node — GPU 16 GB VRAM, full local AI pipeline (LLM + RAG + STT + TTS + NeuroSync lip-sync + UE5), platform stack RKE2 · Fleet · Harbor · Prometheus · Grafana, 3 avatars max.
Workers (1 to 4) — no GPU: screen, directional mic + speakers, presence sensor, avatar received as a WebRTC stream from the master.
Site infra — cabled Ethernet LAN, managed switch, UPS, patch panel, managed 5G router. Full-local at runtime; WAN only for planned maintenance.
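With 1 to 4 workers sharing a master capped at 3 avatars, a fourth screen can find every avatar slot busy. A minimal sketch of master-side admission control, assuming one avatar stream per active worker session; AvatarSlots and its methods are hypothetical, and what a refused worker displays is not specified in this README.

```python
import threading

MAX_AVATARS = 3  # per master, from the topology above


class AvatarSlots:
    """Hypothetical admission control: at most max_avatars concurrent avatar streams."""

    def __init__(self, max_avatars: int = MAX_AVATARS) -> None:
        self._max = max_avatars
        self._active: set[str] = set()
        self._lock = threading.Lock()  # workers may request slots concurrently

    def acquire(self, worker_id: str) -> bool:
        with self._lock:
            if worker_id in self._active:
                return True  # this worker already holds a slot
            if len(self._active) >= self._max:
                return False  # all avatars busy; the worker must wait
            self._active.add(worker_id)
            return True

    def release(self, worker_id: str) -> None:
        with self._lock:
            self._active.discard(worker_id)

    @property
    def active(self) -> int:
        with self._lock:
            return len(self._active)
```

Releasing on session end (customer leaves, presence lost) is what lets a waiting fourth worker get an avatar.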

Beyond ~30 kiosks the fleet moves to a central mini-HPC (ADR-011).

Strengths

✅ Local-first: no captive cloud, predictable latency
✅ GDPR-clean by design: no audio/video frame leaves the kiosk
✅ Open stack: UE5 + Ollama + Qdrant all free/open source
✅ UE5 Pixel Streaming = photorealistic avatar without Virbe
✅ Marginal cost per session = 0 (no API billing)
✅ Direction aligned with SUSE AI (future Stack 2 could leverage it)

Limitations

⚠️ Lip-sync: now handled by NeuroSync (ADR-010), replacing Audio2Face — per-persona tuning still ongoing
⚠️ End-to-end latency: budgeted at 1.0-1.8 s (conversation pipeline), down from the old "< 3 s" target
❌ High GPU load: UE5 + generative pipeline = possible 16 GB VRAM saturation (hence 3 avatars max per master)
❌ More complex to operate: 5 containers to coordinate vs 1 Virbe app
❌ No network fallback if a container crashes
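To check measurements against the 1.0-1.8 s conversation-pipeline budget, each stage can be timed separately so that an overrun points at a specific container. A minimal sketch; StageTimer, within_budget and the stage names are hypothetical, and only the 1.8 s upper bound comes from this README.

```python
import time
from contextlib import contextmanager
from typing import Iterator

BUDGET_UPPER_S = 1.8  # upper end of the 1.0-1.8 s conversation-pipeline budget


class StageTimer:
    """Hypothetical helper: accumulates wall-clock time per pipeline stage."""

    def __init__(self) -> None:
        self.stages: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.stages[name] = self.stages.get(name, 0.0) + elapsed

    def total(self) -> float:
        return sum(self.stages.values())

    def slowest(self) -> str:
        return max(self.stages, key=self.stages.__getitem__)


def within_budget(total_s: float, upper_s: float = BUDGET_UPPER_S) -> bool:
    return total_s <= upper_s
```

Wrap each call (STT, RAG, LLM, TTS) in `with timer.stage("...")`. Summing stage times assumes the stages run sequentially; if TTS starts streaming before the LLM finishes, measure end-to-end time as well.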

When to use Stack 1B

"Sovereign Praesenz" demo for customers sensitive to GDPR / sovereignty
Testing the PIM-native path (Open Food Facts connector, later Salsify)
Iterating on the conversational orchestrator
Validating scale hypotheses (GPU, RAM, network footprint)

Links to detailed L3

components-presence.mmd / PIR/ultrasound/camera detection (to produce once presence detection is implemented)
components-avatar.mmd / UE5 Pixel Streaming pipeline (to produce)
components-knowledge.mmd / Qdrant RAG + OFF (to produce)
data-flow.mmd / scan QR → WhatsApp → Léa flow (to produce)

Evolution

Stack 1B is actively in development and serves as the playground for:
1. The Open Food Facts connector
2. Presence detection (PIR + ultrasonic + MediaPipe camera)
3. UE5 + NeuroSync lip-sync (ADR-010)
4. The Salsify SupplierXM + Akeneo connectors (V2)
5. The Equadis / GS1 GDSN + retailer SFTP connectors (V3)
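For the Open Food Facts connector, the product fiche can come from the public OFF API v2 product endpoint. A minimal sketch using only the Python standard library; the selected fields, the returned fiche shape, the timeout and the User-Agent value are assumptions, not the connector's actual contract.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

OFF_BASE = "https://world.openfoodfacts.org/api/v2/product"
FIELDS = ["product_name", "brands", "nutriscore_grade", "allergens_tags"]  # assumed subset


def off_url(barcode: str) -> str:
    query = urllib.parse.urlencode({"fields": ",".join(FIELDS)})
    return f"{OFF_BASE}/{urllib.parse.quote(barcode)}.json?{query}"


def parse_fiche(payload: dict) -> Optional[dict]:
    """Map an OFF v2 response to a hypothetical fiche dict; None if not found."""
    if payload.get("status") != 1:
        return None
    p = payload.get("product", {})
    return {
        "name": p.get("product_name") or "unknown product",
        "brands": p.get("brands", ""),
        "nutriscore": (p.get("nutriscore_grade") or "").upper() or None,
        # Tags look like "en:milk"; keep the part after the language prefix.
        "allergens": [a.split(":", 1)[-1] for a in p.get("allergens_tags", [])],
    }


def fetch_fiche(barcode: str, timeout: float = 3.0) -> Optional[dict]:
    # OFF asks API clients to identify themselves; this User-Agent is a placeholder.
    req = urllib.request.Request(
        off_url(barcode),
        headers={"User-Agent": "stack1b-poc/0.1 (contact@example.invalid)"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_fiche(json.load(resp))
```

Keeping parse_fiche separate from the HTTP call lets the mapping be tested offline and reused when later PIM sources (Salsify, GDSN) feed the same fiche shape.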

To clarify

state.json on uterrie references u5-* containers that never ran there. A cleanup should be proposed (see the architecture open questions).
WireGuard isolation: since .249 is also the WG jumpbox, Stack 1B containers should be isolated from third-party VPN traffic using dedicated network namespaces.
GPU quotas: if Stack 1B shares the 4060 Ti with other workloads, plan for limits. MIG is not supported on GeForce cards such as the 4060 Ti, so sharing has to rely on time-slicing and per-application VRAM caps.
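Whatever sharing mechanism is retained, a simple software guard can refuse to start a GPU workload that would push the 16 GB card toward saturation. A minimal sketch built on nvidia-smi's `--query-gpu` CSV output; GpuMem, can_start, the 1 GiB reserve and any per-workload VRAM estimate are hypothetical. It is a best-effort admission check, not isolation: two workloads started at the same time can both pass it.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class GpuMem:
    used_mib: int
    total_mib: int

    @property
    def free_mib(self) -> int:
        return self.total_mib - self.used_mib


def parse_nvidia_smi(csv_text: str) -> list[GpuMem]:
    """Parse 'used, total' lines (MiB) as printed with csv,noheader,nounits."""
    gpus = []
    for line in csv_text.strip().splitlines():
        used, total = (int(x.strip()) for x in line.split(","))
        gpus.append(GpuMem(used, total))
    return gpus


def query_gpus() -> list[GpuMem]:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_nvidia_smi(out)


def can_start(gpu: GpuMem, needed_mib: int, reserve_mib: int = 1024) -> bool:
    """Hypothetical rule: start only if the estimate fits with a safety reserve."""
    return gpu.free_mib - reserve_mib >= needed_mib
```

Usage: `can_start(query_gpus()[0], needed_mib=...)` before launching an extra avatar or model, with the estimate measured per workload on this host.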

ADRs referencing this stack

001 / Two parallel stacks
002 / Local-first architecture
004 / Stack 1B local-first as Stack 2 prod target
005 / PIM-native architecture
006 / 3-tier data supply (superseded by 013)
008 / Picard pilot (extended by 012)
009 / Per-site master + thin workers
010 / NeuroSync lip-sync
011 / Two-phase scale model
012 / Multi-enseigne go-to-market
013 / PIM source tiers (extended)