
Stack 1B / Linux containers + UE5 (local-first)

Status: Dev/QA opportunistic, no SLA, started on demand. Role: local-first pilot, product validation without cloud dependency.

Host

| Item | Value |
|---|---|
| Machine | rancher2 (WireGuard jumpbox) |
| IP | 100.100.100.249 |
| GPU | NVIDIA RTX 4060 Ti 16 GB VRAM (Ada Lovelace AD106, PCI 10de:2805) |
| Driver | nvidia 580.105.08 |
| OS | Linux + Podman/Docker |
| Note | .249 also serves as WireGuard jumpbox (dual role) |

Application components (L2 containers)

graph TB
    Customer[Kiosk customer] -->|WebRTC stream| Browser[Kiosk browser]

    subgraph Stack1B["Stack 1B on rancher2 (.249) / RTX 4060 Ti 16 GB"]
        Avatar["u5-avatar<br/>Unreal Engine 5<br/>MetaHuman + Pixel Streaming"]
        Signal["u5-signalling<br/>WebRTC signaling server"]
        Orchestrator["u5-orchestrator<br/>state machine + pipeline"]
        Speaches["u5-speaches<br/>STT faster-whisper"]
        TTS["u5-kokoro<br/>local TTS"]
    end

    subgraph External["Adjacent services"]
        Ollama["Ollama<br/>100.100.100.179:11434<br/>Qwen 2.5:14b"]
        Qdrant["Qdrant<br/>qdrant.cledorze.lan<br/>RAG vector DB"]
        OFF["Open Food Facts<br/>public API"]
    end

    Browser <-->|WebRTC| Signal
    Signal -->|signaling| Avatar
    Avatar -->|pixel stream| Browser
    Browser -->|audio mic| Orchestrator
    Orchestrator --> Speaches
    Speaches -->|text| Orchestrator
    Orchestrator -->|prompt| Ollama
    Ollama -->|response| Orchestrator
    Orchestrator -->|context lookup| Qdrant
    Orchestrator -->|product fiche| OFF
    Orchestrator -->|text| TTS
    TTS -->|audio + viseme| Avatar

    classDef hero fill:#166534,stroke:#22c55e,stroke-width:3px,color:#fff
    classDef service fill:#0f766e,stroke:#5eead4,stroke-width:2px,color:#fff
    classDef external fill:#1e293b,stroke:#475569,stroke-width:2px,color:#e2e8f0
    class Avatar,Orchestrator hero
    class Signal,Speaches,TTS service
    class Ollama,Qdrant,OFF external

Detailed design references

The full Stack B detailed design lives in ../../stack-b/ (8 documents migrated from the original stack-b-design repo):

  • 01-virbe-architecture.md / reference of the Virbe pipeline we're modeling against
  • 02-gap-analysis.md / Stack B vs Virbe feature delta
  • 03-conversation-pipeline.md / orchestrator pipeline detail
  • 04-knowledge-base.md / RAG / Qdrant design
  • 05-scalability-model.md / per-kiosk to N-kiosks scaling
  • 06-ui-ux-specs.md / interface specs
  • 07-tts-stt-integration.md / STT/TTS plumbing
  • 08-deployment-model.md / Helm / Fleet deployment model

These docs are the most detailed source of truth for Stack 1B today. This README is only the C4-L2 summary; refer to ../../stack-b/ for the L3 details.

Per-site production topology (Phase 1B)

The dev/QA host above (rancher2 .249) runs the full stack on a single box. In a store, the same stack becomes the master, with the screens as thin workers (ADR-009):

  • Master node — GPU 16 GB VRAM, full local AI pipeline (LLM + RAG + STT + TTS + NeuroSync lip-sync + UE5), platform stack RKE2 · Fleet · Harbor · Prometheus · Grafana, 3 avatars max.
  • Workers (1 to 4), no GPU: screen, directional mic + speakers, presence sensor, avatar received as a WebRTC stream from the master.
  • Site infra — cabled Ethernet LAN, managed switch, UPS, patch panel, managed 5G router. Full-local at runtime; WAN only for planned maintenance.

Beyond ~30 kiosks the fleet moves to a central mini-HPC (ADR-011).
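The limits above (1 to 4 workers per master, 3 avatars max, roughly 30 kiosks before the central mini-HPC) can be captured as a small validation sketch. Everything here is illustrative: the `Site` type, the `fleet_mode` function and the mode strings are hypothetical names. Two points are assumptions rather than statements from the ADRs: a worker screen is counted as one kiosk, and "3 avatars max" is read as a cap on concurrent avatars per master.

```python
from dataclasses import dataclass

MIN_WORKERS, MAX_WORKERS = 1, 4   # "Workers (1 to 4)"
MAX_AVATARS = 3                   # "3 avatars max" per master
HPC_THRESHOLD_KIOSKS = 30         # "Beyond ~30 kiosks" (approximate)


@dataclass(frozen=True)
class Site:
    """One store: a single GPU master plus thin workers (hypothetical type)."""
    name: str
    workers: int

    def __post_init__(self) -> None:
        if not MIN_WORKERS <= self.workers <= MAX_WORKERS:
            raise ValueError(
                f"{self.name}: expected 1 to 4 workers, got {self.workers}"
            )

    @property
    def max_concurrent_avatars(self) -> int:
        # Assumption: each worker shows at most one avatar at a time.
        return min(self.workers, MAX_AVATARS)


def fleet_mode(sites: list[Site]) -> str:
    """Pick the scale phase from the total kiosk count across sites."""
    kiosks = sum(site.workers for site in sites)
    if kiosks > HPC_THRESHOLD_KIOSKS:
        return "central-mini-hpc"
    return "per-site-master"
```

Under this reading, a site with four screens can serve at most three of them with a live avatar at once.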

Strengths

  • Local-first: no captive cloud, predictable latency
  • GDPR-clean by design: no audio or video frame leaves the site LAN
  • Open stack: Ollama and Qdrant are open source; UE5 is source-available under Epic's EULA (not open source)
  • ✅ UE5 Pixel Streaming = photorealistic avatar without Virbe
  • Marginal cost per session = 0 (no API billing)
  • ✅ Direction aligned with SUSE AI (future Stack 2 could leverage it)

Limitations

  • ⚠️ Lip-sync: now handled by NeuroSync (ADR-010), replacing Audio2Face — per-persona tuning still ongoing
  • ⚠️ End-to-end latency: budgeted at 1.0-1.8 s (conversation pipeline), down from the old "< 3 s" target
  • High GPU load: UE5 + generative pipeline = possible 16 GB VRAM saturation (hence 3 avatars max per master)
  • ❌ More complex to operate: 5 containers to coordinate vs 1 Virbe app
  • ❌ No network fallback if a container crashes
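To check a turn against the 1.0-1.8 s end-to-end budget, each pipeline stage can be timed separately so that the slowest one is visible. This is a minimal sketch, assuming stage names such as "stt" and "llm" that are chosen for illustration; `LatencyTracker` is a hypothetical helper, not part of the orchestrator.

```python
import time
from contextlib import contextmanager

BUDGET_MAX_S = 1.8  # upper bound of the 1.0-1.8 s budget


class LatencyTracker:
    """Accumulates wall-clock time per named stage (hypothetical helper)."""

    def __init__(self) -> None:
        self.stages: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.stages[name] = self.stages.get(name, 0.0) + elapsed

    @property
    def total(self) -> float:
        return sum(self.stages.values())

    def over_budget(self) -> bool:
        return self.total > BUDGET_MAX_S

    def slowest(self) -> str:
        """Name of the stage that consumed the most time."""
        return max(self.stages, key=lambda name: self.stages[name])
```

A turn would wrap each call, for example `with tracker.stage("stt"): ...`, then log `tracker.slowest()` whenever `tracker.over_budget()` is true.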

When to use Stack 1B

  • "Sovereign Praesenz" demo for customers sensitive to GDPR / sovereignty
  • Testing the PIM-native path (Open Food Facts connector, later Salsify)
  • Iterating on the conversational orchestrator
  • Validating scale hypotheses (GPU, RAM, network footprint)

Diagrams to produce

  • components-presence.mmd / PIR/ultrasound/camera detection (to produce when presence is coded)
  • components-avatar.mmd / UE5 Pixel Streaming pipeline (to produce)
  • components-knowledge.mmd / Qdrant RAG + OFF (to produce)
  • data-flow.mmd / QR scan → WhatsApp → Léa flow (to produce)

Evolution

Stack 1B is actively in development. It is the playground for:

  1. The Open Food Facts connector
  2. Presence detection (PIR + ultrasonic + MediaPipe camera)
  3. UE5 + NeuroSync lip-sync (ADR-010)
  4. The Salsify SupplierXM + Akeneo connectors (V2)
  5. The Equadis / GS1 GDSN + retailer SFTP connectors (V3)
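As an illustration of what the Open Food Facts connector has to do, the sketch below fetches a product by barcode from the public v2 API and reduces the response to a small product sheet. It is a sketch under stated assumptions, not the connector itself: the function names, the selection of fields and the User-Agent string are hypothetical, and HTTP errors are simply left to propagate.

```python
import json
import urllib.request
from typing import Optional

OFF_URL = "https://world.openfoodfacts.org/api/v2/product/{barcode}.json"


def parse_product(payload: dict) -> Optional[dict]:
    """Reduce an Open Food Facts response to a small product sheet.

    Returns None when the API reports the product as not found (status != 1).
    """
    if payload.get("status") != 1:
        return None
    product = payload.get("product", {})
    return {
        "barcode": payload.get("code"),
        "name": product.get("product_name"),
        "brands": product.get("brands"),
        "nutriscore": product.get("nutriscore_grade"),
    }


def fetch_product(barcode: str, timeout: float = 5.0) -> Optional[dict]:
    """Fetch one product over HTTPS (needs WAN access at call time)."""
    request = urllib.request.Request(
        OFF_URL.format(barcode=barcode),
        # Placeholder User-Agent; replace with a real app name and contact.
        headers={"User-Agent": "stack1b-sketch/0.1"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_product(json.load(response))
```

Keeping `parse_product` separate from the network call means the mapping can be tested against recorded responses without reaching the public API.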

To clarify

  • state.json on uterrie references u5-* containers that never ran there. Cleanup to propose (see open question architecture).
  • WireGuard isolation: since .249 is also the WG jumpbox, Stack 1B containers should be isolated from third-party VPN traffic, for example with dedicated network namespaces.
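  • GPU quotas: if Stack 1B and other workloads share the 4060 Ti, plan software-level limits. MIG is not an option here, since it is only available on data-center GPUs and not on GeForce cards.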
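Before starting another workload on the shared GPU, the free VRAM can be checked from `nvidia-smi`. The sketch below parses one line of `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits` output (values in MiB). The function names and the idea of gating on a required amount are illustrative assumptions, not an existing Stack 1B mechanism.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_line: str) -> tuple[int, int]:
    """Parse a 'used, total' line (MiB) as printed by the query above."""
    used, total = (int(field.strip()) for field in csv_line.split(","))
    return used, total


def vram_headroom_mib(csv_line: str) -> int:
    used, total = parse_memory(csv_line)
    return total - used


def can_start(required_mib: int, csv_line: str) -> bool:
    """True when the GPU still has at least required_mib of free VRAM."""
    return vram_headroom_mib(csv_line) >= required_mib


def read_gpu_line() -> str:
    """Run nvidia-smi and return the line for the first GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return result.stdout.strip().splitlines()[0]
```

On the host this would be called as `can_start(4096, read_gpu_line())`; the check is only advisory, since another process can still allocate memory between the check and the start.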

ADRs referencing this stack

  • 001 / Two parallel stacks
  • 002 / Local-first architecture
  • 004 / Stack 1B local-first as Stack 2 prod target
  • 005 / PIM-native architecture
  • 006 / 3-tier data supply (superseded by 013)
  • 008 / Picard pilot (extended by 012)
  • 009 / Per-site master + thin workers
  • 010 / NeuroSync lip-sync
  • 011 / Two-phase scale model
  • 012 / Multi-enseigne go-to-market
  • 013 / PIM source tiers (extended)