Stack 1B / Linux containers + UE5 (local-first)
Status: Dev/QA opportunistic, no SLA, started on demand. Role: local-first pilot, product validation without cloud dependency.
Host
| Item | Value |
|---|---|
| Machine | rancher2 (WireGuard jumpbox) |
| IP | 100.100.100.249 |
| GPU | NVIDIA RTX 4060 Ti 16 GB VRAM (Ada Lovelace AD106, PCI 10de:2805) |
| Driver | nvidia 580.105.08 |
| OS | Linux + Podman/Docker |
| Note | .249 also serves as WireGuard jumpbox (dual role) |
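To confirm that the host still matches the table above (GPU model, driver, free VRAM), a small check can parse `nvidia-smi` CSV output. The sketch below is only an illustration. The `--query-gpu` fields and `--format=csv,noheader,nounits` options are standard `nvidia-smi` flags. The helper names (`GpuInfo`, `parse_nvidia_smi_csv`, `matches_host_table`) are hypothetical, and so is the sample output line used in the usage example.

```python
"""Parse nvidia-smi CSV output and compare it with the Host table (sketch)."""
import subprocess
from dataclasses import dataclass

# Expected values copied from the Host table; update them if the host changes.
EXPECTED_GPU_SUBSTRING = "RTX 4060 Ti"
EXPECTED_DRIVER = "580.105.08"
QUERY = ["nvidia-smi",
         "--query-gpu=name,memory.total,memory.used,driver_version",
         "--format=csv,noheader,nounits"]


@dataclass
class GpuInfo:  # hypothetical helper type
    name: str
    memory_total_mib: int
    memory_used_mib: int
    driver_version: str

    @property
    def memory_free_mib(self) -> int:
        return self.memory_total_mib - self.memory_used_mib


def parse_nvidia_smi_csv(text: str) -> list[GpuInfo]:
    """One GPU per line: name, total MiB, used MiB, driver version."""
    gpus = []
    for line in text.strip().splitlines():
        name, total, used, driver = (field.strip() for field in line.split(","))
        gpus.append(GpuInfo(name, int(total), int(used), driver))
    return gpus


def matches_host_table(gpu: GpuInfo) -> bool:
    """True when the GPU model and driver match the Host table."""
    return EXPECTED_GPU_SUBSTRING in gpu.name and gpu.driver_version == EXPECTED_DRIVER


def query_gpus() -> list[GpuInfo]:
    """Run nvidia-smi on the host (requires the NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_nvidia_smi_csv(out)
```

Usage with an illustrative output line: `parse_nvidia_smi_csv("NVIDIA GeForce RTX 4060 Ti, 16380, 2048, 580.105.08")` yields one `GpuInfo` with 14332 MiB free. On rancher2 itself, `query_gpus()` runs the real command.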
Application components (L2 containers)

| Container | Role |
|---|---|
| Avatar | Unreal Engine 5, MetaHuman + Pixel Streaming |
| u5-signalling | WebRTC signaling server |
| u5-orchestrator | State machine + conversation pipeline |
| u5-speaches | STT (faster-whisper) |
| u5-kokoro | Local TTS |

Adjacent services:

| Service | Endpoint | Role |
|---|---|---|
| Ollama | 100.100.100.179:11434 | LLM (Qwen 2.5:14b) |
| Qdrant | qdrant.cledorze.lan | RAG vector DB |
| Open Food Facts | public API | Product data |

Data flow:
- Browser ↔ u5-signalling: WebRTC; u5-signalling → Avatar: signaling.
- Avatar → Browser: pixel stream.
- Browser → u5-orchestrator: microphone audio.
- u5-orchestrator → u5-speaches: audio; u5-speaches → u5-orchestrator: transcribed text.
- u5-orchestrator → Qdrant: context lookup.
- u5-orchestrator → Open Food Facts: product sheet.
- u5-orchestrator → Ollama: prompt; Ollama → u5-orchestrator: response.
- u5-orchestrator → u5-kokoro: text to speak.
- u5-kokoro → Avatar: audio + visemes.
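One conversational turn through the orchestrator can be pictured as a linear state machine: transcribe, retrieve, generate, speak. The sketch below is a hypothetical illustration and not the u5-orchestrator code, whose design lives in 03-conversation-pipeline.md. The state names, the `Stages` adapters and the prompt layout are all assumptions. Each adapter stands in for one service from the data flow (u5-speaches, Qdrant/OFF, Ollama, u5-kokoro).

```python
"""One conversational turn as a linear state machine (hypothetical sketch)."""
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable


class State(Enum):
    IDLE = auto()
    TRANSCRIBING = auto()  # mic audio -> u5-speaches (STT)
    RETRIEVING = auto()    # context lookup (Qdrant, product sheet from OFF)
    GENERATING = auto()    # prompt -> Ollama
    SPEAKING = auto()      # text -> u5-kokoro; audio + visemes -> avatar
    DONE = auto()
    FAILED = auto()        # a stage raised; no fallback path


@dataclass
class Stages:
    """Hypothetical adapters, one per service in the data flow."""
    stt: Callable[[bytes], str]
    retrieve: Callable[[str], list[str]]
    llm: Callable[[str], str]
    tts: Callable[[str], bytes]


@dataclass
class Turn:
    stages: Stages
    state: State = State.IDLE
    trace: list[State] = field(default_factory=list)

    def _enter(self, state: State) -> None:
        self.state = state
        self.trace.append(state)

    def run(self, mic_audio: bytes) -> bytes:
        try:
            self._enter(State.TRANSCRIBING)
            question = self.stages.stt(mic_audio)
            self._enter(State.RETRIEVING)
            context = self.stages.retrieve(question)
            self._enter(State.GENERATING)
            prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
            answer = self.stages.llm(prompt)
            self._enter(State.SPEAKING)
            speech = self.stages.tts(answer)
            self._enter(State.DONE)
            return speech
        except Exception:
            # Record the failure for monitoring, then propagate it.
            self._enter(State.FAILED)
            raise
```

Keeping the stages as injected callables lets each service be faked in tests or swapped (for example, a different STT engine) without touching the state transitions.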
Detailed design references
The full Stack B detailed design lives in ../../stack-b/ (8 documents migrated from the original stack-b-design repo):
- 01-virbe-architecture.md / reference of the Virbe pipeline we're modeling against
- 02-gap-analysis.md / Stack B vs Virbe feature delta
- 03-conversation-pipeline.md / orchestrator pipeline detail
- 04-knowledge-base.md / RAG / Qdrant design
- 05-scalability-model.md / per-kiosk to N-kiosks scaling
- 06-ui-ux-specs.md / interface specs
- 07-tts-stt-integration.md / STT/TTS plumbing
- 08-deployment-model.md / Helm / Fleet deployment model
These docs are the most detailed source of truth for Stack 1B today. This README is the C4-L2 summary above; refer to ../../stack-b/ for the L3 details.
Per-site production topology (Phase 1B)
The dev/QA host above (rancher2 .249) runs the full stack on a single box. In a
store, the same stack becomes the master, with the screens as thin workers
(ADR-009):
- Master node — GPU 16 GB VRAM, full local AI pipeline (LLM + RAG + STT + TTS + NeuroSync lip-sync + UE5), platform stack RKE2 · Fleet · Harbor · Prometheus · Grafana, 3 avatars max.
- Workers (1 to 4) — no GPU: screen, directional mic + speakers, presence sensor, avatar received as a WebRTC stream from the master.
- Site infra — cabled Ethernet LAN, managed switch, UPS, patch panel, managed 5G router. Full-local at runtime; WAN only for planned maintenance.
Beyond ~30 kiosks the fleet moves to a central mini-HPC (ADR-011).
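The sizing rules above (1 to 4 workers per master, 3 avatars max per master, central mini-HPC beyond ~30 kiosks) can be checked mechanically when planning a rollout. The following is a hypothetical sketch. Counting each worker screen as one kiosk is an assumption, and so is treating "~30" as a hard threshold. `Site`, `site_problems` and `fleet_phase` are illustrative names.

```python
"""Check a proposed rollout against the Phase 1B sizing rules (sketch)."""
from dataclasses import dataclass

MAX_WORKERS_PER_SITE = 4     # "Workers (1 to 4)" (ADR-009)
MAX_AVATARS_PER_MASTER = 3   # "3 avatars max"
CENTRAL_HPC_THRESHOLD = 30   # "Beyond ~30 kiosks" (ADR-011); approximate


@dataclass
class Site:  # hypothetical planning record
    name: str
    workers: int
    concurrent_avatars: int


def site_problems(site: Site) -> list[str]:
    """Return human-readable rule violations for one site (empty = OK)."""
    problems = []
    if not 1 <= site.workers <= MAX_WORKERS_PER_SITE:
        problems.append(f"{site.name}: {site.workers} workers (expected 1-{MAX_WORKERS_PER_SITE})")
    if site.concurrent_avatars > MAX_AVATARS_PER_MASTER:
        problems.append(f"{site.name}: {site.concurrent_avatars} avatars > {MAX_AVATARS_PER_MASTER} per master")
    return problems


def fleet_phase(sites: list[Site]) -> str:
    """Assumption: one kiosk per worker screen."""
    kiosks = sum(site.workers for site in sites)
    if kiosks > CENTRAL_HPC_THRESHOLD:
        return "central mini-HPC (ADR-011)"
    return "per-site master (ADR-009)"
```

For example, eight fully equipped sites (4 workers each, 32 kiosks) would cross the threshold, while seven (28 kiosks) would stay on per-site masters.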
Strengths
- ✅ Local-first: no captive cloud, predictable latency
- ✅ GDPR-clean by design: no audio/video frame leaves the kiosk
- ✅ Open stack: UE5 + Ollama + Qdrant all free/open source
- ✅ UE5 Pixel Streaming = photorealistic avatar without Virbe
- ✅ Marginal cost per session = 0 (no API billing)
- ✅ Direction aligned with SUSE AI (future Stack 2 could leverage it)
Limitations
- ⚠️ Lip-sync: now handled by NeuroSync (ADR-010), replacing Audio2Face — per-persona tuning still ongoing
- ⚠️ End-to-end latency: budgeted at 1.0-1.8 s (conversation pipeline), down from the old "< 3 s" target
- ❌ High GPU load: UE5 + generative pipeline = possible 16 GB VRAM saturation (hence 3 avatars max per master)
- ❌ More complex to operate: 5 containers to coordinate vs 1 Virbe app
- ❌ No network fallback if a container crashes
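A per-turn latency check against the 1.0-1.8 s budget could look like the sketch below. It is hypothetical, and it assumes two things: 1.0 s is the target and 1.8 s the hard ceiling, and the stages run sequentially, so their latencies simply add. With streaming (LLM tokens feeding TTS early), stages overlap, and the sum becomes a pessimistic upper bound. Stage names and timings in the usage example are illustrative, not measurements.

```python
"""Classify one conversation turn against the 1.0-1.8 s budget (sketch)."""
BUDGET_TARGET_S = 1.0   # assumption: lower end of the budget is the target
BUDGET_CEILING_S = 1.8  # assumption: upper end is the hard ceiling


def classify_turn(stage_latencies_s: dict[str, float]) -> tuple[float, str]:
    """Sum sequential stage latencies and compare with the budget."""
    total = sum(stage_latencies_s.values())
    if total <= BUDGET_TARGET_S:
        verdict = "within target"
    elif total <= BUDGET_CEILING_S:
        verdict = "within budget"
    else:
        verdict = "over budget"
    return total, verdict


def slowest_stage(stage_latencies_s: dict[str, float]) -> str:
    """Name of the stage to optimize first."""
    return max(stage_latencies_s, key=stage_latencies_s.get)
```

With illustrative timings `{"stt": 0.3, "retrieval": 0.1, "llm": 0.6, "tts": 0.4}`, the turn totals about 1.4 s ("within budget"), and `llm` is the slowest stage.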
When to use Stack 1B
- "Sovereign Praesenz" demo for customers sensitive to GDPR / sovereignty
- Testing the PIM-native path (Open Food Facts connector, later Salsify)
- Iterating on the conversational orchestrator
- Validating scale hypotheses (GPU, RAM, network footprint)
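For the PIM-native path, the Open Food Facts lookup can start as a plain HTTP GET on the OFF v2 product endpoint, restricted with the `fields` query parameter. The sketch below is hypothetical and is not the connector's actual code. The `User-Agent` value is a placeholder (OFF asks API clients to identify themselves). Restricting input to EAN-13 barcodes is a simplification, because OFF also stores other barcode formats. The chosen fields are a plausible subset for a product sheet.

```python
"""Open Food Facts product lookup for the orchestrator (hypothetical sketch)."""
import json
import urllib.parse
import urllib.request
from typing import Optional

OFF_BASE = "https://world.openfoodfacts.org/api/v2/product/"
FIELDS = ("product_name", "brands", "nutriscore_grade", "ingredients_text")
USER_AGENT = "StackB-Orchestrator/0.1 (contact@example.invalid)"  # placeholder


def ean13_is_valid(code: str) -> bool:
    """EAN-13 check digit: weights 1,3,1,3,... over the first 12 digits."""
    if len(code) != 13 or not code.isdigit():
        return False
    digits = [int(c) for c in code]
    weighted = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - weighted % 10) % 10 == digits[12]


def product_url(barcode: str) -> str:
    query = urllib.parse.urlencode({"fields": ",".join(FIELDS)})
    return f"{OFF_BASE}{barcode}?{query}"


def fetch_product(barcode: str, timeout_s: float = 5.0) -> dict:
    """Network call; not exercised by the examples below."""
    if not ean13_is_valid(barcode):
        raise ValueError(f"not a valid EAN-13 barcode: {barcode}")
    request = urllib.request.Request(product_url(barcode), headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


def to_product_sheet(payload: dict) -> Optional[dict]:
    """OFF returns status 1 when the product is found."""
    if payload.get("status") != 1:
        return None
    product = payload.get("product", {})
    return {name: product.get(name) for name in FIELDS}
```

Validating the check digit before calling the API avoids a network round trip on a mis-scanned code. Keeping `to_product_sheet` separate from the HTTP call makes it easy to later swap the source (Salsify, Akeneo) behind the same sheet shape.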
Links to detailed L3
- components-presence.mmd / PIR/ultrasound/camera detection (to produce once presence detection is coded)
- components-avatar.mmd / UE5 Pixel Streaming pipeline (to produce)
- components-knowledge.mmd / Qdrant RAG + OFF (to produce)
- data-flow.mmd / scan QR → WhatsApp → Léa flow (to produce)
Evolution
Stack 1B is in active development. It serves as a playground for:
1. The Open Food Facts connector
2. Presence detection (PIR + ultrasonic + MediaPipe camera)
3. UE5 + NeuroSync lip-sync (ADR-010)
4. The Salsify SupplierXM + Akeneo connectors (V2)
5. The Equadis / GS1 GDSN + retailer SFTP connectors (V3)
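Presence detection combines three sensors that fail in different ways. PIR misses still shoppers, ultrasonic readings catch passers-by, and the camera can miss faces in poor light. The sketch below fuses them with a 2-of-3 vote, then debounces the result so the avatar does not wake or sleep on a single noisy frame. Everything here is hypothetical: the vote rule, the 1.5 m engagement distance, the 3-frame debounce and all names. The camera flag is assumed to come from a face detector such as MediaPipe's.

```python
"""Fuse PIR, ultrasonic and camera readings into one presence state (sketch)."""
from dataclasses import dataclass
from typing import Optional

ENGAGE_DISTANCE_M = 1.5  # hypothetical engagement distance
MIN_VOTES = 2            # hypothetical 2-of-3 agreement rule


@dataclass
class Reading:
    pir_motion: bool
    ultrasonic_distance_m: Optional[float]  # None = no echo
    camera_face_detected: bool              # e.g. from a MediaPipe face detector


def is_present(r: Reading) -> bool:
    """Majority vote across the three sensors."""
    votes = [
        r.pir_motion,
        r.ultrasonic_distance_m is not None and r.ultrasonic_distance_m <= ENGAGE_DISTANCE_M,
        r.camera_face_detected,
    ]
    return sum(votes) >= MIN_VOTES


class PresenceDebouncer:
    """Flip the published state only after `frames` consecutive disagreeing decisions."""

    def __init__(self, frames: int = 3):
        self.frames = frames
        self.present = False
        self._streak = 0

    def update(self, decision: bool) -> bool:
        if decision == self.present:
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= self.frames:
                self.present = decision
                self._streak = 0
        return self.present
```

Feeding the debouncer `True, True, True, False, True` publishes `False, False, True, True, True`. The single `False` frame does not put the avatar back to sleep.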
To clarify
- state.json on uterrie references u5-* containers that never ran there. Cleanup to propose (see the architecture open question).
- WireGuard isolation: since .249 is the WG jumpbox, Stack 1B containers should be isolated from third-party VPN traffic using dedicated network namespaces.
- GPU quotas: if Stack 1B and other workloads share the 4060 Ti, plan GPU time-slicing or per-workload limits. MIG is not an option here, because it is only supported on selected data-center GPUs (A100/H100 class), not on GeForce cards.
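When sizing GPU sharing, a back-of-the-envelope VRAM check helps decide how many avatars a 16 GB card can host next to the shared AI services. The sketch below is hypothetical. The per-service and per-avatar footprints and the headroom are placeholder numbers, not measurements, and should be replaced with figures from `nvidia-smi` on the real master. With these placeholders the result happens to be 3.

```python
"""Rough VRAM capacity check for a 16 GB master (sketch, placeholder numbers)."""
TOTAL_VRAM_GIB = 16.0

# Placeholder footprints (GiB); replace with measured values.
SHARED_GIB = {"llm": 9.0, "stt": 1.0, "tts": 0.5}
PER_AVATAR_GIB = 1.5
HEADROOM_GIB = 1.0  # safety margin against fragmentation and spikes


def max_avatars(total: float = TOTAL_VRAM_GIB,
                shared: dict[str, float] = SHARED_GIB,
                per_avatar: float = PER_AVATAR_GIB,
                headroom: float = HEADROOM_GIB) -> int:
    """Avatars that fit after shared services and headroom are reserved."""
    free = total - sum(shared.values()) - headroom
    return max(0, int(free // per_avatar))
```

Because the LLM dominates the shared budget in this kind of estimate, the choice of model size and quantization usually moves the avatar count more than any other parameter.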
ADRs referencing this stack
- 001 / Two parallel stacks
- 002 / Local-first architecture
- 004 / Stack 1B local-first as Stack 2 prod target
- 005 / PIM-native architecture
- 006 / 3-tier data supply (superseded by 013)
- 008 / Picard pilot (extended by 012)
- 009 / Per-site master + thin workers
- 010 / NeuroSync lip-sync
- 011 / Two-phase scale model
- 012 / Multi-banner (multi-enseigne) go-to-market
- 013 / PIM source tiers (extended)