Deployment Model
Current Stack B Deployment
Gitea (engineering-*) → Gitea Actions (uterrie) → Harbor → Fleet GitOps → node1 (RKE2)
- Build: Podman/Buildah on uterrie
- Registry: Harbor (harbor.cledorze.lan)
- Deploy: Fleet watches engineering-infra, deploys Helm charts to node1
- Target: node1 with praesenz.io/role=qa label + NoSchedule taint
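As an illustration of the last point, a workload needs both a node selector and a matching toleration to land on node1. The fragment below is a minimal sketch, not the chart's actual template. It assumes the taint reuses the label's key and value (`praesenz.io/role=qa`); the source only states the label and the `NoSchedule` effect.

```yaml
# Pod spec fragment (sketch): pin a workload to the QA node and tolerate its taint.
# Assumption: the taint key/value mirror the label; adjust if the real taint differs.
spec:
  nodeSelector:
    praesenz.io/role: qa
  tolerations:
    - key: praesenz.io/role
      operator: Equal
      value: qa
      effect: NoSchedule
```

The toleration only permits scheduling on the tainted node; the node selector is what actually keeps the pod off other nodes.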
Container Images
| Image | Registry Path | Source Repo |
|---|---|---|
| u5-orchestrator | harbor.cledorze.lan/library/u5-orchestrator | engineering-ai |
| u5-signalling | harbor.cledorze.lan/library/u5-signalling | engineering-frontend |
| u5-admin | harbor.cledorze.lan/library/u5-admin | engineering-frontend |
| u5-avatar | (external or local build) | engineering-frontend |
| speaches | harbor.cledorze.lan/library/speaches | engineering-ai |
Target Deployment (Per Store)
Kubernetes Resources
# Per kiosk instance
resources:
  orchestrator:
    requests: { cpu: "500m", memory: "512Mi" }
    limits: { cpu: "1000m", memory: "1Gi" }
    replicas: 1
    gpu: false
  signalling:
    requests: { cpu: "250m", memory: "128Mi" }
    limits: { cpu: "500m", memory: "256Mi" }
    replicas: 1 # per kiosk
    gpu: false
  speaches:
    requests: { cpu: "1000m", memory: "2Gi" }
    limits: { cpu: "2000m", memory: "4Gi" }
    replicas: 1
    gpu: true # nvidia.com/gpu: 1
  avatar:
    requests: { cpu: "2000m", memory: "4Gi" }
    limits: { cpu: "4000m", memory: "8Gi" }
    replicas: 1 # per kiosk
    gpu: true # nvidia.com/gpu: 1
  admin:
    requests: { cpu: "250m", memory: "256Mi" }
    limits: { cpu: "500m", memory: "512Mi" }
    replicas: 1
    gpu: false
  # Shared services (per store, not per kiosk)
  ollama:
    gpu: true
    models: ["qwen2.5:14b", "nomic-embed-text"]
  qdrant:
    storage: 10Gi
    gpu: false
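The summary above is shorthand rather than a Kubernetes manifest. As a sketch of how one entry might translate, the `speaches` figures could map onto a container spec as follows. The image tag is a placeholder, and the GPU limit assumes the NVIDIA device plugin is running on the node so that `nvidia.com/gpu` is a schedulable resource.

```yaml
# Container fragment (sketch) for the speaches service.
containers:
  - name: speaches
    image: harbor.cledorze.lan/library/speaches:latest  # tag is a placeholder
    resources:
      requests:
        cpu: "1000m"
        memory: 2Gi
      limits:
        cpu: "2000m"
        memory: 4Gi
        nvidia.com/gpu: 1  # extended resources are set under limits
```

Extended resources such as GPUs cannot be overcommitted, so each `gpu: true` service in the summary would claim a whole GPU unless some form of GPU sharing is configured.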
Helm Chart Structure
stack-b/helm/
├── Chart.yaml
├── values.yaml
├── values-qa.yaml # node1 overrides
├── values-store1.yaml # store-specific overrides
├── fleet.yaml
└── templates/
    ├── orchestrator.yaml
    ├── signalling.yaml
    ├── speaches.yaml
    ├── avatar.yaml
    ├── admin.yaml
    ├── configmap-flows.yaml # NEW: conversation flow definitions
    ├── configmap-prompts.yaml # NEW: system prompts
    ├── configmap-kb.yaml # NEW: KB config (collections, Qdrant endpoint)
    └── pvc-data.yaml
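The `fleet.yaml` in this tree is what tells Fleet how to render the chart per target. A minimal sketch of what it might contain is shown here; the namespace, release name, and the `env: qa` cluster label are hypothetical and not taken from the source.

```yaml
# fleet.yaml (sketch): deploy the chart in this directory with per-target values.
defaultNamespace: stack-b        # hypothetical namespace
helm:
  releaseName: stack-b           # hypothetical release name
  valuesFiles:
    - values.yaml
targetCustomizations:
  - name: qa
    clusterSelector:
      matchLabels:
        env: qa                  # hypothetical label on the Fleet Cluster resource
    helm:
      valuesFiles:
        - values-qa.yaml
```

Note that `clusterSelector` matches labels on Fleet's Cluster resources, not Kubernetes node labels, so placement on node1 itself would still be handled inside the chart.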
Configuration Management
Static config (Helm values):
- Service endpoints, ports
- GPU allocation
- Resource limits
- Store-specific settings (branding, language)

Dynamic config (Admin API → DB/ConfigMap):
- Conversation flows (YAML)
- System prompts
- Knowledge base documents
- Profile settings (timeouts, TTS voice)
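To make the static side concrete, a store override file might look like the sketch below. Every key name here is hypothetical, since the source does not show the chart's values schema; only the Qdrant and Ollama default ports (6333 and 11434) are standard.

```yaml
# values-store1.yaml (sketch, hypothetical keys): static, store-specific overrides.
store:
  id: store1
  language: en-US              # placeholder locale
  branding:
    displayName: "Store 1"     # placeholder
services:
  qdrant:
    endpoint: http://qdrant:6333    # service name assumed; 6333 is Qdrant's default port
  ollama:
    endpoint: http://ollama:11434   # service name assumed; 11434 is Ollama's default port
resources:
  speaches:
    limits: { cpu: "2000m", memory: "4Gi" }
```

Anything that operators change at runtime (flows, prompts, KB documents, profile settings) would stay out of this file and go through the Admin API instead.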
GitOps Flow
Developer pushes code → Gitea Actions builds container
                      → pushes to Harbor
                      → updates image tag in engineering-infra values.yaml
Fleet detects change → deploys to node1
                     → rolling update (zero downtime)
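The build half of this flow could be expressed as a Gitea Actions workflow along these lines. This is a sketch under assumptions: the runner label `uterrie`, the `main` branch, and the secret names `HARBOR_USER` and `HARBOR_PASSWORD` are hypothetical, and the step that commits the new tag to engineering-infra is left out because the source does not describe how it is done.

```yaml
# .gitea/workflows/build.yaml (sketch)
name: build-and-push
on:
  push:
    branches: [main]             # assumed default branch
jobs:
  build:
    runs-on: uterrie             # assumed runner label
    env:
      IMAGE: harbor.cledorze.lan/library/u5-orchestrator
    steps:
      - uses: actions/checkout@v4
      - name: Log in to Harbor
        run: |
          echo "${{ secrets.HARBOR_PASSWORD }}" | \
            podman login --username "${{ secrets.HARBOR_USER }}" --password-stdin harbor.cledorze.lan
      - name: Build image
        run: |
          podman build -t "$IMAGE:${{ github.sha }}" .
      - name: Push image
        run: |
          podman push "$IMAGE:${{ github.sha }}"
      # The tag update in engineering-infra values.yaml would follow here.
```

Tagging with the commit SHA rather than `latest` gives Fleet a concrete change to detect in the values file.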
Zero-Downtime Updates
For kiosk service:
1. New pod starts alongside old pod
2. New pod passes health check
3. WebSocket connections on old pod drain (30s grace)
4. Old pod terminates
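In Kubernetes terms, the steps for the kiosk service correspond roughly to a surge-based rolling update. The fragment below is a sketch of the relevant Deployment fields, not the chart's actual template.

```yaml
# Deployment spec fragment (sketch)
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # step 1: start the new pod next to the old one
      maxUnavailable: 0  # step 2: keep the old pod until the new one is Ready
  template:
    spec:
      terminationGracePeriodSeconds: 30  # steps 3-4: drain window before SIGKILL
```

The grace period only provides the time window; the service itself has to react to SIGTERM by closing WebSocket connections cleanly. For services that request a GPU, the surge pod can only start if a free GPU is available on the node.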
For conversation state:
- Active conversations are stored in Redis/memory
- On restart: conversation state is lost (acceptable, since kiosk conversations are short)
- Future: persist conversation state in Redis for seamless handover
Monitoring
Health Checks
GET /health → { "status": "ok", "gpu": true, "model_loaded": true }
GET /ready → { "ready": true, "qdrant": "connected", "ollama": "connected" }
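These two endpoints map naturally onto Kubernetes liveness and readiness probes. The fragment below is a sketch; the container port and the timing values are assumptions, not taken from the source.

```yaml
# Container fragment (sketch): wire /health and /ready into probes.
containers:
  - name: orchestrator
    ports:
      - containerPort: 8080    # assumed port
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      periodSeconds: 10        # assumed timing
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5         # assumed timing
      failureThreshold: 3
```

HTTP probes look only at the status code (200 to 399 counts as success) and ignore the JSON body, so the endpoints would need to return an error status when, for example, `model_loaded` or `ready` is false.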
Metrics (Prometheus)
- kiosk_conversations_total (counter)
- kiosk_response_time_seconds (histogram)
- kiosk_stt_confidence (histogram)
- kiosk_rag_chunks_retrieved (gauge)
- kiosk_tts_generation_seconds (histogram)
- kiosk_active_sessions (gauge)
Alerting
- Response time > 15s → warning
- STT confidence < 0.5 → warning
- GPU memory > 90% → critical
- Service restart → warning
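The thresholds above could be written as a Prometheus rules file roughly as follows. This is a sketch with several assumptions: the source does not say which aggregation the thresholds apply to, so p95 response time and median STT confidence are used here; the GPU rule assumes NVIDIA's dcgm-exporter is scraped; and the restart rule assumes kube-state-metrics. The alert names and `for` durations are hypothetical.

```yaml
# kiosk-alerts.yaml (sketch)
groups:
  - name: kiosk-alerts
    rules:
      - alert: KioskSlowResponse
        # Assumption: threshold applies to p95 over 5 minutes.
        expr: histogram_quantile(0.95, sum by (le) (rate(kiosk_response_time_seconds_bucket[5m]))) > 15
        for: 5m
        labels:
          severity: warning
      - alert: KioskLowSttConfidence
        # Assumption: threshold applies to the median over 5 minutes.
        expr: histogram_quantile(0.5, sum by (le) (rate(kiosk_stt_confidence_bucket[5m]))) < 0.5
        for: 5m
        labels:
          severity: warning
      - alert: GpuMemoryHigh
        # Assumption: dcgm-exporter metrics are available.
        expr: DCGM_FI_DEV_FB_USED / (DCGM_FI_DEV_FB_USED + DCGM_FI_DEV_FB_FREE) > 0.9
        for: 5m
        labels:
          severity: critical
      - alert: ServiceRestart
        # Assumption: kube-state-metrics is installed.
        expr: increase(kube_pod_container_status_restarts_total[10m]) > 0
        labels:
          severity: warning
```

The `for` clauses keep short spikes from firing alerts; the restart rule omits one so that a single restart is reported immediately.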