Knowledge Base and RAG Architecture¶
Current Stack B RAG¶
User query → nomic-embed-text (Ollama) → Qdrant search → top_k chunks → LLM context
- Vector DB: Qdrant (http://qdrant.cledorze.lan:6333)
- Embedding model: nomic-embed-text (via Ollama)
- Collection: kiosk-catalog
- LLM: qwen2.5:14b (via Ollama)
Virbe RAG (Reference)¶
- Collections → Documents → Auto-chunking → Auto-embedding
- Rich text editor for documents
- "Calculating embeddings" status on save
- Filter by collection in Tool Agent config
- Storage usage tracking
Lessons Learned from Virbe Session¶
-
Monolithic vs Split documents: We tried splitting the E.Leclerc catalog (80 products) into 5 category documents in a new collection. The new collection's embeddings never worked properly. The monolithic document in the original collection worked. Takeaway: embedding quality depends on the collection/index, not just the content.
-
Processing timeout is critical: RAG + LLM = 5-15 seconds. Without feedback, the avatar appears frozen. Must send a "processing" event to the kiosk after N seconds.
-
TTS and RAG content conflict: RAG returns formatted data (prices with €, percentages with %). The LLM must be instructed to rewrite this as natural speech. System instruction is the right place for this.
-
Collection filtering: Virbe's "Filter Knowledge Base" by collection is useful for targeting specific document sets. But cross-collection search has issues. Stack B should implement collection-level filtering in Qdrant (using metadata).
Stack B RAG Design¶
Document Ingestion Pipeline¶
Upload (PDF/TXT/CSV/URL)
│
▼
Chunking (500-1000 tokens per chunk)
│
▼
Embedding (nomic-embed-text)
│
▼
Qdrant upsert (with metadata)
│
▼
Status: indexed ✓
Qdrant Schema¶
{
"collection": "kiosk-knowledge",
"vectors": {
"size": 768,
"distance": "Cosine"
},
"payload_schema": {
"collection_name": "keyword",
"document_name": "keyword",
"document_id": "keyword",
"chunk_index": "integer",
"content_type": "keyword",
"created_at": "datetime"
}
}
Query Flow¶
async def rag_query(user_query: str, collections: list[str] = None, top_k: int = 5):
# 1. Embed query
query_vector = await embed(user_query)
# 2. Search Qdrant (with optional collection filter)
filter = None
if collections:
filter = {"must": [{"key": "collection_name", "match": {"any": collections}}]}
results = await qdrant.search(
collection_name="kiosk-knowledge",
query_vector=query_vector,
query_filter=filter,
limit=top_k
)
# 3. Build context
context = "\n\n".join([r.payload["text"] for r in results])
return context
Admin UI for Knowledge Base¶
Collections:
├── Expertise Vin (5 docs, 12 chunks)
│ ├── Accords Mets-Vins
│ ├── Guide de Dégustation
│ ├── Régions Viticoles
│ ├── Cépages Majeurs
│ └── Appellations et Classifications
│
└── Promotions E.Leclerc (1 doc, 45 chunks)
└── Catalogue Cave de Printemps E.Leclerc
[+ Add Collection] [+ Add Document] [Test RAG Query]
Test RAG Query Panel¶
Input: "Quels champagnes sont en promotion ?" Results: - Chunk 1 (score: 0.89): "Canard-Duchêne AOP Champagne Brut, 75cl — 26,90€..." - Chunk 2 (score: 0.85): "Vranken Demoiselle AOP Champagne Brut — 23,92€..." - Chunk 3 (score: 0.72): "Veuve Ambal Crémant de Bourgogne Brut..."
This test panel is critical for debugging RAG quality without going through the full pipeline.