Knowledge Base and RAG Architecture

Current Stack B RAG

User query → nomic-embed-text (Ollama) → Qdrant search → top_k chunks → LLM context
  • Vector DB: Qdrant (http://qdrant.cledorze.lan:6333)
  • Embedding model: nomic-embed-text (via Ollama)
  • Collection: kiosk-catalog
  • LLM: qwen2.5:14b (via Ollama)

Virbe RAG (Reference)

  • Collections → Documents → Auto-chunking → Auto-embedding
  • Rich text editor for documents
  • "Calculating embeddings" status on save
  • Filter by collection in Tool Agent config
  • Storage usage tracking

Lessons Learned from Virbe Session

  1. Monolithic vs split documents: We tried splitting the E.Leclerc catalog (80 products) into 5 category documents in a new collection. The new collection's embeddings never worked properly, whereas the monolithic document in the original collection did. Takeaway: in Virbe, embedding quality depended on the collection/index, not just the content.

  2. Processing timeout is critical: RAG retrieval plus LLM generation takes 5-15 seconds. Without feedback, the avatar appears frozen, so a "processing" event must be sent to the kiosk after N seconds.

  3. TTS and RAG content conflict: RAG returns formatted data (prices with €, percentages with %). The LLM must be instructed to rewrite this as natural speech. System instruction is the right place for this.

  4. Collection filtering: Virbe's "Filter Knowledge Base" by collection is useful for targeting specific document sets, but cross-collection search has issues. Stack B should implement collection-level filtering in Qdrant using a payload (metadata) field.
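
Lesson 2 can be sketched with asyncio: start the RAG + LLM call as a task and emit a "processing" event only if it has not finished after N seconds. The names `answer_with_feedback`, `pipeline`, and `send_event`, as well as the 2-second default, are hypothetical; the actual kiosk event transport is not described here.

```python
import asyncio
from typing import Awaitable, Callable


async def answer_with_feedback(
    pipeline: Callable[[], Awaitable[str]],
    send_event: Callable[[str], Awaitable[None]],
    notify_after_s: float = 2.0,
) -> str:
    """Run the pipeline; send "processing" if it is still running after notify_after_s."""
    task = asyncio.ensure_future(pipeline())

    # asyncio.wait returns on timeout without cancelling the task.
    done, _pending = await asyncio.wait({task}, timeout=notify_after_s)
    if not done:
        await send_event("processing")

    # Either already finished, or we keep waiting for the real answer.
    return await task
```

A fast answer produces no extra event, so the kiosk only shows the "processing" state when the delay would otherwise be noticeable.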

Stack B RAG Design

Document Ingestion Pipeline

Upload (PDF/TXT/CSV/URL)
        │
        ▼
   Chunking (500-1000 tokens per chunk)
        │
        ▼
   Embedding (nomic-embed-text)
        │
        ▼
   Qdrant upsert (with metadata)
        │
        ▼
   Status: indexed ✓
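
The chunking and metadata steps of this pipeline could look roughly like the sketch below. It counts whitespace-separated words as a rough stand-in for tokens (an assumption; a real tokenizer will give different counts), and `chunk_text` and `build_payloads` are hypothetical helper names. The payload fields follow the Qdrant Schema section, plus a `text` field holding the chunk itself, which is what `rag_query` reads back.

```python
from datetime import datetime, timezone


def chunk_text(text: str, max_tokens: int = 800, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace-separated words."""
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def build_payloads(
    chunks: list[str],
    collection_name: str,
    document_name: str,
    document_id: str,
    content_type: str,
) -> list[dict]:
    """Attach the metadata used for filtering to every chunk."""
    created_at = datetime.now(timezone.utc).isoformat()
    return [
        {
            "collection_name": collection_name,
            "document_name": document_name,
            "document_id": document_id,
            "chunk_index": index,
            "content_type": content_type,
            "created_at": created_at,
            "text": chunk,
        }
        for index, chunk in enumerate(chunks)
    ]
```

Each payload would then be paired with the embedding of its `text` and upserted into Qdrant.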

Qdrant Schema

{
  "collection": "kiosk-knowledge",
  "vectors": {
    "size": 768,
    "distance": "Cosine"
  },
  "payload_schema": {
    "collection_name": "keyword",
    "document_name": "keyword",
    "document_id": "keyword",
    "chunk_index": "integer",
    "content_type": "keyword",
    "created_at": "datetime"
  }
}
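
The schema above is a design summary rather than a literal Qdrant request. Assuming Qdrant's REST API (collection creation via `PUT /collections/{name}`, then one payload index per field via `PUT /collections/{name}/index`), it could be translated into requests roughly as follows. `schema_to_requests` is a hypothetical helper that only builds the request bodies; sending them is left out.

```python
SCHEMA = {
    "collection": "kiosk-knowledge",
    "vectors": {"size": 768, "distance": "Cosine"},
    "payload_schema": {
        "collection_name": "keyword",
        "document_name": "keyword",
        "document_id": "keyword",
        "chunk_index": "integer",
        "content_type": "keyword",
        "created_at": "datetime",
    },
}


def schema_to_requests(schema: dict) -> list[tuple[str, str, dict]]:
    """Return (method, path, json_body) tuples: create collection, then index fields."""
    name = schema["collection"]
    requests = [("PUT", f"/collections/{name}", {"vectors": schema["vectors"]})]
    for field_name, field_schema in schema["payload_schema"].items():
        requests.append(
            (
                "PUT",
                f"/collections/{name}/index",
                {"field_name": field_name, "field_schema": field_schema},
            )
        )
    return requests
```

The vector size of 768 matches the output dimension of nomic-embed-text, so the collection would need to be recreated if the embedding model changes.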

Query Flow

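from qdrant_client import AsyncQdrantClient, models

# Qdrant endpoint from the "Current Stack B RAG" section.
qdrant = AsyncQdrantClient(url="http://qdrant.cledorze.lan:6333")


async def rag_query(user_query: str, collections: list[str] | None = None, top_k: int = 5) -> str:
    # 1. Embed query (embed() wraps nomic-embed-text via Ollama and is defined elsewhere)
    query_vector = await embed(user_query)

    # 2. Search Qdrant (with optional collection filter on the payload)
    query_filter = None  # renamed from "filter" to avoid shadowing the builtin
    if collections:
        query_filter = models.Filter(
            must=[
                models.FieldCondition(
                    key="collection_name",
                    match=models.MatchAny(any=collections),
                )
            ]
        )

    results = await qdrant.search(
        collection_name="kiosk-knowledge",
        query_vector=query_vector,
        query_filter=query_filter,
        limit=top_k,
    )

    # 3. Build context (assumes each point stores its chunk text under payload["text"])
    context = "\n\n".join(r.payload["text"] for r in results)

    return context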
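
`rag_query` relies on an `embed` helper that is not shown. A minimal sketch, assuming Ollama's `/api/embeddings` endpoint on its default port 11434 (the actual Ollama host for Stack B is not given here, so `OLLAMA_URL` is an assumption, and the helper names other than `embed` are hypothetical):

```python
import asyncio
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default address
EMBED_MODEL = "nomic-embed-text"


def build_embed_request(text: str) -> urllib.request.Request:
    """Build the POST request asking Ollama to embed one piece of text."""
    body = json.dumps({"model": EMBED_MODEL, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embed_response(raw: bytes) -> list[float]:
    """Extract the vector from a response of the form {"embedding": [...]}."""
    return json.loads(raw)["embedding"]


def embed_sync(text: str, timeout_s: float = 10.0) -> list[float]:
    """Blocking call to Ollama; raises on network or HTTP errors."""
    with urllib.request.urlopen(build_embed_request(text), timeout=timeout_s) as resp:
        return parse_embed_response(resp.read())


async def embed(text: str) -> list[float]:
    # Run the blocking HTTP call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(embed_sync, text)
```

The same helper should be used at ingestion time, so that query and document vectors come from the same model.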

Admin UI for Knowledge Base

Collections:
  ├── Expertise Vin (5 docs, 12 chunks)
  │     ├── Accords Mets-Vins
  │     ├── Guide de Dégustation
  │     ├── Régions Viticoles
  │     ├── Cépages Majeurs
  │     └── Appellations et Classifications
  │
  └── Promotions E.Leclerc (1 doc, 45 chunks)
        └── Catalogue Cave de Printemps E.Leclerc

[+ Add Collection]  [+ Add Document]  [Test RAG Query]

Test RAG Query Panel

Input: "Quels champagnes sont en promotion ?"

Results:
  - Chunk 1 (score: 0.89): "Canard-Duchêne AOP Champagne Brut, 75cl — 26,90€..."
  - Chunk 2 (score: 0.85): "Vranken Demoiselle AOP Champagne Brut — 23,92€..."
  - Chunk 3 (score: 0.72): "Veuve Ambal Crémant de Bourgogne Brut..."

This test panel is critical for debugging RAG quality without going through the full pipeline.
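
The panel output could be rendered from raw search hits with something like the sketch below. `Hit` and `format_hits` are hypothetical names; in practice the score and text would come from the Qdrant search results (score and `payload["text"]`).

```python
from dataclasses import dataclass


@dataclass
class Hit:
    score: float
    text: str


def format_hits(query: str, hits: list[Hit], preview_chars: int = 60) -> str:
    """Render hits, best score first, in the layout of the test panel."""
    lines = [f'Input: "{query}"', "Results:"]
    ranked = sorted(hits, key=lambda hit: hit.score, reverse=True)
    for rank, hit in enumerate(ranked, start=1):
        preview = hit.text[:preview_chars]
        suffix = "..." if len(hit.text) > preview_chars else ""
        lines.append(f'  - Chunk {rank} (score: {hit.score:.2f}): "{preview}{suffix}"')
    return "\n".join(lines)
```

Showing the score next to a short preview makes it easier to see whether a poor answer comes from retrieval or from the LLM step.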