RAG Assistant for a Shopping Mall: Fully On-Premises
I recently explored the idea of an on-premises information system for a shopping mall. The goal sounds simple: a visitor asks, “Where can I buy running shoes at a discount?” and receives a coherent answer in natural language rather than a bare list of floor numbers.
What kind of system is this?
The approach is RAG (Retrieval-Augmented Generation). Roughly speaking, the system searches the mall’s database for records relevant to the question, passes them to a language model, and the model writes the final answer from that data. Nothing goes to the cloud: everything runs on a local server.
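To make the retrieve, augment, generate loop concrete, here is a minimal sketch in Python. The names `build_prompt`, `answer`, `fake_retrieve` and `fake_generate` are mine and purely illustrative, and the prompt wording is an assumption rather than a tested template. The retriever and the model are passed in as plain functions, so the sketch runs without a vector database or an LLM.

```python
from typing import Callable, List


def build_prompt(question: str, cards: List[str]) -> str:
    """Combine the retrieved cards and the visitor's question into one prompt."""
    context = "\n\n".join(f"[{i}] {card}" for i, card in enumerate(cards, start=1))
    return (
        "Answer the visitor's question using only the mall data below. "
        "If the data does not contain the answer, say so.\n\n"
        f"Mall data:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve relevant cards, put them into the prompt, let the model answer."""
    cards = retrieve(question)
    return generate(build_prompt(question, cards))


# Stand-ins (hypothetical): a real system would query the vector database
# and call the locally hosted LLM here.
def fake_retrieve(question: str) -> List[str]:
    return ["Store: SportLand\nFloor: 2\nCategory: sporting goods\nPromotion: 20% off"]


def fake_generate(prompt: str) -> str:
    return f"(model output for a prompt of {len(prompt)} characters)"


question = "Where can I buy running shoes at a discount?"
prompt = build_prompt(question, fake_retrieve(question))
reply = answer(question, fake_retrieve, fake_generate)
print(prompt)
```

Keeping `retrieve` and `generate` as injected functions means the same loop can be exercised in tests with stubs and later wired to the real back ends without changing its shape.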
Three languages: Russian, English, Chinese. There is a single index for all of them: the user types in any of the three languages and gets a response in the same one.
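One simple way to make sure the reply comes back in the language of the question is to detect the script of the input and add an explicit instruction to the prompt. The sketch below is an assumption on my part, not a settled design: `detect_language` and `language_instruction` are hypothetical helpers, and the heuristic only distinguishes the three languages named above by their Unicode ranges.

```python
LANGUAGE_NAMES = {"ru": "Russian", "en": "English", "zh": "Chinese"}


def detect_language(text: str) -> str:
    """Guess 'zh', 'ru' or 'en' from the scripts that appear in the text."""
    # Count characters in the CJK Unified Ideographs and Cyrillic blocks.
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    cyrillic = sum(1 for ch in text if "\u0400" <= ch <= "\u04ff")
    if cjk > 0 and cjk >= cyrillic:
        return "zh"
    if cyrillic > 0:
        return "ru"
    return "en"  # fallback when neither script is present


def language_instruction(question: str) -> str:
    """Build the sentence that pins the model's output language."""
    return f"Answer in {LANGUAGE_NAMES[detect_language(question)]}."


for q in (
    "Where can I buy running shoes at a discount?",
    "Где купить кроссовки со скидкой?",
    "哪里可以买到打折的跑鞋?",
):
    print(detect_language(q), "|", language_instruction(q))
```

A script check like this is cheap enough to run on every request. It would not tell apart languages that share a script, which is acceptable only because the system supports exactly these three.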
Access points: website, mobile app, kiosks inside the shopping center (they only need a browser; all processing is handled by the central server).
Where the data comes from
The source is an existing MySQL database. Text cards are generated from it:
Store: SportLand
Floor: 2
Category: sporting goods
Promotion: 20% off
Card → embedding → vector database → linked to the original MySQL record.
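The card pipeline above can be sketched as follows. Everything here is illustrative: the column names (`id`, `name`, `floor`, `category`, `promotion`) are assumed, not taken from the real schema, and `toy_embed` is a plain bag-of-words stand-in for a real embedding model such as bge-m3. It has no semantic or cross-language matching. `CardIndex` is an in-memory placeholder for the vector database. The point is only the shape of the data: each stored vector carries the MySQL id of the record it came from.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def render_card(row: Dict[str, object]) -> str:
    """Turn one database row into the text card that gets embedded."""
    return (
        f"Store: {row['name']}\n"
        f"Floor: {row['floor']}\n"
        f"Category: {row['category']}\n"
        f"Promotion: {row['promotion']}"
    )


def toy_embed(text: str) -> Dict[str, float]:
    """L2-normalised bag-of-words vector. A placeholder, not a real embedding."""
    tokens = [t.strip(".,:;?!%") for t in text.lower().split()]
    counts = Counter(t for t in tokens if t)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


class CardIndex:
    """In-memory stand-in for the vector DB: vector plus link to the MySQL row."""

    def __init__(self) -> None:
        self._items: List[Tuple[int, str, Dict[str, float]]] = []

    def add(self, mysql_id: int, card: str) -> None:
        self._items.append((mysql_id, card, toy_embed(card)))

    def search(self, query: str, top_k: int = 3) -> List[Tuple[int, float]]:
        """Return (mysql_id, cosine similarity) pairs, best match first."""
        q = toy_embed(query)
        scored = [
            (mysql_id, sum(w * vec.get(tok, 0.0) for tok, w in q.items()))
            for mysql_id, _, vec in self._items
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


# Hypothetical rows as they might come back from a MySQL query.
rows = [
    {"id": 17, "name": "SportLand", "floor": 2,
     "category": "sporting goods", "promotion": "20% off"},
    {"id": 42, "name": "BookNook", "floor": 1,
     "category": "books", "promotion": "none"},
]

index = CardIndex()
for row in rows:
    index.add(row["id"], render_card(row))

hits = index.search("sporting goods", top_k=1)
print(hits)
```

Because search returns the MySQL id and not the card text, the backend can re-read the current record before answering, so a promotion that changed after indexing does not end up in the reply.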
Stack
- Vector DB: Qdrant (or PostgreSQL + pgvector)
- LLM: Qwen 14B / 32B Instruct
- Embeddings: bge-m3 or Qwen Embedding
- Backend: Python FastAPI
- Frontend: Vue/Nuxt or React/Next.js
Why Qwen, Not DeepSeek
We compared the two options. For RAG tasks, Qwen comes out on top: it keeps track of the supplied context better, hallucinates less, and offers more stable multilingual support, especially for Chinese. DeepSeek is stronger at reasoning and analytics, but that is not our use case.
Hardware