RAG Assistant for a Shopping Mall: Fully On-Premises

I recently explored the idea of an on-premises information system for a shopping mall. The goal sounds simple: a visitor asks, “Where can I buy running shoes at a discount?” and gets a coherent natural-language answer rather than just a list of floor numbers.

What kind of system is this?

The approach is RAG (Retrieval-Augmented Generation). Roughly speaking: the system searches for relevant data in the mall’s database, passes it to a language model, and the model generates a ready-made response. No cloud—everything runs on a local server.
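
To make those three steps concrete, here is a minimal sketch. The `retrieve` and `generate` callables are hypothetical placeholders for the vector search and the local LLM call, and the prompt wording is my own assumption, not a tested template.

```python
from typing import Callable, List


def build_prompt(question: str, cards: List[str]) -> str:
    """Put the retrieved cards in front of the question so the model answers from them."""
    context = "\n\n".join(cards)
    return (
        "Answer the visitor's question using only the mall data below. "
        "If the data does not contain the answer, say so.\n\n"
        f"Mall data:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(
    question: str,
    retrieve: Callable[[str, int], List[str]],  # hypothetical: vector search over the cards
    generate: Callable[[str], str],             # hypothetical: call to the local LLM
    top_k: int = 3,
) -> str:
    """Retrieve, augment, generate: the three RAG steps in order."""
    cards = retrieve(question, top_k)
    prompt = build_prompt(question, cards)
    return generate(prompt)
```

Passing the search and the model in as functions keeps the flow itself independent of which vector database or LLM server ends up behind it.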

Three languages: Russian, English, Chinese. A single index—the user types in any language and gets a response in the same one.
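
One way to get "same language in, same language out" is to guess the language of the question and add an explicit instruction to the prompt. The sketch below is an assumption on my part: it tells the three languages apart purely by Unicode script, which works for Russian, English, and Chinese but would not generalize beyond them. The function names are hypothetical.

```python
LANGUAGE_NAMES = {"ru": "Russian", "en": "English", "zh": "Chinese"}


def detect_language(text: str) -> str:
    """Rough script-based guess among the three supported languages."""
    # Count characters in the CJK Unified Ideographs and Cyrillic blocks.
    cjk = sum("\u4e00" <= ch <= "\u9fff" for ch in text)
    cyrillic = sum("\u0400" <= ch <= "\u04ff" for ch in text)
    if cjk and cjk >= cyrillic:
        return "zh"
    if cyrillic:
        return "ru"
    return "en"  # fallback when neither script is present


def language_instruction(question: str) -> str:
    """Line to append to the LLM prompt so the reply matches the question's language."""
    return f"Reply in {LANGUAGE_NAMES[detect_language(question)]}."
```

The index itself needs no such step: with a multilingual embedding model, a question in any of the three languages is searched against the same vectors.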

Access points: website, mobile app, and kiosks inside the mall (a kiosk only needs a browser; all processing happens on the central server).

Where the data comes from

The source is an existing MySQL database. Text cards are generated from it:

Store: SportLand
Floor: 2
Category: sporting goods
Promotion: 20% off

Card → embedding → vector database → linked to the original MySQL record.
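
A minimal sketch of that chain, assuming a row arrives from MySQL as a dict. The column names (`id`, `name`, `floor`, `category`, `promotion`) and the record layout are hypothetical, and `embed` stands in for whichever embedding model is used.

```python
from typing import Callable, Dict, List


def build_card(row: Dict) -> str:
    """Turn one store row into the text card that gets embedded."""
    lines = [
        f"Store: {row['name']}",
        f"Floor: {row['floor']}",
        f"Category: {row['category']}",
    ]
    # Only mention a promotion when the store actually has one.
    if row.get("promotion"):
        lines.append(f"Promotion: {row['promotion']}")
    return "\n".join(lines)


def to_vector_record(row: Dict, embed: Callable[[str], List[float]]) -> Dict:
    """Card -> embedding -> record that still points back at the MySQL row."""
    card = build_card(row)
    return {
        "id": row["id"],
        "vector": embed(card),
        "payload": {"mysql_id": row["id"], "text": card},
    }
```

Keeping the MySQL primary key in the payload means a search hit can always be resolved back to the live record, for example to check that a promotion is still running.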

Stack

Vector DB: Qdrant (or PostgreSQL + pgvector)
LLM: Qwen 14B / 32B Instruct
Embeddings: bge-m3 or Qwen Embedding
Backend: Python FastAPI
Frontend: Vue/Nuxt or React/Next.js

Why Qwen, not DeepSeek

We compared the two. For RAG tasks Qwen came out ahead: in our comparison it held context better, hallucinated less, and handled multiple languages more reliably, especially Chinese. DeepSeek is stronger at reasoning and analytics, but that is not our use case.

Hardware