I built my first RAG system in 2019. The task was to create a product that worked like this: you send a photo of a piece of clothing, and the application suggests the 10 most similar products with links to various stores.

I needed to gather a database of stores, their products, and photos, then transform those images into vectors and embed them in a multidimensional space.

System Architecture

Data Ingestion (Facade & Strategy Pattern)

The project initially assumed Kafka as the queue system, but good design patterns, such as facade and strategy, let me unify the worker classes:

  • Worker querying stores for updated products
  • Worker processing images into vectors
  • Worker updating the multidimensional space
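
A minimal sketch of how a facade can unify the three workers listed above. The original code is not shown, so every name here (Worker, FetchWorker, VectorizeWorker, IndexWorker, IngestionFacade) is hypothetical, and the worker bodies are stand-ins rather than the real logic.

```python
from __future__ import annotations

from typing import Protocol


class Worker(Protocol):
    """Common interface every worker exposes (hypothetical)."""

    def run(self, batch: list[dict]) -> list[dict]: ...


class FetchWorker:
    """Stand-in for the worker that queries stores for updated products."""

    def run(self, batch: list[dict]) -> list[dict]:
        return [{**item, "fetched": True} for item in batch]


class VectorizeWorker:
    """Stand-in for the worker that turns images into vectors."""

    def run(self, batch: list[dict]) -> list[dict]:
        # A real worker would call a vision model; here we fake a 2-d vector.
        return [{**item, "vector": [float(len(item["name"])), 1.0]} for item in batch]


class IndexWorker:
    """Stand-in for the worker that updates the vector space."""

    def __init__(self) -> None:
        self.index: dict[str, list[float]] = {}

    def run(self, batch: list[dict]) -> list[dict]:
        for item in batch:
            self.index[item["name"]] = item["vector"]
        return batch


class IngestionFacade:
    """Single entry point that hides the order and wiring of the workers."""

    def __init__(self, workers: list[Worker]) -> None:
        self.workers = workers

    def ingest(self, batch: list[dict]) -> list[dict]:
        # Each worker receives the output of the previous one.
        for worker in self.workers:
            batch = worker.run(batch)
        return batch
```

Callers only touch IngestionFacade.ingest, so a worker can be swapped or reordered without changing the calling code.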

The application handled multiple data providers. Each store had its own strategy for mapping XML fields to a standardized product model.
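
The per-store mapping can be sketched as one strategy function per store, selected by store name. The Product fields, the store names, and the XML tags below are assumptions made up for illustration; the real feeds and model are not described in detail.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable


@dataclass
class Product:
    """Standardized product model (field names are illustrative)."""

    sku: str
    name: str
    image_url: str


# Each strategy maps one store's XML element to the shared Product model.
def map_store_a(node: ET.Element) -> Product:
    return Product(
        sku=node.findtext("id", default=""),
        name=node.findtext("title", default=""),
        image_url=node.findtext("img", default=""),
    )


def map_store_b(node: ET.Element) -> Product:
    # Store B keeps the identifier in an attribute and nests the photo URL.
    return Product(
        sku=node.get("code", ""),
        name=node.findtext("name", default=""),
        image_url=node.findtext("media/photo", default=""),
    )


STRATEGIES: dict[str, Callable[[ET.Element], Product]] = {
    "store_a": map_store_a,
    "store_b": map_store_b,
}


def parse_feed(store: str, xml_text: str, item_tag: str) -> list[Product]:
    """Pick the store's strategy and apply it to every product element."""
    mapper = STRATEGIES[store]
    root = ET.fromstring(xml_text)
    return [mapper(node) for node in root.iter(item_tag)]
```

Adding a new store then means writing one more mapping function and registering it in STRATEGIES; the rest of the pipeline keeps working with Product objects only.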

Image Vectorization Pipeline

The key element of the RAG system was visual embedding. The application fetched images, scaled them to 224x224 px (RGB), a common input size for pretrained computer vision models, then extracted feature vectors used for k-nearest-neighbor (KNN) similarity search.
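
To make the KNN step concrete, here is a small sketch of a nearest-neighbor lookup over already computed image vectors. It is pure Python and uses cosine similarity, which is an assumption; the original metric and library are not stated. The names nearest_products and catalog are hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_products(query, catalog, k=10):
    """Return the k product ids whose vectors are most similar to the query.

    `catalog` maps a product id to its embedding vector.
    """
    scored = ((cosine_similarity(query, vec), pid) for pid, vec in catalog.items())
    # nlargest returns the best matches first, ordered by similarity.
    return [pid for _, pid in heapq.nlargest(k, scored)]
```

A brute-force scan like this is fine for a small catalog; a larger one would normally use a dedicated nearest-neighbor index instead.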

Indexing and Database

The heart of the system was a PostgreSQL database managed by SQLAlchemy. Products were indexed not only by ID but also by category and gender, enabling hybrid search that combined these metadata filters with vector similarity.
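
The sketch below illustrates the hybrid search idea: filter by category and gender in SQL, then rank the remaining rows by vector distance. To stay self-contained it uses the standard-library sqlite3 module as a stand-in for PostgreSQL and SQLAlchemy, stores vectors as JSON text, and ranks by squared Euclidean distance. The table layout and function names are assumptions, not the original schema.

```python
import json
import sqlite3


def build_db():
    """Create an in-memory table with a composite index on the filter columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        "id INTEGER PRIMARY KEY, category TEXT, gender TEXT, vector TEXT)"
    )
    # Composite index so the metadata filter does not scan the whole table.
    conn.execute("CREATE INDEX idx_cat_gender ON products (category, gender)")
    return conn


def add_product(conn, product_id, category, gender, vector):
    conn.execute(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        (product_id, category, gender, json.dumps(vector)),
    )


def hybrid_search(conn, query, category, gender, k=10):
    """Filter by metadata in SQL, then rank the survivors by vector distance."""
    rows = conn.execute(
        "SELECT id, vector FROM products WHERE category = ? AND gender = ?",
        (category, gender),
    ).fetchall()

    def squared_distance(vec):
        return sum((q - v) ** 2 for q, v in zip(query, vec))

    ranked = sorted(rows, key=lambda row: squared_distance(json.loads(row[1])))
    return [row[0] for row in ranked[:k]]
```

Filtering first keeps the similarity ranking limited to products that can be valid answers, for example only women's dresses.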

Key Features

  • Scalability: Thanks to multithreading, image fetching and data updates ran in parallel across multiple stores
  • Unified Data Model: Regardless of source, each product landed in the database with the same structure
  • Automation: Scripts monitored changes and automated daily knowledge base updates
  • AI-Ready: The database structure was designed for integration with Keras or TensorFlow
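
The parallel updates mentioned under Scalability could look roughly like this, assuming a thread pool with one job per store. The update_store function is a hypothetical placeholder for the real fetch-and-download work, and the pool size is an arbitrary choice.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed


def update_store(store: str) -> tuple[str, int]:
    """Hypothetical per-store job: fetch the feed and download new images."""
    # A real job would do network I/O here; we return a fake product count.
    return store, len(store)


def update_all(stores: list[str], job=update_store, max_workers: int = 4) -> dict[str, int]:
    """Run one job per store in a thread pool and collect the results."""
    results: dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(job, store) for store in stores]
        # Results arrive in completion order, not submission order.
        for future in as_completed(futures):
            store, count = future.result()
            results[store] = count
    return results
```

Threads suit this workload because fetching feeds and images is mostly waiting on the network, so one slow store does not block the others.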

Greatest Success

The greatest success was deploying the product to production. All the pieces, from data ingestion through image processing to vector search, worked together seamlessly.