Wasserstein GAN-based architecture to generate collaborative filtering synthetic datasets

Jesús Bobadilla, Abraham Gutiérrez

doi:10.1007/s10489-024-05313-4

Abstract

Generative applications are currently reshaping fields such as art, computer vision, speech processing, and natural language processing. Personalization is an increasingly relevant area of computer science, since large companies such as Spotify, Netflix, TripAdvisor, Amazon, and Google rely on recommender systems. It is therefore reasonable to expect that generative learning will increasingly be used to improve current recommender systems. In this paper, a method is proposed to generate synthetic recommender system datasets that a company can use to test its recommendation performance and accuracy under different simulated scenarios, such as large increases in dataset size, number of users, or number of items. Specifically, we improve on the state-of-the-art generative adversarial network for recommender systems (GANRS), the seminal method for generating synthetic datasets, by applying the Wasserstein concept to it. The results show that our proposed method reduces mode collapse, increases the sizes of the synthetic datasets, improves their rating distributions, and retains the ability to choose the desired number of users, number of items, and starting size of the dataset. Both the baseline GANRS and the proposed Wasserstein-based WGANRS deep learning architectures generate fake profiles from dense, short, and continuous embeddings in the latent space, instead of the sparse, large, and discrete raw samples that previous GAN models used as a source. To enable reproducibility, the Python and Keras code is provided in an open repository, along with the synthetic datasets generated to test the proposed architecture (https://github.com/jesusbobadilla/ganrs.git).
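To illustrate what applying the Wasserstein concept to a GAN generally involves, the following minimal sketch contrasts the binary cross-entropy loss of a standard GAN discriminator with the loss of a Wasserstein critic. It is not the authors' implementation, which is available in the repository cited in the abstract. All function names are hypothetical, and the weight clipping step with threshold `c = 0.01` is an assumption taken from the original WGAN formulation; the abstract does not say how WGANRS constrains its critic.

```python
import math


def bce_discriminator_loss(real_probs, fake_probs):
    """Standard GAN discriminator loss on probabilities in (0, 1)."""
    eps = 1e-12  # avoids log(0)
    real_term = sum(-math.log(p + eps) for p in real_probs) / len(real_probs)
    fake_term = sum(-math.log(1.0 - p + eps) for p in fake_probs) / len(fake_probs)
    return real_term + fake_term


def wasserstein_critic_loss(real_scores, fake_scores):
    """Critic loss on unbounded scores: mean(fake) - mean(real).

    Minimizing it widens the score gap between real and fake samples.
    """
    mean_real = sum(real_scores) / len(real_scores)
    mean_fake = sum(fake_scores) / len(fake_scores)
    return mean_fake - mean_real


def wasserstein_generator_loss(fake_scores):
    """Generator loss: the negated mean critic score of the fake samples."""
    return -sum(fake_scores) / len(fake_scores)


def clip_weights(weights, c=0.01):
    """Assumed Lipschitz constraint: clamp each critic weight to [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]


# Toy critic scores for two real and two fake samples (made-up values).
real_scores = [1.0, 3.0]
fake_scores = [0.0, 1.0]
critic_loss = wasserstein_critic_loss(real_scores, fake_scores)
generator_loss = wasserstein_generator_loss(fake_scores)
```

Because the critic outputs an unbounded score rather than a probability, its loss keeps providing a usable gradient even when real and fake samples are easily told apart, which is the usual argument for why Wasserstein training tends to reduce mode collapse.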

Full Text