A Hybrid GAN-Based Approach to Solve Imbalanced Data Problem in Recommendation Systems

Wafa Shafqat, Yung-Cheol Byun

doi:10.1109/access.2022.3141776

Open Access

Journal: IEEE Access
Publication Date: Jan 1, 2022
Citations: 15
License type: CC BY 4.0

Affiliation: Jeju National University

Abstract

With the advent of information technology, the amount of data generated online has become massive. Recommendation systems have become an effective tool for filtering information and solving the problem of information overload. The machine learning algorithms used to build these recommendation systems require data that is well balanced in terms of class distribution, but most real-world datasets are imbalanced. Imbalanced data forces a classifier to focus more on the majority class and neglect other classes of interest, which hinders the predictive performance of any classification model. Many traditional techniques exist for oversampling minority classes, but generative adversarial networks (GANs) have shown excellent results in generating realistic synthetic tabular data that preserves the probability distribution of the original data. In this paper, we propose a hybrid GAN approach that addresses the data imbalance problem to enhance the performance of recommendation systems. We implemented a conditional Wasserstein GAN with gradient penalty to generate tabular data containing both numerical and categorical values. We also added an auxiliary classifier loss to force the model to explicitly generate data belonging to the minority class. We designed the discriminator architecture following the PacGAN concept, so that it receives m packed samples as input instead of a single sample. Including the PacGAN architecture eliminated the mode collapse problem in our proposed model. We evaluated our model in two ways: first, on the quality of the generated data, and second, on how different recommendation models perform using the generated data compared with the original data.
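For reference, the objectives behind a conditional Wasserstein GAN with gradient penalty plus an auxiliary classifier can be written as below. This is a sketch of the standard WGAN-GP critic loss combined with an AC-GAN-style cross-entropy term, not the paper's exact formulation: the symbols (generator G, critic D, auxiliary classifier C, class condition c, noise z, real row x, penalty weight λ) and the unweighted sum of the classifier term are assumptions. Under the PacGAN design, D would score a pack of m samples rather than a single row.

```latex
\begin{aligned}
\tilde{x} &= G(z, c), \qquad \hat{x} = \epsilon x + (1 - \epsilon)\,\tilde{x}, \qquad \epsilon \sim \mathcal{U}[0, 1] \\
\mathcal{L}_D &= \mathbb{E}\big[D(\tilde{x}, c)\big] - \mathbb{E}\big[D(x, c)\big]
  + \lambda\, \mathbb{E}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}, c) \rVert_2 - 1\big)^2\Big] \\
\mathcal{L}_C^{\text{real}} &= -\,\mathbb{E}\big[\log C(c \mid x)\big], \qquad
\mathcal{L}_C^{\text{fake}} = -\,\mathbb{E}\big[\log C(c \mid \tilde{x})\big] \\
\mathcal{L}_G &= -\,\mathbb{E}\big[D(\tilde{x}, c)\big] + \mathcal{L}_C^{\text{fake}}
\end{aligned}
```

In this sketch the critic minimizes L_D, the classifier is trained on L_C^real, and the generator minimizes L_G. The gradient penalty keeps the critic approximately 1-Lipschitz, while the classifier term on generated rows is what pushes the generator to produce rows that actually belong to the requested (minority) class c.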

Highlights

An enormous amount of data is generated daily, at a large scale, in every domain
Since we are working on enhancing the performance of product recommendation systems, we considered these crucial data when labeling the minority and majority classes used by our generative adversarial network (GAN) models to generate synthetic data
In this work, we propose a novel hybrid GAN model that incorporates the advantages of PacGAN into the architecture of a conditional Wasserstein GAN to oversample the minority classes in online shopping tabular data, in order to enhance the accuracy and reduce the error rate of recommendation systems
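The PacGAN packing described in the highlights can be sketched as follows, assuming each tabular row has already been encoded as a flat list of floats (numeric features followed by a one-hot class condition). The helpers `one_hot` and `pack_samples`, the pack size `pac=2`, and the toy batch are hypothetical illustrations, not the paper's code; in a real model the packed vectors would be passed to the critic network rather than printed.

```python
def one_hot(index, num_classes):
    """Encode a class index as a one-hot list (used here as the condition vector)."""
    vec = [0.0] * num_classes
    vec[index] = 1.0
    return vec


def pack_samples(rows, pac):
    """Group a batch into PacGAN-style packs.

    Each pack concatenates `pac` consecutive rows into one flat vector, so the
    critic scores `pac` samples jointly instead of one at a time. All rows in
    `rows` should come from the same source (all real or all generated).
    """
    if pac < 1 or len(rows) % pac != 0:
        raise ValueError("batch size must be a positive multiple of pac")
    packed = []
    for start in range(0, len(rows), pac):
        flat = []
        for row in rows[start:start + pac]:
            flat.extend(row)  # concatenate feature vectors side by side
        packed.append(flat)
    return packed


# Hypothetical toy batch: 2 numeric features + one-hot condition over 3 classes.
batch = [
    [0.5, 1.2] + one_hot(0, 3),
    [0.1, 0.9] + one_hot(2, 3),
    [0.7, 0.3] + one_hot(2, 3),
    [0.4, 0.8] + one_hot(1, 3),
]
packs = pack_samples(batch, pac=2)
print(len(packs), len(packs[0]))  # 2 10
```

With `pac=2`, the critic's input width doubles from 5 to 10 values. This is the mechanism PacGAN relies on: a critic that sees several samples at once can more easily notice when the generator produces too little variety, which is how packing counters mode collapse.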

Summary

Introduction

According to [1], the volume of digital data has grown ninefold over the past five years. Researchers are constantly trying to enhance the user experience and to develop a market that benefits both customers and companies. Recommendation systems have been a leading-edge technology for extracting meaningful information from the unstructured data collected in every field. They organize this abundance of data and support customers' decision-making by narrowing down the options and analyzing each customer's personal preferences. This saves considerable time and can promote unexplored products or places that could benefit the business.

Conclusion