Oversampling based on generative adversarial networks to overcome imbalance data in predicting fraud insurance claim

Ranu A Nugraha, Agus Subekti, Hilman F Pardede

doi:10.48129/kjs.splml.19119

Abstract

Health insurance fraud causes cost overruns and, in the long term, a decline in the quality of health services. Machine learning is increasingly used to detect such fraud. However, one challenge in predicting health insurance fraud is data imbalance, which biases many machine learning methods towards the majority class. Oversampling addresses data imbalance by generating new samples from the existing minority class data. Recently, there has been growing interest in employing deep learning for data augmentation, for example with Generative Adversarial Networks (GANs). This paper proposes using a GAN as an oversampling method to generate additional data for the minority class. Since data for detecting health insurance fraud are tabular, we adopt the Conditional Tabular GAN (CTGAN) architecture, in which the generator is adapted to tabular input and receives additional conditioning information so that it produces samples for the specified class. The resulting balanced data are used to train 17 classification algorithms. Our experiments show that the proposed method performs better than other oversampling methods on several evaluation metrics, i.e., accuracy, precision, F1-score, and ROC.
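The oversampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the toy claim records, and the labels are all hypothetical, and the generator is a plain resampling stand-in rather than a trained CTGAN. A conditional generator would take the place of `resample_with_replacement` and produce new rows for the minority class instead of copying existing ones.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, minority_label, synthesize, seed=0):
    """Add synthetic minority rows until the minority class matches the largest class."""
    counts = Counter(labels)
    deficit = max(counts.values()) - counts[minority_label]
    minority_rows = [r for r, y in zip(rows, labels) if y == minority_label]
    rng = random.Random(seed)
    new_rows = synthesize(minority_rows, deficit, rng)
    return rows + new_rows, labels + [minority_label] * len(new_rows)


def resample_with_replacement(minority_rows, n, rng):
    # Stand-in generator (assumption): copies existing minority rows at random.
    # A trained conditional generator such as CTGAN would instead sample
    # new rows conditioned on the minority class.
    return [rng.choice(minority_rows) for _ in range(n)]


# Hypothetical toy data: (claim amount, provider type); label 1 marks fraud.
rows = [
    (120.0, "clinic"), (80.0, "clinic"), (95.0, "hospital"),
    (300.0, "hospital"), (110.0, "clinic"), (900.0, "hospital"),
]
labels = [0, 0, 0, 0, 0, 1]

balanced_rows, balanced_labels = oversample_minority(
    rows, labels, minority_label=1, synthesize=resample_with_replacement
)
print(Counter(balanced_labels))
```

Passing the generator in as a function keeps the balancing logic independent of how the synthetic rows are produced, so different oversampling methods can be compared on the same data.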
