Generative Oversampling Methods for Handling Imbalanced Data in Software Fault Prediction

Santosh Singh Rathore, Dixit Kumar Jain, Aakash Gopal Vachhani, Satyendra Singh Chouhan

doi:10.1109/tr.2022.3158949

Abstract

Imbalanced software fault datasets, which contain fewer faulty modules than nonfaulty modules, make accurate fault prediction difficult, and handling such imbalanced fault data is a challenge for software practitioners during software fault prediction (SFP). Several researchers have previously applied oversampling techniques, such as the synthetic minority oversampling technique (SMOTE) and related methods, for imbalanced learning in SFP. However, most of these techniques resulted in overfitted prediction models. This article presents generative oversampling methods to handle the imbalanced data problem in SFP. Using a generative adversarial network (GAN) based approach, the presented methods generate synthetic samples of faulty modules to balance the proportion of faulty and nonfaulty modules in the fault datasets. SFP models are then built on the processed fault datasets using different machine learning techniques. The presented oversampling methods are validated experimentally on 18 fault datasets gathered from the PROMISE, JIRA, and Eclipse data repositories, with precision, recall, F1-score, and AUC as evaluation measures. We extensively compared the presented oversampling methods with various state-of-the-art class imbalance techniques and baseline models. The experimental results showed that the presented methods improved fault prediction performance and outperformed the state-of-the-art class imbalance techniques.
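As a rough illustration of GAN-based oversampling (not the authors' implementation), the sketch below trains a tiny GAN on the feature vectors of faulty modules and appends enough generated samples to equalize the two classes. The helper `gan_oversample`, the linear generator, the logistic-regression discriminator, and every hyperparameter are hypothetical choices made to keep the example short and NumPy-only. Practical GAN oversamplers use multilayer networks; a linear discriminator like this one can only steer the generator toward coarse statistics such as the minority-class mean.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gan_oversample(X, y, noise_dim=4, steps=2000, lr=0.05, batch=32, seed=0):
    """Hypothetical helper: balance a binary fault dataset (1 = faulty,
    0 = nonfaulty) by appending GAN-generated faulty samples."""
    rng = np.random.default_rng(seed)
    X_min = X[y == 1]
    n_needed = int((y == 0).sum() - (y == 1).sum())
    if n_needed <= 0:
        return X, y  # already balanced (or faulty is the majority)
    if len(X_min) == 0:
        raise ValueError("no faulty samples to learn from")
    d = X.shape[1]

    # Standardize the faulty-module features so a small model can fit them.
    mu, sd = X_min.mean(axis=0), X_min.std(axis=0) + 1e-8
    R = (X_min - mu) / sd

    # Generator G(z) = z @ W + b (assumed linear architecture).
    W = rng.normal(0.0, 0.1, (noise_dim, d))
    b = np.zeros(d)
    # Discriminator D(x) = sigmoid(x @ v + c) (logistic regression).
    v = rng.normal(0.0, 0.1, d)
    c = 0.0

    for _ in range(steps):
        real = R[rng.integers(0, len(R), batch)]
        z = rng.normal(size=(batch, noise_dim))
        fake = z @ W + b

        # Discriminator ascent on log D(real) + log(1 - D(fake)).
        d_real = sigmoid(real @ v + c)
        d_fake = sigmoid(fake @ v + c)
        grad_v = ((1 - d_real) @ real - d_fake @ fake) / batch
        grad_c = ((1 - d_real).sum() - d_fake.sum()) / batch
        v += lr * grad_v
        c += lr * grad_c

        # Generator ascent on the non-saturating objective log D(G(z)).
        d_fake = sigmoid(fake @ v + c)
        g_out = np.outer(1 - d_fake, v)  # d log D / d fake, shape (batch, d)
        W += lr * (z.T @ g_out) / batch
        b += lr * g_out.mean(axis=0)

    # Sample synthetic faulty modules and undo the standardization.
    z = rng.normal(size=(n_needed, noise_dim))
    synth = (z @ W + b) * sd + mu
    X_bal = np.vstack([X, synth])
    y_bal = np.concatenate([y, np.ones(n_needed, dtype=y.dtype)])
    return X_bal, y_bal


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 3))
    y = np.array([1] * 20 + [0] * 80)
    X_bal, y_bal = gan_oversample(X, y, steps=500)
    print(X_bal.shape, np.bincount(y_bal))  # (160, 3) [80 80]
```

In an SFP pipeline, a step like this would normally be applied only to the training split, so that synthetic faulty modules never leak into the data used to measure precision, recall, F1-score, or AUC.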


Journal: IEEE Transactions on Reliability	Publication Date: Jun 1, 2022
Citations: 18

Similar Papers

Is Open-Source Software Valuable for Software Defect Prediction of Proprietary Software and Vice Versa?
Misha Kakkar ... P S Grover
25 Nov 2017

Tool to handle imbalancing problem in software defect prediction using oversampling methods
Ruchika Malhotra ... Shine Kamal
01 Sep 2017

Best Suited Machine Learning Techniques for Software Fault Prediction
Devika S ... Lekshmy P L
International Journal of Recent Technology and Engineering (IJRTE) | VOL. 8
30 Mar 2020

A Deep Introduction to AI Based Software Defect Prediction (SDP) and its Current Challenges
Mahesha Bangalore Ramalinga Pandit ... Nitin Varma
01 Oct 2019
