Genetic algorithm-based oversampling approach to prune the class imbalance issue in software defect prediction

C Arun, C Lakshmi

doi:10.1007/s00500-021-06112-6

Abstract

Class imbalance is a long-standing problem in machine learning that hinders the performance of classification algorithms in real-world applications such as electricity pilferage, fraudulent transactions, anomaly detection, and prediction of rare diseases. Class imbalance refers to the problem where the distribution of samples is skewed or biased toward one particular class. By their nature, software fault prediction datasets fall into this category, because they contain fewer defective modules than non-defective modules. Most of the oversampling techniques proposed to address the issue generate synthetic samples of the minority class to balance the dataset. However, the synthetic samples generated are often near duplicates, which also results in over-generalization. We therefore propose a novel oversampling approach that introduces synthetic samples using a genetic algorithm (GA). GA is a form of evolutionary algorithm that employs biologically inspired techniques such as inheritance, mutation, selection, and crossover. The proposed algorithm generates synthetic samples of the minority class based on a distribution measure and ensures that the samples are diverse within the class and are efficient. The proposed oversampling algorithm is compared with SMOTE, BSMOTE, ADASYN, random oversampling, MAHAKIL, and a no-sampling approach on 20 defect prediction datasets from the PROMISE repository using five prediction models. The results indicate that the genetic algorithm oversampling approach improves fault prediction performance and reduces the false alarm rate.
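The abstract does not specify the operators or the distribution measure used, so the following is only a minimal sketch of what GA-style oversampling could look like, not the authors' algorithm. The function name `ga_oversample`, the uniform crossover, the Gaussian mutation scaled by per-feature standard deviation, and the nearest-neighbour distance used as a diversity fitness are all assumptions made for illustration.

```python
import numpy as np

def ga_oversample(X_min, n_new, generations=10, mutation_scale=0.1, seed=0):
    """Hypothetical GA-style oversampler for the minority class (illustrative only)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    lo, hi = X_min.min(axis=0), X_min.max(axis=0)   # observed feature range
    sigma = X_min.std(axis=0)                       # per-feature spread

    def fitness(cands):
        # Assumed diversity score: distance to the nearest real minority
        # sample, so near duplicates of existing samples score low.
        dists = np.linalg.norm(cands[:, None, :] - X_min[None, :, :], axis=2)
        return dists.min(axis=1)

    def breed(pool, k):
        # Selection: draw two random parents per child from the pool.
        i = rng.integers(0, len(pool), size=k)
        j = rng.integers(0, len(pool), size=k)
        # Uniform crossover: each feature is inherited from one parent.
        mask = rng.random((k, pool.shape[1])) < 0.5
        children = np.where(mask, pool[i], pool[j])
        # Mutation: small Gaussian noise scaled by the feature spread.
        children = children + rng.normal(0.0, 1.0, children.shape) * mutation_scale * sigma
        # Keep children inside the observed minority-class range.
        return np.clip(children, lo, hi)

    population = breed(X_min, n_new)
    for _ in range(generations):
        offspring = breed(np.vstack([X_min, population]), n_new)
        merged = np.vstack([population, offspring])
        best = np.argsort(fitness(merged))[::-1][:n_new]  # keep the fittest
        population = merged[best]
    return population
```

In this sketch, the returned rows would be appended to the training set with the minority (defective) label before fitting a classifier. A real implementation would need a fitness that also respects the class distribution, since rewarding distance alone tends to push samples toward the edges of the feature range.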

Affiliation: SRM Institute of Science and Technology (listed for both authors)

Journal: Soft Computing
Publication Date: Aug 29, 2021
Citations: 13
License type: cc-by

Similar Papers

An empirical study on the effectiveness of data resampling approaches for cross‐project software defect prediction
Kwabena Ebo Bennin ... Amjed Tahir
IET Software | VOL. 16
28 Nov 2021

Software defect prediction via transfer learning based neural network
Qimeng Cao ... Qing Sun
01 Oct 2015

Software defect prediction using a bidirectional LSTM network combined with oversampling techniques
Nasraldeen Alnor Adam Khleel ... Károly Nehéz
Cluster Computing | VOL. 27
28 Oct 2023

Tackling Class Imbalance Problem in Software Defect Prediction Through Cluster-Based Over-Sampling With Filtering
Lina Gong ... Shujuan Jiang
IEEE Access | VOL. 7
01 Jan 2019
