An empirical study to investigate oversampling methods for improving software defect prediction using imbalanced data

Ruchika Malhotra, Shine Kamal

doi:10.1016/j.neucom.2018.04.090

Abstract

Software defect prediction is important for identifying defects in the early phases of the software development life cycle. Early identification, and thereby removal, of software defects is crucial to yielding a cost-effective, good-quality software product. Although previous studies have successfully used machine learning techniques for software defect prediction, these techniques yield biased results when applied to imbalanced datasets. An imbalanced dataset has a non-uniform class distribution, with very few instances of one class compared to the other. Using imbalanced datasets leads to off-target predictions for the minority class, which is generally considered more important than the majority class. Thus, handling imbalanced data effectively is crucial for developing a competent defect prediction model. This study evaluates the effectiveness of machine learning classifiers for software defect prediction on twelve imbalanced NASA datasets through the application of sampling methods and cost-sensitive classifiers. We investigate five existing oversampling methods, which replicate instances of the minority class, and also propose a new method, SPIDER3, by suggesting modifications to the SPIDER2 oversampling method. Furthermore, the work evaluates the performance of MetaCost learners for cost-sensitive learning on imbalanced datasets. The results show an improvement in the prediction capability of machine learning classifiers with the use of oversampling methods. Furthermore, the proposed SPIDER3 method shows promising results.
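To make the idea of replicating minority-class instances concrete, here is a minimal sketch of plain random oversampling in Python. It is an illustration only, not SPIDER2, SPIDER3, or necessarily any of the five methods the study evaluates; the function name `random_oversample` and the toy data are assumptions introduced for this example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many instances as the largest class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Append (target - n) replicated rows; zero for the majority class.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (assumed): 8 non-defective modules (label 0), 2 defective (label 1).
rows = [[i, i * 2] for i in range(10)]
labels = [0] * 8 + [1] * 2

new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

After the call, both classes have eight instances, and every added row is an exact copy of an original defective-module row. Oversampling of this kind would normally be applied to the training split only, so that replicated rows do not leak into the evaluation data.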

Journal: Neurocomputing
Publication Date: Feb 4, 2019
Citations: 101

Similar Papers

Software Defect Prediction Through a Hybrid Approach Comprising of a Statistical Tool and a Machine Learning Model
Ashis Kumar Chakraborty ... Barin Karmakar
01 Jan 2023

WR-ELM: Weighted Regularization Extreme Learning Machine for Imbalance Learning in Software Fault Prediction
Pravas Ranjan Bal ... Sandeep Kumar
IEEE Transactions on Reliability | VOL. 69
15 Jun 2020

Multiple kernel ensemble learning for software defect prediction
Tiejian Wang ... Zhiwu Zhang
Automated Software Engineering | VOL. 23
07 Apr 2015

Generative Oversampling Methods for Handling Imbalanced Data in Software Fault Prediction
Santosh Singh Rathore ... Satyendra Singh Chouhan
IEEE Transactions on Reliability | VOL. 71
01 Jun 2022
