Abstract

Software defect prediction (SDP) is an effective technique for lowering software module testing costs. However, imbalanced class distributions exist in almost all SDP datasets and limit the accuracy of defect prediction. To rebalance the data distribution reasonably, we propose LIMCR, a novel resampling method based on Naïve Bayes, to improve SDP performance. The main idea of LIMCR is to evaluate how informative each sample from the majority class is, and then remove the less-informative majority samples to rebalance the data distribution. We employ 29 SDP datasets from the PROMISE and NASA repositories and divide them into two groups: small datasets (fewer than 1100 samples) and large datasets (1100 samples or more). We then conduct experiments comparing combinations of classifiers and imbalance learning methods on the small and large datasets, respectively. The results show the effectiveness of LIMCR: LIMCR+GNB outperforms the other methods on small datasets, while its performance on large datasets is less competitive.
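
The exact informativeness measure is part of the method's detail; the following is a minimal sketch of the idea in Python, assuming the posterior confidence of a Gaussian Naïve Bayes model as the informativeness score (the function name limcr_resample, the keep_ratio parameter, and the scoring rule are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def limcr_resample(X, y, majority_label=0, keep_ratio=0.6):
    """Sketch of LIMCR-style undersampling: score majority-class samples
    with a Naive Bayes model and drop the less-informative ones.

    keep_ratio and the scoring rule are illustrative assumptions here,
    not the paper's exact criterion.
    """
    nb = GaussianNB().fit(X, y)
    maj_idx = np.where(y == majority_label)[0]
    # Treat majority samples the model classifies with high confidence
    # as "less informative" (far from the class boundary).
    maj_conf = nb.predict_proba(X[maj_idx])[:, nb.classes_ == majority_label].ravel()
    order = np.argsort(maj_conf)        # ascending: low confidence = near boundary
    n_keep = int(len(maj_idx) * keep_ratio)
    keep_maj = maj_idx[order[:n_keep]]  # keep the more informative majority samples
    keep = np.concatenate([keep_maj, np.where(y != majority_label)[0]])
    return X[keep], y[keep]
```

A typical use would be to call X_bal, y_bal = limcr_resample(X, y) before fitting a Gaussian Naïve Bayes classifier, mirroring the LIMCR+GNB pairing evaluated in the abstract.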

Highlights

  • Software defect prediction (SDP) is an effective technique to lower software module testing costs. It can efficiently identify defect-prone software modules by learning from defect datasets of previous releases

  • We present a novel resampling method LIMCR based on Naïve Bayes to solve the class imbalance problem in SDP datasets

  • The average balanced score and G-mean of LIMCR are 0.701 and 0.69, which outperform the other baseline imbalance learning methods (see the metric sketch below)
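
Both figures derive from per-class recall; a minimal sketch, assuming "balanced score" denotes balanced accuracy (the arithmetic mean of the two recalls), while G-mean is their geometric mean:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def balanced_score_and_gmean(y_true, y_pred):
    """Balanced accuracy = mean of per-class recalls;
    G-mean = geometric mean of per-class recalls (binary case)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    recall_pos = tp / (tp + fn)   # recall on the defect-prone class
    recall_neg = tn / (tn + fp)   # recall on the non-defect-prone class
    return (recall_pos + recall_neg) / 2, np.sqrt(recall_pos * recall_neg)
```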


Introduction

Software defect prediction (SDP) is an effective technique to lower software module testing costs. It can efficiently identify defect-prone software modules by learning from defect datasets of previous releases. Most prediction algorithms assume that the number of samples in each class is balanced. This contradiction means that prediction algorithms trained on imbalanced software defect datasets are generally biased towards samples in the non-defect-prone class and ignore samples in the defect-prone class; that is, many defect-prone samples may be classified as non-defect-prone by prediction algorithms trained on imbalanced datasets. This problem occurs widely in SDP, and it has been shown that reducing the influence of the imbalance problem can improve prediction performance efficiently.
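
To make this bias concrete, here is a minimal illustration on synthetic data (not one of the paper's 29 datasets): a plain classifier trained on a roughly 9:1 class split typically shows much lower recall on the minority (defect-prone) class than on the majority class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import recall_score

# Synthetic stand-in for an SDP dataset: ~10% defect-prone samples.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
pred = GaussianNB().fit(X_tr, y_tr).predict(X_te)

# Recall on the minority (defect-prone) class is usually much lower,
# illustrating the bias described above.
print("defect-prone recall:    ", recall_score(y_te, pred, pos_label=1))
print("non-defect-prone recall:", recall_score(y_te, pred, pos_label=0))
```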
