Abstract

Imbalanced data are a major factor in degrading the performance of software defect prediction models. Software defect datasets are imbalanced in nature, i.e., the number of non-defect-prone modules far exceeds that of defect-prone ones, which biases classifiers toward the majority class. In this paper, we propose a novel credibility-based imbalance boosting (CIB) method to address the class-imbalance problem in software defect proneness prediction. The method measures the credibility of synthetic samples based on their distribution by assigning a credit factor to every synthetic sample, and proposes a weight-updating scheme that makes the base classifiers focus on real samples and on synthetic samples with high credibility. Experiments are performed on 11 NASA datasets and nine PROMISE datasets, comparing CIB with MAHAKIL, AdaC2, AdaBoost, SMOTE, RUS, and no sampling in terms of four performance measures: area under the curve (AUC), F1, AGF, and Matthews correlation coefficient (MCC). The Wilcoxon signed-rank test and Cliff’s δ are used to perform statistical tests and to calculate effect sizes, respectively. The experimental results show that CIB is a more promising alternative for addressing the class-imbalance problem in software defect proneness prediction than previous methods.
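The abstract's core idea, discounting synthetic samples by a credit factor before boosting, can be sketched as follows. The SMOTE-style interpolation and the particular credit-factor formula used here (a distance ratio between the two classes) are illustrative assumptions, not the paper's exact definitions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced data: 90 non-defect-prone modules (label 0), 10 defect-prone (label 1).
X_real = np.vstack([rng.normal(0.0, 1.0, (90, 3)),
                    rng.normal(2.0, 1.0, (10, 3))])
y_real = np.array([0] * 90 + [1] * 10)

# SMOTE-style oversampling: interpolate between pairs of minority samples.
minority = X_real[y_real == 1]
idx = rng.integers(0, len(minority), size=(40, 2))
gap = rng.random((40, 1))
X_syn = minority[idx[:, 0]] + gap * (minority[idx[:, 1]] - minority[idx[:, 0]])

# Hypothetical credit factor: a synthetic sample is more credible the closer
# it lies to the minority class relative to the majority class.
majority = X_real[y_real == 0]
d_min = np.min(np.linalg.norm(X_syn[:, None] - minority[None], axis=2), axis=1)
d_maj = np.min(np.linalg.norm(X_syn[:, None] - majority[None], axis=2), axis=1)
credit = d_maj / (d_min + d_maj + 1e-12)   # in (0, 1]: higher = more credible

# Boosting-style initial weights: real samples get full weight, synthetic
# samples are discounted by their credit factor, then all are normalized,
# so base classifiers focus on real and highly credible synthetic samples.
w = np.concatenate([np.ones(len(X_real)), credit])
w /= w.sum()
```

A weight-updating scheme in the spirit of AdaBoost would then re-weight these samples after each base classifier, keeping the credit-based discount on the synthetic portion.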

Highlights

  • Software defect prediction has been an important research topic in the field of software engineering for more than three decades [1]

  • Our credibility-based imbalance boosting (CIB) significantly outperforms the baselines with at least medium effect size in terms of F1, Matthews correlation coefficient (MCC), and area under the curve (AUC)

  • Class-imbalanced data are a major factor lowering the performance of software defect prediction models [31,32]


Introduction

Software defect prediction has been an important research topic in the field of software engineering for more than three decades [1]. Software defect prediction models help to reasonably allocate limited test resources and improve test efficiency by identifying defective modules before software testing, which has drawn increasing attention from both the academic and industrial communities [2,3,4,5,6,7,8]. Software defect prediction can be regarded as a binary classification problem, where software modules are classified as defect-prone or non-defect-prone. By mining historical defect datasets with statistical or machine learning techniques, software defect proneness prediction models are built to establish the relationship between software metrics (the independent variables) and the defect proneness of software modules (such as methods, classes, and files), and are used to predict the labels (defect-prone or non-defect-prone) of new software modules.
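As a concrete illustration of this binary-classification setup, the following minimal sketch fits a logistic-regression model mapping module metrics to defect proneness. The metric values are synthetic stand-ins, and plain NumPy gradient descent is used rather than any particular defect-prediction toolkit:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical module-level metrics (e.g., size, complexity) and labels:
# 1 = defect-prone, 0 = non-defect-prone.
X = rng.normal(0.0, 1.0, (200, 2))
y = (X @ np.array([1.5, -1.0]) + rng.normal(0.0, 0.5, 200) > 0).astype(float)

# Logistic regression by gradient descent: w, b relate metrics to defect proneness.
w = np.zeros(2)
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

# Predict defect-proneness labels for modules from their metrics.
p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
pred = (p >= 0.5).astype(float)
accuracy = np.mean(pred == y)
```

In practice the model would be trained on labeled historical modules and applied to unlabeled new modules; here training and prediction reuse the same toy data only to keep the sketch self-contained.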
