Defect Prediction Using Akaike and Bayesian Information Criterion

Saleh Albahli, Ghulam Nabi Ahmad Hassan Yar

doi:10.32604/csse.2022.021750

Open Access

https://doi.org/10.32604/csse.2022.021750

Journal: Computer Systems Science and Engineering	Publication Date: Jan 1, 2022
Citations: 16	License type: cc-by

Affiliation: Qassim University

Abstract

Data available for many software engineering applications is highly variable, and it is not possible to say which variables help in the prediction process. Most existing work in software defect prediction focuses on selecting the best prediction technique. For this purpose, deep learning and ensemble models have shown promising results. In contrast, very few studies deal with cleaning the training data and selecting the best parameter values from the data. Sometimes the data available for training the models has high variability, and this variability may decrease model accuracy. To deal with this problem, we used the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) to select the best variables for training the model. A simple artificial neural network (ANN) model with one input, one output, and two hidden layers was used for training instead of a very deep and complex model. AIC and BIC values were calculated, and the combination with the minimum AIC and BIC values was selected as the best model. First, the variables were narrowed down to a smaller number using correlation values. Then subsets for all possible variable combinations were formed. Finally, an ANN model was trained for each subset, and the best model was selected on the basis of the smallest AIC and BIC values. The combination of only two variables, ns and entropy, was found to be best for software defect prediction, as it gives the minimum AIC and BIC values, whereas nm and npt form the worst combination and give the maximum AIC and BIC values.
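The selection procedure described in the abstract (enumerate variable subsets, fit a model on each, keep the subset with the smallest criteria) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It uses the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of fitted parameters, n the number of samples, and L the maximized likelihood. The `fit` callable, the variable names `x1` to `x3`, the toy log-likelihood numbers, and the choice to rank by AIC first and BIC second are all assumptions made for the example. In a real run, `fit` would train the ANN on the chosen columns and return its log-likelihood and parameter count.

```python
from itertools import combinations
from math import log


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return k * log(n) - 2 * log_likelihood


def all_subsets(variables):
    """Yield every non-empty combination of the candidate variables."""
    for size in range(1, len(variables) + 1):
        yield from combinations(variables, size)


def rank_subsets(variables, fit, n):
    """Return (subset, AIC, BIC) rows sorted by AIC, then BIC (smallest first).

    `fit` is a hypothetical callable: it takes a tuple of variable names and
    returns (log_likelihood, number_of_parameters) for a model trained on them.
    """
    rows = []
    for subset in all_subsets(variables):
        ll, k = fit(subset)
        rows.append((subset, aic(ll, k), bic(ll, k, n)))
    return sorted(rows, key=lambda row: (row[1], row[2]))


def toy_fit(subset):
    """Stand-in for model training; the numbers are invented, not from the paper."""
    gain = {"x1": 30.0, "x2": 25.0, "x3": 0.5}
    log_likelihood = -100.0 + sum(gain[v] for v in subset)
    k = len(subset) + 1  # one weight per variable plus a bias term (assumption)
    return log_likelihood, k


ranking = rank_subsets(["x1", "x2", "x3"], toy_fit, n=200)
best_subset = ranking[0][0]
worst_subset = ranking[-1][0]
```

Both criteria penalize extra parameters, so the weakly informative `x3` is dropped from the best subset in this toy setup even though adding it raises the likelihood slightly. Exhaustive enumeration grows as 2^p - 1 subsets for p variables, which is one reason to narrow the candidates by correlation before enumerating.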

Highlights

Software defect prediction is a crucial task that has been given a lot of importance recently.
An artificial neural network (ANN) model was trained for each subset, and the best model was selected on the basis of the smallest Akaike information criterion (AIC) and Bayesian information criterion (BIC) values.
This paper proposes model selection using AIC and BIC in both within-project and cross-project defect prediction.

Summary

Introduction

Software defect prediction is a crucial task that has been given a lot of importance recently. Researchers focus on defect prediction in order to improve the user experience and the overall quality of software. When a product is introduced into the market, it takes a long time to reach maturity. Along the way it passes through many changes: some improve the product, while others conflict with it and cause a defect. The latter are called “defect-inducing changes” and need to be found and removed [1].

Methods

Results

Conclusion