Software Defect Prediction Using Heterogeneous Ensemble Classification Based on Segmented Patterns

Hamad Alsawalqah, Neveen Hijazi, Yazan Alshamaileh, Ibrahim Aljarah, Ahmed Al Radaideh, Hossam Faris, Mohammed Eshtay

doi:10.3390/app10051745

Open Access

https://doi.org/10.3390/app10051745

Abstract

Software defect prediction is a promising approach that aims to improve software quality and testing efficiency by identifying defect-prone software modules before the actual testing process begins. These predictions help software developers allocate their limited resources to the modules that are most prone to defects. In this paper, a hybrid heterogeneous ensemble approach is proposed for software defect prediction. A heterogeneous ensemble consists of a set of classifiers built with different base learning methods, each of which has its own strengths and weaknesses. The main idea of the proposed approach is to develop expert and robust heterogeneous classification models. Two versions of the proposed approach are developed and evaluated. The first is based on simple classifiers, and the second is based on ensemble ones. For evaluation, 21 publicly available benchmark datasets are selected to conduct the experiments and benchmark the proposed approach. The evaluation results show the superiority of the ensemble version over other well-regarded basic and ensemble classifiers.

Highlights

Individuals and society increasingly rely on advanced software systems
The first version of the proposed approach is based on simple classifiers (i.e., k-Nearest Neighbour (k-NN), Naive Bayes (NB), and Decision Tree (DT)), and the second is based on ensemble ones (i.e., Bagging, Adaptive Boosting (AdaBoost), Random Forest (RF), and XGBoost (XGB))
The proposed approach is evaluated using powerful ensemble classifiers (Bagging, AdaBoost, RF, and XGB)
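
To make the idea of a heterogeneous ensemble concrete, the following is a minimal sketch in plain Python that combines three different kinds of base learners (a k-NN, a Gaussian Naive Bayes, and a one-level decision tree) by simple majority vote. It is only an illustration of the general concept: it is not the segmented-pattern method proposed in the paper, whose details are not given in this summary. The class names, the `majority_vote` helper, the two features (lines of code and cyclomatic complexity), and the toy data are all assumptions made up for this example.

```python
import math
from collections import Counter


class TinyKNN:
    """k-nearest-neighbour classifier using Euclidean distance (illustrative)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        out = []
        for x in X:
            # Indices of the k training rows closest to x.
            nearest = sorted(range(len(self.X)),
                             key=lambda i: math.dist(x, self.X[i]))[:self.k]
            out.append(Counter(self.y[i] for i in nearest).most_common(1)[0][0])
        return out


class TinyGaussianNB:
    """Gaussian Naive Bayes with per-class feature means and variances."""

    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            cols = list(zip(*rows))
            means = [sum(c) / len(rows) for c in cols]
            # Small constant avoids division by zero for constant features.
            variances = [sum((v - m) ** 2 for v in c) / len(rows) + 1e-9
                         for c, m in zip(cols, means)]
            self.stats[label] = (len(rows) / len(X), means, variances)
        return self

    def _log_posterior(self, x, prior, means, variances):
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances))

    def predict(self, X):
        return [max(self.stats,
                    key=lambda c: self._log_posterior(x, *self.stats[c]))
                for x in X]


class TinyStump:
    """One-level decision tree: a single threshold on a single feature.

    Assumes at least one feature takes two or more distinct values.
    """

    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):
            for t in sorted({x[j] for x in X}):
                left = [lab for x, lab in zip(X, y) if x[j] <= t]
                right = [lab for x, lab in zip(X, y) if x[j] > t]
                if not left or not right:
                    continue
                l_lab = Counter(left).most_common(1)[0][0]
                r_lab = Counter(right).most_common(1)[0][0]
                errors = (sum(lab != l_lab for lab in left)
                          + sum(lab != r_lab for lab in right))
                if best is None or errors < best[0]:
                    best = (errors, j, t, l_lab, r_lab)
        _, self.j, self.t, self.left, self.right = best
        return self

    def predict(self, X):
        return [self.left if x[self.j] <= self.t else self.right for x in X]


def majority_vote(models, X):
    """Combine heterogeneous base learners by an unweighted majority vote."""
    votes = zip(*(m.predict(X) for m in models))
    return [Counter(v).most_common(1)[0][0] for v in votes]


# Hypothetical module metrics: [lines of code, cyclomatic complexity].
# Label 1 = defective, 0 = non-defective. Invented data, not from the paper.
X_train = [[20, 2], [35, 3], [50, 4], [40, 2],
           [300, 25], [280, 30], [350, 22], [400, 28]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
X_test = [[30, 3], [320, 27]]

models = [TinyKNN(k=3).fit(X_train, y_train),
          TinyGaussianNB().fit(X_train, y_train),
          TinyStump().fit(X_train, y_train)]
predictions = majority_vote(models, X_test)
print(predictions)
```

Using an odd number of base learners on a binary problem means the vote cannot tie. In practice one would more likely rely on an established library than on hand-written learners like these.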

Summary

Introduction

Because software is intertwined with all aspects of our lives, it is essential to produce reliable and trustworthy systems economically and quickly. This goal is increasingly challenged by the rapid growth in the size and complexity of today’s software. Defective software modules increase development and maintenance costs and cause customer dissatisfaction [3,4]. Effective defect prediction can help test managers locate bugs and allocate limited software quality assurance (SQA) resources optimally and economically; it has therefore become an extremely important research topic [7,8,9,10,11,12]. A prediction model addresses one of three tasks: binary classification of defects [13,14,15,16], prediction of the number of defects or defect density [17,18,19,20], and prediction of defect severity [21,22].

Objectives

Results

Conclusion