Ensemble Machine Learning Paradigms in Software Defect Prediction

Tarunim Sharma, Aman Jatain, Shalini Bhaskar, Kavita Pabreja

doi:10.1016/j.procs.2023.01.002

Abstract

Software fault prediction aims to detect defects before the testing phase, allowing better resource allocation and the development of high-quality software, which is a requisite for any organization. Machine learning techniques help address this problem, and a variety of predictive models are being developed to classify software modules as defective or non-defective. Although applying these advanced machine learning techniques results in better utilization of time and other resources, many studies still report poor prediction performance. This is because of several challenges that affect software defect data, including redundancy, correlation, feature irrelevance, missing samples, and an imbalanced distribution between the faulty and non-faulty classes. Ensemble machine learning has been adopted by practitioners and researchers globally to deal with such problems, and it has been shown to improve defect prediction performance to some extent. In this review paper, all ensemble-based machine learning techniques developed for software defect prediction from 2018 to 2021 are critically analyzed. The core aim of this paper is to gain deep insight into why the various hybrid models still suffer from poor performance on the available datasets. A detailed review from multiple perspectives, viz. the defect datasets used, the performance evaluation criteria, and the machine learning techniques, has revealed certain gaps that can be addressed through more robust hyperparameter optimization algorithms, feature engineering, and the development of stacking and averaging models.

Full Text