Abstract

Software testing is a crucial component of software development, and it often makes the difference between successful and failed projects. Despite its importance, the fast pace and short deadlines of contemporary projects mean that testing is frequently neglected or insufficiently thorough, potentially leading to the loss of reputation, users’ private data, money, and in some circumstances even lives. In such situations, it is vital to be able to predict which modules are error-prone based on a collection of software metrics and to focus testing effort on them; this is a typical classification task. Machine learning models have been applied to a wide range of classification problems with significant success, and this paper proposes an eXtreme Gradient Boosting (XGBoost) model for the defect prediction task. A modified variant of the well-known reptile search algorithm (RSA) is proposed to tune the XGBoost hyperparameters. The enhanced algorithm, named HARSA, was evaluated on the challenging CEC2019 benchmark function suite, where it exhibited excellent performance. The XGBoost model tuned by the proposed algorithm was then evaluated on two benchmark software defect prediction datasets, and the simulation outcomes were compared to those of other powerful swarm intelligence metaheuristics applied in an identical experimental environment; the proposed approach attained superior classification accuracy on both datasets. Finally, a Shapley Additive Explanations (SHAP) analysis was conducted to determine the impact of individual software metrics on the classification results.
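The general workflow described above — a population-based metaheuristic searching a hyperparameter space and keeping the configuration with the best fitness — can be sketched in a few lines of plain Python. This is a minimal illustrative sketch only: it uses a simple move-toward-best update with random perturbation as a stand-in for the paper's HARSA algorithm, and a toy objective function in place of XGBoost's cross-validated classification accuracy on a defect dataset. The search space bounds and parameter names are illustrative assumptions, not the authors' actual configuration.

```python
import random

# Illustrative hyperparameter search space (assumed, not from the paper).
SEARCH_SPACE = {
    "learning_rate": (0.01, 0.3),
    "max_depth": (3.0, 10.0),
    "n_estimators": (50.0, 500.0),
}

def random_solution():
    """Sample one candidate hyperparameter configuration uniformly."""
    return {k: random.uniform(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}

def objective(params):
    """Placeholder fitness. In the paper this role is played by the
    classification accuracy of an XGBoost model trained with `params`."""
    return (-(params["learning_rate"] - 0.1) ** 2
            - (params["max_depth"] - 6.0) ** 2 / 100.0)

def search(pop_size=20, iterations=50, seed=0):
    """Simplified population-based search: each candidate moves toward the
    current best solution (exploitation) with a random step (exploration),
    and a move is accepted only if it improves the candidate's fitness."""
    random.seed(seed)
    population = [random_solution() for _ in range(pop_size)]
    best = max(population, key=objective)
    for _ in range(iterations):
        for i, sol in enumerate(population):
            candidate = {}
            for k, (lo, hi) in SEARCH_SPACE.items():
                step = random.uniform(-0.1, 0.1) * (hi - lo)
                val = sol[k] + 0.5 * (best[k] - sol[k]) + step
                candidate[k] = min(max(val, lo), hi)  # clip to bounds
            if objective(candidate) > objective(sol):
                population[i] = candidate
        best = max(population, key=objective)
    return best

best_params = search()
```

In the actual study, the fitness evaluation would train and validate an XGBoost classifier per candidate, making each evaluation far more expensive, which is precisely why an efficient metaheuristic such as HARSA matters.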
