Abstract

The use of Machine Learning (ML) classifiers to predict defective software modules is useful for planning software testing activities. Most of these studies use accuracy as the main metric to evaluate the quality of the ML classifier. However, when unbalanced datasets are used to train and test the classifier, the ML model becomes biased, and biased ML models hide their real accuracy. In this context, this study proposes an approach to enhance the use of ML classifiers for predicting defective software modules even with unbalanced datasets. The results indicate: (1) a significant reduction in the number of false negatives; (2) a considerable gain in the efficacy of the software testing; and (3) an increase in the number of modules correctly indicated as defective; however, there were also (4) an increase in the scope of the testing suggested by the model; (5) a reduction in software testing efficiency; (6) an increase in the number of false positives; and (7) a reduction in the overall accuracy. Therefore, the proposed approach imposes a trade-off to be considered when planning the software testing activities. Finally, this study also proposes an approach to help managers deal with those trade-offs considering resource constraints.
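The abstract does not detail the authors' specific technique, but the trade-off it describes (fewer false negatives at the cost of more false positives and lower overall accuracy) can be illustrated with a minimal, generic sketch: training a defect classifier on an imbalanced dataset with and without class weighting. The synthetic dataset, the random-forest model, and the "balanced" weighting below are illustrative assumptions, not the paper's method.

```python
# Minimal sketch (not the paper's approach): contrast an unweighted and a
# class-weighted defect classifier on an imbalanced dataset using scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a defect dataset: roughly 10% of modules are defective.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

for label, weight in [("unweighted", None), ("class-weighted", "balanced")]:
    clf = RandomForestClassifier(class_weight=weight, random_state=42)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
    # Higher recall on the defective class means fewer false negatives;
    # lower precision means more modules flagged for testing (larger scope).
    print(f"{label}: FN={fn} FP={fp} "
          f"recall={recall_score(y_test, pred):.2f} "
          f"precision={precision_score(y_test, pred):.2f}")
```

Running this typically shows the weighted model catching more defective modules while flagging more non-defective ones, which is the same kind of trade-off the study asks managers to weigh against resource constraints.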
