Evaluating the Impact of Data Transformation Techniques on the Performance and Interpretability of Software Defect Prediction Models

Yu Zhao, Yuxiang Gao, Lina Gong, Qiao Yu, Zhiqiu Huang, Yi Zhu

doi:10.1049/2023/6293074

Abstract

The performance of software defect prediction (SDP) models determines the priority of test resource allocation. Researchers also use interpretability techniques to gain empirical knowledge about software quality from SDP models. However, SDP methods proposed in past research rarely consider how data transformation methods, which are simple but commonly used preprocessing techniques, affect the performance and interpretability of SDP models. Therefore, in this paper, we investigate the impact of three data transformation methods (Log, Minmax, and Z-score) on the performance and interpretability of SDP models. Our empirical study covers (i) six classification techniques (random forest, decision tree, logistic regression, Naive Bayes (NB), K-nearest neighbors, and multilayer perceptron), (ii) six performance evaluation indicators (Accuracy, Precision, Recall, F1, MCC, and AUC), (iii) two interpretability methods (permutation and SHAP), (iv) two feature importance measures (Top-k feature rank overlap and difference), and (v) three datasets (Promise, Relink, and AEEEM). Our results show that data transformation methods can significantly improve the performance of SDP models and substantially change which features rank as most important. Specifically, the impact of data transformation methods on the performance and interpretability of SDP models depends on the classification technique and the evaluation indicator. We observe that log transformation improves NB model performance by 7%–61% on five of the six indicators, at the cost of a 5% drop in Precision. Minmax and Z-score transformations improve NB model performance by 2%–9% across all indicators. However, all three transformation methods lead to substantial changes in the Top-5 important feature ranks, with rank differences exceeding 2 in 40%–80% of cases (detailed results are available in the main content).
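
The three transformations named above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the exact formulas, so the log variant ln(1 + x), the column-wise scaling, the function names, and the toy metric matrix are all assumptions.

```python
import numpy as np


def log_transform(X):
    # Assumed variant: ln(1 + x), which keeps zero-valued metrics finite.
    return np.log1p(X)


def minmax_transform(X):
    # Rescale each column (metric) to the range [0, 1].
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (X - lo) / span


def zscore_transform(X):
    # Center each column at 0 and scale it to unit standard deviation.
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma = np.where(sigma > 0, sigma, 1.0)  # guard against constant columns
    return (X - mu) / sigma


# Hypothetical metric matrix: rows are modules, columns are software metrics.
X = np.array([[0.0, 10.0], [3.0, 20.0], [9.0, 40.0]])

X_log = log_transform(X)
X_minmax = minmax_transform(X)
X_zscore = zscore_transform(X)
```

In a real experiment the scaling statistics (minimum, maximum, mean, standard deviation) would normally be computed on the training split only and then reused on the test split, so that no test information leaks into the model.
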
Based on our findings, we recommend (1) considering the impact of data transformation methods on model performance and interpretability when designing SDP approaches, since transformations can improve model accuracy yet potentially obscure important features, which complicates interpretation; (2) conducting comparative experiments with and without the transformations to validate the effectiveness of proposed methods that are designed to improve prediction performance; and (3) tracking changes in the most important features before and after applying data transformation methods, so that the interpretability conclusions drawn from the models are precise and traceable. Our study reminds researchers and practitioners of the need for comprehensive consideration even when using other, similarly simple data processing methods.
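
Recommendation (3) can be made concrete with a small sketch that compares two feature importance rankings, one computed before and one after a transformation. The definitions here are assumptions, since the abstract does not give them: overlap is taken as the Jaccard ratio of the two Top-k sets, and rank difference as the absolute change in position of each baseline Top-k feature. The feature names are hypothetical, and both rankings are assumed to contain the same features.

```python
def top_k_overlap(rank_before, rank_after, k):
    # Assumed definition: |intersection| / |union| of the two Top-k sets.
    top_before, top_after = set(rank_before[:k]), set(rank_after[:k])
    return len(top_before & top_after) / len(top_before | top_after)


def rank_differences(rank_before, rank_after, k):
    # Absolute position change of each feature in the baseline Top-k.
    position_after = {feature: i for i, feature in enumerate(rank_after)}
    return {
        feature: abs(i - position_after[feature])
        for i, feature in enumerate(rank_before[:k])
    }


# Hypothetical rankings, most important feature first.
before = ["loc", "wmc", "cbo", "rfc", "lcom", "dit"]
after = ["wmc", "loc", "rfc", "dit", "cbo", "lcom"]

overlap = top_k_overlap(before, after, k=3)    # 0.5
shifts = rank_differences(before, after, k=3)  # {'loc': 1, 'wmc': 1, 'cbo': 2}
```

The rankings themselves could come from any importance method, such as permutation importance or mean absolute SHAP values; this sketch only covers the comparison step.
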

Journal: IET Software	Publication Date: Nov 14, 2023
Citations: 1	License type: CC BY 4.0
