An Empirical Study on Model-Agnostic Techniques for Source Code-Based Defect Prediction

Yi Zhu, Yuxiang Gao, Qiao Yu

doi:10.1142/s0218194023500572

Abstract

Interpretability is important for adopting software defect prediction in practice. Model-agnostic techniques such as Local Interpretable Model-agnostic Explanations (LIME) can help practitioners understand the factors that contribute to a prediction. They are effective and useful for models built on tabular data with traditional features. However, when applied to source code-based models, they cannot differentiate the contributions of code tokens at different locations for deep learning-based models with Bag-of-Words features. In addition, using only a limited number of features as the explanation may lose information about actual riskiness. These limitations may lead to inaccurate explanations for source code-based models and make model-agnostic techniques less useful than expected. We therefore apply a perturbation-based approach, Randomized Input Sampling for Explanation (RISE), to source code-based defect prediction. Moreover, because a systematic evaluation of model-agnostic techniques on source code-based defect models is lacking, we conduct an extensive case study of these techniques on both token frequency-based and deep learning-based models. We find that (1) model-agnostic techniques are effective at identifying the most important code tokens for an individual prediction and at predicting defective lines from the importance scores, (2) using limited features (code tokens) for explanation may lose information about actual riskiness, and (3) RISE is more effective than the other techniques: it generates more accurate explanations, achieves better cost-effectiveness for line-level prediction, and loses less information about actual riskiness. Based on these findings, we suggest that model-agnostic techniques can supplement file-level source code-based defect models, but that their explanations should be used with caution because actually risky tokens may be ignored. We also recommend RISE over LIME for more effective explanations.
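To make the perturbation-based idea concrete, the following is a minimal sketch of RISE-style importance scoring over a token sequence. It is not the authors' implementation. The function `rise_token_importance`, the stand-in model `toy_predict`, the mask count, and the keep probability are all assumptions chosen for illustration. Each token position is randomly kept or dropped, the masked sequence is scored by the black-box model, and a position's importance is the average score over the masks that kept it.

```python
import random


def rise_token_importance(tokens, predict, n_masks=2000, keep_prob=0.5, seed=0):
    """Estimate per-position importance by random masking (RISE-style sketch).

    `predict` is any black-box function mapping a token list to a
    defect probability; it is treated as opaque.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(tokens)
    for _ in range(n_masks):
        # Keep each position independently with probability keep_prob.
        mask = [rng.random() < keep_prob for _ in tokens]
        kept = [tok for tok, keep in zip(tokens, mask) if keep]
        score = predict(kept)
        # Credit the score to every position that was visible to the model.
        for i, keep in enumerate(mask):
            if keep:
                totals[i] += score
    # Normalize by the expected number of masks that keep a position.
    return [total / (n_masks * keep_prob) for total in totals]


def toy_predict(tokens):
    # Hypothetical stand-in for a trained defect model (assumption):
    # it flags any input that still contains the token "strcpy".
    return 0.9 if "strcpy" in tokens else 0.1


tokens = ["char", "buf", "strcpy", "buf", "src", "return"]
scores = rise_token_importance(tokens, toy_predict)
top = max(range(len(tokens)), key=lambda i: scores[i])
print(tokens[top], [round(s, 2) for s in scores])
```

Because scores are attached to positions rather than to vocabulary entries, the two occurrences of `buf` receive separate importance values. Summing the position scores of the tokens on each source line would give one simple way to rank lines by riskiness, again as an illustrative choice rather than the procedure used in the study.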

Full Text