Statement-Level Software Defect Prediction Based on Improved R-Transformer

Yulei Zhu, Yufeng Zhang, Zhenbang Chen

doi:10.1142/s0218126623501839

Abstract

Engineers use software defect prediction (SDP) to locate defect-prone areas of software. Recently, statement-level SDP has attracted researchers' attention because it can localize faulty code at the granularity of individual statements. This paper proposes DP-Tramo, a new model that aims to improve on the state of the art in statement-level SDP. We use Clang to extract abstract syntax trees from source code and compute 32 statement-level metrics as static features for each statement. We then feed the static features and token sequences into our improved R-Transformer to learn the syntactic and semantic features of the code. Furthermore, we use label smoothing and a weighted loss to improve the performance of DP-Tramo. To evaluate DP-Tramo, we perform 10-fold cross-validation on 119,989 C/C++ programs selected from Code4Bench. Experimental results show that DP-Tramo classifies the dataset with an average recall, precision, accuracy, and F1-measure of 0.949, 0.602, 0.734, and 0.737, respectively. DP-Tramo outperforms the baseline method on F1-measure by 1.2% while maintaining a high recall rate.
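As a quick sanity check on the reported figures, the F1-measure is the harmonic mean of precision and recall. The sketch below recomputes it from the averages quoted above. The function name `f1_score` is our own illustrative helper, not part of DP-Tramo, and because the paper reports averages over 10 folds, an F1 computed from averaged precision and recall need not match the averaged per-fold F1 exactly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (illustrative helper)."""
    if precision + recall == 0:
        # Avoid division by zero when the classifier finds no true positives.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Average values quoted in the abstract.
reported_precision = 0.602
reported_recall = 0.949

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))  # 0.737
```

The recomputed value agrees with the reported F1-measure of 0.737, which also shows how the high recall offsets the comparatively low precision.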
