A Cross-Project Defect Prediction Model Based on Deep Learning With Self-Attention

Wanzhi Wen, Xinxin Gao, Meng Yu, Chenqiang Shen, Chuyue Wang, Suchuan Zhang, Ruinian Zhang

doi:10.1109/access.2022.3214536

Abstract

Cross-project defect prediction is an active topic in software defect research, largely because of the challenge posed by the large difference in data distribution between source and target projects. Previous defect prediction techniques relied on manually defined parameters to extract project features, which classifiers then used to construct defect prediction models. However, such traditional features fail to capture the semantic information contained in source code, which leads to poor performance of the resulting prediction models. Motivated by the idea that sufficient semantic information helps to build more accurate prediction models, we propose a cross-project defect prediction framework named BSLDP, which extracts the semantics of source code files through a bidirectional long short-term memory (BiLSTM) network with a self-attention mechanism. Specifically, the proposed semantic extractor, ASL, extracts semantics from source code files, and the proposed classification algorithm, BSL, is then fed the semantic information of the source and target projects to build a prediction model. Furthermore, we propose an equal meshing mechanism that divides the numerical token vector into small fragments, so that ASL generates semantic information for each fragment; this further improves the performance of the proposed model. We evaluated the proposed model on the publicly available PROMISE dataset. Experimental results show that, compared with four state-of-the-art methods, BSLDP improves the average F1 of cross-project defect prediction by 14.2%, 34.6%, 32.2%, and 23.6%, respectively.
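The abstract does not specify how the equal meshing mechanism is implemented. As a rough illustration only, the sketch below assumes that each source file has already been tokenized, that tokens are mapped to integer IDs to form the numerical token vector, and that "equal meshing" means cutting this vector into consecutive, non-overlapping fragments of one fixed length, with the last fragment padded. The names `encode_tokens`, `equal_mesh`, `fragment_len`, and `PAD_ID` are hypothetical and not taken from the paper, and the fragment length and padding strategy are assumptions. In BSLDP, each fragment would then be passed to ASL, which this sketch does not model.

```python
from typing import Dict, List

# Hypothetical padding ID; 0 is reserved and never assigned to a real token.
PAD_ID = 0


def encode_tokens(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map source-code tokens to integer IDs, growing the vocabulary as needed.

    IDs start at 1 so that PAD_ID (0) stays free for padding.
    """
    ids = []
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab) + 1
        ids.append(vocab[tok])
    return ids


def equal_mesh(token_ids: List[int], fragment_len: int) -> List[List[int]]:
    """Split a numerical token vector into fragments of exactly fragment_len IDs.

    Assumption: fragments are consecutive and non-overlapping, and the final
    fragment is right-padded with PAD_ID so that all fragments share one length.
    """
    if fragment_len <= 0:
        raise ValueError("fragment_len must be positive")
    fragments = []
    for start in range(0, len(token_ids), fragment_len):
        chunk = token_ids[start:start + fragment_len]
        chunk = chunk + [PAD_ID] * (fragment_len - len(chunk))
        fragments.append(chunk)
    return fragments


if __name__ == "__main__":
    vocab: Dict[str, int] = {}
    tokens = ["public", "void", "run", "(", ")", "{",
              "if", "(", "x", ")", "return", ";", "}"]
    ids = encode_tokens(tokens, vocab)
    print(ids)                 # [1, 2, 3, 4, 5, 6, 7, 4, 8, 5, 9, 10, 11]
    print(equal_mesh(ids, 5))  # [[1, 2, 3, 4, 5], [6, 7, 4, 8, 5], [9, 10, 11, 0, 0]]
```

Padding every fragment to the same length lets the fragments of a file be encoded as one batch by a sequence model; a real implementation might instead drop the remainder or use overlapping windows, and the abstract does not say which choice BSLDP makes.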

Full Text