Abstract

Software defects not only reduce operational reliability but also significantly increase overall maintenance costs. Consequently, it is necessary to predict software defects at an early stage. Existing software defect prediction studies work with artificially designed metrics or features extracted from source code by machine learning-based approaches to perform classification. However, these methods fail to make full use of the defect-related information other than code, such as comments in codes and commit messages. Therefore, in this paper, additional information extracted from natural language text is combined with the programming language codes to enrich the semantic features. A novel model based on Transformer architecture and multi-channel CNN, PM2-CNN, is proposed for software defect prediction. Pretrained language model and CNN-based classifier are utilized in the model to obtain context-sensitive representations and capture the local correlation of sequences. A large and widely used dataset is utilized to verify the effectiveness of the proposed method. The results show that the proposed method has improvements in generic evaluation metrics compared with the optimal baseline method. Accordingly, external information can have a positive impact on software defect prediction, and our model effectively incorporates such information to improve detection performance.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call