Transfer Learning Code Vectorizer based Machine Learning Models for Software Defect Prediction

Rituraj Singh,Garima Garima,Ruchika Malhotra,Mehrab Singh Gill,Jasmeet Singh

doi:10.1109/compe49325.2020.9200076

Abstract

Software development life cycle comprises of planning, design, implementation, testing and eventually, deployment. Software defect prediction can be used in the initial stages of the development life cycle for identifying defective modules. Researchers have devised various methods that can be used for effective software defect prediction. The prediction of the presence of defects or bugs in a software module can facilitate the testing process as it would enable developers and testers to allocate their time and resources on modules that are prone to defects. Transfer learning can be used for transferring knowledge obtained from one domain into the other. In this paper, we propose Transfer Learning Code Vectorizer, a novel method that derives features from the text of the software source code itself and uses those features for defect prediction. We focus on the software code and convert it into vectors using a pre-trained deep learning language model. These code vectors are subsequently passed through machine and deep learning models. Further, we compare the results of using deep learning on the text of the software code versus the usage of software metrics for prediction of defects. In terms of weighted F1 scores, the experiments show that applying the proposed TLCV method outperforms the other machine learning techniques by 9.052%.

Full Text