Empirical Validation of cross-version and 10-fold cross-validation for Defect Prediction

Ruchika Malhotra, Shweta Meena

doi:10.1109/icesc51422.2021.9533030

Abstract

The development of prediction models is currently one of the most important research areas. Prediction models are useful because they provide accurate results for unseen or future data. Testing is the most important phase of the software development life cycle (SDLC). During testing, programs may behave unpredictably because of defects. To remove these defects at an early stage of the SDLC, we design a Software Defect Prediction Model (SDPM). Although many SDPMs have been developed using various Machine Learning (ML) and statistical techniques, the generalizability of their results remains a concern because the developed models use the same dataset for training and testing. This study aims to develop SDPMs using a cross-version approach, which uses two different versions of a project for training and testing. The historical labeled defect data of the previous version is used to predict defects in the updated or upcoming version, which is termed Cross-Version Defect Prediction (CVDP). For the experiments, we used 26 datasets from an open-source repository. The performance of the SDPMs is analyzed using performance metrics. SDPMs are also developed using 10-fold cross-validation. Finally, CVDP and 10-fold cross-validation are compared using a statistical test. The aim of this study is to analyze the applicability of cross-version defect prediction when a sufficient amount of data is not available for training and testing. According to the statistical analysis, cross-version prediction can be used when a prediction model has to be tested on unseen or upcoming projects.
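The difference between the two validation schemes described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the single-metric threshold learner is a toy stand-in for the ML techniques, accuracy stands in for the performance metrics, and the names `fit_threshold`, `accuracy`, `ten_fold_cv`, and `cross_version` are hypothetical. Each row is assumed to be a pair of one software metric value and a defect label (1 = defective).

```python
import random
from typing import List, Tuple

# One module: (hypothetical metric value, defect label 0 or 1).
Row = Tuple[float, int]


def accuracy(rows: List[Row], threshold: float) -> float:
    """Fraction of rows whose label matches the rule 'metric >= threshold'."""
    return sum((x >= threshold) == bool(y) for x, y in rows) / len(rows)


def fit_threshold(train: List[Row]) -> float:
    """Toy learner: choose the threshold with the best training accuracy."""
    candidates = sorted({x for x, _ in train})
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        acc = accuracy(train, t)
        if acc > best_acc:  # strict '>' keeps the smallest best threshold
            best_t, best_acc = t, acc
    return best_t


def ten_fold_cv(rows: List[Row], k: int = 10, seed: int = 0) -> float:
    """Within-version validation: train and test folds share one dataset."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        # Train on the other k-1 folds, test on the held-out fold.
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(accuracy(test, fit_threshold(train)))
    return sum(scores) / k


def cross_version(prev_version: List[Row], next_version: List[Row]) -> float:
    """Cross-version validation: train on version n, test on version n+1."""
    return accuracy(next_version, fit_threshold(prev_version))
```

In this sketch, `ten_fold_cv(v1)` estimates performance using only the labeled data of one version, whereas `cross_version(v1, v2)` measures how a model trained on the previous version behaves on the next one, which is the setting CVDP targets.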

Full Text