A recommendation system of training data selection method for cross-project defect prediction

Benyamin Langgu Sinaga, Intan Ermahani A Jalil, Zuraida Abal Abas, Sabrina Ahmad

doi:10.11591/ijeecs.v27.i2.pp990-1006

Abstract

Cross-project defect prediction (CPDP) is a popular approach to the problem of limited historical data when building a defect prediction model. However, training a model directly on cross-project datasets yields unsatisfactory predictive performance, so selecting the training data is essential. Many studies have examined the effectiveness of training data selection methods, and the best-performing method varies across datasets. Because no method consistently outperforms the others on all datasets, it is important to predict the best method for a specific dataset. This study proposes a recommendation system that selects the most suitable training data selection method in the CPDP setting. We evaluated the proposed system using 44 datasets, 13 training data selection methods, and six classification algorithms. The findings indicate that the recommendation system effectively recommends the best method for selecting training data.
