Study on the Influence of the Number of Features on the Performance of Software Defect Prediction Model

Mengtian Cui,Yue Sun,Yue Jiang,Yang Lu

doi:10.1145/3342999.3343010

Abstract

The software defect prediction model based on machine learning technology is the key to improve the reliability of software. The influence of the number of features on the performance of different software defect prediction models was proposed in this paper. First, a new data sets was built, which is increasing by the number of features based on the NASA public data sets. Then, the eight predictive models are experimented based on these data sets. Next, the influence of the number of features on the performance of different prediction models was analyzed based on the experimental results. Next, the AUC values obtained from the experiment were used to evaluate the performance of different prediction models, and the coefficient of variation C·V values was used to evaluate the performance stability of different prediction models while the number of features changed. In the end, the experiments show that the performance of the predictive model C4.5 is highly susceptible to changes in the number of features, while the performance of the predictive model SMO is relatively stable.

Full Text