Abstract

One of the goals of software testing is to discover software defects before the software reaches the customer. Successful software testing leads to high-quality software. However, exposing defects through testing is very resource-consuming, so automated software defect prediction is needed. To build an accurate prediction model, a relevant subset of features must be carefully selected as input to the classifier. This research therefore compares the performance of two feature selection methods: a filter method, ReliefF, and an embedded method, SVM-RFE (Support Vector Machine – Recursive Feature Elimination). Both methods are free from the assumption of conditional independence among features. SVM is then applied as the classification algorithm; beforehand, SMOTE (Synthetic Minority Oversampling Technique) is used to balance the training data. The experiments are run on a public benchmark dataset, the NASA MDP dataset. The results show that SVM-RFE performs better than ReliefF in terms of g-mean, while ReliefF performs better than SVM-RFE in terms of accuracy. However, with SVM-RFE feature selection, the best classifier performance is achieved with a smaller number of features than with ReliefF. Future research may explore ensemble feature selection methods in an attempt to improve the performance of the resulting classifier in both g-mean and accuracy.
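
Below is a minimal sketch (not the authors' code) of the pipeline the abstract describes: SMOTE oversampling of the training data, SVM-RFE feature selection, and a final SVM classifier evaluated with accuracy and g-mean. It assumes scikit-learn and imbalanced-learn are available and uses synthetic placeholder data in place of a NASA MDP module-metric table; for the ReliefF comparison, the selection step would be replaced by a ReliefF ranker (e.g., from the skrebate package). The kernel and the number of retained features are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import RFE
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from imblearn.over_sampling import SMOTE
from imblearn.metrics import geometric_mean_score

# Placeholder data standing in for a NASA MDP module-metric table
# (rows = software modules, columns = static code metrics, label = defective?).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = (rng.random(500) < 0.15).astype(int)   # imbalanced: ~15% defective modules

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 1) Balance only the training set with SMOTE; the test set stays untouched.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)

# 2) SVM-RFE: recursively eliminate the features with the smallest |w|
#    of a linear SVM until the requested number of features remains.
selector = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=8)
selector.fit(X_bal, y_bal)

# 3) Train the final SVM on the selected features (kernel choice assumed here)
#    and evaluate with accuracy and g-mean, as in the abstract.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(selector.transform(X_bal), y_bal)
y_pred = clf.predict(selector.transform(X_test))

print("accuracy:", accuracy_score(y_test, y_pred))
print("g-mean  :", geometric_mean_score(y_test, y_pred))
```

The g-mean (geometric mean of the per-class recalls) is reported alongside accuracy because, on imbalanced defect data, a classifier can reach high accuracy while missing most defective modules.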
