Abstract

In the practice of software quality estimation, superfluous software metrics often exist in data repositories. In other words, not all collected software metrics are useful or contribute equally to software defect prediction. Selecting a subset of features that are most relevant to the class attribute is therefore necessary and may result in better prediction. This process is called feature selection. However, the addition or removal of instances can alter the subsets chosen by a feature selection technique, rendering the previously selected feature sets invalid. Thus, the robustness (i.e., stability) of feature selection techniques must be studied to examine how sensitive these techniques are to changes in their input data (the addition or removal of instances). In this study, we test the stability of 18 feature selection techniques as the magnitude of change to the datasets and the size of the selected feature subsets are varied. All experiments were conducted on 16 datasets from 3 real-world software projects. The experimental results demonstrate that Gain Ratio is the least stable technique, while two different versions of ReliefF are the most stable, followed by the threshold-based feature selection techniques based on PRC and AUC. The results also show that smaller changes to the datasets have less impact on the stability of the feature ranking techniques applied to them.
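To make the evaluation setup concrete, the sketch below shows one way such a stability measurement can be arranged: a ranker selects the top-k features on the original data and again after a fraction of instances has been removed, and the two subsets are compared. The ranker (scikit-learn's mutual_info_classif, standing in for techniques such as Gain Ratio or ReliefF), the synthetic data, the 10% removal rate, the subset size k = 10, and the Jaccard similarity used as the stability score are illustrative assumptions, not the exact techniques, datasets, or consistency metric used in this study.

```python
# Illustrative sketch: stability of a feature ranking technique under
# instance removal. All concrete choices here (ranker, data, k, removal
# rate, similarity measure) are assumptions for demonstration only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif


def top_k_features(X, y, k):
    """Rank features by mutual information and return the top-k indices."""
    scores = mutual_info_classif(X, y, random_state=0)
    return set(np.argsort(scores)[::-1][:k])


def jaccard(a, b):
    """Similarity between two selected feature subsets (1.0 = identical)."""
    return len(a & b) / len(a | b)


rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=30, n_informative=8,
                           random_state=0)

k = 10                # size of the selected feature subset
removal_rate = 0.10   # fraction of instances removed per perturbation
baseline = top_k_features(X, y, k)

# Remove a random subset of instances, re-select features, and compare
# the perturbed subset against the baseline subset.
similarities = []
for _ in range(20):
    keep = rng.random(len(y)) > removal_rate
    similarities.append(jaccard(baseline, top_k_features(X[keep], y[keep], k)))

print(f"mean stability (Jaccard) at {removal_rate:.0%} instance removal: "
      f"{np.mean(similarities):.3f}")
```

Repeating this loop over different rankers, removal rates, and values of k mirrors the factors varied in the study: the 18 feature selection techniques, the magnitude of change to the datasets, and the size of the selected feature subsets.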
