Stability and Classification Performance of Feature Selection Techniques

Huanjing Wang, Qianhui Liang, T. M. Khoshgoftaar

doi:10.1109/icmla.2011.133

Abstract

Feature selection techniques can be evaluated either by the performance of models built with the selected features or by the stability (robustness) of the technique itself. Ideally, a feature selection technique is robust to changes in the data while also ensuring that models built with the selected features perform well. One domain where feature selection is especially important is software defect prediction, where large numbers of metrics collected from previous software projects are used to help engineers focus their efforts on the most faulty modules. This study presents a comprehensive empirical examination of seven filter-based feature ranking techniques (rankers) applied to nine real-world software measurement datasets of different sizes. Experimental results show that the signal-to-noise ranker was moderately robust and was the best ranker in terms of model performance. The study also shows that although Relief was the most stable feature selection technique, it performed significantly worse than the other rankers in terms of model performance.

Full Text