Stability and Accuracy of Feature Selection Methods on Datasets of Varying Data Complexity

Omaimah Al Hosni, Andrew Starkey

doi:10.1109/acit53391.2021.9677329

Abstract

One widespread criterion for evaluating feature selection techniques is the performance of a classifier trained on the selected features. Another criterion that has recently drawn attention in the feature selection community is the stability of feature selection techniques. Our study indicates that applying feature selection techniques to data with different characteristics may generate different subsets of features under variations to the training data. Our motivation is that the research community has contributed significantly to examining the effect of complex data characteristics, such as class overlap, on the performance of classification algorithms; however, relatively few studies have investigated the stability and accuracy of feature selection methods under complex data characteristics. Accordingly, this study conducts an empirical investigation to measure the interactive effects of class overlap with different data characteristics. The aim is to provide meaningful insights into the root causes of feature selection methods misdiagnosing the relevant features under the different data challenges associated with real-world data, which will guide practitioners and researchers in choosing the feature selection methods most appropriate for a particular dataset. In addition, this study provides a survey of the current state of research on feature selection stability.
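To make the notion of stability concrete, the following is a minimal sketch of one common way to quantify it: run a selector on bootstrap resamples of the training data and average the pairwise Jaccard similarity of the selected subsets. The abstract does not state which stability measure or which selectors the study uses, so the Jaccard index, the function names, and the simple mean-difference filter `select_top_k` are all assumptions made for illustration only.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature subsets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_top_k(X, y, k):
    """Hypothetical filter selector (an assumption, not the paper's method):
    rank features by the absolute difference between the two class means."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        if not pos or not neg:
            # A resample containing a single class carries no ranking signal.
            scores.append(0.0)
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def stability(X, y, k, n_resamples=20, seed=0):
    """Mean pairwise Jaccard similarity of the subsets selected on
    bootstrap resamples of (X, y). Values near 1 indicate a stable selector."""
    rng = random.Random(seed)
    n = len(X)
    subsets = []
    for _ in range(n_resamples):
        # Bootstrap: draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        subsets.append(select_top_k([X[i] for i in idx], [y[i] for i in idx], k))
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Under this sketch, a selector that returns the same subset on every resample scores 1, while a selector whose choices change with small perturbations of the training data scores closer to 0. Classifier accuracy on the selected features would be measured separately.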

Full Text