Feature selection can greatly enhance the performance of a learning algorithm when dealing with a high-dimensional data set. The filter method and the wrapper method are the two most common approaches. However, these approaches have limitations. The filter method uses a classifier-independent evaluation to select features, which is computationally efficient but less accurate than the wrapper method. The wrapper method uses a predetermined classifier to compute the evaluation, which can yield high accuracy for particular classifiers but is computationally expensive. In this study, we introduce a new feature selection method that we refer to as the large margin hybrid algorithm for feature selection (LMFS). In this method, we first use a new distance-based evaluation function, under which, ideally, samples from the same class are close together whereas samples from different classes are far apart, together with a weighted bootstrapping search strategy, to find a set of candidate feature subsets. Then, we use a specific classifier and cross-validation to select the final feature subset from the candidate feature subsets. Six vibrational spectroscopic data sets and three different classifiers, namely k-nearest neighbors, partial least squares discriminant analysis, and least squares support vector machine, were used to validate the performance of the LMFS method. The results revealed that LMFS can effectively overcome the over-fitting of the optimal feature subset to a given classifier. Compared with the filter and wrapper methods, the features selected by the LMFS method yield better classification performance and model interpretability. Furthermore, LMFS can effectively overcome the impact of classifier complexity on computational time, and distance-based classifiers were found to be more suitable for selecting the final subset in LMFS.
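The two-stage idea described above can be illustrated with a minimal sketch. It is not the LMFS implementation: the abstract does not give the evaluation function or the weighting rule, so the margin score below (nearest-miss distance minus nearest-hit distance, in the style of Relief), the additive weight update, the 1-nearest-neighbor classifier with leave-one-out cross-validation, and all function and parameter names are assumptions chosen for illustration.

```python
import random

def _dist(a, b, feats):
    # Squared Euclidean distance restricted to the chosen feature indices.
    return sum((a[j] - b[j]) ** 2 for j in feats)

def margin_score(X, y, feats):
    # Assumed distance-based criterion: mean over samples of
    # (distance to nearest other-class sample) - (distance to nearest same-class sample).
    # Requires at least two samples per class and at least two classes.
    n = len(X)
    total = 0.0
    for i, xi in enumerate(X):
        hit = min(_dist(xi, X[k], feats) for k in range(n) if k != i and y[k] == y[i])
        miss = min(_dist(xi, X[k], feats) for k in range(n) if y[k] != y[i])
        total += miss - hit
    return total / n

def weighted_sample(weights, size, rng):
    # Draw `size` distinct feature indices with probability proportional to weight.
    pool = list(range(len(weights)))
    chosen = []
    for _ in range(size):
        pick = rng.choices(pool, weights=[weights[j] for j in pool])[0]
        chosen.append(pick)
        pool.remove(pick)
    return tuple(sorted(chosen))

def candidate_subsets(X, y, size, rounds=20, draws=30, keep=5, seed=0):
    # Stage 1 (filter-like): repeatedly sample subsets, score them with the
    # margin criterion, and raise the weights of features in the best subsets.
    rng = random.Random(seed)
    weights = [1.0] * len(X[0])
    scored = {}
    for _ in range(rounds):
        batch = {weighted_sample(weights, size, rng) for _ in range(draws)}
        for s in batch:
            scored.setdefault(s, margin_score(X, y, s))
        for s in sorted(batch, key=scored.get, reverse=True)[:keep]:
            for j in s:
                weights[j] += 1.0  # assumed update rule
    return sorted(scored, key=scored.get, reverse=True)[:keep]

def loo_accuracy_1nn(X, y, feats):
    # Leave-one-out accuracy of a 1-nearest-neighbor classifier on `feats`.
    n = len(X)
    correct = 0
    for i, xi in enumerate(X):
        nearest = min((k for k in range(n) if k != i),
                      key=lambda k: _dist(xi, X[k], feats))
        correct += y[nearest] == y[i]
    return correct / n

def select_features(X, y, size):
    # Stage 2 (wrapper-like): the classifier is run only on the few candidates.
    candidates = candidate_subsets(X, y, size)
    return max(candidates, key=lambda s: loo_accuracy_1nn(X, y, s))

# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[0, 5], [1, 9], [10, 2], [11, 7]]
y = [0, 0, 1, 1]
print(select_features(X, y, size=1))
```

Because the classifier is evaluated only on the handful of candidates returned by the first stage, the cost of the cross-validation step in this sketch does not grow with the number of subsets explored during the search.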