Feature selection with SVD entropy: Some modification and extension

Monami Banerjee,Nikhil R Pal

doi:10.1016/j.ins.2013.12.029

Abstract

Many approaches have been developed for dimensionality reduction. These approaches can broadly be categorized into supervised and unsupervised methods. In case of supervised dimensionality reduction, for any input vector the target value is known, which can be a class label also. In a supervised approach, our objective is to select a subset of features that has adequate discriminating power to predict the target value. This target value for an input vector is absent in case of an unsupervised approach. In an unsupervised scheme, we mainly try to find a subset that can capture the inherent “structure” of the data, such as the neighborhood relation or the cluster structure. In this work, we first study a Singular Value Decomposition (SVD) based unsupervised feature selection approach proposed by Varshavsky et al. Then we propose a modification of this method to improve its performance. An SVD-entropy based supervised feature selection algorithm is also developed in this paper. Performance evaluation of the algorithms is done on altogether 13 benchmark and one Synthetic data sets. The quality of the selected features is assessed using three indices: Sammon’s Error (SE), Cluster Preservation Index (CPI) and MisClassification Error (MCE) using a 1-Nearest Neighbor (1-NN) classifier. Besides showing the improvement of the modified unsupervised scheme over the existing one, we have also made a comparative study of the modified unsupervised and the proposed supervised algorithms with one well-known unsupervised and two popular supervised feature selection methods respectively. Our results reveal the effectiveness of the proposed algorithms in selecting relevant features.

Full Text