Survey on feature subset selection for high dimensional data

A H Shahana, V Preeja

doi:10.1109/iccpct.2016.7530147

Abstract

High-dimensional data plays an important role in many scientific and research applications. Such data consists of many features, or attributes, some of which may be redundant or irrelevant. Reducing the dimensionality of the data requires removing these features. Feature selection techniques identify the redundant and irrelevant features in the original data set and retain the most representative ones. This survey explains FAST, a novel clustering-based feature subset selection algorithm for high-dimensional data. The algorithm proceeds in four steps: removing irrelevant features, constructing a minimum spanning tree over the remaining features, partitioning the tree into clusters, and selecting a subset of features from those clusters.
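
The four steps named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the correlation measure, the relevance threshold, or the rule for cutting tree edges, so the choices below are assumptions made for illustration: symmetric uncertainty (SU) on discrete values as the correlation measure, a hypothetical `threshold` parameter, Kruskal's algorithm for the minimum spanning tree, and an edge-removal rule that cuts an edge when the two features are less correlated with each other than each is with the class. The function names (`entropy`, `symmetric_uncertainty`, `select_features`) are likewise illustrative, not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def find(parent, f):
    """Union-find root lookup with path halving."""
    while parent[f] != f:
        parent[f] = parent[parent[f]]
        f = parent[f]
    return f


def select_features(features, target, threshold=0.1):
    """features: dict mapping name -> list of discrete values (assumed layout)."""
    # Step 1: drop features whose relevance to the class is at or below threshold.
    relevance = {f: symmetric_uncertainty(v, target) for f, v in features.items()}
    kept = [f for f in features if relevance[f] > threshold]

    # Step 2: minimum spanning tree (Kruskal) over the complete graph whose
    # edge weights are the SU between pairs of kept features.
    edges = sorted(
        (symmetric_uncertainty(features[a], features[b]), a, b)
        for i, a in enumerate(kept)
        for b in kept[i + 1:]
    )
    parent = {f: f for f in kept}
    mst = []
    for w, a, b in edges:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
            mst.append((w, a, b))

    # Step 3: partition the tree. An edge is cut (assumed rule) when its weight
    # is smaller than the relevance of both endpoints; uncut edges join clusters.
    parent = {f: f for f in kept}
    for w, a, b in mst:
        if not (w < relevance[a] and w < relevance[b]):
            parent[find(parent, a)] = find(parent, b)

    # Step 4: keep the most relevant feature of each cluster.
    clusters = {}
    for f in kept:
        clusters.setdefault(find(parent, f), []).append(f)
    return sorted(max(members, key=relevance.get) for members in clusters.values())


# Toy data: "a_copy" duplicates "a", and "noise" is independent of the class.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(select_features(features, target))  # ['a']
```

In this toy run, `noise` is dropped in the first step because its SU with the class is zero, and `a` and `a_copy` end up in one cluster from which a single representative is kept. Continuous features would need to be discretized before SU can be computed this way.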
