Abstract

Feature selection has attracted considerable interest in the recent past. Large data sets are collected from scientific experiments, and the features often outnumber the observations. This demands new approaches that reduce the data set without compromising the latent knowledge, a process also called dimensionality reduction. In this paper, we present a detailed review of methods used for reducing data sets, covering papers published in the last 10 years in the field of dimensionality reduction using Random Subset Feature Selection (RSFS). We concentrate mainly on random subset feature selection methods used for dimensionality reduction. Feature subset selection methods are classified into four categories: embedded, filter, wrapper, and hybrid. The data mining task flows from pre-processing, through feature subset selection using random forests and random subset feature selection, to classification. This survey is a comprehensive overview of random subset feature selection as used in various applications.

Highlights

  • Feature Subset Selection (FSS) is the process of selecting a subset of relevant features, whereas the Random Subset Feature Selection (RSFS) process randomly selects subsets of relevant features from the data set, avoiding bias and over-fitting to any one selected subset when classifying features from high-dimensional data sets

  • We concentrate mainly on random subset feature selection methods used in dimensionality reduction

  • Sequential Forward Selection (SFS) [2] starts with a single feature and, in every iteration, adds the new feature most relevant to the existing subset, whereas backward selection starts with the full feature set and removes the least relevant features one by one
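The greedy loop described above can be sketched as follows. The scoring criterion here (hand-set relevance values minus a redundancy penalty for one correlated pair) is purely an assumption for illustration; in practice the score would come from a classifier evaluated on held-out data:

```python
def sequential_forward_selection(features, score, k):
    """Greedy SFS: start empty, add the feature that most improves
    the subset score, and repeat until k features are chosen."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy criterion (assumption): subset score = sum of individual
# relevances minus a penalty for including correlated pairs.
RELEVANCE = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
REDUNDANCY = {frozenset({"a", "b"}): 0.5}

def score(subset):
    gain = sum(RELEVANCE[f] for f in subset)
    penalty = sum(REDUNDANCY.get(frozenset({f, g}), 0.0)
                  for i, f in enumerate(subset) for g in subset[i + 1:])
    return gain - penalty

print(sequential_forward_selection(["a", "b", "c", "d"], score, 2))
```

With these toy values SFS first picks "a" (highest individual relevance) and then skips "b" despite its high relevance, because the redundancy penalty makes "c" the better second choice.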


Summary

INTRODUCTION

Feature Subset Selection (FSS) is the process of selecting a subset of relevant features, whereas the Random Subset Feature Selection (RSFS) process randomly selects subsets of relevant features from the data set, avoiding bias and over-fitting to any one selected subset when classifying features from high-dimensional data sets. The nesting problem is resolved by Sequential Floating Forward Selection (SFFS), which adds the most relevant feature and then removes the least relevant feature at each step. In embedded feature selection, features are selected during the learning process, so splitting the data into separate training and testing sets is not required; avoiding the re-training of a predictor for each candidate subset yields a faster solution. The disadvantage of traditional feature subset selection is that the subset of features is fixed, whereas random subset feature selection randomly generates a new subset with new features in every iteration. Ho [1998] has written a number of papers on the "random subspace" method, which grows each tree on a randomly selected subset of features.
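The random-subset idea can be sketched as below. The credit-accumulation scheme, the mean-relevance subset score, and all parameter values are assumptions chosen for illustration, not the exact algorithm from any one paper; a real implementation would score each random subset with a classifier such as k-NN on held-out data:

```python
import random

# Toy per-feature relevance values (assumption, for illustration only).
RELEVANCE = {"a": 0.9, "b": 0.8, "c": 0.2, "d": 0.1}

def subset_score(subset):
    """Mean relevance of the features in the subset (stand-in for a
    classifier's accuracy on held-out data)."""
    return sum(RELEVANCE[f] for f in subset) / len(subset)

def random_subset_feature_selection(features, score, n_iter=200,
                                    subset_size=2, seed=0):
    """Sketch of RSFS: repeatedly draw a random feature subset, evaluate
    it, and credit each member feature with the subset's score.  Features
    whose average credit exceeds the running mean score of all subsets
    are kept as relevant."""
    rng = random.Random(seed)
    credit = {f: 0.0 for f in features}
    counts = {f: 0 for f in features}
    mean_score = 0.0
    for i in range(1, n_iter + 1):
        subset = rng.sample(features, subset_size)
        s = score(subset)
        mean_score += (s - mean_score) / i   # running average over subsets
        for f in subset:
            credit[f] += s
            counts[f] += 1
    return [f for f in features
            if counts[f] and credit[f] / counts[f] > mean_score]

print(random_subset_feature_selection(list(RELEVANCE), subset_score))
```

Because every iteration draws a fresh random subset, no fixed subset is ever committed to, which is the contrast with traditional feature subset selection drawn above.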

Random KNN Feature Selection (RKNN-FS)
Methods for Dimensionality Reduction on Scientific
Findings
CONCLUSION
