Online Feature Selection Using Sparse Gradient

N.N Banu, R.S Kumar

doi:10.1142/s0218213022500385

Abstract

Feature Selection (FS) is an important preprocessing step in data analytics. It selects a subset of the original features such that the selected subset does not significantly degrade classification performance. Its objective is to remove irrelevant and redundant features from the original dataset. FS can be performed in either offline or online mode. Offline mode assumes that the entire dataset is available to the FS algorithm, which may take multiple epochs to select an optimal feature subset that yields good accuracy. In contrast, online FS algorithms receive the data one instance at a time and accumulate knowledge by learning from each instance, so every instance serves as both a training and a testing sample. Offline FS algorithms require long running times when the data to be processed is large, as with Big data, whereas online FS algorithms learn the entire dataset in a single epoch and can produce results quickly, which is highly desirable for Big data. This paper addresses the online FS problem and proposes a novel FS algorithm that uses the Sparse Gradient method to build a sparse classifier. In the proposed method, an online classifier is built and maintained throughout the learning process, and feature weights whose magnitudes lie within a particular boundary are decremented step by step. This creates sparsity in the classifier, which is then used to select an optimal feature subset from the incoming data. Because the weights are reduced step by step, only the important features, whose weights exceed the boundary, survive this repeated decrement. The resulting optimal feature subset consists of the features with non-zero weights. Most significantly, the method can be used with any learning algorithm.
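The sparsification idea described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes a plain online logistic-regression classifier (the paper itself uses LVQ, RBF networks, and ARTMAP) and models the step-by-step decrement with a truncated-gradient-style operator, in which a weight whose magnitude is at most the boundary `theta` is pulled toward zero by `eta * gravity` after every update, while weights beyond `theta` are left untouched. The names `truncate`, `sparse_gradient_fs`, `eta`, `gravity`, and `theta`, as well as the synthetic data stream, are assumptions made for illustration. Each instance is first used to test the current classifier and then to train it, in a single pass.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def truncate(v, alpha, theta):
    # Hypothetical decrement step: pull v toward zero by alpha only while |v| <= theta.
    if 0.0 <= v <= theta:
        return max(0.0, v - alpha)
    if -theta <= v < 0.0:
        return min(0.0, v + alpha)
    return v  # weights beyond the boundary are left unchanged


def sparse_gradient_fs(stream, n_features, eta=0.1, gravity=0.05, theta=1.0):
    # Single pass over (x, y) pairs with y in {0, 1}.
    # Returns the final weights, the indices of non-zero weights, and the online mistake count.
    w = [0.0] * n_features
    b = 0.0
    mistakes = 0
    for x, y in stream:
        p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
        # Test first: count a mistake if the current classifier mispredicts.
        if (p >= 0.5) != (y == 1):
            mistakes += 1
        # Then train: gradient step on the log-loss, followed by truncation.
        err = p - y
        for j in range(n_features):
            w[j] = truncate(w[j] - eta * err * x[j], eta * gravity, theta)
        b -= eta * err  # the bias is not truncated
    selected = [j for j, wj in enumerate(w) if wj != 0.0]
    return w, selected, mistakes


# Synthetic stream (an assumption): only features 0 and 1 determine the label.
random.seed(0)
data = []
for _ in range(2000):
    x = [random.uniform(-1.0, 1.0) for _ in range(5)]
    data.append((x, 1 if x[0] + x[1] > 0 else 0))

w, selected, mistakes = sparse_gradient_fs(data, n_features=5)
print("selected features:", selected)
print("online mistakes:", mistakes, "of", len(data))
```

Features whose weights never grow past `theta` keep being pulled back toward zero, so the returned `selected` list contains only features that survived the repeated decrement. A larger `gravity` yields a sparser subset at the risk of discarding weakly relevant features.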
To demonstrate its applicability to different learning algorithms, online feature selection models were built using Learning Vector Quantization, Radial Basis Function Networks, and Adaptive Resonance Theory MAP (ARTMAP), each incorporating the proposed Sparse Gradient method. The encouraging results on medium- and large-sized benchmark datasets show the effectiveness of the proposed method across these learning algorithms.
