On the effect of data reduction on classification accuracy

Syrine Ben Meskina

doi:10.1109/icites.2013.6624071

Abstract

Data reduction is an important pre-processing step for both supervised and unsupervised machine learning problems. In the first part of this paper, we review the two existing strategies for data reduction: feature selection (FS) and dimensionality reduction (DR). In the second part, we study the impact of different data reduction methods on supervised machine learning in terms of classification accuracy and computational cost. On the one hand, we compare the subsets of attributes generated by filter and wrapper algorithms, as well as the new variables constructed by two variants of a DR method. On the other hand, we compare the classification results achieved on the initial data set, on the reduced data sets, and on the considered data sets at successively reduced sizes.
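To make the distinction between filter and wrapper feature selection concrete, the following is a minimal sketch and not the experimental setup of the paper. It assumes a tiny hand-made data set, a filter criterion based on absolute Pearson correlation with the class label, and a wrapper that greedily adds features according to the leave-one-out accuracy of a 1-nearest-neighbour classifier. All function names (`filter_select`, `wrapper_select`, `loo_accuracy`, `project`) and the data are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy data: features 0 and 2 separate the classes, feature 1 is noise.
X = [
    [1.0, 5.0, 0.9], [1.2, 1.0, 1.1], [0.9, 3.0, 1.0], [1.1, 4.0, 0.8],
    [3.0, 2.0, 3.1], [3.2, 5.0, 2.9], [2.9, 1.0, 3.0], [3.1, 3.0, 3.2],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def project(X, cols):
    """Keep only the listed feature columns."""
    return [[row[c] for c in cols] for row in X]


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (squared Euclidean distance)."""
    hits = 0
    for i, xi in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, X[j])),
        )
        hits += y[nearest] == y[i]
    return hits / len(X)


def filter_select(X, y, k):
    """Filter: rank features by |correlation with the label|, independently of any classifier."""
    scores = [abs(pearson([row[c] for row in X], y)) for c in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return sorted(ranked[:k])


def wrapper_select(X, y, k):
    """Wrapper: greedy forward selection, scoring each candidate subset with the classifier."""
    chosen, remaining = [], list(range(len(X[0])))
    while len(chosen) < k and remaining:
        best_c, best_acc = None, -1.0
        for c in remaining:
            acc = loo_accuracy(project(X, chosen + [c]), y)
            if acc > best_acc:
                best_c, best_acc = c, acc
        chosen.append(best_c)
        remaining.remove(best_c)
    return sorted(chosen)


if __name__ == "__main__":
    for name, cols in [("all", [0, 1, 2]),
                       ("filter", filter_select(X, y, 2)),
                       ("wrapper", wrapper_select(X, y, 2))]:
        print(name, cols, loo_accuracy(project(X, cols), y))
```

The filter scores each attribute once, so its cost grows with the number of attributes only, whereas the wrapper retrains and evaluates the classifier for every candidate subset. This is the kind of accuracy versus computational cost trade-off the abstract refers to, shown here only on illustrative data.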
