Handling Sparse Data Sets by Applying Contrast Set Mining in Feature Selection

Dijana Oreški, Mario Konecki

doi:10.17706/jsw.11.2.148-161

Abstract

A data set is sparse if the number of samples it contains is not sufficient to model the data accurately. Recent research has emphasized the application of data mining and feature selection techniques to real-world problems, many of which involve sparse data sets. The purpose of this research is to define new feature selection techniques that improve classification accuracy and reduce the time required for feature selection on sparse data sets. An extensive comparison with benchmark feature selection techniques was conducted on 64 sparse data sets. The results show that contrast set mining techniques were superior in more than 80% of the analyses on sparse data sets. This paper presents a study of the new techniques and their observed superiority in handling data sparsity.
