Abstract

Data is growing at an exponential pace. To cope with this data explosion, effective data processing and analysis techniques are needed. Feature selection is the task of selecting a subset of features from a dataset that still provides most of the useful information. Various tools are available as the underlying framework for this process; however, Rough Set Theory is the most prominent due to its analysis-friendly nature. The majority of Rough Set based feature selection algorithms use the positive-region-based dependency measure as the sole criterion for selecting a feature subset. Calculating the positive region requires computing the lower approximation, which in turn involves the indiscernibility relation. In this paper, new definitions of two Rough Set preliminaries, i.e. the lower and upper approximations, are proposed. The new definitions are computationally less expensive than the conventional ones. On five publicly available datasets, the proposed redefinitions showed a 42.78% decrease in execution time for the redefined lower approximation and a 43.06% decrease for the redefined upper approximation, while maintaining 100% accuracy. Finally, based on these redefined approximations, we propose a feature selection algorithm which, when compared with state-of-the-art techniques, shows a significant increase in performance without affecting accuracy.
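For context, the conventional notions the abstract builds on can be sketched as follows. The snippet below is a minimal illustrative implementation of the standard indiscernibility relation, lower and upper approximations, and the positive-region dependency measure; it does not show the redefined approximations proposed in the paper, which the abstract does not specify. The function names and the toy decision table are hypothetical.

```python
from collections import defaultdict

def indiscernibility(universe, attributes):
    """Partition the universe into equivalence classes of objects that are
    indiscernible on the given condition attributes (the IND relation)."""
    classes = defaultdict(set)
    for obj, values in universe.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(obj)
    return list(classes.values())

def lower_approximation(partition, target):
    """Union of equivalence classes fully contained in the target set."""
    approx = set()
    for cls in partition:
        if cls <= target:
            approx |= cls
    return approx

def upper_approximation(partition, target):
    """Union of equivalence classes that intersect the target set."""
    approx = set()
    for cls in partition:
        if cls & target:
            approx |= cls
    return approx

def dependency(universe, condition_attrs, decision_attr):
    """Positive-region dependency gamma_B(D) = |POS_B(D)| / |U|, where
    POS_B(D) is the union of the lower approximations of the decision classes."""
    partition = indiscernibility(universe, condition_attrs)
    decision_classes = defaultdict(set)
    for obj, values in universe.items():
        decision_classes[values[decision_attr]].add(obj)
    pos = set()
    for d_class in decision_classes.values():
        pos |= lower_approximation(partition, d_class)
    return len(pos) / len(universe)

# Toy decision table: objects with condition attributes a, b and decision d.
U = {
    "x1": {"a": 0, "b": 1, "d": "yes"},
    "x2": {"a": 0, "b": 1, "d": "no"},
    "x3": {"a": 1, "b": 0, "d": "yes"},
    "x4": {"a": 1, "b": 1, "d": "no"},
}
print(dependency(U, ["a", "b"], "d"))  # 0.5: x1 and x2 are indiscernible but differ on d
```

A dependency of 1.0 means every object is classified consistently by the chosen attributes; feature selection algorithms of the kind the abstract describes typically search for a minimal attribute subset that preserves this dependency value.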
