Abstract

Feature subset selection can be defined as the selection of a relevant subset of features that allows a learning algorithm to induce small, high-accuracy models. The problem is of primary importance because irrelevant and redundant features may slow the learner, especially in high-dimensional settings, and reduce both the accuracy and the comprehensibility of the induced model. Two main approaches have been developed: the filter approach, which is algorithm-independent and considers only the data, and the wrapper approach, which is algorithm-dependent and takes into account both the data and a given learning algorithm. Recent work has studied the usefulness of rough set theory, and more particularly its notions of reduct and core, for feature subset selection. Different methods have been proposed to select features using both the core and the reduct concepts, whereas other research shows that useful feature subsets do not necessarily contain all features in the core. In this paper, we underline the fact that rough set theory is concerned with the deterministic analysis of attribute dependencies, which underlie the two notions of reduct and core. We extend the notion of dependency so that it captures both deterministic and non-deterministic dependencies. A new notion of strong reducts is then introduced and leads to the definition of strong feature subsets (SFS). The value of SFS is illustrated by the improved accuracy of C4.5 on real-world datasets. Our study shows that, in general, the highest-accuracy subset is not the best one with regard to the filter criteria. The new approach finds the highest-accuracy subset at minimum cost.
The contribution of this work is fourfold: (1) an analysis of feature subset selection in the rough set context, (2) the introduction of new definitions based on a generalized rough set theory, i.e., α-RST, (3) a reformulation of the selection problem, and (4) the description of a hybrid method combining the filter and wrapper approaches.
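
For readers unfamiliar with the rough set notions mentioned above, the following is a minimal sketch of the classical (deterministic) dependency degree, reducts, and core on a toy decision table. It illustrates standard Pawlak rough set theory only; it is not the α-RST extension or the strong reducts introduced in this paper. The table `ROWS` and the function names `partition`, `dependency`, `reducts`, and `core` are hypothetical and chosen for illustration, and the exhaustive search is only practical for a handful of attributes.

```python
from itertools import combinations

def partition(rows, attrs):
    """Group row indices into indiscernibility classes over attrs."""
    blocks = {}
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row[a] for a in attrs), []).append(i)
    return list(blocks.values())

def dependency(rows, cond, dec):
    """Classical dependency degree: share of rows in the positive region,
    i.e. rows whose class over cond maps to a single decision value."""
    if not rows:
        return 0.0
    pos = 0
    for block in partition(rows, cond):
        if len({rows[i][dec] for i in block}) == 1:
            pos += len(block)
    return pos / len(rows)

def reducts(rows, cond, dec):
    """All minimal subsets of cond that preserve the full dependency degree.
    Subsets are visited by increasing size, so any superset of an already
    found reduct can be skipped (the dependency degree is monotone)."""
    full = dependency(rows, cond, dec)
    found = []
    for k in range(len(cond) + 1):
        for subset in combinations(cond, k):
            if any(set(r) <= set(subset) for r in found):
                continue
            if dependency(rows, subset, dec) == full:
                found.append(subset)
    return found

def core(rows, cond, dec):
    """Core: attributes shared by every reduct."""
    rs = reducts(rows, cond, dec)
    return set.intersection(*(set(r) for r in rs)) if rs else set()

# Hypothetical toy decision table: condition attributes a, b, c; decision d.
ROWS = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 0, "d": 0},
]
COND = ("a", "b", "c")

if __name__ == "__main__":
    print(dependency(ROWS, COND, "d"))
    print(reducts(ROWS, COND, "d"))
    print(core(ROWS, COND, "d"))
```

In this toy table the decision is fully determined either by `c` alone or by `a` and `b` together, so there are two reducts with no attribute in common and the core is empty. An attribute such as `a` on its own receives a dependency degree of zero, because the classical measure only credits fully deterministic classes.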
