Abstract

Online streaming feature selection (OSFS) algorithms produce an approximately optimal subset of the features seen so far in real time, and can therefore handle feature selection in extremely large or even infinite-dimensional feature spaces. Several algorithms that operate in the OSFS manner have been proposed. However, some of them require prior knowledge about the entire feature space, which is inaccessible in a real OSFS scenario. In addition, their results are sensitive to the order in which features arrive. In this paper, we first propose an OSFS framework based on the uncertainty measures of rough set theory. The framework needs no information beyond the given data. Moreover, it adopts a sorting mechanism, which makes it stable under changes in the order of features. Then, instantiating the uncertainty measure with conditional information entropy (CIE), we design an algorithm named CIE-OSFS based on the framework. Comprehensive experiments on several high-dimensional benchmark data sets are conducted to verify the effectiveness of our method. The experimental results indicate that, in most cases, CIE-OSFS selects more compact feature subsets than other algorithms while preserving predictive accuracy, and is more stable under changes in feature order.
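To make the uncertainty measure concrete, the following minimal sketch computes the Shannon conditional entropy of the class labels given a subset of discrete features. It is an illustration under stated assumptions, not the paper's method: the abstract does not give the exact CIE formula, rough set theory has several variants of it, and the function name `conditional_entropy` and its arguments are hypothetical.

```python
from collections import Counter
from math import log2


def conditional_entropy(rows, feature_idx, labels):
    """Shannon conditional entropy H(labels | features at feature_idx).

    Assumption: this is the textbook Shannon form, used only for
    illustration; the paper's CIE definition may differ.
    rows        : sequence of samples, each a sequence of discrete values
    feature_idx : indices of the candidate feature subset
    labels      : class label of each sample
    """
    n = len(rows)
    if n == 0:
        return 0.0

    # Samples that agree on every chosen feature fall into the same
    # equivalence class (block); count the labels inside each block.
    blocks = {}
    for row, y in zip(rows, labels):
        key = tuple(row[i] for i in feature_idx)
        blocks.setdefault(key, Counter())[y] += 1

    # H(Y | X) = - sum over (x, y) of p(x, y) * log2 p(y | x)
    h = 0.0
    for counts in blocks.values():
        size = sum(counts.values())
        for c in counts.values():
            h -= (c / n) * log2(c / size)
    return h
```

Under this definition, a feature subset that fully determines the label has conditional entropy 0, while a subset that carries no information about a balanced binary label has conditional entropy 1 bit. A lower value therefore indicates less remaining uncertainty about the class.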
