Abstract
Online feature selection is a challenging topic in data mining. It aims to reduce the dimensionality of streaming features by removing irrelevant and redundant features in real time. Existing methods, such as Alpha-investing and Online Streaming Feature Selection (OSFS), serve this purpose but suffer from low prediction accuracy and high running time when the streaming features exhibit low redundancy and high relevance. In this paper, we propose a novel online streaming feature selection algorithm, named ConInd, that uses a three-layer filtering strategy to process streaming features and overcome these drawbacks. Through three-layer filtering, i.e., null-conditional independence, single-conditional independence, and multi-conditional independence, we obtain an approximate Markov blanket with high accuracy and low running time. To validate its efficiency, we implemented the proposed algorithm and tested its performance on widely used datasets, i.e., NIPS 2003 and Causality Workbench. Extensive experimental results demonstrate that ConInd offers significant improvements in prediction accuracy and running time compared with Alpha-investing and OSFS. ConInd offers 5.62% higher average prediction accuracy than Alpha-investing and a 53.56% lower average running time than OSFS when the dataset has low redundancy and high relevance. In addition, the ratio of the average number of features for ConInd is 242% less than that for Alpha-investing.
Highlights
Feature selection [1,2,3,4] is the most referenced method for reducing dimensions of features.
The main contributions that distinguish the proposed method from existing methods are threefold: (1) we propose the use of a three-layer filtering strategy to process streaming features to filter irrelevant and redundant features, as presented in Section 3.2; (2) through three-layer filtering, we can obtain an approximate Markov blanket in low running time with high accuracy, as demonstrated in Section 4.3; and (3) we analyze the theoretical properties of the ConInd algorithm and validate its empirical performance by conducting an extensive set of experiments, as presented in Sections 4 and 5.
We studied the online feature selection problem with streaming features.
Summary
Feature selection [1,2,3,4] is the most referenced method for reducing the dimensionality of features. There are several representative research efforts on OSFSF [16], e.g., Alpha-investing, OSFS, and SAOLA, but their strategies suffer from limited prediction accuracy or long running time if the streaming features have low redundancy and high relevance, as in real-time medical diagnosis [17]. For such streaming features, many selected features would be generated. The main contributions that distinguish the proposed method from existing methods are threefold: (1) we propose the use of a three-layer filtering strategy to process streaming features to filter irrelevant and redundant features, as presented in Section 3.2; (2) through three-layer filtering, we can obtain an approximate Markov blanket in low running time with high accuracy, as demonstrated in Section 4.3; and (3) we analyze the theoretical properties of the ConInd algorithm and validate its empirical performance by conducting an extensive set of experiments, as presented in Sections 4 and 5.
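To make the three-layer idea concrete, the following is a minimal Python sketch of how a stream of features might be passed through null-conditional, single-conditional, and multi-conditional independence checks. It is an illustration under stated assumptions, not the ConInd algorithm as published: the function names `ci_pvalue` and `three_layer_filter`, the parameters `alpha` and `max_k`, and the choice of a Fisher z-test on partial correlation as the independence test are all hypothetical choices made here. The sketch also only tests each incoming feature against the currently selected set and does not revisit previously selected features.

```python
import math
from itertools import combinations

import numpy as np


def ci_pvalue(x, y, Z=None):
    """p-value of a Fisher z-test for 'x independent of y given Z'.

    Assumption (not from the source): data are roughly linear-Gaussian,
    so partial correlation is a reasonable proxy for dependence.
    """
    n = len(x)
    k = 0 if Z is None else Z.shape[1]
    if k > 0:
        # Regress out Z (plus an intercept) and correlate the residuals.
        A = np.column_stack([np.ones(n), Z])
        x = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]
        y = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    r = float(np.clip(np.corrcoef(x, y)[0, 1], -0.999999, 0.999999))
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


def three_layer_filter(X, y, alpha=0.01, max_k=3):
    """Process the columns of X one at a time, as if they were streaming.

    A feature is kept only if it looks dependent on y under all three
    layers; a p-value above alpha is treated as independence.
    """
    selected = []  # column indices of the current approximate blanket
    for j in range(X.shape[1]):
        f = X[:, j]
        # Layer 1 (null-conditional): drop features irrelevant to y.
        if ci_pvalue(f, y) > alpha:
            continue
        # Layer 2 (single-conditional): drop if one selected feature
        # already explains the dependence between f and y.
        if any(ci_pvalue(f, y, X[:, [s]]) > alpha for s in selected):
            continue
        # Layer 3 (multi-conditional): same check for subsets of the
        # selected features of size 2..max_k.
        sizes = range(2, min(max_k, len(selected)) + 1)
        if any(
            ci_pvalue(f, y, X[:, list(S)]) > alpha
            for size in sizes
            for S in combinations(selected, size)
        ):
            continue
        selected.append(j)
    return selected


# Toy stream: x0 and x1 drive y, x2 is a noisy copy of x0, x3 is noise.
rng = np.random.default_rng(0)
n = 2000
x0, x1, x3 = rng.normal(size=(3, n))
x2 = x0 + 0.1 * rng.normal(size=n)
y = x0 + x1 + 0.5 * rng.normal(size=n)
X = np.column_stack([x0, x1, x2, x3])
print(three_layer_filter(X, y))
```

Ordering the checks from cheapest to most expensive means that most irrelevant features are discarded by the single unconditional test, and the subset search in the third layer runs only for features that survive the first two. In this toy stream, the noisy copy `x2` would typically be rejected at the second layer once `x0` has been selected.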