Abstract

Online feature selection is a challenging topic in data mining. It aims to reduce the dimensionality of streaming features by removing irrelevant and redundant features in real time. Existing methods, such as Alpha-investing and Online Streaming Feature Selection (OSFS), have been proposed for this purpose, but they suffer from low prediction accuracy and high running time when the streaming features exhibit low redundancy and high relevance. In this paper, we propose a novel online streaming feature selection algorithm, named ConInd, that uses a three-layer filtering strategy to process streaming features and overcome these drawbacks. Through three-layer filtering, i.e., null-conditional independence, single-conditional independence, and multi-conditional independence, we can obtain an approximate Markov blanket with high accuracy and low running time. To validate its efficiency, we implemented the proposed algorithm and tested its performance on prevalent datasets, i.e., NIPS 2003 and Causality Workbench. Extensive experimental results demonstrate that ConInd offers significant improvements in prediction accuracy and running time compared to Alpha-investing and OSFS. ConInd offers 5.62% higher average prediction accuracy than Alpha-investing, with a 53.56% lower average running time than OSFS, when the dataset exhibits low redundancy and high relevance. In addition, the ratio of the average number of features for ConInd is 242% less than that for Alpha-investing.

Highlights

  • Feature selection [1,2,3,4] is the most referenced method for reducing dimensions of features

  • The main contributions that distinguish the proposed method from existing methods are threefold: (1) we propose the use of a three-layer filtering strategy to process streaming features to filter irrelevant and redundant features, as presented in Section 3.2; (2) through three-layer filtering, we can obtain an approximate Markov blanket in low running time with high accuracy, as demonstrated in Section 4.3; and (3) we analyze the theoretical properties of the ConInd algorithm and validate its empirical performance by conducting an extensive set of experiments, as presented in Sections 4 and 5

  • We studied the online feature selection problem with streaming features


Summary

Introduction

Feature selection [1,2,3,4] is the most referenced method for reducing dimensions of features. There are several representative research efforts on OSFSF [16], e.g., Alpha-investing, OSFS, and SAOLA, but their strategies suffer from low prediction accuracy or high running time if the streaming features possess characteristics of low redundancy and high relevance, such as in real-time medical diagnosis [17]. For such streaming features, many selected features would be generated. The main contributions that distinguish the proposed method from existing methods are threefold: (1) we propose the use of a three-layer filtering strategy to process streaming features to filter irrelevant and redundant features, as presented in Section 3.2; (2) through three-layer filtering, we can obtain an approximate Markov blanket in low running time with high accuracy, as demonstrated in Section 4.3; and (3) we analyze the theoretical properties of the ConInd algorithm and validate its empirical performance by conducting an extensive set of experiments, as presented in Sections 4 and 5.
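
The three-layer idea (null-conditional, then single-conditional, then multi-conditional independence) can be illustrated with a minimal sketch. This is not the ConInd implementation: the function names `cmi`, `independent`, and `three_layer_filter`, the threshold `eps`, and the subset-size cap `max_k` are all hypothetical, and a simple threshold on empirical conditional mutual information for discrete data stands in for whatever independence test the paper actually uses. The sketch also only tests each arriving feature and never re-examines features selected earlier, which may differ from the published algorithm.

```python
from collections import Counter
from itertools import combinations
from math import log


def cmi(x, y, zs):
    """Empirical conditional mutual information I(X; Y | Z) in nats.

    x, y are discrete columns; zs is a list of conditioning columns
    (an empty list gives plain mutual information).
    """
    n = len(x)
    z = list(zip(*zs)) if zs else [()] * n
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)
    total = 0.0
    for (xi, yi, zi), c in c_xyz.items():
        total += (c / n) * log(c * c_z[zi] / (c_xz[(xi, zi)] * c_yz[(yi, zi)]))
    return total


def independent(x, y, zs, eps=0.02):
    """Assumed stand-in test: treat X and Y as independent given Z if CMI < eps."""
    return cmi(x, y, zs) < eps


def three_layer_filter(stream, target, eps=0.02, max_k=3):
    """Process (name, column) pairs one at a time; return the kept features."""
    selected = {}
    for name, col in stream:
        # Layer 1: null-conditional independence (drops irrelevant features).
        if independent(col, target, [], eps):
            continue
        # Layer 2: independence given any single already-selected feature.
        if any(independent(col, target, [s], eps) for s in selected.values()):
            continue
        # Layer 3: independence given subsets of size 2..max_k of selected features.
        cols = list(selected.values())
        if any(
            independent(col, target, list(sub), eps)
            for k in range(2, min(max_k, len(cols)) + 1)
            for sub in combinations(cols, k)
        ):
            continue
        selected[name] = col
    return selected


# Toy stream: one relevant feature, one irrelevant feature, one redundant copy.
target = [0, 0, 0, 0, 1, 1, 1, 1]
noise = [0, 1, 0, 1, 0, 1, 0, 1]
selected = three_layer_filter(
    [("rel", list(target)), ("noise", noise), ("dup", list(target))], target
)
print(list(selected))  # ['rel']
```

In the toy stream, `noise` is dropped by the first layer because it carries no information about the target, and `dup` is dropped by the second layer because it adds nothing once `rel` is known. Ordering the cheap tests first means the more expensive subset search in the third layer only runs for features that survive the first two.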

Related Work
Framework for Streaming Features Filtering
Notation Mathematical Meanings
Definitions
Formalization of Online Feature Selection with Streaming Features
Framework for Filtering Conditional Independence
Filtering of Single-Conditional Independence
Filtering
Filtering independence
Filtering of Multi-Conditional Independence
The ConInd Algorithm and Analysis
The Time Complexity of ConInd
Analysis of Approximate Markov Blankets of ConInd
Experimental Setup
Comparison of ConInd with Two Online Algorithms
Prediction Accuracy
The Number of Selected Features and Running Time
Comparison of ConInd with Two Markov Blanket Algorithms
Findings
Conclusions