Online Streaming Features Selection via Markov Blanket

Waqar Khan, Ling Wang, Lingfu Kong, Huigui Yan, Brekhna Brekhna

doi:10.3390/sym14010149

Abstract

Streaming feature selection is an effective way to select a relevant subset of features from high-dimensional data and to reduce learning complexity. However, little attention has been paid to online feature selection through the Markov Blanket (MB). Several studies based on traditional MB learning reported low prediction accuracy and used few datasets, because the number of conditional independence tests is high and time-consuming. This paper presents a novel algorithm called Online Feature Selection Via Markov Blanket (OFSVMB), based on a statistical conditional independence test, that offers high accuracy and lower computation time. It reduces the number of conditional independence tests and incorporates online relevance and redundancy analysis to check the relevance between each incoming feature and the target variable T, discard redundant features from the Parents-Child (PC) and Spouses (SP) sets online, and find PC and SP simultaneously. The performance of OFSVMB is compared with traditional MB learning algorithms, including IAMB, STMB, HITON-MB, BAMB, and EEMB, and with streaming feature selection algorithms, including OSFS, Alpha-investing, and SAOLA, on 9 benchmark Bayesian Network (BN) datasets and 14 real-world datasets. For the performance evaluation, F1, precision, and recall are measured at significance levels of 0.01 and 0.05 on the benchmark BN and real-world datasets, and 12 classifiers are evaluated at a significance level of 0.01. On benchmark BN datasets with sample sizes of 500 and 5000, OFSVMB achieved higher accuracy than IAMB, STMB, HITON-MB, BAMB, and EEMB in terms of F1, precision, and recall, and ran faster. It finds a more accurate MB regardless of the size of the feature set. On real-world datasets with small and large sample sizes, OFSVMB offers substantial improvements over OSFS, Alpha-investing, and SAOLA in mean prediction accuracy across the 12 classifiers, but it is slower than these algorithms because they find only the PC set and not SP. Furthermore, the sensitivity analysis shows that OFSVMB is more accurate in selecting the optimal features.
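
The precision, recall, and F1 measures named in the abstract can be illustrated with a short sketch. It assumes the usual set-based definitions for MB discovery (the discovered blanket compared against the true blanket of T); the function name `mb_scores`, the example feature names, and the zero-valued conventions for empty sets are illustrative assumptions, not taken from the paper.

```python
def mb_scores(true_mb, found_mb):
    """Return (precision, recall, f1) for a discovered Markov blanket.

    Assumed definitions: precision = |found & true| / |found|,
    recall = |found & true| / |true|, F1 = harmonic mean of the two.
    Empty sets score 0.0 by convention (an assumption of this sketch).
    """
    true_mb, found_mb = set(true_mb), set(found_mb)
    hits = len(true_mb & found_mb)  # correctly identified MB members
    precision = hits / len(found_mb) if found_mb else 0.0
    recall = hits / len(true_mb) if true_mb else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical example: the true MB of T is {A, B, C, D}; an algorithm returns {A, B, E}.
precision, recall, f1 = mb_scores({"A", "B", "C", "D"}, {"A", "B", "E"})
print(precision, recall, f1)
```

In this made-up example two of the three returned features are correct and two of the four true members are found, so precision is 2/3, recall is 1/2, and F1 is 4/7.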

Highlights

In machine learning, several feature selection algorithms are essential for processing high-dimensional data
The results are obtained through extensive experiments and compared with traditional Markov blanket (MB) discovery algorithms such as Incremental Association Markov Blanket (IAMB), Simultaneous MB (STMB), HITON-MB, Balanced Markov Blanket (BAMB), and Efficient and Effective MB discovery (EEMB), and with streaming-based algorithms such as Alpha-investing (α-investing), Scalable and Accurate Online Feature Selection (SAOLA), and Online Streaming Feature Selection (OSFS)
The real-world datasets are selected from different domains: datasets from the UCI machine learning repository [27] and frequently studied public microarray datasets [28]; ionosphere, colon, arcene, leukemia, and madelon, from the NIPS 2003 feature selection competition [29]; lung and medical, which are biomedical datasets [30]; lymphoma, reged1, and marti1 [31,32]; and prostate-GE and sido0 [33,34]

Summary

Introduction

Feature selection algorithms are essential for processing high-dimensional data. Several algorithms based on streaming features (SF) have been proposed for real scenarios, including Grafting [8], Alpha-investing (α-investing) [13], Scalable and Accurate Online Feature Selection (SAOLA) [14], and Online Streaming Feature Selection (OSFS) [15]. These algorithms focus only on obtaining PC sets and do not consider the Spouses, so they lose interpretability by ignoring causal MB discovery. Motivated by these observations and issues, this paper presents an Online Streaming Features Selection via Markov Blanket algorithm, based on a statistical conditional independence test.
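
To make the distinction between the PC set and the Spouses concrete, the sketch below reads both off a known Bayesian network structure. It is not the OFSVMB algorithm, which learns these sets from data through conditional independence tests. It only shows the standard definition that the MB of T consists of its parents, its children, and the other parents of its children. The function name `markov_blanket` and the toy graph are hypothetical.

```python
def markov_blanket(parents, target):
    """Return (pc, spouses, mb) of `target` in a DAG.

    `parents` maps every node to the set of its parents.
    """
    children = {node for node, ps in parents.items() if target in ps}
    pc = set(parents.get(target, ())) | children  # parents and children of T
    spouses = set()
    for child in children:
        spouses |= set(parents[child])  # all parents of each child of T
    spouses.discard(target)  # T is not its own spouse
    return pc, spouses, pc | spouses


# Toy DAG (hypothetical): A -> T, T -> C, B -> C, C -> D
dag = {"A": set(), "B": set(), "T": {"A"}, "C": {"T", "B"}, "D": {"C"}}
pc, sp, mb = markov_blanket(dag, "T")
print(sorted(pc), sorted(sp), sorted(mb))
```

Here the PC set of T is {A, C}, the only spouse is B (the other parent of C), and D lies outside the blanket. A method that returns only the PC set would miss B.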

Related Work

Preliminaries

Framework of OFSVMB

Initialization

Output

The Proposed OFSVMB Algorithm and Analysis


Statistical Conditional Independence Terminology in OFSVMB

Statistical G² Test for Discrete Data

Statistical Fisher’s z-Test for Continuous Data
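
As a rough illustration of a Fisher's z-test for conditional independence on continuous data, the sketch below uses the textbook form of the test: the (partial) correlation r is transformed to z = 0.5 ln((1 + r) / (1 - r)), and sqrt(n - |S| - 3) |z| is compared against a standard normal distribution. It is restricted to at most one conditioning variable and assumes roughly Gaussian data. The function names and the synthetic data are assumptions of this sketch, not the paper's implementation.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_pvalue(x, y, z=None):
    """Two-sided p-value for H0: X independent of Y (given Z if supplied)."""
    n = len(x)
    r = pearson(x, y)
    k = 0  # size of the conditioning set
    if z is not None:
        rxz, ryz = pearson(x, z), pearson(y, z)
        # First-order partial correlation of X and Y given Z.
        r = (r - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
        k = 1
    r = max(min(r, 0.999999), -0.999999)  # keep the log finite
    stat = math.sqrt(n - k - 3) * abs(0.5 * math.log((1 + r) / (1 - r)))
    return math.erfc(stat / math.sqrt(2))  # equals 2 * (1 - Phi(stat))


# Synthetic data (hypothetical): Z is a common cause of X and Y.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [v + rng.gauss(0, 1) for v in z]
y = [v + rng.gauss(0, 1) for v in z]
p_marginal = fisher_z_pvalue(x, y)
p_conditional = fisher_z_pvalue(x, y, z)
print(p_marginal, p_conditional)
```

Because X and Y share the cause Z, the marginal test should reject independence with a very small p-value, while the test conditioned on Z should, for most samples, give a p-value above the 0.01 or 0.05 significance levels mentioned in the abstract.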

Correctness of OFSVMB

Time Complexity Analysis

Results and Discussion

Datasets and Experiment Setup

Evaluation Metrics

Results and Discussion on Benchmark BN

Evaluation Classifiers


Sensitivity Analysis

Conclusions


Journal: Symmetry
Publication Date: Jan 13, 2022
Citations: 1
License type: CC BY 4.0

Similar Papers

Online Causal Feature Selection for Streaming Features
Xinju Ou ... Shunpan Liang
IEEE Transactions on Neural Networks and Learning Systems | VOL. 34
01 Mar 2023

Learning Markov Blankets From Multiple Interventional Data Sets
Jiuyong Li ... Kui Yu
IEEE Transactions on Neural Networks and Learning Systems | VOL. 31
28 Aug 2019

Loose-to-strict Markov blanket learning algorithm for feature selection
Liyue Zhang ... Haoran Liu
Knowledge-Based Systems | VOL. 283
17 Nov 2023

Causal Feature Selection with Missing Data
Wei Ding ... Yajing Yang
ACM Transactions on Knowledge Discovery from Data | VOL. 16
08 Jan 2022
