Abstract

In this paper, we study the problem of Markov boundary (MB) learning with streaming data. An MB is a crucial concept in a Bayesian network (BN) and plays an important role in BN structure learning. In addition, in the supervised learning setting, the MB of a class attribute is optimal for feature selection in classification. Almost all existing MB learning algorithms focus on static data, and few efforts have been devoted to learning MBs with streaming data. By linking dynamic AD-trees with streaming data, we propose a new SDMB (streaming data-based MB) algorithm for learning MBs with streaming data. Specifically, given a target variable, SDMB employs a dynamic AD-tree to summarize the historical data, and then sequentially learns the MB of the target over all available data by computing independence tests from the dynamic AD-tree. In experiments on synthetic and real-world data sets, we evaluate the SDMB algorithm and compare it with state-of-the-art online feature selection algorithms and data stream mining methods; the experimental results validate the SDMB algorithm.

Highlights

  • The notion of a Markov boundary (MB) was introduced by Pearl in the context of Bayesian networks (BNs) [21].

  • MB learning algorithms can find the MB of each variable in a data set to construct the skeleton of a BN structure, which reduces the BN structure search space, and then orient the edges in the skeleton.

  • Based on the dynamic all-dimensions tree (AD-tree) and the MakeCT algorithm, we propose a new SDMB algorithm to learn the MB of a given target variable in data streams.

Summary

INTRODUCTION

The notion of a Markov boundary (MB) was introduced by Pearl in the context of Bayesian networks (BNs) [21]. Based on the dynamic AD-tree and the MakeCT algorithm, we propose a new SDMB (streaming data-based MB) algorithm to learn the MB of a given target variable in data streams. As new data samples arrive, SDMB first summarizes them into the dynamic AD-tree. Then, by linking the updated dynamic AD-tree with contingency tables, SDMB computes independence tests for learning the MB of the given target variable using the MakeCT algorithm. In the experiments, we validate SDMB in two settings: standard MB learning, using synthetic streaming data generated from benchmark Bayesian networks; and feature selection for classification, in comparison with several state-of-the-art online feature selection algorithms and data stream classification methods (without feature selection) on synthetic and real-world data sets.
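The pipeline described above (summarize each incoming block, derive contingency tables from the summary, run independence tests on those tables) can be illustrated with a minimal Python sketch. It is not the SDMB implementation: the hypothetical `StreamingCounts` class is a flat counter of full rows standing in for the dynamic AD-tree, its `contingency_table` method only plays the role that MakeCT plays in the paper, and the G² statistic is assumed here as the independence test because the summary does not say which test SDMB uses. All names are illustrative.

```python
from collections import Counter
import math


class StreamingCounts:
    """Flat count summary of discrete samples.

    A simplified stand-in for a dynamic AD-tree: it keeps one count per
    distinct full row, so raw samples need not be stored.
    """

    def __init__(self, n_vars):
        self.n_vars = n_vars
        self.counts = Counter()
        self.n_samples = 0

    def update(self, block):
        """Fold a new block of samples (sequences of discrete values) into the summary."""
        for row in block:
            row = tuple(row)
            if len(row) != self.n_vars:
                raise ValueError("sample has the wrong number of variables")
            self.counts[row] += 1
            self.n_samples += 1

    def contingency_table(self, variables):
        """Return marginal counts over the given variable indices."""
        table = Counter()
        for row, count in self.counts.items():
            table[tuple(row[v] for v in variables)] += count
        return table


def g2_statistic(summary, x, y, cond=()):
    """G^2 statistic for testing X independent of Y given the variables in cond."""
    cond = tuple(cond)
    n_xyz = summary.contingency_table((x, y) + cond)
    n_xz = summary.contingency_table((x,) + cond)
    n_yz = summary.contingency_table((y,) + cond)
    n_z = summary.contingency_table(cond)
    g2 = 0.0
    for key, observed in n_xyz.items():
        x_val, y_val, z_val = key[0], key[1], key[2:]
        expected = n_xz[(x_val,) + z_val] * n_yz[(y_val,) + z_val] / n_z[z_val]
        # Only observed cells are visited, so observed > 0 and expected > 0.
        g2 += 2.0 * observed * math.log(observed / expected)
    return g2


# Toy stream: variable 1 copies variable 0, variable 2 is unrelated to both.
summary = StreamingCounts(n_vars=3)
block = [(0, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)]
summary.update(block)   # first block arrives
summary.update(block)   # second block arrives; counts accumulate
dependent = g2_statistic(summary, 0, 1)
independent = g2_statistic(summary, 0, 2)
```

In this toy stream, `dependent` is large while `independent` is zero. A full test would compare the statistic against a chi-square threshold with the appropriate degrees of freedom, and an MB learner would call such tests repeatedly while growing and shrinking a candidate set for the target.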

RELATED WORK
PROPOSED SDMB ALGORITHM
3: Input a block of streaming data Di
UPDATE MB
COMPLEXITY ANALYSIS
TRACING SDMB
EXPERIMENTS
Findings
CONCLUSION AND ONGOING WORK