PEnBayes: A Multi-Layered Ensemble Approach for Learning Bayesian Network Structure from Big Data.

Yan Tang, Mai Nguyen, Ilkay Altintas, Jianwu Wang

doi:10.3390/s19204400

Abstract

Discovering the Bayesian network (BN) structure from big datasets containing rich causal relationships is becoming increasingly valuable for modeling and reasoning under uncertainty in many areas where big data are gathered from sensors at high volume and velocity. Most current BN structure learning algorithms have shortcomings when facing big data. First, learning a BN structure from the entire big dataset is an expensive task that often ends in failure due to memory constraints. Second, it is quite difficult to select a learner from the numerous BN structure learning algorithms that consistently achieves good learning accuracy. Lastly, there is a lack of an intelligent method that merges separately learned BN structures into a well-structured network. To address these shortcomings, we introduce a novel parallel learning approach called PEnBayes (Parallel Ensemble-based Bayesian network learning). PEnBayes starts with an adaptive data preprocessing phase that calculates the Appropriate Learning Size and intelligently divides a big dataset for fast distributed local structure learning. Then, PEnBayes learns a collection of local BN structures in parallel using a two-layered weighted adjacency matrix-based structure ensemble method. Lastly, PEnBayes merges the local BN structures into a global network structure using the structure ensemble method at the global layer. For the experiments, we generate big datasets by simulating sensor data from the patient monitoring, transportation, and disease diagnosis domains. The experimental results show that PEnBayes achieves significantly improved execution performance with more consistent and stable results compared with three baseline learning algorithms.
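
The idea of merging several learned structures through a weighted adjacency matrix can be illustrated with a minimal sketch. This is not the PEnBayes implementation: the function name `weighted_ensemble`, the uniform weights, the 0.5 threshold, and the rule for choosing an edge direction are all assumptions made for illustration, and the sketch does not check that the merged graph is acyclic.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def weighted_ensemble(structures: Sequence[Matrix],
                      weights: Sequence[float],
                      threshold: float = 0.5) -> Matrix:
    """Merge DAG adjacency matrices by weighted edge voting (illustrative only).

    structures[k][i][j] == 1 means structure k contains the edge i -> j.
    weights[k] is a hypothetical confidence score for structure k.
    """
    if not structures or len(structures) != len(weights):
        raise ValueError("need one weight per structure")
    n = len(structures[0])
    total = float(sum(weights))

    # Weighted adjacency matrix: normalized vote for every directed edge.
    score = [[sum(w * s[i][j] for s, w in zip(structures, weights)) / total
              for j in range(n)]
             for i in range(n)]

    # Keep an edge if its vote passes the threshold and beats the reverse edge.
    merged = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and score[i][j] > threshold and score[i][j] > score[j][i]:
                merged[i][j] = 1
    return merged


# Three toy structures over variables A, B, C (indices 0, 1, 2).
s1 = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # A -> B, B -> C
s2 = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]  # A -> B, A -> C
s3 = [[0, 0, 0], [1, 0, 1], [0, 0, 0]]  # B -> A, B -> C
merged = weighted_ensemble([s1, s2, s3], weights=[1.0, 1.0, 1.0])
```

With equal weights, only the edges supported by two of the three toy structures (A -> B and B -> C) survive the vote.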

Highlights

A Bayesian network (BN) is a probabilistic graphical model that represents a probability distribution through a directed acyclic graph (DAG) that encodes conditional dependency and independency relationships among variables in the model.
We significantly extend our previous work [8,40] by adopting different BN structure learning algorithms in the Local Learner and by designing a three-layered ensemble approach to ensure learning stability and accuracy.
Our goal is to use the big training data to learn an accurate model of the underlying distribution at both the data level and the algorithm level, achieving better learning accuracy, stability, and usability, with a view to integrating Bayesian network learning into the big data modeling and scientific workflow engine.

Summary

Introduction

A Bayesian network (BN) is a probabilistic graphical model that represents a probability distribution through a directed acyclic graph (DAG) that encodes conditional dependency and independency relationships among variables in the model. One solution is to perform the learning task in a distributed data processing environment. Distributed Data-Parallel (DDP) patterns and their supporting systems target scalable big data applications. DDP patterns enable programs to execute in parallel by splitting data in distributed computing environments. A DDP pattern executes user-defined functions (UDFs) in parallel over input datasets. Users only need to select the appropriate DDP pattern for their specific data processing tasks and implement the corresponding UDFs. Due to the increasing popularity and adoption of these DDP patterns, a number of execution engines have been implemented to support one or more of them. These DDP execution engines manage distributed resources and execute UDF instances in parallel. When running on distributed resources, DDP engines can achieve good scalability and performance acceleration with good fault tolerance.
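
The split-then-apply behavior of a DDP pattern can be sketched on a single machine. The example below is an assumption-laden stand-in, not a real DDP engine: it uses a thread pool from the Python standard library in place of distributed resources, and the names `split`, `map_pattern`, and `count_rows` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split(data: Sequence[T], n_parts: int) -> List[Sequence[T]]:
    """Divide the input into at most n_parts contiguous partitions."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    size = max(1, -(-len(data) // n_parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def map_pattern(udf: Callable[[Sequence[T]], R],
                partitions: Sequence[Sequence[T]],
                workers: int = 4) -> List[R]:
    """Apply one user-defined function (UDF) to every partition in parallel.

    A local thread pool stands in for the distributed engine here.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(udf, partitions))


def count_rows(partition: Sequence[int]) -> int:
    """Example UDF: the only part a user would write for this pattern."""
    return len(partition)


partitions = split(list(range(10)), n_parts=3)
counts = map_pattern(count_rows, partitions)
```

The user supplies only `count_rows`; partitioning and parallel execution are handled by the pattern, which mirrors the division of labor described above.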

Journal: Sensors (Basel, Switzerland)	Publication Date: Oct 11, 2019
Citations: 5	License type: CC BY 4.0

Similar Papers

Adaptive Bayesian Network Structure Learning from Big Datasets
Yan Tang ... Huaxin Liu
01 Jan 2017

PSL: An Algorithm for Partial Bayesian Network Structure Learning
Zhaolong Ling ... Yiwen Zhang
ACM Transactions on Knowledge Discovery from Data | VOL. 16
09 Mar 2022

An Improved Particle Swarm Optimization Algorithm for Bayesian Network Structure Learning via Local Information Constraint
Kun Liu ... Yani Cui
IEEE Access | VOL. 9
01 Jan 2020

A Score Based Approach towards Improving Bayesian Network Structure Learning
Yan Tang ... Zhuoming Xu
01 Nov 2014
