Abstract

Incomplete data are a common feature in many domains, from clinical trials to industrial applications. Bayesian networks (BNs) are often used in these domains because of their graphical and causal interpretations. BN parameter learning from incomplete data is usually implemented with the Expectation-Maximisation algorithm (EM), which computes the relevant sufficient statistics (“soft EM”) using belief propagation. Similarly, the Structural Expectation-Maximisation algorithm (Structural EM) learns the network structure of the BN from those sufficient statistics using algorithms designed for complete data. However, practical implementations of parameter and structure learning often impute missing data (“hard EM”) to compute sufficient statistics instead of using belief propagation, for both ease of implementation and computational speed. In this paper, we investigate the question: what is the impact of using imputation instead of belief propagation on the quality of the resulting BNs? From a simulation study using synthetic data and reference BNs, we find that it is possible to recommend one approach over the other in several scenarios based on the characteristics of the data. We then use this information to build a simple decision tree to guide practitioners in choosing the EM algorithm best suited to their problem.
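To make the soft/hard distinction concrete, here is a minimal sketch (ours, not from the paper) of one E-step on a toy two-node network A → B with binary variables: soft EM spreads each incomplete observation across the states of the missing variable according to its posterior probability, while hard EM assigns it wholly to the single most probable state. The network, parameters and data below are invented for illustration; in a two-node network the posterior is just Bayes' rule, which stands in for belief propagation in larger networks.

```python
import numpy as np

# Toy BN: A -> B, both binary. Current parameter estimates.
p_a = np.array([0.6, 0.4])                 # P(A = 0), P(A = 1)
p_b_given_a = np.array([[0.7, 0.3],        # P(B | A = 0)
                        [0.2, 0.8]])       # P(B | A = 1)

# Data: each row is (a, b); None marks a missing value for A.
data = [(0, 0), (1, 1), (None, 1), (None, 0), (0, 1)]

def posterior_a_given_b(b):
    """Exact inference of P(A | B = b) via Bayes' rule; this is the role
    belief propagation plays in networks with more variables."""
    joint = p_a * p_b_given_a[:, b]        # P(A = a) * P(B = b | A = a)
    return joint / joint.sum()

# E-step sufficient statistics: counts n(A = a, B = b).
counts_soft = np.zeros((2, 2))
counts_hard = np.zeros((2, 2))
for a, b in data:
    if a is not None:                      # complete row: one whole count
        counts_soft[a, b] += 1
        counts_hard[a, b] += 1
    else:
        post = posterior_a_given_b(b)
        counts_soft[:, b] += post          # soft EM: fractional counts
        counts_hard[post.argmax(), b] += 1 # hard EM: impute the mode

# M-step for P(B | A) from either set of counts (row-normalise).
for name, n in [("soft", counts_soft), ("hard", counts_hard)]:
    print(name, n / n.sum(axis=1, keepdims=True))
```

On this tiny example the two E-steps already produce different sufficient statistics, and hence different parameter estimates after the M-step; the paper's simulation study asks when that difference matters in practice.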

Highlights

  • The performance of machine learning models is highly dependent on the quality of the data that are available to train them: the more information they contain, the better the insights we can obtain from them

  • We investigate the impact of learning the parameters and the structure of a Bayesian network (BN) using hard EM instead of soft EM with a comprehensive simulation study covering incomplete data with a wide array of different characteristics

  • BN parameter learning from incomplete data is typically performed using the EM algorithm


Introduction

The performance of machine learning models is highly dependent on the quality of the data that are available to train them: the more information they contain, the better the insights we can obtain from them. Incomplete data contain, by construction, less useful information to model the phenomenon we are studying because there are fewer complete observations from which to learn the distribution of the variables and their interplay. Modern probabilistic approaches have followed the lead of Rubin [3,4] and modelled missing values along with the stochastic mechanism of missingness. This class of approaches introduces one auxiliary variable for each experimental variable that is not completely observed in order to model the distribution of missingness.
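The auxiliary variables in question are missingness indicators: for each partially observed variable X, an indicator R_X records which entries are missing, so that the missingness mechanism can be modelled jointly with the data. A minimal illustration in Python with invented data (the variable names and pandas usage are ours, not the paper's):

```python
import numpy as np
import pandas as pd

# Hypothetical incomplete data set: X2 and X3 are partially observed.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "X1": rng.integers(0, 2, size=8),
    "X2": rng.integers(0, 2, size=8).astype(float),
    "X3": rng.integers(0, 2, size=8).astype(float),
})
df.loc[[1, 4], "X2"] = np.nan
df.loc[[2, 4, 6], "X3"] = np.nan

# Rubin-style auxiliary variables: one missingness indicator R_X for
# each incompletely observed variable X.
for col in list(df.columns):
    if df[col].isna().any():
        df[f"R_{col}"] = df[col].isna().astype(int)

print(df)  # X1, X2, X3 plus the indicators R_X2 and R_X3
```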

