Learning parameters of Bayesian networks from datasets with systematically missing data: A meta-analytic approach

Jelena Kovačić

doi:10.1016/j.eswa.2019.112956

Abstract

Previous research suggested that using additional data sources could improve parameter learning in Bayesian networks. However, when the additional datasets do not include all network variables, neither standard Bayesian network learning techniques nor standard missing-data methods can be applied. For such situations, the use of a meta-analytic approach is proposed. The performance of one such meta-analytic approach was evaluated by simulating several study results for two real-life biomedical examples (one discrete and one Gaussian Bayesian network). Regardless of the network type, the meta-analytic approach yielded higher mean log-likelihood values than a single-dataset analysis, and these values were less sensitive to the presence of heterogeneity. The difference between the two methods was most pronounced when sample sizes were small (N=100). For the meta-analytic approach, the increase in log-likelihood was in most cases positively related to the number of nodes estimated with additional data. However, as with single-dataset analysis, care is needed when estimating rare-event probabilities from small datasets because of unidentifiability and increased bias.
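
To make the idea of pooling parameter estimates across studies concrete, the following is a minimal sketch of a generic random-effects meta-analysis (the DerSimonian-Laird estimator) applied to one network parameter. The abstract does not state which estimator was used, so this is an illustrative assumption rather than the author's method; the function name `pool_random_effects` and the example numbers are hypothetical.

```python
import math

def pool_random_effects(estimates, variances):
    """Pool per-study estimates of one parameter (DerSimonian-Laird).

    estimates: per-study point estimates (e.g. a regression coefficient
               for one node of a Gaussian Bayesian network).
    variances: the corresponding squared standard errors.
    Returns (pooled estimate, its standard error, between-study variance).
    """
    k = len(estimates)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sw
    # Cochran's Q measures heterogeneity between studies.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

if __name__ == "__main__":
    # Hypothetical coefficient estimates from three studies.
    pooled, se, tau2 = pool_random_effects([0.42, 0.55, 0.31], [0.010, 0.020, 0.015])
    print(f"pooled={pooled:.3f}, se={se:.3f}, tau2={tau2:.4f}")
```

In a sketch like this, each node's parameters would be pooled only over the studies that report that node and its parents, which is what allows datasets that omit some network variables to contribute.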

Full Text