Learning Bayesian Networks: The Combination of Scoring Function and Dataset

Saratha Sathasivam,Poh Choo Song,Jing Jie Yeap

doi:10.35940/ijeat.d7568.069520

Open Access

https://doi.org/10.35940/ijeat.d7568.069520

Abstract

A Bayesian network (BN) is a graphical model consisting of nodes and directed edges, which represent random variables and the relationships between the corresponding random variables, respectively. The main areas of study in Bayesian networks are structural learning and parameter learning. The network structure can be formed with score-and-search based, constraint based, or hybrid methods. However, many types of scores and algorithms are available for the structural learning of Bayesian networks. Hence, the objective of this study is to determine the best combination of scores and algorithms for various types of datasets. In addition, the time taken for the BN structure to converge is examined for datasets of different sizes. Lastly, score-and-search based and constraint based methods are compared. The study finds that Tabu search combines best with the scoring functions regardless of the size of the dataset. Furthermore, when the dataset is large, the time it takes for a BN structure to converge is shorter. Finally, the results show that the score-and-search based algorithm performs better than the constraint based algorithm.

Highlights

A Bayesian network (BN) comprises a set of random variables and directed arcs that represent the conditional dependencies between nodes
To determine the best combination of algorithms and scores, this study examines six datasets, three in the small dataset category and three in the big dataset category
Paired with each of the scoring functions on the Asia (8 variables) dataset, HC and Tabu outperformed all other algorithms with the best score

Summary

Introduction

A Bayesian network (BN) comprises a set of random variables and directed arcs that represent the conditional dependencies between nodes. It is a directed acyclic graph (DAG). Because numerous algorithms and scores are available, the question arises as to which algorithm is the most appropriate when handling datasets of different sizes. It is not efficient, though, if researchers were to …

Revised Manuscript Received on May 15, 2020.
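
Because a BN structure must remain a directed acyclic graph, any structure-learning procedure has to reject candidate graphs that contain a directed cycle. As a minimal illustrative sketch (not the authors' implementation), such a check could be written in Python with Kahn's algorithm. The function name `is_dag`, the parent-map representation, and the three-variable example are assumptions made here for illustration only.

```python
from collections import deque


def is_dag(parents):
    """Return True if the parent map describes a directed acyclic graph.

    `parents` maps each node to the set of its parent nodes.
    Assumption: every parent also appears as a key of `parents`.
    """
    # Build child lists and in-degree counts (number of parents per node).
    children = {node: [] for node in parents}
    in_degree = {node: len(ps) for node, ps in parents.items()}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)

    # Kahn's algorithm: repeatedly remove nodes with no remaining parents.
    queue = deque(n for n, d in in_degree.items() if d == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)

    # All nodes can be removed only if there is no directed cycle.
    return removed == len(parents)


# Hypothetical three-variable structure: A -> B, A -> C, B -> C.
structure = {"A": set(), "B": {"A"}, "C": {"A", "B"}}
# Adding the arc C -> A would close the cycle A -> B -> C -> A.
cyclic = {"A": {"C"}, "B": {"A"}, "C": {"A", "B"}}

print(is_dag(structure))
print(is_dag(cyclic))
```

A search procedure that adds or reverses one arc at a time could call such a check on each candidate structure before scoring it, so that only valid DAGs are compared.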

Objectives

Methods

Results

Conclusion