Abstract

Bayesian networks are powerful tools for knowledge representation and inference under conditions of uncertainty. However, learning an optimal Bayesian network classifier (BNC) is an NP-hard problem, since topological complexity increases exponentially with the number of attributes. Researchers have proposed applying information-theoretic criteria to measure conditional dependence, and independence assumptions are introduced implicitly or explicitly to simplify the network topology of a BNC. In this paper, we clarify the mapping relationship between conditional mutual information and local topology, and then illustrate that informational independence does not correspond to probabilistic independence, and that the criterion of probabilistic independence does not necessarily hold for the independence topology. A novel framework of semi-naive Bayesian operation, called Hierarchical Independence Thresholding (HIT), is presented to efficiently identify informational conditional independence and probabilistic conditional independence by applying an adaptive thresholding method; redundant edges are filtered out, and the learned topology fits the data better. Extensive experimental evaluation on 58 publicly available datasets reveals that when HIT is applied to BNCs (such as tree-augmented naive Bayes or the k-dependence Bayesian classifier), the resulting BNCs achieve classification performance competitive with state-of-the-art learners such as random forest and logistic regression.
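The abstract does not specify HIT's adaptive thresholding scheme, so the following is a minimal sketch of the general idea: score each candidate edge between attributes by the conditional mutual information I(Xi; Xj | C) and discard edges whose score falls below an adaptive threshold. The function names are illustrative, and the mean-based threshold is an assumed stand-in for the paper's hierarchical method, not its actual criterion.

```python
import numpy as np
from itertools import combinations

def conditional_mutual_information(x, y, c):
    """Estimate I(X; Y | C) in nats from discrete samples."""
    cmi = 0.0
    for cv in np.unique(c):
        mask = c == cv
        p_c = mask.mean()
        xs, ys = x[mask], y[mask]
        for xv in np.unique(xs):
            for yv in np.unique(ys):
                p_xy = np.mean((xs == xv) & (ys == yv))  # joint within class cv
                if p_xy == 0.0:
                    continue
                p_x = np.mean(xs == xv)  # marginals within class cv
                p_y = np.mean(ys == yv)
                cmi += p_c * p_xy * np.log(p_xy / (p_x * p_y))
    return cmi

def filter_edges(X, y, threshold_fn=np.mean):
    """Score all attribute pairs by I(Xi; Xj | C) and keep only those above
    an adaptive threshold (here the mean score, a placeholder for HIT's
    hierarchical thresholding). X: (n, d) integer-encoded attributes."""
    d = X.shape[1]
    scores = {(i, j): conditional_mutual_information(X[:, i], X[:, j], y)
              for i, j in combinations(range(d), 2)}
    tau = threshold_fn(list(scores.values()))
    return [edge for edge, s in scores.items() if s > tau]
```

The surviving edges would then be passed to a structure learner such as TAN or KDB in place of the full edge set, which is how a thresholding step of this kind simplifies the augmenting topology.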
