Abstract

Over recent decades, the rapid growth in data has made ever more urgent the quest for highly scalable Bayesian network classifiers with better classification performance and expressivity (that is, the capacity to describe the dependence relationships between attributes in different situations). To reduce the search space of possible attribute orders, the k-dependence Bayesian classifier (KDB) simply applies mutual information to sort attributes. This sorting strategy is very efficient, but it neglects the conditional dependencies between attributes and is therefore sub-optimal. In this paper, we propose a novel sorting strategy and extend KDB from a single restricted network to unrestricted ensemble networks, i.e., the unrestricted k-dependence Bayesian classifier (UKDB), in terms of Markov blanket analysis and target learning. Target learning is a framework that takes each unlabeled testing instance as a target and builds a specific Bayesian network classifier (BNC) for it, to complement the BNC learned from the training data. UKDB accordingly comprises two sub-models: one flexibly describes how the dependence relationships change across different testing instances, and the other captures the robust dependence relationships implicated in the training data. Both use UKDB as the base classifier and apply the same learning strategy while modeling different parts of the data space, so they are complementary in nature. Extensive experimental results on the Wisconsin breast cancer (WBC) database as a case study and on 10 other datasets, involving classifiers of different structural complexity such as Naive Bayes (0-dependence), Tree-Augmented Naive Bayes (1-dependence) and KDB (arbitrary k-dependence), demonstrate the effectiveness and robustness of the proposed approach.
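As a rough illustration of the MI-based sorting that KDB uses (and that the proposed approach refines), the following Python sketch ranks attributes by their mutual information with the class and then lets each attribute select up to k parents, by conditional mutual information given the class, from the attributes ranked before it. This is a minimal sketch assuming discrete attributes stored in a 2-D NumPy array; the function names and data layout are illustrative, not taken from the paper.

```python
import numpy as np
from collections import Counter

def mi(x, y):
    """Mutual information I(X;Y) for two discrete 1-D sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * np.log((c * n) / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def cmi(x, y, z):
    """Conditional mutual information I(X;Y|Z) for discrete sequences."""
    n = len(x)
    pxyz = Counter(zip(x, y, z))
    pxz, pyz, pz = Counter(zip(x, z)), Counter(zip(y, z)), Counter(z)
    return sum((c / n) * np.log((c * pz[zv]) / (pxz[(a, zv)] * pyz[(b, zv)]))
               for (a, b, zv), c in pxyz.items())

def kdb_structure(X, y, k=2):
    """KDB-style structure: sort attributes by I(Xi; C), then let each
    attribute take up to k parents (besides the class) among the
    higher-ranked attributes, chosen by I(Xi; Xj | C)."""
    d = X.shape[1]
    order = sorted(range(d), key=lambda i: mi(X[:, i], y), reverse=True)
    parents = {}
    for pos, i in enumerate(order):
        earlier = sorted(order[:pos],
                         key=lambda j: cmi(X[:, i], X[:, j], y),
                         reverse=True)
        parents[i] = earlier[:k]
    return order, parents

# Example with random discrete data: 4 attributes, binary class.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 4))
y = rng.integers(0, 2, size=200)
print(kdb_structure(X, y, k=2))
```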

Highlights

  • Since 1995, researchers have proposed to embed machine-learning techniques into computer-aided systems, such as medical diagnosis systems [1,2,3,4]

  • For testing data, UKDBP provides a natural way to deal with missing values, by simply not considering the dependence relationships related to the missing values

  • Since the performance of the unrestricted k-dependence Bayesian classifier (UKDB) depends on the effectiveness of MI and CMI, we use two further criteria, pointwise mutual information (PMI) and pointwise conditional mutual information (PCMI), for comparison and to show in which situations MI and CMI are more effective


Summary

Introduction

Since 1995, researchers have proposed to embed machine-learning techniques into computer-aided systems, such as medical diagnosis systems [1,2,3,4]. When P(x_i, x_j|c) < P(x_i|c)P(x_j|c), or equivalently log(P(x_i, x_j|c) / (P(x_i|c)P(x_j|c))) < 0, we have I(x_i; x_j|c) < 0, and we argue that the relationship between the attribute values x_i and x_j can be considered one of conditional independence. The existence of negative values of I(x_1; x_2|c) that represent conditional independence means the dependence relationship may differ, rather than remain invariant, when attributes take different values. General BNCs (such as NB, TAN and KDB), which build only one model to fit the training instances, cannot capture this difference and cannot represent the dependence relationships flexibly. Taheri et al. [23] proposed to build a dynamic structure without specifying k a priori, and they proved that the resulting BNC is optimal.
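As a concrete, purely illustrative check of the sign argument above, the short Python sketch below evaluates log(P(x_1, x_2|c) / (P(x_1|c)P(x_2|c))) for every value pair of two binary attributes under a toy conditional distribution: some value pairs yield negative pointwise values, which by the argument above can be treated as conditionally independent, even though the averaged conditional mutual information stays non-negative. The numbers are invented for illustration and are not from the paper.

```python
import numpy as np

# Toy joint distribution of two binary attributes given a class value c
# (illustrative numbers only).
p_xy_c = np.array([[0.10, 0.30],    # P(x_1=0, x_2=0 | c), P(x_1=0, x_2=1 | c)
                   [0.40, 0.20]])   # P(x_1=1, x_2=0 | c), P(x_1=1, x_2=1 | c)
p_x_c = p_xy_c.sum(axis=1)          # marginal P(x_1 | c)
p_y_c = p_xy_c.sum(axis=0)          # marginal P(x_2 | c)

# Pointwise conditional mutual information for every value pair:
# log( P(x_1, x_2 | c) / (P(x_1 | c) * P(x_2 | c)) )
pcmi = np.log(p_xy_c / np.outer(p_x_c, p_y_c))
print(pcmi)          # the (0,0) and (1,1) cells are negative for these numbers

# Averaging the pointwise values over the joint distribution gives the
# (always non-negative) conditional mutual information for this class value.
print((p_xy_c * pcmi).sum())   # about 0.086 >= 0
```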

The UKDB Algorithm
Results and Discussion
Evaluation Function
Experimental Study on WBC Dataset
The Effect of Values of k
The Effect of Missing Values
Results without Missing Values
Conclusions