Abstract
Bayesian network classifiers (BNCs) have demonstrated competitive classification performance in a variety of real-world applications. A highly scalable BNC with high expressivity is therefore extremely desirable. This paper proposes Redundant Dependence Elimination (RDE) to improve the classification performance and expressivity of the k-dependence Bayesian classifier (KDB). To capture the unique characteristics of each case, RDE identifies redundant conditional dependencies and then substitutes or removes them. The learned personalized k-dependence Bayesian classifier (PKDB) achieves high-confidence conditional probabilities and graphically interprets the dependency relationships between attributes. Two thyroid cancer datasets and four other cancer datasets from the UCI machine learning repository are selected for the experimental study. The experimental results demonstrate the effectiveness of the proposed algorithm in terms of zero-one loss, bias, variance and AUC.
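The abstract does not spell out the RDE criterion itself, so the sketch below should be read only as a rough illustration under stated assumptions: it learns a standard KDB-style structure (attributes ranked by mutual information with the class, each taking up to k higher-ranked attributes as parents by conditional mutual information) and then drops parent links whose conditional mutual information does not exceed a threshold tau as a stand-in for redundancy elimination. The function names and the tau-based pruning rule are illustrative assumptions, not the authors' method.

```python
import numpy as np

def mutual_info(a, b):
    """Estimate I(A; B) from two discrete (integer-coded) sample arrays."""
    mi = 0.0
    for va in np.unique(a):
        for vb in np.unique(b):
            p_ab = np.mean((a == va) & (b == vb))
            if p_ab > 0:
                p_a, p_b = np.mean(a == va), np.mean(b == vb)
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def cond_mutual_info(a, b, y):
    """Estimate I(A; B | Y) by averaging within-class I(A; B) over the classes."""
    return sum(np.mean(y == c) * mutual_info(a[y == c], b[y == c])
               for c in np.unique(y))

def learn_kdb_structure(X, y, k=2, tau=0.0):
    """KDB-style parent selection with a crude redundancy filter.

    Attributes are ranked by I(Xi; Y); each attribute may take up to k of the
    higher-ranked attributes as parents, chosen by I(Xi; Xj | Y).  Links whose
    conditional mutual information does not exceed tau are dropped -- this
    threshold is only a placeholder for the paper's RDE criterion.
    """
    n_attr = X.shape[1]
    order = sorted(range(n_attr),
                   key=lambda i: mutual_info(X[:, i], y), reverse=True)
    parents = {order[0]: []}          # top-ranked attribute: class parent only
    for pos, i in enumerate(order[1:], start=1):
        scores = {j: cond_mutual_info(X[:, i], X[:, j], y) for j in order[:pos]}
        best = sorted(scores, key=scores.get, reverse=True)[:k]
        parents[i] = [j for j in best if scores[j] > tau]
    return parents

# Toy usage: 5 discrete attributes, binary class derived from two of them.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 5))
y = (X[:, 0] + X[:, 1] > 2).astype(int)
print(learn_kdb_structure(X, y, k=2, tau=0.01))
```

The returned dictionary maps each attribute index to its retained attribute parents; in the actual PKDB model, the class is an additional parent of every attribute.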
Highlights
Data mining is the analysis step of the “knowledge discovery in databases” process, and its goal is the extraction of patterns and knowledge from large amounts of data.
By computing local mutual information (LMI) and conditional local mutual information (CLMI) from the local perspective, PKDB, which learns from each individual testing instance, is a clear example of a learner for precision medicine (a sketch of this per-instance computation follows these highlights).
Because of the computational overhead, only a limited number of dependencies, determined by the parameter k, can be described by PKDB.
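The exact definitions of LMI and CLMI used by the authors are not given on this page. As a minimal sketch, the code below assumes the common pointwise forms, in which only the attribute values observed in the single test instance contribute, and uses them to build a per-instance parent set capped at k parents (illustrating why k bounds the number of dependencies described). All function names are hypothetical.

```python
import numpy as np

def local_mi(x_val, col, y):
    """Assumed LMI(x_i; Y) for the observed value x_i:
    sum over classes c of P(x_i, c) * log( P(x_i, c) / (P(x_i) * P(c)) )."""
    p_x = np.mean(col == x_val)
    lmi = 0.0
    for c in np.unique(y):
        p_c = np.mean(y == c)
        p_xc = np.mean((col == x_val) & (y == c))
        if p_xc > 0:
            lmi += p_xc * np.log(p_xc / (p_x * p_c))
    return lmi

def local_cmi(xi_val, xj_val, col_i, col_j, y):
    """Assumed CLMI(x_i; x_j | Y) for the two observed values:
    sum over classes c of P(x_i, x_j, c) * log( P(x_i, x_j | c) / (P(x_i | c) * P(x_j | c)) )."""
    clmi = 0.0
    for c in np.unique(y):
        mask = (y == c)
        p_c = np.mean(mask)
        p_i = np.mean(col_i[mask] == xi_val)
        p_j = np.mean(col_j[mask] == xj_val)
        p_ij = np.mean((col_i[mask] == xi_val) & (col_j[mask] == xj_val))
        if p_ij > 0 and p_i > 0 and p_j > 0:
            clmi += p_c * p_ij * np.log(p_ij / (p_i * p_j))
    return clmi

def personalized_parents(X, y, x_test, k=2):
    """Per-instance (personalized) parent selection: rank attributes by the LMI
    of their observed values, then give each attribute at most k higher-ranked
    parents chosen by CLMI, so k caps the dependencies that can be described."""
    n_attr = X.shape[1]
    order = sorted(range(n_attr),
                   key=lambda i: local_mi(x_test[i], X[:, i], y), reverse=True)
    parents = {order[0]: []}
    for pos, i in enumerate(order[1:], start=1):
        scores = {j: local_cmi(x_test[i], x_test[j], X[:, i], X[:, j], y)
                  for j in order[:pos]}
        parents[i] = sorted(scores, key=scores.get, reverse=True)[:k]
    return parents
```

Because the scores depend on the test instance's own attribute values, a different network structure can be produced for every case, which is the sense in which the classifier is personalized.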
Summary
Data mining is the analysis step of the “knowledge discovery in databases” process, and its goal is the extraction of patterns and knowledge from large amounts of data. Given the high prediction performance of such statistical models, researchers would like to understand the reasons behind a prediction, especially when it contradicts their intuition. Physicians are typically interested not only in the final prediction but also in the underlying inference procedure, which may help explain why the system makes a certain recommendation. Causal and graphical models are more desirable for visualizing and mining previously undiscovered knowledge from data [1]. Bayesian network classifiers (BNCs) have long been a popular tool for graphically representing probabilistic dependencies and performing inference under conditions of uncertainty [2,3,4,5].
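For concreteness, in a k-dependence BNC such as KDB each attribute is conditioned on the class and at most k other attributes, so the class posterior factorizes as

P(y | x_1, …, x_n) ∝ P(y) ∏_{i=1}^{n} P(x_i | y, Π_i), with |Π_i| ≤ k,

where Π_i denotes the attribute parents of x_i. Inference under uncertainty then amounts to evaluating this product for every candidate class y and, at the same time, the directed edges from Π_i and y to x_i provide the graphical, interpretable representation of the dependencies referred to above.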