Abstract

The Bayesian classification framework has been widely used in many fields, but the covariance matrix is usually difficult to estimate reliably. To alleviate this problem, many naive Bayes (NB) approaches with good performance have been developed. However, the assumption of conditional independence between attributes in NB rarely holds in reality. Various attribute-weighting schemes have been developed to address this problem. Among them, class-specific attribute weighted naive Bayes (CAWNB) has recently achieved good performance by using classification feedback to optimize the attribute weights of each class. However, the derived model may be over-fitted to the training dataset, especially when the dataset is insufficient to train a model with good generalization performance. This paper proposes a regularization technique that improves the generalization capability of CAWNB by balancing the trade-off between discrimination power and generalization capability. More specifically, by introducing a regularization term, the proposed method, regularized naive Bayes (RNB), captures the data characteristics well when the dataset is large and exhibits good generalization performance when the dataset is small. RNB is compared with state-of-the-art naive Bayes methods on 33 machine-learning benchmark datasets, and the experiments demonstrate that RNB significantly outperforms the compared methods.

Highlights

  • The Bayesian classification framework is fundamental to statistical pattern recognition and widely deployed in many machine-learning tasks [1]–[6]

  • Compared with the previous best algorithm, class-specific attribute weighted naive Bayes (CAWNB), the proposed regularized naive Bayes (RNB) achieves an improvement of more than 1% in average classification accuracy over the 33 datasets

  • After a thorough literature review of the state-of-the-art attribute-weighting naive Bayes methods, we find that class-dependent attribute-weighting naive Bayes has poor generalization capability on relatively small datasets


Summary

INTRODUCTION

The Bayesian classification framework is fundamental to statistical pattern recognition and widely deployed in many machine-learning tasks [1]–[6]. The proposed method improves the generalization capability of CAWNB by integrating it with the simpler model WANBIA. Integrating the two models does not significantly increase the computational complexity, as both share similar procedures for solving the optimization problem [34], [35]. When the dataset is small and a simpler model is preferred, α becomes smaller and a larger weight is assigned to P_I(w), which ensures better generalization capability, as illustrated in the sketch below.
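The passage above suggests that RNB blends a class-specific weighting model (CAWNB) with a class-independent one (WANBIA) through a trade-off factor α. The sketch below illustrates one plausible reading of that combination: the attribute weights are a convex combination of per-class and shared weights, with a small α shifting mass toward the simpler shared model. The function and variable names, and the exact combination rule, are illustrative assumptions, not the paper's verbatim formulation.

```python
# Minimal sketch, assuming the combined attribute weights take the form
#   w = alpha * w_class + (1 - alpha) * w_shared
# (an assumption for illustration; the paper's exact objective may differ).
import numpy as np

def log_posterior(log_prior, log_lik, w_class, w_shared, alpha):
    """Score classes with a blend of class-specific (CAWNB-style) and
    class-independent (WANBIA-style) attribute weights.

    log_prior: (C,)   log P(c)
    log_lik:   (C, D) log P(a_j | c) for one test instance
    w_class:   (C, D) per-class attribute weights
    w_shared:  (D,)   shared attribute weights
    alpha:     scalar in [0, 1]; smaller alpha favors the simpler model
    """
    w = alpha * w_class + (1.0 - alpha) * w_shared   # broadcasts to (C, D)
    scores = log_prior + (w * log_lik).sum(axis=1)   # weighted log-likelihoods
    return scores - np.logaddexp.reduce(scores)      # normalized log P(c | x)

# Toy usage: 2 classes, 3 attributes.
rng = np.random.default_rng(0)
log_prior = np.log(np.array([0.6, 0.4]))
log_lik = np.log(rng.uniform(0.1, 0.9, size=(2, 3)))
w_class = rng.uniform(0.5, 1.5, size=(2, 3))
w_shared = rng.uniform(0.5, 1.5, size=3)
print(np.exp(log_posterior(log_prior, log_lik, w_class, w_shared, alpha=0.3)))
```

With α = 1 the sketch reduces to a purely class-specific weighting (CAWNB-like), and with α = 0 to a purely class-independent one (WANBIA-like), which matches the trade-off the summary describes.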

RELATED WORKS
PROBLEM ANALYSIS OF PREVIOUS NAIVE BAYES METHODS
OVERVIEW OF PROPOSED REGULARIZED NAIVE BAYES
ESTIMATION OF PRIOR PROBABILITIES AND LIKELIHOOD PROBABILITIES
EXPERIMENTAL RESULTS
EXPERIMENTAL SETTINGS
COMPARISON TO STATE OF THE ART
ANALYSIS OF EXPERIMENTAL RESULTS
CONCLUSION
