The Naive Bayesian classifier (NBC) is a well-known classification model with a simple structure, low training complexity, excellent scalability, and good classification performance. However, the NBC has two key limitations: (1) it is built upon the strong assumption that condition attributes are independent, which often does not hold in real life, and (2) it does not handle continuous attributes well. To overcome these limitations, this paper presents a novel approach for NBC construction, called mixed-attribute fusion-based NBC (MAF-NBC). MAF-NBC alleviates both limitations by fusing mixed attributes with an improved autoencoder neural network before the NBC is built. As a pre-processing step, it transforms the original mixed attributes of a data set into a set of encoded attributes with maximal independence. To guarantee that useful encoded attributes are generated, an efficient objective function is designed to optimize the weights of the autoencoder neural network by considering both the encoding error and the dependence among the encoded attributes. A series of experiments was conducted to validate the feasibility, rationality, and effectiveness of the designed MAF-NBC approach. Results demonstrate that MAF-NBC achieves superior classification performance compared with eight state-of-the-art Bayesian algorithms, namely the discretization-based NBC (Dis-NBC), flexible naive Bayes (FNB), tree-augmented naive Bayes (TAN), averaged one-dependence estimator (AODE), hidden naive Bayes (HNB), deep feature weighting for NBC (DFW-NBC), correlation-based feature weighting filter for NBC (CFW-NBC), and independent component analysis-based NBC (ICA-NBC).
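The following is a minimal sketch of the fusion idea the abstract describes: an autoencoder trained with a loss that combines reconstruction (encoding) error with a penalty on dependence among the encoded attributes. It is illustrative only; the names (FusionAutoencoder, dependence_penalty), the correlation-based penalty, and the trade-off weight lam are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn as nn

class FusionAutoencoder(nn.Module):
    """Hypothetical autoencoder that fuses mixed attributes into codes."""
    def __init__(self, n_in, n_code):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_code), nn.Tanh())
        self.decoder = nn.Linear(n_code, n_in)

    def forward(self, x):
        z = self.encoder(x)          # encoded attributes
        return z, self.decoder(z)    # codes and reconstruction

def dependence_penalty(z, eps=1e-8):
    # Mean squared off-diagonal correlation of the encoded attributes;
    # zero when the codes are (linearly) uncorrelated. A stand-in for
    # the paper's dependence measure.
    zc = z - z.mean(dim=0, keepdim=True)
    cov = zc.t() @ zc / (z.shape[0] - 1)
    std = cov.diag().clamp_min(eps).sqrt()
    corr = cov / (std.unsqueeze(0) * std.unsqueeze(1))
    off = corr - torch.eye(z.shape[1])
    return (off ** 2).mean()

# Toy usage: assumes numeric columns are scaled and categorical columns
# are one-hot encoded into a single real-valued matrix.
x = torch.randn(256, 10)                       # stand-in for mixed attributes
model = FusionAutoencoder(n_in=10, n_code=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 1.0                                      # trade-off weight (assumption)
for _ in range(200):
    z, x_hat = model(x)
    loss = nn.functional.mse_loss(x_hat, x) + lam * dependence_penalty(z)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, the encoded attributes z, rather than the raw mixed attributes, would serve as the inputs on which the NBC is built.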