Abstract

Due to its simplicity, efficiency, and effectiveness, multinomial naive Bayes (MNB) has been widely used for text classification. As in naive Bayes (NB), its assumption of the conditional independence of features is often violated, which reduces its classification performance. Of the numerous approaches to alleviating this conditional independence assumption, structure extension has attracted less attention from researchers. To the best of our knowledge, only structure-extended MNB (SEMNB) has been proposed so far. SEMNB averages all weighted super-parent one-dependence multinomial estimators and is therefore an ensemble learning model. In this paper, we propose a single model called hidden MNB (HMNB) by adapting the well-known hidden NB (HNB). HMNB creates a hidden parent for each feature, which synthesizes the influences of all the other qualified features. To train HMNB, we propose a simple but effective learning algorithm that avoids a high-computational-complexity structure-learning process. The same idea can also be used to improve complement NB (CNB) and the one-versus-all-but-one model (OVA), and the resulting models are denoted HCNB and HOVA, respectively. Extensive experiments on eleven benchmark text classification datasets validate the effectiveness of HMNB, HCNB, and HOVA.
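For orientation, the standard MNB decision rule (a general fact about MNB, not specific to this paper) scores a document $d$ with word frequencies $f_1, \dots, f_V$ over a vocabulary of $V$ words as

$$P(c \mid d) \propto P(c) \prod_{i=1}^{V} P(w_i \mid c)^{f_i}.$$

By analogy with hidden NB, an HMNB-style model can be read as replacing each per-word term with a hidden-parent-conditioned estimate, for example

$$P(c \mid d) \propto P(c) \prod_{i=1}^{V} \Big( \sum_{j \neq i} W_{ij}\, P(w_i \mid w_j, c) \Big)^{f_i}, \qquad \sum_{j \neq i} W_{ij} = 1,$$

where the weights $W_{ij}$ and the exact construction of the hidden parent are assumptions made here by analogy with hidden NB and may differ from the paper's definitions.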

Highlights

  • Due to its simplicity, efficiency, and effectiveness, naive Bayes (NB) has been widely used to analyze and solve many scientific and engineering problems, such as text classification [1,2], resistance of buildings [3], identification of areas susceptible to flooding [4], and urban flooding prediction [5]

  • hidden MNB (HMNB) creates a hidden parent for each feature, which synthesizes all the other qualified features’ influences

  • Our improved idea can be used to improve complement NB (CNB) and OVA, and the resulting models are denoted as HCNB and HOVA, respectively


Introduction

Due to its simplicity, efficiency, and effectiveness, naive Bayes (NB) has been widely used to analyze and solve many scientific and engineering problems, such as text classification [1,2], resistance of buildings [3], identification of areas susceptible to flooding [4], and urban flooding prediction [5]. To capture the frequency information of each occurring word, multinomial naive Bayes (MNB) [8] was proposed. As in NB, the assumption of conditional independence among attributes (i.e., features) is usually violated, which reduces classification accuracy. Many approaches have been proposed to alleviate this assumption, but we found that structure extension has attracted much less attention from researchers, and, to the best of our knowledge, only SEMNB has been proposed so far. We therefore propose a single model called hidden multinomial naive Bayes (HMNB); a minimal sketch of the MNB baseline it extends is given below.
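As concrete context for the paragraph above, the following is a minimal sketch of the standard MNB baseline that HMNB extends, written in Python/NumPy. The class name SimpleMNB, the toy data, and the Laplace smoothing choice are illustrative assumptions, not the paper's implementation; the sketch simply scores documents by log P(c) plus the frequency-weighted sum of log P(w_i | c) over word-count features.

import numpy as np

# Minimal multinomial naive Bayes on word-count features (illustrative only).
class SimpleMNB:
    def fit(self, X, y):
        # X: (n_docs, n_words) word-count matrix; y: class labels
        self.classes_ = np.unique(y)
        n_classes, n_words = len(self.classes_), X.shape[1]
        self.log_prior_ = np.zeros(n_classes)
        self.log_likelihood_ = np.zeros((n_classes, n_words))
        for k, c in enumerate(self.classes_):
            Xc = X[y == c]
            # class prior with Laplace smoothing
            self.log_prior_[k] = np.log((Xc.shape[0] + 1.0) / (X.shape[0] + n_classes))
            # word likelihoods with Laplace smoothing over the vocabulary
            counts = Xc.sum(axis=0) + 1.0
            self.log_likelihood_[k] = np.log(counts / counts.sum())
        return self

    def predict(self, X):
        # log-posterior up to a constant: log P(c) + X @ log P(w|c)
        scores = self.log_prior_ + X @ self.log_likelihood_.T
        return self.classes_[np.argmax(scores, axis=1)]

# Toy usage: two classes over a four-word vocabulary.
X = np.array([[3, 0, 1, 0], [2, 1, 0, 0], [0, 0, 2, 3], [0, 1, 1, 4]])
y = np.array([0, 0, 1, 1])
print(SimpleMNB().fit(X, y).predict(X))  # expected: [0 0 1 1]

HMNB would additionally condition each P(w_i | c) on a hidden parent built from the other features; this baseline sketch does not attempt that extension.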

Feature Weighting
Feature Selection
Instance Weighting
Instance Selection
Structure Extension
The Proposed Models
Experiments and Results
Conclusions and Future Study