A New Instance-weighting Naive Bayes Text Classifiers

Yongcheng Wu

doi:10.1109/irce.2018.8492960

Abstract

Recent research shows that naive Bayes text classifiers achieve notable classification performance despite their strong assumption of conditional independence among features. Three general approaches exist for weakening this unrealistic assumption and improving classification accuracy: structure manipulation, feature manipulation, and instance manipulation. Instance manipulation can be further divided into instance weighting and instance selection. In this paper, we propose a new instance-weighting approach for naive Bayes text classifiers. In this approach, the training dataset is first divided into several subsets according to class value. Every training instance in a subset is then weighted according to the distance between it and the mean of that subset. Experimental results on 15 text document datasets show that, in terms of classification accuracy, our method performs better than three existing naive Bayes text classifiers.
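The two steps described in the abstract (split the training set by class, then weight each instance by its distance to the class mean) can be illustrated with a minimal sketch. The abstract does not state the distance measure or how distance maps to a weight, so the Euclidean distance, the mapping w = 1 / (1 + d), the Laplace smoothing parameter `alpha`, and all function names below are assumptions made for illustration only, not the authors' formulation.

```python
import numpy as np


def class_mean_weights(X, y):
    """Weight each instance by its distance to the mean of its class subset.

    Assumption (not from the paper): Euclidean distance and the mapping
    w = 1 / (1 + d), so instances closer to the class mean weigh more.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    w = np.empty(X.shape[0])
    for c in np.unique(y):
        idx = y == c                      # subset for this class value
        mean = X[idx].mean(axis=0)        # mean vector of the subset
        d = np.linalg.norm(X[idx] - mean, axis=1)
        w[idx] = 1.0 / (1.0 + d)          # hypothetical weighting function
    return w


def fit_weighted_multinomial_nb(X, y, w, alpha=1.0):
    """Multinomial naive Bayes where each instance contributes w[i] counts."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    log_prior = np.empty(len(classes))
    log_like = np.empty((len(classes), X.shape[1]))
    for k, c in enumerate(classes):
        idx = y == c
        wc = w[idx]
        # Weighted class prior
        log_prior[k] = np.log(wc.sum() / w.sum())
        # Weighted word counts with Laplace smoothing (alpha is an assumption)
        counts = (wc[:, None] * X[idx]).sum(axis=0) + alpha
        log_like[k] = np.log(counts / counts.sum())
    return classes, log_prior, log_like


def predict(X, classes, log_prior, log_like):
    """Return the class with the highest log-posterior score per document."""
    X = np.asarray(X, dtype=float)
    scores = X @ log_like.T + log_prior
    return classes[np.argmax(scores, axis=1)]


# Toy term-count matrix: 4 documents, 3 terms, 2 classes
X_train = [[3, 0, 1], [2, 0, 0], [0, 2, 1], [0, 3, 0]]
y_train = [0, 0, 1, 1]
weights = class_mean_weights(X_train, y_train)
model = fit_weighted_multinomial_nb(X_train, y_train, weights)
predictions = predict([[4, 0, 0], [0, 1, 0]], *model)
```

The same weights could instead be passed to an existing implementation, for example through the `sample_weight` argument of scikit-learn's `MultinomialNB.fit`, rather than re-implementing the classifier.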
