Abstract

Text classification is one domain in which the naive Bayesian (NB) learning algorithm performs remarkably well. However, further improving its performance with ensemble-building techniques has proven to be a challenge because NB is a stable algorithm. This work shows that, while an ensemble of NB classifiers achieves little or no improvement in classification accuracy, an ensemble of fine-tuned NB classifiers can achieve a remarkable improvement. We propose a fine-tuning algorithm for text classification that is both more accurate and less stable than the NB algorithm and the fine-tuning NB (FTNB) algorithm. These properties make it more suitable than the FTNB algorithm for building ensembles of classifiers using bagging. Our empirical experiments, using 16 benchmark text-classification data sets, show significant improvements in accuracy for most data sets.
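
To make the fine-tuning idea concrete, the sketch below shows a generic error-driven fine-tuning loop for a multinomial NB text classifier: after standard training, the class-conditional term probabilities are nudged up for the true class and down for the wrongly predicted class of each misclassified training document. This is only an illustration of the general idea; the exact update rules of FTNB [14] and of the gradual variant proposed here differ, and the helper names and constants are assumptions made for the example.

```python
# Illustrative sketch only: a generic error-driven fine-tuning loop for a
# multinomial NB text classifier. The concrete update rules of FTNB [14]
# and the gradual variant differ; eta, max_iter, and all helpers here are
# assumptions made for the example.
import numpy as np

def train_nb(docs, labels, vocab, classes, alpha=1.0):
    """Standard multinomial NB with Laplace smoothing."""
    priors = np.array([np.mean([y == c for y in labels]) for c in classes])
    cond = np.full((len(classes), len(vocab)), alpha)
    t_index = {t: i for i, t in enumerate(vocab)}
    for doc, y in zip(docs, labels):
        ci = classes.index(y)
        for t in doc:
            if t in t_index:
                cond[ci, t_index[t]] += 1
    cond /= cond.sum(axis=1, keepdims=True)          # P(term | class)
    return np.log(priors), cond, t_index

def predict(doc, log_priors, cond, t_index):
    idx = [t_index[t] for t in doc if t in t_index]
    return int(np.argmax(log_priors + np.log(cond[:, idx]).sum(axis=1)))

def fine_tune(docs, labels, log_priors, cond, t_index, classes,
              eta=0.01, max_iter=20):
    """Nudge P(term | class) for the terms of each misclassified training
    document: up for the true class, down for the predicted class, then
    renormalise the affected rows."""
    for _ in range(max_iter):
        errors = 0
        for doc, y in zip(docs, labels):
            yi = classes.index(y)
            pi = predict(doc, log_priors, cond, t_index)
            if pi != yi:
                errors += 1
                for t in doc:
                    if t in t_index:
                        cond[yi, t_index[t]] *= (1 + eta)
                        cond[pi, t_index[t]] *= (1 - eta)
                cond[yi] /= cond[yi].sum()
                cond[pi] /= cond[pi].sum()
        if errors == 0:                               # training set fitted
            break
    return log_priors, cond
```

Because the updates depend on exactly which training documents are misclassified, a small change in the training sample can lead to a noticeably different fine-tuned classifier, which is the source of the reduced stability that bagging exploits.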

Highlights

  • In text classification, the task is to assign a document to one category from a predefined set of categories

  • The related-work section is divided into two subsections: the first reviews work on ensembles of classifiers in general, and on building ensembles of naive Bayesian (NB) classifiers in particular; the second reviews the FTNB algorithm [14] for fine-tuning NB classifiers

  • Our results showed that the Gradual FTNB (GFTNB) algorithm outperformed the FTNB algorithm in terms of the average classification accuracy for the 16 text-classification data sets, and in terms of the number of data sets for which it achieved better and significantly better average accuracy

Summary

Introduction

In text classification, the task is to assign a document to one category from a predefined set of categories. Bagging [5] and boosting [6,7] are probably the most widely used methods for building ensembles of classifiers; they train the constituent classifiers using different samples of the training data. Making further improvements by building an ensemble of several NB classifiers is a challenge because NB is a stable algorithm [12], in the sense that a small change in the training data does not lead to a substantially different classifier. We use the fine-tuning method to generate a diverse ensemble of NB classifiers for text classification, as sketched below.
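
As a concrete illustration of this bagging setup (not the authors' implementation), the following sketch builds an ensemble of multinomial NB text classifiers with scikit-learn, where each constituent classifier is trained on a bootstrap sample of the training documents. The toy documents, labels, and parameter values are assumptions for the example, and the fine-tuning step is omitted.

```python
# Minimal sketch of bagging NB text classifiers with scikit-learn.
# This illustrates the generic bagging setup only; it does not include the
# fine-tuning step the paper uses to make the base classifiers less stable.
# All data and parameter values below are made up for the example.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import BaggingClassifier

# Toy training corpus (hypothetical documents and category labels).
docs = ["stocks fall on weak earnings",
        "team wins the championship final",
        "markets rally after rate cut",
        "coach praises the young squad"]
labels = ["business", "sports", "business", "sports"]

# Bag-of-words features, as is typical for NB text classification.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Each constituent NB classifier is trained on a different bootstrap sample
# of the training data; the ensemble aggregates their predictions.
ensemble = BaggingClassifier(MultinomialNB(), n_estimators=10, random_state=0)
ensemble.fit(X, labels)

print(ensemble.predict(vectorizer.transform(["markets fall again"])))
```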

Related Work
Building Ensembles of Classifiers
Fine-Tuning the NB Algorithm
Bagging NB and the Fine-Tuning Algorithms
Bagging the NB and FTNB Algorithms for Text Classification
Building
Comparing
Modifying the Termination Condition
Findings
Conclusions