Abstract

Several supervised machine learning applications are commonly represented as multi-class problems, but distinguishing among several classes is harder than distinguishing between just two. In contrast to the one-against-all and all-pairs approaches, which transform a multi-class problem into a set of binary problems, Dichotomy Transformation (DT) converts a multi-class problem into a different problem whose goal is to verify whether a pair of documents belongs to the same class. To perform this task, DT generates a dichotomy set by combining pairs of documents, where each pair belongs either to the positive class (the documents in the pair have the same class) or to the negative class (the documents in the pair come from different classes). The definition of this dichotomy set plays an important role in the overall accuracy of the system. An alternative to searching for the single best dichotomy set is to use a multiple classifier system: many different dichotomy sets can be generated, and each one is used to train one binary classifier instead of relying on only one dichotomy set. Herein we propose Combined Dichotomy Transformations (CoDiT), a Text Categorization system that combines binary classifiers trained with different dichotomy sets produced by DT. Because DT pairs documents, the number of training examples grows quadratically compared with the original training set. This is a desirable property: each classifier can be trained with different data without reducing the number of examples or features, so it is possible to compose an ensemble of diverse and strong classifiers. Experiments on 14 databases show that CoDiT achieves statistically better results than SVM, Bagging, Random Subspace, BoosTexter, and Random Forest.
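
To make the transformation concrete, the minimal Python sketch below generates a dichotomy set from a small multi-class dataset. It assumes documents are already represented as numeric feature vectors and encodes each pair as the element-wise absolute difference of its two vectors; the encoding, function name, and toy data are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from itertools import combinations

def dichotomy_transform(X, y):
    """Convert a multi-class dataset into a binary 'same class?' dataset.

    Each pair of documents (i, j) becomes one training example. Here a
    pair is encoded as the element-wise absolute difference of the two
    feature vectors (an illustrative choice). The label is 1 for the
    positive class (both documents share a class) and 0 for the negative
    class (the documents come from different classes).
    """
    pairs, labels = [], []
    for i, j in combinations(range(len(X)), 2):
        pairs.append(np.abs(X[i] - X[j]))
        labels.append(1 if y[i] == y[j] else 0)
    return np.array(pairs), np.array(labels)

# Four documents from three classes yield C(4, 2) = 6 pairwise examples.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]])
y = np.array([0, 0, 1, 2])
X_pairs, y_pairs = dichotomy_transform(X, y)
print(X_pairs.shape)  # (6, 2)
print(y_pairs)        # [1 0 0 0 0 0]
```

For n training documents this yields n(n-1)/2 pairwise examples, which is why many different dichotomy sets can be drawn from the same original data to train the individual binary classifiers of the ensemble.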
