Abstract

Error-Correcting Output Codes (ECOC) ensembles of binary classifiers are used in Text Categorisation to improve accuracy while benefiting from learning algorithms that only support two classes. An accurate ensemble relies on the quality of its decomposition matrix, which in turn depends on the separation between the categories and the diversity of the dichotomies representing the binary classifiers. Important open questions include finding a good definition of diversity between two dichotomies and a way of combining all the pairwise diversity values into a single indicator, which we call the decomposition quality. In this work we introduce a new measure to estimate the diversity between two learners and compare it to the well-known Hamming distance. We also examine three functions to evaluate the decomposition quality. We present a set of experiments in which these measures and functions are tested on two distinct document corpora, with several configurations of each. The analysis of the results shows a weak relationship between ensemble accuracy and diversity.

Highlights

  • Monolithic multi-category machine learning algorithms have successfully been used in a number of application areas such as text classification

  • Our current work focuses on the Error Correcting Output Codes (ECOC) ensemble decomposition method [7], where the general multi-category classification problem or polychotomy is decomposed into a set of dichotomies, each one of them targeted at a particular subset of categories, with each dichotomy processed by a binary classifier

  • Each one of these two tables contains eight ranges of measurement obtained in the experiments for each ensemble of binary classifiers


Summary

Introduction

Monolithic multi-category machine learning algorithms have successfully been used in a number of application areas such as text classification. Our current work focuses on the Error Correcting Output Codes (ECOC) ensemble decomposition method [7], where the general multi-category classification problem, or polychotomy, is decomposed into a set of dichotomies. Each dichotomy targets a particular subset of categories and is processed by a binary classifier. These category subsets are chosen so that a certain number of prediction errors can be recovered, giving the ensemble an error-correcting ability that helps improve accuracy. We contribute here a definition of diversity and an experimental evaluation of its use as a parameter to improve the ensemble's classification accuracy. We want this global diversity measure to be independent of the separation of categories, as opposed to some related work [12].
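
To make the decomposition concrete, the following is a minimal sketch of ECOC decoding and of Hamming distance between dichotomies. It is not the authors' implementation: the 4-category, 7-dichotomy matrix is an illustrative exhaustive code, and the names CODE_MATRIX, hamming, decode, min_row_distance and pairwise_column_distances are hypothetical. Rows stand for categories and columns for dichotomies. The new diversity measure and the three quality functions studied in the paper are not defined in this section, so they are not reproduced here.

```python
from itertools import combinations

# Illustrative decomposition matrix (an assumption, not taken from the paper).
# Each row is the codeword of one category; each column is one dichotomy,
# i.e. the 0/1 labelling that one binary classifier is trained on.
CODE_MATRIX = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0],
]


def hamming(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(outputs, code_matrix=CODE_MATRIX):
    """Return the index of the category whose codeword is closest
    (in Hamming distance) to the vector of binary classifier outputs."""
    distances = [hamming(outputs, row) for row in code_matrix]
    return distances.index(min(distances))


def min_row_distance(code_matrix=CODE_MATRIX):
    """Smallest Hamming distance between two category codewords.
    This reflects the separation between categories."""
    return min(hamming(r1, r2) for r1, r2 in combinations(code_matrix, 2))


def pairwise_column_distances(code_matrix=CODE_MATRIX):
    """Hamming distance between every pair of dichotomies (columns).
    These are the pairwise values that a quality function would combine."""
    columns = list(zip(*code_matrix))
    return [hamming(c1, c2) for c1, c2 in combinations(columns, 2)]


# Codeword of category 2 with one classifier output flipped (position 5).
noisy_outputs = [0, 0, 1, 1, 0, 1, 1]
predicted = decode(noisy_outputs)
```

With a minimum row distance of d, nearest-codeword decoding can recover up to floor((d - 1) / 2) wrong classifier outputs, which is why the single flipped bit in noisy_outputs still decodes to category 2. The column distances are the kind of pairwise diversity values that the paper seeks to combine into a single decomposition quality indicator.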

ECOC Ensembles
Diversity Measures
Approach
Experimental Configuration
Experimental Results
Conclusions