Accuracy and Diversity in Ensembles of Text Categorisers

Juan Jose García Adeva, Ulises Cervino Beresi, Rafael A Calvo

doi:10.19153/cleiej.8.2.1

Abstract

Error-Correcting Output Codes (ECOC) ensembles of binary classifiers are used in Text Categorisation to improve accuracy while benefiting from learning algorithms that only support two classes. An accurate ensemble relies on the quality of its corresponding decomposition matrix, which in turn depends on the separation between the categories and the diversity of the dichotomies representing the binary classifiers. Important open questions include finding a good definition of diversity between two dichotomies and a way of combining all the pairwise diversity values into a single indicator that we call the decomposition quality. In this work we introduce a new measure to estimate the diversity between two learners and we compare it to the well-known Hamming distance. We also examine three functions to evaluate the decomposition quality. We present a set of experiments where these measures and functions are tested using two distinct document corpora with several configurations in each. The analysis of the results shows a weak relationship between the ensemble accuracy and its diversity.
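As an illustration of the quantities named in the abstract, the following is a minimal sketch that computes the Hamming distance between every pair of dichotomies (columns of a decomposition matrix) and then collapses the pairwise values into a single number. The matrix, the function names, and the two aggregation choices (minimum and mean) are assumptions made for this example; the abstract does not specify the paper's new diversity measure or its three decomposition-quality functions, so none of them is reproduced here.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions where two equal-length dichotomies differ."""
    if len(a) != len(b):
        raise ValueError("dichotomies must cover the same categories")
    return sum(x != y for x, y in zip(a, b))


def pairwise_diversity(matrix):
    """Return the Hamming distance for every pair of columns (dichotomies)."""
    columns = list(zip(*matrix))  # transpose: one tuple per dichotomy
    return [hamming(c1, c2) for c1, c2 in combinations(columns, 2)]


# Hypothetical decomposition matrix: 4 categories (rows) x 3 dichotomies (columns).
# +1 / -1 marks which side of the dichotomy a category falls on.
MATRIX = [
    [+1, +1, +1],
    [+1, -1, -1],
    [-1, +1, -1],
    [-1, -1, +1],
]

diversities = pairwise_diversity(MATRIX)

# Two illustrative (assumed) ways of combining pairwise values into one indicator.
quality_min = min(diversities)
quality_mean = sum(diversities) / len(diversities)

print(diversities, quality_min, quality_mean)
```

The minimum is sensitive to the single least diverse pair, while the mean describes the matrix as a whole; which behaviour is preferable is exactly the kind of question the abstract leaves open.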

Highlights

Monolithic multi-category machine learning algorithms have successfully been used in a number of application areas such as text classification.
Our current work focuses on the Error Correcting Output Codes (ECOC) ensemble decomposition method [7], where the general multi-category classification problem or polychotomy is decomposed into a set of dichotomies, each one of them targeted at a particular subset of categories, with each dichotomy processed by a binary classifier.
Each one of these two tables contains eight ranges of measurement obtained in the experiments for each ensemble of binary classifiers.

Summary

Introduction

Monolithic multi-category machine learning algorithms have successfully been used in a number of application areas such as text classification. Our current work focuses on the Error Correcting Output Codes (ECOC) ensemble decomposition method [7], where the general multi-category classification problem, or polychotomy, is decomposed into a set of dichotomies, each targeted at a particular subset of categories and processed by a binary classifier. These category subsets are chosen so that a certain number of prediction errors can be recovered, offering an error-correcting ability that helps improve accuracy. We contribute here a definition of diversity and an experimental evaluation of its use as a parameter to improve the ensemble's classification accuracy. We want this global diversity measure to be independent of the separation of categories, as opposed to some related work [12].
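The error-correcting ability described above can be sketched with a small example. It assumes a common ECOC decoding rule, namely choosing the category whose codeword is nearest in Hamming distance to the vector of binary classifier outputs. The category names and the 4 x 7 code matrix are hypothetical and chosen only so that every pair of codewords differs in four positions; the source does not state which decoding rule or matrices the authors used.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions where two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical codewords: one row per category, one bit per binary classifier.
CODEWORDS = {
    "sport":    (1, 1, 1, 1, 1, 1, 1),
    "politics": (0, 0, 0, 0, 1, 1, 1),
    "science":  (0, 0, 1, 1, 0, 0, 1),
    "finance":  (0, 1, 0, 1, 0, 1, 0),
}


def decode(outputs):
    """Return the category whose codeword is closest to the classifier outputs."""
    return min(CODEWORDS, key=lambda cat: hamming(CODEWORDS[cat], outputs))


# Smallest distance between any two codewords (the separation between categories).
min_row_distance = min(
    hamming(a, b) for a, b in combinations(CODEWORDS.values(), 2)
)

# Outputs for a "science" document where the sixth binary classifier is wrong.
noisy_outputs = (0, 0, 1, 1, 0, 1, 1)

print(min_row_distance, decode(noisy_outputs))
```

With a minimum distance of d between codewords, nearest-codeword decoding recovers from up to floor((d - 1) / 2) wrong binary predictions, which is one error for this matrix.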

ECOC Ensembles

Diversity Measures

Approach

Experimental Configuration

Experimental Results

Conclusions


Journal: CLEI Electronic Journal
Publication Date: Dec 1, 2005
Citations: 31
License type: CC BY 4.0
