Abstract

The concept of 'diversity' has been one of the main open issues in the field of multiple classifier systems. In this paper we address a facet of diversity related to its effectiveness for ensemble construction, namely, explicitly using diversity measures in ensemble construction techniques based on the kind of overproduce-and-choose strategy known as ensemble pruning. Such a strategy consists of selecting the (hopefully) most accurate subset of classifiers out of an original, larger ensemble. Whereas several existing pruning methods use some combination of individual classifiers' accuracy and diversity, it is still unclear whether such an evaluation function is better than the bare estimate of ensemble accuracy. We empirically investigate this issue by comparing two evaluation functions in the context of ensemble pruning: the estimate of ensemble accuracy, and its linear combination with several well-known diversity measures. This can also be viewed as using diversity as a regularizer, as suggested by some authors. To this aim we use a pruning method based on forward selection, since it allows a direct comparison between different evaluation functions. Experiments on thirty-seven benchmark data sets, four diversity measures and three base classifiers provide evidence that using diversity measures for ensemble pruning can be advantageous over using only ensemble accuracy, and that diversity measures can act as regularizers in this context.
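The pruning scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the pairwise disagreement measure, the majority-vote combiner, and the `lam` weight that mixes accuracy with diversity are illustrative assumptions standing in for the several diversity measures and the linear combination the paper actually evaluates.

```python
import numpy as np

def disagreement(preds):
    """Average pairwise disagreement among classifiers' predictions.
    preds: (n_classifiers, n_samples) array of predicted class labels.
    Used here as a stand-in for the paper's diversity measures."""
    m = len(preds)
    if m < 2:
        return 0.0
    total, pairs = 0.0, 0
    for i in range(m):
        for j in range(i + 1, m):
            total += np.mean(preds[i] != preds[j])
            pairs += 1
    return total / pairs

def majority_vote(preds):
    """Column-wise majority vote over predicted labels (ties go to the
    smallest label, via bincount/argmax)."""
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, preds)

def forward_prune(all_preds, y_val, size, lam=0.0):
    """Greedy forward selection: at each step, add the classifier that
    maximises f = validation accuracy of the pruned ensemble
    + lam * diversity.  With lam=0 this reduces to the bare
    ensemble-accuracy evaluation function the paper compares against."""
    selected = []
    remaining = list(range(len(all_preds)))
    while len(selected) < size and remaining:
        best, best_score = None, -np.inf
        for c in remaining:
            cand = selected + [c]
            preds = all_preds[cand]
            acc = np.mean(majority_vote(preds) == y_val)
            score = acc + lam * disagreement(preds)
            if score > best_score:
                best, best_score = c, score
        selected.append(best)
        remaining.remove(best)
    return selected
```

Setting `lam=0` gives the accuracy-only baseline, while `lam>0` implements the diversity-as-regularizer view: the selection criterion trades a little validation accuracy for more disagreement among the retained classifiers.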

Highlights

  • During twenty years of research in the classifier ensemble field, understanding the notion of diversity has been one of the main goals [1, 2]

  • Our results show that using diversity measures for ensemble pruning can be advantageous over using only ensemble accuracy, and that diversity measures can act as regularizers in this context

  • Whereas the usefulness of diversity measures for ensemble construction has been questioned by some authors, their specific role as regularizers has been recently pointed out in [14] based on theoretical results as well as on empirical evidence in the context of ensemble pruning, in a specific setting


Introduction

During twenty years of research in the classifier ensemble field, understanding the notion of diversity has been one of the main goals [1, 2]. The measure derived in [8] (which we extended in [6]) was inspired by the ambiguity decomposition of [9] and provided useful insights, leading to the concept of 'good' and 'bad' patterns of diversity. Such measures were motivated by the goal of obtaining exact, additive decompositions of the ensemble error into terms accounting for individual classifiers' performance and terms hopefully interpretable as diversity. The connection between ensemble performance on one side, and the pattern of individual classifiers' performance and existing diversity measures on the other, has been investigated both empirically and analytically (e.g., [4, 10]). Such a relationship turned out to be far from clear-cut, and no 'right' diversity measure has emerged so far.
