Auto-adaptive Grammar-Guided Genetic Programming algorithm to build Ensembles of Multi-Label Classifiers

Jose M. Moyano, Sebastián Ventura

doi:10.1016/j.inffus.2021.07.005

Abstract

Multi-label classification has been used to solve a wide range of problems where each example in the dataset may be related either to one class (as in traditional classification problems) or to several class labels at the same time. Many ensemble-based approaches have been proposed in the literature, aiming to improve the performance of traditional multi-label classification algorithms. However, most of them do not consider the data characteristics to build the ensemble, and those that do consider them require many parameters to be tuned to maximize their performance.

In this paper, we propose an Auto-adaptive algorithm based on Grammar-Guided Genetic Programming to generate Ensembles of Multi-Label Classifiers based on projections of k labels (AG3P-kEMLC). It creates a tree-shaped ensemble, where each leaf is a multi-label classifier focused on a subset of k labels. Unlike other methods in the literature, our proposal can deal with different values of k in the same ensemble, instead of fixing one specific value. It also includes an auto-adaptive process that reduces the number of hyper-parameters to tune, prevents overfitting and reduces the runtime. Three versions of the algorithm are proposed. The first, fixed, uses the same value of k for all multi-label classifiers in the ensemble. The remaining two deal with different values of k in the same ensemble: uniform gives every available value of k the same probability of being chosen, and gaussian favors the selection of smaller values of k.

The experimental study, carried out on twenty reference datasets with five evaluation metrics and comparing against eleven ensemble methods, demonstrates that our proposal performs significantly better than the state-of-the-art methods.
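The three versions differ only in how the labelset size k is drawn for each ensemble member, and the leaves are then combined through a tree. The sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation. The names `sample_k`, `sample_labelsets`, `combine` and `predict` are hypothetical, the half-normal shape used for the gaussian version, the default range 2 ≤ k ≤ q − 1 (q being the number of labels) and the rule that an internal node averages its children's per-label scores are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional, Tuple, Union

def sample_k(mode: str, k_min: int, k_max: int, rng: random.Random,
             fixed_k: int = 3, sigma: float = 1.0) -> int:
    """Draw the labelset size k for one ensemble member (hypothetical strategies)."""
    if mode == "fixed":
        # Same k for every member of the ensemble.
        return fixed_k
    if mode == "uniform":
        # Every k in [k_min, k_max] is equally likely.
        return rng.randint(k_min, k_max)
    if mode == "gaussian":
        # Assumed shape: half-normal offset from k_min, redrawn until it fits,
        # so smaller values of k are the most likely.
        while True:
            k = k_min + int(abs(rng.gauss(0.0, sigma)))
            if k <= k_max:
                return k
    raise ValueError(f"unknown mode: {mode}")

def sample_labelsets(n_members: int, n_labels: int, mode: str, k_min: int = 2,
                     k_max: Optional[int] = None, seed: int = 0,
                     **kwargs) -> List[Tuple[int, ...]]:
    """Return one random k-labelset (sorted label indices) per ensemble member."""
    rng = random.Random(seed)
    if k_max is None:
        k_max = n_labels - 1  # assumption: keep k strictly below the label count
    labelsets = []
    for _ in range(n_members):
        k = sample_k(mode, k_min, k_max, rng, **kwargs)
        labelsets.append(tuple(sorted(rng.sample(range(n_labels), k))))
    return labelsets

# A leaf maps label index -> predicted score; an internal node is a list of children.
Node = Union[Dict[int, float], list]

def combine(node: Node) -> Dict[int, float]:
    """Average, per label, the scores of the children that cover that label."""
    if isinstance(node, dict):
        return dict(node)
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for child in node:
        for label, score in combine(child).items():
            sums[label] = sums.get(label, 0.0) + score
            counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(tree: Node, threshold: float = 0.5) -> Dict[int, int]:
    """Binarize the combined scores at the root of the tree."""
    return {label: int(score >= threshold) for label, score in combine(tree).items()}
```

For example, `sample_labelsets(10, 8, "gaussian", seed=42)` yields ten labelsets that are mostly of size 2 or 3, while `"uniform"` spreads the sizes evenly over 2 to 7. For a small hand-built tree, `predict([[{0: 0.9, 1: 0.2}, {1: 0.4, 2: 0.8}], {0: 0.3, 2: 0.6}])` returns `{0: 1, 1: 0, 2: 1}`. Averaging only over the children that actually cover a label keeps a leaf from voting on labels outside its k-labelset.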

Highlights

The experimental study, carried out on 20 datasets using 5 evaluation metrics, demonstrates that our proposal obtains significantly better performance than state-of-the-art Ensembles of Multi-Label Classifiers (EMLCs)
The performance of AG3P-kEMLC is analyzed, and it is compared with the state-of-the-art EMLCs

Summary

Introduction

Classification is a machine learning task which aims to build a model able to predict one of the predefined categorical classes for a given input instance [1]. However, a wide range of real-world problems do not fit the restrictions of traditional classification, where each instance is associated with only one class. Examples of such problems are medical diagnosis (where each patient may have more than one disease) [2,3], image annotation (an image could be labeled with more than one item appearing in it) [4,5] and emotion detection (a person could be feeling more than one emotion at the same time) [6,7]. Multi-label classification addresses these problems by allowing each instance to be associated with several labels simultaneously. The output labels tend to be correlated, with some of them appearing together more frequently than others. Several studies have demonstrated that tackling these characteristics of multi-label data, such as label correlation, improves the performance of multi-label methods [10,11,12,13].
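The points above can be made concrete with a binary label matrix, where each row is an instance and each column a label. The sketch below uses a hypothetical toy emotion dataset (not data from the paper) and counts how often each pair of labels occurs together, which is one simple way to observe label correlation; the average number of labels per instance (label cardinality) above 1 shows that instances carry more than one class. The function names are illustrative only.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence

def label_cooccurrence(Y: Sequence[Sequence[int]]) -> Counter:
    """Count how many instances contain each label pair (i, j), with i < j."""
    counts: Counter = Counter()
    for row in Y:
        active = [j for j, value in enumerate(row) if value]
        for pair in combinations(active, 2):
            counts[pair] += 1
    return counts

def label_cardinality(Y: Sequence[Sequence[int]]) -> float:
    """Average number of relevant labels per instance."""
    return sum(sum(row) for row in Y) / len(Y)

# Hypothetical toy emotion data; columns are: happy, surprised, sad.
Y = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]

print(label_cooccurrence(Y))  # Counter({(0, 1): 2, (0, 2): 1})
print(label_cardinality(Y))   # 1.75
```

Here labels 0 and 1 appear together twice, while labels 1 and 2 never do, which is the kind of structure that methods exploiting label correlations try to capture.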

Objectives

Results

Discussion

Conclusion