Abstract
The idea of exploiting label dependencies for better prediction is at the core of methods for multi-label classification (MLC), and performance improvements are normally explained in this way. Surprisingly, however, there is no established methodology that allows one to analyze the dependence-awareness of MLC algorithms. With that goal in mind, we introduce a class of loss functions that are able to capture the important aspect of label dependence. To this end, we leverage the mathematical framework of non-additive measures and integrals. Roughly speaking, a non-additive measure allows for modeling the importance of correct predictions of label subsets (instead of single labels), and thereby their impact on the overall evaluation, in a flexible way. The well-known Hamming and subset 0/1 losses are rather extreme special cases of this function class, assigning full importance to individual labels or to the entire label set, respectively. We present concrete instantiations of this class, which appear to be especially appealing from a modeling perspective. The assessment of multi-label classifiers in terms of these losses is illustrated in an empirical study, clearly showing their aptness at capturing label dependencies. Finally, while not being the main goal of this study, we also show some preliminary results on the minimization of this parametrized family of losses.
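The contrast between the two extremes can be made concrete with a small sketch. Assuming (as the abstract suggests) that a generalized loss evaluates the *set* of mislabeled positions with a possibly non-additive measure, an additive measure recovers Hamming loss and an "all or nothing" measure recovers subset 0/1 loss. The function names and the exact form of the measure `mu` below are illustrative assumptions, not the paper's notation:

```python
def hamming_loss(y_true, y_pred):
    # Fraction of individually mislabeled positions (each label counts on its own).
    m = len(y_true)
    return sum(t != p for t, p in zip(y_true, y_pred)) / m

def subset_zero_one_loss(y_true, y_pred):
    # 1 iff any label is wrong: full importance on the entire label set.
    return int(y_true != y_pred)

def measure_loss(y_true, y_pred, mu):
    # Generalized loss (illustrative): apply a measure `mu` to the set of
    # mislabeled label indices. `mu` maps a frozenset of indices to [0, 1],
    # with mu(frozenset()) == 0; it need not be additive.
    errors = frozenset(
        i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p
    )
    return mu(errors)

m = 4
y_true = [1, 0, 1, 0]
y_pred = [1, 1, 1, 0]

additive = lambda A: len(A) / m          # additive measure -> Hamming loss
all_or_nothing = lambda A: float(len(A) > 0)  # -> subset 0/1 loss

assert measure_loss(y_true, y_pred, additive) == hamming_loss(y_true, y_pred)
assert measure_loss(y_true, y_pred, all_or_nothing) == subset_zero_one_loss(y_true, y_pred)
```

Any measure strictly between these two extremes (e.g., one that weights certain label subsets more heavily) would yield a loss that rewards getting dependent labels right *together*, which is the flexibility the paper exploits.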
Highlights
The setting of multi-label classification (MLC), which generalizes standard multi-class classification by relaxing the assumption of mutual exclusiveness of classes, has received a lot of attention in the recent machine learning literature—we refer to Tsoumakas et al. (2010) and Zhang and Zhou (2014) for survey articles on this topic.
The idea of exploiting statistical dependencies between the labels in order to improve predictive performance on the level of the entire label set is a major theme in research on multi-label classification, and many MLC methods proposed in the literature are motivated by this idea.
We introduce a new class of MLC loss functions that are able to capture the important aspect of label dependence in a sophisticated and controllable manner.
Summary
The setting of multi-label classification (MLC), which generalizes standard multi-class classification by relaxing the assumption of mutual exclusiveness of classes, has received a lot of attention in the recent machine learning literature—we refer to Tsoumakas et al. (2010) and Zhang and Zhou (2014) for survey articles on this topic. A straightforward approach to learning a multi-label predictor is via a reduction to binary classification, i.e., by training one binary classifier per label and combining the predictions of these classifiers into an overall multi-label prediction. This approach, known as binary relevance (BR) learning, is often criticized for ignoring possible label dependencies, because each label is predicted independently of all other labels. The idea of exploiting statistical dependencies between the labels in order to improve predictive performance on the level of the entire label set is a major theme in research on multi-label classification, and many MLC methods proposed in the literature are motivated by this idea.
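The binary relevance reduction described above can be sketched in a few lines. The stub classifier and helper names here are hypothetical placeholders for any real binary learner; the point is only the structure of the reduction, with one independent model per label column:

```python
class MajorityStub:
    # Trivial per-label "classifier": predicts the majority class seen in
    # training. A stand-in for any real binary learner (trees, SVMs, ...).
    def fit(self, X, y):
        self.pred = int(2 * sum(y) >= len(y))
        return self

    def predict(self, X):
        return [self.pred for _ in X]

def binary_relevance_fit(X, Y, make_clf):
    # Train one independent binary classifier per label column of Y.
    # Each classifier sees only "its" label -- dependencies are ignored.
    n_labels = len(Y[0])
    return [make_clf().fit(X, [row[j] for row in Y]) for j in range(n_labels)]

def binary_relevance_predict(clfs, X):
    # Combine the per-label predictions into full label vectors.
    per_label = [clf.predict(X) for clf in clfs]
    return [list(labels) for labels in zip(*per_label)]

X = [[0.0], [1.0], [2.0]]
Y = [[1, 0], [1, 1], [0, 0]]  # two labels per instance
clfs = binary_relevance_fit(X, Y, MajorityStub)
print(binary_relevance_predict(clfs, X))
```

Because each per-label classifier is fit in isolation, BR can never adjust one label's prediction based on another's, which is exactly the limitation that dependence-aware methods (and dependence-aware losses) aim to expose.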