How Is a Data-Driven Approach Better than Random Choice in Label Space Division for Multi-Label Classification?

Piotr Szymański,Tomasz Kajdanowicz,Kristian Kersting

doi:10.3390/e18080282


Open Access

https://doi.org/10.3390/e18080282


Abstract

We propose using five data-driven community detection approaches from social network analysis to partition the label space in multi-label classification, as an alternative to the random partitioning into equal subsets performed by RAkELd. We evaluate modularity maximization using the fast greedy and leading eigenvector approximations, as well as the infomap, walktrap and label propagation algorithms. For this purpose, we construct a label co-occurrence graph (in both weighted and unweighted versions) from the training data and perform community detection on it to partition the label set. Each partition then constitutes the label space of a separate multi-label classification sub-problem. As a result, we obtain an ensemble of multi-label classifiers that jointly covers the whole label space. Using the binary relevance and label powerset classification methods, we compare community detection approaches to label space division against random baselines on 12 benchmark datasets over five evaluation measures. We find that data-driven approaches are more efficient and more likely to outperform RAkELd than binary relevance or label powerset is, in every evaluated measure. For all measures apart from Hamming loss, data-driven approaches are significantly better than RAkELd (α = 0.05), and at least one data-driven approach is more likely to outperform RAkELd than a priori methods in the case of RAkELd's best performance. This is the largest RAkELd evaluation published to date, with 250 samplings per value for 10 values of the RAkELd parameter k on 12 datasets.
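
The construction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `cooccurrence_graph` and `label_propagation` and the toy label sets are hypothetical, and the deterministic tie-breaking (the alphabetically smallest community name wins) is a simplification of label propagation, which normally breaks ties at random.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(label_sets):
    """Weighted graph: weight[u][v] = number of samples carrying both u and v."""
    weight = defaultdict(lambda: defaultdict(int))
    for labels in label_sets:
        for label in labels:
            weight[label]  # register labels that never co-occur with others
        for u, v in combinations(sorted(labels), 2):
            weight[u][v] += 1
            weight[v][u] += 1
    return weight


def label_propagation(weight, max_iter=100):
    """Simplified, deterministic label propagation (assumption: ties go to
    the alphabetically smallest community name instead of a random one)."""
    community = {node: node for node in weight}
    for _ in range(max_iter):
        changed = False
        for node in sorted(weight):
            if not weight[node]:
                continue  # isolated label keeps its own community
            votes = defaultdict(int)
            for neighbour, w in weight[node].items():
                votes[community[neighbour]] += w
            best = min(votes, key=lambda c: (-votes[c], c))
            if best != community[node]:
                community[node] = best
                changed = True
        if not changed:
            break
    groups = defaultdict(set)
    for node, c in community.items():
        groups[c].add(node)
    return sorted(sorted(g) for g in groups.values())


# Toy training label sets (hypothetical data).
samples = [
    {"beach", "sea"}, {"beach", "sea", "sun"}, {"sea", "sun"},
    {"city", "night"}, {"city", "night"}, {"beach", "sun"},
]
graph = cooccurrence_graph(samples)
partition = label_propagation(graph)
print(partition)
```

Each group in `partition` would then serve as the label space of one multi-label sub-classifier in the ensemble. Dropping the weights (treating every positive count as 1) gives the unweighted variant of the graph.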

Highlights

We check how each method performs in the worst case, i.e., what is the minimum probability of it being better than randomness in label space division?
RAkELd served as the random baseline, for which we drew at most 250 distinct label space partitions for each of at most ten different values of the parameter k, the label subset size.
Out of the seven methods, five inferred the label space partitioning from training data in the datasets, while the two others were based on an a priori assumption on how to divide the label space
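
The random baseline in the highlights above can be illustrated with a short sketch. It assumes that a RAkELd-style division shuffles the labels and cuts them into disjoint subsets of size k, with a smaller final subset when k does not divide the number of labels; the function name `rakeld_partition` and the `seed` parameter are hypothetical.

```python
import random


def rakeld_partition(labels, k, seed=None):
    """Randomly split labels into disjoint subsets of size k.

    Assumption: the last subset is smaller when len(labels) % k != 0.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return [shuffled[i:i + k] for i in range(0, len(shuffled), k)]


# Draw several random divisions of a 10-label space for one value of k.
labels = list(range(10))
draws = [rakeld_partition(labels, k=3, seed=s) for s in range(5)]
for division in draws:
    print(division)
```

Repeating the draw with different seeds, as in `draws`, mirrors the idea of sampling many random partitions per value of k and comparing each against a data-driven division.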

Summary

Introduction

Shannon's work on the unpredictability of information content inspired a search for the area of multi-label classification that requires more insight: where has the field still been using random approaches to handling data uncertainty when non-random methods could shed light and provide the ability to make better predictions?

Interestingly enough, random methods are prevalent in well-cited multi-label classification approaches, especially in the problem of label space partitioning, which is a core issue in the problem-transformation approach to multi-label classification.

A great family of multi-label classification methods, called problem transformation approaches, depends on converting an instance of a multi-label classification problem into one or more single-label (single-class or multi-class) classification problems, performing such classification and converting the results back to multi-label classification results.

Such a situation stems from the fact that, historically, the field of classification started out with solving single-label classification problems. In general, a classification problem is one of understanding the relationship (function) between a set of objects and the set of categories that should be assigned to them. In one variant of the single-label scenario there is only one category, i.e., the problem is a binary choice: whether to assign the category or not. Such a scenario is called single-class classification, e.g., classifying whether there is a car in the picture or not.
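
To make the problem-transformation idea concrete, the sketch below converts multi-label targets into single-label ones in the two ways named in the abstract: binary relevance (one binary problem per label) and label powerset (one multi-class problem whose classes are the observed label combinations). The function names and the toy label sets are hypothetical, and only the target transformation is shown, not the training of any classifier.

```python
def binary_relevance_targets(label_sets, labels):
    """One binary target vector per label: 1 if the sample carries the label."""
    return {label: [int(label in s) for s in label_sets] for label in labels}


def label_powerset_targets(label_sets):
    """One multi-class target: each distinct label combination is a class."""
    classes = {}
    targets = []
    for s in label_sets:
        key = frozenset(s)
        if key not in classes:
            classes[key] = len(classes)  # class ids in order of first appearance
        targets.append(classes[key])
    return targets, classes


# Toy label assignments for four pictures (hypothetical data).
label_sets = [{"car"}, {"car", "tree"}, set(), {"car"}]
br = binary_relevance_targets(label_sets, ["car", "tree"])
lp_targets, lp_classes = label_powerset_targets(label_sets)
print(br)
print(lp_targets)
```

Predictions of the single-label models are mapped back to label sets by reversing the same encoding, for example by looking up the label combination behind a predicted label powerset class.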

Journal: Entropy
Publication Date: Jul 30, 2016
Citations: 75
License type: CC BY 4.0
