Abstract

We propose using five data-driven community detection approaches from social network analysis to partition the label space in multi-label classification, as an alternative to the random partitioning into equal-sized subsets performed by RAkELd. We evaluate modularity maximization using the fast greedy and leading eigenvector approximations, as well as the infomap, walktrap and label propagation algorithms. For this purpose, we construct a label co-occurrence graph (in both weighted and unweighted versions) from the training data and perform community detection on it to partition the label set. Each partition then constitutes the label space of a separate multi-label classification sub-problem. As a result, we obtain an ensemble of multi-label classifiers that jointly covers the whole label space. Using the binary relevance and label powerset classification methods, we compare label space divisions obtained by community detection against random baselines on 12 benchmark datasets over five evaluation measures. We find that data-driven approaches are more efficient and, in every evaluated measure, more likely to outperform RAkELd than binary relevance or label powerset are. For all measures apart from Hamming loss, data-driven approaches are significantly better than RAkELd (α = 0.05), and at least one data-driven approach is more likely to outperform RAkELd than the a priori methods are in the case of RAkELd's best performance. This is the largest RAkELd evaluation published to date, with 250 samplings per value for 10 values of the RAkELd parameter k on 12 datasets.
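
The pipeline described above has two steps: building a label co-occurrence graph from the training label sets, then splitting that graph into communities that become label subsets. The sketch below is a minimal, dependency-free illustration that assumes label sets are given as Python sets and that a weighted edge counts the training samples in which two labels co-occur. The function names `cooccurrence_graph` and `label_propagation` are hypothetical. The hand-rolled asynchronous label propagation stands in for the community detection library implementations evaluated in the paper and is not the authors' code.

```python
import random
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(label_sets, weighted=True):
    """Build a label co-occurrence graph as an adjacency dict.

    Nodes are labels; two labels are joined if they appear together in at
    least one training sample. Assumed weighting: the number of such
    samples (weighted=True) or simply 1 (weighted=False).
    """
    counts = defaultdict(int)
    labels = set()
    for ls in label_sets:
        labels.update(ls)
        for a, b in combinations(sorted(ls), 2):
            counts[(a, b)] += 1
    adj = {lab: {} for lab in labels}
    for (a, b), c in counts.items():
        w = c if weighted else 1
        adj[a][b] = w
        adj[b][a] = w
    return adj


def label_propagation(adj, seed=0, max_iter=100):
    """Asynchronous label propagation on the co-occurrence graph.

    Each node repeatedly adopts the community with the largest total edge
    weight among its neighbours; on a tie it keeps its current community
    if possible, which lets the loop settle. Returns sorted label subsets.
    """
    rng = random.Random(seed)
    community = {node: node for node in adj}  # every label starts alone
    nodes = sorted(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            if not adj[node]:
                continue  # a label that never co-occurs stays on its own
            score = defaultdict(int)
            for nb, w in adj[node].items():
                score[community[nb]] += w
            best = max(score.values())
            candidates = sorted(c for c, s in score.items() if s == best)
            if community[node] in candidates:
                continue
            community[node] = rng.choice(candidates)
            changed = True
        if not changed:
            break
    groups = defaultdict(list)
    for node, c in community.items():
        groups[c].append(node)
    return sorted(sorted(g) for g in groups.values())


label_sets = [{"a", "b"}, {"a", "b"}, {"b", "c"}, {"d", "e"}, {"d", "e", "f"}]
print(label_propagation(cooccurrence_graph(label_sets)))
# [['a', 'b', 'c'], ['d', 'e', 'f']]
```

Each returned subset would then define the label space of one sub-problem, for example a label powerset classifier trained only on those labels. Together, the sub-problems cover every label seen in training.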

Highlights

  • We check how each method performs in the worst case, i.e., what is the minimum probability of it being better than a random label space division?

  • RAkELd served as the random baseline, for which we drew at most 250 distinct label space partitions for each of at most ten different values of the label subset size parameter k

  • Out of the seven methods, five inferred the label space partitioning from training data in the datasets, while the two others were based on an a priori assumption on how to divide the label space
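
The random baseline and the worst-case comparison in the highlights can be sketched as follows. This is a minimal sketch under stated assumptions, and all function names are hypothetical. A RAkELd-style partition shuffles the labels and cuts them into disjoint chunks of size k, with any remainder forming the last chunk. Distinct partitions are collected until the requested number is reached. The probability of beating the baseline is the share of random partitions whose score the data-driven method strictly exceeds, so ties count as losses (our assumption). "Worst case" is read as the minimum of that probability over the sampled values of k.

```python
import random


def rakeld_partition(labels, k, rng):
    """Shuffle the labels and cut them into disjoint subsets of size k.

    When k does not divide the number of labels, the last subset holds
    the remainder (an assumption about how leftovers are handled).
    """
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return [shuffled[i:i + k] for i in range(0, len(shuffled), k)]


def sample_distinct_partitions(labels, k, n_samples, seed=0, max_tries=10_000):
    """Draw up to n_samples distinct random partitions for one value of k.

    Fewer are returned if the label set admits fewer distinct partitions.
    """
    rng = random.Random(seed)
    seen = set()
    for _ in range(max_tries):
        part = rakeld_partition(labels, k, rng)
        # A partition is an unordered set of unordered label subsets.
        seen.add(frozenset(frozenset(s) for s in part))
        if len(seen) == n_samples:
            break
    return sorted(sorted(sorted(s) for s in p) for p in seen)


def prob_better_than_random(method_score, random_scores, higher_is_better=True):
    """Share of random partitions that the data-driven method strictly beats."""
    if higher_is_better:
        wins = sum(method_score > s for s in random_scores)
    else:  # e.g. Hamming loss, where lower is better
        wins = sum(method_score < s for s in random_scores)
    return wins / len(random_scores)


def worst_case_probability(method_score, random_scores_by_k, higher_is_better=True):
    """Minimum, over the sampled values of k, of the probability above."""
    return min(
        prob_better_than_random(method_score, scores, higher_is_better)
        for scores in random_scores_by_k.values()
    )
```

With four labels and k = 2, only three distinct partitions exist, so `sample_distinct_partitions(["a", "b", "c", "d"], 2, 250)` returns three rather than 250. Small label spaces can cap the baseline below the requested sample count in this way, which is why the baseline is described as "at most" 250 partitions.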


Introduction

Entropy 2016, 18, 282; doi:10.3390/e18080282; www.mdpi.com/journal/entropy

Shannon's work on the unpredictability of information content inspired a search for the area of multi-label classification that requires more insight: where is the field still using random approaches to handling data uncertainty when non-random methods could shed light and enable better predictions? Interestingly enough, random methods are prevalent in well-cited multi-label classification approaches, especially in label space partitioning, which is a core issue in the problem transformation approach to multi-label classification.

A large family of multi-label classification methods, called problem transformation approaches, converts an instance of a multi-label classification problem into one or more single-label (single-class or multi-class) classification problems, performs such classification, and converts the results back into multi-label classification results. This situation stems from the fact that, historically, the field of classification started out by solving single-label classification problems. In general, a classification problem consists of learning the relationship (function) between a set of objects and a set of categories that should be assigned to them. In the single-label scenario, one variant deals with only one category, i.e., the problem is a binary choice of whether or not to assign that category; such a scenario is called single-class classification, e.g., classifying whether or not there is a car in a picture.
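
The two problem transformations named in the abstract, binary relevance and label powerset, can be illustrated with the car example. This is a minimal sketch, and all function names are hypothetical. Binary relevance turns each label into its own single-class (yes/no) target. Label powerset maps each distinct label combination to one class of a single multi-class problem. The decode functions show the conversion back into multi-label results. The classifiers that would be trained on these targets are omitted.

```python
def binary_relevance(label_sets, all_labels):
    """One single-class (0/1) target vector per label."""
    return {lab: [int(lab in ls) for ls in label_sets] for lab in all_labels}


def decode_binary_relevance(predictions):
    """Map per-label 0/1 predictions back to one label set per sample."""
    n = len(next(iter(predictions.values())))
    return [{lab for lab, col in predictions.items() if col[i]} for i in range(n)]


def label_powerset(label_sets):
    """One multi-class target: each distinct label combination is a class."""
    classes = {}
    targets = []
    for ls in label_sets:
        key = frozenset(ls)
        if key not in classes:
            classes[key] = len(classes)  # next unused class id
        targets.append(classes[key])
    return targets, classes


def decode_powerset(predicted_classes, classes):
    """Map predicted class ids back to label sets."""
    inverse = {cid: set(key) for key, cid in classes.items()}
    return [inverse[c] for c in predicted_classes]


Y = [{"car"}, {"car", "tree"}, set(), {"car"}]
print(binary_relevance(Y, ["car", "tree"]))  # {'car': [1, 1, 0, 1], 'tree': [0, 1, 0, 0]}
print(label_powerset(Y)[0])                  # [0, 1, 2, 0]
```

Binary relevance ignores label co-occurrence entirely, while label powerset models it but can face many rare classes. Partitioning the label space and running label powerset on each subset sits between these two extremes.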
