Abstract


 
 
 Traditional classification algorithms consider learning problems in which each example has only one label, i.e., each example is associated with a single nominal target variable characterizing its property. However, the number of practical applications involving data with multiple target variables has been increasing. Learning from this sort of data requires multi-label classification algorithms. The task of learning from multi-label data can be addressed by methods that transform the multi-label classification problem into one or more single-label classification problems. In this work, two well-known methods based on this approach are used, together with a third method that we propose to overcome some deficiencies of one of them, in a case study using textual data related to medical findings, which were structured using the bag-of-words approach. The experimental study with these three methods shows that our proposed multi-label classification method improves the results.
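As a rough illustration of the bag-of-words structuring mentioned above, the sketch below turns free text into term-count vectors over a shared vocabulary. The function names and the two example sentences are hypothetical, and the tokenization (lowercased alphabetic tokens, no stemming or stop-word removal) is an assumption; the summary does not describe the preprocessing actually used in the case study.

```python
from collections import Counter
import re


def tokenize(text):
    # Assumption: lowercase and keep alphabetic tokens only (no stemming, no stop-word removal).
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term vocabulary[j] in document i."""
    tokenized = [tokenize(doc) for doc in documents]
    # Shared vocabulary: every distinct token in the collection, sorted for a stable column order.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocabulary)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for term, count in Counter(toks).items():
            row[index[term]] = count
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical findings-style sentences, not taken from the case-study data.
docs = ["Nodule in the left lung.", "No nodule; lung clear, lung normal."]
vocab, X = bag_of_words(docs)
```

Each document becomes a fixed-length attribute vector (here `X[1]` counts "lung" twice), which is the tabular form that the single-label learners used after problem transformation expect as input.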
 
 

Highlights

  • Traditional single-label classification methods are concerned with learning from a set of examples that are associated with a single label y from a set of disjoint labels L, |L| > 1 [9, 1]

  • Label Power Set (LP) takes label dependency into account; however, when a large or even moderate number of labels is considered, learning the label power sets as a multi-class problem becomes rather challenging due to the tremendous number of possible label sets

  • Binary Relevance plus (BR+) was implemented using Mulan, a package of Java classes for multi-label classification based on Weka, a collection of machine learning algorithms for data mining tasks implemented in Java
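The contrast drawn in the highlights between LP and binary relevance can be made concrete by counting the learning problems each transformation creates. The sketch below is a hypothetical helper written in plain Python (it is not part of Mulan or Weka) and uses made-up label sets: LP faces up to 2^|L| classes, of which it learns only those observed in training, often with very few examples each, while BR always trains exactly |L| binary classifiers.

```python
from collections import Counter


def label_set_stats(label_sets, num_labels):
    """Summarise the size of the LP multi-class problem versus the BR decomposition."""
    # Each distinct label subset observed in training becomes one LP class.
    counts = Counter(frozenset(s) for s in label_sets)
    return {
        "possible_label_sets": 2 ** num_labels,  # upper bound on the number of LP classes
        "distinct_label_sets": len(counts),  # LP classes actually present in the data
        "singleton_label_sets": sum(1 for c in counts.values() if c == 1),  # classes backed by one example
        "br_binary_problems": num_labels,  # BR: one binary classifier per label
    }


# Hypothetical training targets over the label set L = {a, b, c, d}.
Y = [{"a", "b"}, {"a"}, {"a", "b"}, {"c"}, {"b", "c", "d"}]
stats = label_set_stats(Y, num_labels=4)
```

With only five examples, three of the four LP classes are supported by a single example; as |L| grows, the bound 2^|L| grows exponentially while the number of BR problems grows linearly.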


Summary

Introduction

Traditional single-label classification methods are concerned with learning from a set of examples that are associated with a single label y from a set of disjoint labels L, |L| > 1 [9, 1]. The multi-label problem can be transformed into a single multi-class single-label learning problem by using, as target values for the class attribute, all distinct subsets of labels present in the training instances. This method is called Label Power Set (LP). Dependencies among labels are mapped directly from the data, since every combination of single labels present in the training instances is used as a possible class in the corresponding multi-class single-label classification problem. In this context, the Binary Relevance (BR) method, which learns one independent binary classifier per label, has been strongly criticized for its inability to handle label dependency information [10].
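The two problem transformations described above can be sketched in a few lines of Python. The helper names below are hypothetical and only illustrate the idea; they do not reproduce Mulan's API. LP maps each distinct label subset to one class identifier, whereas BR builds one binary target vector per label, looking at each label in isolation.

```python
def label_powerset_transform(Y):
    """LP: map each multi-label target to one class of a single multi-class problem.

    Y is a list of label sets, one per training example.
    Returns (class_ids, classes), where classes[k] is the label subset encoded by class k.
    """
    classes = []  # distinct label subsets, in order of first appearance
    index = {}
    class_ids = []
    for labels in Y:
        key = frozenset(labels)
        if key not in index:
            index[key] = len(classes)
            classes.append(key)
        class_ids.append(index[key])
    return class_ids, classes


def binary_relevance_transform(Y, label_names):
    """BR: one independent binary target vector per label; other labels are ignored."""
    return {label: [1 if label in labels else 0 for labels in Y] for label in label_names}


# Hypothetical training targets.
Y = [{"a", "b"}, {"a"}, {"a", "b"}, {"c"}]
ids, classes = label_powerset_transform(Y)
br = binary_relevance_transform(Y, ["a", "b", "c"])
```

Here `ids` is `[0, 1, 0, 2]`: the co-occurrence of "a" and "b" is kept as a class of its own, and an LP prediction of class k is decoded as `classes[k]`, so LP can only predict label sets seen during training. In the BR targets, the vector for "b" is built without any reference to "a", which is why BR cannot exploit label dependency.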

Multi-label Classification
Label Power Set
Binary Relevance
Evaluation Measures
Experimental Set Up
Case Study
Algorithms
Experimental Results
Conclusions and Future Work