Abstract

In this work, we consider the ensemble of classifier chains (ECC) algorithm for the multi-label classification (MLC) task. ECC builds on the binary relevance (BR) algorithm, a simple and direct approach to MLC that has been shown to provide good results in practice. Unlike BR, however, ECC aims to exploit the correlations between labels. ECC relies on a traditional supervised classification algorithm to solve the resulting binary problems. Within this field, Credal C4.5 (CC4.5) is a recent version of the well-known C4.5 algorithm that uses imprecise probabilities to estimate the probability distribution of the class variable. This version of C4.5 has been shown to perform better when noisy datasets are classified. In MLC, the intrinsic noise might be higher than in traditional supervised classification. The reason is simple: in MLC each instance has multiple labels, whereas in traditional classification there is a single class variable, so an instance is more likely to carry at least one erroneous label. For these reasons, this work studies the performance of ECC with CC4.5 as the base classifier. We have carried out an extensive experimental analysis with several multi-label datasets, different noise levels and a large number of evaluation metrics for MLC. This experimental study shows that ECC generally performs better with CC4.5 as the base classifier than with C4.5. The higher the label noise level introduced in the data, the more significant this improvement is. Therefore, using imprecise probabilities in decision trees appears suitable for MLC.
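To make the chaining idea concrete, the following is a minimal sketch of an ensemble of classifier chains, not the implementation evaluated in this work. It assumes binary features and uses a toy one-level decision tree (`Stump`) as a stand-in for C4.5 or CC4.5. The random label order and bootstrap sample per chain follow the usual description of ECC. All names (`Stump`, `Chain`, `fit_ecc`, `predict_ecc`) are hypothetical.

```python
import random
from collections import Counter


class Stump:
    """One-level decision tree on binary features (toy stand-in for C4.5/CC4.5)."""

    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):
            # Majority label on each side of the split on feature j.
            sides = {v: Counter(t for row, t in zip(X, y) if row[j] == v) for v in (0, 1)}
            pred = {v: (c.most_common(1)[0][0] if c else 0) for v, c in sides.items()}
            errors = sum(pred[row[j]] != t for row, t in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, j, pred)
        _, self.j, self.pred = best
        return self

    def predict_one(self, row):
        return self.pred[row[self.j]]


class Chain:
    """Classifier chain: each link sees the features plus all earlier labels."""

    def __init__(self, order):
        self.order = order

    def fit(self, X, Y):
        self.models = []
        aug = [list(r) for r in X]
        for k in self.order:
            self.models.append(Stump().fit(aug, [y[k] for y in Y]))
            # Later links receive the true value of label k as an extra feature.
            aug = [r + [y[k]] for r, y in zip(aug, Y)]
        return self

    def predict_one(self, row):
        row, out = list(row), {}
        for k, model in zip(self.order, self.models):
            out[k] = model.predict_one(row)
            row.append(out[k])  # predicted label feeds the next link
        return [out[k] for k in sorted(out)]


def fit_ecc(X, Y, n_chains=5, seed=0):
    """Train n_chains chains, each with a random label order and bootstrap sample."""
    rng = random.Random(seed)
    chains = []
    for _ in range(n_chains):
        order = list(range(len(Y[0])))
        rng.shuffle(order)
        idx = [rng.randrange(len(X)) for _ in X]
        chains.append(Chain(order).fit([X[i] for i in idx], [Y[i] for i in idx]))
    return chains


def predict_ecc(chains, row, threshold=0.5):
    """Average the chains' votes per label and threshold them."""
    votes = [c.predict_one(row) for c in chains]
    return [int(sum(v[k] for v in votes) / len(votes) >= threshold)
            for k in range(len(votes[0]))]
```

BR would train each `Stump` on the original features only. The chain differs in that the labels already handled are appended as extra features, which is how correlations between labels can be exploited.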

