Abstract

In multilabel classification, each instance in the training set is associated with a set of labels, and the task is to output a label set whose size is unknown a priori for each unseen instance. The most commonly used approach for multilabel classification is to learn a binary classifier independently for each possible class. However, multilabeled data generally exhibit relationships between labels, and this approach fails to take such relationships into account. In this paper, we describe an original method for multilabel classification problems derived from a Bayesian version of the k-nearest neighbor (k-NN) rule. The method developed here is an improvement on an existing method for multilabel classification, namely multilabel k-NN, which takes into account the dependencies between labels. Experiments on simulated and benchmark datasets show the usefulness and the efficiency of the proposed approach as compared to other existing methods.

Highlights

  • Traditional single-label classification assigns an object to exactly one class, from a set of Q disjoint classes

  • Each document may belong to multiple topics, such as arts and humanities [2,3,4,5]; in gene functional analysis, each gene may be associated with a set of functional classes, such as energy, metabolism, and cellular biogenesis [6]; in natural scene classification, each image may belong to several image types at the same time, such as sea and sunset [1]

  • The model parameters for DMLkNN are the number of neighbors k, the fuzziness parameter δ, and the smoothing parameter s


Summary

Introduction

Traditional single-label classification assigns an object to exactly one class, from a set of Q disjoint classes. In [10], the authors present a Bayesian multilabel k-nearest neighbor (MLkNN) approach where, in order to assign a set of labels to a new instance, a decision is made separately for each label by taking into account the number of neighbors containing the label to be assigned. This method fails to take into account the dependency between labels. We present a generalization of the MLkNN-based approach to multilabel classification problems where the dependencies between classes are considered. We call this method DMLkNN, for dependent multilabel k-nearest neighbor.
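The per-label MLkNN decision described above can be sketched as follows: for each label, estimate a smoothed prior from the training set, count how many of a query's k nearest neighbors carry the label, and make a maximum-a-posteriori decision from likelihoods of those counts. This is a minimal illustrative sketch (the function names and the Euclidean-distance choice are assumptions, not the authors' code), using the neighbor count k and smoothing parameter s mentioned in the highlights; it does not model the inter-label dependencies that DMLkNN adds.

```python
import numpy as np

def fit_mlknn(X, Y, k=3, s=1.0):
    """Estimate MLkNN-style priors and neighbor-count likelihoods.

    X: (n, d) feature matrix; Y: (n, q) binary label matrix.
    k: number of neighbors; s: Laplace smoothing parameter.
    Hypothetical helper for illustration only.
    """
    n, q = Y.shape
    # Smoothed prior probability that an instance carries each label.
    prior = (s + Y.sum(axis=0)) / (2 * s + n)

    # For each training instance, count each label's occurrences among
    # its k nearest neighbors (excluding the instance itself).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    nn = np.argsort(dists, axis=1)[:, :k]
    counts = Y[nn].sum(axis=1)  # shape (n, q), entries in 0..k

    # Tally counts separately for instances with / without each label.
    c1 = np.zeros((q, k + 1))
    c0 = np.zeros((q, k + 1))
    for i in range(n):
        for l in range(q):
            if Y[i, l]:
                c1[l, counts[i, l]] += 1
            else:
                c0[l, counts[i, l]] += 1
    # Smoothed likelihoods P(count = c | label present / absent).
    like1 = (s + c1) / (s * (k + 1) + c1.sum(axis=1, keepdims=True))
    like0 = (s + c0) / (s * (k + 1) + c0.sum(axis=1, keepdims=True))
    return prior, like1, like0

def predict_mlknn(x, X, Y, prior, like1, like0, k=3):
    """MAP decision per label from the query's neighbor label counts."""
    nn = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    counts = Y[nn].sum(axis=0)
    pred = np.zeros(Y.shape[1], dtype=int)
    for l in range(Y.shape[1]):
        p_yes = prior[l] * like1[l, counts[l]]
        p_no = (1 - prior[l]) * like0[l, counts[l]]
        pred[l] = int(p_yes > p_no)  # decide each label independently
    return pred
```

On a toy dataset with two well-separated clusters carrying disjoint labels, a query near the first cluster is assigned that cluster's label only. The key limitation, and the motivation for DMLkNN, is visible in the last loop: each label's decision uses only its own neighbor count, never the counts of the other labels.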

Related Work
Multilabel Classification
The DMLkNN Method for Multilabel Classification
Experiments
Prediction-Based Metrics
Conclusion
