Abstract
Multi-label classification associates multiple labels with a given data instance to describe it better. Multi-label data sets are common in many emerging application areas, such as text and multimedia classification, bioinformatics, medical image annotation, and computer vision. Interest in efficient and accurate multi-label classification is growing. There are two major approaches to multi-label classification: (i) problem transformation methods and (ii) algorithm adaptation methods. In algorithm adaptation, traditional classification algorithms are modified to handle multi-label data sets. One algorithm that is often adapted in this way is k-nearest neighbor (kNN), which is popular because it is simple, easy to implement, and readily adaptable. Despite these merits, it has several drawbacks: sensitivity to noisy data, missing values, and outliers; dependence on feature scaling; and loss of accuracy in large, overlapping solution spaces. In this paper, a modification of the kNN method is proposed for multi-label classification with three improvement strategies: (i) selection of local examples with respect to the unknown example, motivated by the fact that a local and relevant space is vital for improving multi-label classification; (ii) splitting the input space into multiple sub-spaces for optimal label estimation, motivated by the need to estimate labels accurately in the presence of noisy labels; and (iii) selection of labels using Mean Average Precision (MAP) estimates, motivated by the aim of using the training data effectively to estimate the hidden distribution and the optimal parameters of the method. The proposed method is implemented and compared with state-of-the-art approaches, based on kNN or similar techniques, that effectively select and optimize relevant spaces for multi-label classification.
Multiple metrics, including Hamming loss, precision/recall, and F-measure, are used for evaluation. The proposed approach performed much better than the state of the art on data sets with strong label cardinalities.
Highlights
These figures compare techniques that use k-nearest neighbors with other state-of-the-art techniques commonly discussed in the multi-label literature. Some of these techniques relate to our approach because they are based on instance selection (HDLSSm and HDLSSo), and others because they take local information into account for label classification (IBLR).
DASMLKNN outperformed all the methods included in our study on Yeast, a dataset from the biological domain that contains functional classes of genes, when Hamming loss is compared.
The proposed method uses a mutuality strategy to select adaptive neighbors and takes local information about the instances into account by splitting the input space into multiple smaller, more relevant spaces.
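As a rough illustration of what a mutuality-based neighbor selection can look like, the sketch below keeps a training point only if it is among the query's k nearest points and the query would, in turn, fall within that point's own k-nearest radius. This is one common reading of "mutual" neighbors and is an assumption here; the exact rule used by DASMLKNN is not given in this excerpt, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def k_nearest(query: Sequence[float], points: List[Sequence[float]],
              k: int, exclude: Optional[int] = None) -> List[int]:
    """Indices of the k points closest to query, optionally skipping one index."""
    candidates = [i for i in range(len(points)) if i != exclude]
    candidates.sort(key=lambda i: sq_dist(query, points[i]))
    return candidates[:k]


def mutual_neighbors(query: Sequence[float], train_X: List[Sequence[float]],
                     k: int) -> List[int]:
    """Assumed mutuality rule: keep neighbor i only if the query is at least
    as close to i as i's own k-th nearest training point."""
    kept = []
    for i in k_nearest(query, train_X, k):
        own = k_nearest(train_X[i], train_X, k, exclude=i)
        kth_radius = sq_dist(train_X[i], train_X[own[-1]])
        if sq_dist(train_X[i], query) <= kth_radius:
            kept.append(i)
    return kept


# Toy data: three points clustered near the origin and one isolated point.
toy_X = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
selected = mutual_neighbors((6.5, 0.0), toy_X, k=2)
```

In the toy data, the point at (2, 0) is one of the query's two nearest points, but its own neighbors are much closer to it than the query is, so it is dropped. The number of neighbors that survive therefore adapts to the local density around the query.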
Summary
Several studies have been conducted in machine learning (ML) on binary or multi-class classification, in which a single label from a set of labels/classes L is assigned to each instance in the dataset. Multi-label classification is an emerging field that deals with instances that have more than one label assigned. It is widely used in text categorization, multimedia classification, bioinformatics, protein function classification, and semantic scene classification. Let X be an input space, and let L = {λ1, λ2, ..., λk} be a finite set of labels. An example x ∈ X, represented by a feature vector x = (x1, x2, ..., xm), is assigned a subset of labels, that is, an element of the power set 2^L.
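The notation above can be made concrete with a minimal sketch: each label subset is encoded as a 0/1 vector of length k, Hamming loss counts the fraction of wrong instance-label pairs, and a plain kNN baseline votes on each label independently. The label names, toy data, and function names are illustrative assumptions, and the baseline is ordinary per-label majority voting, not the proposed DASMLKNN method.

```python
from typing import List, Sequence, Set

LABELS = ["l1", "l2", "l3", "l4"]  # hypothetical label set L with k = 4


def to_indicator(subset: Set[str], labels: List[str] = LABELS) -> List[int]:
    """Encode a subset of L as a 0/1 vector with one entry per label."""
    return [1 if lab in subset else 0 for lab in labels]


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of instance-label pairs that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for true_row, pred_row in zip(y_true, y_pred)
                for t, p in zip(true_row, pred_row))
    return wrong / total


def knn_predict(x: Sequence[float], train_X: List[Sequence[float]],
                train_Y: List[List[int]], k: int = 3) -> List[int]:
    """Baseline multi-label kNN: a label is predicted if more than half
    of the k nearest training examples carry it."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(x, train_X[i])))
    nearest = order[:k]
    n_labels = len(train_Y[0])
    return [1 if 2 * sum(train_Y[i][j] for i in nearest) > k else 0
            for j in range(n_labels)]


# Toy example with m = 2 features and two labels (illustrative data only).
toy_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
toy_Y = [[1, 0], [1, 1], [1, 0], [0, 1]]
prediction = knn_predict((0.1, 0.1), toy_X, toy_Y, k=3)
```

Because every label gets its own vote, a single prediction can contain zero, one, or several labels, which is what distinguishes this setting from multi-class classification.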