Abstract
Linear discriminant analysis (LDA) is a classical statistical machine-learning method, which aims to find a linear data transformation increasing class discrimination in an optimal discriminant subspace. Traditional LDA sets assumptions related to the Gaussian class distributions and single-label data annotations. In this article, we propose a new variant of LDA to be used in multilabel classification tasks for dimensionality reduction on original data to enhance the subsequent performance of any multilabel classifier. A probabilistic class saliency estimation approach is introduced for computing saliency-based weights for all instances. We use the weights to redefine the between-class and within-class scatter matrices needed for calculating the projection matrix. We formulate six different variants of the proposed saliency-based multilabel LDA (SMLDA) based on different prior information on the importance of each instance for their class(es) extracted from labels and features. Our experiments show that the proposed SMLDA leads to performance improvements in various multilabel classification problems compared to several competing dimensionality reduction methods.
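The idea of redefining the scatter matrices with per-instance weights can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact SMLDA formulation. The function name `weighted_multilabel_lda`, the weight matrix `W` (a hypothetical stand-in for the saliency scores), and the regularization term `reg` are all assumptions introduced here. The sketch also assumes every class has at least one instance with positive weight.

```python
import numpy as np
from scipy.linalg import eigh


def weighted_multilabel_lda(X, Y, W, n_components, reg=1e-6):
    """Sketch of LDA with instance-to-class weights (hypothetical helper).

    X: (n, d) feature matrix
    Y: (n, c) binary label matrix (an instance may have several labels)
    W: (n, c) non-negative weights, standing in for saliency scores
    """
    W = W * Y  # an instance contributes only to classes it is labelled with
    class_mass = W.sum(axis=0)                   # (c,) total weight per class
    means = (W.T @ X) / class_mass[:, None]      # (c, d) weighted class means
    global_mean = (class_mass @ means) / class_mass.sum()

    d = X.shape[1]
    S_w = np.zeros((d, d))  # weighted within-class scatter
    S_b = np.zeros((d, d))  # weighted between-class scatter
    for k in range(Y.shape[1]):
        diff = X - means[k]
        S_w += (W[:, k, None] * diff).T @ diff
        m = (means[k] - global_mean)[:, None]
        S_b += class_mass[k] * (m @ m.T)

    # Generalized eigenproblem S_b v = lambda (S_w + reg * I) v;
    # eigh returns eigenvalues in ascending order, so take the largest ones.
    evals, evecs = eigh(S_b, S_w + reg * np.eye(d))
    order = np.argsort(evals)[::-1][:n_components]
    return evecs[:, order]  # (d, n_components) projection matrix
```

With `W` set to all ones and single-label `Y`, this reduces to ordinary LDA scatter matrices; non-uniform weights let instances that are more representative of a class contribute more to that class's mean and scatter.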
Highlights
Multilabel classification tasks have recently become increasingly common in machine learning, for example, in text information categorization [1], image and video annotation [2], sequential data prediction [3], and music information retrieval [4].
We propose a novel dimensionality reduction method for multilabel classification based on a probabilistic approach that estimates the contribution of each data item to the classes it is associated with, taking into account prior information encoded using various types of metrics.
We integrate different label and feature information previously used as weights in dimensionality reduction into saliency-based multilabel LDA (SMLDA) by using it as prior information for probabilistic saliency estimation, and show experimentally that our approach leads to better performance.
Summary
Multilabel classification tasks have recently become increasingly common in machine learning, for example, in text information categorization [1], image and video annotation [2], sequential data prediction [3], and music information retrieval [4]. Linear discriminant analysis (LDA), a well-known supervised dimensionality reduction technique, and its variants have been widely used to extract discriminant data representations for solving various problems, for example, in human action recognition [10] and biological data classification [11]. However, they are not optimal for multilabel problems due to the characteristics of multilabel data. We propose a novel dimensionality reduction method for multilabel classification based on a probabilistic approach that estimates the contribution of each data item to the classes it is associated with, taking into account prior information encoded using various types of metrics. We integrate different label and feature information previously used as weights in dimensionality reduction into SMLDA by using it as prior information for probabilistic saliency estimation, and show experimentally that our approach leads to better performance.