Abstract
High-dimensional multi-labeled data contain instances, where each instance is associated with a set of class labels and has a large number of noisy and irrelevant features. Feature selection has been shown to have great benefits in improving the classification performance in machine learning. In multi-label learning, to select the discriminative features among multiple labels, several challenges should be considered: interdependent labels, different instances may share different label correlations, correlated features, and missing and flawed labels. This work is part of a project at The Children's Hospital at Westmead (TB-CHW), Australia to explore the genomics of childhood leukaemia. In this paper, we propose a CMFS (Correlated- and Multi-label Feature Selection method), based on non-negative matrix factorization (NMF) for simultaneously performing feature selection and addressing the aforementioned challenges. Significantly, a major advantage of our research is to exploit the correlation information contained in features, labels and instances to select the relevant features among multiple labels. Furthermore, l2,1 -norm regularization is incorporated in the objective function to undertake feature selection by imposing sparsity on the feature matrix rows. We employ CMFS to decompose the data and multi-label matrices into a low-dimensional space. To solve the objective function, an efficient iterative optimization algorithm is proposed with guaranteed convergence. Finally, extensive experiments are conducted on high-dimensional multi-labeled datasets. The experimental results demonstrate that our method significantly outperforms state-of-the-art multi-label feature selection methods.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have