Multilabel feature selection (MFS) has received widespread attention in various big data applications. However, most existing methods assume, explicitly or implicitly, either that all labels are available before feature selection starts or that all labels are independent. In many practical applications, labels arrive dynamically and may be interdependent. Moreover, labels may arrive in minibatches, which makes label dependency more difficult to explore. In this article, we propose MSDS, a novel fuzzy mutual information-based multilabel feature selection approach that handles single streaming labels and minibatch streaming labels while simultaneously exploiting label dependency. Specifically, we first extend fuzzy mutual information to make it suitable for multilabel learning. The resulting model effectively captures the relationship between two labels and is well suited to measuring the relationships among multiple labels. Then, we analyze feature relevance and feature redundancy by combining label dependency with streaming labels, which facilitates the selection of high-quality feature subsets. Finally, we design a feature conversion to fuse the representative features of newly arrived streaming labels. Comprehensive experiments on twelve multilabel datasets demonstrate the superiority of the proposed method over two streaming-label-based algorithms and five state-of-the-art algorithms based on a static label space.