An Online Unsupervised Streaming Features Selection Through Dynamic Feature Clustering

Xuyang Yan, Benjamin Lartey, Mrinmoy Sarkar, Kishor Datta Gupta, Abdollah Homaifar

doi:10.1109/tai.2022.3196637

Abstract

Streaming feature selection (SFS) is emerging as a key research direction that addresses the non-stationary nature of feature streams when the sample size is fixed. Most existing SFS techniques are supervised and ignore label scarcity, yet real-world datasets are typically unlabeled and labeling is expensive. Although some unsupervised SFS approaches have been proposed, they are either limited to homogeneous feature types or incur substantial computational cost. To address these problems, we propose an online unsupervised feature selection framework based on dynamic feature clustering. We derive a recursive density lower bound to estimate the density distribution of feature streams and develop a density-based dynamic clustering method that clusters the feature stream online to explore feature redundancy. An unsupervised online feature relevance maximization and redundancy minimization strategy is introduced to extract a subset of important features with low redundancy from the feature stream. Experimental results on thirteen well-known benchmark datasets, together with comparisons against seven state-of-the-art supervised SFS methods, demonstrate that the proposed unsupervised method achieves performance statistically comparable to that of the supervised SFS techniques even though no label information is available.

Full Text