Abstract

Streaming feature selection for unlabeled data aims to remove redundant and irrelevant features from continuously arriving features without label information. Most existing methods select a small set of features that approximately reconstructs each sample in the raw data. However, real-world streaming data may contain irrelevant features that this reconstruction strategy cannot effectively exclude, and these features significantly impair the reliability of the selected subset. To address this problem, we introduce a dynamic similarity graph that learns pairwise sample correlations and is used to adaptively evaluate irrelevant features. Through similarity graph diffusion, the unreliable similarities caused by irrelevant features are gradually eliminated. The past and current diffused graphs then guide feature selection, removing redundant and irrelevant features, respectively. The proposed method consists of two stages: 1) minimum redundancy: accepting only features that contain new information with respect to the past diffused graph; 2) maximum relevance: selecting the most relevant features based on the current diffused graph. Additionally, a compound threshold operator is derived to solve the graph-based learning objective. Extensive experiments on real-world data demonstrate that the proposed method outperforms state-of-the-art unsupervised feature selection methods.
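
The idea of scoring features against a diffused similarity graph can be illustrated with a minimal sketch. It is not the proposed method: the abstract does not specify the diffusion rule, the redundancy test, or the compound threshold operator, so none of these are reproduced here. Instead, the sketch assumes a Gaussian-kernel graph, a generic iterative diffusion update, and the classical Laplacian score as a stand-in relevance criterion. All function names and parameters (gaussian_similarity, diffuse, laplacian_score, sigma, alpha, n_iter) are hypothetical.

```python
import numpy as np


def gaussian_similarity(X, sigma=1.0):
    """Dense Gaussian-kernel similarity between the rows (samples) of X.

    Assumption: a dense kernel graph; the abstract does not say how the
    similarity graph is actually built.
    """
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances, clipped at 0 to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    S = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)  # no self-loops
    return S


def diffuse(S, alpha=0.8, n_iter=20):
    """Generic graph diffusion: W <- alpha * P W P^T + (1 - alpha) * I.

    Assumption: a standard diffusion update chosen for illustration,
    not the update used in the paper.
    """
    # Row-normalised transition matrix of the input graph.
    P = S / np.maximum(S.sum(axis=1, keepdims=True), 1e-12)
    W = S.copy()
    identity = np.eye(S.shape[0])
    for _ in range(n_iter):
        W = alpha * P @ W @ P.T + (1.0 - alpha) * identity
    return (W + W.T) / 2.0  # enforce exact symmetry


def laplacian_score(f, W):
    """Laplacian score of feature f on graph W (lower = smoother on W).

    Assumption: used here as a stand-in for the "maximum relevance" stage.
    """
    d = W.sum(axis=1)
    f = f - (f @ d) / d.sum()          # remove the degree-weighted mean
    num = f @ (np.diag(d) - W) @ f     # f^T L f
    den = f @ (d * f)                  # f^T D f
    return num / max(den, 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.repeat([0.0, 1.0], 20)
    # Two informative features that separate two groups of samples.
    X = 5.0 * labels[:, None] + 0.1 * rng.standard_normal((40, 2))
    W = diffuse(gaussian_similarity(X))
    print("informative feature:", laplacian_score(X[:, 0], W))
    print("pure-noise feature: ", laplacian_score(rng.standard_normal(40), W))
```

In this toy setting, a feature that follows the group structure encoded in the graph receives a much lower score than a pure-noise feature, which is the behaviour a relevance stage relies on. A streaming variant would rebuild or update the graph as new features arrive and compare each arriving feature against both the previous and the updated graph.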
