Unsupervised feature selection is an important tool in data mining, machine learning, and pattern recognition. Although data labels are often missing, the number of data classes is known in many scenarios and can be exploited. For this reason, a structured graph, whose number of connected components equals the number of data classes, has been proposed and is frequently applied in unsupervised feature selection. However, methods based on structured graph learning face two problems. First, existing optimization algorithms do not always guarantee that the learned graph has exactly as many connected components as there are data classes. Second, these methods usually lack strategies for choosing suitable hyperparameters. To solve these problems, an efficient and stable unsupervised feature selection method based on a novel structured graph and data discrepancy learning (ESUFS) is proposed. Specifically, the novel structured graph, which consists of a pairwise data similarity matrix and an indicator matrix, can be learned efficiently by solving a discrete optimization problem. Data discrepancy learning focuses on the features that maximize the difference among data points and thereby helps select discriminative features. Extensive experiments on various datasets show that ESUFS outperforms state-of-the-art methods not only in accuracy (ACC) but also in stability and speed.
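The structural property described above (a graph whose number of connected components equals the number of classes) can be checked on a learned similarity matrix. The following is a minimal sketch of such a check only, not the ESUFS optimization itself. The function name `count_connected_components`, the tolerance `tol`, and the toy matrix `S` are hypothetical and introduced here for illustration.

```python
def count_connected_components(similarity, tol=1e-12):
    """Count connected components of an undirected graph.

    `similarity` is a dense square matrix (list of lists); an entry
    larger than `tol` in either direction is treated as an edge.
    """
    n = len(similarity)
    parent = list(range(n))  # union-find forest, one tree per component

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] > tol or similarity[j][i] > tol:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j  # merge the two components

    return len({find(i) for i in range(n)})


# Hypothetical toy similarity matrix: samples {0, 1} and {2, 3, 4}
# form two blocks with no edges between them.
S = [
    [0.0, 0.9, 0.0, 0.0, 0.0],
    [0.9, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5, 0.0],
]
num_classes = 2  # assumed to be known in advance, as in the abstract

# True when the graph has the desired structure.
has_target_structure = count_connected_components(S) == num_classes
print(has_target_structure)
```

A graph that fails this check has more or fewer components than classes, which is the first problem the abstract attributes to existing structured-graph methods.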