Abstract

In this paper, we present a novel Local Sensitive Dual Concept Learning (LSDCL) method for the task of unsupervised feature selection. We first reconstruct the original data matrix by the proposed dual concept learning model, which inherits the merit of the co-clustering based dual learning mechanism for more interpretable and compact data reconstruction. We then adopt a local sensitive loss function, which places more emphasis on the most similar pairs with small errors, to better characterize the local structure of the data. In this way, our method can select features that yield better clustering results through more compact data reconstruction and more faithful local structure preservation. An iterative algorithm with a convergence guarantee is also developed to find the optimal solution. We fully investigate the performance improvements brought by the newly developed terms, both individually and jointly. Extensive experiments on benchmark datasets further show that LSDCL outperforms many state-of-the-art unsupervised feature selection algorithms.

Highlights

  • With the rapid development of data acquisition technology, huge amounts of high-dimensional data have become ubiquitous in a variety of real-world applications

  • We propose to reconstruct the data matrix via dual concept learning, where the feature-side and sample-side topics are represented by non-negative linear combinations of the original features and samples, respectively

  • We introduce the Correntropy Induced Metric (CIM) [44] as a generalized metric based on information-theoretic learning (ITL) [45], which for two vectors x and y can be defined as CIM(x, y) = (g_σ(0) − (1/n) Σ_{i=1}^{n} g_σ(x_i − y_i))^{1/2}, where g_σ is a kernel function with bandwidth σ (a minimal numerical sketch follows this list)

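As a minimal sketch of how CIM behaves under its common Gaussian-kernel form (the kernel choice and bandwidth here are assumptions, not taken from the paper), the snippet below compares CIM with the Euclidean norm on data containing one large outlier. It illustrates the property the abstract relies on: small errors dominate the metric while large errors saturate.

```python
import numpy as np

def gaussian_kernel(t, sigma=1.0):
    # g_sigma(t) = exp(-t^2 / (2 sigma^2)); note g_sigma(0) = 1.
    return np.exp(-(np.asarray(t, dtype=float) ** 2) / (2.0 * sigma ** 2))

def cim(x, y, sigma=1.0):
    # CIM(x, y) = sqrt(g(0) - mean_i g(x_i - y_i)).
    # Small residuals are penalized almost quadratically, while large
    # residuals saturate, so the metric is dominated by similar pairs.
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(gaussian_kernel(0.0, sigma)
                         - np.mean(gaussian_kernel(x - y, sigma))))

# A single large outlier barely moves CIM, unlike the Euclidean norm.
x = np.zeros(100)
y = np.zeros(100)
y[0] = 50.0
print(cim(x, y))              # ~0.1: bounded regardless of outlier size
print(np.linalg.norm(x - y))  # 50.0: dominated by the single outlier
```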

Summary

INTRODUCTION

With the rapid development of data acquisition technology, huge amounts of high-dimensional data have become ubiquitous in a variety of real-world applications. As a data preprocessing strategy, feature selection methods have been proven effective and efficient at removing irrelevant and redundant features while keeping only a few relevant and informative ones, which reduces storage and computational cost while avoiding significant loss of information or degradation of subsequent learning performance. These feature selection algorithms can be broadly classified into supervised, semi-supervised and unsupervised methods according to the availability of supervision. The mismatch between the often encountered small errors and the loss function could degrade the performance of unsupervised feature selection algorithms. We propose to reconstruct the data matrix via dual concept learning, where the feature-side and sample-side topics are represented by non-negative linear combinations. In order not to distract from the reading, proofs of the results are moved to the Appendix.
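This page does not include LSDCL's concrete model or update rules, so the following toy sketch only illustrates the general shape of a dual, concept-factorization-style reconstruction with factors U, S and V. The assumed model form X ≈ X V S UᵀX (sample-side topics X V, feature-side topics UᵀX), the NMF-style multiplicative updates for the plain least-squares surrogate, and the row-norm feature scoring are all illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 20, 100, 5                      # features, samples, topics
X = np.abs(rng.standard_normal((d, n)))   # nonnegative toy data

V = np.abs(rng.standard_normal((n, k)))   # sample-side coefficients
S = np.abs(rng.standard_normal((k, k)))   # topic-association matrix
U = np.abs(rng.standard_normal((d, k)))   # feature-side coefficients
eps = 1e-12                               # guards against division by zero

for it in range(200):
    # Assumed dual reconstruction: X ≈ (X V) S (U^T X).
    Xh = X @ V @ S @ (U.T @ X)
    # NMF-style multiplicative updates for min ||X - X V S U^T X||_F^2,
    # obtained by splitting each factor's gradient into +/- parts.
    V *= (X.T @ X @ X.T @ U @ S.T) / (X.T @ Xh @ X.T @ U @ S.T + eps)
    Xh = X @ V @ S @ (U.T @ X)
    S *= ((X @ V).T @ X @ X.T @ U) / ((X @ V).T @ Xh @ X.T @ U + eps)
    Xh = X @ V @ S @ (U.T @ X)
    U *= (X @ X.T @ (X @ V @ S)) / (X @ Xh.T @ (X @ V @ S) + eps)

# Score each feature by the l2 norm of its row in the feature-side
# factor U (a common heuristic in this literature) and keep the top-m.
scores = np.linalg.norm(U, axis=1)
print("top-10 features:", np.argsort(-scores)[:10])
```

Multiplicative updates of this kind keep all factors nonnegative by construction, which is why they are the standard solver family for concept-factorization models; the actual LSDCL updates would additionally involve the local sensitive (CIM-based) loss.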

RELATED WORKS
CLUSTERING GUIDED METHODS
OTHER RELATED MATRIX FACTORIZATION METHODS
REGULARIZATION WITH LOCAL SENSITIVE STRUCTURE PRESERVING
THE OPTIMIZATION ALGORITHM
UPDATE V GIVEN U AND S
COMPLEXITY
EXPERIMENT
CLUSTERING WITH SELECTED FEATURES
EFFECT OF EACH TERM
CONCLUSION
