Abstract

Feature selection algorithms eliminate irrelevant, redundant, and noisy features while preserving the most representative ones. They reduce the dimensionality of a dataset, extract the essential features of high-dimensional data, and improve learning quality. Existing feature selection algorithms are carried out in data space, so the information in feature space is not fully exploited. To compensate for this drawback, this paper proposes a novel feature selection algorithm for clustering, named self-representation based dual-graph regularized feature selection clustering (DFSC). It adopts the self-representation property, namely that data can be represented by itself. At the same time, the local geometrical information of both data space and feature space is preserved. By imposing an l2,1-norm constraint on the self-representation coefficient matrix in data space, DFSC can effectively select the most representative features for clustering. We give the objective function, develop iterative updating rules, and provide a convergence proof. Two kinds of experiments on several datasets demonstrate the effectiveness of DFSC. Comparisons with several state-of-the-art feature selection algorithms show that additionally considering the information of feature space, based on the self-representation property, improves clustering quality. Moreover, because the additional feature selection process selects the most important features and so preserves the intrinsic structure of the dataset, the proposed algorithm achieves better clustering results than several co-clustering algorithms.
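
The role of the l2,1-norm can be illustrated with a minimal sketch. It is not the DFSC objective or its updating rules, which the abstract does not state. It only shows how the l2,1-norm of a coefficient matrix is computed and how row norms could be used to rank features. The function names `l21_norm` and `rank_features`, the toy matrix `W`, and the convention that each row of `W` corresponds to one feature are assumptions made for illustration.

```python
import numpy as np


def l21_norm(W):
    """l2,1-norm: the sum of the Euclidean norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def rank_features(W, k):
    """Return the indices of the k rows of W with the largest Euclidean norm.

    Assumption: row i of W holds the coefficients tied to feature i, so a
    row that is (nearly) zero marks a feature that contributes little.
    """
    scores = np.linalg.norm(W, axis=1)
    order = np.argsort(-scores, kind="stable")  # descending by row norm
    return order[:k]


# Hypothetical coefficient matrix for 4 features; rows 1 and 3 are near zero.
W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0],
              [0.0, 0.1]])

total = l21_norm(W)             # row norms are 5, 0, 1, and 0.1
selected = rank_features(W, 2)  # rows 0 and 2 have the largest norms
```

Because the l2,1-norm sums row norms, penalizing it drives entire rows toward zero rather than individual entries, which is why the surviving rows can be read as the selected features.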
