Abstract

Feature selection is widely recognized as one of the key problems in the data mining and machine learning communities, especially for high-dimensional data containing redundant information, partial noise, and outliers. Unsupervised feature selection has recently attracted substantial research attention, since data acquisition is now cheap while labeling remains expensive and time-consuming. It is particularly useful for selecting features for clustering tasks. Recent works that use sparse projection with pre-learned pseudo labels achieve appealing results; however, they generate the pseudo labels from all features, so noisy and ineffective features degrade the cluster structure and in turn harm feature selection performance. These methods also suffer from a complex combination of multiple constraints and from computational inefficiency, e.g., eigen-decomposition. In contrast, in this work we introduce consensus clustering for pseudo labeling, which avoids expensive eigen-decomposition and provides better clustering accuracy with high robustness. In addition, complex constraints such as non-negativity can be removed because consensus clustering produces crisp indicators. Specifically, we propose an efficient formulation for unsupervised feature selection based on the utility function, and we provide a theoretical analysis of the optimization rules and model convergence. Extensive experiments on several popular data sets demonstrate that our method outperforms the most recent state-of-the-art methods in terms of normalized mutual information (NMI).
