Abstract

Clustering is an important ingredient of unsupervised learning; classical clustering methods include K-means clustering and hierarchical clustering. These methods may suffer from instability because of their tendency to become trapped in local optima of the nonconvex optimization model. In this paper, we propose a new convex clustering method for high-dimensional data based on the sparse group lasso penalty, which can simultaneously group observations and eliminate noninformative features. In this method, the number of clusters can be learned from the data instead of being given in advance as a parameter. We theoretically prove that the proposed method has desirable statistical properties, including a finite sample error bound and feature screening consistency. Furthermore, the semiproximal alternating direction method of multipliers is designed to solve the sparse group lasso convex clustering model, and its convergence analysis is established without imposing any additional conditions. Finally, the effectiveness of the proposed method is thoroughly demonstrated through simulated experiments and real applications.

Highlights

  • Clustering is an important ingredient of unsupervised learning

  • We propose the Sparse Group Lasso Convex Clustering (SGLCC) method by adopting the sparse group lasso penalty [23], formulated as a convex optimization problem

  • SGLCC is significantly superior to K-means, hierarchical clustering (Hclust), Sclust, and convex clustering (CC) in terms of clustering performance, which implies that feature screening is an indispensable part of high-dimensional data analysis

Summary

Introduction

Clustering is an important ingredient of unsupervised learning. It assigns samples to different clusters by minimizing the differences within each cluster and maximizing the differences between clusters. A classical starting point is the convex clustering model

min_X (1/2)‖A − X‖²_F + c1 Σ_{i<j} ‖x_i − x_j‖₂,

where A is a given data matrix with n observations and p features; x_i denotes the i-th row of the centroid matrix X; ‖ · ‖F denotes the Frobenius norm of a matrix; and c1 ≥ 0 is a tuning parameter that controls the balance between the model fit and the number of clusters. However, convex clustering treats all features equally and cannot eliminate noninformative features in high dimensions. This motivates us to propose a more reasonable convex clustering method that can perform cluster analysis and feature screening simultaneously. We propose the Sparse Group Lasso Convex Clustering (SGLCC) method by adopting the sparse group lasso penalty [23]. The experimental results on both synthetic and real datasets illustrate that SGLCC provides clustering performance and feature selection ability superior to other clustering methods. Conclusions are given in Section 6. The proof of the main results can be found in the Supplementary Materials (available here).
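To make the shape of such an objective concrete, the following is a minimal sketch that evaluates a convex-clustering fit-plus-fusion objective augmented with a sparse group lasso penalty on the feature columns of the centroid matrix. The unit fusion weights, the parameters c1 and c2, and the mixing weight alpha are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sglcc_objective(A, X, c1=1.0, c2=1.0, alpha=0.5):
    """Evaluate an assumed SGLCC-style objective (illustrative form).

    Fit term:     (1/2) * ||A - X||_F^2
    Fusion term:  c1 * sum_{i<j} ||x_i - x_j||_2   (unit weights assumed)
    Sparse group lasso on feature columns of X:
        c2 * (alpha * sum_j ||X[:, j]||_2 + (1 - alpha) * ||X||_1)
    """
    n, p = A.shape
    fit = 0.5 * np.linalg.norm(A - X, "fro") ** 2
    # Pairwise fusion penalty pulls row centroids together (clustering).
    fusion = sum(np.linalg.norm(X[i] - X[j])
                 for i in range(n) for j in range(i + 1, n))
    # Column-wise group lasso zeroes out entire noninformative features.
    group = np.sum(np.linalg.norm(X, axis=0))
    # Element-wise lasso adds within-feature sparsity.
    lasso = np.sum(np.abs(X))
    return fit + c1 * fusion + c2 * (alpha * group + (1 - alpha) * lasso)

# Toy check: two well-separated observations, one noise feature.
A = np.array([[5.0, 0.0], [-5.0, 0.0]])
print(sglcc_objective(A, X=A.copy()))  # zero fit term, penalties remain
```

In practice the minimizer X is found by an iterative solver (the paper designs a semiproximal ADMM for this); the sketch above only illustrates how the penalty terms trade off model fit, cluster fusion, and feature sparsity.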

Preliminaries
Statistical Properties
Algorithmic Design
Methods
Conclusion
