Abstract

Model-based clustering is a commonly-used technique to partition heterogeneous data into homogeneous groups. When the analysis is to be conducted with a large number of features, analysts face simultaneous challenges in model interpretability, clustering accuracy, and computational efficiency. Several Bayesian and penalization methods have been proposed to select important features for model-based clustering. However, the performance of those methods relies on a careful algorithmic tuning, which can be time-consuming for high-dimensional cases. In this paper, we propose a new sparse clustering method based on alternating hard-thresholding. The new method is conceptually simple and tuning-free. With a user-specified sparsity level, it efficiently detects a set of key features by eliminating a large number of features that are less useful for clustering. Based on the selected key features, one can readily obtain an effective clustering of the original high-dimensional data under a general sparse covariance structure. Under mild conditions, we show that the new method leads to clusters with a misclassification rate consistent to the optimal rate as if the underlying true model were used. The promising performance of the new method is supported by both simulated and real data examples.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call