Abstract

Mixture modeling is a major paradigm for clustering in statistics. In this article, we develop a new block-wise variable selection method for clustering that exploits the latent states of the hidden Markov model on variable blocks or of the Gaussian mixture model. The variable blocks are formed by a depth-first search on a dendrogram built from the mutual information between every pair of variables. We demonstrate that the latent states of the variable blocks, together with the mixture model parameters, represent the original data effectively and much more compactly. We therefore cluster the data using the latent states and select variables according to the relationship between the states and the clusters. Because true class labels are unknown in the unsupervised setting, we first generate more refined clusters, called semi-clusters, for variable selection, and then determine the final clusters from the dimension-reduced data. Experiments on simulated and real data show that the new method is highly competitive in clustering accuracy with several widely used methods. Supplementary materials for this article are available online.
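The abstract does not state how mutual information is estimated, which linkage builds the dendrogram, or exactly how the depth-first search cuts the tree into blocks. The following is only a minimal sketch of the block-forming step under stated assumptions: mutual information is computed as if each pair of variables were bivariate Gaussian, that is, -0.5·log(1 - ρ²); the dendrogram uses average linkage with mutual information as the similarity; and a hypothetical cap `max_block_size` makes any subtree reached by the depth-first search into one block once it is small enough. All function names are illustrative and are not the authors' code.

```python
import math
from itertools import combinations


def gaussian_mutual_information(x, y):
    """MI of two variables ASSUMING they are bivariate Gaussian: -0.5 * log(1 - rho^2).
    Both columns must be non-constant (nonzero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    rho = sxy / math.sqrt(sxx * syy)
    # Cap rho^2 just below 1 so perfectly correlated columns give a large, finite MI.
    rho2 = min(rho * rho, 1.0 - 1e-12)
    return -0.5 * math.log(1.0 - rho2)


def build_dendrogram(columns):
    """Average-linkage agglomerative clustering with MI as a similarity (an assumption).
    Leaves are variable indices (ints); internal nodes are (left, right) tuples."""
    p = len(columns)
    mi = {}
    for i, j in combinations(range(p), 2):
        mi[(i, j)] = mi[(j, i)] = gaussian_mutual_information(columns[i], columns[j])
    # Each cluster is (subtree, list of member variable indices).
    clusters = [(i, [i]) for i in range(p)]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ma, mb = clusters[a][1], clusters[b][1]
            # Average pairwise MI between the two clusters.
            sim = sum(mi[(u, v)] for u in ma for v in mb) / (len(ma) * len(mb))
            if best is None or sim > best[0]:
                best = (sim, a, b)
        _, a, b = best
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][0]


def leaves(node):
    """Variable indices under a node, in left-to-right (depth-first) order."""
    if isinstance(node, int):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def blocks_by_dfs(node, max_block_size):
    """Depth-first traversal: a subtree with at most max_block_size leaves becomes
    one block; a larger subtree is split by recursing into its two children."""
    if max_block_size < 1:
        raise ValueError("max_block_size must be at least 1")
    members = leaves(node)
    if len(members) <= max_block_size:
        return [sorted(members)]
    return blocks_by_dfs(node[0], max_block_size) + blocks_by_dfs(node[1], max_block_size)


if __name__ == "__main__":
    # Four toy variables (one list per variable): 0 and 1 are strongly related, as are 2 and 3.
    cols = [[1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 13], [3, 1, 4, 1, 5, 9], [6, 2, 8, 2, 10, 19]]
    tree = build_dendrogram(cols)
    print(blocks_by_dfs(tree, max_block_size=2))
```

With `max_block_size=2`, the toy data split into the blocks {0, 1} and {2, 3}. The order in which the two blocks are listed depends on which pair is merged first. A larger cap keeps bigger subtrees together, trading fewer blocks for more variables per block.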
