Abstract
AbstractComputing not the local, but the global optimum of a cluster assignment is one of the important aspects in clustering. Fitting a Gaussian mixture model is a method of soft clustering where optimization of the mixture weights is convex if centroids and bandwidths of the clusters remain unchanged during the updates. The global optimum of the mixture weights is sparse and clustering that utilizes the fitted sparse mixture model is called the convex clustering. To make the convex clustering practical in real applications, the author addresses three types of issues classified as (i) computational inefficiency of the Expectation‐Maximization algorithm, (ii) inconsistency of the bandwidth specifications between clustering and density estimation for high‐dimensional data, and (iii) selection of the optimal clustering from several bandwidth settings. The extremely large number of iterations needed in the Expectation‐Maximization algorithm is significantly reduced with an accurate pruning while choosing a pair of kernels and an element‐wise Newton–Raphson method. For high‐dimensional data, the convex clusterings are performed several times, with initially large bandwidths and succeeding smaller bandwidths. Since the number of clusters cannot be specified precisely in the convex clustering, practitioners often try multiple settings of the initial bandwidths. To choose the optimal clustering from the multiple results, the author proposes an empirical‐Bayes method that can choose appropriate bandwidths if the true clusters are Gaussian. The combination of the repetitions of the convex clusterings and the empirical‐Bayes model selection achieves stable prediction performances compared to the existing mixture learning methods. © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 5: 70–89, 2012
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: Statistical Analysis and Data Mining: The ASA Data Science Journal
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.