Abstract
The two main topics of this article are the introduction of the “optimally tuned robust improper maximum likelihood estimator” (OTRIMLE) for robust clustering based on the multivariate Gaussian model for clusters, and a comprehensive simulation study comparing the OTRIMLE to maximum likelihood in Gaussian mixtures with and without noise component, mixtures of t-distributions, and the TCLUST approach for trimmed clustering. The OTRIMLE uses an improper constant density for modeling outliers and noise. This can be chosen optimally so that the nonnoise part of the data looks as close to a Gaussian mixture as possible. Some deviation from Gaussianity can be traded in for lowering the estimated noise proportion. Covariance matrix constraints and computation of the OTRIMLE are also treated. In the simulation study, all methods are confronted with setups in which their model assumptions are not exactly fulfilled, and to evaluate the experiments in a standardized way by misclassification rates, a new model-based definition of “true clusters” is introduced that deviates from the usual identification of mixture components with clusters. In the study, every method turns out to be superior for one or more setups, but the OTRIMLE achieves the most satisfactory overall performance. The methods are also applied to two real datasets, one without and one with known “true” clusters. Supplementary materials for this article are available online.
Highlights
We introduce and investigate the “optimally tuned robust improper maximum likelihood estimator” (OTRIMLE), a method for robust clustering with clusters that can be approximated by multivariate Gaussian distributions
We present a simulation study comparing optimally tuned RIMLE (OTRIMLE) and other approaches for model-based clustering, which is, to our knowledge, the most comprehensive study in the field and involves a careful discussion of the issue of comparing methods based on different model assumptions
Despite our effort to make the simulation study fair, it would be good to have comparisons of methods run by researchers who did not have their hand in the design of any of the methods
Summary
We introduce and investigate the “optimally tuned robust improper maximum likelihood estimator” (OTRIMLE), a method for robust clustering with clusters that can be approximated by multivariate Gaussian distributions. The basic idea of OTRIMLE is to fit an improper density to the data that is made up of a Gaussian mixture density and a “pseudo mixture component” defined by a small constant density, which is meant to capture outliers in low-density areas of the data. This is inspired by the addition of a uniform “noise component” to a Gaussian mixture (Banfield and Raftery 1993). The OTRIMLE has been found to work well for one-dimensional data in Coretto and Hennig (2010).
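The improper density described above can be illustrated with a minimal sketch. For simplicity it assumes one-dimensional data and already known parameters, and it does not implement the estimator or the optimal tuning of the constant. All names (`normal_pdf`, `pseudo_density_parts`, `assign`, `delta`, `noise_weight`) are hypothetical and chosen here for illustration. The rule of assigning each point to the largest weighted part is likewise an assumption of this sketch.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def pseudo_density_parts(x, delta, noise_weight, weights, means, sds):
    """Weighted parts of the improper density at x.

    Index 0 is the constant "pseudo mixture component" (noise_weight * delta);
    indices 1..G are the weighted Gaussian components. The sum of the parts is
    improper because the constant does not integrate to a finite value over
    the real line.
    """
    parts = [noise_weight * delta]
    parts += [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    return parts


def assign(x, delta, noise_weight, weights, means, sds):
    """Return 0 for noise, or the 1-based index of the Gaussian component (assumed rule)."""
    parts = pseudo_density_parts(x, delta, noise_weight, weights, means, sds)
    return max(range(len(parts)), key=parts.__getitem__)


# Illustrative parameters (made up for this sketch, not taken from the article).
params = dict(delta=0.001, noise_weight=0.1,
              weights=[0.45, 0.45], means=[0.0, 10.0], sds=[1.0, 1.0])

labels = [assign(x, **params) for x in (0.2, 9.5, 30.0)]
print(labels)  # [1, 2, 0]: the point far from both clusters is labelled noise
```

Near a cluster center the Gaussian part dominates the small constant, while in low-density regions the Gaussian parts decay toward zero and the constant takes over, so that outliers are absorbed by the noise part.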