Abstract
Finite mixtures are a flexible modeling tool for irregularly shaped densities and samples from heterogeneous populations. When a mixture is modeled with an exchangeable prior on the component features, the component labels are arbitrary and indistinguishable in posterior analysis. This makes it impossible to attach any meaningful interpretation to the marginal posterior distributions of the component features. We propose a model in which a small number of observations are assumed to arise from some of the labeled component densities. The resulting model is not exchangeable, allowing inference on the component features without post-processing. Our method assigns meaning to the component labels at the modeling stage and can be justified as a data-dependent informative prior on the labelings. We show that our method produces interpretable results, often (but not always) similar to those resulting from relabeling algorithms, with the added benefit that the marginal inferences originate directly from a well-specified probability model rather than from post hoc manipulation. We provide asymptotic results that lead to practical guidelines for model selection, motivated by maximizing prior information about the class labels, and we demonstrate our method on real and simulated data.
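A minimal sketch of why anchoring breaks label invariance, assuming a univariate Gaussian mixture with a common, known standard deviation. The function name `mixture_loglik` and the `anchors` dictionary (mapping an observation index to a component label) are hypothetical choices for this illustration, not the authors' implementation. Treating each anchored observation as contributing only the density of its assigned component, with no mixing weight, is also an assumption about how a fixed allocation enters the likelihood.

```python
import math


def normal_logpdf(x, mu, sigma=1.0):
    """Log density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def mixture_loglik(data, weights, means, anchors=None, sigma=1.0):
    """Log-likelihood of a univariate Gaussian mixture with known common sigma.

    anchors (hypothetical): dict {observation index: component label}.
    Anchored observations are assumed to come from their labeled component
    with probability 1, so they contribute only that component's density.
    Every other observation contributes the full mixture density.
    """
    anchors = anchors or {}
    total = 0.0
    for i, x in enumerate(data):
        if i in anchors:
            k = anchors[i]
            total += normal_logpdf(x, means[k], sigma)
        else:
            total += math.log(sum(w * math.exp(normal_logpdf(x, m, sigma))
                                  for w, m in zip(weights, means)))
    return total


data = [-2.1, -1.8, 0.1, 1.9, 2.2]
# Permuting the labels (swapping weights and means together) leaves the
# standard mixture likelihood unchanged ...
print(mixture_loglik(data, [0.4, 0.6], [-2.0, 2.0]))
print(mixture_loglik(data, [0.6, 0.4], [2.0, -2.0]))
# ... but not the anchored one, because observations 0 and 4 are tied to labels.
anchors = {0: 0, 4: 1}
print(mixture_loglik(data, [0.4, 0.6], [-2.0, 2.0], anchors))
print(mixture_loglik(data, [0.6, 0.4], [2.0, -2.0], anchors))
```

The first two values coincide, which is the label invariance described above. The anchored values differ, so under this sketch the labeling that places observation 0 in component 0 is strongly preferred.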
Highlights
Finite mixture models are flexible tools that are often applied to data from heterogeneous populations or from distributions with irregularly shaped densities
We introduce a modification to the standard finite mixture model, the anchor model, in which a small number of observations are assumed to be drawn from known component densities
We formalize this anchoring strategy as a modeling procedure that requires no post-processing of an MCMC sample
Summary
Finite mixture models are flexible tools that are often applied to data from heterogeneous populations or from distributions with irregularly shaped densities. Under an exchangeable prior, the posterior is invariant to permutations of the component labels, so MCMC samples can exhibit label switching. Much work has been devoted to either preventing or reversing label switching, either by placing prior constraints on the parameter space or by post-processing posterior samples so that only one labeling of the mixture components is possible. Because a relabeling algorithm's constraints are not the result of a clearly defined prior specification, it is difficult to evaluate rigorously the underlying structure that the algorithm imposes on a problem. It is not obvious whether this approach can be justified as a basis for making inferential claims about the posterior distribution of the component-specific parameters. We introduce a modification to the standard finite mixture model, the anchor model, in which a small number of observations are assumed to be drawn from known component densities. This breaks the model’s label invariance in a data-dependent manner while avoiding the strong, subjective restrictions imposed by prior identifiability constraints.
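To make the anchor idea concrete, here is a minimal Gibbs sampler sketch in which anchored observations simply keep their allocations fixed while all other allocations are resampled. It assumes a univariate Gaussian mixture with known common standard deviation `sigma`, independent N(0, `tau`^2) priors on the component means, and a symmetric Dirichlet(`alpha`) prior on the weights. These priors, the function name `gibbs_anchor_mixture`, and the `anchors` dictionary are all hypothetical choices for illustration, not the authors' specification or code.

```python
import math
import random


def gibbs_anchor_mixture(data, K, anchors, n_iter=2000, sigma=1.0,
                         tau=10.0, alpha=1.0, seed=0):
    """Gibbs sampler sketch for an anchored K-component Gaussian mixture.

    anchors (hypothetical): dict {observation index: component label}.
    Anchored allocations are held fixed; all others are resampled.
    Returns one list of K sampled component means per iteration.
    """
    rng = random.Random(seed)
    n = len(data)
    # Start each anchored component at the average of its anchor points so the
    # chain begins in the mode that agrees with the anchors.
    means = []
    for k in range(K):
        pts = [data[i] for i, lab in anchors.items() if lab == k]
        means.append(sum(pts) / len(pts) if pts else sum(data) / n)
    weights = [1.0 / K] * K
    z = [anchors.get(i, 0) for i in range(n)]
    draws = []
    for _ in range(n_iter):
        # 1. Allocations | weights, means: only non-anchored points are updated.
        for i, y in enumerate(data):
            if i in anchors:
                continue
            logp = [math.log(weights[k]) - (y - means[k]) ** 2 / (2 * sigma ** 2)
                    for k in range(K)]
            top = max(logp)
            probs = [math.exp(lp - top) for lp in logp]
            z[i] = rng.choices(range(K), weights=probs)[0]
        # 2. Weights | z ~ Dirichlet(alpha + counts), via normalised gamma draws.
        gam = [rng.gammavariate(alpha + z.count(k), 1.0) for k in range(K)]
        total = sum(gam)
        weights = [g / total for g in gam]
        # 3. Means | z, data: conjugate normal update, component by component.
        means = []
        for k in range(K):
            yk = [y for y, lab in zip(data, z) if lab == k]
            prec = 1.0 / tau ** 2 + len(yk) / sigma ** 2
            post_mean = (sum(yk) / sigma ** 2) / prec
            means.append(rng.gauss(post_mean, 1.0 / math.sqrt(prec)))
        draws.append(means)
    return draws


# Usage: two well-separated clusters, one anchor in each.
rng = random.Random(1)
data = [rng.gauss(-3, 1) for _ in range(50)] + [rng.gauss(3, 1) for _ in range(50)]
draws = gibbs_anchor_mixture(data, K=2, anchors={0: 0, 50: 1}, n_iter=500)
kept = draws[100:]
print(sum(d[0] for d in kept) / len(kept), sum(d[1] for d in kept) / len(kept))
```

Because observation 0 is tied to label 0, the marginal draws of `means[0]` in this sketch describe "the component containing observation 0" without any relabeling step. Initializing at the anchors is a practical choice. A one-point-at-a-time Gibbs sampler mixes poorly between well-separated modes, so a chain started in a labeling that contradicts the anchors could stay there for a long time.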