Abstract

The choice of initial parameter values for mixture distributions estimated with the recursive EM algorithm strongly affects the overall quality of estimation. None of the existing methods is suitable for heteroscedastic mixtures with a large number of components. We present a novel methodology for estimating the initial values of parameters of univariate, heteroscedastic Gaussian mixtures, based on dynamic programming partitioning of the range of observations into bins. We evaluate variants of the dynamic programming method corresponding to different scoring functions for partitioning. We demonstrate the superior efficiency of the proposed method compared to existing techniques for both simulated and real datasets.

Highlights

  • A problem of crucial importance in applications of the expectation maximization (EM) recursive algorithm [McLachlan and Peel (2000)] for fitting normal mixture models to data is the choice of initial values for mixture parameters

  • The aim of this study is to develop and evaluate a method for estimating initial values of parameters for EM iterations for univariate, multi-component, heteroscedastic Gaussian mixtures based on dynamic programming partitioning

  • We conduct two groups of computational experiments, on artificially created data and on proteomic mass spectral data, and we report the results of comparisons of different methods for setting initial conditions for EM iterations


Summary

Introduction

A problem of crucial importance in applications of the expectation maximization (EM) recursive algorithm [McLachlan and Peel (2000)] for fitting normal mixture models to data is the choice of initial values for mixture parameters. Approaches to initializing EM iterations have been extensively discussed and studied [McLachlan and Peel (2000); Karlis and Xekalaki (2003); Biernacki et al (2003); Biernacki (2004); Maitra (2009); O’Hagan et al (2012); Melnykov and Melnykov (2012)]. A simple approach is random initialization, in which initial values of parameters and component weights (mixing proportions) are generated from assumed probability distributions [McLachlan and Peel (2000)]. Another simple idea is to use data quantiles to estimate the initial means and variances of components. A further group of approaches applies a clustering procedure (hierarchical clustering or k-means clustering) to the data set to compute initial parameters for EM iterations [Biernacki et al (2003); Maitra (2009); O’Hagan et al (2012)]. Available software packages for mixture modeling [McLachlan and Peel (1999); Biernacki et al (2006); Fraley and Raftery (1999); Richardson and Green (1997)] offer different possibilities for setting initial conditions for EM iterations.
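As a rough illustration of the quantile-based idea mentioned above, the following sketch splits the sorted observations into k equal-count bins and takes each bin's mean and standard deviation as initial component parameters. The text does not specify how quantiles are turned into estimates, so the function name `quantile_init`, the equal-count binning, and the small variance floor are all assumptions made for this example, not the procedure of any cited paper or package.

```python
import random
import statistics


def quantile_init(data, k):
    """Hypothetical helper: initial (weights, means, sds) from equal-count bins."""
    xs = sorted(data)
    n = len(xs)
    if k < 1 or n < 2 * k:
        raise ValueError("need at least two observations per component")
    # Bin boundaries in index space: k contiguous, nearly equal-count bins.
    edges = [round(i * n / k) for i in range(k + 1)]
    weights, means, sds = [], [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        chunk = xs[lo:hi]
        weights.append(len(chunk) / n)          # mixing proportion of the bin
        means.append(statistics.fmean(chunk))   # initial component mean
        # Separate spread per bin (heteroscedastic); the floor is an assumed
        # safeguard against zero variance when a bin holds identical values.
        sds.append(max(statistics.pstdev(chunk), 1e-6))
    return weights, means, sds


# Usage on simulated data from two components with different variances.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(300)]
data += [random.gauss(6.0, 0.5) for _ in range(200)]
weights, means, sds = quantile_init(data, 2)
```

Because the bins hold equal counts, the initial weights are always close to 1/k regardless of the true mixing proportions, which hints at why such simple schemes can struggle when components differ strongly in weight or spread.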
