Rapid development in modeling big data requires effective and efficient methods for estimating the parameters involved. Although several accelerated Expectation-Maximization (EM) algorithms have been developed, two major concerns remain: reducing computational cost and improving estimation accuracy. We propose three distributed-like algorithms for multivariate Gaussian mixture models that accelerate computation and improve estimation accuracy. The first is a distributed algorithm, which speeds up the classic EM algorithm and improves its estimation accuracy by averaging the one-step estimators obtained from distributed computations. The second is a distributed online algorithm, a distributed stochastic approximation procedure that performs online updates as streaming data arrive. The third, a distributed monotonically over-relaxed algorithm, uses an over-relaxation factor together with a distributing strategy to improve the estimation accuracy of multivariate Gaussian mixture models. We investigate the stability, sensitivity, convergence, and robustness of these algorithms in a numerical study, and we apply them to three real data sets for validation.
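The divide-and-average idea behind the first algorithm can be sketched as follows: split the data into shards, run one EM step for a Gaussian mixture on each shard from a shared initialization, then average the resulting one-step estimators. This is a minimal illustrative sketch under those assumptions, not the authors' implementation; all function names (`em_one_step`, `distributed_one_step`) are hypothetical.

```python
import numpy as np

def em_one_step(X, weights, means, covs):
    """One EM update for a multivariate Gaussian mixture (illustrative)."""
    n, d = X.shape
    K = len(weights)
    resp = np.empty((n, K))
    for k in range(K):
        diff = X - means[k]
        inv = np.linalg.inv(covs[k])
        quad = np.einsum("ij,jk,ik->i", diff, inv, diff)  # Mahalanobis terms
        norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(covs[k]) ** -0.5
        resp[:, k] = weights[k] * norm * np.exp(-0.5 * quad)
    resp /= resp.sum(axis=1, keepdims=True)       # E-step: responsibilities
    Nk = resp.sum(axis=0)
    new_w = Nk / n                                # M-step: mixing weights
    new_mu = (resp.T @ X) / Nk[:, None]           # M-step: component means
    new_cov = np.empty_like(covs)
    for k in range(K):
        diff = X - new_mu[k]
        new_cov[k] = (resp[:, k, None] * diff).T @ diff / Nk[k]
        new_cov[k] += 1e-6 * np.eye(d)            # small ridge for stability
    return new_w, new_mu, new_cov

def distributed_one_step(X, weights, means, covs, n_shards):
    """Average the one-step estimators computed on equal-size shards."""
    shards = np.array_split(X, n_shards)
    parts = [em_one_step(S, weights, means, covs) for S in shards]
    avg_w = np.mean([p[0] for p in parts], axis=0)
    avg_mu = np.mean([p[1] for p in parts], axis=0)
    avg_cov = np.mean([p[2] for p in parts], axis=0)
    return avg_w / avg_w.sum(), avg_mu, avg_cov

# Toy two-component data: clusters around (-3, -3) and (3, 3).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 1, (300, 2)), rng.normal(3, 1, (300, 2))])
w0 = np.array([0.5, 0.5])
mu0 = np.array([[-1.0, 0.0], [1.0, 0.0]])
cov0 = np.array([np.eye(2), np.eye(2)])
w, mu, cov = distributed_one_step(X, w0, mu0, cov0, n_shards=4)
```

In this sketch each shard performs the same amount of work in parallel, and averaging the shard-level one-step estimators plays the role of the aggregation step described above; the paper's actual aggregation rule may differ.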