Hierarchical statistical models are important in applied sciences because they capture complex relationships in data, especially when variables are related by space, time, sampling unit, or other shared features. Existing methods for maximum likelihood estimation that rely on Monte Carlo integration over latent variables, such as Monte Carlo Expectation Maximization (MCEM), suffer from drawbacks in efficiency, generality, or both. We harness a connection between the sampling-stepping iterations of such methods and stochastic gradient descent for non-hierarchical models: many noisier steps can outperform a few cleaner steps. We call the resulting methods Hierarchical Model Stochastic Gradient Descent (HMSGD) and show that combining efficient, adaptive step-size algorithms with HMSGD yields efficiency gains. We also introduce a one-dimensional sampling-based greedy line search for step-size determination. We implement these methods and conduct numerical experiments on a Gamma-Poisson mixture model, generalized linear mixed models (GLMMs) with single and crossed random effects, and a multi-species ecological occupancy model with over 3,000 latent variables. Our experiments show that the accelerated HMSGD methods converge faster than commonly used methods and are robust to reasonable choices of MCMC sample size.
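To make the "many noisy steps" idea concrete, here is a minimal sketch, assuming only what the abstract states: each iteration draws a small Monte Carlo sample of the latent variables and takes one cheap, noisy gradient step on the marginal log-likelihood (via Fisher's identity), rather than the large per-iteration sample MCEM would use. The Gamma-Poisson mixture is one of the experiments named above, but the model setup, function names, sample size `m`, and decaying step-size rule are all illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of an HMSGD-style update for a Gamma-Poisson mixture:
# lambda_i ~ Gamma(alpha, beta), y_i | lambda_i ~ Poisson(lambda_i).
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)

# Simulated data (alpha_true, beta_true are arbitrary illustration values).
alpha_true, beta_true, n = 3.0, 1.5, 500
y = rng.poisson(rng.gamma(alpha_true, 1.0 / beta_true, size=n))

theta = np.log(np.array([1.0, 1.0]))  # (log alpha, log beta): keeps both positive

def mc_grad(theta, m):
    """Fisher-identity gradient estimate:
    grad log p(y | theta) = E[grad log p(y, lambda | theta) | y, theta].
    Gamma-Poisson is conjugate, so we draw exact posterior samples here;
    in a non-conjugate hierarchical model these would be short MCMC runs."""
    a, b = np.exp(theta)
    # Posterior: lambda_i | y_i ~ Gamma(a + y_i, rate b + 1); m draws per obs.
    lam = rng.gamma(a + y, 1.0 / (b + 1.0), size=(m, n))
    g_a = np.log(b) - digamma(a) + np.log(lam).mean(axis=0)  # d/d alpha
    g_b = a / b - lam.mean(axis=0)                           # d/d beta
    # Chain rule for the log-parameterization, summed over observations.
    return np.array([a * g_a.sum(), b * g_b.sum()])

# Plain decaying-step SGD stands in for the adaptive step-size and
# sampling-based line-search schemes the abstract mentions.
for t in range(1, 2001):
    theta += (0.1 / (n * np.sqrt(t))) * mc_grad(theta, m=5)  # m small: noisy but cheap

print("estimated (alpha, beta):", np.exp(theta))
```

The design point the abstract makes is visible in the `m=5` choice: the gradient estimate is deliberately noisy, trading per-step accuracy for many more steps at the same total sampling cost.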