A note on auxiliary mixture sampling for Bayesian Poisson models
Abstract: Bayesian hierarchical Poisson models are an essential tool for analyzing count data. However, designing efficient algorithms to sample from the posterior distribution of the target parameters remains a challenging task. Auxiliary mixture sampling algorithms have been proposed for this purpose. They involve two steps of data augmentation: the first leverages the theory of Poisson processes, and the second approximates the residual distribution of the resulting model through a mixture of Gaussian distributions. In this way, an approximate Gibbs sampler can be implemented. This strategy is particularly beneficial for latent Gaussian models, as it allows one to exploit the sparsity of the precision matrix associated with the random effects and to efficiently incorporate linear constraints. In this paper, we focus on the accuracy of the approximation step, highlighting scenarios where the mixture fails to accurately represent the true underlying distribution, leading to a lack of convergence in the algorithm. We outline key features to monitor in order to assess whether the approximation performs as intended. Building on this, we propose a robust version of the auxiliary mixture sampling algorithm. Our approach includes mechanisms for detecting approximation failures and introduces an enhanced approximation of the right tail of the auxiliary variable distribution, supplemented by a Metropolis-Hastings correction step when needed. Finally, we evaluate the proposed algorithm together with the original mixture sampling algorithms on both simulated and real datasets.
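The first augmentation step can be illustrated numerically. The sketch below assumes the inter-arrival-time formulation of auxiliary mixture sampling, which the abstract does not spell out: for a Poisson process with rate λ, the inter-arrival times τ are Exp(λ), so −log τ = log λ + ε, where ε follows a standard Gumbel distribution (mean equal to the Euler-Mascheroni constant, variance π²/6). Under that assumption, ε is the residual that the Gaussian mixture would approximate. The rate and sample size are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 3.5            # illustrative Poisson rate (assumption)
n = 200_000          # illustrative sample size (assumption)

# Inter-arrival times of a Poisson process with rate lam are Exp(lam).
tau = rng.exponential(scale=1.0 / lam, size=n)

# Linearised form: -log(tau) = log(lam) + eps, so eps is the residual.
eps = -np.log(tau) - np.log(lam)

# For eps = -log(Exp(1)) the mean is the Euler-Mascheroni constant
# and the variance is pi^2 / 6 (standard Gumbel distribution).
print("mean:", eps.mean(), "theory:", np.euler_gamma)
print("var: ", eps.var(), "theory:", np.pi ** 2 / 6)
```

The residual is skewed with a heavy right tail, which is consistent with the abstract's emphasis on the right tail of the auxiliary variable distribution.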
- Research Article
3
- 10.1016/j.jspi.2020.01.007
- Feb 3, 2020
- Journal of Statistical Planning and Inference
On classical and Bayesian asymptotics in stochastic differential equations with random effects having mixture normal distributions
- Research Article
59
- 10.1002/2013wr014372
- Mar 1, 2014
- Water Resources Research
There have been increasing reports of harmful algal blooms (HABs) worldwide. However, the factors that influence cyanobacteria dominance and HAB formation can be site‐specific and idiosyncratic, making prediction challenging. The drivers of cyanobacteria blooms in Lake Paldang, South Korea, the summer climate of which is strongly affected by the East Asian monsoon, may differ from those in well‐studied North American lakes. Using the observational data sampled during the growing season in 2007–2011, a Bayesian hurdle Poisson model was developed to predict cyanobacteria abundance in the lake. The model allowed cyanobacteria absence (zero count) and nonzero cyanobacteria counts to be modeled as functions of different environmental factors. The model predictions demonstrated that the principal factor that determines the success of cyanobacteria was temperature. Combined with high temperature, increased residence time indicated by low outflow rates appeared to increase the probability of cyanobacteria occurrence. A stable water column, represented by low suspended solids, and high temperature were the requirements for high abundance of cyanobacteria. Our model results had management implications; the model can be used to forecast cyanobacteria watch or alert levels probabilistically and develop mitigation strategies of cyanobacteria blooms.
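The two-part structure of a hurdle Poisson model can be sketched as a log-probability function. This is a minimal illustration, not the authors' implementation: the names `hurdle_poisson_logpmf`, `p_nonzero` and `lam` are hypothetical, and both parameters are fixed scalars here, whereas the study models each part as a function of environmental factors.

```python
import math

def hurdle_poisson_logpmf(y, p_nonzero, lam):
    """Log-probability of count y under a hurdle Poisson model.

    p_nonzero: probability of clearing the hurdle (count > 0).
    lam: rate of the zero-truncated Poisson for positive counts.
    """
    if y == 0:
        return math.log1p(-p_nonzero)
    # Zero-truncated Poisson: Poisson pmf divided by P(Y > 0) = 1 - exp(-lam).
    log_pois = y * math.log(lam) - lam - math.lgamma(y + 1)
    log_trunc = math.log(-math.expm1(-lam))
    return math.log(p_nonzero) + log_pois - log_trunc

# Sanity check: the probabilities sum to one up to floating-point error.
total = sum(math.exp(hurdle_poisson_logpmf(k, 0.3, 4.0)) for k in range(200))
print(total)
```

Because the zero part and the positive part have separate parameters, each can be linked to its own set of covariates, which is what lets absence and abundance depend on different factors.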
- Research Article
- 10.3389/fpubh.2025.1563392
- Jul 18, 2025
- Frontiers in Public Health
Background: Inadequate feeding frequency during the early childhood period is responsible for more than two-thirds of global child deaths. Evidence on the rate of daily meal frequency among infants and young children at the national level is crucial for developing targeted interventions to improve feeding practices. Hence, this study aimed to identify factors associated with the rate of daily meal frequency (DMF) among children aged 6–23 months in Ethiopia.
Methods: We retrieved secondary data from the Kids record (KR) of the Ethiopian Mini Demographic and Health Survey (MDHS) dataset. A total of 1,264 children aged 6–23 months were included in the study. A Bayesian hierarchical Poisson model was employed. Model convergence was checked via Rhat, effective sample size, density plots, trace plots, and autocorrelation plots, and all the results were confirmed. We used the widely applicable information criterion (WAIC) and leave-one-out cross-validation (LOO) for model comparison. The model parameters were estimated via Hamiltonian Monte Carlo (HMC), a Markov chain Monte Carlo (MCMC) simulation technique, and its extension, the no-U-turn sampler (NUTS). An adjusted incidence rate ratio (AIRR) with a 95% credible interval (CrI) in the multivariable model was used to select variables that had a significant association with the rate of daily meal frequency. The data were analyzed via R software version 4.3.1.
Results: The mean and standard deviation of the DMF were 3.36 and 1.60, respectively. The rate of DMF was 1.17 times greater (AIRR = 1.17, 95% CrI: 0.997, 1.381) in children whose mothers had a secondary/higher educational level than in those whose mothers had no education. Children who were currently being breastfed had a 10% lower rate of DMF (AIRR = 0.88, 95% CrI: 0.798, 0.979) than those who were not. Compared with children aged 6–8 months, those aged 9–11 months (AIRR = 1.55, 95% CrI: 1.374, 1.754), 12–17 months (AIRR = 1.72, 95% CrI: 1.543, 1.911), and 18–23 months (AIRR = 1.90, 95% CrI: 1.692, 2.125) had 55, 72 and 90% higher rates of DMF, respectively. Compared with children from Tigray, those from the Afar region (IRR = 0.77, 95% CrI: 0.615, 0.982), Somalia (AIRR = 0.83, 95% CrI: 0.682, 1.01), Benishangul (AIRR = 0.8, 95% CrI: 0.639, 0.994), the Southern Nations, Nationalities, and Peoples' Region (SNNPR) (AIRR = 0.73, 95% CrI: 0.596, 0.894), and one further region (AIRR = 0.73, 95% CrI: 0.572, 0.925) had daily meal frequency rates that were lower by 33, 17, 20, 27 and 27%, respectively.
Conclusion and recommendation: The rate of DMF was low in Ethiopia and exhibited a significant clustering pattern across the country. These findings stress the need for tailored interventions addressing regional inequities, promoting age-specific nutrition, supporting maternal education, and empowering working women to improve children’s nutritional intake and ensure more equitable access to meals across Ethiopia.
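The percentages quoted alongside the rate ratios follow from a simple conversion: a rate ratio r corresponds to a change of (r − 1) × 100% relative to the reference group. The short sketch below applies that conversion to four of the reported AIRR values; the function name `percent_change` is hypothetical.

```python
def percent_change(rate_ratio):
    """Percent change in the rate implied by a rate ratio."""
    return (rate_ratio - 1.0) * 100.0

# AIRR values taken from the results above.
for label, airr in [("9-11 months", 1.55), ("12-17 months", 1.72),
                    ("18-23 months", 1.90), ("SNNPR", 0.73)]:
    print(f"{label}: {percent_change(airr):+.0f}%")
```

For these four values the conversion reproduces the 55, 72, 90 and 27% figures stated in the text.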
- Research Article
25
- 10.1177/0962280211414853
- Aug 25, 2011
- Statistical Methods in Medical Research
Considerable effort has been devoted to the development of statistical algorithms for the automated monitoring of influenza surveillance data. In this article, we introduce a framework of models for the early detection of the onset of an influenza epidemic which is applicable to different kinds of surveillance data. In particular, the process of the observed cases is modelled via a Bayesian Hierarchical Poisson model in which the intensity parameter is a function of the incidence rate. The key point is to consider this incidence rate as a normal distribution in which both parameters (mean and variance) are modelled differently, depending on whether the system is in an epidemic or non-epidemic phase. To do so, we propose a hidden Markov model in which the transition between both phases is modelled as a function of the epidemic state of the previous week. Different options for modelling the rates are described, including the option of modelling the mean at each phase as autoregressive processes of order 0, 1 or 2. Bayesian inference is carried out to provide the probability of being in an epidemic state at any given moment. The methodology is applied to various influenza data sets. The results indicate that our methods outperform previous approaches in terms of sensitivity, specificity and timeliness.
- Research Article
17
- 10.1080/02640414.2015.1039462
- Apr 28, 2015
- Journal of Sports Sciences
Relative age effect (RAE) in sports has been well documented. Recent studies investigate the effect of birthplace in addition to the RAE. The first objective of this study was to show the magnitude of the RAE in two major professional sports in Japan, baseball and soccer. Second, we examined the birthplace effect and compared its magnitude with that of the RAE. The effect sizes were estimated using a Bayesian hierarchical Poisson model with the number of players as dependent variable. The RAEs were 9.0% and 7.7% per month for soccer and baseball, respectively. These estimates imply that children born in the first month of a school year have about three times greater chance of becoming a professional player than those born in the last month of the year. Over half of the difference in likelihoods of becoming a professional player between birthplaces was accounted for by weather conditions, with the likelihood decreasing by 1% per snow day. An effect of population size was not detected in the data. By investigating different samples, we demonstrated that using quarterly data leads to underestimation and that the age range of sampled athletes should be set carefully.
- Research Article
- 10.1016/0022-4375(83)90029-4
- Sep 1, 1983
- Journal of Safety Research
Relationships between road accidents and hourly traffic flow — II. Probabilistic approach
- Research Article
127
- 10.1016/j.aap.2013.04.025
- May 10, 2013
- Accident Analysis & Prevention
Multi-level Bayesian analyses for single- and multi-vehicle freeway crashes
- Research Article
89
- 10.1016/j.csda.2006.10.006
- Nov 2, 2006
- Computational Statistics & Data Analysis
Auxiliary mixture sampling with applications to logistic models
- Research Article
20
- 10.1002/sim.5457
- Jul 16, 2012
- Statistics in Medicine
In this paper, we investigate the effects of poverty and inequality on the number of HIV-related deaths in 62 New York counties via Bayesian zero-inflated Poisson models that exhibit spatial dependence. We quantify inequality via the Theil index and poverty via the ratios of two Census 2000 variables, the number of people under the poverty line and the number of people for whom poverty status is determined, in each Zip Code Tabulation Area. The purpose of this study was to investigate the effects of inequality and poverty in addition to spatial dependence between neighboring regions on HIV mortality rate, which can lead to improved health resource allocation decisions. In modeling county-specific HIV counts, we propose Bayesian zero-inflated Poisson models whose rates are functions of both covariate and spatial/random effects. To show how the proposed models work, we used three different publicly available data sets: TIGER Shapefiles, Census 2000, and mortality index files. In addition, we introduce parameter estimation issues of Bayesian zero-inflated Poisson models and discuss MCMC method implications.
- Research Article
19
- 10.1177/1471082x14524676
- Aug 26, 2014
- Statistical Modelling
Count data are most commonly modeled using the Poisson model, or by one of its many extensions. Such extensions are needed for a variety of reasons: (1) a hierarchical structure in the data, e.g., due to clustering, the collection of repeated measurements of the outcome, etc.; (2) the occurrence of overdispersion (or underdispersion), meaning that the variability encountered in the data is not equal to the mean, as prescribed by the Poisson distribution; and (3) the occurrence of extra zeros beyond what a Poisson model allows. The first issue is often accommodated through the inclusion of random subject-specific effects. Though not always, one conventionally assumes such random effects to be normally distributed. Overdispersion is often dealt with through a model developed for this purpose, such as the negative-binomial model for count data. This can be conceived through a random Poisson parameter. Excess zeros are regularly accounted for using so-called zero-inflated models, which combine either a Poisson or negative-binomial model with an atom at zero. The novelty of this article is that it combines all these features. The work builds upon the modelling framework defined by Molenberghs et al. (2010), in which clustering and overdispersion are accommodated through two separate sets of random effects in a generalized linear model.
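The "atom at zero" construction can be made concrete with a zero-inflated Poisson probability mass function. The sketch below is illustrative only: it omits the random effects that the article combines with zero inflation, and the names `zip_pmf`, `pi_zero` and `lam` as well as the parameter values are assumptions.

```python
import math

def zip_pmf(y, pi_zero, lam):
    """Zero-inflated Poisson: extra mass pi_zero at zero, Poisson otherwise."""
    pois = math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))
    return pi_zero * (y == 0) + (1.0 - pi_zero) * pois

pi_zero, lam = 0.25, 3.0   # illustrative values (assumption)
support = range(200)       # truncation point; the tail mass beyond it is negligible
mean = sum(k * zip_pmf(k, pi_zero, lam) for k in support)
var = sum((k - mean) ** 2 * zip_pmf(k, pi_zero, lam) for k in support)
print(mean, var)
```

The mean is (1 − π)λ and the variance is (1 − π)λ(1 + πλ), so the variance exceeds the mean whenever π > 0: excess zeros alone already induce overdispersion relative to the Poisson model.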
- Research Article
55
- 10.1371/journal.pcbi.0020006
- Feb 1, 2006
- PLoS Computational Biology
Given a collection of fossil sites with data about the taxa that occur in each site, the task in biochronology is to find good estimates for the ages or ordering of sites. We describe a full probabilistic model for fossil data. The parameters of the model are natural: the ordering of the sites, the origination and extinction times for each taxon, and the probabilities of different types of errors. We show that the posterior distributions of these parameters can be estimated reliably by using Markov chain Monte Carlo techniques. The posterior distributions of the model parameters can be used to answer many different questions about the data, including seriation (finding the best ordering of the sites) and outlier detection. We demonstrate the usefulness of the model and estimation method on synthetic data and on real data on large late Cenozoic mammals. As an example, for the sites with a large number of occurrences of common genera, our methods give orderings whose correlation with geochronologic ages is 0.95.
- Research Article
2
- 10.1109/access.2022.3209232
- Jan 1, 2022
- IEEE Access
Due to the increasing injection of intermittent power sources (solar and wind) into a common grid, dispatchable sources such as hydro power should be able to help reduce the variability in load and the variability in generation caused by the intermittent sources. A hydro generator should be able to operate for short periods beyond its thermal capability limit. This requires the monitoring of internal temperatures in the hydro generator. In this paper, a thermal model of an air-cooled synchronous generator is presented, emphasizing the various aspects of parameter estimation and identifiability using Bayesian inference. Inferences are drawn from the posterior distributions of the parameters and initial conditions, dispersion (spreading) of particles and sampling efficiency, practical parameter identifiability, and model mismatch with experiments. Results show extremely narrow parameter distributions. It is too early to generalize about the posterior distribution of air-related and metal-related parameters of the air-cooled synchronous generator based on the single experimental data set presented here.
- Research Article
89
- 10.2136/sssaj2002.1740
- Nov 1, 2002
- Soil Science Society of America Journal
Model nonlinearity and parameter interdependence violate the use of a first‐order approximation to obtain exact confidence intervals of parameters in soil hydrologic models. In this study, the posterior distribution of parameters in soil water retention and hydraulic conductivity functions is examined using observed water retention data and a laboratory transient multistep outflow experiment. Parameter uncertainties obtained with traditional first‐order approximations and uniform grid sampling strategies were compared with those obtained using the Metropolis algorithm, a Markov Chain Monte Carlo (MCMC) sampler. A diagnostic measure, based on multiple sequences generated in parallel, was used to test whether convergence of the Metropolis sampler to the posterior distribution had been achieved. Most significantly, as the Metropolis algorithm can cope with rough response surfaces generated by the objective function used, it not only successfully infers the multivariate posterior probability distribution of the model parameters, but also provides valuable insights in parameter interdependence in the full parameter space.
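The combination described above, a Metropolis sampler checked with a diagnostic computed from several parallel sequences, can be sketched as follows. This is a toy illustration under stated assumptions: the target is a standard normal rather than a soil hydrologic posterior, the diagnostic is assumed to be of the Gelman-Rubin (potential scale reduction) type, and all function names, starting points and tuning values are hypothetical.

```python
import numpy as np

def metropolis(log_post, x0, n_steps, step, rng):
    """Random-walk Metropolis for a scalar parameter."""
    chain = np.empty(n_steps)
    x, lp = x0, log_post(x0)
    for i in range(n_steps):
        prop = x + step * rng.normal()
        lp_prop = log_post(prop)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

def gelman_rubin(chains):
    """Potential scale reduction factor from an (m, n) array of chains."""
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()
    between = n * chains.mean(axis=1).var(ddof=1)
    pooled = (n - 1) / n * within + between / n
    return np.sqrt(pooled / within)

def log_post(x):
    # Toy target (assumption): standard normal log-density up to a constant.
    return -0.5 * x * x

rng = np.random.default_rng(0)
starts = [-5.0, 0.0, 5.0, 10.0]          # deliberately dispersed starting points
chains = np.array([metropolis(log_post, s, 5000, 1.0, rng) for s in starts])
kept = chains[:, 2500:]                   # discard the first half as burn-in
r_hat = gelman_rubin(kept)
print("R-hat:", r_hat)
```

Values of the diagnostic close to 1 indicate that the between-chain and within-chain variances agree, which is the usual evidence that chains started from different points have reached the same distribution.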
- Research Article
3
- 10.1016/j.sigpro.2019.02.020
- Feb 19, 2019
- Signal Processing
An augmented sequential MCMC procedure for particle based learning in dynamical systems
- Conference Article
- 10.1109/apct55107.2022.00010
- Jan 1, 2022
Analysis of parameter uncertainty in distributed watershed models is a worldwide challenge. In this study, the Differential Evolution Adaptive Metropolis (DREAM) technique is developed to analyse the uncertainty of Soil and Water Assessment Tool (SWAT) model parameters. SWAT provides the basic hydrologic simulation, and the DREAM algorithm is employed to approximate the posterior distributions of model parameters with Bayesian inference. DREAM is then used to capture the uncertainty and implications of parameters in the Naryn River Basin (in Central Asia). The posterior distribution of parameters is obtained. Results show that: (i) the posterior sampling results of the DREAM algorithm are satisfactory; (ii) concentrated precipitation during the rainy season generates more runoff; (iii) more precipitation exists in the form of snowfall.