Abstract

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis–Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. Here we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis–Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback–Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even for modest sample sizes. We also propose a method for model selection using the approximation. The proposed approximation provides a computationally scalable approach to regularized estimation and approximate Bayesian inference for log-linear models.

Highlights

  • Contingency table analysis routinely relies on log-linear models, which represent the logarithm of cell probabilities as an additive model (Agresti, 2002)

  • We describe the family of conjugate priors for the natural parameter of an exponential family, referred to as Diaconis–Ylvisaker priors

  • If similar results could be obtained for the posterior in other models, it suggests that the Laplace approximation may not be an appropriate default Gaussian approximation to the posterior
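
The conjugacy described above can be illustrated in the simplest case of a saturated log-linear model, where a Diaconis–Ylvisaker prior on the natural (log-probability) parameters corresponds to a Dirichlet prior on the cell probabilities, and the posterior update is available in closed form. The following is a minimal sketch (not the paper's code); the hyperparameter value alpha = 0.5 is an illustrative choice:

```python
import numpy as np

# Observed counts in a 2x2 contingency table, flattened to 4 cells.
counts = np.array([3, 0, 5, 1])  # note the zero cell: the MLE is degenerate

# Dirichlet prior on cell probabilities; under the saturated log-linear
# model this corresponds to a conjugate (Diaconis-Ylvisaker) prior on the
# natural parameters.  alpha = 0.5 is an illustrative choice, not the
# paper's recommended value.
alpha = 0.5 * np.ones_like(counts, dtype=float)

# Conjugacy: the posterior is Dirichlet(alpha + counts).
posterior = alpha + counts

# Posterior mean of the cell probabilities -- a regularized estimate that
# is strictly positive even for zero-count cells.
p_hat = posterior / posterior.sum()
print(p_hat)
```

The zero-count cell receives a strictly positive estimate, which is exactly the regularization motivating the conjugate prior when maximum likelihood estimates fail to exist.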

Introduction

Contingency table analysis routinely relies on log-linear models, which represent the logarithm of cell probabilities as an additive model (Agresti, 2002). One can place a Gaussian prior on the parameters of a saturated log-linear model to induce Tikhonov-type regularization, and perform computation by Markov chain Monte Carlo. This approach is well suited to situations in which the sample size is not tiny relative to the table dimension, but where zero counts exist in some cells. Approximations to the posterior distribution have a long history in Bayesian statistics, with the Laplace approximation perhaps the most common and simplest alternative (Tierney and Kadane, 1986; Shun and McCullagh, 1995). More sophisticated approximations, such as those obtained using variational methods (Attias, 1999), may in some cases be more accurate, but require computation similar to that for generic EM algorithms. A Matlab implementation of our procedure is available at https://github.com/jamesjohndrow/dynormal-approx.
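The distinction between the Laplace approximation and a KL-optimal Gaussian approximation can be seen in a one-dimensional toy example. For a binomial model with a conjugate prior, the posterior on the log-odds theta has density proportional to exp(a*theta)/(1 + exp(theta))^(a+b), and both its mode and its exact moments are available in closed form via digamma and trigamma functions. The sketch below (an illustration under these assumptions, not the paper's procedure) contrasts the two Gaussian approximations:

```python
import numpy as np
from scipy.special import digamma, polygamma

# Posterior for the log-odds theta = logit(p) of a binomial model with a
# conjugate prior: p | data ~ Beta(a, b), so the density of theta is
# proportional to exp(a*theta) / (1 + exp(theta))^(a + b).
a, b = 7.0, 3.0  # illustrative posterior hyperparameters

# Laplace approximation: Gaussian centered at the mode, with variance
# given by the inverse of the negative Hessian of the log density.
theta_mode = np.log(a / b)        # solves a - (a+b)*sigmoid(theta) = 0
var_laplace = (a + b) / (a * b)   # inverse negative Hessian at the mode

# KL-minimizing Gaussian: match the exact posterior mean and variance,
# which here are closed-form digamma/trigamma expressions.
mean_exact = digamma(a) - digamma(b)
var_exact = polygamma(1, a) + polygamma(1, b)

print(theta_mode, var_laplace)  # Laplace: mode and curvature-based variance
print(mean_exact, var_exact)    # moment-matched: exact mean and variance
```

Even in this simple case the two approximations differ (the mode sits below the mean because the log-odds posterior is skewed), which is the kind of discrepancy the highlighted remark about the Laplace approximation refers to.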

Background
Exponential families
Log-linear models
Conjugate priors for log-linear models
Main results
Application: estimating pairwise dependence
Analysis of pairwise dependence in Rochdale data
Posterior approximation with a 2^16 table
Discussion
Additional log-linear model details
Proof of Proposition 1
Proof of main results
Findings
Simulations assessing accuracy of approximation