Bayes factors and the geometry of discrete hierarchical loglinear models

Gérard Letac, Hélène Massam

doi:10.1214/12-aos974

Abstract

A standard tool for model selection in a Bayesian framework is the Bayes factor, which compares the marginal likelihoods of the data under two given models. In this paper, we consider the class of hierarchical loglinear models for discrete data given in the form of a contingency table with multinomial sampling. We assume that the prior distribution on the loglinear parameters is the Diaconis–Ylvisaker conjugate prior, and that the prior distribution on the space of models is uniform. Under these conditions, the Bayes factor between two models is a function of the normalizing constants of the prior and posterior distributions of the loglinear parameters. These constants are functions of the hyperparameters $(m,\alpha)$, which can be interpreted, respectively, as the marginal counts and the total count of a fictive contingency table. We study the behavior of the Bayes factor when $\alpha$ tends to zero. In this study, the most important tool is the characteristic function $\mathbb{J}_{C}$ of the interior $C$ of the convex hull $\overline{C}$ of the support of the multinomial distribution for a given hierarchical loglinear model. If $h_{C}$ is the support function of $C$, the function $\mathbb{J}_{C}$ is the Laplace transform of $\exp(-h_{C})$. We show that, when $\alpha$ tends to $0$, if the data lies on a face $F_{i}$ of $\overline{C}_{i}$, $i=1,2$, of dimension $k_{i}$, the Bayes factor behaves like $\alpha^{k_{1}-k_{2}}$. This implies in particular that when the data is in $C_{1}$ and in $C_{2}$, that is, when $k_{i}$ equals the dimension of model $J_{i}$, the sparser model is favored, thus confirming the idea of Bayesian regularization. In order to find the faces of $\overline{C}$, we need to know its facets. We show that, since here $C$ is a polytope, the denominator of the rational function $\mathbb{J}_{C}$ is the product of the equations of the facets. We also identify a category of facets common to all hierarchical models for discrete variables, not necessarily binary. Finally, we show that these facets are the only facets of $\overline{C}$ when the model is graphical with respect to a decomposable graph.
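
As a one-dimensional illustration of how $\mathbb{J}_{C}$ encodes the facets, the following is a sketch only. It assumes the convention $\mathbb{J}_{C}(m)=\int_{\mathbb{R}}e^{\theta m-h_{C}(\theta)}\,d\theta$ for the Laplace transform and $h_{C}(\theta)=\sup_{x\in C}\theta x$ for the support function; neither is spelled out in the abstract. Take $C=(0,1)$, so that $h_{C}(\theta)=\max(0,\theta)$. Then, for $0<m<1$,

$$\mathbb{J}_{C}(m)=\int_{-\infty}^{0}e^{\theta m}\,d\theta+\int_{0}^{\infty}e^{\theta(m-1)}\,d\theta=\frac{1}{m}+\frac{1}{1-m}=\frac{1}{m(1-m)}.$$

Under these assumptions, the denominator $m(1-m)$ is the product of the two affine forms $m$ and $1-m$, which vanish at the facets $\{0\}$ and $\{1\}$ of $\overline{C}=[0,1]$.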

Full Text