Abstract

In logistic regression, separation occurs when a linear combination of the predictors can perfectly classify part or all of the observations in the sample, and as a result, finite maximum likelihood estimates of the regression coefficients do not exist. Gelman et al. (2008) recommended independent Cauchy distributions as default priors for the regression coefficients in logistic regression, even in the case of separation, and reported posterior modes in their analyses. As the mean does not exist for the Cauchy prior, a natural question is whether the posterior means of the regression coefficients exist under separation. We prove theorems that provide necessary and sufficient conditions for the existence of posterior means under independent Cauchy priors for the logit link and for a general family of link functions, including the probit link. We also study the existence of posterior means under multivariate Cauchy priors. For full Bayesian inference, we develop a Gibbs sampler based on Polya-Gamma data augmentation to sample from the posterior distribution under independent Student-t priors, including Cauchy priors, and provide a companion R package, tglm, available on CRAN. We demonstrate empirically that even when the posterior means of the regression coefficients exist under separation, the magnitude of the posterior samples for Cauchy priors may be unusually large, and the corresponding Gibbs sampler shows extremely slow mixing. While alternative algorithms such as the No-U-Turn Sampler (NUTS) in Stan can greatly improve mixing, resolving the issue of extremely heavy-tailed posteriors for Cauchy priors under separation requires lighter-tailed priors, such as normal priors or Student-t priors with degrees of freedom larger than one.
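To make the sampler concrete, the sketch below (in R) implements one possible Polya-Gamma data augmentation Gibbs sampler for logistic regression with independent Student-t priors on the coefficients (degrees of freedom 1 gives Cauchy priors), writing each t prior as a normal scale mixture with an inverse-gamma mixing variance. This is an illustrative sketch only, not the tglm source code; it assumes the rpg function from the BayesLogit package for the Polya-Gamma draws, and the function name tglm_gibbs_sketch is made up for this example.

    ## Illustrative Gibbs sampler (not the tglm implementation) for
    ##   y_i ~ Bernoulli(logit^{-1}(x_i' beta)),
    ##   beta_j ~ t_df(0, scale_j) independently (df = 1 is the Cauchy prior),
    ## using the scale-mixture form beta_j | lambda_j ~ N(0, lambda_j),
    ## lambda_j ~ Inv-Gamma(df/2, df * scale_j^2 / 2).
    library(BayesLogit)  # provides rpg() for Polya-Gamma draws

    tglm_gibbs_sketch <- function(y, X, df = 1, scale = 2.5, n_iter = 5000) {
      n <- nrow(X); p <- ncol(X)
      scale2 <- rep(scale^2, length.out = p)   # prior scale^2 for each coefficient
      kappa  <- y - 0.5                        # response must be coded 0/1
      beta   <- rep(0, p)
      lambda <- scale2                         # mixing variances of the normal mixture
      draws  <- matrix(NA_real_, n_iter, p)
      for (t in seq_len(n_iter)) {
        ## 1. Polya-Gamma latents: omega_i | beta ~ PG(1, x_i' beta)
        omega <- rpg(n, h = 1, z = as.numeric(X %*% beta))
        ## 2. Coefficients: beta | omega, lambda ~ N(V X' kappa, V),
        ##    with V = (X' diag(omega) X + diag(1/lambda))^{-1}
        prec <- crossprod(X, X * omega) + diag(1 / lambda, p)
        R    <- chol(prec)
        mu   <- backsolve(R, forwardsolve(t(R), crossprod(X, kappa)))
        beta <- drop(mu + backsolve(R, rnorm(p)))
        ## 3. Mixing variances: lambda_j | beta_j ~ Inv-Gamma((df + 1)/2,
        ##    (df * scale_j^2 + beta_j^2)/2), drawn as 1 / Gamma with that rate
        lambda <- 1 / rgamma(p, shape = (df + 1) / 2,
                             rate  = (df * scale2 + beta^2) / 2)
        draws[t, ] <- beta
      }
      draws
    }

For instance, tglm_gibbs_sketch(y, cbind(1, x), df = 1, scale = c(10, 2.5)) would use a Cauchy(0, 10) prior on the intercept and a Cauchy(0, 2.5) prior on the slope, the scales suggested by Gelman et al. (2008) for standardized inputs.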

Highlights

  • In Bayesian logistic regression, the choice of prior distribution for the regression coefficients is a key component of the analysis

  • Our results provide further theoretical underpinning for the approach recommended by Gelman et al. (2008), and offer additional insight into their suggestion of centering the covariates before fitting the regression model, which can affect the existence of posterior means (see the short sketch after this list)

  • We have mainly focused on the logistic regression model, which is one of the most widely used binary regression models because of the interpretability of its regression coefficients in terms of odds ratios
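The second point above can be made concrete: centering simply subtracts each column mean from the covariates before the model is fit. A minimal sketch in R, with X standing in for a generic covariate matrix rather than any specific data set:

    ## Center the covariates (subtract column means, leave scales unchanged) before
    ## fitting, as suggested by Gelman et al. (2008).  Shifting the covariate values
    ## can change whether a covariate separates the responses on its own, which is
    ## how centering can affect the existence of posterior means under Cauchy priors.
    X_centered <- scale(X, center = TRUE, scale = FALSE)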



Introduction

In Bayesian logistic regression, the choice of prior distribution for the regression coefficients is a key component of the analysis. Gelman et al. (2008) recommended independent Cauchy prior distributions as a default weakly informative choice for the regression coefficients in a logistic regression model, because these heavy-tailed priors avoid over-shrinking large coefficients, yet provide shrinkage (unlike improper uniform priors) that enables inference even in the presence of complete separation.
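To make the separation issue concrete, the short R example below uses a small hypothetical data set in which y equals 1 exactly when x is positive, so the data are completely separated and the maximum likelihood estimate of the slope diverges; the posterior mode under the Cauchy priors of Gelman et al. (2008) can then be obtained with bayesglm from the arm package. The prior scales 2.5 and 10 shown are the defaults they suggest, so passing them explicitly is only for clarity.

    ## Hypothetical toy data with complete separation: y = 1 exactly when x > 0.
    x <- c(-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2)
    y <- c( 0,    0,  0,    0,   1, 1,   1, 1)

    ## Maximum likelihood: the slope estimate diverges; glm() typically stops at a
    ## very large value and warns that fitted probabilities of 0 or 1 occurred.
    fit_mle <- glm(y ~ x, family = binomial)
    coef(fit_mle)

    ## Posterior mode under independent Cauchy priors (scale 2.5 for the slope,
    ## 10 for the intercept), following Gelman et al. (2008), via arm::bayesglm.
    library(arm)
    fit_cauchy <- bayesglm(y ~ x, family = binomial,
                           prior.scale = 2.5, prior.df = 1,
                           prior.scale.for.intercept = 10,
                           prior.df.for.intercept = 1)
    coef(fit_cauchy)  # finite, shrunken estimates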

  • Existence of Posterior Means Under Cauchy Priors
      • A Brief Review of Separation
      • Existence of Posterior Means Under Independent Cauchy Priors
      • Extensions of the Theoretical Result
  • MCMC Sampling for Logistic Regression
      • Polya-Gamma Data Augmentation Gibbs Sampler
      • Simulated Data
          • Complete Separation with a Solitary Separator
          • Complete Separation Without Solitary Separators
      • SPECT Dataset
      • Pima Indians Diabetes Dataset
  • Discussion