Abstract
We propose a novel spike and slab prior specification with scaled beta prime marginals for the importance parameters of regression coefficients to allow for general effect selection within the class of structured additive distributional regression. This enables us to model effects on all distributional parameters for arbitrary parametric distributions, and to consider various effect types such as non-linear or spatial effects as well as hierarchical regression structures. Our spike and slab prior relies on a parameter expansion that separates blocks of regression coefficients into overall scalar importance parameters and vectors of standardised coefficients. Hence, we can work with a scalar quantity for effect selection instead of a possibly high-dimensional effect vector, which yields improved shrinkage and sampling performance compared to the classical normal-inverse-gamma prior. We investigate the propriety of the posterior, show that the prior yields desirable shrinkage properties, propose a way of eliciting prior parameters and provide efficient Markov chain Monte Carlo sampling. Using simulated data and three large-scale data sets, we show that our approach is applicable to data with a potentially large number of covariates, multilevel predictors accounting for hierarchically nested data and non-standard response distributions, such as bivariate normal or zero-inflated Poisson.
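The parameter expansion described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name, the hyperparameters `omega`, `a`, `b`, `r` and their defaults are hypothetical, and the scaled beta prime marginal for the squared importance parameter is obtained here through one standard construction (a gamma distribution with shape 1/2 mixed over an inverse gamma scale), which the text itself does not spell out.

```python
import numpy as np

def sample_expanded_block(dim, rng, omega=0.5, a=5.0, b=25.0, r=0.01):
    """Draw one coefficient block beta = tau * beta_tilde.

    Hypothetical hyperparameters (not taken from the text):
    omega is the prior inclusion probability, (a, b) are the inverse gamma
    parameters and r << 1 shrinks the variance scale in the spike.
    """
    delta = bool(rng.random() < omega)       # True: slab, False: spike
    scale = 1.0 if delta else r              # variance multiplier r(delta)
    psi2 = b / rng.gamma(shape=a)            # psi2 ~ inverse gamma(a, b)
    # tau | psi2, delta ~ N(0, scale * psi2), so tau^2 is gamma with shape 1/2;
    # integrating out psi2 gives tau^2 a scaled beta prime marginal.
    tau = np.sqrt(scale * psi2) * rng.standard_normal()
    beta_tilde = rng.standard_normal(dim)    # standardised coefficient vector
    beta = tau * beta_tilde                  # block of regression coefficients
    return delta, tau, beta_tilde, beta

rng = np.random.default_rng(1)
delta, tau, beta_tilde, beta = sample_expanded_block(dim=20, rng=rng)
```

Whatever the dimension of the block, selection acts on the single scalar `tau` (through `delta`), which is the point of the expansion.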
Highlights
The flexibility of modern regression methodology is both a blessing and a curse for applied researchers and statisticians alike. On the one hand, added flexibility allows more realistic models that better approximate the true data-generating process; on the other hand, it poses additional challenges for model building and model checking.
In the simple exponential family framework with a single regression predictor, the Normal Beta Prime Spike and Slab (NBPSS) prior turns out to be a strong competitor to the parameter-expanded NMIG (peNMIG) prior.
Selection of large coefficient blocks such as spatial effects works well for all types of response distributions, whereas such blocks are problematic for peNMIG because of severe mixing problems.
Summary
The flexibility of modern regression methodology is both a blessing and a curse for applied researchers and statisticians alike. On the one hand, added flexibility allows more realistic models that better approximate the true data-generating process; on the other hand, it poses additional challenges for model building and model checking. An analyst must choose an appropriate response distribution (a task that we do not consider here; see, for example, Klein et al., 2015c, for practical solutions) and determine the most appropriate subset of covariates, along with the exact way each covariate is modelled, for multiple regression predictors. A previous study (Klein et al., 2015a) suggests a bivariate normal model in which the marginal expectations, the marginal scale parameters and the correlation parameter all depend on covariates. This leads to a distributional regression model with K = 5 distributional parameters θk ∈ {μ1, μ2, σ1, σ2, ρ}.
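To make the five-parameter structure concrete, the following sketch maps five unconstrained predictors to the parameters of a bivariate normal distribution and evaluates its log-density. The response functions are assumptions chosen for illustration (identity for the expectations, exponential for the scale parameters, tanh for the correlation); the text does not state which ones the cited study uses, and the function names are hypothetical.

```python
import numpy as np

def distributional_parameters(eta):
    """Map unconstrained predictors to (mu1, mu2, sigma1, sigma2, rho).

    The response functions below are assumed for illustration only.
    """
    mu1 = eta["mu1"]                   # identity: expectation is unrestricted
    mu2 = eta["mu2"]
    sigma1 = np.exp(eta["sigma1"])     # exponential keeps the scale positive
    sigma2 = np.exp(eta["sigma2"])
    rho = np.tanh(eta["rho"])          # tanh keeps the correlation in (-1, 1)
    return mu1, mu2, sigma1, sigma2, rho

def bivariate_normal_logpdf(y1, y2, mu1, mu2, sigma1, sigma2, rho):
    """Log-density of the bivariate normal distribution, evaluated elementwise."""
    z1 = (y1 - mu1) / sigma1
    z2 = (y2 - mu2) / sigma2
    one_minus_rho2 = 1.0 - rho ** 2
    return (-np.log(2.0 * np.pi) - np.log(sigma1) - np.log(sigma2)
            - 0.5 * np.log(one_minus_rho2)
            - (z1 ** 2 - 2.0 * rho * z1 * z2 + z2 ** 2) / (2.0 * one_minus_rho2))

# Each predictor would normally be a sum of covariate effects; constants are used here.
eta = {"mu1": 0.0, "mu2": 0.0, "sigma1": 0.0, "sigma2": 0.0, "rho": 0.0}
params = distributional_parameters(eta)
value = bivariate_normal_logpdf(0.0, 0.0, *params)
```

In a distributional regression model each entry of `eta` has its own set of candidate effects, so effect selection has to be carried out separately for all five predictors.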