Abstract

Over the last decades, the challenges in applied regression and predictive modeling have changed considerably: (1) More flexible regression model specifications are needed as data sizes and available information steadily increase, which in turn demands more powerful computing infrastructure. (2) Full probabilistic models by means of distributional regression, rather than predictions of only individual quantities of the distribution such as the mean or expectation, are crucial in many applications. (3) Bayesian inference has gained importance both as an appealing framework for regularizing or penalizing complex models and their estimation and as a natural alternative to classical frequentist inference. However, while there has been a lot of research on all three challenges and corresponding software packages have been developed, a modular software implementation that allows users to easily combine all three aspects has not yet been available for the general framework of distributional regression. To fill this gap, the R package bamlss is introduced for Bayesian additive models for location, scale, and shape (and beyond), with the name reflecting the most important distributional quantities (among others) that can be modeled with the software. At the core of the package are algorithms for highly efficient Bayesian estimation and inference that can be applied to generalized additive models, generalized additive models for location, scale, and shape, or more general distributional regression models. Moreover, its building blocks are designed as “Lego bricks” encompassing various distributions (exponential family, Cox, joint models, etc.), regression terms (linear, splines, random effects, tensor products, spatial fields, etc.), and estimators (MCMC, backfitting, gradient boosting, lasso, etc.).
It is demonstrated how these can be easily combined to make classical models more flexible or to create new custom models for specific modeling challenges.

Highlights

  • Many modern modeling tasks necessitate flexible regression tools that can deal with: (1) Large data sets that can be both long and/or wide. (2) Probabilistic forecasts that capture the entire distribution, not only its mean or expectation. (3) Enhanced inference infrastructure, typically Bayesian, broadening classical frequentist methodology

  • At the core of the package are algorithms for highly efficient Bayesian estimation and inference that can be applied to generalized additive models, generalized additive models for location, scale, and shape, or more general distributional regression models

  • Estimation of model complexity is based on the so-called equivalent degrees of freedom (EDF), i.e., the trace of the smoother matrix is computed for each model term, and the total degrees of freedom are approximated by summing these traces over all model terms of all distributional parameters
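The EDF computation in the last highlight can be illustrated for a single penalized term. The NumPy sketch below assumes that each term is a (weighted) penalized least-squares smoother S = Z(ZᵀWZ + λK)⁻¹ZᵀW with design matrix Z, penalty matrix K, smoothing parameter λ, and working weights W. `term_edf` and `total_edf` are hypothetical helper names, and the linear B-spline basis with a second-order difference penalty is only an example choice, not necessarily what bamlss uses internally.

```python
import numpy as np


def term_edf(Z, K, lam, w=None):
    """EDF of one penalized term: the trace of its smoother matrix.

    Assumes the penalized (weighted) least-squares smoother
        S = Z (Z'WZ + lam*K)^{-1} Z'W.
    By the cyclic property of the trace,
        tr(S) = tr((Z'WZ + lam*K)^{-1} Z'WZ),
    so only k x k matrices are needed (k = number of basis columns).
    """
    n = Z.shape[0]
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    ZtWZ = Z.T @ (w[:, None] * Z)
    return float(np.trace(np.linalg.solve(ZtWZ + lam * K, ZtWZ)))


def total_edf(model):
    """Approximate total EDF: sum of the term EDFs over all distributional
    parameters (dict keys, e.g. 'mu', 'sigma') and all terms per parameter.
    Each term is a hypothetical (Z, K, lam) tuple."""
    return sum(term_edf(Z, K, lam) for terms in model.values() for Z, K, lam in terms)


# Example term: linear B-spline ("hat") basis with 10 knots on [0, 1]
x = np.linspace(0.0, 1.0, 200)
knots = np.linspace(0.0, 1.0, 10)
h = knots[1] - knots[0]
Z = np.maximum(0.0, 1.0 - np.abs(x[:, None] - knots[None, :]) / h)  # 200 x 10
D = np.diff(np.eye(10), n=2, axis=0)  # second-order difference matrix, 8 x 10
K = D.T @ D                           # penalty; its null space (linear trends) has dimension 2

for lam in (1e-8, 1.0, 100.0, 1e8):
    print(lam, round(term_edf(Z, K, lam), 3))  # EDF falls from about 10 towards 2

model = {"mu": [(Z, K, 1.0)], "sigma": [(Z, K, 100.0)]}
print(total_edf(model))
```

With almost no penalty the term uses all 10 basis functions; with a very large penalty only the unpenalized linear trend remains, so the EDF approaches 2. Because each trace is computed for one term in isolation, the sum ignores dependence between terms, which is presumably why the total is described as an approximation.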


Summary

Introduction

Many modern modeling tasks necessitate flexible regression tools that can deal with: (1) Large data sets that can be both long (many observations) and/or wide (many variables or complex effect types). (2) Probabilistic forecasts that capture the entire distribution, not only its mean or expectation. (3) Enhanced inference infrastructure, typically Bayesian, broadening classical frequentist methodology. To address all of these challenges simultaneously, independent of a specific estimation strategy and/or fitting algorithm, the bamlss package for the R system for statistical computing (R Core Team 2021) implements a modular “Lego toolbox”, extending the work of Umlauf, Klein, and Zeileis (2018). In this framework, not only the response distribution (as in a classical GLM) and the regression terms (as in a GAM) are “Lego bricks”, but so is the estimation algorithm, such as a specific MCMC sampler.

The mgcv package excels at providing highly optimized algorithms for general smooth models (Wood, Pya, and Säfken 2016), including inference, as well as the dedicated bam() function for big data that is long and/or wide (Wood, Li, Shaddick, and Augustin 2017). All these packages rely on frequentist estimation strategies.

Further details and examples about the bamlss package can be found online at http://www.bamlss.org/
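The “Lego brick” decomposition of distribution, regression terms, and estimation algorithm can be sketched outside R. The Python/NumPy code below is a minimal illustration under stated assumptions and does not reproduce the bamlss R interface. A hypothetical `GaussianFamily` class plays the role of the distribution brick (identity link for μ, log link for σ). Design matrices passed per distributional parameter play the role of the regression terms. A hypothetical `fit_backfitting` function is the estimation engine: it cycles over the parameters and updates each one by an IWLS (Fisher scoring) step while holding the others fixed. All names are illustrative.

```python
import numpy as np


class GaussianFamily:
    """Hypothetical 'distribution brick': y ~ N(mu, sigma^2) with an identity
    link for mu and a log link for sigma. For each parameter it supplies the
    score d loglik / d eta and the expected (Fisher) weight."""
    names = ("mu", "sigma")

    @staticmethod
    def start(y):
        # Constant starting values: sample mean and log sample SD
        return {"mu": float(np.mean(y)), "sigma": float(np.log(np.std(y)))}

    @staticmethod
    def params(eta):
        # Inverse links applied to the linear predictors
        return eta["mu"], np.exp(eta["sigma"])

    def loglik(self, y, eta):
        mu, sigma = self.params(eta)
        return float(np.sum(-np.log(sigma) - 0.5 * np.log(2.0 * np.pi)
                            - (y - mu) ** 2 / (2.0 * sigma ** 2)))

    def score(self, name, y, eta):
        mu, sigma = self.params(eta)
        if name == "mu":
            return (y - mu) / sigma ** 2
        return (y - mu) ** 2 / sigma ** 2 - 1.0

    def weight(self, name, y, eta):
        mu, sigma = self.params(eta)
        if name == "mu":
            return 1.0 / sigma ** 2
        return np.full_like(y, 2.0)


def fit_backfitting(family, designs, y, max_iter=200, tol=1e-10):
    """Hypothetical 'estimation brick': cycles over the distributional
    parameters and updates each one by an IWLS (Fisher scoring) step while
    the other parameters are held fixed."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    start = family.start(y)
    # Start each predictor at its constant starting value (exact if the design has an intercept)
    coefs = {k: np.linalg.lstsq(designs[k], np.full(n, start[k]), rcond=None)[0]
             for k in family.names}
    eta = {k: designs[k] @ coefs[k] for k in family.names}
    ll_old = family.loglik(y, eta)
    for _ in range(max_iter):
        for k in family.names:
            X = designs[k]
            w = family.weight(k, y, eta)
            z = eta[k] + family.score(k, y, eta) / w  # working response
            XtW = X.T * w                             # X'W with W = diag(w)
            coefs[k] = np.linalg.solve(XtW @ X, XtW @ z)
            eta[k] = X @ coefs[k]
        ll = family.loglik(y, eta)
        if abs(ll - ll_old) <= tol * (abs(ll_old) + tol):
            break
        ll_old = ll
    return coefs, ll


# "Term bricks": the same linear design (intercept + x) for mu and log(sigma)
rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(0.0, 1.0, n)
y = 1.0 + 2.0 * x + np.exp(-1.0 + 1.5 * x) * rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
coefs, ll = fit_backfitting(GaussianFamily(), {"mu": X, "sigma": X}, y)
print(coefs["mu"], coefs["sigma"])  # should be close to [1, 2] and [-1, 1.5]
```

The engine talks to the family only through `start`, `loglik`, `score`, and `weight`. As a result, another distribution could be swapped in by writing a class with the same (hypothetical) methods, and a different engine, such as a sampler, could reuse the same family unchanged.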

Motivating examples
Basic Bayesian regression
Flexible model terms and estimators
Location-scale model
A flexible Bayesian model framework
Model structure
Posterior estimation
Model choice and evaluation
Evaluation and interpretation
The bamlss package
Sampler Stats & Results
The BAMLSS model frame
Family objects
Estimation engines
Motivation and data
Model specification
Model diagnostics
Predictions and visualizations
Conclusion
Special model terms
Model fitting engines for linear regression