bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond)

Nadja Klein, Thorsten Simon, Nikolaus Umlauf, Achim Zeileis

doi:10.18637/jss.v100.i04

Open Access

https://doi.org/10.18637/jss.v100.i04

Abstract

Over the last decades, the challenges in applied regression and in predictive modeling have been changing considerably: (1) More flexible regression model specifications are needed as data sizes and available information are steadily increasing, consequently demanding more powerful computing infrastructure. (2) Full probabilistic modeling by means of distributional regression, rather than predicting only some underlying individual quantities from the distributions such as means or expectations, is crucial in many applications. (3) Bayesian inference has gained in importance both as an appealing framework for regularizing or penalizing complex models and estimation therein and as a natural alternative to classical frequentist inference. However, while there has been a lot of research on all three challenges and the development of corresponding software packages, a modular software implementation that makes it easy to combine all three aspects has not yet been available for the general framework of distributional regression. To fill this gap, the R package bamlss is introduced for Bayesian additive models for location, scale, and shape (and beyond), with the name reflecting the most important distributional quantities (among others) that can be modeled with the software. At the core of the package are algorithms for highly efficient Bayesian estimation and inference that can be applied to generalized additive models, generalized additive models for location, scale, and shape, or more general distributional regression models. Moreover, its building blocks are designed as "Lego bricks" encompassing various distributions (exponential family, Cox, joint models, etc.), regression terms (linear, splines, random effects, tensor products, spatial fields, etc.), and estimators (MCMC, backfitting, gradient boosting, lasso, etc.). It is demonstrated how these can be easily combined to make classical models more flexible or to create new custom models for specific modeling challenges.

Highlights

Many modern modeling tasks necessitate flexible regression tools that can deal with: (1) Large data sets that can be long and/or wide. (2) Probabilistic forecasts that capture the entire distribution and not only its mean or expectation. (3) Enhanced inference infrastructure, typically Bayesian, broadening classical frequentist methodology.
At the core of the package are algorithms for highly efficient Bayesian estimation and inference that can be applied to generalized additive models, generalized additive models for location, scale, and shape, or more general distributional regression models.
Estimation of model complexity is based on the so-called equivalent degrees of freedom (EDF), i.e., for each model term the trace of the smoother matrix is computed, and the total degrees of freedom are approximated by the sum over all distributional parameters and model terms.
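
The EDF computation described above can be written out as a short formula. This is only a sketch under assumed notation that does not appear on this page: for distributional parameter k and model term j, X_{jk} is taken to be the design matrix, K_{jk} a penalty matrix, \lambda_{jk} a smoothing parameter, and W_k a diagonal matrix of working weights, so that the term acts as a penalized weighted least squares smoother. The package's own definition may differ in detail.

```latex
% Assumed notation (not from the source): X_{jk}, K_{jk}, \lambda_{jk}, W_k.
% Smoother matrix of term j in distributional parameter k:
\[
  S_{jk} = X_{jk}\left(X_{jk}^\top W_k X_{jk} + \lambda_{jk} K_{jk}\right)^{-1} X_{jk}^\top W_k .
\]
% Equivalent degrees of freedom of the term: trace of the smoother matrix
% (the second form follows from the cyclic property of the trace).
\[
  \mathrm{edf}_{jk}
  = \operatorname{tr}\left(S_{jk}\right)
  = \operatorname{tr}\left[\left(X_{jk}^\top W_k X_{jk} + \lambda_{jk} K_{jk}\right)^{-1} X_{jk}^\top W_k X_{jk}\right].
\]
% Total model complexity: sum over all K parameters and their J_k terms.
\[
  \mathrm{edf}_{\mathrm{total}} \approx \sum_{k=1}^{K} \sum_{j=1}^{J_k} \mathrm{edf}_{jk} .
\]
```

Under these assumptions, an unpenalized term (\lambda_{jk} = 0 with X_{jk} of full column rank) contributes exactly its number of coefficients, and the contribution shrinks as \lambda_{jk} grows.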

Summary

Introduction

Many modern modeling tasks necessitate flexible regression tools that can deal with: (1) Large data sets that can be long (many observations) and/or wide (many variables or complex effect types). (2) Probabilistic forecasts that capture the entire distribution and not only its mean or expectation. (3) Enhanced inference infrastructure, typically Bayesian, broadening classical frequentist methodology. To facilitate addressing all challenges and needs simultaneously, independent of a specific estimation strategy and/or fitting algorithm, the bamlss package for the R system for statistical computing (R Core Team 2021) implements a modular "Lego toolbox", extending the work of Umlauf, Klein, and Zeileis (2018). In this framework, not only the response distribution (as in a classical GLM) and the regression terms (as in a GAM) are "Lego bricks", but also the estimation algorithm, such as a specific MCMC sampler. Among related packages, mgcv excels at providing highly optimized algorithms for general smooth models (Wood, Pya, and Säfken 2016), including inference, as well as the dedicated bam() function for big data that is long and/or wide (Wood, Li, Shaddick, and Augustin 2017). All these related packages rely on frequentist estimation strategies. Further details and examples about the bamlss package can be found online at http://www.bamlss.org/

Motivating examples

Basic Bayesian regression

Flexible model terms and estimators

Location-scale model

A flexible Bayesian model framework

Model structure

Posterior estimation

Model choice and evaluation

Evaluation and interpretation

The bamlss package

Sampler Stats & Results

The BAMLSS model frame

Family objects

Estimation engines

Motivation and data

Model specification

Model diagnostics

Predictions and visualizations

Conclusion

Special model terms

Model fitting engines for linear regression

Journal: Journal of Statistical Software
Publication Date: Jan 1, 2021
Citations: 8
License type: CC BY