Modeling non-linear relationships in epidemiological data: The application and interpretation of spline models.

Noah A Schuster,Jos W R Twisk,Judith J M Rijnhart,Martijn W Heymans

doi:10.3389/fepid.2022.975380

Open Access

Abstract

Traditional methods to deal with non-linearity in regression analysis often result in loss of information or compromised interpretability of the results. A recommended but underutilized method for modeling non-linear associations in regression models is the use of spline functions. We explain spline functions in a non-mathematical way and illustrate their application and interpretation using an empirical data example. Using data from the Amsterdam Growth and Health Longitudinal Study, we examined the non-linear relationship between the sum of four skinfolds and VO2max, which are measures of body fat and cardiorespiratory fitness, respectively. We compared traditional methods (i.e., quadratic regression and categorization) to spline methods [1- and 3-knot linear spline (LSP) models and a 3-knot restricted cubic spline (RCS) model] in terms of the interpretability of the results and their explained variance (R²). The spline models fitted the data better than the traditional methods. Increasing the number of knots in the LSP model from one to three increased the explained variance. The RCS model fitted the data best, but it yields regression coefficients that are harder to interpret. Spline functions should be considered more often, as they are flexible and can be applied in commonly used regression analyses. RCS regression is generally recommended for prediction research (i.e., to obtain the predicted outcome for a specific exposure value), whereas LSP regression is recommended if one is interested in the effects in a population.
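The comparison described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis: the data are synthetic (not the Amsterdam Growth and Health Longitudinal Study), the knot locations (the median for one knot; the 10th, 50th, and 90th percentiles for three knots) are assumptions, the helper names `linear_spline_basis`, `rcs_basis`, and `fit_ols` are hypothetical, and the RCS basis uses a commonly cited truncated-power parameterization that is linear beyond the outer knots.

```python
import numpy as np

def linear_spline_basis(x, knots):
    """Design matrix: intercept, x, and a hinge term max(x - k, 0) per knot."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x]
    cols += [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def rcs_basis(x, knots):
    """Restricted cubic spline design matrix (intercept, x, k - 2 cubic terms).

    The cubic terms are built so that the fitted curve is linear below the
    first knot and above the last knot.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    pos3 = lambda u: np.maximum(u, 0.0) ** 3
    d = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2  # keeps the cubic terms on the scale of x
    cols = [np.ones_like(x), x]
    for tj in t[:-2]:
        term = (pos3(x - tj)
                - pos3(x - t[-2]) * (t[-1] - tj) / d
                + pos3(x - t[-1]) * (t[-2] - tj) / d)
        cols.append(term / scale)
    return np.column_stack(cols)

def fit_ols(X, y):
    """Ordinary least squares; returns (R^2, coefficients)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return r2, beta

# Synthetic exposure/outcome pair with a curved relationship (assumption:
# these numbers are made up purely for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(20.0, 120.0, 300)
y = 60.0 - 12.0 * np.log(x / 20.0) + rng.normal(0.0, 3.0, x.size)

k1 = np.quantile(x, [0.5])            # assumed 1-knot location
k3 = np.quantile(x, [0.1, 0.5, 0.9])  # assumed 3-knot locations

r2_lsp1, b_lsp1 = fit_ols(linear_spline_basis(x, k1), y)
r2_lsp3, b_lsp3 = fit_ols(linear_spline_basis(x, k3), y)
r2_rcs3, b_rcs3 = fit_ols(rcs_basis(x, k3), y)

# In an LSP model each hinge coefficient is a change in slope, so the slope
# within each segment is the running sum of the slope coefficients.
slopes_lsp3 = np.cumsum(b_lsp3[1:])

print("R^2 (1-knot LSP):", round(r2_lsp1, 3))
print("R^2 (3-knot LSP):", round(r2_lsp3, 3))
print("R^2 (3-knot RCS):", round(r2_rcs3, 3))
print("Segment slopes (3-knot LSP):", np.round(slopes_lsp3, 3))
```

The segment slopes show why LSP coefficients are easy to read: each one is the change in the outcome per unit of exposure within a given range of the exposure. The RCS coefficient on the cubic term has no such direct reading, so RCS results are usually presented as predicted values or a plotted curve.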

Full Text