Abstract

Penalised regression methods are a useful atheoretical approach for both developing predictive models and selecting key indicators within an often substantially larger pool of available indicators. In comparison to traditional methods, penalised regression models improve prediction in new data by shrinking the size of coefficients and retaining those with coefficients greater than zero. However, the performance and selection of indicators depends on the specific algorithm implemented. The purpose of this study was to examine the predictive performance and feature (i.e., indicator) selection capability of common penalised logistic regression methods (LASSO, adaptive LASSO, and elastic-net), compared with traditional logistic regression and forward selection methods. Data were drawn from the Australian Temperament Project, a multigenerational longitudinal study established in 1983. The analytic sample consisted of 1,292 (707 women) participants. A total of 102 adolescent psychosocial and contextual indicators were available to predict young adult daily smoking. Penalised logistic regression methods showed small improvements in predictive performance over logistic regression and forward selection. However, no single penalised logistic regression model outperformed the others. Elastic-net models selected more indicators than either LASSO or adaptive LASSO. Additionally, more regularised models included fewer indicators, yet had comparable predictive performance. Forward selection methods dismissed many indicators identified as important in the penalised logistic regression models. Although overall predictive accuracy was only marginally better with penalised logistic regression methods, benefits were most clear in their capacity to select a manageable subset of indicators. Preference to competing penalised logistic regression methods may therefore be guided by feature selection capability, and thus interpretative considerations, rather than predictive performance alone.

Highlights

  • Maximising prediction of health outcomes in populations is central to good public health practice and policy development

  • Overall predictive accuracy was only marginally better with penalised logistic regression methods, benefits were most clear in their capacity to select a manageable subset of indicators

  • Preference to competing penalised logistic regression methods may be guided by feature selection capability, and interpretative considerations, rather than predictive performance alone

Read more

Summary

Introduction

Maximising prediction of health outcomes in populations is central to good public health practice and policy development. Population cohort studies have the potential to provide such evidence due to the collection of a wide range of developmentally appropriate psychosocial and contextual information on individuals over extended periods of time. It is challenging to identify which indicators maximise prediction, when a large number of potential indicators is available. Growing accessibility of predictive modelling tools is encouraging and presents as a potential point of advancement for identifying key predictive indicators relevant to population health [1]. Penalised regression methods are a useful atheoretical approach for both developing predictive models and selecting key indicators within an often substantially larger pool of available indicators. The purpose of this study was to examine the predictive performance and feature (i.e., indicator) selection capability of common penalised logistic regression methods (LASSO, adaptive LASSO, and elastic-net), compared with traditional logistic regression and forward selection methods

Objectives
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call