Machine learning application for predicting smoking cessation among US adults: An analysis of waves 1-3 of the PATH study.

Mona Issabakhsh, David Mendez, Luz Maria Sánchez-Romero, Alex C Liber, Yameng Li, Jiale Tan, Rafael Meza, David T Levy, Thuy T T Le

doi:10.1371/journal.pone.0286883

Abstract

Identifying determinants of smoking cessation is critical for developing optimal cessation treatments and interventions. Machine learning (ML) is increasingly used to predict smoking cessation success in treatment programs. However, only individuals who intend to quit smoking cigarettes participate in such programs, which limits the generalizability of the results. This study uses data from the Population Assessment of Tobacco and Health (PATH) Study, a nationally representative longitudinal survey in the United States, to select the primary determinants of smoking cessation and to train ML classification models that predict smoking cessation in the general population. An analytical sample of 9,281 adult current established smokers from PATH wave 1 was used to develop classification models that predict smoking cessation by wave 2. Random forest and gradient boosting machines were applied for variable selection, and the SHapley Additive exPlanations (SHAP) method was used to show the effect direction of the top-ranked variables. The final model predicted wave 2 smoking cessation among wave 1 current established smokers with an accuracy of 72% on the test dataset. Validation showed that a similar model could predict wave 3 smoking cessation among wave 2 smokers with an accuracy of 70%. Our analysis indicated that more past-30-day e-cigarette use at the time of quitting, less past-30-day cigarette use before quitting, initiating smoking at an age older than 18, fewer years of smoking, past-30-day poly-tobacco use before quitting, and higher BMI were associated with higher chances of cigarette cessation among adult smokers in the US.
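
The variable-selection and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes scikit-learn, uses synthetic data in place of PATH records, and the feature names, hyperparameters, 70/30 split, and the top-k overlap rule are all hypothetical choices made for illustration. The SHAP step is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT PATH data): binary outcome, 10 candidate predictors.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=4, n_redundant=0, random_state=0
)
feature_names = [f"x{i}" for i in range(X.shape[1])]  # hypothetical names

# Hold out a test set; the 70/30 split is an assumption, not taken from the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# The two tree ensembles named in the abstract, with illustrative settings.
models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

rankings = {}
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Rank predictors from most to least important (impurity-based importance).
    order = np.argsort(model.feature_importances_)[::-1]
    rankings[name] = [feature_names[i] for i in order]
    # Accuracy on the held-out test set.
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# One simple selection rule (an assumption): keep predictors that appear
# in the top k of both rankings.
top_k = 4
shared_top = sorted(
    set(rankings["random_forest"][:top_k]) & set(rankings["gradient_boosting"][:top_k])
)

for name in models:
    print(name, "test accuracy:", round(accuracies[name], 3))
print("shared top predictors:", shared_top)
```

Impurity-based importances only rank predictors; they carry no sign, which is why a separate method such as SHAP is needed to show the direction of each effect.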

Full Text