Background: In the U.S., about 8% of adults have never received cholesterol screening. Although machine learning (ML) has been used to develop decision tools for Atherosclerotic Cardiovascular Disease (ASCVD) risk prediction, it has not yet been applied to forecasting cholesterol screening behaviors. This study aimed to examine the performance and accuracy of ML algorithms in forecasting cholesterol screening behaviors in adults after age 50.
Methods: This analysis used deidentified data from the Health and Retirement Study (HRS) 2004-2018. HRS is a longitudinal survey of 23,000 households in the U.S. Participants were excluded from the current analysis if they died by 2019, ever had ASCVD or stroke, were under age 50 at baseline, or had missing data on self-reported cholesterol screening. In total, 7,176 participants (mean age [SD]=62 [8]) met the inclusion criteria; participants were randomly split into a training set (80%) and a testing set (20%). The synthetic minority oversampling technique (SMOTE) was used to address the class imbalance caused by the rare outcome. Five ML algorithms were used: random forest, gradient boosting machine (GBM), XGBoost, Support Vector Machine (SVM), and logistic regression. Accuracy, AUROC, and positive predictive value (PPV) were used to compare model performance. Average gain was used to evaluate feature importance in the demographic and health domains.
Results: In total, 232 (3.2%) respondents did not receive any cholesterol screening from 2008 to 2018. Experiments with the five ML algorithms suggested that XGBoost, after tuning tree depth (deeper trees) and learning rate, performed better in classifying those who did not screen for cholesterol levels over 10 years. Adding prior cholesterol screening history (2004-2006) to the model significantly improved model performance. In the predictive model (accuracy=0.97; AUROC=0.88; PPV=0.42), hypertension, self-rated health, and smoking were the major health features, while insurance, poverty, and work status were the major demographic features.
Conclusion: Findings underscore the potential utility of ML models in predicting cholesterol screening behaviors after age 50. Such models could be the basis for decision tools that help clinicians identify those with a lower chance of cholesterol screening and send reminders accordingly. The low-cost predictive model might improve the uptake of preventive screening behaviors in middle-aged and older adults.
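To help read the reported metrics against the 3.2% outcome rate, the following is a minimal sketch, not the authors' analysis code. It uses only the counts stated in the abstract (7,176 participants, 232 non-screeners) to compute the accuracy of a trivial baseline that labels everyone as "screened", and compares the reported PPV with the outcome prevalence. The `Confusion` class and helper names are hypothetical, introduced for illustration; the comparison assumes the testing set has roughly the same prevalence as the full sample.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts; the positive class is 'never screened'."""
    tp: int  # non-screeners correctly flagged
    fp: int  # screeners wrongly flagged
    tn: int  # screeners correctly left unflagged
    fn: int  # non-screeners missed


def accuracy(c: Confusion) -> float:
    """Share of all predictions that are correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def ppv(c: Confusion) -> float:
    """Share of flagged cases that are true non-screeners."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else float("nan")


# Counts taken from the abstract.
N_TOTAL = 7176
N_NEVER_SCREENED = 232
REPORTED_PPV = 0.42

# Trivial baseline: flag nobody, i.e. predict that everyone gets screened.
baseline = Confusion(tp=0, fp=0, tn=N_TOTAL - N_NEVER_SCREENED, fn=N_NEVER_SCREENED)
baseline_accuracy = accuracy(baseline)

# Prevalence is the PPV expected from flagging people at random.
prevalence = N_NEVER_SCREENED / N_TOTAL
ppv_lift = REPORTED_PPV / prevalence

print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"prevalence:        {prevalence:.3f}")
print(f"PPV lift:          {ppv_lift:.1f}x")
```

Under these assumptions, the trivial baseline already reaches an accuracy close to the reported 0.97, so accuracy alone says little about a rare outcome; AUROC and PPV carry most of the information, with the reported PPV roughly 13 times the outcome prevalence.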