Abstract

Administrative health datasets are widely used in public health research but often lack information about common confounders. We aimed to develop and validate machine learning (ML)-based models that use medication data from Australia's Pharmaceutical Benefits Scheme (PBS) database to predict obesity and smoking. We used data from the D-Health Trial (N=18 000) and the QSkin Study (N=43 794). Smoking history, height, and weight were self-reported at study entry. Linkage to the PBS dataset captured 5 years of medication data after cohort entry. We used age, sex, and medication use, classified using Anatomical Therapeutic Chemical (ATC) classification codes, as potential predictors of smoking (current or quit <10 years ago vs never or quit ≥10 years ago) and obesity (obese vs non-obese). We trained gradient-boosted machine learning models using data for the first 80% of participants enrolled and validated them using the remaining 20%. We assessed model performance overall and by sex and age, and compared models generated using 3 and 5 years of PBS data. In the validation dataset, using 3 years of PBS data, the area under the receiver operating characteristic curve was 0.70 (95% confidence interval [CI] 0.68-0.71) for predicting obesity and 0.71 (95% CI 0.70-0.72) for predicting smoking. Models performed better in women than in men. Using 5 years of PBS data yielded only marginal improvement. Medication data in combination with age and sex can be used to predict obesity and smoking. These models may be of value to researchers using data collected for administrative purposes.
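
The evaluation design described above (train on the first 80% of participants by enrolment order, validate on the remaining 20%, and summarise discrimination with the area under the receiver operating characteristic curve) can be illustrated with a minimal sketch. The abstract does not state which software was used, so this is not the authors' implementation. The function names `split_by_enrolment` and `auc` are hypothetical, the example data are made up, and the gradient-boosted model itself is omitted; the scores are assumed to come from any fitted classifier.

```python
from typing import Sequence, Tuple


def split_by_enrolment(records: Sequence, train_fraction: float = 0.8) -> Tuple[list, list]:
    """Split records that are already sorted by enrolment order.

    The first `train_fraction` of records form the training set and the
    rest form the validation set (no shuffling).
    """
    cut = int(len(records) * train_fraction)
    return list(records[:cut]), list(records[cut:])


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; ties count as 0.5.
    This O(n^2) form is for illustration only, not for large cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: ten participants in enrolment order.
participants = list(range(10))
train, valid = split_by_enrolment(participants)

# Made-up labels (1 = obese) and predicted probabilities for four people.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Splitting by enrolment order rather than at random means the validation participants were all enrolled after the training participants, which is closer to how a model would be applied to new data.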
