Abstract

Background: Low-density lipoprotein cholesterol (LDL-C) is commonly estimated from total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), and triglycerides (TG) using predefined equations that assume fixed or varying relationships between these parameters and may under- or overestimate LDL-C. Machine learning (ML) algorithms can model complex non-linear relationships. We therefore sought to investigate the utility of ML for predicting LDL-C in comparison with two widely used methods (Friedewald and Hopkins).

Methods: We identified 7397 direct LDL-C measurements (4716 HIV, 2060 uninfected controls) in the Women's Interagency HIV Study (WIHS), a prospective study of women with and without HIV undergoing serial assessments. We trained and optimized 5 ML methods (linear regression, random forest, gradient boosting machine, support vector machine, and neural networks) to predict LDL-C from TC, HDL-C, and TG using 80% of the measurements, and tested model performance in a holdout test set (the remaining 20% of the measurements).

Results: Overall, the support vector machine model had the best performance characteristics, outperforming the Friedewald and Hopkins methods with higher R², lower root mean square error, and lower mean absolute error. The support vector machine model remained superior in participant subgroups with and without HIV and in those with non-fasting measurements. Model performance parameters for the test dataset are shown in the Figure.

Conclusions: A support vector machine learning model predicts directly measured LDL-C more accurately than the Friedewald and Hopkins methods, especially in non-fasting patients with HIV. Further studies are needed to provide external validation.
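
For readers unfamiliar with the comparison being made, the following is a minimal sketch of the Friedewald estimate (LDL-C = TC − HDL-C − TG/5, all in mg/dL, conventionally applied only when TG is below 400 mg/dL) together with the three performance measures named in the Results (R², root mean square error, mean absolute error). It is not the authors' code. The function names are hypothetical, and the lipid values are made up purely for illustration; they are not WIHS data. The Hopkins method, which replaces the fixed divisor of 5 with an adjustable factor taken from a lookup table, is not implemented here.

```python
import math

def friedewald_ldl(tc, hdl, tg):
    """Friedewald estimate of LDL-C in mg/dL: TC - HDL-C - TG/5."""
    return tc - hdl - tg / 5.0

def rmse(y_true, y_pred):
    """Root mean square error between measured and estimated values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error between measured and estimated values."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up (TC, HDL-C, TG) triples in mg/dL, for illustration only.
samples = [(200, 50, 150), (180, 60, 100), (240, 40, 250)]
# Made-up "directly measured" LDL-C values for the same three samples.
direct_ldl = [118.0, 102.0, 155.0]

estimates = [friedewald_ldl(tc, hdl, tg) for tc, hdl, tg in samples]

print("Friedewald estimates:", estimates)
print("RMSE:", rmse(direct_ldl, estimates))
print("MAE: ", mae(direct_ldl, estimates))
print("R^2: ", r2(direct_ldl, estimates))
```

The same three metric functions could be applied to the predictions of any other estimator, such as a fitted support vector regression model, evaluated against direct LDL-C in a holdout set.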
