Abstract

BackgroundLow-density lipoprotein-cholesterol (LDL-C) is used as a threshold and target for treating dyslipidemia. Although the Friedewald equation is widely used to estimate LDL-C, it has been known to be inaccurate in the case of high triglycerides (TG) or non-fasting states. We aimed to propose a novel method to estimate LDL-C using machine learning. MethodsUsing a large, single-center electronic health record database, we derived a ML algorithm to estimate LDL-C from standard lipid profiles. From 1,029,572 cases with both standard lipid profiles (total cholesterol, high-density lipoprotein-cholesterol, and TG) and direct LDL-C measurements, 823,657 tests were used to derive LDL-C estimation models. Patient characteristics such as sex, age, height, weight, and other laboratory values were additionally used to create separate data sets and algorithms. ResultsMachine learning with gradient boosting (LDL-CX) and neural network (LDL-CN) showed better correlation with directly measured LDL-C, compared with conventional methods (r = 0.9662, 0.9668, 0.9563, 0.9585; for LDL-CX, LDL-CN, Friedewald [LDL-CF], and Martin [LDL-CM] equations, respectively). The overall bias of LDL-CX (−0.27 mg/dL, 95% CI −0.30 to −0.23) and LDL-CN (−0.01 mg/dL, 95% CI -0.04–0.03) were significantly smaller compared with both LDL-CF (−3.80 mg/dL, 95% CI −3.80 to −3.60) or LDL-CM (−2.00 mg/dL, 95% CI −2.00 to −1.94), especially at high TG levels. ConclusionsMachine learning algorithms were superior in estimating LDL-C compared with the conventional Friedewald or the more contemporary Martin equations. Through external validation and modification, machine learning could be incorporated into electronic health records to substitute LDL-C estimation.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call