The appropriate use of statins plays a vital role in reducing the risk of atherosclerotic cardiovascular disease (ASCVD). Changes in diet and lifestyle have markedly increased the number of individuals with high cholesterol levels, which makes the rational use of statins crucial. However, adverse reactions associated with statins, including liver enzyme abnormalities and statin-associated muscle symptoms (SAMS), have limited their widespread use. In this study, we aimed to develop predictive models for statin efficacy and safety from real-world clinical data using machine learning techniques. The dataset was preprocessed with improved random forest imputation and Borderline SMOTE oversampling. The Boruta method was used for feature selection, and the dataset was divided into training and testing sets in a 7:3 ratio. Five algorithms (logistic regression, naive Bayes, decision tree, random forest, and gradient boosting decision tree) were used to construct the predictive models. Ten-fold cross-validation and bootstrap sampling were performed for internal and external validation. Additionally, SHAP (SHapley Additive exPlanations) was employed to interpret feature contributions. Finally, an accessible web-based platform for predicting statin efficacy and safety was built on the optimal predictive model. The random forest algorithm exhibited the best performance among the five algorithms.
The predictive models for LDL-C target attainment (AUC = 0.883, Accuracy = 0.868, Precision = 0.858, Recall = 0.863, F1 = 0.860, AUPRC = 0.906, MCC = 0.761), liver enzyme abnormalities (AUC = 0.964, Accuracy = 0.964, Precision = 0.967, Recall = 0.963, F1 = 0.965, AUPRC = 0.978, MCC = 0.938), and muscle pain/creatine kinase (CK) abnormalities (AUC = 0.981, Accuracy = 0.980, Precision = 0.987, Recall = 0.975, F1 = 0.981, AUPRC = 0.987, MCC = 0.965) demonstrated favorable performance. The most important features in the LDL-C target attainment model were cerebral infarction, TG, PLT, and HDL. The most important features in the liver enzyme abnormalities model were CRP, CK, and the number of oral medications. Similarly, AST, ALT, PLT, and the number of oral medications were the most important features for muscle pain/CK abnormalities. Based on the best-performing predictive model, a user-friendly web application was designed and implemented. This study presents machine learning-based predictive models for statin efficacy and safety. The platform can assist in guiding statin therapy decisions and optimizing treatment strategies. Further research and application of the models are warranted to improve the utilization of statin therapy.
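For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, F1, MCC, and AUC are derived for a binary classifier. It is not the study's code: the function names `binary_metrics` and `roc_auc`, the 0.5 decision threshold, and the eight toy labels and scores are assumptions made purely for illustration, and AUPRC is omitted for brevity.

```python
from math import sqrt


def binary_metrics(y_true, y_pred):
    """Threshold-based metrics from 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient: uses all four confusion-matrix cells.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. This pairwise form is O(n^2) and meant for clarity,
    not for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): 4 positives, 4 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

MCC is worth reading alongside accuracy because it stays informative when classes are imbalanced, whereas AUC summarizes ranking quality independently of any single threshold.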