Abstract

Clinical evaluation of systemic lupus erythematosus (SLE) disease activity is limited and inconsistent, and high disease activity has a serious impact on SLE patients. This study aimed to build a machine learning model to identify SLE patients with high disease activity. A total of 1014 SLE patients with low disease activity and 453 SLE patients with high disease activity were included, and 94 clinical and laboratory indicators and 17 meteorological indicators were collected. After data preprocessing, we used mutual information and MultiSURF to evaluate feature importance and select features, and the selected features were used for machine learning modeling. Model performance was evaluated and verified with a series of binary classification metrics. Collective feature selection screened out hematuria, proteinuria, pyuria, low complement, precipitation, sunlight and other features for model construction. After hyperparameter optimization, the LGB model performed best (ROC AUC = 0.930; PRC AUC = 0.911, APS = 0.913; balanced accuracy = 0.856), and naive Bayes performed worst (ROC AUC = 0.849; PRC AUC = 0.719, APS = 0.714; balanced accuracy = 0.705). Finally, the selected features showed good consistency in the composite feature importance bar plot. We identify SLE patients with high disease activity with a simple machine learning pipeline, especially the LGB model based on proteinuria, hematuria, pyuria and other features screened out by collective feature selection.
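The abstract relies on mutual-information feature scoring and on four evaluation metrics (ROC AUC, average precision score, balanced accuracy) without defining them. The following is a minimal, stdlib-only Python sketch of those quantities, assuming discrete feature values and a binary label where 1 means high disease activity. The function names are hypothetical, MultiSURF and the LGB model itself are not reproduced, and this is not the study's code.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(feature: Sequence, label: Sequence[int]) -> float:
    """I(X; Y) in nats for a discrete feature X and a binary label Y (assumed discrete inputs)."""
    n = len(label)
    joint = Counter(zip(feature, label))
    px = Counter(feature)
    py = Counter(label)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """APS = sum_n (R_n - R_{n-1}) * P_n over descending score thresholds."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(y_true)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(order):
        # Treat all samples sharing the same score as a single threshold.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if y_true[order[j]] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75 and `average_precision` on the same inputs returns 5/6. Note that APS is a step-wise sum, which is why it can differ slightly from a trapezoidal PRC AUC, as the two reported values do.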
