Abstract

Background
Multiple sclerosis (MS) is a chronic autoimmune disease of the central nervous system (CNS) characterized by inflammation, demyelination, gliosis, and neuronal loss. Globally, over 2.8 million people have MS, and 300 people receive a diagnosis of MS every day. It is three times more common in females than in males. Autoimmunity, environmental factors, Epstein-Barr virus, and genetics are implicated in the etiology of MS. The pathophysiology of MS is limited to the CNS, resulting in focal inflammation of the blood-brain barrier and neurodegeneration of axons, neurons, and synapses. Diagnosis is based on clinical history, physical examination, imaging, and cerebrospinal fluid (CSF) studies. Early diagnosis is key to controlling disease progression with disease-modifying therapies.

Methods
We trained machine learning models on more than 3000 patient records from patients with and without MS. The data were sourced from MIMIC-IV, a hospital-wide electronic health record (EHR) dataset from Beth Israel Deaconess Medical Center, Boston, MA. The best-performing model was a random forest of 300 trees with a maximum depth of 30, using gain ratio as the criterion for selecting split attributes. The model was tested with 10-fold cross-validation. The input parameters consisted of age, gender, and the results of routine blood tests, such as complete blood counts, differential counts, comprehensive metabolic panels, and lipid panels, recorded up to 3 years before the MS diagnosis. Performance was evaluated using the area under the receiver operating characteristic curve (AUC).

Results
The model predicted the risk for MS with an AUC of 0.98 and an accuracy of 0.92, corresponding to 0.91 sensitivity, 0.92 specificity, 0.92 positive predictive value, and 0.91 negative predictive value. Neutrophils, lymphocytes, monocytes, eosinophils, red blood cell counts, hemoglobin, and hematocrit appeared to contribute most to the identification of risk for MS. Prediction accuracy was consistent up to 3 years prior to diagnosis.

Conclusion
Studies have shown that five types of immune cells (plasma cells, monocytes, M2 macrophages, neutrophils, and eosinophils) in CSF are significantly altered in MS cases compared with controls. Brain shrinkage in patients with MS may be linked to hemoglobin that leaks from the blood through the blood-brain barrier. Albumin, the most abundant protein in plasma, gains access to CNS tissue, where it is exposed to an inflammatory milieu and tissue damage, e.g., demyelination. Our model identified a pattern in the combined values of these markers that contributed to the prediction. Thus, AI/ML-based prediction models may be able to help identify the risk for MS years before neurological symptoms appear. This may prompt close monitoring of these patients, with periodic neurological and cognitive exams, as soon as the first symptoms appear. Early confirmation of the diagnosis with imaging and CSF studies may ensure prompt consideration of disease-modifying therapies.
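
The evaluation metrics named in the abstract (accuracy, sensitivity, specificity, positive and negative predictive value, and AUC) can all be derived from a model's predicted scores and the true labels. The following is a minimal sketch of those definitions, not the authors' evaluation pipeline: the function names, the 0.5 decision threshold, and the eight toy labels and scores are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute threshold-based metrics from binary labels (1 = MS, 0 = no MS)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),  # recall on the MS class
        "specificity": _ratio(tn, tn + fp),  # recall on the non-MS class
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "npv": _ratio(tn, tn + fn),          # negative predictive value
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 cases with MS, 4 without, and made-up model scores.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

print(classification_metrics(toy_labels, toy_preds))  # every metric is 0.75 here
print(roc_auc(toy_labels, toy_scores))                # 0.9375
```

Unlike the other four metrics, the AUC does not depend on the decision threshold, which is why a model can report an AUC (0.98) higher than its accuracy at one particular operating point (0.92).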