The diagnostic performance of AI-based algorithms to discriminate between NMOSD and MS using MRI features: A systematic review and meta-analysis

Masoud Etemadifar, Raheleh Karimi, Mahdi Norouzi, Mehri Salari, Seyyed-Ali Alaei

doi:10.1016/j.msard.2024.105682

Abstract

Background: Magnetic resonance imaging [MRI] findings in neuromyelitis optica spectrum disorder [NMOSD] and multiple sclerosis [MS] patients can help discriminate between the two diseases. For instance, U-fiber and Dawson's finger-type lesions are suggestive of MS, whereas linear ependymal lesions raise the possibility of NMOSD. Recently, artificial intelligence [AI] models have been used to discriminate between NMOSD and MS based on MRI features. In this study, we aim to systematically review the capability of AI algorithms in NMOSD and MS discrimination based on MRI features.

Method: We searched the PubMed, Scopus, Web of Science, Embase, and IEEE databases up to August 2023. All studies that used AI-based algorithms to discriminate between NMOSD and MS using MRI features were included, without restriction on time, region, race, or age. Data on NMOSD and MS patients, aquaporin-4 antibody [AQP4-Ab] status, diagnostic criteria, performance metrics (accuracy, sensitivity, specificity, and AUC), artificial intelligence paradigm, MR imaging, and the features used were extracted. This study is registered with PROSPERO, CRD42023465265.

Results: Fifteen studies were included in this systematic review, with sample sizes ranging between 53 and 351. In total, 1,362 MS patients and 1,118 NMOSD patients were included. AQP4-Ab was positive in 94.9% of NMOSD patients across 9 studies. Eight studies used machine learning [ML] as a classifier, while 7 used deep learning [DL]. AI models based on MRI features alone or on MRI plus clinical features yielded a pooled accuracy of 82% (95% CI: 78–86%), sensitivity of 83% (95% CI: 79–88%), and specificity of 80% (95% CI: 75–86%). In subgroup analysis, using only MRI features yielded an accuracy, sensitivity, and specificity of 83% (95% CI: 78–88%), 81% (95% CI: 76–87%), and 84% (95% CI: 79–89%), respectively.

Conclusion: AI models based on MRI features showed a high potential to discriminate between NMOSD and MS. However, heterogeneity in MR imaging, model evaluation, and the reporting of performance metrics, among other confounders, affected the reliability of our results. Well-designed studies on multicentric datasets, standardized imaging and evaluation protocols, and detailed, transparent reporting of results are needed to reach optimal performance.
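The performance metrics extracted from each study (accuracy, sensitivity, specificity) all derive from a two-by-two confusion table. The following is a minimal sketch of those definitions, not the authors' analysis code. It assumes NMOSD is treated as the positive class, which the abstract does not state; the class `ConfusionCounts` and the counts in the example are hypothetical and are not data from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Two-by-two table for a binary NMOSD-vs-MS classifier.

    Assumption: NMOSD is the positive class, MS the negative class.
    """
    tp: int  # NMOSD patients classified as NMOSD
    fn: int  # NMOSD patients classified as MS
    tn: int  # MS patients classified as MS
    fp: int  # MS patients classified as NMOSD


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of NMOSD patients correctly identified."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of MS patients correctly identified."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all patients correctly classified."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# Hypothetical counts for illustration only.
example = ConfusionCounts(tp=40, fn=10, tn=45, fp=5)
print(f"sensitivity = {sensitivity(example):.2f}")
print(f"specificity = {specificity(example):.2f}")
print(f"accuracy    = {accuracy(example):.2f}")
```

Because sensitivity and specificity depend on which disease is labelled positive, swapping the class assignment swaps the two values while leaving accuracy unchanged. This is one reason inconsistent reporting across studies complicates pooling.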

Full Text