XGBoost-SHAP-based interpretable diagnostic framework for Alzheimer’s disease

Fuliang Yi, Hui Yang, Durong Chen, Yao Qin, Hongjuan Han, Jing Cui, Wenlin Bai, Yifei Ma, Rong Zhang, Hongmei Yu

doi:10.1186/s12911-023-02238-9


Open Access

https://doi.org/10.1186/s12911-023-02238-9


Journal: BMC Medical Informatics and Decision Making
Publication Date: Jul 25, 2023
Citations: 25
License type: CC BY 4.0

Abstract

Background

Because of the class imbalance that arises as Alzheimer’s disease (AD) develops from normal cognition (NC) to mild cognitive impairment (MCI), machine learning (ML)-assisted auxiliary diagnosis of AD remains challenging in clinical practice, and diagnostic performance is low. We aimed to construct an interpretable framework, extreme gradient boosting-Shapley additive explanations (XGBoost-SHAP), to handle the imbalance among different AD progression statuses at the algorithmic level. We also sought to achieve multiclassification of NC, MCI, and AD.

Methods

We obtained patient data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database, including clinical information, neuropsychological test results, neuroimaging-derived biomarkers, and APOE-ε4 gene statuses. First, three feature selection algorithms were applied, and the selected features were then fed into the XGBoost algorithm. Because the three classes were imbalanced, we changed the sample weight distribution to achieve multiclassification of NC, MCI, and AD. Then, the SHAP method was linked to XGBoost to form an interpretable framework. This framework used attribution to quantify each feature's impact on the model's predictions as a numerical value and analysed these values by direction and magnitude. Subsequently, the top 10 features (optimal subset) were used to simplify the clinical decision-making process, and the resulting performance was compared with that of a random forest (RF), Bagging, AdaBoost, and a naive Bayes (NB) classifier. Finally, the National Alzheimer’s Coordinating Center (NACC) dataset was employed to assess the impact path consistency of the features within the optimal subset.

Results

Compared to RF, Bagging, AdaBoost, NB, and unweighted XGBoost, the interpretable framework achieved higher classification performance, with accuracy improvements of 0.74%, 0.74%, 1.46%, 13.18%, and 0.83%, respectively. The framework achieved high sensitivity (81.21%/74.85%), specificity (92.18%/89.86%), accuracy (87.57%/80.52%), area under the receiver operating characteristic curve (AUC) (0.91/0.88), positive clinical utility index (0.71/0.56), and negative clinical utility index (0.75/0.68) on the ADNI and NACC datasets, respectively. In the ADNI dataset, the top 10 features were found to have varying associations with the risk of AD onset based on their SHAP values. Specifically, higher SHAP values of CDRSB, ADAS13, ADAS11, ventricle volume, ADASQ4, and FAQ were associated with higher risks of AD onset. Conversely, higher SHAP values of LDELTOTAL, mPACCdigit, RAVLT_immediate, and MMSE were associated with lower risks of AD onset. Similar results were found for the NACC dataset.

Conclusions

The proposed interpretable framework achieves excellent performance in imbalanced AD multiclassification tasks and provides scientific guidance (optimal subset) for clinical decision-making, thereby facilitating disease management and offering new research ideas for optimizing AD prevention and treatment programs.
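
The Methods mention changing the sample weight distribution to counter class imbalance, but the abstract does not state which weighting scheme was used. As a minimal sketch, assuming the common "balanced" heuristic (each sample weighted by n_samples / (n_classes × class count)), the idea might look like this. The label counts and the helper name `balanced_sample_weights` are hypothetical and are not taken from the paper or from ADNI.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Return one weight per sample: n_samples / (n_classes * class_count).

    Assumption: this is the generic "balanced" heuristic, not necessarily
    the weighting used by the authors.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return [n_samples / (n_classes * counts[y]) for y in labels]


# Hypothetical, deliberately imbalanced labels (illustrative only).
labels = ["NC"] * 6 + ["MCI"] * 3 + ["AD"] * 1
weights = balanced_sample_weights(labels)

# After weighting, every class contributes the same total weight.
class_weight_totals = {
    c: sum(w for w, y in zip(weights, labels) if y == c) for c in set(labels)
}
print(class_weight_totals)
```

With weights of this kind, the rare class counts as much as the common ones in the training loss. In the XGBoost scikit-learn interface, per-sample weights can be supplied through the `sample_weight` argument of `fit`.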
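
The attribution idea behind SHAP can be illustrated with exact Shapley values on a toy model: each feature's contribution is its average marginal effect over all orderings in which features are switched from a baseline to the patient's values. The sketch below is an illustration under stated assumptions, not the authors' pipeline. The scoring function `toy_risk`, its coefficients, the baseline, and the patient values are all hypothetical, and brute-force enumeration is only feasible for a handful of features.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values by averaging marginal contributions over
    every feature ordering (cost grows factorially with feature count)."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]          # switch feature i to the patient's value
            new = predict(current)
            phi[i] += new - prev       # marginal contribution in this ordering
            prev = new
    n_orders = factorial(n)
    return [p / n_orders for p in phi]


def toy_risk(features):
    """Hypothetical risk score over (CDRSB, MMSE, FAQ); made-up coefficients."""
    cdrsb, mmse, faq = features
    return 0.5 * cdrsb - 0.2 * mmse + 0.1 * cdrsb * faq


baseline = [0.0, 30.0, 0.0]   # hypothetical reference profile
patient = [4.0, 22.0, 10.0]   # hypothetical patient
phi = shapley_values(toy_risk, patient, baseline)

# Additivity: contributions sum to prediction minus baseline prediction.
print(phi, sum(phi), toy_risk(patient) - toy_risk(baseline))
```

The sign of each value gives the direction of the feature's push on the score and its absolute size gives the magnitude, which is the kind of reading the abstract describes for the top 10 features.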
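
The Results report positive and negative clinical utility indices. Under the usual definitions (positive index = sensitivity × positive predictive value, negative index = specificity × negative predictive value), both can be computed from a binary confusion matrix as sketched below. The counts are hypothetical, and the abstract does not say how the indices were aggregated across the three classes, so this shows only the one-vs-rest building block.

```python
def clinical_utility_indices(tp, fp, fn, tn):
    """Compute CUI+ and CUI- from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)   # positive predictive value
    npv = tn / (tn + fn)   # negative predictive value
    return {
        "cui_pos": sensitivity * ppv,
        "cui_neg": specificity * npv,
    }


# Hypothetical counts for one class versus the rest (not from the paper).
result = clinical_utility_indices(tp=80, fp=10, fn=20, tn=90)
print(result)
```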


