Abstract

Non-communicable diseases (NCDs) are rising rapidly; they have become one of the most serious health issues and the leading cause of death worldwide. In recent years, artificial intelligence-based systems have been developed to assist clinicians in decision-making and so reduce morbidity and mortality. However, a common drawback of these studies is that their outputs are not explained: the inner logic behind the predictions is hidden from the end-user. Clinicians therefore struggle to interpret these models because of their black-box nature, and the models are not accepted in medical practice. To address this problem, we propose a Deep Shapley Additive Explanations (DeepSHAP)-based deep neural network framework, equipped with a feature selection technique, for NCDs prediction and explanation among the population of the United States. The proposed framework comprises three components: first, representative features are selected using an elastic net-based embedded feature selection technique; second, a deep neural network classifier is tuned over its hyper-parameters and trained on the selected feature subset; third, the DeepSHAP approach provides two kinds of model explanation: (I) a population-based explanation of the risk factors that affected the model’s predictions, and (II) a human-centered explanation of a single instance. The experimental results indicate that the proposed model outperforms various state-of-the-art models. In addition, the proposed model can improve the medical understanding of NCDs diagnosis by providing general insights into changes in disease risk at the global and local levels. Consequently, the DeepSHAP-based explainable deep learning framework not only contributes to medical decision support systems but can also serve real-world needs in other domains.
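The two explanation levels described in the abstract can be illustrated with a minimal sketch. It is not the DeepSHAP implementation used in the paper: DeepSHAP approximates Shapley values for deep networks, whereas the code below computes exact Shapley values by enumerating feature subsets, which is only feasible for a handful of features. The `risk_model` function, its coefficients, the three features, the baseline, and the patient records are all hypothetical and stand in for the trained classifier and the survey data.

```python
from itertools import combinations
from math import exp, factorial


def risk_model(x):
    """Hypothetical stand-in for a trained classifier (NOT the paper's DNN)."""
    age, bmi, smoker = x
    z = 0.04 * (age - 50) + 0.10 * (bmi - 25) + 0.8 * smoker
    z += 0.02 * smoker * (age - 50)  # illustrative interaction term
    return 1.0 / (1.0 + exp(-z))


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical reference individual and patient records: (age, BMI, smoker).
baseline = [50, 25, 0]
patients = [[65, 31, 1], [42, 23, 0], [58, 35, 0], [70, 27, 1]]

# Local (single-instance) explanation: one attribution per feature.
local = shapley_values(risk_model, patients[0], baseline)

# Global (population-level) view: mean absolute attribution per feature.
all_phi = [shapley_values(risk_model, p, baseline) for p in patients]
global_importance = [
    sum(abs(phi[i]) for phi in all_phi) / len(all_phi) for i in range(3)
]

print("local attributions:", local)
print("global importance:", global_importance)
```

The local attributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the additive property that makes a single prediction decomposable into per-feature contributions. Averaging absolute attributions over many instances is one common way to obtain a population-level ranking of risk factors.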

Highlights

  • We propose a DeepSHAP-based explainable deep learning framework that incorporates a feature selection approach for the early prediction of noncommunicable diseases

  • Accuracy scores of 0.9411, 0.9256, and 0.9074 were achieved by XGBoost with elastic net (EN), a deep neural network (DNN) with sequential backward feature selection with random forest (RF), abbreviated SBFS-RF, and a multilayer perceptron (MLP) with SBFS-RF, respectively

  • Models built on the support vector regression-based recursive feature elimination (SVR-RFE) technique produced the lowest accuracy: 0.8012 for support vector machine (SVM), 0.7905 for k-nearest neighbor (KNN), 0.8315 for RF, 0.8288 for MLP, 0.8485 for XGBoost, and 0.8498 for DNN


Summary

INTRODUCTION

NCDs are among the major global health issues confronting humankind. According to the NCDs global status report by the World Health Organization, NCDs are the leading cause of death, accounting for 41 million deaths each year. In one study [11], the authors used a collection of cost-effective time-series features, including patients’ comorbidities, cognitive scores, medication history, and demographics, to predict Alzheimer’s disease progression using support vector machine (SVM), RF, k-nearest neighbor (KNN), logistic regression, and decision tree techniques. In their results, the early fusion of comorbidity and medication features with other features showed significant predictive power with all models. The experimental results of the proposed framework are compared with those of state-of-the-art baseline models on the validation and test datasets for NCDs. Accuracy, specificity, recall, precision, F-score, and area under the curve (AUC) are used to evaluate the performance of the prediction models.
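The evaluation measures named above can be computed from the true labels and the predicted scores of a binary classifier. The following is a minimal sketch under stated assumptions: the labels and scores are made-up toy values, not results from the paper; the 0.5 decision threshold is an assumption; the F-score shown is F1; and the function names are hypothetical. AUC is computed as the fraction of positive-negative pairs in which the positive instance receives the higher score, counting ties as one half.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(y_true, y_score, threshold=0.5):
    """Threshold the scores and derive metrics from the confusion matrix."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "specificity": safe_div(tn, tn + fp),
        "recall": recall,
        "precision": precision,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return safe_div(wins, len(pos) * len(neg))


# Toy labels (1 = disease present) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

metrics = classification_metrics(y_true, y_score)
metrics["auc"] = auc(y_true, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike the other measures, AUC does not depend on the decision threshold, which is why it is usually reported alongside them when models are compared.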

RELATED WORK
EXPERIMENTAL DESIGN FOR NONCOMMUNICABLE DISEASES
BASELINE METHODS
EXPERIMENTAL RESULT AND ANALYSIS
Findings
CONCLUSION AND FUTURE WORK