Explainable Machine Learning Model for Predicting First-Time Acute Exacerbation in Patients with Chronic Obstructive Pulmonary Disease.

Chew-Teng Kor,Yi-Rong Li,Pei-Ru Lin,Sheng-Hao Lin,Bing-Yen Wang,Ching-Hsiung Lin

doi:10.3390/jpm12020228

Abstract

Background: The study developed accurate explainable machine learning (ML) models for predicting first-time acute exacerbation of chronic obstructive pulmonary disease (COPD, AECOPD) at an individual level. Methods: We conducted a retrospective case–control study. A total of 606 patients with COPD were screened for eligibility using registry data from the COPD Pay-for-Performance Program (COPD P4P program) database at Changhua Christian Hospital between January 2017 and December 2019. Recursive feature elimination technology was used to select the optimal subset of features for predicting the occurrence of AECOPD. We developed four ML models to predict first-time AECOPD, and the highest-performing model was applied. Finally, an explainable approach based on ML and the SHapley Additive exPlanations (SHAP) and a local explanation method were used to evaluate the risk of AECOPD and to generate individual explanations of the model’s decisions. Results: The gradient boosting machine (GBM) and support vector machine (SVM) models exhibited superior discrimination ability (area under curve [AUC] = 0.833 [95% confidence interval (CI) 0.745–0.921] and AUC = 0.836 [95% CI 0.757–0.915], respectively). The decision curve analysis indicated that the GBM model exhibited a higher net benefit in distinguishing patients at high risk for AECOPD when the threshold probability was <0.55. The COPD Assessment Test (CAT) and the symptom of wheezing were the two most important features and exhibited the highest SHAP values, followed by monocyte count and white blood cell (WBC) count, coughing, red blood cell (RBC) count, breathing rate, oral long-acting bronchodilator use, chronic pulmonary disease (CPD), systolic blood pressure (SBP), and others. Higher CAT score; monocyte, WBC, and RBC counts; BMI; diastolic blood pressure (DBP); neutrophil-to-lymphocyte ratio; and eosinophil and lymphocyte counts were associated with AECOPD. The presence of symptoms (wheezing, dyspnea, coughing), chronic disease (CPD, congestive heart failure [CHF], sleep disorders, and pneumonia), and use of COPD medications (triple-therapy long-acting bronchodilators, short-acting bronchodilators, oral long-acting bronchodilators, and antibiotics) were also positively associated with AECOPD. A high breathing rate, heart rate, or systolic blood pressure and methylxanthine use were negatively correlated with AECOPD. Conclusions: The ML model was able to accurately assess the risk of AECOPD. The ML model combined with SHAP and the local explanation method were able to provide interpretable and visual explanations of individualized risk predictions, which may assist clinical physicians in understanding the effects of key features in the model and the model’s decision-making process.

Highlights

Chronic obstructive pulmonary disease (COPD) is the third leading cause of mortality worldwide and imposes a substantial burden on health care systems, primarily because of the occurrence of acute exacerbation [1,2]
Of the machine learning (ML) models, the gradient boosting machine (GBM) model achieved the highest performance in this study; we used this model to develop the interpretable ML-based exacerbation risk assessment tool
To compare ML with traditional statistical methods in Acute exacerbation of COPD (AECOPD) risk assessment, Wang et al developed AECOPD prediction models by using traditional logistic regression and several ML algorithms, including random forest (RF), support vector machine (SVM), logistic regression, k-nearest neighbors, and naïve Bayes algorithms, and the results indicated that the ML-based models achieved higher accuracy [21]

Summary

Introduction

Chronic obstructive pulmonary disease (COPD) is the third leading cause of mortality worldwide and imposes a substantial burden on health care systems, primarily because of the occurrence of acute exacerbation [1,2]. The study developed accurate explainable machine learning (ML) models for predicting first-time acute exacerbation of chronic obstructive pulmonary disease (COPD, AECOPD) at an individual level. An explainable approach based on ML and the SHapley Additive exPlanations (SHAP) and a local explanation method were used to evaluate the risk of AECOPD and to generate individual explanations of the model’s decisions. The decision curve analysis indicated that the GBM model exhibited a higher net benefit in distinguishing patients at high risk for AECOPD when the threshold probability was

Methods

Findings

Discussion

Conclusion