Chronic Obstructive Pulmonary Disease (COPD) is a heterogeneous disease with a variety of symptoms, including persistent coughing and mucus production, shortness of breath, wheezing, and chest tightness. As the disease advances, exacerbations, i.e. acute worsening of respiratory symptoms, may increase in frequency, leading to potentially life-threatening complications. Exposure to air pollutants may trigger COPD exacerbations. Predictive models for COPD exacerbations reported in the literature, while promising, may be constrained by their reliance on fixed air quality sensor data, which may not fully capture individuals' dynamic exposure to air pollution. To address this, we designed a machine learning (ML) framework that leverages data from personal air quality monitors, health records, lifestyle, and living condition information to build models that perform short-term prediction of COPD exacerbations. The framework employs (i) k-means clustering to uncover potentially distinct patient sub-types, (ii) supervised ML techniques (Logistic Regression, Random Forest, and eXtreme Gradient Boosting) to train and test predictive models for each patient sub-type, and (iii) an explainable artificial intelligence technique (SHAP) to interpret the final models. The framework was tested on data collected from 101 COPD patients monitored for up to 6 months, with exacerbations occurring in 10.7% of all samples. Two patient sub-types were identified, characterised by different disease severity. Random Forest was the best performing model in both clusters: in cluster 1 it reached an area under the receiver operating characteristic curve (AUC) of 0.90 and an area under the precision/recall curve (AUPRC) of 0.7; in cluster 2 it reached an AUC of 0.82 and an AUPRC of 0.56. The model interpretability analysis identified previous symptoms and cumulative pollutant exposure as key predictors of exacerbations.
The results of our study lay the groundwork for a predictive framework for COPD exacerbations, with particular attention to the potential influence of environmental features. The SHAP analysis revealed that the contribution of environmental features is not uniform across all subjects. For instance, cumulative exposure to pollutants demonstrated greater predictive power in cluster 1. The SHAP analysis also showed that, overall, clinical factors and individual symptomatology play the most significant role in determining exacerbation risk in this setup.
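The abstract reports both AUC and AUPRC, which is useful when the positive class is rare (10.7% of samples here): a classifier that ranks at random is expected to score an AUC of about 0.5 but an AUPRC only about equal to the prevalence. The following is a minimal, dependency-free sketch of how the two metrics can be computed from labels and predicted scores. The function names and the toy labels and scores are illustrative assumptions, not the study's code or data, and the step-wise average precision used here is one common estimator of AUPRC, not necessarily the one the authors used.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision/recall curve.

    Averages the precision measured at the rank of each positive sample.
    Assumes distinct scores; ties are broken by sort order.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            total += true_positives / rank
    return total / n_pos


def prevalence(y_true: Sequence[int]) -> float:
    """Share of positive samples; the expected AUPRC of a random ranking."""
    return sum(y_true) / len(y_true)


# Hypothetical toy data: 2 exacerbation samples out of 10 (NOT study data).
Y_TRUE = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
SCORES = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.35, 0.25, 0.15]

print(f"AUC        = {roc_auc(Y_TRUE, SCORES):.4f}")            # 0.9375
print(f"AUPRC (AP) = {average_precision(Y_TRUE, SCORES):.4f}")  # 0.8333
print(f"baseline   = {prevalence(Y_TRUE):.4f}")                 # 0.2000
```

Read this way, the reported AUPRC values of 0.7 and 0.56 are best judged against a chance level near the exacerbation prevalence, not against 0.5. Note that the 10.7% figure is for the whole dataset; the prevalence within each cluster is not given in the abstract.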