Development of childhood asthma prediction models using machine learning approaches.

Dilini M Kothalawala, Faisal I Rezwan, John W Holloway, Adnan Custovic, S Hasan Arshad, Angela Simpson, Clare S Murray, William J Tapper

doi:10.1002/clt2.12076

Abstract

Respiratory symptoms are common in early life and often transient. It is difficult to identify in which children these will persist and result in asthma. Machine learning (ML) approaches have the potential for better predictive performance and generalisability over existing childhood asthma prediction models. This study applied ML approaches to predict school-age asthma (age 10) in early life (Childhood Asthma Prediction in Early life, CAPE model) and at preschool age (Childhood Asthma Prediction at Preschool age, CAPP model). Clinical and environmental exposure data were collected from children enrolled in the Isle of Wight Birth Cohort (N=1368, ∼15% asthma prevalence). Recursive Feature Elimination (RFE) identified an optimal subset of features predictive of school-age asthma for each model. Seven state-of-the-art ML classification algorithms were used to develop prognostic models. Training was performed by applying fivefold cross-validation, imputation, and resampling. Predictive performance was evaluated on the test set. Models were further externally validated in the Manchester Asthma and Allergy Study (MAAS) cohort. RFE identified eight and twelve predictors for the CAPE and CAPP models, respectively. Support Vector Machine (SVM) algorithms provided the best performance for both the CAPE (area under the receiver operating characteristic curve, AUC=0.71) and CAPP (AUC=0.82) models. Both models demonstrated good generalisability in MAAS (AUC: CAPE 8-year=0.71, 11-year=0.71; CAPP 8-year=0.83, 11-year=0.79) and excellent sensitivity to predict a subgroup of persistent wheezers. Using ML approaches improved upon the predictive performance of existing regression-based models, with good generalisability and ability to rule in asthma and predict persistent wheeze.
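To make the headline metric concrete: the AUC can be read as the probability that a randomly chosen child who goes on to develop asthma is assigned a higher predicted risk than a randomly chosen child who does not. The sketch below computes it directly from that pairwise definition in plain Python. The function name `roc_auc` and the labels and scores are illustrative assumptions, not study data or the authors' code.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 = asthma at school age, 0 = no asthma (toy encoding).
    scores: predicted risk from any classifier; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: six children, three of whom develop asthma.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
auc = roc_auc(labels, scores)  # 7 of 9 pairs are ordered correctly
print(round(auc, 3))
```

On this reading, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 0.71 and 0.82 sit.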

Highlights

Childhood asthma is highly heterogeneous, with numerous factors contributing towards its development, persistence and severity.[1,2,3] Despite approximately 80% of asthmatic children developing symptoms before the age of six, these clinical symptoms are neither universally present in early life among all future asthmatics nor specific to asthma.[4]
Both models demonstrated good generalisability in the Manchester Asthma and Allergy Study (MAAS) (CAPE: 8‐year = 0.71, 11‐year = 0.71; Childhood Asthma Prediction at Preschool‐age (CAPP): 8‐year = 0.83, 11‐year = 0.79) and excellent sensitivity to predict a subgroup of persistent wheezers.
This study demonstrates how tools such as SHAP values[32] can be used to explain the predictions of complex black‐box machine learning algorithms that have been shown to improve the accuracy of childhood asthma predictions.
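SHAP values build on Shapley values from cooperative game theory: each feature is credited with its average marginal contribution to a prediction across all orderings in which features could be revealed. The sketch below computes exact Shapley values for a tiny hand-written risk function. Everything in it is an assumption for illustration: `toy_risk`, its three features and its coefficients are invented, a single all-zero baseline stands in for the background data a SHAP explainer would typically use, and none of it reproduces the CAPE or CAPP models.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions of predict(x) relative to predict(baseline).

    Averages each feature's marginal contribution over every feature ordering,
    so it is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]          # reveal feature i
            new = predict(current)
            phi[i] += new - prev       # marginal contribution in this ordering
            prev = new
    return [p / len(orderings) for p in phi]


def toy_risk(features):
    """Hypothetical risk score with an interaction between wheeze and atopy."""
    wheeze, cough, atopy = features
    return 0.1 + 0.3 * wheeze + 0.1 * cough + 0.2 * wheeze * atopy


x = [1, 1, 1]          # child with all three (toy) risk factors
baseline = [0, 0, 0]   # reference child with none
phi = shapley_values(toy_risk, x, baseline)
print([round(p, 3) for p in phi])
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is shared equally between the two features involved; this additive property is what lets per-feature contributions be read off a black-box model.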

Summary

Introduction

Childhood asthma is highly heterogeneous, with numerous factors contributing towards its development, persistence and severity.[1,2,3] Despite approximately 80% of asthmatic children developing symptoms (such as wheeze) before the age of six, these clinical symptoms are neither universally present in early life among all future asthmatics nor specific to asthma.[4] Machine learning (ML) approaches have the potential for better predictive performance and generalisability over existing childhood asthma prediction models.

Results

Support Vector Machine (SVM) algorithms provided the best performance for both the CAPE (area under the receiver operating characteristic curve, AUC = 0.71) and CAPP (AUC = 0.82) models. Both models demonstrated good generalisability in MAAS (CAPE 8‐year = 0.71, 11‐year = 0.71, CAPP 8‐year = 0.83, 11‐year = 0.79) and excellent sensitivity to predict a subgroup of persistent wheezers.

Conclusion

Using ML approaches improved upon the predictive performance of existing regression‐based models, with good generalisability and ability to rule in asthma and predict persistent wheeze.