Early childhood asthma diagnosis is common; however, many children diagnosed before age 5 experience symptom resolution and it remains difficult to identify individuals whose symptoms will persist. Our objective was to develop machine learning models to identify which individuals diagnosed with asthma before age 5 continue to experience asthma-related visits. We curated a retrospective dataset for 9,934 children derived from electronic health record (EHR) data. We trained five machine learning models to differentiate individuals without subsequent asthma-related visits (transient diagnosis) from those with asthma-related visits between ages 5 and 10 (persistent diagnosis) given clinical information up to age 5 years. Based on average NPV-Specificity area (ANSA), all models performed significantly better than random chance, with XGBoost obtaining the best performance (0.43 mean ANSA). Feature importance analysis indicated age of last asthma diagnosis under 5 years, total number of asthma related visits, self-identified black race, allergic rhinitis, and eczema as important features. Although our models appear to perform well, a lack of prior models utilizing a large number of features to predict individual persistence makes direct comparison infeasible. However, feature importance analysis indicates our models are consistent with prior research indicating diagnosis age and prior health service utilization as important predictors of persistent asthma. We therefore find that machine learning models can predict which individuals will experience persistent asthma with good performance and may be useful to guide clinician and parental decisions regarding asthma counselling in early childhood.
Read full abstract