There is a substantial burden of hypertension in children and adolescents. Given the availability of primary prevention strategies, it is important to determine predictors for early identification of children and adolescents at risk of hypertension. This study aims to develop and validate machine learning (ML) algorithms for accurately predicting blood pressure (BP) status (normal, prehypertension, and hypertension) over 1- and 3-year periods, and to identify key predictors without compromising model performance. We included a population-based cohort of primary 1 to secondary 6 students (typically aged 6 to 18 years) between the academic years of 1995 to 1996 and 2019 to 2020 in Hong Kong. Thirty-six easily assessed predictors were initially used to model childhood BP status. Multiple ML algorithms were used: decision tree, random forest, k-nearest neighbor, eXtreme Gradient Boosting (XGBoost), and multinomial logistic regression (MLR). Models were evaluated using several accuracy metrics. Shapley Additive Explanations (SHAP) was used to identify key features for both prediction horizons. A total of 923 301 and 602 179 visit pairs were used for the 1- and 3-year predictions, respectively. XGBoost demonstrated the highest prediction accuracies for the 1-year (macro-area under the receiver operating characteristic curve [AUROC] = 0.92, micro-AUROC = 0.91) and 3-year (macro-AUROC = 0.91, micro-AUROC = 0.90) periods. The traditional MLR approach had the lowest accuracies for the 1-year (macro-AUROC = 0.70, micro-AUROC = 0.68) and 3-year (macro-AUROC = 0.70, micro-AUROC = 0.68) predictions. The SHAP values identified 17 key predictors without the need for direct BP measurements or laboratory tests. ML prediction models can accurately predict childhood prehypertension and hypertension at 1 and 3 years, independent of BP and laboratory measurements. The identified key predictors may inform areas for personalized prevention of hypertension.
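
The abstract reports both macro- and micro-averaged AUROC for a three-class outcome (normal, prehypertension, hypertension). The following is a minimal sketch of how those two averages can differ, assuming a one-vs-rest formulation; it is not the authors' code. The function names `binary_auroc` and `macro_micro_auroc`, the class encoding (0, 1, 2), and the toy probabilities are all illustrative assumptions, not data from the study.

```python
from typing import List, Sequence, Tuple


def binary_auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_micro_auroc(
    y_true: Sequence[int], y_prob: Sequence[Sequence[float]], n_classes: int
) -> Tuple[float, float]:
    """One-vs-rest macro and micro AUROC for a multiclass problem."""
    n = len(y_true)
    # Binary indicator per class: 1 if the sample belongs to class k, else 0.
    indicators: List[List[int]] = [
        [1 if y == k else 0 for y in y_true] for k in range(n_classes)
    ]
    # Macro: compute one AUROC per class, then take the unweighted mean,
    # so every class counts equally regardless of how common it is.
    per_class = [
        binary_auroc(indicators[k], [row[k] for row in y_prob])
        for k in range(n_classes)
    ]
    macro = sum(per_class) / n_classes
    # Micro: pool every (sample, class) pair into one binary problem,
    # so frequent classes contribute more pairs to the result.
    flat_labels = [indicators[k][i] for i in range(n) for k in range(n_classes)]
    flat_scores = [y_prob[i][k] for i in range(n) for k in range(n_classes)]
    micro = binary_auroc(flat_labels, flat_scores)
    return macro, micro


# Hypothetical toy data: 0 = normal, 1 = prehypertension, 2 = hypertension.
toy_true = [0, 0, 0, 1, 1, 2]
toy_prob = [
    [0.80, 0.15, 0.05],
    [0.60, 0.30, 0.10],
    [0.50, 0.40, 0.10],
    [0.30, 0.50, 0.20],
    [0.55, 0.35, 0.10],
    [0.20, 0.30, 0.50],
]
macro_auc, micro_auc = macro_micro_auroc(toy_true, toy_prob, n_classes=3)
print(f"macro-AUROC = {macro_auc:.3f}, micro-AUROC = {micro_auc:.3f}")
```

Because hypertension is typically much rarer than normal BP, the two averages can diverge: macro-averaging weights the rare classes as heavily as the common one, whereas micro-averaging is dominated by the majority class. The pairwise loop is quadratic and only suitable for illustration; a cohort-scale analysis would use a rank-based implementation from an established library.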