Abstract
Using anthropometric measurements in machine learning algorithms for hypertension prediction enables the development of simple, non-invasive prediction models. However, previous studies have used different machine learning algorithms with various anthropometric data, either alone or in combination with other biophysical and lifestyle variables. It is therefore essential to assess the impact of the chosen machine learning model when only simple anthropometric measurements are used. We developed and tested 13 machine learning methods from the neural network, ensemble, and classical categories to predict hypertension in adolescents using only simple anthropometric measurements. The imbalanced dataset of 2461 samples, 30.1% of which were hypertensive subjects, was first partitioned into 90% for training and 10% for validation. The training dataset was reduced to eight simple anthropometric measurements using correlation coefficients: age, C index, ethnicity, gender, height, location, parental hypertension, and waist circumference. The Synthetic Minority Oversampling Technique (SMOTE) combined with random under-sampling was used to balance the dataset. The models with optimal hyperparameters were assessed using accuracy, precision, sensitivity, specificity, F1-score, misclassification rate, and AUC on the testing dataset. No model outperformed the others on all seven performance measures. LightGBM was the best model on six of the seven performance metrics (all except sensitivity), whereas Decision Tree was the worst. We proposed using Bayes’ Theorem to assess the models’ applicability in the Sarawak adolescent population; the top four models were LightGBM, Random Forest, XGBoost, and CatBoost, and the bottom four were Logistic Regression, LogitBoost, SVM, and Decision Tree. This study demonstrates that the choice of machine learning model affects the prediction outcomes.
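As an illustration of the evaluation step, the sketch below computes the six threshold-based measures named in the abstract from confusion-matrix counts, then applies Bayes' Theorem to estimate the probability that a subject flagged by a model is truly hypertensive. This is one plausible reading of the Bayes step; the abstract does not specify the exact formulation, so the function names, the counts, and the use of the dataset's 30.1% hypertension share as the prior are all assumptions made for illustration. AUC is omitted because it requires predicted scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # recall on the hypertensive class
    specificity = tn / (tn + fp)   # recall on the non-hypertensive class
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "misclassification_rate": 1 - accuracy,
    }


def posterior_given_positive(sensitivity, specificity, prevalence):
    """Bayes' Theorem: P(hypertensive | model predicts hypertensive)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts for illustration only (not results from the study).
m = classification_metrics(tp=60, fp=20, tn=150, fn=15)

# Assumed prior: the 30.1% hypertension share reported for the dataset.
ppv = posterior_given_positive(m["sensitivity"], m["specificity"], prevalence=0.301)
print(m, ppv)
```

Because the posterior depends on the prior, two models with similar accuracy can rank differently once the prevalence of the target population is taken into account.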
Highlights
A chronic disease, also known as a non-communicable disease, is a health condition that is not contagious and can endure for a long time
We examine the effect on the prediction results of using the Synthetic Minority Oversampling Technique (SMOTE) alone and of combining SMOTE with random undersampling before deciding on the resampling approach
We developed and tuned thirteen machine learning (ML) models in three categories to predict hypertension: neural network (Multilayer Perceptron), classical models (Logistic Regression, Decision Tree, Naïve Bayes, k-Nearest Neighbor, and Support Vector Machine), and ensemble models (Random Forest, Gradient Boosting, XGBoost, LightGBM, CatBoost, AdaBoost, and LogitBoost)
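The resampling idea mentioned in the highlights can be sketched in plain Python as follows. This is a simplified illustration of SMOTE's core step (interpolating between a minority sample and one of its nearest minority neighbours) followed by random undersampling of the majority class. It is not the authors' implementation: the function names, the toy data, and the balancing target are assumptions, and the sketch handles numeric features only. In practice a library such as imbalanced-learn (`SMOTE`, `RandomUnderSampler`) would typically be used.

```python
import random


def smote_like_oversample(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep a random subset of the majority class without replacement."""
    if rng is None:
        rng = random.Random(0)
    return rng.sample(majority, n_keep)


# Toy two-feature data (hypothetical): 70 majority and 30 minority samples.
rng = random.Random(42)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(70)]
minority = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(30)]

target = 50  # assumed per-class size after balancing
new_points = smote_like_oversample(minority, target - len(minority), rng=rng)
minority_bal = minority + new_points
majority_bal = random_undersample(majority, target, rng=rng)
print(len(minority_bal), len(majority_bal))
```

Combining the two steps lets the classes meet at an intermediate size, so fewer synthetic samples are generated than with SMOTE alone and fewer real majority samples are discarded than with undersampling alone.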
Summary
A chronic disease, also known as a non-communicable disease, is a health condition that is not contagious and can endure for a long time. According to a World Health Organization (WHO) report [1], chronic diseases claim the lives of 41 million people each year, accounting for 71% of all deaths worldwide. Cardiovascular disease causes the majority of these fatalities, at 17.9 million deaths per year. Hypertension is a crucial factor in the development of cardiovascular disease. Hypertension, often known as high blood pressure, is defined as a systolic blood pressure reading ≥140 mmHg and/or a diastolic blood pressure reading ≥90 mmHg. Systolic blood pressure is the pressure in the blood vessels when the heart beats or contracts, whereas diastolic blood pressure is the pressure in the blood vessels when the heart rests between beats. A recent study [2] reported that every increase of 20 mmHg systolic and 10 mmHg diastolic pressure above a baseline blood pressure of 115/75 mmHg doubles the risk of cardiovascular death.
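The threshold definition and the doubling rule quoted above can be expressed as a short sketch. The function names are hypothetical, and two modelling choices are assumptions rather than statements from the cited study: the risk multiplier is driven by whichever of the systolic or diastolic readings is further above baseline, and it is interpolated continuously between whole 20/10 mmHg steps.

```python
def is_hypertensive(systolic, diastolic):
    """Definition from the text: systolic >= 140 and/or diastolic >= 90 mmHg."""
    return systolic >= 140 or diastolic >= 90


def relative_cv_death_risk(systolic, diastolic, baseline=(115, 75)):
    """Risk multiplier relative to baseline under the doubling rule.

    Assumption: one 'step' is 20 mmHg systolic or 10 mmHg diastolic,
    and the larger of the two step counts determines the multiplier.
    """
    systolic_steps = (systolic - baseline[0]) / 20
    diastolic_steps = (diastolic - baseline[1]) / 10
    steps = max(systolic_steps, diastolic_steps, 0.0)
    return 2.0 ** steps


print(is_hypertensive(142, 85))
print(relative_cv_death_risk(135, 85))
```

Under these assumptions, a reading of 135/85 mmHg sits one step above baseline and 155/95 mmHg sits two steps above it.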